updown.data.datasets
class updown.data.datasets.TrainingDataset(vocabulary: allennlp.data.vocabulary.Vocabulary, captions_jsonpath: str, image_features_h5path: str, max_caption_length: int = 20, in_memory: bool = True)

Bases: torch.utils.data.dataset.Dataset

A PyTorch Dataset providing access to COCO train2017 captions data for training UpDownCaptioner. When wrapped with a DataLoader, it provides batches of image features and tokenized ground truth captions.

Note: Use collate_fn when wrapping with a DataLoader.

Parameters
- vocabulary: allennlp.data.Vocabulary
AllenNLP vocabulary containing the token-to-index mapping for the captions vocabulary.
- captions_jsonpath: str
Path to a JSON file containing COCO train2017 caption annotations.
- image_features_h5path: str
Path to an H5 file containing pre-extracted features of COCO train2017 images.
- max_caption_length: int, optional (default = 20)
Maximum length of caption sequences for language modeling. Captions longer than this are truncated to this length.
- in_memory: bool, optional (default = True)
Whether to load all image features into memory.
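The note above says to use collate_fn when wrapping the dataset in a DataLoader; the reason is that captions in a batch have different lengths and must be padded to a common length before stacking. A minimal sketch of this pattern follows. TinyCaptionDataset, pad_collate, and PAD_INDEX are illustrative stand-ins, not part of updown's actual API:

```python
import torch
from torch.utils.data import Dataset, DataLoader

PAD_INDEX = 0  # assumed padding index; the real vocabulary defines its own


class TinyCaptionDataset(Dataset):
    """Hypothetical stand-in yielding (image features, tokenized caption)."""

    def __init__(self, max_caption_length: int = 20):
        self.max_caption_length = max_caption_length
        # Fake "pre-extracted" features: 3 images, 36 boxes, 2048-d each.
        self.features = torch.randn(3, 36, 2048)
        # Variable-length captions, already converted to token indices.
        self.captions = [[4, 8, 15], [16, 23], [42, 4, 8, 16, 23]]

    def __len__(self):
        return len(self.captions)

    def __getitem__(self, idx):
        caption = self.captions[idx][: self.max_caption_length]  # truncate
        return {
            "image_features": self.features[idx],
            "caption_tokens": torch.tensor(caption),
        }


def pad_collate(batch):
    """Pad captions in the batch to a common length, stack image features."""
    max_len = max(len(item["caption_tokens"]) for item in batch)
    captions = torch.full((len(batch), max_len), PAD_INDEX, dtype=torch.long)
    for i, item in enumerate(batch):
        captions[i, : len(item["caption_tokens"])] = item["caption_tokens"]
    return {
        "image_features": torch.stack([b["image_features"] for b in batch]),
        "caption_tokens": captions,
    }


loader = DataLoader(TinyCaptionDataset(), batch_size=3, collate_fn=pad_collate)
batch = next(iter(loader))
```

Without a custom collate_fn, DataLoader's default collation would fail on the unequal caption lengths; padding to the batch maximum is the standard fix.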
class updown.data.datasets.EvaluationDataset(image_features_h5path: str, in_memory: bool = True)

Bases: torch.utils.data.dataset.Dataset

A PyTorch Dataset providing image features for inference. When wrapped with a DataLoader, it provides batches of image features.

Note: Use collate_fn when wrapping with a DataLoader.

Parameters
- image_features_h5path: str
Path to an H5 file containing pre-extracted features of nocaps val/test images.
- in_memory: bool, optional (default = True)
Whether to load all image features into memory.
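The in_memory flag trades startup cost for per-item I/O: loading everything up front makes each lookup a plain array index, while the lazy path keeps startup cheap but touches the H5 file on every access. A minimal sketch of the pattern, assuming an H5 file with a single "features" dataset indexed by position (the real file layout may differ):

```python
import h5py
import numpy as np


class H5FeatureReader:
    """Hypothetical sketch of eager vs. lazy reading of pre-extracted features."""

    def __init__(self, h5path: str, in_memory: bool = True):
        self.h5path = h5path
        self.in_memory = in_memory
        if in_memory:
            # Read everything once; later lookups are pure array indexing.
            with h5py.File(h5path, "r") as f:
                self._features = f["features"][:]
        else:
            self._features = None  # nothing cached; read from disk on demand

    def __getitem__(self, idx: int) -> np.ndarray:
        if self.in_memory:
            return self._features[idx]
        # Lazy path: open and slice the file on every access.
        with h5py.File(self.h5path, "r") as f:
            return f["features"][idx]


# Build a toy H5 file and read it back both ways.
with h5py.File("toy_features.h5", "w") as f:
    f.create_dataset(
        "features", data=np.random.rand(4, 36, 2048).astype("float32")
    )

eager = H5FeatureReader("toy_features.h5", in_memory=True)
lazy = H5FeatureReader("toy_features.h5", in_memory=False)
```

Both paths return identical arrays; in_memory=True is preferable whenever the feature file fits in RAM.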
class updown.data.datasets.EvaluationDatasetWithConstraints(vocabulary: allennlp.data.vocabulary.Vocabulary, image_features_h5path: str, boxes_jsonpath: str, wordforms_tsvpath: str, hierarchy_jsonpath: str, nms_threshold: float = 0.85, max_given_constraints: int = 3, in_memory: bool = True)

Bases: updown.data.datasets.EvaluationDataset

A PyTorch Dataset providing image features for inference, along with constraints for ConstrainedBeamSearch. When wrapped with a DataLoader, it provides batches of image features, Finite State Machines built (per instance) from constraints, and the number of constraints used to build each FSM.

Finite State Machines are represented as adjacency matrices (Tensors), with state transitions corresponding to the occurrence of specific constraint words while decoding. The number of constraints used to build an FSM is returned because it is required when selecting which decoded beams satisfy the constraints; refer to select_best_beam_with_constraints() for details.

Note: Use collate_fn when wrapping with a DataLoader.

Parameters
- vocabulary: allennlp.data.Vocabulary
AllenNLP vocabulary containing the token-to-index mapping for the captions vocabulary.
- image_features_h5path: str
Path to an H5 file containing pre-extracted features of nocaps val/test images.
- boxes_jsonpath: str
Path to a JSON file containing bounding box detections in COCO format (usually nocaps val/test).
- wordforms_tsvpath: str
Path to a TSV file with two fields: the name of an Open Images object class, and a comma-separated list of words (for example, singular and plural forms) which could serve as CBS constraints for that class.
- hierarchy_jsonpath: str
Path to a JSON file containing the hierarchy of Open Images object classes.
- nms_threshold: float, optional (default = 0.85)
NMS threshold for suppressing generic object class names during constraint filtering: for two boxes with IoU higher than this threshold, "dog" suppresses "animal".
- max_given_constraints: int, optional (default = 3)
Maximum number of constraints which can be specified for CBS decoding. Constraints are selected based on the prediction confidence scores of their corresponding bounding boxes.
- in_memory: bool, optional (default = True)
Whether to load all image features into memory.
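The adjacency-matrix representation of a constraint FSM described above can be illustrated with the smallest case: a single one-word constraint. The state layout and helper below are illustrative assumptions for exposition, not the library's actual construction code:

```python
import torch


def single_constraint_fsm(word_index: int, vocab_size: int) -> torch.Tensor:
    """Illustrative FSM for one single-word constraint.

    Returns a (num_states, num_states, vocab_size) adjacency tensor where
    fsm[from_state, to_state, token] is True iff decoding `token` while in
    `from_state` moves the beam to `to_state`.
    """
    num_states = 2  # state 0: constraint unmet, state 1: constraint satisfied
    fsm = torch.zeros(num_states, num_states, vocab_size, dtype=torch.bool)
    # In state 0, any token other than the constraint word stays in state 0...
    fsm[0, 0, :] = True
    fsm[0, 0, word_index] = False
    # ...while decoding the constraint word transitions to state 1.
    fsm[0, 1, word_index] = True
    # State 1 is absorbing: every token keeps the beam in the satisfied state.
    fsm[1, 1, :] = True
    return fsm


fsm = single_constraint_fsm(word_index=7, vocab_size=10)
```

With multiple constraints the state space grows (one state per subset of satisfied constraints), which is why the number of constraints per instance must accompany each FSM: beams ending in a state where enough constraints are satisfied are the ones considered by select_best_beam_with_constraints().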