updown.data.datasets

class updown.data.datasets.TrainingDataset(vocabulary: allennlp.data.vocabulary.Vocabulary, captions_jsonpath: str, image_features_h5path: str, max_caption_length: int = 20, in_memory: bool = True)[source]

Bases: torch.utils.data.dataset.Dataset

A PyTorch Dataset providing access to COCO train2017 caption data for training UpDownCaptioner. When wrapped with a DataLoader, it provides batches of image features and tokenized ground truth captions.

Note

Use collate_fn when wrapping with a DataLoader.

Parameters
vocabulary: allennlp.data.Vocabulary

AllenNLP’s vocabulary containing the token-to-index mapping for the caption vocabulary.

captions_jsonpath: str

Path to a JSON file containing COCO train2017 caption annotations.

image_features_h5path: str

Path to an H5 file containing pre-extracted features from COCO train2017 images.

max_caption_length: int, optional (default = 20)

Maximum length of caption sequences for language modeling. Captions longer than this will be truncated to this length.

in_memory: bool, optional (default = True)

Whether to load all image features in memory.

classmethod from_config(config: updown.config.Config, **kwargs)[source]

Instantiate this class directly from a Config.
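Because caption lengths vary within a batch, the collate_fn mentioned in the note above has to pad captions before stacking. The sketch below shows what such a padding collate might look like; the item keys ("image_features", "caption_tokens") and the pad index are illustrative assumptions, not the library's exact contract.

```python
import torch

def pad_captions_collate(batch, pad_index=0):
    # Image features are fixed-size per instance, so they stack directly.
    image_features = torch.stack([item["image_features"] for item in batch])

    # Captions vary in length; pad each one to the longest in the batch.
    max_len = max(item["caption_tokens"].size(0) for item in batch)
    caption_tokens = torch.full((len(batch), max_len), pad_index, dtype=torch.long)
    for i, item in enumerate(batch):
        tokens = item["caption_tokens"]
        caption_tokens[i, : tokens.size(0)] = tokens

    return {"image_features": image_features, "caption_tokens": caption_tokens}
```

A collate like this would be passed as `DataLoader(dataset, collate_fn=pad_captions_collate, ...)`.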

class updown.data.datasets.EvaluationDataset(image_features_h5path: str, in_memory: bool = True)[source]

Bases: torch.utils.data.dataset.Dataset

A PyTorch Dataset providing image features for inference. When wrapped with a DataLoader, it provides batches of image features.

Note

Use collate_fn when wrapping with a DataLoader.

Parameters
image_features_h5path: str

Path to an H5 file containing pre-extracted features from nocaps val/test images.

in_memory: bool, optional (default = True)

Whether to load all image features in memory.

classmethod from_config(config: updown.config.Config, **kwargs)[source]

Instantiate this class directly from a Config.
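For inference there are no captions to pad, so the collate_fn reduces to stacking fixed-size feature tensors. A minimal sketch, assuming each instance is a dict with an integer "image_id" and an "image_features" tensor (both keys are illustrative assumptions):

```python
import torch

def eval_collate(batch):
    # Every instance carries a fixed-size feature tensor, so collation
    # is just stacking along a new batch dimension.
    return {
        "image_id": torch.tensor([item["image_id"] for item in batch]),
        "image_features": torch.stack([item["image_features"] for item in batch]),
    }
```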

class updown.data.datasets.EvaluationDatasetWithConstraints(vocabulary: allennlp.data.vocabulary.Vocabulary, image_features_h5path: str, boxes_jsonpath: str, wordforms_tsvpath: str, hierarchy_jsonpath: str, nms_threshold: float = 0.85, max_given_constraints: int = 3, in_memory: bool = True)[source]

Bases: updown.data.datasets.EvaluationDataset

A PyTorch Dataset providing image features for inference, along with constraints for ConstrainedBeamSearch. When wrapped with a DataLoader, it provides batches of image features, Finite State Machines built (per instance) from the constraints, and the number of constraints used to build each machine.

Finite State Machines are represented as adjacency matrices (tensors), with state transitions corresponding to occurrences of specific constraint words while decoding. We return the number of constraints used to build each FSM because it is required when selecting which decoded beams satisfied their constraints. Refer to select_best_beam_with_constraints() for more details.
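As a hedged illustration of the adjacency-matrix idea, the toy builder below constructs an FSM for single-word constraints, with states as bitmasks of which constraints have been emitted. The state layout and function name are assumptions for illustration, not the library's exact scheme (the real FSMs also handle multi-word forms).

```python
import torch

def build_fsm(constraint_word_ids, vocab_size):
    # States are bitmasks over constraints: with C constraints there are
    # 2**C states. fsm[s, t, w] == 1 means that emitting word w while in
    # state s moves the decoder to state t.
    num_states = 2 ** len(constraint_word_ids)
    fsm = torch.zeros(num_states, num_states, vocab_size, dtype=torch.uint8)
    for s in range(num_states):
        # By default, every word keeps the decoder in the same state.
        fsm[s, s, :] = 1
        for i, w in enumerate(constraint_word_ids):
            if not s & (1 << i):
                # Emitting an unsatisfied constraint word flips its bit.
                t = s | (1 << i)
                fsm[s, s, w] = 0
                fsm[s, t, w] = 1
    return fsm
```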

Note

Use collate_fn when wrapping with a DataLoader.

Parameters
vocabulary: allennlp.data.Vocabulary

AllenNLP’s vocabulary containing token to index mapping for captions vocabulary.

image_features_h5path: str

Path to an H5 file containing pre-extracted features from nocaps val/test images.

boxes_jsonpath: str

Path to a JSON file containing bounding box detections in COCO format (nocaps val/test usually).

wordforms_tsvpath: str

Path to a TSV file containing two fields: the first is the name of an Open Images object class, and the second is a comma-separated list of words (for example, singular and plural forms) that can serve as CBS constraints.

hierarchy_jsonpath: str

Path to a JSON file containing the hierarchy of Open Images object classes.

nms_threshold: float, optional (default = 0.85)

NMS threshold for suppressing generic object class names during constraint filtering: for two boxes with IoU higher than this threshold, the more specific class suppresses the more generic one (for example, “dog” suppresses “animal”).

max_given_constraints: int, optional (default = 3)

Maximum number of constraints that can be specified for CBS decoding. Constraints are selected based on the prediction confidence scores of their corresponding bounding boxes.

in_memory: bool, optional (default = True)

Whether to load all image features in memory.

classmethod from_config(config: updown.config.Config, **kwargs)[source]

Instantiate this class directly from a Config.
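To illustrate why the dataset also returns the number of constraints: at decode time, beams are grouped by FSM state (a bitmask of satisfied constraints), and the final caption is chosen from states that satisfy “enough” of them. The sketch below uses a min(num_constraints, 2) threshold as an assumed simplification; the function name and exact rule are illustrative, not the library's implementation.

```python
def candidate_states(num_constraints, min_satisfied=2):
    # Require at least min(num_constraints, min_satisfied) constraint bits
    # to be set for a state to be eligible when picking the best beam.
    required = min(num_constraints, min_satisfied)
    return [s for s in range(2 ** num_constraints) if bin(s).count("1") >= required]
```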