updown.data.readers

A Reader simply reads data from disk and returns it _almost_ as is. Readers are meant to be used by a PyTorch Dataset; heavy pre-processing, such as tokenizing words to integers, embedding tokens, or passing an image through a pre-trained CNN, is not recommended in the reader. Each reader must implement at least two methods:

  1. __len__ to return the number of items this Reader can read.

  2. __getitem__ to return data based on an index or a primary key (such as image_id).
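The two-method contract above can be sketched as follows. This is a hypothetical toy reader over an in-memory dict, not one of the readers in this module; it only illustrates returning data _almost_ as is, keyed by a primary key.

```python
class ToyCaptionsReader:
    """Toy reader: returns captions almost as-is, keyed by image_id."""

    def __init__(self, captions: dict):
        # Mapping of image_id -> caption string (stands in for data on disk).
        self._captions = captions

    def __len__(self) -> int:
        # Number of items this reader can read.
        return len(self._captions)

    def __getitem__(self, image_id: int) -> str:
        # Index by primary key (image_id), not by positional index.
        return self._captions[image_id]
```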

class updown.data.readers.ImageFeaturesReader(features_h5path: str, in_memory: bool = False)

Bases: object

A reader for H5 files containing pre-extracted image features. A typical image features file should have at least two H5 datasets, named image_id and features. It may optionally have other H5 datasets, such as boxes (for bounding box coordinates), width and height for image size, and others. This reader only reads image features, because our UpDown captioner baseline does not require anything other than image features.

Example of an h5 file:

image_bottomup_features.h5
|--- "image_id" [shape: (num_images, )]
|--- "features" [shape: (num_images, num_boxes, feature_size)]
+--- .attrs {"split": "coco_train2017"}
Parameters
features_h5path : str

Path to an H5 file containing image ids and features corresponding to one of the four splits used: “coco_train2017”, “coco_val2017”, “nocaps_val”, “nocaps_test”.

in_memory : bool

Whether to load the features in memory. Beware: these files are sometimes tens of GBs in size. Set this to True if you have sufficient RAM.
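A minimal sketch of how such a reader might look, using h5py against the H5 layout shown above. The class name and internals here are illustrative assumptions, not the actual library implementation; a production reader would, for example, keep the file handle open instead of reopening it per access.

```python
import h5py
import numpy as np


class SimpleImageFeaturesReader:
    """Illustrative reader for an H5 file with "image_id" and "features"
    datasets, keyed by image_id. Not the actual library implementation."""

    def __init__(self, features_h5path: str, in_memory: bool = False):
        self._h5path = features_h5path
        with h5py.File(features_h5path, "r") as f:
            # Map image_id -> row index into the "features" dataset.
            self._id_to_index = {
                int(image_id): index for index, image_id in enumerate(f["image_id"][:])
            }
            # Optionally cache all features in RAM (files can be tens of GBs).
            self._features = f["features"][:] if in_memory else None

    def __len__(self) -> int:
        return len(self._id_to_index)

    def __getitem__(self, image_id: int) -> np.ndarray:
        index = self._id_to_index[image_id]
        if self._features is not None:
            return self._features[index]
        # Not in memory: read the single row from disk on each access.
        with h5py.File(self._h5path, "r") as f:
            return f["features"][index]
```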

class updown.data.readers.CocoCaptionsReader(captions_jsonpath: str)

Bases: object

A reader for annotation files containing training captions. These are JSON files in COCO format.

Parameters
captions_jsonpath : str

Path to a JSON file containing training captions in COCO format (COCO train2017 usually).
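COCO caption annotation files carry a top-level "annotations" list of records with "image_id" and "caption" fields. A hedged sketch of a reader over that format (the class name and details are illustrative, not the actual library implementation):

```python
import json


class SimpleCocoCaptionsReader:
    """Illustrative reader for a COCO-format captions JSON file.
    Not the actual library implementation."""

    def __init__(self, captions_jsonpath: str):
        with open(captions_jsonpath) as f:
            annotations = json.load(f)["annotations"]
        # Keep (image_id, caption) pairs almost as-is.
        self._captions = [(a["image_id"], a["caption"]) for a in annotations]

    def __len__(self) -> int:
        return len(self._captions)

    def __getitem__(self, index: int):
        # Positional indexing: one image typically has several captions.
        return self._captions[index]
```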

class updown.data.readers.ConstraintBoxesReader(boxes_jsonpath: str)

Bases: object

A reader for annotation files containing detected bounding boxes (in COCO format). The JSON file should have categories, images and annotations fields (similar to COCO instance annotations).

For our use cases, the detections are from an object detector trained using Open Images. These can be produced for any set of images by following instructions here.

Parameters
boxes_jsonpath : str

Path to a JSON file containing bounding box detections in COCO format (nocaps val/test usually).
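A sketch of a reader over the COCO instance-annotation layout described above ("categories", "images" and "annotations" fields), grouping detected boxes per image. The class name and the exact fields read here are illustrative assumptions, not the actual library implementation:

```python
import json
from collections import defaultdict


class SimpleConstraintBoxesReader:
    """Illustrative reader for a COCO-format detection JSON file with
    "categories", "images" and "annotations" fields.
    Not the actual library implementation."""

    def __init__(self, boxes_jsonpath: str):
        with open(boxes_jsonpath) as f:
            data = json.load(f)
        # category_id -> human-readable class name.
        class_names = {c["id"]: c["name"] for c in data["categories"]}
        # image_id -> list of (bbox, class_name, score) tuples.
        self._boxes = defaultdict(list)
        for ann in data["annotations"]:
            self._boxes[ann["image_id"]].append(
                # Detectors usually attach a confidence "score"; default to 1.0.
                (ann["bbox"], class_names[ann["category_id"]], ann.get("score", 1.0))
            )

    def __len__(self) -> int:
        return len(self._boxes)

    def __getitem__(self, image_id: int):
        return self._boxes[image_id]
```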