updown.data.readers
A Reader simply reads data from disk and returns it _almost_ as is. Readers should be
utilized by a PyTorch Dataset. Heavy data pre-processing is not recommended in the
reader, such as tokenizing words to integers, embedding tokens, or passing an image
through a pre-trained CNN. Each reader must implement at least two methods:

- __len__ to return the length of data this Reader can read.
- __getitem__ to return data based on an index or a primary key (such as image_id).
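For illustration, a minimal reader following this two-method protocol, wrapped by a
hypothetical PyTorch Dataset, might look like the sketch below. The JsonListReader and
ExampleDataset names are illustrative and not part of this package; the point is only
that the reader stays thin and the Dataset owns the heavy pre-processing.

from typing import Any, Dict, List
import json

from torch.utils.data import Dataset


class JsonListReader(object):
    # Illustrative reader: loads a JSON file holding a list of records, almost as is.

    def __init__(self, jsonpath: str):
        with open(jsonpath) as f:
            self._records: List[Dict[str, Any]] = json.load(f)

    def __len__(self) -> int:
        return len(self._records)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        return self._records[index]


class ExampleDataset(Dataset):
    # Illustrative Dataset: tokenization, embedding lookups and other heavy
    # pre-processing belong here, not in the reader.

    def __init__(self, jsonpath: str):
        self._reader = JsonListReader(jsonpath)

    def __len__(self) -> int:
        return len(self._reader)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        record = self._reader[index]
        # Transform the raw record into model-ready tensors here.
        return record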
- class updown.data.readers.ImageFeaturesReader(features_h5path: str, in_memory: bool = False)[source]
  Bases: object

A reader for H5 files containing pre-extracted image features. A typical image features
file should have at least two H5 datasets, named image_id and features. It may
optionally have other H5 datasets, such as boxes (for bounding box coordinates), width
and height for image size, and others. This reader only reads image features, because
our UpDown captioner baseline does not require anything other than image features.

Example of an h5 file:

image_bottomup_features.h5
|--- "image_id" [shape: (num_images, )]
|--- "features" [shape: (num_images, num_boxes, feature_size)]
+--- .attrs {"split": "coco_train2017"}
- Parameters
  - features_h5path: str
    Path to an H5 file containing image ids and features corresponding to one of the
    four splits used: "coco_train2017", "coco_val2017", "nocaps_val", "nocaps_test".
  - in_memory: bool
    Whether to load the features in memory. Beware, these files are sometimes tens of
    GBs in size. Set this to true if you have sufficient RAM.
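A hedged usage sketch, assuming an H5 file laid out as in the example above. The file
path is hypothetical, and looking features up by image_id is an assumption based on the
reader protocol described at the top of this page, not a verified signature.

import h5py

from updown.data.readers import ImageFeaturesReader

H5_PATH = "image_bottomup_features.h5"   # hypothetical path, layout as documented above

# Inspect the documented H5 layout directly with h5py.
with h5py.File(H5_PATH, "r") as h5_file:
    print(h5_file.attrs["split"])        # e.g. "coco_train2017"
    print(h5_file["image_id"].shape)     # (num_images, )
    print(h5_file["features"].shape)     # (num_images, num_boxes, feature_size)

# Keep in_memory=False unless you have enough RAM for the whole file.
features_reader = ImageFeaturesReader(features_h5path=H5_PATH, in_memory=False)
print(len(features_reader))              # number of images in this split

# Assumed usage: look up features by image_id (42 is a placeholder id).
# features = features_reader[42]         # expected shape: (num_boxes, feature_size)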
- class updown.data.readers.CocoCaptionsReader(captions_jsonpath: str)[source]
  Bases: object

A reader for annotation files containing training captions. These are JSON files in COCO format.
- Parameters
  - captions_jsonpath: str
    Path to a JSON file containing training captions in COCO format (COCO train2017 usually).
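A hedged usage sketch; the captions path is hypothetical, and what each indexed item
contains (e.g. an image id paired with a caption) is an assumption, not a documented
return type.

from updown.data.readers import CocoCaptionsReader

# Hypothetical path: any captions JSON in COCO format should work.
captions_reader = CocoCaptionsReader(captions_jsonpath="data/coco/captions_train2017.json")

print(len(captions_reader))          # number of captions read from the annotation file
# first_entry = captions_reader[0]   # assumed: one caption entry per index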
- class updown.data.readers.ConstraintBoxesReader(boxes_jsonpath: str)[source]
  Bases: object

A reader for annotation files containing detected bounding boxes (in COCO format). The
JSON file should have categories, images and annotations fields (similar to COCO
instance annotations).

For our use cases, the detections are from an object detector trained using Open
Images. These can be produced for any set of images by following instructions here.
- Parameters
  - boxes_jsonpath: str
    Path to a JSON file containing bounding box detections in COCO format (nocaps val/test usually).
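A hedged usage sketch; the detections path is hypothetical, and looking detections up
by image id is an assumption based on the reader protocol above, not a verified
signature.

from updown.data.readers import ConstraintBoxesReader

# Hypothetical path: a COCO-format detection JSON with "categories", "images"
# and "annotations" fields, as described above.
boxes_reader = ConstraintBoxesReader(boxes_jsonpath="data/nocaps/nocaps_val_detections.json")

print(len(boxes_reader))   # assumed: number of entries covered by the annotation file
# detections = boxes_reader[some_image_id]   # assumed: boxes and classes for one image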