updown.config
class updown.config.Config(config_file: Optional[str] = None, config_override: List[Any] = [])
Bases: object
This class provides package-wide configuration management. It is a nested dict-like structure with nested keys accessible as attributes. It contains sensible default values, which can be modified by (first) a YAML file and (second) a list of attributes and values.
This class definition contains default hyperparameters for the UpDown baseline from our paper. Parameters cannot be modified after this class is instantiated, so you must override required parameter values through either config_file or config_override.
- Parameters
- config_file: str
Path to a YAML file containing configuration parameters to override.
- config_override: List[Any], optional (default=[])
A list of sequential attributes and values of parameters to override. This happens after overriding from YAML file.
Examples
Let a YAML file named “config.yaml” specify these parameters to override:
RANDOM_SEED: 42
OPTIM:
  BATCH_SIZE: 512
>>> _C = Config("config.yaml", ["OPTIM.BATCH_SIZE", 2048])
>>> _C.RANDOM_SEED  # default: 0
42
>>> _C.OPTIM.BATCH_SIZE  # default: 150
2048
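For intuition, here is a minimal sketch of the override mechanics described above, assuming the class wraps a yacs-style CfgNode underneath; only two defaults are shown, and the real class defines many more. Treat this as an illustration, not the package source.

# A minimal sketch of the override mechanics, assuming a yacs-style CfgNode.
from typing import Any, List, Optional

from yacs.config import CfgNode as CN


class Config(object):
    def __init__(self, config_file: Optional[str] = None,
                 config_override: List[Any] = []):
        self._C = CN()
        self._C.RANDOM_SEED = 0
        self._C.OPTIM = CN()
        self._C.OPTIM.BATCH_SIZE = 150

        # Override first from the YAML file, then from the
        # sequential attribute-value list.
        if config_file is not None:
            self._C.merge_from_file(config_file)
        self._C.merge_from_list(config_override)

        # Freeze so parameters cannot be modified after instantiation.
        self._C.freeze()

    def __getattr__(self, attr: str):
        # Delegate attribute access to the nested CfgNode.
        return self._C.__getattr__(attr)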
- Attributes
- RANDOM_SEED: 0
Random seed for NumPy and PyTorch, important for reproducibility.
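As a usage sketch (assuming a typical training script; the package's own scripts may apply this differently), the seed would be set in both libraries before any model or data pipeline is built:

import numpy as np
import torch

from updown.config import Config

_C = Config("config.yaml")

# Seed NumPy and PyTorch (and all CUDA devices) before any random state is used.
np.random.seed(_C.RANDOM_SEED)
torch.manual_seed(_C.RANDOM_SEED)
torch.cuda.manual_seed_all(_C.RANDOM_SEED)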
- __________
- DATA:
Collection of required data paths for training and evaluation. All these are assumed to be relative to the project root directory. If they are kept elsewhere, symlinking is recommended.
- DATA.VOCABULARY: “data/vocabulary”
Path to a directory containing caption vocabulary (readable by AllenNLP).
- DATA.TRAIN_FEATURES: “data/coco_train2017_vg_detector_features_adaptive.h5”
Path to an H5 file containing pre-extracted features from COCO train2017 images (see the sketch at the end of this group).
- DATA.INFER_FEATURES: “data/nocaps_val_vg_detector_features_adaptive.h5”
Path to an H5 file containing pre-extracted features from nocaps val/test images.
- DATA.TRAIN_CAPTIONS: “data/coco/annotations/captions_train2017.json”
Path to a JSON file containing COCO train2017 captions in COCO format.
- DATA.INFER_CAPTIONS: “data/nocaps/annotations/nocaps_val_image_info.json”
Path to a JSON file containing nocaps val/test image info. Captions are not available publicly.
- DATA.MAX_CAPTION_LENGTH: 20
Maximum length of caption sequences for language modeling. Captions longer than this will be truncated to this maximum length.
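To make "pre-extracted features" concrete, the snippet below opens such an H5 file with h5py. This is a hedged sketch: the dataset names shown ("image_id", "features") are hypothetical placeholders, and the file's actual layout is determined by the feature-extraction scripts, not by this config.

import h5py

# Open the pre-extracted features file read-only and inspect its layout.
with h5py.File("data/coco_train2017_vg_detector_features_adaptive.h5", "r") as features_h5:
    print(list(features_h5.keys()))
    # Hypothetical layout, assuming parallel per-image datasets:
    # image_ids = features_h5["image_id"][:]
    # first_image_features = features_h5["features"][0]  # e.g. (num_boxes, 2048)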
- __________
- DATA.CBS:
Collection of required data paths and configuration parameters for Constrained Beam Search decoding.
- DATA.CBS.INFER_BOXES: “data/nocaps_val_oi_detector_boxes.json”
Path to a JSON file containing detected bounding boxes (in COCO format) from nocaps val/test images.
- DATA.CBS.CLASS_HIERARCHY: “data/cbs/class_hierarchy.json”
Path to a JSON file containing a hierarchy of Open Images object classes.
- DATA.CBS.WORDFORMS: “data/cbs/constraint_wordforms.tsv”
Path to a TSV file containing word-forms of CBS constraints. The first column is a word in the Open Images class names; the second column is a comma-separated list of word-forms (singular, plural, etc.) which can satisfy the constraint.
- DATA.CBS.NMS_THRESHOLD: 0.85
NMS threshold for suppressing generic object class names during constraint filtering: for two boxes with IoU higher than this threshold, the more specific class suppresses the generic one, e.g. "dog" suppresses "animal" (see the sketch at the end of this group).
- DATA.CBS.MAX_GIVEN_CONSTRAINTS: 3
Maximum number of constraints which can be specified for CBS decoding. Constraints are selected based on the prediction confidence score of their corresponding bounding boxes.
- DATA.CBS.MAX_WORDS_PER_CONSTRAINT: 3
Maximum number of words allowed in a multi-word object class name. Note that this is not the number of word-forms for a particular constraint. For example, this parameter is 1 for the {"dog", "dogs"} constraint and 3 for {"wood burning stove"}.
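A minimal sketch of the IoU test behind DATA.CBS.NMS_THRESHOLD (an illustration of the idea, not the package's filtering code); boxes are in COCO format [x, y, width, height], and the box values below are made up:

from typing import List


def iou(box1: List[float], box2: List[float]) -> float:
    # Intersection rectangle of two [x, y, width, height] boxes.
    x1, y1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    x2 = min(box1[0] + box1[2], box2[0] + box2[2])
    y2 = min(box1[1] + box1[3], box2[1] + box2[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box1[2] * box1[3] + box2[2] * box2[3] - intersection
    return intersection / union if union > 0 else 0.0


# If a specific class ("dog") and a generic one ("animal") overlap more
# than the threshold, the generic constraint is dropped.
dog_box, animal_box = [10, 10, 50, 40], [11, 10, 50, 41]
if iou(dog_box, animal_box) > 0.85:  # DATA.CBS.NMS_THRESHOLD
    print("'animal' suppressed by 'dog'")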
- __________
- MODEL:
Parameters controlling the model architecture of the UpDown Captioner (see the sketch at the end of this group).
- MODEL.IMAGE_FEATURE_SIZE: 2048
Size of the bottom-up image features.
- MODEL.EMBEDDING_SIZE: 1000
Size of the word embedding input to the captioner.
- MODEL.HIDDEN_SIZE: 1200
Size of the hidden and cell states of the attention LSTM and language LSTM of the captioner.
- MODEL.ATTENTION_PROJECTION_SIZE: 768
Size of the projected image and textual features before computing bottom-up top-down attention weights.
- MODEL.BEAM_SIZE: 5
Beam size for finding the most likely caption during decoding time (evaluation).
- MODEL.USE_CBS: False
Whether to use Constrained Beam Search during decoding.
- MODEL.MIN_CONSTRAINTS_TO_SATISFY: 2
Minimum number of constraints to satisfy during CBS decoding.
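To make the size parameters concrete, here is a hedged sketch of how they could map onto the captioner's sub-modules; the module names and wiring below are illustrative assumptions, not the package's actual architecture code:

from torch import nn

vocab_size = 10000  # hypothetical; determined by DATA.VOCABULARY in practice

# Word embedding: MODEL.EMBEDDING_SIZE.
embedding = nn.Embedding(vocab_size, 1000)
# Attention LSTM consumes the language LSTM hidden state, the mean-pooled
# image features (MODEL.IMAGE_FEATURE_SIZE), and the word embedding.
attention_lstm = nn.LSTMCell(1200 + 2048 + 1000, 1200)
# Language LSTM consumes the attended image features and the attention LSTM
# hidden state; hidden/cell sizes are MODEL.HIDDEN_SIZE.
language_lstm = nn.LSTMCell(2048 + 1200, 1200)
# Projections to MODEL.ATTENTION_PROJECTION_SIZE before computing
# bottom-up top-down attention weights.
image_projection = nn.Linear(2048, 768)
query_projection = nn.Linear(1200, 768)
attention_scores = nn.Linear(768, 1)
# Output layer over the vocabulary.
output_layer = nn.Linear(1200, vocab_size)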
- __________
- OPTIM:
Optimization hyper-parameters, mostly relevant during training.
- OPTIM.BATCH_SIZE: 150
Batch size during training and evaluation.
- OPTIM.NUM_ITERATIONS: 70000
Number of iterations to train for; batches are randomly sampled.
- OPTIM.LR: 0.015
Initial learning rate for SGD. It decays linearly to zero by the end of training (see the sketch at the end of this group).
- OPTIM.MOMENTUM: 0.9
Momentum coefficient for SGD.
- OPTIM.WEIGHT_DECAY: 0.001
Weight decay coefficient for SGD.
- OPTIM.CLIP_GRADIENTS
Gradient clipping threshold to avoid exploding gradients.
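Taken together, the OPTIM group describes a schedule like the following hedged sketch: SGD with momentum and weight decay, a learning rate decaying linearly to zero over NUM_ITERATIONS, and gradient clipping. This is an illustration, not the package's training script; since the CLIP_GRADIENTS default is not listed here, the clipping threshold below is a placeholder.

import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 10)  # stand-in for the captioner

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.015,            # OPTIM.LR
    momentum=0.9,        # OPTIM.MOMENTUM
    weight_decay=0.001,  # OPTIM.WEIGHT_DECAY
)
num_iterations = 70000   # OPTIM.NUM_ITERATIONS

# Learning rate decays linearly from OPTIM.LR to zero over training.
scheduler = LambdaLR(optimizer, lambda it: 1 - it / num_iterations)

for iteration in range(num_iterations):
    loss = model(torch.randn(4, 10)).pow(2).mean()  # dummy loss for the sketch
    loss.backward()
    # OPTIM.CLIP_GRADIENTS: the threshold value here is a placeholder.
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()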