How to set up this codebase?¶
This codebase requires Python 3.6 or higher. The recommended way to set up this codebase is through Anaconda/Miniconda.
Install Dependencies¶
Install the Anaconda or Miniconda distribution (based on Python 3+) from their downloads site.
Clone the repository.
git clone https://www.github.com/nocaps-org/updown-baseline
cd updown-baseline
Create a conda environment, install all the dependencies, and install this codebase as a package in development mode.
conda create -n updown python=3.6
conda activate updown
pip install -r requirements.txt
python setup.py develop
Note
If the evalai package install fails, install libxml2-dev and libxslt1-dev via apt.
Now you can import updown from anywhere in your filesystem as long as you have this conda
environment activated.
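As an optional sanity check, the lines below (plain Python, nothing project-specific assumed beyond the package name) confirm that the development install points back into your clone:

# Run inside the activated "updown" conda environment.
import updown

# Because the package was installed in development mode (python setup.py develop),
# __file__ resolves to a path inside the cloned repository.
print(updown.__file__)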
Download Image Features¶
We provide pre-extracted bottom-up features for COCO and nocaps splits. These are extracted
using a Faster-RCNN detector pretrained on Visual Genome, made available by
Anderson et al. 2017. We call this VG Detector.
We extract features from 100 region proposals per image and keep those above a confidence
threshold of 0.2, which finally gives 10-100 features per image (adaptive).
Download (or symlink) the image features under the $PROJECT_ROOT/data directory:
See also
Our image-feature-extractors repo for more info on VG Detector, and how these
features are extracted from it.
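If you want to peek inside a downloaded feature file, the sketch below is a minimal example assuming the features ship as an HDF5 file readable with h5py; the file name used here is hypothetical, so substitute the name of the file you actually downloaded and inspect its keys before relying on any particular layout.

import h5py

# Hypothetical file name: replace with the actual downloaded feature file.
FEATURES_H5 = "data/coco_train2017_features.h5"

with h5py.File(FEATURES_H5, "r") as f:
    # List the top-level datasets/groups to learn the actual layout.
    print(list(f.keys()))
    # Each image contributes 10-100 adaptive region features, so per-image
    # arrays are expected to have shape (num_boxes, feature_dim).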
Download Annotation Files¶
Download COCO Captions and nocaps val/test image info and arrange in a directory structure as follows:
$PROJECT_ROOT/data
|-- coco
| +-- annotations
| |-- captions_train2017.json
| +-- captions_val2017.json
+-- nocaps
+-- annotations
|-- nocaps_val_image_info.json
+-- nocaps_test_image_info.json
COCO Captions: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
nocaps val image info: https://s3.amazonaws.com/nocaps/nocaps_val_image_info.json
nocaps test image info: https://s3.amazonaws.com/nocaps/nocaps_test_image_info.json
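With the files arranged as above, a quick standard-library check (run from $PROJECT_ROOT) confirms that everything landed in the expected place; the paths simply mirror the directory tree shown earlier:

import os

# Paths mirror the directory structure above, relative to $PROJECT_ROOT.
required = [
    "data/coco/annotations/captions_train2017.json",
    "data/coco/annotations/captions_val2017.json",
    "data/nocaps/annotations/nocaps_val_image_info.json",
    "data/nocaps/annotations/nocaps_test_image_info.json",
]
missing = [path for path in required if not os.path.exists(path)]
print("all annotation files present" if not missing else "missing: {}".format(missing))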
[Optional] Download files for Constrained Beam Search¶
If you wish to decode using Constrained Beam Search, download pre-extracted detections from a detector trained using Open Images (we call it OI Detector) into $PROJECT_ROOT/data.
nocaps_val_oi_detector_boxes.json (in COCO bounding box annotations format)
nocaps_test_oi_detector_boxes.json (in COCO bounding box annotations format)
Download Open Images metadata files into $PROJECT_ROOT/data/cbs:
class_hierarchy.json : A hierarchy of object classes declared by Open Images. Our file is in a more human-readable format.
constraint_wordforms.tsv : wordforms of all words which could be CBS constraints. This allows either the singular or plural form of a word (or even close synonyms) to satisfy a constraint.
See also
Our image-feature-extractors repo for more info on OI Detector, and how these
bounding box detections are extracted from it.
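To get a feel for the wordforms file, the sketch below reads it with Python's csv module; the column layout (a word followed by its acceptable alternative wordforms) is an assumption here, so check the first few printed rows against the real file:

import csv

# Assumed layout: each tab-separated row groups a word with its acceptable
# alternative wordforms (singular/plural, close synonyms). Verify against the file.
with open("data/cbs/constraint_wordforms.tsv", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for i, row in enumerate(reader):
        print(row)
        if i == 4:  # print only the first five rows
            break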
Build Vocabulary¶
Build caption vocabulary using COCO train2017 captions.
python scripts/build_vocabulary.py -c data/coco/captions_train2017.json -o data/vocabulary
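The script reads the standard COCO captions JSON passed via -c; if you are curious what it consumes, the public COCO captions format keeps each caption inside an annotations entry, which you can inspect directly:

import json

# captions_train2017.json follows the public COCO captions format:
# a top-level "annotations" list whose entries carry a "caption" string.
with open("data/coco/captions_train2017.json") as f:
    coco = json.load(f)

print(len(coco["annotations"]), "training captions")
print(coco["annotations"][0]["caption"])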
Evaluation Server¶
nocaps val and test splits are held privately behind EvalAI. To evaluate on nocaps,
create an account on EvalAI and get the auth token from
profile details. Set the token through the EvalAI CLI:
evalai set_token <your_token_here>