How to set up this codebase?¶
This codebase requires Python 3.6 or higher. The recommended way to set up this codebase is through Anaconda/Miniconda.
Install Dependencies¶
Install the Anaconda or Miniconda distribution (based on Python 3+) from their downloads site.
Clone the repository.
git clone https://www.github.com/nocaps-org/updown-baseline
cd updown-baseline
Create a conda environment, install all the dependencies, and install this codebase as a package in development mode.
conda create -n updown python=3.6
conda activate updown
pip install -r requirements.txt
python setup.py develop
Note
If the evalai package install fails, install libxml2-dev and libxslt1-dev via apt.
Now you can import updown from anywhere in your filesystem as long as you have this conda
environment activated.
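As an optional sanity check, the lines below (plain Python, nothing project-specific assumed beyond the package name) confirm that the development install points back into your clone:

# Run inside the activated "updown" conda environment.
import updown

# Because the package was installed in development mode (python setup.py develop),
# __file__ resolves to a path inside the cloned repository.
print(updown.__file__)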
Download Image Features¶
We provide pre-extracted bottom-up features for COCO and nocaps splits. These are extracted
using a Faster-RCNN detector pretrained on Visual Genome, made available by
Anderson et al. 2017. We call this VG Detector.
We extract features from 100 region proposals per image and keep those above a confidence
threshold of 0.2, which finally gives 10-100 features per image (adaptive).
Download (or symlink) the image features under the $PROJECT_ROOT/data directory:
See also
Our image-feature-extractors repo for more info on VG Detector, and how these
features are extracted from it.
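If you want to peek inside a downloaded feature file, the sketch below is a minimal example assuming the features ship as an HDF5 file readable with h5py; the file name used here is hypothetical, so substitute the name of the file you actually downloaded and inspect its keys before relying on any particular layout.

import h5py

# Hypothetical file name: replace with the actual downloaded feature file.
FEATURES_H5 = "data/coco_train2017_features.h5"

with h5py.File(FEATURES_H5, "r") as f:
    # List the top-level datasets/groups to learn the actual layout.
    print(list(f.keys()))
    # Each image contributes 10-100 adaptive region features, so per-image
    # arrays are expected to have shape (num_boxes, feature_dim).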
Download Annotation Files¶
Download COCO Captions and nocaps val/test image info and arrange in a directory structure as follows:
$PROJECT_ROOT/data
|-- coco
| +-- annotations
| |-- captions_train2017.json
| +-- captions_val2017.json
+-- nocaps
+-- annotations
|-- nocaps_val_image_info.json
+-- nocaps_test_image_info.json
COCO Captions: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
nocaps val image info: https://s3.amazonaws.com/nocaps/nocaps_val_image_info.json
nocaps test image info: https://s3.amazonaws.com/nocaps/nocaps_test_image_info.json
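With the files arranged as above, a quick standard-library check (run from $PROJECT_ROOT) confirms that everything landed in the expected place; the paths simply mirror the directory tree shown earlier:

import os

# Paths mirror the directory structure above, relative to $PROJECT_ROOT.
required = [
    "data/coco/annotations/captions_train2017.json",
    "data/coco/annotations/captions_val2017.json",
    "data/nocaps/annotations/nocaps_val_image_info.json",
    "data/nocaps/annotations/nocaps_test_image_info.json",
]
missing = [path for path in required if not os.path.exists(path)]
print("all annotation files present" if not missing else "missing: {}".format(missing))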
[Optional] Download files for Constrained Beam Search¶
If you wish to decode using Constrained Beam Search, download pre-extracted detections from a detector trained using Open Images (we call it OI Detector) into $PROJECT_ROOT/data.
nocaps_val_oi_detector_boxes.json (in COCO bounding box annotations format)
nocaps_test_oi_detector_boxes.json (in COCO bounding box annotations format)
Download Open Images metadata files into $PROJECT_ROOT/data/cbs:
class_hierarchy.json : A hierarchy of object classes declared by Open Images. Our file is in a more human-readable format.
constraint_wordforms.tsv : wordforms of all words which could be CBS constraints. This allows either the singular or plural form of a word (or even close synonyms) to satisfy a constraint.
See also
Our image-feature-extractors repo for more info on OI Detector, and how these
bounding box detections are extracted from it.
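To get a feel for the wordforms file, the sketch below reads it with Python's csv module; the column layout (a word followed by its acceptable alternative wordforms) is an assumption here, so check the first few printed rows against the real file:

import csv

# Assumed layout: each tab-separated row groups a word with its acceptable
# alternative wordforms (singular/plural, close synonyms). Verify against the file.
with open("data/cbs/constraint_wordforms.tsv", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for i, row in enumerate(reader):
        print(row)
        if i == 4:  # print only the first five rows
            break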
Build Vocabulary¶
Build caption vocabulary using COCO train2017 captions.
python scripts/build_vocabulary.py -c data/coco/captions_train2017.json -o data/vocabulary
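The script reads the standard COCO captions JSON passed via -c; if you are curious what it consumes, the public COCO captions format keeps each caption inside an annotations entry, which you can inspect directly:

import json

# captions_train2017.json follows the public COCO captions format:
# a top-level "annotations" list whose entries carry a "caption" string.
with open("data/coco/captions_train2017.json") as f:
    coco = json.load(f)

print(len(coco["annotations"]), "training captions")
print(coco["annotations"][0]["caption"])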
Evaluation Server¶
nocaps val and test splits are held privately behind EvalAI. To evaluate on nocaps,
create an account on EvalAI and get the auth token from
profile details. Set the token through the EvalAI CLI:
evalai set_token <your_token_here>