How to set up this codebase?¶
This codebase requires Python 3.6 or higher. The recommended way to set up this codebase is through Anaconda/Miniconda.
Install Dependencies¶
Install the Anaconda or Miniconda distribution (Python 3) from their downloads site.
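For example, a minimal way to install Miniconda on Linux (the installer URL below is the generic "latest" link; check their downloads page for other platforms):
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh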
Clone the repository.
git clone https://www.github.com/nocaps-org/updown-baseline
cd updown-baseline
Create a conda environment, install all the dependencies, and install this codebase as a package in development mode.
conda create -n updown python=3.6
conda activate updown
pip install -r requirements.txt
python setup.py develop
Note
If the evalai package install fails, install libxml2-dev and libxslt1-dev via apt.
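For example, on Debian/Ubuntu these packages can be installed as:
sudo apt-get install libxml2-dev libxslt1-dev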
Now you can import updown
from anywhere in your filesystem as long as you have this conda
environment activated.
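As a quick sanity check (not an official setup step, just a minimal example), try importing the package from within the activated environment:
python -c "import updown"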
Download Image Features¶
We provide pre-extracted bottom-up features for COCO and nocaps splits. These are extracted using a Faster R-CNN detector pretrained on Visual Genome, made available by Anderson et al. 2017. We call this the VG Detector.
We extract features from 100 region proposals per image and keep those above a confidence threshold of 0.2, which gives 10-100 features per image (adaptive).
Download (or symlink) the image features under the $PROJECT_ROOT/data directory:
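For example, assuming the feature files were downloaded to some local directory (the path below is a placeholder), symlinking them could look like:
mkdir -p data
ln -s /path/to/downloaded/features/* data/    # placeholder path, replace with your own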
See also
Our image-feature-extractors repo for more info on the VG Detector, and how these features are extracted from it.
Download Annotation Files¶
Download COCO Captions and nocaps val/test image info and arrange them in a directory structure as follows (example download commands appear after the list of links below):
$PROJECT_ROOT/data
|-- coco
|   +-- annotations
|       |-- captions_train2017.json
|       +-- captions_val2017.json
+-- nocaps
    +-- annotations
        |-- nocaps_val_image_info.json
        +-- nocaps_test_image_info.json
COCO Captions: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
nocaps val image info: https://s3.amazonaws.com/nocaps/nocaps_val_image_info.json
nocaps test image info: https://s3.amazonaws.com/nocaps/nocaps_test_image_info.json
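One way to fetch and arrange these files from the command line (a sketch; it assumes the COCO zip unpacks into an annotations/ folder, which is its standard layout):
mkdir -p data/coco data/nocaps/annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip 'annotations/captions_*2017.json' -d data/coco
wget -P data/nocaps/annotations https://s3.amazonaws.com/nocaps/nocaps_val_image_info.json
wget -P data/nocaps/annotations https://s3.amazonaws.com/nocaps/nocaps_test_image_info.json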
[Optional] Download files for Constrained Beam Search¶
If you wish to decode using Constrained Beam Search, download pre-extracted detections from a detector trained using Open Images (we call it the OI Detector) into $PROJECT_ROOT/data.
nocaps_val_oi_detector_boxes.json (in COCO bounding box annotations format)
nocaps_test_oi_detector_boxes.json (in COCO bounding box annotations format)
Download Open Images metadata files into $PROJECT_ROOT/data/cbs (a short placement sketch follows the two items below):
class_hierarchy.json: a hierarchy of object classes declared by Open Images; our file is in a more human-readable format.
constraint_wordforms.tsv: wordforms of all words which could be CBS constraints. This allows, for example, either the singular or plural form of a word (or even close synonyms) to satisfy a constraint.
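Assuming both files have been downloaded to the working directory (download links are not listed here), placing them could look like:
mkdir -p data/cbs
mv class_hierarchy.json constraint_wordforms.tsv data/cbs/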
See also
Our image-feature-extractors repo for more info on the OI Detector, and how these bounding box detections are extracted from it.
Build Vocabulary¶
Build the caption vocabulary using COCO train2017 captions:
python scripts/build_vocabulary.py -c data/coco/captions_train2017.json -o data/vocabulary
Evaluation Server¶
nocaps val and test splits are held privately behind EvalAI. To evaluate on nocaps, create an account on EvalAI and get the auth token from profile details. Set the token through the EvalAI CLI:
evalai set_token <your_token_here>