How to set up this codebase?

This codebase requires Python 3.6 or higher. The recommended way to set it up is through Anaconda/Miniconda.

Install Dependencies

  1. Install the Anaconda or Miniconda distribution (Python 3+) from the downloads site.

  2. Clone the repository.

    git clone https://www.github.com/nocaps-org/updown-baseline
    cd updown-baseline
    
  3. Create a conda environment, install all dependencies, and install this codebase as a package in development mode.

    conda create -n updown python=3.6
    conda activate updown
    pip install -r requirements.txt
    python setup.py develop
    

    Note

    If the evalai package fails to install, install libxml2-dev and libxslt1-dev via apt.
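
    On Debian/Ubuntu systems:

        sudo apt-get install libxml2-dev libxslt1-dev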

Now you can import updown from anywhere in your filesystem as long as you have this conda environment activated.
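
To verify, run the following from any directory (with the updown environment active); printing the package path is just a convenient smoke test:

python -c "import updown; print(updown.__file__)"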

Download Image Features

We provide pre-extracted bottom-up features for the COCO and nocaps splits. These are extracted using a Faster R-CNN detector pretrained on Visual Genome, made available by Anderson et al. 2017; we call this the VG Detector. We extract features from up to 100 region proposals per image and keep those with a confidence of at least 0.2, yielding an adaptive 10-100 features per image.
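
As an illustration of this adaptive selection rule (a minimal sketch, not the actual extraction code; the helper name, the 2048-d feature size, and the random inputs are assumptions for demonstration):

import numpy as np

def select_regions(features, scores, threshold=0.2, min_boxes=10, max_boxes=100):
    # Keep proposals scoring above `threshold`, but always return between
    # `min_boxes` and `max_boxes` regions (hypothetical helper).
    order = np.argsort(-scores)                    # sort by confidence, descending
    keep = int((scores > threshold).sum())         # proposals above threshold
    keep = max(min_boxes, min(keep, max_boxes))    # clamp to [10, 100]
    return features[order[:keep]]

# e.g. 100 proposals with 2048-d bottom-up features each
feats = np.random.rand(100, 2048).astype(np.float32)
confs = np.random.rand(100).astype(np.float32)
selected = select_regions(feats, confs)            # shape: (10-100, 2048)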

Download (or symlink) the image features under the $PROJECT_ROOT/data directory.
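
For example, if the downloaded feature files live elsewhere on disk, symlinks avoid a copy (the file names below are placeholders; use the actual names of the downloaded files):

mkdir -p data
ln -s /path/to/downloads/<coco_features_file> data/
ln -s /path/to/downloads/<nocaps_features_file> data/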

See also

Our image-feature-extractors repo for more info on the VG Detector and how these features are extracted from it.

Download Annotation Files

Download COCO Captions and nocaps val/test image info and arrange in a directory structure as follows:

$PROJECT_ROOT/data
    |-- coco
    |   +-- annotations
    |       |-- captions_train2017.json
    |       +-- captions_val2017.json
    +-- nocaps
        +-- annotations
            |-- nocaps_val_image_info.json
            +-- nocaps_test_image_info.json
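
One way to fetch these (the COCO URL is the official annotations bundle; the nocaps URLs are placeholders, take the actual links from the nocaps website):

mkdir -p data/coco data/nocaps/annotations

# COCO train2017/val2017 captions ship inside the official annotations bundle
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip 'annotations/captions_*' -d data/coco

# nocaps image info files -- replace the placeholder URLs
wget <nocaps_val_image_info_url> -O data/nocaps/annotations/nocaps_val_image_info.json
wget <nocaps_test_image_info_url> -O data/nocaps/annotations/nocaps_test_image_info.json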

Build Vocabulary

Build the caption vocabulary using COCO train2017 captions.

python scripts/build_vocabulary.py -c data/coco/captions_train2017.json -o data/vocabulary
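
Conceptually, vocabulary building amounts to counting word frequencies and applying a cutoff (a simplified sketch, not the script itself; the minimum frequency of 5 is an assumption):

import json
from collections import Counter

# COCO captions live under the "annotations" key of the JSON file
annotations = json.load(open("data/coco/captions_train2017.json"))["annotations"]

# count word frequencies across all training captions
counts = Counter()
for ann in annotations:
    counts.update(ann["caption"].lower().split())

# keep words above the frequency cutoff; rarer words map to an UNK token
vocabulary = [w for w, c in counts.most_common() if c >= 5]
print(f"vocabulary size: {len(vocabulary)}")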

Evaluation Server

The nocaps val and test splits are held privately behind EvalAI. To evaluate on nocaps, create an account on EvalAI, get the auth token from your profile details, and set it through the EvalAI CLI:

evalai set_token <your_token_here>

You are all set to use this codebase!