updown.utils.evalai

class updown.utils.evalai.NocapsEvaluator(phase: str = 'val')[source]

Bases: object

A utility class to submit model predictions on nocaps splits to EvalAI, and retrieve model performance based on captioning metrics (such as CIDEr, SPICE).

This class and the training script together serve as a working example for “EvalAI in the loop”, showing how evaluation can be done remotely on privately held splits. Annotations (captions) and evaluation-specific tools (e.g. coco-caption) are not required locally. This enables users to select best checkpoint, perform early stopping, learning rate scheduling based on a metric, etc. without actually doing evaluation.

Parameters
phase: str, optional (default = “val”)

Which phase to evaluate on. One of “val” or “test”.

Notes

This class can be used for retrieving metrics on both, val and test splits. However, we recommend to avoid using it for test split (at least during training). Number of allowed submissions to test split on EvalAI are very less, and can exhaust in a few iterations! However, the number of submissions to val split are practically infinite.

evaluate(self, predictions:List[updown.types.Prediction], iteration:Union[int, NoneType]=None) → Dict[str, Dict[str, float]][source]

Take the model predictions (in COCO format), submit them to EvalAI, and retrieve model performance based on captioning metrics.

Parameters
predictions: List[Prediction]

Model predictions in COCO format. They are a list of dicts with keys {"image_id": int, "caption": str}.

iteration: int, optional (default = None)

Training iteration where the checkpoint was evaluated.

Returns
Dict[str, Dict[str, float]]

Model performance based on all captioning metrics. Nested dict structure:

{
    "B1": {"in-domain", "near-domain", "out-domain", "entire"},  # BLEU-1
    "B2": {"in-domain", "near-domain", "out-domain", "entire"},  # BLEU-2
    "B3": {"in-domain", "near-domain", "out-domain", "entire"},  # BLEU-3
    "B4": {"in-domain", "near-domain", "out-domain", "entire"},  # BLEU-4
    "METEOR": {"in-domain", "near-domain", "out-domain", "entire"},
    "ROUGE-L": {"in-domain", "near-domain", "out-domain", "entire"},
    "CIDEr": {"in-domain", "near-domain", "out-domain", "entire"},
    "SPICE": {"in-domain", "near-domain", "out-domain", "entire"},
}