VOTSr2026 Referral Challenge tutorial
In the Referral Challenge, each video sequence comes with multiple text prompts describing a target object. Your tracker must produce a segmentation mask for each frame, for each prompt, independently.
Unlike the other VOTS challenges, trackers are not integrated with the toolkit through the TraX protocol. The toolkit is used only to (i) download the dataset, (ii) format and validate the predictions and (iii) package them for submission. You are free to run your tracker any way you like.
Step 1: Download the dataset
pip install vot-toolkit
vot initialize vots2026/votsr --workspace ~/referral_workspace
This creates ~/referral_workspace/sequences/ with one folder per sequence:
~/referral_workspace/
├── sequences/
│   ├── list.txt                (list of sequence names, one per line)
│   ├── aurora_2_1_160/
│   │   ├── sequence            (metadata: name, fps, prompts, frame pattern)
│   │   ├── color/              (video frames)
│   │   │   ├── 00000001.jpg
│   │   │   ├── 00000002.jpg
│   │   │   └── ...
│   │   └── ...
│   ├── biol_1_1_99/
│   │   └── ...
│   └── smoke_2_70_256/
│       └── ...
└── ...
For development you can use the smaller validation dataset, which ships with ground-truth masks so you can run analysis locally:
vot initialize vots2026/votsrval --workspace ~/referral_dev_workspace
The sequence metadata file
Each sequence directory contains a sequence file with key=value metadata:
channels.color=color/%08d.jpg
format=default
fps=24
name=aurora_2_1_160
prompts=the aurora;the lights in the sky;the northern lights at night;the aurora in the evening
prompts: semicolon-separated text prompts. Your tracker must produce one set of per-frame masks per prompt.
channels.color: printf-style pattern for frame filenames (1-indexed).
fps: frame rate of the video.
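The key=value layout described above can be read into a dictionary with a few lines of standard-library Python. The following is a minimal sketch, not part of the toolkit: the function names read_sequence_metadata, split_prompts and frame_path are hypothetical, and the code assumes every metadata line has the form key=value, as in the example file.

```python
import os


def read_sequence_metadata(path):
    """Parse a key=value `sequence` file into a dict of strings."""
    meta = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or "=" not in line:
                continue  # assumption: skip blank or malformed lines
            key, value = line.split("=", 1)
            meta[key] = value
    return meta


def split_prompts(meta):
    """Return the semicolon-separated prompts as a list (empty if absent)."""
    value = meta.get("prompts", "")
    return value.split(";") if value else []


def frame_path(seq_dir, meta, index):
    """Build the path of frame `index` (1-indexed) from the printf-style pattern."""
    return os.path.join(seq_dir, meta["channels.color"] % index)
```

With the example file, frame_path(seq_dir, meta, 1) resolves to color/00000001.jpg inside the sequence directory. Prompts are returned exactly as written in the file, because any change to a prompt string (including whitespace) changes the hash used in the prediction file name.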
Reading the workspace data
import os
from PIL import Image

workspace = os.path.expanduser("~/referral_workspace")
sequences_dir = os.path.join(workspace, "sequences")

with open(os.path.join(sequences_dir, "list.txt")) as f:
    sequence_names = [line.strip() for line in f if line.strip()]

for seq_name in sequence_names:
    seq_dir = os.path.join(sequences_dir, seq_name)

    prompts = []
    with open(os.path.join(seq_dir, "sequence")) as f:
        for line in f:
            if line.startswith("prompts="):
                prompts = line.strip().split("=", 1)[1].split(";")

    frame_dir = os.path.join(seq_dir, "color")
    frame_files = sorted(os.listdir(frame_dir))  # ["00000001.jpg", ...]

    for prompt in prompts:
        for frame_file in frame_files:
            image = Image.open(os.path.join(frame_dir, frame_file))
            # mask = your_tracker(image, prompt) -> numpy array (H, W) of 0s and 1s
Step 2: Run your tracker
Run your tracker however you like. For each sequence, for each prompt, produce a binary segmentation mask per frame. Do not use vot evaluate — it is not supported for the referral challenge.
Step 3: Format the predictions
File naming
Each prediction file is named <sequence_name>_<prompt_hash>.txt, where prompt_hash is the first 8 characters of the MD5 hash of the prompt string:
from hashlib import md5
prompt = "the aurora"
prompt_hash = md5(prompt.encode()).hexdigest()[:8] # -> "b2654d24"
filename = f"aurora_2_1_160_{prompt_hash}.txt" # -> "aurora_2_1_160_b2654d24.txt"
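To apply the naming rule to every prompt of a sequence at once, a small helper along these lines may be convenient. It is only a sketch: prediction_filename and prediction_filenames are hypothetical names, and the check for two prompts sharing the same 8-character prefix is an extra safeguard added here, not something the challenge rules mention.

```python
from hashlib import md5


def prediction_filename(seq_name, prompt):
    """Return <seq_name>_<first 8 hex chars of md5(prompt)>.txt."""
    prompt_hash = md5(prompt.encode()).hexdigest()[:8]
    return f"{seq_name}_{prompt_hash}.txt"


def prediction_filenames(seq_name, prompts):
    """Map each prompt to its prediction file name."""
    names = {prompt: prediction_filename(seq_name, prompt) for prompt in prompts}
    # Extra safeguard (assumption): two different prompts must not map to one file.
    if len(set(names.values())) != len(names):
        raise ValueError(f"hash prefix collision among prompts of {seq_name}")
    return names
```

The hash is computed on the exact prompt string, so the prompt should be passed through unchanged from the sequence file, without stripping, lowercasing or other normalisation.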
File contents
Each .txt file has one line per frame. Each line is an RLE-encoded binary mask using the VOT toolkit’s mask format:
from vot.region import Mask
import numpy as np
# your_mask: a 2D numpy array (H, W) of 0s and 1s
mask_rle = str(Mask(your_mask)) # -> "m42,0,320,480,..."
If the object is not visible in a frame, write a single 0 for that line.
Directory structure
Place the .txt files under a baseline/ directory:
baseline/
├── aurora_2_1_160/
│   ├── aurora_2_1_160_b2654d24.txt   (prompt: "the aurora")
│   ├── aurora_2_1_160_d6ee0fdd.txt   (prompt: "the lights in the sky")
│   ├── aurora_2_1_160_211dfaae.txt   (prompt: "the northern lights at night")
│   └── aurora_2_1_160_4b34746a.txt   (prompt: "the aurora in the evening")
├── biol_1_1_99/
│   └── ...
└── smoke_2_70_256/
    └── ...
Each sequence folder must contain one .txt file per prompt, and the number of lines in each file must equal the number of frames in the sequence.
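Both requirements (one file per prompt, one line per frame) can be checked locally before packing. The sketch below is a rough pre-check under stated assumptions, not a substitute for the validation done by the toolkit: check_sequence_predictions is a hypothetical name, and the caller is assumed to supply the prompt list and frame count, for example from the sequence file and the color/ directory.

```python
import os
from hashlib import md5


def check_sequence_predictions(baseline_dir, seq_name, prompts, num_frames):
    """Return a list of problems found for one sequence (empty if none)."""
    problems = []
    seq_dir = os.path.join(baseline_dir, seq_name)
    for prompt in prompts:
        prompt_hash = md5(prompt.encode()).hexdigest()[:8]
        path = os.path.join(seq_dir, f"{seq_name}_{prompt_hash}.txt")
        if not os.path.isfile(path):
            problems.append(f"missing file for prompt {prompt!r}: {path}")
            continue
        # Count lines; each line should hold the prediction for one frame.
        with open(path) as f:
            num_lines = sum(1 for _ in f)
        if num_lines != num_frames:
            problems.append(
                f"{path}: {num_lines} lines, expected {num_frames} frames"
            )
    return problems
```

Calling the function for every name in list.txt and printing any returned messages gives a quick overview of incomplete sequences. The check only looks at file names and line counts; it does not verify that each line is a well-formed mask.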
Step 4: Pack the submission
Place your predictions inside the workspace results directory and register the tracker in trackers.ini:
~/referral_workspace/results/<tracker_id>/baseline/<seq_name>/<seq_name>_<hash>.txt
# ~/referral_workspace/trackers.ini
[my_tracker]
label = My Tracker
protocol = trax
command = dummy
The section name (e.g. my_tracker) is your tracker identifier. The command field is required by the toolkit but is not used here, since the tracker was run externally.
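If you prefer to create the entry from a script rather than by hand, Python's standard configparser module writes the same INI layout. This is a sketch under assumptions: register_tracker is a hypothetical helper, and it is assumed (not confirmed by the tutorial) that the toolkit reads the file as configparser writes it. Note that rewriting an existing trackers.ini this way drops any comments it contained.

```python
import configparser
import os


def register_tracker(workspace, tracker_id, label):
    """Add or update a [tracker_id] section in <workspace>/trackers.ini."""
    ini_path = os.path.join(workspace, "trackers.ini")
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path)  # a missing file is silently ignored
    config[tracker_id] = {
        "label": label,
        "protocol": "trax",
        "command": "dummy",  # placeholder: the tracker is run externally
    }
    with open(ini_path, "w") as f:
        config.write(f)
    return ini_path
```

For example, register_tracker(os.path.expanduser("~/referral_workspace"), "my_tracker", "My Tracker") produces the section shown above.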
Then run vot pack to validate the predictions and produce a submission zip:
vot pack --workspace ~/referral_workspace my_tracker
This writes my_tracker_<timestamp>.zip to the workspace; the archive contains baseline/ and manifest.yml.
Step 5: Submit
Upload the zip on the VOTSr2026 evaluation server (link tba). The tracker identifier in manifest.yml (the [section] name in trackers.ini) must match the tracker short name registered on the VOTSr2026 registration form; otherwise the submission will be rejected.
Performance scores are sent to the registered email after evaluation completes.
Quick reference
| Item | Detail |
|---|---|
| Workspace init | vot initialize vots2026/votsr --workspace <path> |
| Prompt hash | md5(prompt.encode()).hexdigest()[:8] |
| Mask format | str(Mask(numpy_array)), or 0 for absent |
| File naming | <seq_name>_<prompt_hash>.txt, one line per frame |
| Submission zip | vot pack <tracker_id> (contains baseline/ + manifest.yml) |
| Identifier | section in trackers.ini = identifier in manifest.yml = registered tracker short name |
Helper: convert predictions to the submission format
import os
from hashlib import md5
from vot.region import Mask


def write_predictions(predictions, results_dir):
    """
    predictions: dict mapping seq_name -> dict mapping prompt -> list of (H, W) numpy masks
                 (one mask per frame; use None or an all-zero mask for "object absent").
    results_dir: <workspace>/results/<tracker_id>
    """
    for seq_name, prompt_preds in predictions.items():
        out_dir = os.path.join(results_dir, "baseline", seq_name)
        os.makedirs(out_dir, exist_ok=True)
        for prompt, masks in prompt_preds.items():
            prompt_hash = md5(prompt.encode()).hexdigest()[:8]
            with open(os.path.join(out_dir, f"{seq_name}_{prompt_hash}.txt"), "w") as f:
                for mask in masks:
                    if mask is None or mask.sum() == 0:
                        f.write("0\n")
                    else:
                        f.write(str(Mask(mask)) + "\n")
After writing the files, run vot pack (Step 4) to validate and zip the submission.
