
TV_VTT (TrecVid Video-To-Text) Dataset

Data provided by the National Institute of Standards and Technology (NIST)

This dataset contains short videos (3 to 10 seconds long) from the TRECVID Video-to-Text (VTT) task, 2016 to 2024. There are 73,893 captioned videos. Each video has between 2 and 5 captions, written by dedicated annotators hired by NIST.

About this Dataset

Updated: 2025-04-06

Metadata Last Updated: 2025-01-06 00:00:00

Date Created: N/A

Data Provided by: National Institute of Standards and Technology

Dataset Owner: N/A

Access this data

Access URL: https://ir.nist.gov/tv_vtt_data/
Landing Page URL: https://data.nist.gov/od/id/mds2-2545
Contact dataset owner: George Awad

Table representation of structured data
Title	TV_VTT (TrecVid Video-To-Text) Dataset
Description	This dataset contains short videos (3 to 10 seconds long) from the TRECVID Video-to-Text (VTT) task, 2016 to 2024. There are 73,893 captioned videos. Each video has between 2 and 5 captions, written by dedicated annotators hired by NIST.
Modified	2025-01-06 00:00:00
Publisher Name	National Institute of Standards and Technology
Contact	mailto:[email protected]
Keywords	video captioning, video retrieval, video to text, image captioning

{
    "identifier": "ark:\/88434\/mds2-2545",
    "accessLevel": "public",
    "contactPoint": {
        "hasEmail": "mailto:[email protected]",
        "fn": "George Awad"
    },
    "programCode": [
        "006:045"
    ],
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/mds2-2545",
    "title": "TV_VTT (TrecVid Video-To-Text) Dataset",
    "description": "This dataset contains short videos (ranging from 3 seconds to 10 seconds) from TRECVID VTT task from 2016 to 2024. There are 73,893 videos with captions. Each video has between 2 and 5 captions, which have been written by dedicated annotators hired by NIST.",
    "language": [
        "en"
    ],
    "distribution": [
        {
            "accessURL": "https:\/\/ir.nist.gov\/tv_vtt_data\/",
            "format": "videos are in mp4 and captions are in plain text.",
            "description": "This dataset contains short videos (ranging from 3 seconds to 10 seconds) from TRECVID VTT task from 2016 to 2021. There are 10,862 videos with captions. Each video has between 2 and 5 captions, which have been written by dedicated annotators hired by NIST.",
            "title": "TV_VTT"
        },
        {
            "accessURL": "https:\/\/ir.nist.gov\/tv_vtt_data\/Readme.txt",
            "format": "txt",
            "description": "A high-level readme file explaining how the dataset (videos and captions) are organized.",
            "title": "Readme"
        },
        {
            "accessURL": "https:\/\/doi.org\/10.18434\/mds2-2545",
            "title": "DOI Access for TV_VTT (TrecVid Video-To-Text) Dataset"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/mds2-2545\/V3C_VTT_Org.Form.txt",
            "format": "text file",
            "description": "Please submit the following data agreement form to access the TV_VTT (video to text development dataset)",
            "mediaType": "text\/plain",
            "title": "data agreement form"
        }
    ],
    "bureauCode": [
        "006:55"
    ],
    "modified": "2025-01-06 00:00:00",
    "publisher": {
        "@type": "org:Organization",
        "name": "National Institute of Standards and Technology"
    },
    "theme": [
        "Information Technology:Data and informatics"
    ],
    "keyword": [
        "video captioning",
        "video retrieval",
        "video to text",
        "image captioning"
    ]
}
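The record above appears to follow the Project Open Data (DCAT-US) metadata schema, in which each `distribution` entry carries either an `accessURL` (a page or directory to browse) or a `downloadURL` (a direct file link). The sketch below is a minimal illustration, assuming the record has been copied into a string; it parses the JSON and lists every distribution with its URL. The names `RECORD` and `list_distributions` are hypothetical, chosen for this example only, and the embedded copy is trimmed to the fields the function reads. Requires Python 3.9 or later for the `list[tuple[...]]` annotation.

```python
import json

# Hypothetical trimmed copy of the metadata record above (only the fields used below).
# A raw string keeps the JSON "\/" escapes intact; json.loads decodes them to "/".
RECORD = r'''
{
    "identifier": "ark:\/88434\/mds2-2545",
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/mds2-2545",
    "distribution": [
        {"accessURL": "https:\/\/ir.nist.gov\/tv_vtt_data\/", "title": "TV_VTT"},
        {"accessURL": "https:\/\/ir.nist.gov\/tv_vtt_data\/Readme.txt", "title": "Readme"},
        {"accessURL": "https:\/\/doi.org\/10.18434\/mds2-2545",
         "title": "DOI Access for TV_VTT (TrecVid Video-To-Text) Dataset"},
        {"downloadURL": "https:\/\/data.nist.gov\/od\/ds\/mds2-2545\/V3C_VTT_Org.Form.txt",
         "mediaType": "text\/plain", "title": "data agreement form"}
    ]
}
'''


def list_distributions(record: dict) -> list[tuple[str, str, str]]:
    """Return (title, kind, url) for each distribution entry.

    kind is "download" when the entry has a direct file link (downloadURL)
    and "access" when it only points to a page or directory (accessURL).
    Entries with neither key are skipped.
    """
    rows = []
    for dist in record.get("distribution", []):
        if "downloadURL" in dist:
            rows.append((dist.get("title", ""), "download", dist["downloadURL"]))
        elif "accessURL" in dist:
            rows.append((dist.get("title", ""), "access", dist["accessURL"]))
    return rows


if __name__ == "__main__":
    record = json.loads(RECORD)
    for title, kind, url in list_distributions(record):
        print(f"{kind:8} {title}: {url}")
```

In this record the data agreement form is the only direct download; the videos and captions sit behind the `TV_VTT` access URL, and the record asks users to submit that form to obtain access.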

Commerce Data Hub, Department of Commerce