
IARPA BETTER (Better Extraction from Text Towards Enhanced Retrieval) information extraction and information retrieval datasets.

Data provided by the National Institute of Standards and Technology (NIST)

Cross-language information extraction (IE) and information retrieval (IR) datasets developed for the evaluation of the IARPA BETTER program. The documents come from CommonCrawl. The IE annotations, in three schemas, were produced by MITRE and ARLIS. The IR queries and relevance judgments were created at NIST, which IARPA asked to distribute the data in its final form. All tasks are cross-language, from English into one of Arabic, Farsi, Russian, Chinese, or Korean.

About this Dataset

Updated: 2025-04-06

Metadata Last Updated: 2023-02-24 00:00:00

Date Created: N/A

Data Provided by: National Institute of Standards and Technology

Dataset Owner: N/A

Access this data

View JSON data

Contact dataset owner
Landing Page URL: https://ir.nist.gov/better/
Download URL: https://ir.nist.gov/better/

Table representation of structured data
Title	IARPA BETTER (Better Extraction from Text Towards Enhanced Retrieval) information extraction and information retrieval datasets.
Description	Cross-language information extraction (IE) and information retrieval (IR) datasets developed for the evaluation of the IARPA BETTER program. The documents come from CommonCrawl. The IE annotations, in three schemas, were produced by MITRE and ARLIS. The IR queries and relevance judgments were created at NIST, which IARPA asked to distribute the data in its final form. All tasks are cross-language, from English into one of Arabic, Farsi, Russian, Chinese, or Korean.
Modified	2023-02-24 00:00:00
Publisher Name	National Institute of Standards and Technology
Contact	mailto:[email protected]
Keywords	information extraction; information retrieval; cross-language information retrieval

{
    "identifier": "ark:\/88434\/mds2-2946",
    "accessLevel": "public",
    "contactPoint": {
        "hasEmail": "mailto:[email protected]",
        "fn": "Ian Soboroff"
    },
    "programCode": [
        "006:045"
    ],
    "landingPage": "https:\/\/ir.nist.gov\/better\/",
    "title": "IARPA BETTER (Better Extraction from Text Towards Enhanced Retrieval) information extraction and information retrieval datasets.",
    "description": "Cross-language information extraction and retrieval datasets developed for the evaluation of the IARPA BETTER program.  The documents come from CommonCrawl.  The IE annotations in three schemas are by MITRE and ARLIS.  The IR queries and relevance judgments were done at NIST, and NIST was asked by IARPA to distribute the data in its final form.  The tasks are all cross-language from English into one of Arabic, Farsi, Russian, Chinese, and Korean",
    "language": [
        "en"
    ],
    "distribution": [
        {
            "downloadURL": "https:\/\/ir.nist.gov\/better\/",
            "mediaType": "application\/octet-stream",
            "title": "The BETTER datasets"
        }
    ],
    "bureauCode": [
        "006:55"
    ],
    "modified": "2023-02-24 00:00:00",
    "publisher": {
        "@type": "org:Organization",
        "name": "National Institute of Standards and Technology"
    },
    "theme": [
        "Information Technology:Data and informatics"
    ],
    "keyword": [
        "information extraction; information retrieval; cross-language information retrieval"
    ]
}
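The JSON record above appears to follow the Project Open Data (DCAT-US) metadata layout. The sketch below reads a trimmed copy of that record and pulls out the fields a user needs to locate the data. It uses only Python's standard json module.

Two points are illustrative assumptions and are not part of the catalog: the helper name summarize, and splitting the single keyword string on semicolons.

```python
import json

# Trimmed copy of the metadata record above (only the fields used here).
# A raw string keeps the JSON escape "\/" intact; json.loads turns it into "/".
RECORD = r'''
{
    "identifier": "ark:\/88434\/mds2-2946",
    "accessLevel": "public",
    "landingPage": "https:\/\/ir.nist.gov\/better\/",
    "distribution": [
        {
            "downloadURL": "https:\/\/ir.nist.gov\/better\/",
            "mediaType": "application\/octet-stream",
            "title": "The BETTER datasets"
        }
    ],
    "modified": "2023-02-24 00:00:00",
    "keyword": [
        "information extraction; information retrieval; cross-language information retrieval"
    ]
}
'''


def summarize(record_text: str) -> dict:
    """Hypothetical helper: extract access-related fields from a metadata record."""
    rec = json.loads(record_text)
    # Assumption: the keyword list holds one semicolon-separated string,
    # so split it into individual terms and trim surrounding spaces.
    keywords = [k.strip() for entry in rec.get("keyword", []) for k in entry.split(";")]
    return {
        "identifier": rec["identifier"],
        "landing_page": rec["landingPage"],
        "downloads": [d["downloadURL"] for d in rec.get("distribution", [])],
        "keywords": keywords,
    }


info = summarize(RECORD)
print(info["identifier"])  # ark:/88434/mds2-2946
print(info["downloads"])   # ['https://ir.nist.gov/better/']
```

Both the landing page and the single distribution entry point to the same URL, https://ir.nist.gov/better/. In practice, that page is where the actual BETTER data files are listed.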

Commerce Data Hub
