NIST Statistical Reference Datasets - SRD 140

The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.

Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm, so these levels are provided only as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. Doing so will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software.

The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
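As a rough illustration of what comparing software output against a certified value can look like, the sketch below scores two variance routines with a log relative error (LRE), which approximates the number of correct significant digits. The LRE measure, the function names, and the three-point dataset are assumptions made for this example; the dataset is made up (its exact sample variance is 1 by construction) and is not taken from the collection.

```python
import math


def log_relative_error(computed: float, certified: float) -> float:
    """Approximate count of correct significant digits in `computed`."""
    if computed == certified:
        return math.inf
    if certified == 0.0:
        # Fall back to the absolute error when the reference value is zero.
        return -math.log10(abs(computed))
    return -math.log10(abs(computed - certified) / abs(certified))


def naive_variance(xs):
    """One-pass 'sum of squares' formula; prone to catastrophic cancellation."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: a large offset plus small variation.
# The exact sample variance is 1 by construction.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
reference_variance = 1.0

for name, func in [("naive", naive_variance), ("two-pass", two_pass_variance)]:
    value = func(data)
    lre = log_relative_error(value, reference_variance)
    print(f"{name:9s} variance = {value!r:>6}  LRE = {lre}")
```

With IEEE double precision, the two-pass routine reproduces the reference value exactly on this data, while the one-pass formula loses the answer to cancellation. The same comparison could be run against certified values from the collection in place of the made-up reference.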

About this Dataset

Updated: 2024-02-22
Metadata Last Updated: 2003-11-20 00:00:00
Date Created: N/A
Data Provided by: National Institute of Standards and Technology
Dataset Owner: N/A

Title: NIST Statistical Reference Datasets - SRD 140
Modified: 2003-11-20 00:00:00
Publisher Name: National Institute of Standards and Technology
Contact: mailto:[email protected]
Keywords: ANOVAs, Bayes theorems, Bayesian, Bayesian computations, Bayesian statistics, MCMC, algorithms, averages, benchmark data, benchmarks, computational accuracies, least squares, linear regressions, nonlinear regressions, numerical accuracies, numerical analysis, round off, rounding errors, roundings, software evaluations, standard deviations, statistical reference datasets, statistical software, summary statistics, variance analysis
{
    "identifier": "FF429BC178718B3EE0431A570681E858224",
    "accessLevel": "public",
    "references": [
        "http:\/\/www.itl.nist.gov\/div898\/strd\/general\/howto.html",
        "http:\/\/www.itl.nist.gov\/div898\/strd\/general\/faq.html"
    ],
    "contactPoint": {
        "hasEmail": "mailto:[email protected]",
        "@type": "vcard:Contact",
        "fn": "William F. Guthrie"
    },
    "programCode": [
        "006:052"
    ],
    "@type": "dcat:Dataset",
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/FF429BC178718B3EE0431A570681E858224",
    "description": "The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.   Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.   Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software.   The Statistical Reference Datasets is also supported by the Standard Reference Data Program.",
    "language": [
        "en"
    ],
    "title": "NIST Statistical Reference Datasets - SRD 140",
    "distribution": [
        {
            "accessURL": "https:\/\/dx.doi.org\/10.18434\/T43G6C",
            "format": "text\/html",
            "description": "DOI Access to NIST Statistical Reference Datasets - SRD 140",
            "mediaType": "text\/html",
            "title": "DOI Access to NIST Statistical Reference Datasets - SRD 140"
        }
    ],
    "license": "https:\/\/www.nist.gov\/open\/license",
    "bureauCode": [
        "006:55"
    ],
    "modified": "2003-11-20 00:00:00",
    "publisher": {
        "@type": "org:Organization",
        "name": "National Institute of Standards and Technology"
    },
    "accrualPeriodicity": "irregular",
    "theme": [
        "Standards:Reference data"
    ],
    "keyword": [
        "ANOVAs",
        "Bayes theorems",
        "Bayesian",
        "Bayesian computations",
        "Bayesian statistics",
        "MCMC",
        "algorithms",
        "averages",
        "benchmark data",
        "benchmarks",
        "computational accuracies",
        "least squares",
        "linear regressions",
        "nonlinear regressions",
        "numerical accuracies",
        "numerical analysis",
        "round off",
        "rounding errors",
        "roundings",
        "software evaluations",
        "standard deviations",
        "statistical reference datasets",
        "statistical software",
        "summary statistics",
        "variance analysis"
    ]
}
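
The record above can be read with any JSON parser. The following is a minimal sketch, assuming the record is available as text; `RECORD` is an abbreviated copy containing only the fields used here, and the `summarize` helper is a hypothetical name introduced for this example. Note that `\/` is a valid JSON escape for `/`, so the URLs come out unescaped after parsing.

```python
import json

# Abbreviated copy of the metadata record; a raw string keeps the JSON
# "\/" escapes intact so the parser sees them exactly as published.
RECORD = r'''
{
    "identifier": "FF429BC178718B3EE0431A570681E858224",
    "title": "NIST Statistical Reference Datasets - SRD 140",
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/FF429BC178718B3EE0431A570681E858224",
    "distribution": [
        {
            "accessURL": "https:\/\/dx.doi.org\/10.18434\/T43G6C",
            "mediaType": "text\/html"
        }
    ],
    "keyword": ["ANOVAs", "least squares", "rounding errors"]
}
'''


def summarize(record_text: str) -> dict:
    """Pull out the fields a reader typically needs to locate the data."""
    record = json.loads(record_text)
    return {
        "title": record["title"],
        "landing_page": record["landingPage"],
        # A record may list several distributions; keep every access URL.
        "access_urls": [
            dist["accessURL"]
            for dist in record.get("distribution", [])
            if "accessURL" in dist
        ],
        "keywords": sorted(record.get("keyword", []), key=str.lower),
    }


summary = summarize(RECORD)
for field, value in summary.items():
    print(f"{field}: {value}")
```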
