
Challenging Medically-Relevant Genes Benchmark Set

CMRG v1.00 comprises a small variant benchmark and a structural variant benchmark focused on 273 challenging, medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (also known as the Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002: PacBio HiFi reads from HG002 were used for assembly, and Illumina reads from the parents, HG003 and HG004, were used to partition the assembly into phased haplotypes. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files; any position inside the BED regions that has no variant in the VCF is homozygous reference. We extensively curated the variant calls, excluding any found to be questionable or erroneous. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. The benchmark is described in https://doi.org/10.1101/2021.06.07.444885.
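The BED/VCF convention above (inside the BED regions, a site with no VCF record is homozygous reference; outside them, the benchmark makes no claim) can be made concrete with a small lookup. This is a minimal Python sketch, not part of the CMRG release: the function names `load_bed`, `load_vcf_sites`, `in_bed` and `site_status` are hypothetical, it assumes the benchmark BED is sorted with non-overlapping intervals, and it compares only each variant's POS, ignoring that deletions and structural variants span many bases. Real comparisons should use a dedicated benchmarking tool (for example hap.py for small variants or Truvari for structural variants), which also reconciles different representations of the same variant.

```python
import bisect
from typing import Dict, Iterable, List, Set, Tuple

# BED intervals per chromosome: sorted, non-overlapping (start, end) pairs,
# 0-based and half-open as in the BED format.
BedIndex = Dict[str, List[Tuple[int, int]]]


def load_bed(lines: Iterable[str]) -> BedIndex:
    """Read the first three BED columns (chrom, start, end) from text lines."""
    bed: BedIndex = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        bed.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in bed.values():
        intervals.sort()
    return bed


def load_vcf_sites(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) for each VCF data line; POS is 1-based."""
    sites: Set[Tuple[str, int]] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def in_bed(bed: BedIndex, chrom: str, pos1: int) -> bool:
    """True if the 1-based position pos1 falls inside a BED interval."""
    intervals = bed.get(chrom, [])
    pos0 = pos1 - 1  # VCF POS is 1-based; BED coordinates are 0-based
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def site_status(bed: BedIndex, sites: Set[Tuple[str, int]], chrom: str, pos1: int) -> str:
    """Classify one site under the benchmark's BED/VCF convention."""
    if not in_bed(bed, chrom, pos1):
        return "outside benchmark regions"  # the benchmark makes no claim here
    if (chrom, pos1) in sites:
        return "benchmark variant"
    return "homozygous reference"


if __name__ == "__main__":
    # Toy inputs: one BED interval and one VCF record (not real CMRG data).
    bed = load_bed(["chr1\t100\t200\n"])  # covers 1-based positions 101..200
    sites = load_vcf_sites(["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t150\t.\tA\tG\n"])
    for pos in (100, 101, 150, 151, 200, 201):
        print(pos, site_status(bed, sites, "chr1", pos))
```

In the toy run, the interval `chr1 100 200` covers 1-based positions 101 to 200, so position 150 reports the variant, 151 reports homozygous reference, and 100 and 201 fall outside the benchmark regions.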

About this Dataset

Updated: 2025-04-06
Metadata Last Updated: 2021-09-29 00:00:00
Date Created: N/A
Dataset Owner: N/A

Title: Challenging Medically-Relevant Genes Benchmark Set
Modified: 2021-09-29 00:00:00
Publisher Name: National Institute of Standards and Technology
Contact: mailto:[email protected]
Keywords: Human genomics, DNA sequencing, Reference materials, Medical genomics, Bioinformatics
{
    "identifier": "ark:\/88434\/mds2-2475",
    "accessLevel": "public",
    "contactPoint": {
        "hasEmail": "mailto:[email protected]",
        "fn": "Nathanael David Olson"
    },
    "programCode": [
        "006:045"
    ],
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/mds2-2475",
    "title": "Challenging Medically-Relevant Genes Benchmark Set",
    "description": "CMRG v1.00 of a small variant benchmark and structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (aka Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https:\/\/doi.org\/10.1038\/s41592-020-01056-5) diploid assembly of HG002 using PacBio HiFi reads for HG002 for assembly and partitioning into phased haplotypes using Illumina reads for the parents, HG003 and HG004. This benchmark contains vcfs for small and structural variants along with corresponding benchmark bed files indicating regions that are homozygous reference if they do not have a variant in the vcf. We extensively curated the variant calls, excluding any found to be questionable or errors. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in https:\/\/doi.org\/10.1101\/2021.06.07.444885.",
    "language": [
        "en"
    ],
    "distribution": [
        {
            "accessURL": "https:\/\/ftp-trace.ncbi.nlm.nih.gov\/ReferenceSamples\/giab\/release\/AshkenazimTrio\/HG002_NA24385_son\/CMRG_v1.00\/",
            "description": "NCBI Hosted Genome In A Bottle FTP Site",
            "title": "GIAB FTP Site"
        },
        {
            "accessURL": "https:\/\/github.com\/usnistgov\/giab-cmrg-benchmarkset",
            "description": "Github repository with code used to generate benchmark sets.",
            "title": "Code Repository"
        },
        {
            "accessURL": "https:\/\/github.com\/usnistgov\/cmrg-benchmarkset-manuscript",
            "description": "Github repository with code used to generate figures and perform analysis for manuscript.",
            "title": "Code for Manuscript Analysis Repository"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/README.md",
            "mediaType": "text\/plain"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SmallVariant\/HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SmallVariant\/HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SmallVariant\/HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002_CHM13_CMRG_smallvar_v1.00_GRCh38-equiv-regions_draft.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.dip.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.dip.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.dip.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.hap1.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.hap1.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.hap2.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/CHM13v1.0\/SupplementaryFiles\/HG002v11-align2-CHM13v1.0\/HG002v11-align2-CHM13v1.0.hap2.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SmallVariant\/HG002_GRCh37_CMRG_smallvar_v1.00.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SmallVariant\/HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SmallVariant\/HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/StructuralVariant\/HG002_GRCh37_CMRG_SV_v1.00.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/StructuralVariant\/HG002_GRCh37_CMRG_SV_v1.00.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/StructuralVariant\/HG002_GRCh37_CMRG_SV_v1.00.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/GRCh37_CMRG_benchmark_gene_coordinates.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.hap2.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.hap2.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SmallVariant\/HG002_GRCh38_CMRG_smallvar_v1.00.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SmallVariant\/HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.dip.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.dip.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.dip.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.hap1.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh37\/SupplementaryFiles\/HG002v11-align2-GRCh37\/HG002v11-align2-GRCh37.hap1.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_curation_medicalgene_smallvar_complexrepeat_errorsorunsure_repeatexpanded.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_hifiasm_error.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_mrg_full_gene.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/HiCanu_2.1_HG002_GRCh37_difficult_medical_gene_smallvar_benchmark_v0.02.03_intersected_FPs_repeatexpanded_slop50.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SmallVariant\/HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/StructuralVariant\/HG002_GRCh38_CMRG_SV_v1.00.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/StructuralVariant\/HG002_GRCh38_CMRG_SV_v1.00.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/StructuralVariant\/HG002_GRCh38_CMRG_SV_v1.00.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/GRCh38_CMRG_benchmark_gene_coordinates.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.dip.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.dip.vcf.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.dip.vcf.gz.tbi",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.hap1.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.hap1.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.hap2.bam",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/HiCanu_2.1_HG002_GRCh38_difficult_medical_gene_smallvar_benchmark_v0.02.03_intersected_FPs_repeatexpanded_slop50.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/HiCanu_2.1_HG002_GRCh38_difficult_medical_gene_smallvar_benchmark_v0.02.03_intersected_subtract_FPs_repeatexpanded_slop50_manual_curation_sites.tsv_manual_curation_sites.txt",
            "mediaType": "text\/plain"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SupplementaryFiles\/HG002v11-align2-GRCh38\/HG002v11-align2-GRCh38.hap2.bam.bai",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/chksum.md5",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh37_MRG_GAPs.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh37_curation_medicalgene_SV_errorsorunsure.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh37_curation_medicalgene_smallvar_complexrepeat_errorsorunsure_repeatexpanded.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh37_hifiasm_error.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh37_mrg_full_gene.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_CD4_gaps.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_CD4_gaps_slop50.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_MRG_GAPs.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/GRCh38_curation_medicalgene_SV_errorsorunsure_repeatexpanded.bed",
            "mediaType": "application\/octet-stream"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/dependencies\/combined%20curation%20responses%20from%20benchmarking%20with%20sm%20variant%20v0.02.03%20-%20GRCh37andGRCh38.tsv",
            "mediaType": "text\/tab-separated-values"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/hifiasm-assembly\/HG002-v0.11.mat.fa.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/hifiasm-assembly\/HG002-v0.11.mat.gff.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/hifiasm-assembly\/HG002-v0.11.pat.fa.gz",
            "mediaType": "application\/gzip"
        },
        {
            "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/hifiasm-assembly\/HG002-v0.11.pat.gff.gz",
            "mediaType": "application\/gzip"
        },
        {
            "accessURL": "https:\/\/doi.org\/10.18434\/mds2-2475",
            "title": "DOI Access for Challenging Medically-Relevant Genes Benchmark Set"
        }
    ],
    "bureauCode": [
        "006:55"
    ],
    "modified": "2021-09-29 00:00:00",
    "publisher": {
        "@type": "org:Organization",
        "name": "National Institute of Standards and Technology"
    },
    "theme": [
        "Bioscience:Genomics"
    ],
    "keyword": [
        "Human genomics",
        "DNA sequencing",
        "Reference materials",
        "Medical genomics",
        "Bioinformatics"
    ]
}
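The JSON record above lists every file in the release under `distribution`, mixing `downloadURL` entries with `accessURL`-only links. The following is a hedged Python sketch (standard library only) of selecting the benchmark files for one reference and variant type, and of checking downloads against `chksum.md5`. The helper names are hypothetical, and the manifest parser assumes `chksum.md5` uses the usual `md5sum` layout of a hex digest followed by a path; the file's actual contents and path style are not shown here, so confirm them before relying on it.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def benchmark_urls(record: dict, reference: str, variant_dir: str) -> List[str]:
    """Return downloadURLs under benchmark_sets/<reference>/<variant_dir>/.

    reference is e.g. "GRCh37", "GRCh38" or "CHM13v1.0"; variant_dir is e.g.
    "SmallVariant" or "StructuralVariant" (directory names taken from the URLs).
    """
    marker = f"/benchmark_sets/{reference}/{variant_dir}/"
    return sorted(
        entry["downloadURL"]
        for entry in record.get("distribution", [])
        if marker in entry.get("downloadURL", "")  # accessURL-only entries are skipped
    )


def parse_md5_manifest(text: str) -> Dict[str, str]:
    """Parse md5sum-style lines ("<digest>  <path>") into {path: digest}.

    ASSUMPTION: chksum.md5 follows the md5sum output format; check the file.
    """
    manifest: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.strip().split(maxsplit=1)
        manifest[path.lstrip("*")] = digest.lower()  # "*" marks binary mode in md5sum
    return manifest


def md5_hex(chunks: Iterable[bytes]) -> str:
    """MD5 hex digest of a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-gigabyte) file without loading it into memory."""
    with open(path, "rb") as fh:
        return md5_hex(iter(lambda: fh.read(chunk_size), b""))


if __name__ == "__main__":
    # A two-entry excerpt of the record; json.loads turns each "\/" back into "/".
    record = json.loads(r'''{"distribution": [
      {"downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-2475\/benchmark_sets\/GRCh38\/SmallVariant\/HG002_GRCh38_CMRG_smallvar_v1.00.bed"},
      {"accessURL": "https:\/\/doi.org\/10.18434\/mds2-2475"}]}''')
    for url in benchmark_urls(record, "GRCh38", "SmallVariant"):
        print(url)
```

Filtering by directory name relies only on the `benchmark_sets/<reference>/<variant type>/` layout visible in the URLs above; the `SupplementaryFiles`, `dependencies` and `hifiasm-assembly` files would need their own filters.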