Releases: xjtu-omics/ChineseQuartetGenome
chinesequartet-v2.2
chinesequartet-v2.1
v2.1
Release notes
-
Defined a more exclusive Tier 1 benchmark regions.
-
For SVs, high-confidence calls (Tier 1) and corresponding regions of v2.1 of the benchmark were obtained according to the following steps:
- Initially, we identified candidate high-confidence SVs within benchmark regions. This included all SVs supported by two or more technologies. For technology-specific SVs, we considered the following three criteria to classify them as candidate high-confidence SVs:
- They were not located within segmental duplications, VNTR (variable number tandem repeat), or simple repeat regions.
- They were not reported through short-read analysis.
- They were reported in an orthogonal callset.
- Next, we refined the benchmark regions to create Tier 1 (v2.1) regions by excluding the following:
- Regions spanning SVs that did not meet the criteria in step a) were excluded.
- Non-diploid regions were excluded.
- Simple repeats that were overlap with variants not meet the criteria in step a) were excluded.
- The candidate high-confidence SVs fully within the Tier 1 regions were defined as high-confidence SVs.
- Initially, we identified candidate high-confidence SVs within benchmark regions. This included all SVs supported by two or more technologies. For technology-specific SVs, we considered the following three criteria to classify them as candidate high-confidence SVs:
-
For small variants, high-confidence calls (Tier 1) of v2.1 were obtained according to the three steps:
- Initially, we identified candidate high-confidence small variants within benchmark regions. This included all small variants supported by all three technologies. For variants supported by two technologies, we considered those were not located within segmental duplications, VNTR, simple repeat, or short tandem repeat regions as candidate high-confidence calls.
- Next, we refined the benchmark regions to create Tier 1 (v2.1) regions for small variants by excluding the following:
- Regions spanning SVs were excluded.
- Non-diploid regions were excluded.
- Regions spanning small variants that did not meet the criteria in step a) were excluded.
- Repeat regions (Simple repeats, segmental duplications, short tandem repeats, and VNTRs) that were overlap with small variants not meet the criteria in step a) were excluded.
- The candidate high-confidence small variants fully within the Tier 1 regions were defined as high-confidence small variants.
-
In this release, to determine whether a variant belongs to a specific region,
we require that the variant is 100% within that region.
Users have the flexibility to adjust this parameter (SV_overlap_ratio and Small_overlap_ratio) through the configuration file. -
In the query set, we also include variants in the bordering regions of a given region,
such as Tier 1. This inclusion helps mitigate false negatives caused by inaccurate
variant breakpoints. Users can manually verify the accuracy of variants
in these border regions. When calculating false positives, variants in the
bordering regions are excluded. Users can also adjust parameter size_tolerate_region_boundary_SV and
size_tolerate_region_boundary_Small in light of requirements.