Blacklists of non-pathogenic NGS variants

Demonstration of blacklist performance. STAT1, a gene known to cause chronic mucocutaneous candidiasis (CMC), becomes significant only after the blacklist filter is applied.


Introduction

Practical analysis of human exomes from patients with rare genetic diseases aims to obtain a short list of candidate variations that can be explored computationally and experimentally. A key step in this process is the removal of non-pathogenic variations (NPV), which otherwise pollute and complicate exome analysis. Public databases, such as the 1000 Genomes Project and the Exome Aggregation Consortium (ExAC), provide the allele frequencies of hundreds of thousands of variants in the general population. They allow the efficient removal of many variants whose frequencies do not match the prevalence of the disease in question, taking clinical penetrance into account. Despite these resources, exomes still contain thousands of variants that are absent from public databases or are considered rare (minor allele frequency, MAF < 1%). No current approach uses a database of variants occurring in an internal cohort to remove, from patient exomes, variants that are too common to explain a rare disease. A simple, automated tool for creating such internal databases would therefore be invaluable to researchers.

The blacklist approach creates, from the cohort under investigation, a list of variations that are too common to explain rare diseases, while maintaining an extremely low false-negative rate. Because blacklists are based on the exomes under analysis, they are highly specific and efficient for the removal of internally common variants. We have observed that the blacklist approach captures a subset of variants that are absent from or rare in public databases but disproportionately common within a cohort. Blacklists are thus able to capture NPV that are otherwise resistant to elimination.
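The idea can be illustrated with a minimal sketch. It is not the ReFiNE implementation: the function name `build_blacklist`, the variant representation, and the interpretation of the cutoff as the fraction of cohort exomes carrying a variant (with "at least the cutoff" as the inclusion rule) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def build_blacklist(exomes: Iterable[Set[Variant]], cutoff: float) -> Set[Variant]:
    """Return the variants carried by at least `cutoff` of the cohort exomes.

    Assumption: the cutoff is a carrier fraction (exomes carrying the variant
    divided by the number of exomes), not an allele frequency.
    """
    cohort = list(exomes)
    if not cohort:
        return set()

    # Count, for each variant, how many exomes carry it (once per exome).
    carriers: Counter = Counter()
    for variants in cohort:
        carriers.update(set(variants))

    n_exomes = len(cohort)
    return {v for v, count in carriers.items() if count / n_exomes >= cutoff}
```

With a toy cohort of four exomes in which one variant appears in three of them, a cutoff of 0.5 would blacklist that variant and leave variants seen in a single exome untouched.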

We provide pre-calculated blacklists, generated using cutoffs of 0.01, 0.03, 0.05, and 0.10, from a cohort of 3,104 exomes from patients suffering from severe infectious diseases and immune deficiency syndromes ("PID"). In addition, we provide the individual and combined blacklists generated from the following cohorts: (1) 3,869 exomes from patients suffering from neurological diseases ("Neuro"); (2) 902 exomes from patients suffering from diseases with an infectious phenotype ("Infection"); and (3) 400 exomes (100 from Europeans and 300 from Africans) from a study on the demographic history of Central Africans ("Africa"). We have also developed ReFiNE (Reducing False-positives in NGS Elucidation), an easy-to-use tool for the creation of blacklists from internal cohorts. Finally, we designed a web-server that annotates whether each variant in a submitted list is included in our blacklists.
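As a rough illustration of what annotating a variant list against blacklists at several cutoffs could look like, the sketch below reports, for each query variant, the cutoffs whose blacklist contains it. The function name `annotate_variants`, the tuple-based variant key, and the in-memory dictionary of blacklists are hypothetical; they do not describe the web-server's actual interface or file formats.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def annotate_variants(
    query: Iterable[Variant],
    blacklists: Dict[float, Set[Variant]],
) -> Dict[Variant, List[float]]:
    """Map each query variant to the sorted cutoffs at which it is blacklisted.

    An empty list means the variant is in none of the supplied blacklists.
    """
    return {
        variant: sorted(
            cutoff for cutoff, listed in blacklists.items() if variant in listed
        )
        for variant in query
    }
```

A variant annotated with an empty list survives every blacklist filter, whereas one annotated with all four cutoffs would be removed even by the most permissive list.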

 

Citation

When using the blacklist method, ReFiNE, or the associated web-server, please cite the following paper:

Maffucci P*, Bigio B*, Rapaport F, Cobat A, Borghesi A, Lopez M, Patin E, Bolze A, Shang L, Bendavid M, Scott EM, Stenson PD, Cunningham-Rundles C, Cooper DN, Gleeson JG, Fellay J, Quintana-Murci L, Casanova JL, Abel L, Boisson B#, Itan Y#. Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis. PNAS 2018. doi: 10.1073/pnas.1808403116


Contact

If you have problems or questions, please e-mail Benedetta Bigio at bbigio@rockefeller.edu or Patrick Maffucci at patrick.maffucci@icahn.mssm.edu.
