
Human Gene Damage Index (GDI)

Patients' genes by number of variants.




Germline genetic mutations underlie a variety of diseases. A patient with a rare genetic disease carries tens of thousands of genetic variants, only one (or a few) of which cause the disease, while the others are irrelevant to it. Genes that are highly mutated in healthy individuals are unlikely to be disease-causing. Therefore, an estimate of the accumulated mutational damage of each human gene can be particularly helpful for filtering out genes that are irrelevant to a disease or phenotype.

The gene damage index (GDI) is the accumulated mutational damage of each human gene in the healthy human population. It is based on the gene variants of healthy individuals in the 1000 Genomes Project database (Phase 3), with the CADD score used to estimate the impact of each variant. We have shown that highly damaged human genes are unlikely to be disease-causing. GDI is therefore very effective at filtering out variants in highly damaged (high-GDI) genes, which are unlikely to be disease-causing.
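To make "accumulated mutational damage" concrete, here is a minimal Python sketch. It assumes, as a simplification, that each variant observed in healthy individuals contributes its CADD score multiplied by the number of alleles carrying it, and that a gene's damage is the sum of these contributions. The published GDI calculation may include further steps (see the paper cited below); the function names, gene names and numbers here are hypothetical.

```python
from collections import defaultdict


def accumulated_damage(variants):
    """Sum a simplified damage score per gene.

    variants: iterable of (gene, cadd_score, allele_count) tuples, where
    allele_count is the number of alleles carrying the variant in the
    healthy reference population (assumed input format).
    """
    damage = defaultdict(float)
    for gene, cadd_score, allele_count in variants:
        # Assumption: damage contribution = impact score x allele count.
        damage[gene] += cadd_score * allele_count
    return dict(damage)


def rank_by_damage(damage):
    """Return gene names ordered from most to least damaged."""
    return sorted(damage, key=damage.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy input, not real 1000 Genomes or CADD data.
    toy = [("GENE1", 10.0, 2), ("GENE1", 5.0, 1), ("GENE2", 1.0, 4)]
    scores = accumulated_damage(toy)
    print(scores)                  # {'GENE1': 25.0, 'GENE2': 4.0}
    print(rank_by_damage(scores))  # ['GENE1', 'GENE2']
```

Under this simplification, a gene that carries many high-impact variants that are common in healthy people accumulates a high score, which is why it ranks as an unlikely disease candidate.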

Below are simple step-by-step instructions for using the programs we provide to apply the gene damage metric to your own input data, and to automate this process, in studies of human genetic disease and phenotypes.

Step-by-step instructions for the use of the command line programs  

  1. Adding the Human Gene Damage Index (GDI) values to a list of human genes of any size:
    • Ensure that Python (version 2.X) is installed.
    • Create a text file called gene_list.txt containing the human gene names to be ranked, one gene name per line, as follows:
    • Download the file GDI_full.txt (containing the 3 metrics for each human gene) from the programs page into the same folder as the gene list text file.
    • Open a terminal window and, assuming that the files are located in a directory called /Users/Johndoe/GDI and the gene list file name is gene_list.txt, type the following lines:
      cd /Users/Johndoe/GDI
    • The output file will contain the list of genes with their corresponding raw and Phred-scaled GDI values.

  2. Estimating a GDI cutoff value for a specific disease, above which a gene is likely to be irrelevant to the disease:
    • Collate all genes that are already known to be associated with the disease (if there are none, skip the steps below and use a cutoff value of GDI = 13.84, based on the distribution of all disease-causing human genes). As in (1), create a text file named gene_list.txt.
    • Add the GDI values for each gene in the list as in (1).
    • Install the SciPy and NumPy Python packages (for Python 2.X), and download the program from the programs page.
    • Generate a text file named values.txt containing the genes' GDI values; an example is provided on the programs page as the file PID_genes_AR_GDI.txt.
    • Open a terminal window and, assuming that the files are located in a directory called /Users/Johndoe/GDI and the GDI values file name is PID_genes_AR_GDI.txt, type the following lines:
      cd /Users/Johndoe/GDI
    • For the 95% confidence interval (CI) of the GDI values provided by the user, the program outputs the mean, median, lower boundary, upper boundary, minimum value and maximum value. The recommended cutoff, the GDI value above which a gene is unlikely to be disease-causing, is the upper boundary of the 95% CI.
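Step 1 amounts to a table lookup. The sketch below reproduces it in Python 3 (the original programs target Python 2.X) and is not the program distributed on the programs page. The layout of GDI_full.txt is an assumption here: a tab-delimited file with a header row, the gene name in the first column, the raw GDI in the second and the Phred-scaled GDI in the third. The script name annotate_gdi.py, the function names and the output file name gene_list_GDI.txt are hypothetical.

```python
import csv
import sys


def parse_gdi_table(lines):
    """Map upper-cased gene name -> (raw GDI, Phred-scaled GDI).

    Assumed layout: tab-delimited, one header row, then
    gene name / raw GDI / Phred-scaled GDI in the first three columns.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    table = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        table[row[0].strip().upper()] = (float(row[1]), float(row[2]))
    return table


def annotate(genes, table):
    """Return (gene, raw GDI, Phred GDI); genes missing from the table get None."""
    rows = []
    for gene in genes:
        raw, phred = table.get(gene.upper(), (None, None))
        rows.append((gene, raw, phred))
    return rows


def read_gene_list(path):
    """Read gene_list.txt: one gene name per line, blank lines skipped."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical): python annotate_gdi.py gene_list.txt GDI_full.txt gene_list_GDI.txt
    gene_path, table_path, out_path = sys.argv[1:]
    with open(table_path, newline="") as fh:
        gdi = parse_gdi_table(fh)
    with open(out_path, "w") as out:
        for gene, raw, phred in annotate(read_gene_list(gene_path), gdi):
            out.write("%s\t%s\t%s\n" % (gene,
                                        raw if raw is not None else "NA",
                                        phred if phred is not None else "NA"))
```

Matching is case-insensitive, and genes absent from GDI_full.txt are written with NA rather than silently dropped, so a misspelled gene name stays visible in the output.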

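The page does not say how the program derives the 95% CI, so the following Python 3 sketch of step 2 makes an explicit assumption: it takes the interval to be the empirical 2.5th and 97.5th percentiles of the supplied GDI values, interpolated linearly like NumPy's default percentile rule, and it uses only the standard library. The original SciPy/NumPy program may compute the interval differently and report different boundaries. The script name gdi_cutoff.py, the function names and the values.txt parsing rule (last numeric field on each line) are hypothetical.

```python
import statistics
import sys


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of pre-sorted values, p in [0, 100]."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_gdi(values, level=95.0):
    """Mean, median, interval boundaries, minimum and maximum of GDI values."""
    vals = sorted(float(v) for v in values)
    if not vals:
        raise ValueError("no GDI values supplied")
    tail = (100.0 - level) / 2.0  # 2.5 on each side for a 95% interval
    return {
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
        "lower": percentile(vals, tail),
        "upper": percentile(vals, 100.0 - tail),
        "min": vals[0],
        "max": vals[-1],
    }


def read_values(path):
    """Read the last whitespace-separated field of each line as a GDI value.

    Lines whose last field is not numeric (e.g. a header) are skipped.
    """
    values = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            try:
                values.append(float(fields[-1]))
            except ValueError:
                continue
    return values


def genes_below_cutoff(gdi_by_gene, cutoff):
    """Keep genes whose GDI does not exceed the cutoff."""
    return [gene for gene, gdi in gdi_by_gene.items() if gdi <= cutoff]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python gdi_cutoff.py values.txt
    stats = summarize_gdi(read_values(sys.argv[1]))
    for name in ("mean", "median", "lower", "upper", "min", "max"):
        print("%s\t%.4f" % (name, stats[name]))
    print("suggested cutoff (upper boundary):\t%.4f" % stats["upper"])
```

genes_below_cutoff shows how the resulting upper boundary, or the fallback of 13.84 when no disease genes are known, could be applied to annotated genes. Which GDI scale (raw or Phred-scaled) a given cutoff refers to should be checked against the programs page before filtering.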

When using the Gene Damage Index (GDI) of any human gene or the associated web server (also for extracting selective pressure estimates for human genes), please cite the following paper:

Yuval Itan, Bertrand Boisson, Lei Shang, Etienne Patin, Alexandre Bolze, Marcela Moncada-Vélez, Eric Scott, Michael Ciancanelli, Fabien Lafaille, Janet Markle, Ruben Martinez-Barricarte, Sarah Jill De Jong, Xiao-Fei Kong, Patrick Nitschke, Aziz Belkadi, Jacinta Bustamante, Anne Puel, Stéphanie Boisson-Dupuis, Peter D. Stenson, Joseph G. Gleeson, David N. Cooper, Lluis Quintana-Murci, Jean-Michel Claverie, Shen-Ying Zhang, Laurent Abel and Jean-Laurent Casanova (2015). The human gene damage index as a gene-level approach to prioritizing exome variants. PNAS.

Please refer to this paper for additional information about the accumulated mutational damage of human genes and its applications in high-throughput sequencing and in human disease genetics and genomics studies.


In case of problems or questions, please e-mail Yuval Itan at:
