The Human Gene Connectome (HGC)

Figure: Top 5% of the TLR3 connectome.

Citation and usage

When using the Human Gene Connectome Server (HGCS), a gene-specific connectome or any of the programs provided, please cite the following papers:

Itan Y, Zhang SY, Vogt G, Abhyankar A, Herman M, Nitschke P, Fried D, Quintana-Murci L, Abel L, Casanova JL (2013). The Human Gene Connectome as a Map of Short Cuts for Morbid Allele Discovery. Proceedings of the National Academy of Sciences of the United States of America.

Itan Y, Mazel M, Mazel B, Abhyankar A, Nitschke P, Quintana-Murci L, Boisson-Dupuis S, Boisson B, Abel L, Zhang SY, Casanova JL (2014). HGCS: an online tool for prioritizing disease-causing gene variants by biological distance. BMC Genomics.

Please refer to these papers for additional information about the human gene connectome concept and its applications in morbid allele discovery and genotype-phenotype studies.

The human gene-specific connectomes data and computer programs are freely available to non-commercial users.

Introduction

The HGC is the set of all biologically plausible routes, distances, and degrees of separation between all pairs of human genes. A gene-specific connectome contains the set of all available human genes sorted on the basis of their predicted biological proximity to the specific gene of interest. The HGC is a powerful approach for human genotype-phenotype high-throughput studies, for which it can be used to rank any list of genes within a gene-specific connectome for an experimentally validated core gene. See (1) below.
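The notions of route, distance, and degrees of separation can be illustrated on a toy network. The sketch below is not the HGC implementation: it assumes, for illustration only, that the distance between two genes is the length of the shortest weighted path between them, and it uses a standard Dijkstra search. The function name `shortest_routes`, the gene names X1 and X2, and all edge distances are hypothetical.

```python
import heapq

def shortest_routes(edges, source):
    """Return {gene: (distance, route)} for every gene reachable from source.

    edges is a list of (gene_a, gene_b, distance) tuples, treated as undirected.
    """
    graph = {}
    for a, b, d in edges:
        graph.setdefault(a, []).append((b, d))
        graph.setdefault(b, []).append((a, d))

    best = {source: (0.0, [source])}
    heap = [(0.0, source, [source])]
    while heap:
        dist, gene, route = heapq.heappop(heap)
        if dist > best[gene][0]:
            continue  # stale heap entry; a shorter route was already found
        for neighbour, weight in graph.get(gene, []):
            candidate = dist + weight
            if neighbour not in best or candidate < best[neighbour][0]:
                best[neighbour] = (candidate, route + [neighbour])
                heapq.heappush(heap, (candidate, neighbour, route + [neighbour]))
    return best

# Placeholder network: X1 and X2 are hypothetical genes, distances are made up.
edges = [("TLR3", "X1", 1.1), ("X1", "X2", 1.623), ("TLR3", "X2", 3.5)]
routes = shortest_routes(edges, "TLR3")

distance, route = routes["X2"]
degrees_of_separation = len(route) - 1  # number of links along the route
print(route, degrees_of_separation)     # ['TLR3', 'X1', 'X2'] 2
```

In this toy case the indirect route through X1 is shorter than the direct link, so X2 sits two degrees of separation from TLR3.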

The human gene connectome server (HGCS) is an effective and easy-to-use interactive web server that enables researchers to prioritize any list of genes by their biological proximity to defined core genes (i.e. genes that are known to be associated with the phenotype), and to predict novel gene pathways.
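As a rough illustration of prioritizing a gene list by proximity to one core gene, the sketch below sorts candidates by their distance in a single gene-specific connectome held in memory. It is not the HGCS or HGC_ranking.py code; the function name `rank_by_proximity`, the gene names, and the distances are placeholders, and how the server combines several core genes is not assumed here.

```python
def rank_by_proximity(candidates, connectome):
    """Sort candidate genes by their distance to one core gene.

    connectome maps gene name -> distance to the core gene.
    Candidates that are absent from the connectome are returned separately.
    """
    found = [(gene, connectome[gene]) for gene in candidates if gene in connectome]
    missing = [gene for gene in candidates if gene not in connectome]
    ranked = sorted(found, key=lambda pair: pair[1])  # closest gene first
    return ranked, missing

# Placeholder connectome of a hypothetical core gene; values are made up.
core_connectome = {"X1": 2.9, "X2": 5.1, "X3": 3.4}

ranked, missing = rank_by_proximity(["X1", "X2", "X3", "X9"], core_connectome)
print(ranked)   # [('X1', 2.9), ('X3', 3.4), ('X2', 5.1)]
print(missing)  # ['X9']
```

Reporting the missing genes separately avoids silently dropping candidates that cannot be placed in the connectome.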

Functional genomic alignment (FGA) is equivalent to traditional multiple sequence alignment (MSA), except that it clusters genes in trees on the basis of the functional biological distance between them predicted by HGC, rather than on the basis of molecular evolutionary genetic distance. This method is therefore more suitable for disease and phenotypic studies. See (2) below.
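The idea of clustering genes into a tree from pairwise distances can be sketched with a simple single-linkage agglomerative procedure. This is only an illustration of distance-based clustering; it is not necessarily the tree-building method used by FGA_plot.R, and the function name `cluster_tree`, the gene names, and the distances are hypothetical.

```python
def cluster_tree(genes, dist):
    """Greedy single-linkage clustering; returns the tree as nested tuples.

    dist maps frozenset({gene_a, gene_b}) -> distance for every pair of genes.
    """
    clusters = [(gene, [gene]) for gene in genes]  # (subtree, member genes)

    def linkage(members_a, members_b):
        # Single linkage: distance between the two closest members.
        return min(dist[frozenset((a, b))] for a in members_a for b in members_b)

    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]][1],
                                                 clusters[ij[1]][1]))
        (tree_a, members_a), (tree_b, members_b) = clusters[i], clusters[j]
        merged = ((tree_a, tree_b), members_a + members_b)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Placeholder distances between four hypothetical genes.
genes = ["X1", "X2", "X3", "X4"]
dist = {
    frozenset(("X1", "X2")): 1.2,
    frozenset(("X3", "X4")): 1.5,
    frozenset(("X1", "X3")): 4.0,
    frozenset(("X1", "X4")): 4.2,
    frozenset(("X2", "X3")): 4.1,
    frozenset(("X2", "X4")): 4.3,
}
tree = cluster_tree(genes, dist)
print(tree)  # (('X1', 'X2'), ('X3', 'X4'))
```

Genes that are close in the distance table end up as siblings in the tree, which is the property the FGA plots rely on.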

The "human gene-specific connectomes" folder contains a gene-specific connectome file for each human gene. Alternatively you can download all human gene-specific connectomes in one compressed file.

The "programs" folder contains the computer programs for ranking lists of genes within a gene-specific connectome, clustering and plotting the genes by the functional genomic alignment (FGA) approach and generating gene-specific connectomes (also see (3) below). All these aspects are described in more detail in the paper. The programs were developed and tested on Mac and Linux systems. The external software required for running these programs is open-source and free of charge.

Step-by-step instructions for the use of the programs  

  1. Ranking a list of genes on the basis of their biological proximity to any list of core genes of interest:
    • Ensure that the Python (2.X version) programming language is installed.
    • Create a text file for the list of human gene names to be ranked, one below the other, as follows:
      IKBKG
      IFNGR2
      STAT1
    • Create a text file for the list of human core gene names to rank against, one below the other, as follows:
      TLR3
      IFNG
    • Download HGC_ranking.py and the gene-specific connectomes of the core genes of interest (for example, TLR3.txt and IFNG.txt, or download all gene-specific connectomes at once) into the same folder as the two gene list text files.
    • Open a terminal window and, assuming that the files are located in a directory called /Users/Johndoe/hgc, that the gene list file is named gene_list.txt, and that the core gene list file is named core_gene_list.txt, type the following lines:
      cd /Users/Johndoe/hgc
      python HGC_ranking.py gene_list.txt core_gene_list.txt
    • The output file in this case contains the genes from the gene list as extracted from the gene-specific connectomes, ranked by biological distance. It can be opened in Microsoft Excel.
  2. Clustering genes by their biological distance from each other (FGA) and tree plotting:
    • Ensure that the R and Python programming languages are installed, together with the Python package NetworkX and the R package APE.
    • Create a text file called list.txt for the list of human gene names to be clustered, as in (1).
    • Download the files Nodes_binding.txt, Edges_binding.txt, biological_distances_matrix.py, and FGA_plot.R into the same folder as list.txt
    • Open a terminal window and go to the folder containing the files, as in (1).
    • Type:
      python biological_distances_matrix.py
    • Open the output file biological_distance_matrix.txt in Microsoft Excel. Remove rows and columns that contain the number 999999; these indicate genes that could not be traced in the HGC. Save as a text file named final_matrix.txt
    • Type:
      Rscript FGA_plot.R
    • The biological distances clustering plot is named FGA_tree.pdf
    • If the fonts and/or page size are not appropriate then open the file FGA_plot.R in a text editor. Change "width" and "height" to control page size, and change "cex" to control font size. Save the file and rerun.
  3. Generating a new gene-specific connectome:
    • Download the files Nodes_binding.txt, Edges_binding.txt, and Gene-specific_connectome.py into the same folder.
    • Add any new gene of interest (named, for example, X1) to Nodes_binding.txt and save the file. Add all known connections between X1 and other genes to Edges_binding.txt: the 1st column is X1, the 2nd column is the gene that X1 connects to, and the 3rd column is the estimated distance between the two genes (a shorter distance indicates a stronger connection), where 1.000 is the shortest possible distance (strongest connection) and 6.667 is the longest. If the distance cannot be estimated, a value of 1.623 is suggested (the median distance between two directly connected genes). For example:
      X1    TLR3    1.1
      X1    X2    1.623
    • Open a terminal window and go to the folder containing the files, as in (1).
    • Type:
      python Gene-specific_connectome.py X1
    • The output file of the X1 connectome will be named X1.txt and can be opened and sorted with Microsoft Excel.
    • The same procedure could be repeated with any user-specific database of genes and direct connections of interest.
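For step 2 above, the manual removal of 999999 entries in Excel could in principle be scripted. The sketch below works on a matrix already loaded into memory, because the exact layout of biological_distance_matrix.txt is not described here. It assumes that an untraceable gene shows 999999 against the other genes, so it repeatedly drops the gene with the most 999999 entries until none remain. The function name `drop_untraced` and the example values are hypothetical.

```python
SENTINEL = 999999  # value that marks a gene that could not be traced

def drop_untraced(labels, matrix):
    """Remove genes (row and matching column) until no SENTINEL value remains.

    labels is a list of gene names; matrix is a square list of lists whose
    rows and columns follow the order of labels. Inputs are not modified.
    """
    labels = list(labels)
    matrix = [list(row) for row in matrix]
    while True:
        counts = [sum(1 for value in row if value == SENTINEL) for row in matrix]
        if not counts or max(counts) == 0:
            return labels, matrix
        worst = counts.index(max(counts))  # gene with the most sentinel entries
        del labels[worst]
        del matrix[worst]
        for row in matrix:
            del row[worst]

# Placeholder matrix: X3 is a hypothetical gene that could not be traced.
labels = ["X1", "X2", "X3"]
matrix = [
    [0.0, 1.623, SENTINEL],
    [1.623, 0.0, SENTINEL],
    [SENTINEL, SENTINEL, 0.0],
]
clean_labels, clean_matrix = drop_untraced(labels, matrix)
print(clean_labels)  # ['X1', 'X2']
print(clean_matrix)  # [[0.0, 1.623], [1.623, 0.0]]
```

Dropping one gene at a time, rather than every row that contains 999999, keeps the traceable genes whose rows only contain the sentinel in the column of an untraceable gene.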

Contact

In case of problems or questions, please e-mail Yuval Itan at: yitan@rockefeller.edu