DGN: The Disease Gene Network

The DGN tables come from Supplementary Information distributed with the paper:
Kwang-Il Goh, Michael E. Cusick David Valle Hum, Barton Childs Hum, Marc Vidal, and Albert-Laszlo Barabasi, The human disease network, PNAS, May 22, 2007, vol. 104, no. 21, 8685.

Schema in CLSD

DISEASES (also known as Table 2: "Network characteristics of diseases")
This table provides data about each disease, and DISEASE_ID is its primary key.

GENES (also known as Table 3: "Network characteristics of genes")
This table provides data about each gene, and GENE_ID is its primary key.

PPI_RUAL_STELZL (also known as Table 4: The Protein-Protein Interaction table
This table encodes edges that connect any pair of genes whose proteins interact in large scale screens as reported in two separate papers

  • Rual J-P, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al., Towards a proteome scale map of the human protein-protein interaction network, Nature 437:1173, 2005.

  • Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner N, Schoenherr A, Koeppen S, et al., A human protein-protein interaction network: A resource for annotating the proteome, Cell 122:957-968, 2005.

This table contains 2 foreign keys, GENE_1_ID and GENE_2_ID, into Table 3. Note that this table is presented in a half-matrix format so that getting a list of all genes mentioned in the table requires a union of the distinct genes in each column. See the PPI_GENES table below for a list of all genes that appear in either column.

SUB_DISEASES (aka Table 1: "Curated Morbid Map file with disease ID and class assignment (Dec 21,2005 version)")
This table shows the sub-diseases clustered under each disease assigned a DISEASE_ID. DISEASE_ID occurs only as foreign key, but this table has no primary key. Note that each gene in a cluster (and represented by the SAME DISEASE_ID) may implicate different genes and chromosomes, and may be assigned different OMIM IDs.


Tables derived from the tables above

GENE_DISEASE_CONNECTOR
Each entry in this table connects a gene to a disease, when that gene has been implicated in the disease process. The table includes one foreign key: GENE_ID. This table should contain the same "data" as DISEASE_GENE_CONNECTOR, but in a different order.

DISEASE_GENE_CONNECTOR
Each entry in this table connects a disease to a gene implicated in the disease process. The table includes one foreign key: DISEASE_ID. This table should contain the same "data" as GENE_DISEASE_CONNECTOR, but in a different order. Both tables have been retained to help validate data.

GENE_NETWORK
Each entry in this table connects a pair of genes that share a connection to the same disease. These are edges in the gene-gene disease network. The table includes nothing but 2 foreign keys: GENE_1_ID and GENE_2_ID.

PPI_SHORTEST_PATH_LENGTHS
This table includes a set of triples: source, target, and path length for each gene in PPI_RUAL_STELZL. It was built by finding the shortest path through PPI_RUAL_STELZL from every gene in PPI_GENES to every other gene in PPI_GENES. This table is quite large size (over 50 million records) which is partly a result of storing the full matrix of directed connections rather than a half-matrix.

PPI_SHORTEST_PATH_LENGTHS can be used to find the node degree of a gene by counting all the paths of length = 1 for that gene.

PPI_GENES
This table identifies the GENE_ID and GENE_SYMBOL for each gene in the PPI_RUAL_STELZL network. This list includes about 7500 genes, compared to the 1777 genes in the DGN GENES table, but about one third of the genes in the GENES table do not occur in the PPI_GENES table.

Notes:

  • The GENE_NETWORK table can be constructed by joins involving DISEASE_GENE_CONNECTOR and/or GENE_DISEASE_CONNECTOR. For example, here is a self-join on GENE_DISEASE_CONNECTOR that produces the same network edges as collected in the GENE_NETWORK table (although the order of edge pairs is different): select distinct a.GENE_ID, b.GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a, DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR b where a.DISEASE_ID = b.DISEASE_ID and a.GENE_ID < b.GENE_ID order by a.GENE_ID, b.GENE_ID A self-join on DISEASE_GENE_CONNECTOR should produce identical output, as should an inner-join of GENE_DISEASE_CONNECTOR against DISEASE_GENE_CONNECTOR.

    The GENE_NETWORK table was included for user convenience, and as an aid to data validation.

  • To get a list of all 8 genes associated with Alzheimer's one can use: select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 For Alzheimer's, that gives 12 genes.

  • To get a list of the 16 disease IDs associated with those Alzheimer's-related genes one can use: select distinct DISEASE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a where a.GENE_ID in (select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 )

  • To get a list of the 95 genes associated with those diseases related to Alzheimer's-related genes, one can use: select * from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR d where d.DISEASE_ID in (select distinct DISEASE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a where a.GENE_ID in (select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 ) )