DGN: The Disease Gene Network
- The DGN tables come from Supplementary Information distributed
with the paper:
Kwang-Il Goh, Michael E. Cusick David Valle Hum, Barton Childs Hum, Marc Vidal, and Albert-Laszlo Barabasi, The human disease network, PNAS, May 22, 2007, vol. 104, no. 21, 8685.
Schema in CLSD
- DISEASES (also known as Table 2: "Network characteristics of diseases")
-
This table provides data about each disease, and
DISEASE_ID is its primary key.
- GENES (also known as Table 3: "Network characteristics of genes")
-
This table provides data about each gene, and GENE_ID
is its primary key.
- PPI_RUAL_STELZL (also known as Table 4: The Protein-Protein Interaction table
-
This table encodes edges that connect any pair of genes whose proteins
interact in large scale screens as reported in two separate papers
- Rual J-P, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A,
Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al.,
Towards a proteome scale map of the human protein-protein interaction
network, Nature 437:1173, 2005.
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H,
Stroedicke M, Zenkner N, Schoenherr A, Koeppen S, et al.,
A human protein-protein interaction network: A resource for annotating
the proteome, Cell 122:957-968, 2005.
- Rual J-P, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A,
Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al.,
Towards a proteome scale map of the human protein-protein interaction
network, Nature 437:1173, 2005.
- SUB_DISEASES (aka Table 1: "Curated Morbid Map file with disease ID and class assignment (Dec 21,2005 version)")
-
This table shows the sub-diseases clustered under each disease
assigned a DISEASE_ID.
DISEASE_ID occurs only as foreign key, but this table has
no primary key. Note that each gene in a cluster (and represented
by the SAME DISEASE_ID) may implicate different genes and
chromosomes, and may be assigned different OMIM IDs.
Tables derived from the tables above
- GENE_DISEASE_CONNECTOR
-
Each entry in this table connects a gene to a disease, when
that gene has been implicated in the disease process. The table
includes one foreign key: GENE_ID. This table should contain the
same "data" as DISEASE_GENE_CONNECTOR, but in a different order.
- DISEASE_GENE_CONNECTOR
-
Each entry in this table connects a disease to a gene
implicated in the disease process. The table includes one
foreign key: DISEASE_ID.
This table should contain the same "data" as GENE_DISEASE_CONNECTOR,
but in a different order. Both tables have been retained to help
validate data.
- GENE_NETWORK
-
Each entry in this table connects a pair of genes that share a connection
to the same disease. These are edges in the gene-gene disease
network. The table includes nothing but 2 foreign keys: GENE_1_ID
and GENE_2_ID.
- PPI_SHORTEST_PATH_LENGTHS
-
This table includes a set of triples: source, target, and path length for
each gene in PPI_RUAL_STELZL. It was built by finding the shortest path
through PPI_RUAL_STELZL from every gene in PPI_GENES to every other gene
in PPI_GENES.
This table is quite large size (over 50 million records) which is
partly a result of storing the full matrix of directed connections rather
than a half-matrix.
PPI_SHORTEST_PATH_LENGTHS can be used to find the node degree of a gene by counting all the paths of length = 1 for that gene.
- PPI_GENES
- This table identifies the GENE_ID and GENE_SYMBOL for each gene in the PPI_RUAL_STELZL network. This list includes about 7500 genes, compared to the 1777 genes in the DGN GENES table, but about one third of the genes in the GENES table do not occur in the PPI_GENES table.
Notes:
- The GENE_NETWORK table can be constructed by joins involving
DISEASE_GENE_CONNECTOR and/or GENE_DISEASE_CONNECTOR.
For example, here is a self-join on GENE_DISEASE_CONNECTOR that
produces the same network edges as collected in the GENE_NETWORK table
(although the order of edge pairs is different):
select distinct a.GENE_ID, b.GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a, DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR b where a.DISEASE_ID = b.DISEASE_ID and a.GENE_ID < b.GENE_ID order by a.GENE_ID, b.GENE_ID A self-join on DISEASE_GENE_CONNECTOR should produce identical output, as should an inner-join of GENE_DISEASE_CONNECTOR against DISEASE_GENE_CONNECTOR.The GENE_NETWORK table was included for user convenience, and as an aid to data validation.
- To get a list of all 8 genes associated with Alzheimer's one can use:
select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 For Alzheimer's, that gives 12 genes. - To get a list of the 16 disease IDs associated with those
Alzheimer's-related genes one can use:
select distinct DISEASE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a where a.GENE_ID in (select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 ) - To get a list of the 95 genes associated with those diseases
related to Alzheimer's-related genes, one can use:
select * from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR d where d.DISEASE_ID in (select distinct DISEASE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR a where a.GENE_ID in (select GENE_ID from DISEASE_GENE_NET.GENE_DISEASE_CONNECTOR where DISEASE_ID = 98 ) )




