TRRD data: selections from the TRRD website data

The TRRD data comes from the Institute of Cytology and Genetics. This deployment is an extract (circa March 2009) of selected data from the database on Transcription Regulatory Regions of Eukaryotic Genes (TRRD). This database contains data on:

  • each gene containing a binding region or generating a transcription factor,
  • each transcription factor regulating a gene, and
  • each binding region (site) bound by a transcription factor.

Note that a gene may have multiple binding regions, a binding region may be bound by multiple transcription factors, and a transcription factor may bind multiple binding sites.

  • Gene IDs take the form "Axxxxx"; genes also have accession IDs of the form "Species:Name",
  • Transcription factor IDs take the form "Fxxxxx", and
  • Binding region IDs take the form "Sxxxxx".

Fields containing these IDs are usually either primary or foreign key fields in the tables in which they appear.

In this deployment of the TRRD data, there are tables for gene data and transcription factor data, but NOT for binding region data.

To express the multiplicity of interconnections, the TRRD data for genes contains records that link binding regions and genes with transcription factors. Locally, the genes data has been separated into 2 tables, TRRD.GENES and TRRD.GENES_DR. The latter consists of the DR records from the downloaded TRRD genes file and provides cross-linking identifiers to many gene ID systems, particularly SWISSPROT and NCBI Entrez GIDs.

See the TRRD web site or the summary paper

http://www.bionet.nsc.ru/meeting/bgrs_proceedings/papers/2006/BGRS_2006_V1_017.pdf

for more details.

Linkages within TRRD are made through the IDs assigned to each gene, transcription factor and binding region, in the forms listed above.

This data is also available from the gene-regulation.com web site via Web forms.

Within CLSD this TRRD data can, of course, be accessed via SQL commands that can merge it with other data within CLSD.

TRRD Gene data

There are 2 tables for TRRD gene data. All data is of type VARCHAR. The main table is TRRD.GENES with the following layout:

Field name            VARCHAR Length
TRRD_GENE_ID          20
TRRD_ACC              20
TRRD_SPECIES          60
TRANSFAC_GENE_ID      10
TRDD_GENE_NAME        100

These fields were selected from the gene records provided through the Web site.
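
As a minimal sketch of how this table might be queried, the following looks up the GENES record for a single gene. It assumes that TRRD_ACC holds the "Species:Name" accession (for example 'Hs:ADH3'), which is how that field appears in the GENES_DR output later in this section; the alias g is only illustrative.

```sql
-- Sketch: fetch the GENES record for one gene by accession.
-- Assumption: TRRD_ACC stores the "Species:Name" form, e.g. 'Hs:ADH3'.
select
   TRRD_GENE_ID, TRRD_ACC, TRRD_SPECIES, TRANSFAC_GENE_ID
from
   trrd.genes g
where
   g.trrd_acc = 'Hs:ADH3'
```

If the assumption holds, TRRD_GENE_ID in the result would be the "Axxxxx" form of the same gene.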

The table TRRD.GENES_DR contains entries for the "DR" records associated with a specific gene:

Field name            VARCHAR Length
TRRD_ACC              20
TRRD_GENE_ID          20
DR_SCHEME             15
DR_SCHEME_PART1       30
DR_SCHEME_PART2       100

Note that this table includes GO annotations for each gene. Annotations from each of the 3 main GO categories (molecular function, biological process, and cellular component) are included, and there may be multiple annotations for a single GO category.
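
The GO annotations for one gene could be listed with a query along the following lines. This is a sketch that assumes GO rows carry the value 'GO' in DR_SCHEME, with the GO identifier in DR_SCHEME_PART1 and the category and term text in DR_SCHEME_PART2, as in the sample output later in this section.

```sql
-- Sketch: list the GO annotations recorded for one gene.
-- Assumption: GO rows are marked with DR_SCHEME = 'GO'.
select
   TRRD_ACC, DR_SCHEME_PART1, DR_SCHEME_PART2
from
   trrd.genes_dr d
where
   d.trrd_acc = 'Hs:ADH3'
and
   d.dr_scheme = 'GO'
order by
   DR_SCHEME_PART2
```

Ordering by DR_SCHEME_PART2 groups the rows by GO category, since the category name leads that field.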

TRRD Factor data

There are 7 tables for the TRRD factor data. The main table is TRRD.FACTORS with the following layout:

Field name                  VARCHAR Length
TRRD_FACTOR_ID              20
TRRD_GENE_ID                20
TRRD_SITE_ACC               20
TRRD_FACTOR_NAME            100
TRRD_FACTOR_SUBUNIT_NAME    80
TRRD_FACTOR_NAME_SYNONYM    80
TRRD_FACTOR_SPECIES         80
TRANSFAC_FACTOR_ID          40
TRRD_FACTOR_SOURCE          40

Here is an example showing how to list all the FACTORS entries for gene Hs:ADH3:

select 
   TRRD_FACTOR_ID, TRRD_GENE_ID, TRRD_SITE_ACC, TRRD_FACTOR_NAME 
from 
   trrd.factors a
where 
  a.trrd_gene_id = 'Hs:ADH3'

which gets the following result:

TRRD_FACTOR_ID (VARCHAR)  TRRD_GENE_ID (VARCHAR)  TRRD_SITE_ACC (VARCHAR)  TRRD_FACTOR_NAME (VARCHAR)
F1945.1                   Hs:ADH3                 S1945                    RARbeta; retinoic acid receptor beta
F1946.1                   Hs:ADH3                 S1946                    C/EBPalpha; CCAAT/enhancer binding protein alpha
F1947.1                   Hs:ADH3                 S1947                    C/EBPalpha; CCAAT/enhancer binding protein alpha
F1950.1                   Hs:ADH3                 S1950                    TFIID;

showing 4 transcription factor entries that bind to Hs:ADH3 at 4 unique sites (C/EBPalpha appears twice, at sites S1946 and S1947).
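
To summarise such a listing rather than read it row by row, the sites could be counted per factor name. The following is a sketch using only standard SQL aggregation; the column alias SITE_COUNT is illustrative and not part of the TRRD tables.

```sql
-- Sketch: count the distinct binding sites per factor name for one gene.
-- SITE_COUNT is an illustrative alias, not a TRRD column.
select
   TRRD_FACTOR_NAME, count(distinct TRRD_SITE_ACC) as SITE_COUNT
from
   trrd.factors a
where
   a.trrd_gene_id = 'Hs:ADH3'
group by
   TRRD_FACTOR_NAME
```

For the four rows listed above, this would be expected to return three groups, with the C/EBPalpha entry counted at two sites.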

The next example links the gene records associated with factor F1945.1 to other ID systems, by joining the FACTORS table with the GENES_DR table:

select 
   a.TRRD_FACTOR_ID, a.TRRD_GENE_ID, a.TRRD_SITE_ACC, b.TRRD_ACC,
   b.TRRD_GENE_ID, DR_SCHEME, DR_SCHEME_PART1, DR_SCHEME_PART2
from 
   trrd.factors a
join
   trrd.genes_DR b
on
   a.trrd_gene_id = b.trrd_acc
where
   a.trrd_gene_id = 'Hs:ADH3'
and
   a.trrd_factor_id = 'F1945.1'

The result is:

TRRD_FACTOR_ID  TRRD_GENE_ID  TRRD_SITE_ACC  TRRD_ACC  TRRD_GENE_ID  Continued=>
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374
F1945.1         Hs:ADH3       S1945          Hs:ADH3   A00374

Continued=>  DR_SCHEME   DR_SCHEME_PART1  DR_SCHEME_PART2
             SWISS-PROT  ADHG_HUMAN       P00326
             CleanEx     HGNC:251         ADH1C
             Ensembl     ENSG00000196616  null
             GenAtlas    ADH1C            null
             GeneCards   ADH1C            null
             GeneLynx    ADH1C            null
             GO          GO:0005737       Cellular component: cytoplasm
             GO          GO:0004024       Molecular function: alcohol dehydrogenase activity, zinc-dependent
             GO          GO:0006066       Biological process: alcohol metabolism
             HGNC        HGNC:251         ADH1C
             HOVERGEN    P00326           null
             MIM         103730           null
             SOURCE      ADH1C            Hs
             EntrezGene  ADH1C            126
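
When only a few ID systems are of interest, the same join can be narrowed with a filter on DR_SCHEME. The sketch below keeps only the SWISS-PROT and EntrezGene rows; it assumes the scheme names are stored exactly as they appear in the result above.

```sql
-- Sketch: restrict the cross-links for factor F1945.1 to two ID systems.
-- Assumption: scheme names match the spellings shown in the result above.
select
   a.TRRD_FACTOR_ID, a.TRRD_SITE_ACC,
   b.DR_SCHEME, b.DR_SCHEME_PART1, b.DR_SCHEME_PART2
from
   trrd.factors a
join
   trrd.genes_DR b
on
   a.trrd_gene_id = b.trrd_acc
where
   a.trrd_factor_id = 'F1945.1'
and
   b.dr_scheme in ('SWISS-PROT', 'EntrezGene')
```

Filtering in the where clause keeps the join condition unchanged, so the query can be widened again by editing only the list of scheme names.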