A. Mouse genome
mm7 predictions: The coordinates are positions in the "mm7" build of the mouse genome (August 2005 assembly).
mm8 predictions: The coordinates are positions in the "mm8" build of the mouse genome (February 2006 assembly).
B. Human genome
hg17 predictions: The coordinates are positions in the "hg17" build of the human genome (May 2004 assembly).
hg18 predictions: The coordinates are positions in the "hg18" build of the human genome (March 2006 assembly).
The preCRMs are genomic intervals that meet three thresholds:
- a regulatory potential (RP) score greater than 0.05
- length of at least 200bp
- a conserved match to the consensus GATA-1 binding site.
The RP scores (Taylor et al. 2006, Genome Research 16: 1596-1604) are based on a 7-species alignment. A conserved match to the GATA-1 consensus binding site (GATAcc) is a match to WGATAR in the reference sequence that also aligns with a match to WGATAR in at least one species from a different mammalian order. Specifically, a WGATAR motif in mouse must align with that same motif in human, chimp or dog; and a WGATAR motif in human must align with that same motif in mouse, rat or dog. Exons from the UCSC Known Genes track are subtracted.
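The three thresholds can be sketched as a simple filter. This is only an illustration, not the pipeline used to produce the predictions: the `Interval` record, its field names, and the idea of storing the set of species whose aligned sequence also matches WGATAR are all assumptions made for the example. Exon subtraction is not modeled.

```python
from dataclasses import dataclass

# Species from a different mammalian order that can confirm a WGATAR match,
# keyed by the reference species (as listed in the text).
CONFIRMING_SPECIES = {
    "mouse": {"human", "chimp", "dog"},
    "human": {"mouse", "rat", "dog"},
}


@dataclass
class Interval:
    """Hypothetical record for one candidate interval (names are assumptions)."""
    chrom: str
    start: int                 # 0-based start, as in bed files
    end: int                   # exclusive end, as in bed files
    rp_score: float            # regulatory potential score
    motif_species: frozenset   # species whose aligned sequence matches WGATAR


def has_conserved_gata(reference, motif_species):
    """True if an aligned WGATAR match exists in a qualifying species."""
    return bool(CONFIRMING_SPECIES[reference] & set(motif_species))


def is_precrm(interval, reference, min_rp=0.05, min_len=200):
    """Apply the three thresholds: RP > 0.05, length >= 200bp, conserved GATA match."""
    return (
        interval.rp_score > min_rp
        and interval.end - interval.start >= min_len
        and has_conserved_gata(reference, interval.motif_species)
    )
```

Note that under these rules a mouse WGATAR match that aligns only with rat would not count as conserved, because rat is not in the list of confirming species for mouse.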
For each of the four builds (mm7, mm8, hg17, hg18):
To download the file (in bed format), CLICK HERE.
To view the preCRMs as a custom track in the UCSC Genome Browser, CLICK HERE.
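For readers unfamiliar with bed files, the following minimal sketch reads the first columns of a bed file. It assumes the standard layout (chromosome, 0-based start, exclusive end, optional name) and skips `track` and `browser` header lines, which custom-track files often carry. The function name is hypothetical, and the exact columns present in the downloadable files are not stated here.

```python
def parse_bed(text):
    """Return a list of (chrom, start, end, name) tuples from bed-formatted text.

    Assumes whitespace-separated columns; name is None when the file
    has only three columns.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and custom-track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        records.append((chrom, start, end, name))
    return records


def interval_length(record):
    """Length in bp; bed coordinates are 0-based and half-open, so end - start."""
    _, start, end, _ = record
    return end - start
```

Because bed intervals are half-open, the 200bp minimum length corresponds to `end - start >= 200`.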
Some of these DNA intervals were predicted to be erythroid CRMs, named as GenenameRn. These have:
- RP scores greater than zero
- no minimal length
- conserved match to either the consensus GATA-1 binding site or a weight matrix describing that site.
Note that the threshold for RP is lower than the one used in the genome-wide predictions. Based on the results in this paper, an RP score of at least 0.05 gives considerably better predictive value. We also discovered that size matters, and now set a minimum length of 200bp. Another result was that matches to the GATA-1 consensus motif were more reliable predictors of enhancer function than matches to the weight matrix for GATA-1 binding sites.
Other intervals were predicted to be neutral, named as GenenameNCn. These have:
- alignments in multiple species
- RP scores less than zero
- no minimal length
- no match to either the consensus GATA-1 motif or the binding site weight matrix
- no evidence for constraint based on phastCons scores.
Other intervals were predicted to be promoters, named as Genenameprn. These have:
- high RP score
- proximity to or overlap with a transcription start site
All of these intervals come from eight mouse loci with genes whose expression changes (up or down) in G1E cells upon restoration of GATA-1 function. The combined targets cover about 1 million bp.
To download the data files in bed format,
CLICK HERE for mm7 coordinates.
CLICK HERE for mm8 coordinates.
CLICK HERE for hg17 coordinates (lifted over from mouse).
CLICK HERE for hg18 coordinates.
To view the preCRMs as a custom track in the UCSC Genome Browser,
CLICK HERE for mm7 coordinates.
CLICK HERE for mm8 coordinates.
CLICK HERE for hg17 coordinates (lifted over from mouse).
CLICK HERE for hg18 coordinates.
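The naming scheme for these intervals (GenenameRn, GenenameNCn, Genenameprn) can be decoded with a short helper. This is a sketch under the assumption that each name is a gene name followed directly by the tag and a number; the gene name `GeneA` used in the usage lines is made up for illustration and is not one of the eight loci.

```python
import re

# Gene name (as short as possible), then one of the three tags, then a number.
NAME_PATTERN = re.compile(r"^(?P<gene>.+?)(?P<tag>NC|pr|R)(?P<n>\d+)$")

CATEGORY = {
    "R": "erythroid CRM",
    "NC": "neutral",
    "pr": "promoter",
}


def classify(name):
    """Split an interval name into (gene, category, number), or None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("gene"), CATEGORY[match.group("tag")], int(match.group("n"))


# Usage with hypothetical names:
for example in ("GeneAR1", "GeneANC2", "GeneApr1"):
    print(example, classify(example))
```

A gene name that itself ends in `R`, `NC` or `pr` followed by digits would be split in the wrong place, so names should be checked against the list of loci when that is possible.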
A. Mouse genome
mm8 predictions: The coordinates are positions in the "mm8" build of the mouse genome (February 2006 assembly).
The preCRMs are genomic intervals that meet four criteria:
- a regulatory potential (RP) score greater than 0.05
- length of at least 200bp
- a match to WGATAR (the consensus GATA-1 binding site).
- an additional match to any of: WGATAR (GATA-1 motif), CCNCACCCW (EKLF motif), GGGCGG (Sp1 motif), or two CCWGs separated by 6 Ns (CP2 motif).
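The two motif criteria can be illustrated by translating the IUPAC consensus strings into regular expressions. This sketch makes several assumptions that the text does not state: both strands are scanned, overlapping matches are counted separately, and a second WGATAR match satisfies the "additional match" criterion. It checks only the motif criteria, not the RP score or length, and all function names are hypothetical.

```python
import re

# IUPAC codes used by the motifs in the list above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


GATA = motif_regex("WGATAR")
PARTNERS = {
    "EKLF": motif_regex("CCNCACCCW"),
    "Sp1": motif_regex("GGGCGG"),
    "CP2": motif_regex("CCWG" + "N" * 6 + "CCWG"),  # two CCWGs separated by 6 Ns
}


def revcomp(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(pattern, seq):
    """Count matches on both strands; the lookahead allows overlapping matches."""
    seq = seq.upper()
    lookahead = re.compile("(?=(?:%s))" % pattern.pattern)
    return sum(len(lookahead.findall(strand)) for strand in (seq, revcomp(seq)))


def meets_motif_criteria(seq):
    """One WGATAR match, plus a second WGATAR or any EKLF, Sp1 or CP2 match."""
    gata_hits = count_matches(GATA, seq)
    if gata_hits == 0:
        return False
    if gata_hits >= 2:
        return True
    return any(count_matches(p, seq) > 0 for p in PARTNERS.values())
```

Scanning the reverse complement means that a site written as YTATCW on the given strand is counted as a WGATAR match.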
To download the file (in bed format), CLICK HERE.
To view the preCRMs as a custom track in the UCSC Genome Browser, CLICK HERE.