Hardison Lab Datasets and Supplements
e-mail address: rch8@psu.edu
Current Directory Entry
We are generating datasets that inform both comparative genomics and the epigenetic regulation of gene expression during hematopoiesis (especially erythroid and myeloid differentiation in mouse). These are deposited in appropriate databases. This page provides pointers to those resources and links to datasets derived from the initial data, such as the collection of DNA segments that show conservation of transcription factor occupancy between mouse and human.
If you use these datasets, please cite the listed reference in any publications.
Mouse ENCODE: Conservation of occupancy by transcription factors between mouse and human
- Reference: Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, .. many authors .. Stamatoyannopoulos JA, Snyder MP, Guigo R, Gingeras TR, Gilbert DM, Hardison RC, Beer MA, Ren B; Mouse ENCODE Consortium. (2014) A comparative encyclopedia of DNA elements in the mouse genome. Nature 515:355-364. doi: 10.1038/nature13992. PubMed PMID: 25409824; PubMed Central PMCID: PMC4266106. Full text.
- Reference: Cheng Y, Ma Z, Kim BH, Wu W, Cayting P, Boyle AP, Sundaram V, Xing X, Dogan N, Li J, Euskirchen G, Lin S, Lin Y, Visel A, Kawli T, Yang X, Patacsil D, Keller CA, Giardine B; Mouse ENCODE Consortium, Kundaje A, Wang T, Pennacchio LA, Weng Z, Hardison RC, Snyder MP. (2014) Principles of regulatory information conservation between mouse and human. Nature 515:371-375. PubMed PMID: 25409826. Full text.
- All the DNA segments at which TF occupancy was conserved between mouse and human are available in this zipped file. The files include the DNA segments of conserved occupancy ascertained with mouse as the first species (e.g. CHD1 in mouse CH12 cells) and with human as the first species (e.g. CHD1 in human GM12878 cells). A DNA segment is included only if occupancy by the homologous TF was also found on the orthologous DNA segment in the second species. Each TF-cell line pair is a separate file. The mouse DNA segments are in mm9 coordinates; the human ones are in hg19 coordinates. A readme file is also included.
- Reference: Denas O, Sandstrom R, Cheng Y, Beal K, Herrero J, Hardison RC, and Taylor J (2015) Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution. BMC Genomics 16:87. Full text.
- A database with a query interface for the data analyzed in Denas et al. You can find mouse DNA segments bound by various TFs in different cell types, and you can specify the level of epigenomic conservation (defined in the paper). The data were generated and compiled by Olgert Denas. Belinda Giardine installed them in a database with a query engine and a history page to enable downloads and some simple analyses. Results can be ported to Galaxy for additional analyses (e.g. this was how the CCO set was generated, see next item).
- CCO (combined conserved occupancy) mouse DNA segments with TF-cell line annotations: A combined set of DNA segments showing conservation of TF occupancy was generated from the data for both the Cheng et al. (2014) and Denas et al. (2015) papers. This dataset combines the Denas et al. functCons and functActive segments with the Cheng et al. segments of conserved occupancy. Overlapping DNA segments were merged to obtain a set of 58,649 mouse DNA intervals (mm10 coordinates). This file has the locations and annotations for all TF-cell type combinations that mapped to each DNA segment. The file is very large (16.7 MB).
- CCO mouse DNA segments without TF-cell line annotations: This file has the locations but no annotations for the TF-cell type combinations that mapped to each DNA segment. The file is 1.3 MB.
- CCO human DNA segments without TF-cell line annotations: This file has the locations, in hg19 coordinates, but no annotations for the TF-cell type combinations that mapped to each DNA segment. The file is 1.3 MB.
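The merging of overlapping DNA segments described for the CCO set can be illustrated with a short sketch. This is not the procedure used by the lab (the CCO set was generated in Galaxy); it is a minimal Python illustration that assumes BED-style half-open coordinates and assumes that only strictly overlapping segments are merged, so segments that merely touch stay separate. The function name merge_intervals and the example coordinates are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals.

    Coordinates are assumed to be BED-style half-open. Segments that only
    touch (end == next start) are NOT merged; that choice is an assumption.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # (start, end) of the interval being extended
        for start, end in sorted(by_chrom[chrom]):
            if current is None:
                current = (start, end)
            elif start < current[1]:
                # Overlap: extend the current interval if needed.
                current = (current[0], max(current[1], end))
            else:
                merged.append((chrom, current[0], current[1]))
                current = (start, end)
        if current is not None:
            merged.append((chrom, current[0], current[1]))
    return merged


# Hypothetical example coordinates, for illustration only.
example = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
]
merged_example = merge_intervals(example)
```

In this example the first two chr1 segments overlap and collapse into one interval, while the third only abuts the merged interval and is kept separate under the stated assumption.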
Dynamics of transcription factor occupancy and RNA production during mouse hematopoietic differentiation to megakaryocytes and erythroblasts (Mouse ENCODE)
- Reference: Pimkin M, Kossenkov AV, Mishra T, Morrissey CS, Wu W, Keller CA, Blobel GA, Lee D, Beer MA, Hardison RC, Weiss MJ. (2014) Divergent functions of hematopoietic transcription factors in lineage priming and differentiation during erythro-megakaryopoiesis. Genome Res. 24:1932-1944. doi: 10.1101/gr.164178.113. PubMed PMID: 25319996; PubMed Central PMCID: PMC4248311. Full text.
- All ChIP-seq data are available from the UCSC Genome Browser; under Expression and Regulation use the multitracks PSU TFBS and PSU Histone. They are also available from GEO and the ENCODE Data Portal. The microarray expression data are at GEO under Series Accession: GSE51337. All RNA-seq data are available from the UCSC Genome Browser; under Expression and Regulation use the multitrack PSU RNA-seq.
- Reference: Wu W, Morrissey CS, Keller CA, Mishra T, Pimkin M, Blobel GA, Weiss MJ, Hardison RC. (2014) Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis. Genome Res. 24:1945-62. doi: 10.1101/gr.164830.113. Epub 2014 Oct 15. PubMed PMID: 25319994; PubMed Central PMCID: PMC4248312. Full text.
- All ChIP-seq data determined in our lab are available from the UCSC Genome Browser; under Expression and Regulation use the multitracks PSU TFBS and PSU Histone. They are also available from GEO, Series Accession: GSE51338, and the ENCODE Data Portal.
Dynamics of the epigenetic landscape during GATA1-dependent erythroid maturation in mouse: Histone modifications, DNase sensitivity, TAL1 and GATA2 binding genome-wide
- Reference: Wu W, Cheng Y, Keller CA, Ernst J, Kumar SA, Mishra T, Morrissey C, Dorman CM, Chen KB, Drautz D, Giardine B, Shibata Y, Song L, Pimkin M, Crawford GE, Furey TS, Kellis M, Miller W, Taylor J, Schuster SC, Zhang Y, Chiaromonte F, Blobel GA, Weiss MJ, Hardison RC. (2011) Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res. 21:1659-1671. PubMed PMID: 21795386. Full text.
- Supplementary Material.
- A Data Library in Galaxy with Illumina reads, mapped reads, peak calls and signal (wiggle) files. Use this for easy downloads or for using the data in your own analyses, e.g. using Galaxy. The epigenetic data are:
- H3K4me1
- H3K4me3
- H3K27me3
- H3K9me3
- GATA1
- GATA2
- TAL1
- DNase-seq
- RNA-seq
- chromatin states based on histone modification patterns
in these mouse cell types (some features are not included for some cell types):
- G1E (Gata1- cells, model for erythroid progenitor cells)
- G1E-ER4 cells induced with estradiol (Gata1 restored as a hybrid gene with ER; model for differentiating erythroblasts)
- fetal liver erythroblasts (Ter119+ primary cells)
- Customized genome browser for viewing the data. This is a PSU customization of the UCSC Genome Browser; go to the mouse genome section of this browser. Data are on both mm8 and mm9.
- UCSC Genome Browser to view these data as part of the mouse ENCODE consortium. Data are being added in Aug, Sept and Oct 2011; they are on mm9.
Mapping DNA segments occupied by GATA1 and the response in gene expression throughout the mouse erythroid genome; Distinguishing induction from repression
- Yong Cheng, Weisheng Wu, Swathi Ashok Kumar, Duonan Yu, Wulan Deng, Tamara Tripic, David C. King, Kuan-Bei Chen, Ying Zhang, Daniela Drautz, Belinda Giardine, Stephan C. Schuster, Webb Miller, Francesca Chiaromonte, Yu Zhang, Gerd A. Blobel, Mitchell J. Weiss, and Ross C. Hardison (2009) Erythroid GATA1 function revealed by genome-wide analysis of transcription factor occupancy, histone modifications and mRNA expression. Genome Research 19: 2172-2184. Full text of publication.
- Supplementary Material.
- An Excel spreadsheet with 15,859 genes, expression levels (and standard deviations) during induction, and categorization into induced, repressed or nonresponsive.
- An Excel spreadsheet with 15,360 peaks of GATA1 occupancy, partitioned into all, those found by ChIP-seq and ChIP-chip, and those found by only one approach.
- To view the data in a browser (PSU customization of UCSC Genome Browser) and to download the data from a "Table Browser" go to the mouse genome section of this browser (Data are on both mm8 and mm9).
Identification of DNA segments occupied by GATA1 on mouse chromosome 7, and demonstration that binding site motifs in enhancers are subject to evolutionary constraint
- Cheng Y, King DC, Dore LC, Zhang X, Zhou Y, Zhang Y, Dorman C, Abebe D, Kumar SA, Chiaromonte F, Miller W, Green RD, Weiss MJ, and Hardison RC (2008) Transcriptional enhancement by GATA1-occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif. Genome Research 18: 1896-1905. Full text of publication.
- For datasets used in this paper, including occupancy data, enhancement, and phylogenetic conservation of motifs (cladistic motifs), click here.
Predictions of erythroid cis-regulatory modules and results of experimental tests
- Hao Wang, Ying Zhang, Yong Cheng, Yuepin Zhou, David C. King, James Taylor, Francesca Chiaromonte, Jyotsna Kasturi, Hanna Petrykowska, Bryan Gibb, Christine Dorman, Webb Miller, Louis C. Dore, John Welch, Mitchell J. Weiss, Ross C. Hardison (2006) Experimental Validation of Predicted Mammalian Erythroid Cis-Regulatory Modules. Genome Res. 16 : 1480-1492 Full text of publication.
- Predicted erythroid cis-regulatory modules in the mouse and human genomes, plus the intervals tested in Wang et al. 2006. Click here.
- Results for all the 99 tested fragments, pdf file. Click here.
- Results for all the 99 tested fragments, Excel file. Click here.
- DNA sequences of erythroid predicted cis-regulatory modules (preCRMs) in eight selected mouse loci, all tested for function. Click here.
Predictions of cis-regulatory modules genome-wide
- James Taylor, Svitlana Tyekucheva, David C. King, Ross C. Hardison, Webb Miller, and Francesca Chiaromonte (2006) ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 16 : 1596-1604. Full text of publication.
- Predicted cis-regulatory modules in the human genome (hg17, May 2004 assembly), from the Taylor et al. work. These have an RP score of at least 0.05 for at least 200 bp and do not overlap exons in the Known Genes set. These can be downloaded in BED format here.
- Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D, Coulombe B, Robert F. (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16 : 656-668. Publication and website. Our lab had nothing to do with this work, but it is a really good source of CRMs predicted by conservation of binding sites.
- Here are the PReMods from Blanchette et al., for the human genome (hg17, May 2004 assembly) in BED format. Click here.
- Intersection of the High RP segments and the PReMods for the human genome (hg17, May 2004 assembly) in BED format. I call them PRPs. Click here.
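The intersection of two interval sets, as in the PRPs above, can be sketched as follows. This is an illustration only, not the procedure used to build the PRP file. It assumes BED-style half-open coordinates, assumes each input set is non-overlapping within itself, and reports only the overlapping portions; the original file may instead keep whole intervals. The function names and coordinates are hypothetical.

```python
from collections import defaultdict


def _group_by_chrom(intervals):
    """Group (chrom, start, end) tuples by chromosome, sorted by start."""
    grouped = defaultdict(list)
    for chrom, start, end in intervals:
        grouped[chrom].append((start, end))
    for chrom in grouped:
        grouped[chrom].sort()
    return grouped


def intersect_intervals(a, b):
    """Return the overlapping portions of two interval sets.

    Each set is assumed to be internally non-overlapping, which is what
    makes the two-pointer sweep below valid.
    """
    ga, gb = _group_by_chrom(a), _group_by_chrom(b)
    result = []
    for chrom in sorted(set(ga) & set(gb)):
        xs, ys = ga[chrom], gb[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                result.append((chrom, start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return result


# Hypothetical coordinates standing in for High RP segments and PReMods.
high_rp = [("chr1", 100, 500), ("chr1", 800, 1000), ("chr2", 0, 50)]
premods = [("chr1", 400, 900), ("chr3", 0, 10)]
prps = intersect_intervals(high_rp, premods)
```

With real BED files, the same operation is normally done with an established interval tool rather than hand-written code; the sketch only shows what "intersection" means for genomic intervals.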
Reference sets of known CRMs in the human beta-globin gene complex
- King, David C., James Taylor, Laura Elnitski, Francesca Chiaromonte, Webb Miller, and Ross C. Hardison (2005) Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequence. Genome Res. 15 : 1051-1060. Reprint
- Known cis-regulatory modules in the human HBB gene complex, hg16 coordinates, custom track for UCSC Genome Browser. Click here.
- Known cis-regulatory modules in the human HBB gene complex, hg17 coordinates, custom track for UCSC Genome Browser. Click here.
- Known cis-regulatory modules in the human HBB gene complex, hg17 coordinates, file in bed format for use in Galaxy. Click here.
Microarray expression results on MEL_RL5 cells induced to mature to hemoglobinized erythroblasts with HMBA.
- Almost 7000 mouse genes, represented by 70-mers (from Qiagen/Oligo) spotted at the PSU/Huck Institute Microarray Facility, were hybridized with labeled cDNA from mRNA at progressive stages of induction, days 0-6. The Excel spreadsheets of the results can be obtained by clicking here.
- Citation for the microarray results is Hao Wang, Ying Zhang, Yong Cheng, Yuepin Zhou, David C. King, James Taylor, Francesca Chiaromonte, Jyotsna Kasturi, Hanna Petrykowska, Bryan Gibb, Christine Dorman, Webb Miller, Louis C. Dore, John Welch, Mitchell J. Weiss, Ross C. Hardison (2006) Experimental Validation of Predicted Mammalian Erythroid Cis-Regulatory Modules. Genome Res. 16 : 1480-1492. Full text of publication.
Conservation and constraint in ENCODE predicted transcriptional regulatory regions
- David C. King, James Taylor, Ying Zhang, Yong Cheng, Heather A. Lawson, Joel Martin, ENCODE groups for Transcriptional Regulation and Multispecies Alignment, Francesca Chiaromonte, Webb Miller, and Ross C. Hardison (2007) Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data. Genome Research 17 : 775-786. Full text.
- Predicted transcriptional regulatory regions (pTRRs) in ENCODE regions, which are high-probability hits to sites occupied by sequence specific binding proteins supported by data on chromatin alterations or histone modifications. This is a composite set based on ENCODE data as filtered and analyzed in the King, Taylor et al. 2007 paper. The coordinates are for hg17, the May 2004 assembly of the human genome. Click here.
- DNase hypersensitive sites and promoters from the ENCODE transcriptional regulation group. The coordinates are for hg17, the May 2004 assembly of the human genome. Click here.
Co-variation in frequencies of substitution, deletion, transposition and recombination during eutherian evolution
- Hardison, R.C., K.M. Roskin, S. Yang, M. Diekhans, W.J. Kent, R. Weber, L. Elnitski, J. Li, M. O'Connor, D. Kolbe, S. Schwartz, T.S. Furey, S. Whelan, N. Goldman, A. Smit, W. Miller, F. Chiaromonte and D. Haussler (2003) Co-variation in frequencies of substitution, deletion, transposition and recombination during eutherian evolution. Genome Res. 13 : 13-26. Full text.
- Description of columns in data files. Click here.
- Various measures of sequence similarity (compared to mouse) and genomic properties were computed in windows across the human genome. The data files for 1 Mb non-overlapping windows are here and the ones for 5 Mb windows with a 1 Mb slide (overlapping by 4 Mb) are here.
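The two window layouts described above (1 Mb non-overlapping, and 5 Mb with a 1 Mb slide) can be generated with one small function. This is a minimal sketch, not the code used for the paper. It assumes 0-based half-open coordinates and assumes that a terminal window that does not fit fully on the chromosome is dropped; the data files may handle chromosome ends differently. The function name and the chromosome length used here are hypothetical.

```python
MB = 1_000_000


def make_windows(chrom_length, size, step):
    """Return (start, end) windows of a fixed size, advancing by step.

    Only windows that fit entirely within the chromosome are returned;
    dropping the partial terminal window is an assumption.
    """
    windows = []
    start = 0
    while start + size <= chrom_length:
        windows.append((start, start + size))
        start += step
    return windows


# Hypothetical 7 Mb chromosome, for illustration only.
non_overlapping = make_windows(7 * MB, size=1 * MB, step=1 * MB)
sliding = make_windows(7 * MB, size=5 * MB, step=1 * MB)

# Consecutive sliding windows share size - step bases.
shared = sliding[0][1] - sliding[1][0]
```

With a 5 Mb window and a 1 Mb slide, consecutive windows share 4 Mb, which matches the overlap stated for the second set of data files.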
Presentations on using vertebrate genome comparisons and epigenetics to predict and test cis-regulatory modules
Page created: Thursday, 13-Jan-2005, updated 28-Jul-2015