[This page does not yet have documentation for all of the formats, but what is here should be correct.]
The Browser Extensible Data (BED) format was designed at UCSC for displaying data tracks in the Genome Browser. When used by Galaxy, this format is tab-separated. It has three required fields and 12 optional ones. Files in this format must have the file extension '.bed'. More information is available in UCSC's document on custom tracks.
The first three BED fields (required) are:
The 12 additional BED fields (optional) are:
In order to use a field, all fields before it must be filled. The value used to indicate that a field is empty varies, as follows:
Example
Here's an example of two BED format lines:
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | blockStarts |
---|---|---|---|---|---|---|---|---|---|---|---|
chr3 | 214671 | 265280 | Hs.517745 | 300 | + | 214671 | 265280 | 0 | 3 | 104,80,2030, | 0,46624,48579, |
chrX | 156881 | 157496 | Hs.530320 | 300 | + | 156881 | 157496 | 0 | 2 | 231,384, | 0,231, |
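As a rough illustration of how such a line could be read, the sketch below splits a tab-separated BED line and labels the fields with the column names from the example table above. The helper names `parse_bed_line` and `split_list` are hypothetical, not part of Galaxy, and the code assumes list-valued fields are comma-terminated integers as in the example rows.

```python
# Column names copied from the example table; only the first three are required.
BED_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes", "blockStarts",
]


def parse_bed_line(line):
    """Split one tab-separated BED line into a dict keyed by column name.

    Hypothetical helper: fields beyond the 12 names listed above are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart and chromEnd")
    record = dict(zip(BED_COLUMNS, fields))
    # The two coordinate fields are converted to integers; the rest stay as text.
    record["chromStart"] = int(record["chromStart"])
    record["chromEnd"] = int(record["chromEnd"])
    return record


def split_list(value):
    """Turn a comma-terminated list such as '104,80,2030,' into a list of ints."""
    return [int(item) for item in value.split(",") if item]


example = (
    "chr3\t214671\t265280\tHs.517745\t300\t+\t214671\t265280\t0\t3\t"
    "104,80,2030,\t0,46624,48579,"
)
record = parse_bed_line(example)
print(record["chrom"], record["chromStart"], record["chromEnd"])
print(split_list(record["blockSizes"]))
```

Because `zip` stops at the shorter sequence, a line that fills only the three required fields yields a dict with just those three keys.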
Extended BED format is also tab-separated. The first 15 fields are the same as UCSC standard BED format, and it has three additional fields to accommodate multiple/flexible scores. Files in this format must have the file extension '.xbed'.
The three additional fields are:
As with standard BED format, all fields preceding the ones you want to use must be filled. The values used to indicate empty fields are the same as listed for UCSC standard BED format. Again, the value "NaN" (not-a-number) used by some databases is not supported.
Example
Here's an example of a complete line in extended BED format.
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | blockStarts | expCount | expIds | expScores | scoreCount | scores | scoreNames |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr15 | 93312259 | 93312615 | Hs.269535 | 300 | + | 93312259 | 93312615 | 0 | 1 | 356, | 0, | 0 | . | . | 2 | 80,20, | name1,name2, |
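The last three columns of the example line can be read as a count followed by two parallel comma-terminated lists. The sketch below pairs each score name with its score. It is only an illustration: `parse_xbed_scores` is a hypothetical helper, and it assumes (based on the example line alone) that scores are numeric and that scoreCount equals the length of both lists. It does not handle fields left empty.

```python
def parse_xbed_scores(line):
    """Return {scoreName: score} from fields 16-18 of an extended BED line.

    Hypothetical helper; assumes all 18 fields are present and filled.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 18:
        raise ValueError("expected 18 tab-separated fields")
    score_count = int(fields[15])
    # Both lists end with a trailing comma, so empty items are dropped.
    scores = [float(s) for s in fields[16].split(",") if s]
    names = [n for n in fields[17].split(",") if n]
    if not (len(scores) == len(names) == score_count):
        raise ValueError("scoreCount does not match the scores/scoreNames lists")
    return dict(zip(names, scores))


example = (
    "chr15\t93312259\t93312615\tHs.269535\t300\t+\t93312259\t93312615\t0\t1\t"
    "356,\t0,\t0\t.\t.\t2\t80,20,\tname1,name2,"
)
print(parse_xbed_scores(example))
```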
This is similar to ordinary tab-separated format, with the additional restriction that the first three fields must be chrom, chromStart, and chromEnd (as in BED format). Thus this format is intermediate in flexibility, providing Galaxy with the main fields it needs to perform operations and some other analyses, without being as restrictive as BED with regard to the other fields.
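A minimal sketch of the restriction described above: the first three fields are interpreted as chrom, chromStart, and chromEnd, and everything after them is passed through untouched. The function name `parse_interval_line` is hypothetical, and treating the two coordinates as integers is an assumption carried over from the BED examples.

```python
def parse_interval_line(line):
    """Return (chrom, chromStart, chromEnd, other_fields) for one tab-separated line.

    Hypothetical helper: only the first three fields are interpreted.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("the first three fields must be chrom, chromStart, chromEnd")
    # int() raises ValueError if a coordinate is not a whole number.
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start, end, fields[3:]


print(parse_interval_line("chr1\t100\t200\tfoo\t3.5\n"))
```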
The axt format stores pairwise genomic sequence alignments. For more information, please see UCSC's document axt Alignment Format.
The Stitch tool merges all alignment sets within an alignment (.axt, .stitch) into one alignment. A stitch file can contain multiple alignment sets. Coordinates are Genome Browser coordinates.
.stitch file format:
Example:
0 2 hg17,mm5 Version=1 2 chr7,+,27055889,27056316,427,248330 chr7,+,27057906,27058163,257,248330 ATGGAGAGCCGAAAGGACATGGTTGTGTTTCTGGATGGGGGTCAGCTTGGCACTCTGGTTGGCAAGAGAGTCTCAAATTTGTCCGAAGCCGTGGGCAGCCCGCTGCCGGAGCCGCCCGAGAAAATG ... 2 chr6,+,52261447,52261874,427,248330 chr6,+,52263340,52263597,257,248330 ATGGAGAGCCGAAAGGACATGGTTATGTTTCTGGATGGGGGTCAGCTTGGCACTCTGGTTGGTAAGAGGGTCTCTAATTTGTCCGAAGCCGTGAGCAGCCCGCTGCCTGAACCGCCAGAGAAGATG ... 1 2 hg17,mm5 Version=1 1 chr7,+,27058744,27059284,567,248330 GTGTGGTTCCAGAACCGGCGCATGAAGGACAAGCGGCAGCGCCTGGCCATGACGTGGCCGCACCCGGCGGACCCCGCCTTCTACACTTACATGATGAGCCATGCGGCGGCCGCGGGCGGCCTGCCC ... 1 chr6,+,52264137,52264704,567,248330 GTGTGGTTTCAGAACCGGCGCATGAAGGACAAGCGTCAGCGGCTGGCCATGACGTGGCCGCACCCGGCCGACCCTGCCTTCTACACCTACATGATGAGCCACGCGGCGGCCGCGGGCGGCCTGCCC ...