Example 3: Using Galaxy to look for disease SNPs in a pedigree
For this example we will use an artificial set of disease SNPs from the
CFMDB (Cystic Fibrosis Mutation Database). The SNPs are real, but they
would not necessarily all occur in one family. Three SNPs were chosen so that
the search covers different parts of the gene: one is coding, one is at a
splice site, and one is in the promoter.
The disease-associated SNPs are planted in five genomes from the Complete
Genomics CEPH pedigree. This gives the SNPs a realistic background and
produces realistic results when the genomes and the pedigree information are
used to filter the SNPs. These disease SNPs were chosen because cystic
fibrosis is a good example of a recessively inherited disease and is found
in the CEU population.
Disease SNPs planted in the sample dataset
Chromosome | Start | End | Allele | Location
chr7 | 117119336 | 117119337 | G | promoter
chr7 | 117144340 | 117144341 | T | exon
chr7 | 117174423 | 117174424 | A | splicing
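Each start/end pair in the table differs by exactly one base, which suggests BED-style coordinates (0-based start, exclusive end); that convention is an assumption here, not something the table states. A minimal Python sketch that stores the planted SNPs under that assumption and converts each one to the 1-based position a genome browser would display (the `PlantedSnp` type and `one_based_position` helper are hypothetical names for illustration):

```python
from typing import NamedTuple


class PlantedSnp(NamedTuple):
    chrom: str
    start: int   # assumed 0-based, inclusive (BED convention)
    end: int     # assumed 0-based, exclusive
    allele: str
    region: str


# Rows copied from the table above.
PLANTED = [
    PlantedSnp("chr7", 117119336, 117119337, "G", "promoter"),
    PlantedSnp("chr7", 117144340, 117144341, "T", "exon"),
    PlantedSnp("chr7", 117174423, 117174424, "A", "splicing"),
]


def one_based_position(snp: PlantedSnp) -> int:
    """Convert a single-base half-open interval to a 1-based position."""
    if snp.end - snp.start != 1:
        raise ValueError(f"not a single-base interval: {snp}")
    # In half-open 0-based coordinates, the 1-based position equals `end`.
    return snp.end


for snp in PLANTED:
    print(snp.chrom, one_based_position(snp), snp.allele, snp.region)
```

If the source data turn out to use 1-based coordinates instead, only `one_based_position` would need to change.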
Genomes used for the pedigree
Genome | Relationship
NA12877 | father
NA12878 | mother
NA12879 | daughter
NA12880 | daughter
NA12882 | son
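The filtering in Part 2 rests on a simple rule for a recessive disease: both parents carry one copy of the disease allele, affected children carry two, and unaffected children do not carry two. The sketch below is a hypothetical illustration of that rule, not the Galaxy tool's implementation. It assumes each genome's call at a site has been read from a pgSnp line (chrom, chromStart, chromEnd, then the alleles, such as `A/G`, in the name column), treats a genome with no line at the site as not carrying the allele, and takes the set of affected children as a parameter because the example does not say which children are affected.

```python
# Genomes and roles from the pedigree table above.
PEDIGREE = {
    "NA12877": "father",
    "NA12878": "mother",
    "NA12879": "daughter",
    "NA12880": "daughter",
    "NA12882": "son",
}
PARENT_ROLES = {"father", "mother"}


def parse_pgsnp_line(line):
    """Return ((chrom, start, end), alleles) from one tab-separated pgSnp line."""
    fields = line.rstrip("\n").split("\t")
    interval = (fields[0], int(fields[1]), int(fields[2]))
    alleles = fields[3].split("/")  # "A/G" -> heterozygous, "A" -> homozygous
    return interval, alleles


def is_heterozygous_carrier(alleles, allele):
    """True if the call has two distinct alleles and one of them is `allele`."""
    return allele in alleles and len(set(alleles)) == 2


def is_homozygous_for(alleles, allele):
    """True if every called allele is `allele` (e.g. ["A"] or ["A", "A"])."""
    return set(alleles) == {allele}


def fits_recessive(calls, allele, affected):
    """Check one site against a recessive inheritance model.

    calls: genome ID -> list of alleles at this site (missing = no carrier call)
    affected: hypothetical set of affected children's genome IDs
    """
    for genome, role in PEDIGREE.items():
        alleles = calls.get(genome, [])
        if role in PARENT_ROLES:
            # Both parents must be heterozygous carriers.
            if not is_heterozygous_carrier(alleles, allele):
                return False
        elif is_homozygous_for(alleles, allele) != (genome in affected):
            # Affected children must be homozygous; unaffected ones must not be.
            return False
    return True


# Toy calls at one site; NA12882 has no line here, so it is treated as a non-carrier.
calls = {
    "NA12877": ["A", "G"],
    "NA12878": ["G", "A"],
    "NA12879": ["A"],
    "NA12880": ["A", "G"],
}
print(fits_recessive(calls, "A", affected={"NA12879"}))  # True
```

Unaffected children are allowed to be heterozygous carriers, since that is expected under recessive inheritance.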
This example builds a single sequential history, but the outline below lists
each part separately if you are interested in just one section. Later parts go
into less detail when a similar step was done in an earlier one, so if you are
new to Galaxy it is best to work through the full example.
Part 1: Preparing input data
- Importing files
- Concatenating datasets
Part 2: Using the pedigree and recessive inheritance to filter SNPs
- Filtering heterozygous/homozygous SNPs from a pgSnp file
- Filtering using pedigree
Part 3: Removing SNPs found in healthy controls
- Concatenate and merge control SNPs
- Subtracting from result set
Part 4: Finding SNPs that are likely to be phenotype-associated
- aaChanges tool
- Computing flanking regions of exons
- Using ENCODE segmentations for non-coding SNPs
Part 5: Using known gene-disease associations
- Get genes associated with our disease
- Get gene neighborhoods
- Further narrow results
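Parts 3 and 4 rely on interval arithmetic: merging the control SNPs, removing candidates that overlap them, and extending exons into flanking windows to catch splice-site SNPs. A minimal Python sketch of those three operations on (chrom, start, end) tuples, assuming 0-based half-open coordinates; the function names, the toy intervals, and the flank width are hypothetical and do not correspond to Galaxy tool parameters:

```python
# Intervals are (chrom, start, end) with an assumed 0-based start and exclusive end.


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def merge(intervals):
    """Sort intervals and merge ones that overlap or touch."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def subtract(candidates, controls):
    """Keep candidates that overlap none of the merged control intervals."""
    merged_controls = merge(controls)
    return [c for c in candidates
            if not any(overlaps(c, m) for m in merged_controls)]


def flanks(exon, width):
    """Return the `width`-base windows immediately before and after an exon."""
    chrom, start, end = exon
    return [(chrom, max(0, start - width), start), (chrom, end, end + width)]


# Toy intervals for illustration only.
candidates = [("chr7", 10, 11), ("chr7", 20, 21)]
controls = [("chr7", 5, 12), ("chr7", 11, 15)]
print(merge(controls))                 # [('chr7', 5, 15)]
print(subtract(candidates, controls))  # [('chr7', 20, 21)]
print(flanks(("chr7", 100, 200), 2))   # [('chr7', 98, 100), ('chr7', 200, 202)]
```

Checking every candidate against every merged control is quadratic; tools built for genome-scale data typically sort both sets and sweep them together, but the linear scan keeps the logic easy to follow.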