Phenotype Association Tools in Galaxy

Example 3: Using Galaxy to look for disease SNPs in a pedigree

For this example we will use an artificial set of disease SNPs from the Cystic Fibrosis Mutation Database (CFMDB). The SNPs are real, but they would not necessarily all occur in one family. Three SNPs were chosen to cover different parts of the cystic fibrosis gene, CFTR: one is in a coding exon, one is at a splice site, and one is in the promoter. The disease-associated SNPs are planted in five genomes from the Complete Genomics CEPH pedigree. This gives the SNPs a realistic genomic background and produces realistic results when the genomes and the pedigree information are used to filter the SNPs. These disease SNPs were chosen because cystic fibrosis is a good example of a recessively inherited disease and occurs in the CEU population.

Disease SNPs planted in the sample dataset

chrom	start	end	allele	region
chr7	117119336	117119337	G	promoter
chr7	117144340	117144341	T	exon
chr7	117174423	117174424	A	splicing
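The planted SNPs are listed as single-base intervals. The sketch below parses these rows and reports each SNP as a 1-based genome position. It assumes the columns are chrom, start, end, allele and region, and that the coordinates follow the BED convention Galaxy interval datasets normally use (0-based start, exclusive end). The function name `parse_planted` is hypothetical.

```python
# Planted disease SNPs, copied from the table above.
# Assumption: columns are chrom, start, end, allele, region, and the
# coordinates are BED-style (0-based start, exclusive end).
PLANTED = """\
chr7\t117119336\t117119337\tG\tpromoter
chr7\t117144340\t117144341\tT\texon
chr7\t117174423\t117174424\tA\tsplicing
"""


def parse_planted(text):
    """Return one dict per planted SNP (hypothetical helper)."""
    snps = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, allele, region = line.split("\t")
        start, end = int(start), int(end)
        # A single-base BED interval covers exactly one position.
        if end - start != 1:
            raise ValueError(f"expected a single-base interval: {line!r}")
        snps.append({
            "chrom": chrom,
            "start": start,     # 0-based, as stored in the file
            "pos1": start + 1,  # 1-based position, as a genome browser shows it
            "allele": allele,
            "region": region,
        })
    return snps


for snp in parse_planted(PLANTED):
    print(f"{snp['chrom']}:{snp['pos1']} {snp['allele']} ({snp['region']})")
```

Under this assumption, the 1-based position of each SNP equals the `end` column, for example chr7:117119337 for the promoter SNP.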

Genomes used for the pedigree

NA12877	father
NA12878	mother
NA12879	daughter
NA12880	daughter
NA12882	son
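Filtering with the pedigree relies on the logic of recessive inheritance. An affected child must carry two copies of the disease allele, so each unaffected parent must be a heterozygous carrier, and an unaffected sibling cannot be homozygous for it. The sketch below expresses that rule, assuming full penetrance, unaffected parents and no new mutations. Genotypes are coded as the number of copies (0, 1 or 2) of the candidate allele. The sample IDs come from the table above, but which children are affected, the helper names and the example genotypes are all hypothetical.

```python
from typing import Dict, List, Set

# Sample IDs and roles from the pedigree table above.
PARENTS = ("NA12877", "NA12878")
CHILDREN = ("NA12879", "NA12880", "NA12882")


def fits_recessive(genotypes: Dict[str, int], affected: Set[str]) -> bool:
    """Check one SNP against a fully penetrant recessive model.

    genotypes maps sample ID -> copies (0, 1, 2) of the candidate allele.
    Assumes both parents are unaffected carriers.
    """
    # Unaffected parents of an affected child must both be heterozygous.
    if any(genotypes[p] != 1 for p in PARENTS):
        return False
    for child in CHILDREN:
        homozygous = genotypes[child] == 2
        # Affected children must be homozygous; unaffected ones must not be.
        if homozygous != (child in affected):
            return False
    return True


def filter_snps(snps: Dict[str, Dict[str, int]], affected: Set[str]) -> List[str]:
    """Return the IDs of SNPs whose genotypes fit the recessive model."""
    return [snp_id for snp_id, gts in snps.items() if fits_recessive(gts, affected)]


# Hypothetical genotypes for three candidate SNPs.
EXAMPLE = {
    "snp_a": {"NA12877": 1, "NA12878": 1, "NA12879": 2, "NA12880": 1, "NA12882": 0},
    "snp_b": {"NA12877": 1, "NA12878": 2, "NA12879": 2, "NA12880": 1, "NA12882": 1},
    "snp_c": {"NA12877": 1, "NA12878": 1, "NA12879": 2, "NA12880": 2, "NA12882": 0},
}

print(filter_snps(EXAMPLE, {"NA12879"}))
```

With only NA12879 treated as affected, `snp_b` is rejected because the mother is homozygous, and `snp_c` is rejected because an unaffected daughter is homozygous, leaving `['snp_a']`.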

This example builds a single, sequential history, but there are links to the individual parts if you are interested in only one section. The later parts go into less detail when a similar step was covered earlier, so if you are new to Galaxy it is best to work through the full example.

Part 1: Preparing input data.

Importing files
Concatenating datasets

Part 2: Using the pedigree and recessive inheritance to filter SNPs.

Filtering heterozygous/homozygous SNPs from a pgSnp file
Filtering using pedigree

Part 3: Removing SNPs found in healthy controls.

Concatenate and merge control SNPs
Subtracting from result set

Part 4: Finding SNPs that are likely to be phenotype associated.

aaChanges tool
Computing flanking regions of exons
Using ENCODE segmentations for non-coding SNPs

Part 5: Using known gene-disease associations.

Get genes associated with our disease
Get gene neighborhoods
Further narrow results