Phenotype Association Tools in Galaxy
Galaxy is a software framework
that provides web-based tools for bioinformatics, including tasks
useful in the analysis of human variation. The developers maintain a
public server at Penn State, and the software is also freely available
for local installation and customization. Galaxy is highly extensible,
so as new tools become available (not necessarily written specifically
for Galaxy) they can be added to increase the power and flexibility of
the system.
This tutorial focuses on some of the tools available on the public
Galaxy server that are useful for exploring possible associations
between human genetic variants and phenotypes. It walks step by step
through several examples. For a more general introduction to using
Galaxy, please see the documentation available at
galaxyproject.org.
Basics:
A brief orientation to the fundamentals of using Galaxy
Example 1:
Using Galaxy to look for disease SNPs in a full genome
This example illustrates several methods for examining a single
full-coverage genome to look for single-nucleotide polymorphisms
(SNPs) that are either known to be associated with disease or
suspected to have an impact for other reasons. It makes use of public
genomic data, tools designed specifically for working with variants,
and also some general tools for text manipulation and operations on
genomic coordinates.
Example 2:
Using Galaxy to look for SNPs differing between populations
This example illustrates methods for comparing two populations
to look for fixed differences between them. It starts with publicly
available SNP data and a known phenotype-associated SNP, then looks
for that SNP the same way you would in a case-control study. This
example uses both general tools and tools specifically intended for
working with variants.
Example 3:
Using Galaxy to look for disease-associated SNPs in a pedigree
This example illustrates methods for looking for disease-associated SNPs
in full-coverage genomes in a family. It uses the CEPH pedigree genomes
provided by Complete Genomics, plus planted disease SNPs from the CFMDB
database.
Example 4:
Using Galaxy to look for population structure and selective sweeps
This example illustrates using low-coverage sequence data with tools
for examining population structure and detecting selective sweeps. This is
an intermediate-level example and assumes you already have the basic
Galaxy skills, such as importing datasets, that are covered in the earlier
examples.
Conventions used in this tutorial:
Red arrows or boxes on the screenshots indicate settings to choose or
actions you will need to take. Green arrows mark the "go" buttons to
click once the settings or parameters are selected. Blue arrows or boxes
point out additional information that you should note but that does not
require any action.
Funding for our work on assembling and documenting the Phenotype Association
tools was provided by NIH grant UL1 RR033184-01 to the Penn State Clinical
and Translational Science Institute. This project is funded, in part, under
a grant with the Pennsylvania Department of Health using Tobacco CURE Funds.
The Department specifically disclaims responsibility for any analyses,
interpretations, or conclusions.