Part 4: Finding SNPs that fall in suspected functional regions
Overview:
Filter the input dataset (from Part 1) to keep only rows whose intervals
overlap those in a library dataset of predicted regulatory regions.
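The overlap test behind this filter can be sketched in Python. This is a minimal illustration, not the tool's own implementation. It assumes each row has been reduced to a hypothetical (chrom, start, end) tuple in 0-based, half-open (BED-style) coordinates, and it naively scans every region for every SNP. The function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical row shape: (chrom, start, end), assumed 0-based half-open like BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_overlapping(snps: List[Interval], regions: List[Interval]) -> List[Interval]:
    """Keep each SNP row whose interval overlaps at least one region (O(n*m) scan)."""
    return [s for s in snps if any(overlaps(s, r) for r in regions)]


if __name__ == "__main__":
    # Toy data for illustration, not taken from the tutorial's datasets.
    snps = [("chr1", 100, 101), ("chr1", 500, 501), ("chr2", 100, 101)]
    regulatory = [("chr1", 90, 200), ("chr2", 300, 400)]
    print(filter_overlapping(snps, regulatory))  # → [('chr1', 100, 101)]
```

With half-open coordinates, intervals that merely touch (one ends at 10, the next starts at 10) do not count as overlapping.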
Similarly, find the rows of the same input dataset whose intervals overlap
those in an ENCODE regulatory dataset (DNase clusters) obtained
from UCSC.
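A genome-wide DNase cluster track is much larger than a SNP list, so scanning every region for every SNP gets slow. One common alternative, sketched here under the same hypothetical (chrom, start, end) half-open assumption, is to merge the regions on each chromosome into sorted, non-overlapping blocks and answer each SNP with a binary search (Python's `bisect`). The names `build_index` and `hits` are illustrative, not part of any tool.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # hypothetical (chrom, start, end), 0-based half-open
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: List[Interval]) -> Index:
    """Merge regions per chromosome into sorted, disjoint blocks; store (starts, ends)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[List[int]] = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the current block
            else:
                merged.append([start, end])  # start a new disjoint block
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def hits(snp: Interval, index: Index) -> bool:
    """True if the SNP interval overlaps any merged block on its chromosome."""
    chrom, start, end = snp
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last block that starts before the SNP ends; because blocks are disjoint and
    # sorted, it also has the largest end among all such blocks.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


if __name__ == "__main__":
    dnase = [("chr1", 50, 120), ("chr1", 110, 300), ("chr3", 0, 10)]  # toy data
    idx = build_index(dnase)
    snps = [("chr1", 250, 251), ("chr1", 40, 41), ("chr2", 5, 6), ("chr3", 9, 10)]
    print([s for s in snps if hits(s, idx)])  # → [('chr1', 250, 251), ('chr3', 9, 10)]
```

Merging is safe here because the filter only asks whether a SNP overlaps any region, not which one.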
Run the PhyloP tool on the same input dataset to add a column of
interspecies conservation scores. Then use the Histogram tool to inspect the
distribution of scores and choose a suitable threshold, and filter the SNPs
on the score column to keep only those at highly conserved positions.
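The histogram-then-filter steps amount to binning the scores and keeping rows at or above a cutoff. The sketch below assumes the conservation score was appended as the last column of each row (stored as text) and uses equal-width bins. The threshold of 2.0 is a placeholder only. In PhyloP, larger positive scores indicate stronger conservation, but the actual cutoff should be read off the histogram of your own data.

```python
from typing import List, Sequence, Tuple


def histogram(scores: Sequence[float], n_bins: int) -> List[Tuple[float, float, int]]:
    """Equal-width bins over [min, max]; returns (bin_lo, bin_hi, count) per bin."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # guard against all scores being equal
    counts = [0] * n_bins
    for s in scores:
        k = min(int((s - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    return [(lo + k * width, lo + (k + 1) * width, counts[k]) for k in range(n_bins)]


def filter_by_score(rows: Sequence[Sequence], threshold: float, col: int = -1) -> List[Sequence]:
    """Keep rows whose score column (assumed: last column, as text) is >= threshold."""
    return [r for r in rows if float(r[col]) >= threshold]


if __name__ == "__main__":
    # Toy rows: (chrom, start, end, score); not real PhyloP output.
    rows = [("chr1", 1, 2, "-1.0"), ("chr1", 5, 6, "0.2"), ("chr1", 9, 10, "0.5"),
            ("chr2", 3, 4, "2.5"), ("chr2", 7, 8, "4.0")]
    for b_lo, b_hi, n in histogram([float(r[-1]) for r in rows], 5):
        print(f"[{b_lo:5.2f}, {b_hi:5.2f}) {'#' * n}")
    print(filter_by_score(rows, threshold=2.0))  # placeholder threshold
```

Inspecting the histogram first shows where the long tail of high scores begins, which is a more defensible cutoff than an arbitrary number.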