Joining on identifiers

Back at Galaxy, we are ready to do a join to get the gene symbols associated with the damaging SNPs. This time the two datasets do not both contain genomic positions, but they do have a field in common: the UniProt ID. We will do the join by matching up the values in those columns. In the section Join, Subtract, and Group there is a tool called Join Two Datasets. Its parameters are the two datasets, the column to match on in each, and what to do with unmatched rows. Opening the two datasets in the history shows which column numbers contain the UniProt IDs (blue boxes). Make the selections shown, putting the SNP dataset first so that its fields come first in the result rows. We are not interested in any rows that don't join, so leave the rest of the options set to "No". Then click Execute.

[screen shot]
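
For readers who want to see what a join on identifiers amounts to, here is a minimal Python sketch of the same idea: keep only the row pairs whose ID values match, with the first dataset's fields first. It is an illustration, not the tool's actual implementation. The function name join_on_column, the column layouts, and all of the values (ID_A, GENE1, and so on) are made up for the example; the real datasets have different columns.

```python
from collections import defaultdict


def join_on_column(left_rows, right_rows, left_col, right_col):
    """Inner join of two lists of rows on one column each.

    Column numbers are 1-based, the way Galaxy displays them.
    Rows without a match in the other dataset are dropped.
    """
    # Index the second dataset by its ID column so lookups are fast.
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_col - 1]].append(row)

    joined = []
    for row in left_rows:
        # An ID may match several rows; emit one output row per match.
        for match in index.get(row[left_col - 1], []):
            joined.append(row + match)
    return joined


# Hypothetical SNP rows: chrom, start, end, ID (all values invented).
snps = [
    ["chr1", "100", "101", "ID_A"],
    ["chr2", "200", "201", "ID_B"],
    ["chr3", "300", "301", "ID_X"],  # no matching gene row
]

# Hypothetical gene rows: ID, gene symbol (all values invented).
genes = [
    ["ID_A", "GENE1"],
    ["ID_B", "GENE2"],
    ["ID_C", "GENE3"],  # no matching SNP row
]

result = join_on_column(snps, genes, left_col=4, right_col=1)
for row in result:
    print("\t".join(row))
```

Only the ID_A and ID_B rows survive, since the other IDs appear in just one dataset. This sketch simply concatenates the matching rows, so the ID shows up once from each side; the exact column layout the Galaxy tool produces may differ.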