Finding a gene

After checking the top hits, a natural next step is to assess the evidence of association in particular genes that have a strong prior probability of influencing a phenotype (for example, because they have been associated with the phenotype in other studies, or because the gene is the target of a drug in a pharmacogenetic study).

Answering this question manually is time-consuming and error-prone. The first problem is whether the physical coordinates listed in the MAP file used for the WGA project are up to date: which genome build are they based on, and can they be aligned directly with the gene coordinates in currently available public databases? Which genome build are those public databases based on in turn? A further issue is how to account for all of the alternative transcripts.

WGAViewer addresses these problems accurately and efficiently by applying the latest available genome build coordinates to all data that are aligned with each other. More specifically, all coordinates reported in the annotation results are based on the latest Ensembl Core, Variation, and GO databases. Coordinates from any other source, for example those read from the MAP file or taken from the HapMap database (based on NCBI b36, dbSNP b136 as of August 2007), are always compared with the core databases and then aligned. This gives an accurate annotation and resolves discrepancies between the genome builds of different sources, especially when aligning the WGA result set with gene context. The process also takes advantage of Ensembl's efficient monthly update system.
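The coordinate alignment described above can be illustrated with a minimal Python sketch. This is not WGAViewer's actual implementation, which the text does not describe. It assumes a four-column, whitespace-separated MAP file (chromosome, SNP identifier, genetic distance, base-pair position) and a hypothetical `reference` lookup, keyed by SNP identifier, that stands in for coordinates fetched from the current Ensembl databases. The names `MapEntry`, `parse_map_line`, and `realign` are illustrative only.

```python
from typing import Dict, List, NamedTuple, Tuple


class MapEntry(NamedTuple):
    """One row of a MAP file (assumed four-column layout)."""
    chrom: str
    snp_id: str
    genetic_distance: float
    position: int


def parse_map_line(line: str) -> MapEntry:
    """Parse one whitespace-separated MAP line into a MapEntry."""
    chrom, snp_id, dist, pos = line.split()
    return MapEntry(chrom, snp_id, float(dist), int(pos))


def realign(
    entries: List[MapEntry],
    reference: Dict[str, Tuple[str, int]],
) -> Tuple[List[MapEntry], List[str], List[str]]:
    """Replace MAP coordinates with those from the reference lookup.

    `reference` is a hypothetical stand-in for the latest-build
    coordinates, mapping SNP identifier -> (chromosome, position).

    Returns the updated entries, the identifiers whose coordinates
    differed from the reference, and the identifiers that were not
    found in the reference (these are left out of the updated list).
    """
    updated: List[MapEntry] = []
    changed: List[str] = []
    missing: List[str] = []
    for entry in entries:
        ref = reference.get(entry.snp_id)
        if ref is None:
            missing.append(entry.snp_id)
            continue
        ref_chrom, ref_pos = ref
        if (ref_chrom, ref_pos) != (entry.chrom, entry.position):
            changed.append(entry.snp_id)
        updated.append(entry._replace(chrom=ref_chrom, position=ref_pos))
    return updated, changed, missing
```

Matching on the SNP identifier rather than on position is the design choice that makes the comparison independent of the genome build used in the input file. Reporting the changed and missing identifiers separately makes it easy to see how far the input coordinates have drifted from the reference.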

For alternative transcripts, WGAViewer reports the longest form alongside all available shorter forms in the Ensembl core database. When locating a gene and annotating the related genomic region, it also includes the upstream and downstream span specified by the user.
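As a rough illustration of how such a region could be derived, the Python sketch below picks the longest transcript and extends it by user-supplied flanks, then selects the SNPs that fall inside the window. It is a sketch under stated assumptions, not WGAViewer's code: anchoring the window on the longest transcript, the strand handling, and the names `Transcript`, `gene_region`, and `snps_in_region` are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """A transcript with 1-based, inclusive genomic coordinates."""
    transcript_id: str
    start: int
    end: int


def longest_transcript(transcripts: List[Transcript]) -> Transcript:
    """Return the transcript covering the longest genomic span."""
    return max(transcripts, key=lambda t: t.end - t.start + 1)


def gene_region(
    transcripts: List[Transcript],
    upstream: int,
    downstream: int,
    strand: int = 1,
) -> Tuple[int, int]:
    """Window around the longest transcript plus user-defined flanks.

    On the reverse strand (strand < 0) "upstream" lies at higher
    coordinates, so the two flanks are swapped. The start is clamped
    to 1 so the window never runs off the chromosome start.
    """
    longest = longest_transcript(transcripts)
    if strand >= 0:
        left, right = upstream, downstream
    else:
        left, right = downstream, upstream
    return max(1, longest.start - left), longest.end + right


def snps_in_region(
    snp_positions: Dict[str, int],
    region: Tuple[int, int],
) -> List[str]:
    """Return SNP identifiers inside the region, ordered by position."""
    start, end = region
    hits = [(pos, snp) for snp, pos in snp_positions.items()
            if start <= pos <= end]
    return [snp for pos, snp in sorted(hits)]
```

A generous flank is often worth the extra output: a window that covers only the gene body would miss association signals in neighbouring regulatory sequence or in nearby genes.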

For example, HLA-B variants have been related to HIV-related phenotypes in a number of previous publications. To find HLA-B in the WGA dataset, choose “Tools->Find a gene” from the menu. This brings up the parameter dialogs (Figures 3.4-1 and 3.4-2), which offer options to exclude the selection score and LD matrix annotations and to choose the source for the LD calculation.

Figure 3.4-1. Parameters for finding a gene.

Figure 3.4-2. Parameters for annotating chromosomal region.

Figure 3.4-3 shows the annotation results for finding HLA-B in the example dataset. As shown, there seems to be no particularly interesting finding within HLA-B itself. However, not far from HLA-B there are two genome-wide significant hits: one close to HLA-C (rs9264942) and the other within HCP5 (rs2395029).

Figure 3.4-3. Finding a gene: annotation results.