Vol. 29, Issue 4, Part 2, 544-547, April 2001
John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia (P.G.B., G.C., L.J., N.T., A.C.B.); and Department of Pharmacology and Physiology, University of Rochester Medical Center, Rochester, New York (H.-F.T., M.W.A.)
Abstract
The human expressed sequence tag (EST) database can be searched by
different sequence alignment strategies to identify new members of gene
families and allelic variants. To illustrate the value of database
analysis for gene discovery, we have focused on the glutathione
S-transferase (GST) super family, an approach that has
led to the identification of the Zeta class. The Zeta class GSTs
catalyze the glutathione-dependent biotransformation of α-haloacids
and the isomerization of maleylacetoacetic acid to fumarylacetoacetic
acid, an essential step in the catabolism of tyrosine. Allelic
variants of the GST Z1 and GST A2 genes
have also been identified by EST database analysis. One GST Z1 variant (GST Z1A) has significantly higher activity with dichloroacetic acid as
a substrate than other GST Z1 isoforms. This variant may be important
in the clinical treatment of lactic acidosis where dichloroacetic acid
is prescribed. Our experience with the application of EST database
searching methods suggests that it may be productively applied to other
gene families of pharmacogenetic interest.
Introduction
The human expressed sequence tag (EST) database contains in excess of 1.6 million partial cDNA sequences. These sequences have been obtained from over 200 cDNA libraries prepared from most tissues. The EST database therefore contains cDNA sequences from a large proportion of the genes expressed in human tissues. Since the cDNA libraries have been prepared from tissues obtained from different individuals, the database also contains polymorphic variants of many genes. Thus, the application of bioinformatic searching strategies to the vast amount of information contained within the EST database can reveal new members of gene families and polymorphic variants of previously discovered genes. Although this approach could be used for many gene families, it is particularly valuable for pharmacogenetic studies wherein it is important to identify variants that may have unusual capacities for the metabolism of xenobiotics and therapeutic agents.
In several experiments carried out to illustrate the value of EST database analysis, we have focused our attention on the glutathione transferase (GST) gene family (Board et al., 1997; Blackburn et al., 2000). The GSTs are a large family of phase II enzymes that conjugate glutathione to a wide range of generally hydrophobic and electrophilic compounds including many carcinogens, therapeutic drugs, and the products of oxidative metabolism. Genetic factors that modulate GST expression and function can be clinically important. Overexpression of certain GSTs in tumors has been shown to contribute to drug resistance, and genetically determined deficiencies of the GST M1 or GST T1 genes have been shown to be risk factors for several cancers (for review see Chenevix-Trench et al., 1995; Hayes and Pulford, 1995). In addition, homozygosity for the GST M1 null allele is a significant positive prognostic indicator for successful chemotherapy and long-term survival in children with acute lymphoblastic leukemia (Hall et al., 1994). In recent studies, polymorphic variants of GST P1 that influence substrate specificity have been shown to be risk factors for Parkinson's disease in subjects exposed to pesticides (Zimniak et al., 1994; Menegon et al., 1998).
Our analysis of the EST database has utilized differing strategies to identify new members of the GST super family and allelic variants of particular genes. Although there are limitations and pitfalls with this approach, it has successfully identified the Zeta class of GSTs and allelic variants that significantly influence the function of GST Z1-1. Here we review the application of these techniques to the identification of the Zeta class and the detection of allelic variants in the Alpha and Zeta class GSTs.
Materials and Methods
The BLAST programs used in these studies (Altschul et al., 1998) are available online from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov) and can be used to efficiently search sequence databases by aligning a query sequence with all entries in the database.
Identification of New GST Classes. A variant of the BLAST program, tblastn, can be used to search translations of nucleotide databases with protein sequence. Because of the degeneracy of the genetic code, searching for homologs across wide evolutionary distances has greater sensitivity if protein sequence is used as the query sequence.
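The sensitivity argument can be illustrated with a small sketch. The two nucleotide sequences below are invented for illustration and are not GST sequences; they use different synonymous codons but translate to the same peptide, so a protein-level comparison scores them as identical while a nucleotide-level comparison does not. The helper names are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical sequences: same peptide, different synonymous codons.
seq_a = "ATGCTGAGCCGTGGA"
seq_b = "ATGTTATCAAGAGGG"

nucleotide_identity = percent_identity(seq_a, seq_b)
protein_identity = percent_identity(translate(seq_a), translate(seq_b))
print(f"nucleotide identity: {nucleotide_identity:.1f}%")
print(f"protein identity:    {protein_identity:.1f}%")
```

In this toy case fewer than half of the nucleotide positions agree even though the encoded peptides are the same, which is the situation in which a translated search such as tblastn is expected to recover homologs that a nucleotide search would miss.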
To determine whether GST-like proteins identified in lower species have counterparts in humans, we used the tblastn program to search the human EST database with sequences from several plants and insects. The default parameters for the BLAST search were retained, and the similar ESTs thus identified were studied in further detail to determine whether they represented novel or previously identified GSTs. Subsequently, where there was evidence of multiple clones encoding a previously undescribed sequence, a representative EST clone was sequenced completely.
Identification of Allelic Variants. To search the EST database for allelic variants of a particular gene, the complete cDNA sequence was used as the query sequence in the blastn program using the default settings. The output of the program used the "flat query-anchored with identities" format, which allows rapid visual scanning of the aligned nucleotides for variation.
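To show why a query-anchored display lends itself to visual scanning, the sketch below renders a simplified view in which bases identical to the query are printed as dots, so that only differences stand out. It is an imitation written for illustration, not the BLAST output routine; the function name, identifiers, and sequences are assumptions, and the hits are taken to be already aligned to the query without gaps.

```python
def query_anchored_view(query, hits):
    """Render ungapped, pre-aligned hits beneath the query.

    `hits` maps an identifier to (offset, sequence), where offset is the
    0-based query position at which the hit starts. Identical bases are
    shown as '.', mismatches as the hit's base, uncovered positions as ' '.
    """
    width = max(len(name) for name in list(hits) + ["Query"])
    lines = [f"{'Query':<{width}}  {query}"]
    for name, (offset, seq) in hits.items():
        row = [" "] * len(query)
        for i, base in enumerate(seq):
            pos = offset + i
            if pos >= len(query):
                break
            row[pos] = "." if base == query[pos] else base
        lines.append(f"{name:<{width}}  {''.join(row)}")
    return "\n".join(lines)


# Toy data (hypothetical identifiers and sequences).
query = "ATGCAAGCGGGGAAG"
hits = {
    "EST_1": (0, "ATGCAGGCGGGG"),
    "EST_2": (3, "CAGGCGGGGAAG"),
    "EST_3": (0, "ATGCAAGCGGGGAAG"),
}
view = query_anchored_view(query, hits)
print(view)
```

A substitution that recurs in several hits appears as the same letter stacked in one column, which is what makes this layout quick to read by eye.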
Results and Discussion
Identification of the Zeta Class GSTs. A BLAST search of the human EST database with the amino-terminal sequence of a GST-like protein identified in carnation (Dianthus caryophyllus) petals (Meyer et al., 1991) revealed a number of similar clones from several cDNA libraries derived from a range of tissues including brain, breast, fibroblast, heart, liver, melanocyte, placenta, skeletal muscle, and pancreas (Board et al., 1997). A representative EST clone (N31040) was completely sequenced and was found to contain an open reading frame encoding 216 residues with a deduced molecular size of 24166 Da. The size of the encoded peptide falls well within the range of the previously described cytosolic GST subunits (Board et al., 1997).
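A deduced molecular size of this kind is conventionally obtained by summing average residue masses over the translated open reading frame and adding one water for the free termini. The sketch below shows that calculation under stated assumptions: the residue masses are standard average values, the function name is hypothetical, and the example peptides are arbitrary rather than the GST Z1 sequence.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def deduced_mass(protein):
    """Average molecular mass (Da) of an unmodified linear peptide."""
    if not protein:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


# Arbitrary example peptide, not derived from any GST.
example_mass = deduced_mass("MLSRG")
print(f"MLSRG: {example_mass:.2f} Da")
```

As a rough consistency check on the figures reported above, 24166 Da over 216 residues corresponds to about 112 Da per residue, which lies within the range of the residue masses in the table.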
Functional Characterization of GST Z1-1. Although database analysis has led to the discovery of the Zeta class, this strategy provides few specific indications of the protein's function. The detection of GST Z1 cDNA clones in libraries from a wide range of tissues and the detection of homologs over such a wide evolutionary range suggested that the Zeta class GSTs may catalyze the metabolism of a common metabolic product or a significant component in the environment (Board et al., 1997). In initial studies with recombinant GST Z1-1 expressed in Escherichia coli, a range of substrates utilized by other members of the GST family were tested. With the exception of glutathione peroxidase activity with t-butyl hydroperoxide and cumene hydroperoxide, there was little detectable activity. Recent studies have shown that GST Z1-1 can catalyze the glutathione-dependent oxygenation of dichloroacetic acid to glyoxylic acid (Tong et al., 1998a). Subsequently, a number of other α-haloalkanoic acids have been shown to be substrates (Tong et al., 1998b). Dichloroacetic acid has been shown to be carcinogenic in male B6C3F1 mice and male Fischer 344 rats (Herren-Freund et al., 1987; Bull et al., 1990; DeAngelo et al., 1996; Pereira, 1996). Humans can be exposed to dichloroacetic acid in drinking water, as it is one of the most common disinfection by-products found in chlorinated water supplies (Uden and Miller, 1983). Despite its potential carcinogenicity, dichloroacetic acid is used clinically in the treatment of congenital lactic acidosis because of its ability to stimulate mitochondrial pyruvate dehydrogenase (Stacpoole et al., 1997). Fernandez-Canon and Penalva (1998) cloned and sequenced a cDNA encoding human maleylacetoacetate isomerase (MAAI) and found that it was identical to GST Z1. MAAI catalyzes the glutathione-dependent isomerization of maleylacetoacetate to fumarylacetoacetate. This reaction is an essential step in the catabolism of phenylalanine and tyrosine. Deficiencies of other steps in this pathway cause alcaptonuria, phenylketonuria, and tyrosinemia (Fernandez-Canon and Penalva, 1998). This essential metabolic role for GST Z1 provides an explanation for its conservation and expression in such a wide range of species.
Identification of Allelic Variation by Database Analysis. Allelic variants that alter the substrate specificity, reaction kinetics, or stability of an enzyme can be of particular clinical significance and result in increased or decreased drug clearance, drug toxicity, and susceptibility to environmental carcinogens or toxins. Identification of such variants has often been dependent on the detection of an associated phenotype or the chance sequencing of a variant cDNA. Thus, many variants may exist that have not been discovered and characterized.
Because of the number of sequences and the number of individuals represented in the EST database, it provides an excellent resource to screen for genetic variation in most frequently expressed genes. In developing a strategy to detect allelic variation embedded in the EST database, we have again used the BLAST programs. In this case, we have used complete cDNA sequences as the query sequence and the blastn alignment option. The output is selected in the "flat query-anchored with identities" format that places the query sequence at the top and aligns all the matched sequences below (Fig. 2). Comparison of the vertical columns rapidly reveals base substitutions that differ from the query sequences and can be evaluated as potential polymorphisms. However, because the EST sequences are generated by single pass automated sequencing, many of the deposited sequences contain errors that generate false positives in the search for polymorphisms. If it is assumed that the sequencing errors are random, then the number of positive positions can be substantially reduced to those that show the same base substitution in more than one EST sequence. Unfortunately, this step tends to limit the detection of rare variants. Figure 2 shows sample data of this type obtained with the human GST Z1 cDNA sequence. In this alignment, there are several examples of single base substitutions that are excluded from further study. However, at nucleotides 94 and 124, the A to G substitutions are clearly frequent. EST clones containing repeated variations such as these can be obtained for re-sequencing, and if confirmed, a diagnostic test using PCR and restriction enzyme digestion or related procedures can be designed to search for the polymorphism in a normal population sample.
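The rule for discarding probable sequencing errors can be sketched as a simple count: tally each substitution by position across the aligned ESTs and keep only those seen in more than one EST. The code below is an illustrative sketch, not the procedure used in the study, which relied on visual inspection of BLAST output; the function name, the `min_ests` parameter, and the toy sequences are assumptions, and the ESTs are taken to be aligned to the query without gaps.

```python
from collections import Counter


def recurrent_substitutions(query, est_alignments, min_ests=2):
    """Return substitutions seen in at least `min_ests` independent ESTs.

    `est_alignments` maps an EST identifier to (offset, sequence), with the
    sequence already aligned to the query without gaps and offset 0-based.
    Keys of the result are (1-based position, query base, variant base).
    """
    counts = Counter()
    for offset, seq in est_alignments.values():
        for i, base in enumerate(seq):
            pos = offset + i
            # Skip ambiguous calls such as 'N' and positions past the query end.
            if pos < len(query) and base in "ACGT" and base != query[pos]:
                counts[(pos + 1, query[pos], base)] += 1
    return {sub: n for sub, n in counts.items() if n >= min_ests}


# Toy data (hypothetical): one recurrent A>G change and two singletons.
query = "ATGCAAGCGGGGAAG"
ests = {
    "EST_1": (0, "ATGCAGGCGGGGAAG"),
    "EST_2": (0, "ATGCAGGCGGGTAAG"),
    "EST_3": (3, "CAAGCGGGGAAG"),
    "EST_4": (3, "CAGGCGGGGAAC"),
}
candidates = recurrent_substitutions(query, ests)
all_differences = recurrent_substitutions(query, ests, min_ests=1)
print(candidates)
```

With the default threshold the two singleton differences are dropped and only the recurrent change at position 6 is retained. This also makes the cost of the rule visible: a genuine variant represented by a single EST is discarded along with the sequencing errors.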
Polymorphism of GST Z1. This EST approach has led us to identify a number of polymorphic sites in GST Z1. As shown in Table 1, each allele that has been identified so far has a different combination of nucleotide changes that give rise to amino acid substitutions. Each of these variants reaches polymorphic frequencies in the Australian European population (Blackburn et al., 2000).
Alpha Class Variants. We have also screened the EST database for allelic variants in the Alpha class GSTs. There are three members of the human Alpha class for which full cDNA sequences are known (GST A1, GST A2, and GST A4). So far, two variants in GST A2 have been confirmed. These polymorphisms result in Thr112Ser and Glu210Ala substitutions that reach polymorphic frequencies in the Australian European populations.
Limitations Associated with EST Database Analysis.
Our strategy for the analysis of the EST database for allelic variants
has used readily available BLAST programs and simple rules for the
elimination of false positives. While this approach has obvious
advantages, such as simplicity and demonstrable productivity, there are
a number of disadvantages and limitations that should be recognized.
1. The cDNA that is being studied must be present in libraries from a number of different tissues to ensure that the clones searched are derived from as many individuals as possible. The more clones identified in different libraries, the greater the sensitivity of the search. Rare variants will be missed by this procedure, as they will probably be attributed to sequencing errors.
2. There is little information available on the ethnic origin of the tissues used to construct the cDNA libraries represented in the database. Some variants that are frequent in one ethnic group may be rare or absent in another.
3. As the ESTs are only sequenced from the 5' and 3' ends, there are relatively few sequences covering the central regions of large cDNAs, thus lowering the probability of detecting variation in these areas.
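The third limitation can be made concrete by counting how many ESTs cover each position of a cDNA. The sketch below uses invented numbers (a 2000-bp cDNA and reads of about 400 bases anchored at either end) purely to illustrate the effect; the function name and all values are assumptions.

```python
def coverage_profile(cdna_length, est_spans):
    """Count how many ESTs cover each cDNA position.

    `est_spans` is a list of (start, end) pairs, 0-based and end-exclusive.
    """
    depth = [0] * cdna_length
    for start, end in est_spans:
        for pos in range(max(start, 0), min(end, cdna_length)):
            depth[pos] += 1
    return depth


# Hypothetical 2000-bp cDNA: five reads from each end plus one internal read.
cdna_length = 2000
est_spans = [(0, 400)] * 5 + [(1600, 2000)] * 5 + [(300, 700)]
depth = coverage_profile(cdna_length, est_spans)
uncovered = sum(1 for d in depth if d == 0)
print(f"positions with no EST coverage: {uncovered} of {cdna_length}")
```

Positions with a depth below two cannot satisfy a rule that requires the same substitution in more than one EST, so any variation there goes undetected regardless of its population frequency.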
Footnotes
Send reprint requests to: P. G. Board, Molecular Genetics Group, John Curtin School of Medical Research, P.O. Box 334, Australian National University, Canberra ACT 2601, Australia. E-mail: Philip.Board{at}anu.edu.au
Abbreviations
Abbreviations used are: GST, glutathione transferase; EST, expressed sequence tag; MAAI, maleylacetoacetate isomerase.
References
α-haloacids. Chem Res Toxicol 11:1332-1338.