Introduction
The
human cytochrome P450 (CYP) 2A gene
subfamily was previously found to comprise three genes,
CYP2A6, CYP2A7, and CYP2A13, as well
as two identical copies of a CYP2A7 pseudogene,
CYP2A7PT and CYP2A7PC (or CYP2A7P1),
which contain putative CYP2A coding sequences corresponding
to exons 1 through 5 (Fernandez-Salguero et al., 1995). Both CYP2A6 and
CYP2A13 are active toward many carcinogens and other toxicants, such as
the tobacco-specific carcinogen
4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (Fernandez-Salguero and
Gonzalez, 1995; Su et al., 2000) and the herbicide
2,6-dichlorobenzonitrile (Ding et al., 1996; Su et al., 2000), whereas
CYP2A7 is not functional (Yamano et al., 1990; Ding et al., 1995).
CYP2A6 is also the major coumarin 7-hydroxylase and nicotine C-oxidase
in the liver (Fernandez-Salguero and Gonzalez, 1995; Messina et al.,
1997; Yamazaki et al., 1999). Genetic polymorphisms in the
CYP2A6 gene have been identified (Fernandez-Salguero et al.,
1995; Nunoya et al., 1999; Oscarson et al., 1999), and intensive
efforts are being made to determine the role of CYP2A6 gene
mutations in individual susceptibility to disease and environmental
toxicity (Pianezza et al., 1998; Kamataki et al., 1999; Kitagawa et
al., 1999; Miyamoto et al., 1999). However, the genotyping methods used
in at least some of these studies have been found to be
unreliable (Oscarson et al., 1998; Pianezza et al., 1998; Sabol and
Hamer, 1999), partly because of the presence of multiple
CYP2A genes with high sequence homology.
The CYP2A genes are located in the CYP2A-2B-2F
gene cluster (Hoffman et al., 1995) on human chromosome 19, where two
mutant CYP2G genes, CYP2GP1 and
CYP2GP2, were also found (Sheng et al., 2000). In other mammals,
there appears to be only a single CYP2G gene, named
CYP2G1, which is expressed only in the olfactory mucosa and
is active toward sex steroid hormones as well as odorants and other
xenobiotics (cf. Hua et al., 1997; Gu et al., 1998). However, the two
human CYP2G genes both have loss-of-function mutations:
CYP2GP1 contains a single nucleotide deletion in exon 2 and
a 2.4-kbp deletion between exons 3 and 7, whereas CYP2GP2 contains two nonsense mutations, in exons 1 and 3, respectively (Sheng
et al., 2000). In the current study, in an effort to identify additional, potentially functional CYP2G-related genes, we
have isolated a new CYP2A gene fragment (named
CYP2A7P2), which is linked to CYP2GP1 in an
outward opposite orientation. The deduced partial amino acid sequence
of CYP2A7P2 is 81 to 87% identical to corresponding
sequences (exons 6-9) in the other human CYP2As. An analysis of CYP2A
sequence alignment suggests that CYP2A7P2 may be derived
from the same ancestral gene that gave rise to CYP2A7P1.
Materials and Methods
An EMBL3 genomic library, which was prepared with genomic DNA
derived from the leukocytes of an adult female (CLONTECH, Palo Alto,
CA), was screened by plaque hybridization with a
³²P-labeled E7-8 probe (about 0.2 kbp). The probe
was obtained previously by PCR from human genomic DNA and contained
parts of exons 7 and 8 and the entire intron 7 of a putative human
CYP2G gene (Sheng and Ding, 1996). A total of 1.2 × 10⁶ phages were screened at a density of 200 plaque-forming units/cm². A single
positive clone was isolated. The cloned DNA fragment (named S34) was
released by digestion with SalI and inserted into the pCR-Script vector (Stratagene, La Jolla, CA) to give the pS34
plasmid. The latter was further digested with BamHI,
EcoRI, HindIII, or Sau3AI to obtain
various subclones. Nucleic acid sequences were determined initially
from the subclones using vector primers, which revealed both
CYP2A- and CYP2GP1-like sequences. Subsequently, pS34 was sequenced directly using primers designed according to CYP2GP1, CYP2A6, and CYP2A13
sequences, and by primer walking.
A bacteriophage P1 genomic library constructed with DNA from the
foreskin fibroblasts of a Caucasian male was also screened (by Genome
Systems, St Louis, MO) using the E7-8 probe. Only one positive clone,
named P1G1, was identified; the recombinant DNA was isolated using a
modified alkaline lysis procedure provided by Genome Systems. The
isolated P1G1 DNA, containing an 85-kbp insert, was digested with
PstI or BstEII for Southern blot detection of
CYP2-related fragments. DNA fragments were resolved by
pulsed-field agarose gel electrophoresis using a Hoefer PC500
Switchback pulse controller (Hoefer Pharmacia Biotech, San Francisco,
CA), transferred to a Hybond-N nylon membrane (Amersham,
Arlington Heights, IL), and analyzed by hybridization with
³²P-labeled oligonucleotide probes derived from
CYP2A, CYP2B, or CYP2G coding sequences (Yamano et al., 1989; Sheng et
al., 2000; Su et al., 2000). Positive fragments were gel-purified and
subcloned into the pCR-Script vector. The P1 phage DNA and subcloned
fragments were analyzed by sequencing with numerous primers derived
from CYP2A6, CYP2A7P1, CYP2A7P2,
CYP2A13, CYP2B6, CYP2B7,
CYP2F1, CYP2GP1, and CYP2GP2 to detect
the presence of these genes and to identify potential
CYP2A7P2 exon 1 to 5 sequences [see Nelson et al. (1996)
for sources of P450 sequences]. DNA sequence analysis was performed with an Applied Biosystems model 373A automated DNA sequencer (Foster City, CA) at the Molecular Genetics Core Facility of the
Wadsworth Center.
Results and Discussion
As shown in Fig. 1, pS34 contained a
15-kbp insert comprising exons 7 through 9 of CYP2GP1 (Sheng
et al., 2000) and exons 6 through 9 of a previously unidentified
CYP2A gene (CYP2A7P2); the two fragments were
arranged in outward opposite directions with about 8 kbp of intervening
sequence. The P1G1 clone contained a full-length CYP2GP1
gene (with the 2.4-kbp deletion between exons 3 and 7) and exons 6 through 9 of CYP2A7P2. The distance between
CYP2A7P2 and CYP2GP1 was initially estimated
using long-distance PCR (data not shown) and was confirmed by the
length and partial sequence of a PstI subclone of P1G1
(P1G1-SUBI; Fig. 1). All exon sequences were determined at least twice
and from both orientations. CYP2A7P2 introns and over 50%
of CYP2GP1 introns were sequenced at least once. Intron
sizes were confirmed by PCR with adjacent exon primers.

Fig. 1.
Schematic representation of the structure of
CYP2A7P2 with close linkage to CYP2GP1.
The linked CYP2A7P2 and CYP2GP1 sequences
were identified in two different genomic clones, S34 and P1G1, which
were isolated from two independent genomic libraries. Putative exons
(solid box for CYP2GP1 and open box for
CYP2A7P2) were identified on the basis of sequence
homology to known CYP2A and CYP2G genes,
respectively, and are numbered on top. Solid lines indicate introns and
other noncoding sequences. The restriction sites used for subcloning
are also indicated. The cloned insert in S34 is about 15 kbp, and that
in P1G1 is approximately 85 kbp.
Available sequence upstream of CYP2A7P2 exon 6 in the S34
clone, about 1.8 kbp, was also determined, but sequence with homology to known CYP2A exons 1 through 5 was not detected. In the
P1G1 clone, the sequence upstream of CYP2A7P2 exon 6 was at
least 4 kbp (Fig. 1). However, CYP2A-like exons 1 through 5 were not identified by sequence analysis of a subcloned 9-kbp
BstEII fragment (P1G1-SUBII; Fig. 1) containing
CYP2A7P2, with about 4 kbp upstream of exon 6, or by direct
sequencing and PCR analyses of the P1G1 clone with numerous primers
derived from known CYP2A exons 1 through 5. Since the sizes
of intron 5 in CYP2A6 and CYP2A13 are only 875 bp and
1719 bp, respectively (Fernandez-Salguero et al., 1995), well within the roughly 4 kbp examined upstream of exon 6, these results
suggest that the putative exons 1 through 5 of CYP2A7P2 have
been translocated elsewhere or deleted.
The nucleotide sequence of putative CYP2A7P2 coding regions
(exons 6-9) and partial intron sequences and intron sizes are shown in
Fig. 2. Exon-intron junctions are
predicted according to alignment with other human CYP2A
sequences. CYP2A7P2 intron sizes are very similar to those
reported for CYP2A6 and CYP2A13 (Fernandez-Salguero et al., 1995). The deduced amino acid sequence (exons 6-9) of CYP2A7P2 is 81.1 to 86.6% identical to
corresponding sequences in the other human CYP2As (Fig. 3 and Table 1). Interestingly, the putative coding
sequence (exons 1-5) of CYP2A7P1 is 84.1 to 87.7%
identical to those of the known human CYP2A proteins (Table 1). In
contrast, the sequence identities among CYP2A6, CYP2A7, and CYP2A13 are
greater than 90.5%. Thus, it appears that CYP2A7P1 and
CYP2A7P2 may be derived from the same ancestral gene, which was possibly disrupted by the insertion at intron 5 of a large genomic
fragment translocated from another part of the genome.
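The identity percentages compared above are the kind of figure that can be computed from a pairwise alignment. The following is a minimal Python sketch of that calculation, not the CLUSTAL procedure used for Table 1. The function name `percent_identity`, the optional `denominator` argument (standing in for a fixed reference length such as a full-length protein), and the toy sequences are illustrative assumptions, not data from this study.

```python
from typing import Optional

GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str,
                     denominator: Optional[int] = None) -> float:
    """Percentage of identical residues between two aligned sequences.

    Both inputs must already be aligned (equal length, '-' for gaps).
    A column counts as identical only if both residues match and neither
    is a gap. If `denominator` is omitted, the alignment length is used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != GAP)
    if denominator is None:
        denominator = len(aligned_a)
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * matches / denominator


# Toy aligned fragments (hypothetical, not CYP2A sequences).
seq_a = "MKTLE-FRDA"
seq_b = "MKSLEQFRDA"
print(percent_identity(seq_a, seq_b))  # 8 identical columns out of 10
```

Passing an explicit `denominator` lets the same count be expressed against a fixed reference length rather than the alignment length, which matters when only part of a protein (for example, exons 6-9) is being compared.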

Fig. 2.
Nucleotide sequence of human CYP2A7P2.
The putative exon sequences and exon-intron junctions of
CYP2A7P2 were derived from the S34 human genomic clone.
Identical sequences were also found in the P1G1 clone. Exon sequences
are shown in upper case whereas partial intron and 3'-flanking
sequences are shown in lower case with the size of the introns
indicated in parentheses. Coding nucleotides are numbered to the right
of each line. The dotted lines represent abbreviated sequences. The
sequence in brackets is from the CYP2A6 gene, which is
complementary to the R4 reverse primer sequence used for genotyping
CYP2A6 alleles (Fernandez-Salguero et al., 1995). The
putative translation termination codon is underlined, whereas the
conserved polyadenylation signal is double-underlined. Sequences
corresponding to putative exons 1 through 5 of a CYP2A
gene were not detected in either clone. -, a gap introduced in the R4
sequence to obtain optimal alignment. This sequence has been assigned
GenBank accession nos. AF296253 through AF296256.
TABLE 1
Identity in deduced amino acid sequences among different human CYP2A
genes and alleles
Amino acid sequences were aligned as shown in Fig. 3. The number of
positions with identical amino acids for individual pairs, which was
calculated using CLUSTAL with the gap penalty parameter set to 0, is
shown as a percentage of the full-length proteins (494 residues) unless
indicated otherwise.
Nonsense mutations and frame-shift deletions are present in the
putative coding sequences of CYP2A7P1 (Fernandez-Salguero et
al., 1995), consistent with its designation as a pseudogene; however,
similar loss-of-function mutations were not found in the putative
CYP2A7P2 coding sequence. Nevertheless, the deduced amino
acid sequence of CYP2A7P2 contained a deletion of a conserved Glu
residue at position 330 and an unusually cysteine-rich segment at the
carboxyl terminus, which may significantly alter the structure of the
putative protein (Fig. 3). It is not known whether a second copy of
CYP2A7P2 exists; however, the two copies of
CYP2A7P1, CYP2A7PT and CYP2A7PC
(Fernandez-Salguero et al., 1995), may have been generated by gene
duplication after separation of CYP2A7P1 from
CYP2A7P2.
The identification of CYP2A7P2 should facilitate future
efforts to study genomic polymorphisms in other human CYP2A
genes. However, the presence of CYP2A7P2 should not affect
the results of recent CYP2A6 genotyping experiments (e.g.,
Pianezza et al., 1998) using the primers designed in a previous study
(Fernandez-Salguero et al., 1995). The 3'-primer (R4 in
Fernandez-Salguero et al., 1995) used in the first round of PCR has
three nucleotide differences from the corresponding sequence in
CYP2A7P2, one of them at the 3'-terminus, as well as a
single nucleotide gap (Fig. 2). Furthermore, nonspecific products
generated from CYP2A7P2 are unlikely to be reamplified by
the nested PCR with primers derived from exon 3 of CYP2A6
(Fernandez-Salguero et al., 1995). More recent CYP2A6 genotyping protocols (Oscarson et al., 1998; Kitagawa et al., 1999) use
primers derived from the 5'-half of the gene and thus would not detect
the CYP2A7P2 sequence.
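The primer argument above rests on counting mismatches and gaps between a primer and a candidate off-target site, with particular weight on the 3'-terminal base. A minimal Python sketch of that check follows. The function name `compare_primer` and the two toy sequences are hypothetical; they are built only to mirror the pattern described (three mismatches, one of them at the 3' terminus, plus a single gap) and are not the R4 or CYP2A7P2 sequences.

```python
from typing import Tuple

GAP = "-"


def compare_primer(primer_aligned: str,
                   target_aligned: str) -> Tuple[int, int, bool]:
    """Summarize how well a primer matches an aligned target site.

    Both strings are written 5'->3' along the primer, have equal length,
    and use '-' for alignment gaps. Returns a tuple of
    (mismatches, gap_columns, three_prime_unmatched), where the last item
    is True if the primer's 3'-terminal base does not pair with the target.
    """
    if len(primer_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    gap_columns = 0
    for p, t in zip(primer_aligned, target_aligned):
        if p == GAP or t == GAP:
            gap_columns += 1
        elif p != t:
            mismatches += 1
    # Index of the last real base in the primer (its 3' terminus).
    last = max(i for i, p in enumerate(primer_aligned) if p != GAP)
    three_prime_unmatched = primer_aligned[last] != target_aligned[last]
    return mismatches, gap_columns, three_prime_unmatched


# Hypothetical primer and off-target site (not real CYP2A sequences).
primer = "ACGTAC-GTTAGC"
target = "ACCTACAGTAAGT"
print(compare_primer(primer, target))  # (3, 1, True)
```

A mismatch at the 3' terminus is reported separately because it generally impairs polymerase extension more than an internal mismatch does, which is the reason the text singles it out.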
A full-length CYP2B7 gene, but not the functional
CYP2B6 gene, was also detected in the P1G1 clone by using
Southern blot analysis with oligonucleotide probes and by genomic PCR
with primers derived from CYP2B6 and CYP2B7 cDNAs (data not shown).
Sequence analysis of the P1G1 clone and a BstEII subclone
containing most of the CYP2B7 gene revealed that the coding
region was 100% identical to the reported CYP2B7 cDNA sequence, with a
nonsense mutation in exon 7 of the gene sequence, as was previously
found in a cDNA clone (Yamano et al., 1989). This result supports the view that
CYP2B7 is a pseudogene. Furthermore, the
identification of CYP2B7 on the P1G1 clone maps
CYP2GP1 and CYP2A7P2 to the middle of the
CYP2A-2B-2F gene cluster (Hoffman et al., 1995), close to
CYP2B7. The precise distance and orientation of the
CYP2B7 gene with regard to the CYP2GP1 and
CYP2A7P2 genes are not yet known; attempts to map this
physical distance by long-distance PCR were unsuccessful. However,
the distance between the CYP2B7 gene and the
CYP2GP1-CYP2A7P2 fragment may be no more than about 30 kbp,
since the entire insert in P1G1 was about 85 kbp, the length of the
CYP2B7 gene is estimated to be about 25 kbp (based on the
size of the highly homologous CYP2B6 gene, GenBank accession
no. AC023172), and the combined length of the characterized sequences
in the CYP2GP1-CYP2A7P2 region was at least 30 kbp (Fig. 1 and Sheng et al., 2000). This conclusion is consistent with
the lack of detection of other CYP2 genes known to be near
CYP2B7 on chromosome 19 (but at least about 35 kbp away from
CYP2B7) in the P1G1 clone, including CYP2A6,
CYP2A7, CYP2A7P1, CYP2A13,
CYP2B6, and CYP2F1 (data not shown).
CYP2GP2 was not detected in the P1G1 clone, either. It is
not known whether CYP2B7P, which was reported to be about 20 kbp from CYP2B7 (Fernandez-Salguero et al., 1995), is
present in the P1G1 clone because details of the structure of this
pseudogene are not available. Finally, although a functional human
CYP2G gene or allele (Sheng et al., 2000) has yet to be
identified, the present finding that CYP2GP1 is located in
the middle of the CYP2A-2B-2F gene cluster lends further
support to the proposed evolutionary relationship among these four
CYP2 subfamilies, which share the highest amino acid
sequence similarities and are believed to have evolved from a single
ancestral CYP gene (Hoffman et al., 1995).
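The bound on the distance between CYP2B7 and the CYP2GP1-CYP2A7P2 fragment given earlier in this section follows from simple subtraction: whatever part of the insert is not occupied by known segments is the most that can lie between them. A minimal Python sketch of that arithmetic is shown below, using the approximate sizes stated in the text. The function name `max_separation_kbp` is illustrative, and the calculation assumes that the full CYP2B7 gene and the characterized region both lie entirely within the P1G1 insert and do not overlap.

```python
def max_separation_kbp(insert_kbp: float, *segments_kbp: float) -> float:
    """Upper bound on the unassigned sequence within a cloned insert.

    Subtracts the lengths of the known, non-overlapping segments from the
    insert length. Because a segment length may be a lower estimate, the
    result is a maximum for the distance separating the segments.
    """
    remaining = insert_kbp - sum(segments_kbp)
    if remaining < 0:
        raise ValueError("segments exceed the insert length")
    return remaining


# Approximate sizes from the text: 85-kbp insert, about 25 kbp for CYP2B7,
# and at least 30 kbp of characterized CYP2GP1-CYP2A7P2 sequence.
print(max_separation_kbp(85, 25, 30))  # 30
```

Since the characterized region is given as "at least" 30 kbp, any additional characterized sequence would only lower the bound.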
We gratefully acknowledge the use of the Molecular Genetics Core
facility of the Wadsworth Center.
Received August 31, 2000; accepted October 5, 2000.
This work was supported in part by Public Health Service Grants
DC-02640 and ES-07462 from the National Institutes of Health.
Abbreviations used are:
CYP, cytochrome P450;
kbp, kilobase pair(s);
PCR, polymerase chain reaction.