Department of Molecular Pharmacology and Experimental Therapeutics (N.A.K., P.A., L.L.P., I.M., J.-S.R., J.A.G., O.E.S., R.M.W., M.M.A.) and Department of Biochemistry and Molecular Biology (B.W.E., E.D.W.), Mayo Clinic, Rochester, Minnesota; and Department of Biochemistry, Case Western Reserve University School of Medicine, Cleveland, Ohio (V.Y.)
(Received February 12, 2008; Accepted June 11, 2008)
| Abstract |
|---|
| Materials and Methods |
|---|
DCK and CMPK Gene Resequencing. Each of the 240 DNA samples studied was used to perform polymerase chain reaction (PCR) amplifications of all DCK and CMPK exons, splice junctions, and a portion of the 5'-FRs for each gene. The study was powered to reliably detect variant alleles with a minor allele frequency (MAF) of 2% in the 120 alleles studied for each of the four ethnic groups. For CMPK, a region identified by rVISTA (Loots et al., 2002), located upstream of the site of transcription initiation, contained a cluster of putative transcription factor binding sites and showed high sequence homology to orthologous sequences in the dog and mouse genomes. This region was also resequenced. Primer sequences are listed in Supplemental Tables 1 and 2. Amplification reactions were performed with FastStart Taq DNA polymerase, and amplicons were sequenced on both strands in the Mayo Molecular Biology Core Facility with an ABI 3730xl DNA sequencer using BigDye dye terminator sequencing chemistry (PerkinElmer Life and Analytical Sciences, Boston, MA). To exclude PCR-related artifacts, independent amplifications were performed for any SNP observed in only a single DNA sample or for any sample with an ambiguous chromatogram. The sequencing chromatograms were analyzed using Mutation Surveyor version 2.41, PolyPhred 3.0 (Maring et al., 2005), and Consed 8.0 (Maring et al., 2005). GenBank accession numbers for the DCK reference sequences used in these experiments were NT_006216.16 and NM_000788.1, and those for CMPK were NT_032977.7 and NM_016308.1.
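The power statement above can be illustrated with a short calculation. This is a minimal sketch, not the authors' power analysis: it assumes that the 120 alleles in a group are sampled independently, and the function name `detection_probability` is hypothetical.

```python
def detection_probability(maf: float, n_alleles: int) -> float:
    """Probability that a variant allele with frequency `maf` is seen at
    least once among `n_alleles` independently sampled chromosomes."""
    return 1.0 - (1.0 - maf) ** n_alleles


# 120 alleles per ethnic group (from the text), MAF of 2%
p_detect = detection_probability(0.02, 120)
print(f"Chance of observing a 2% allele at least once: {p_detect:.1%}")
```

Under these assumptions the chance of seeing a 2% allele at least once in one ethnic group is a little over 90%.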
DCK and CMPK Expression Constructs and Transient Expression. The wild-type (WT) DNA open reading frame (ORF) sequences for both genes were cloned into the eukaryotic expression vector pCR3.1 (Invitrogen, Carlsbad, CA). These expression constructs were then used to perform site-directed mutagenesis with the QuikChange kit (Stratagene, La Jolla, CA). Circular PCR was used to create variant allozyme constructs. The sequences of primers used to perform site-directed mutagenesis are also listed in Supplemental Tables 1 and 2. The sequences of all inserts were confirmed by sequencing both DNA strands. COS-1 cells were then transfected with expression constructs encoding WT and variant DCK and CMPK allozymes, as well as an "empty" vector that lacked an insert as a control, using TransFast reagent (Promega, Madison, WI) at a charge ratio of 1:2. Specifically, 7 µg of construct DNA was cotransfected with 7 µg of pSV-β-galactosidase DNA (Promega) to correct for possible variation in transfection efficiency. The coefficient of variation for the cotransfected β-galactosidase averaged 8.2%. After 48 h, the cells were harvested in 50 mM Tris-HCl (pH 7.6), 100 mM KCl, 1 mM MgCl2, and 2 mM dithiothreitol for DCK or with 50 mM Tris-HCl (pH 8.0), 5 mM MgCl2, and 10 mM dithiothreitol for cells transfected with CMPK constructs. The cells were then homogenized with a Polytron homogenizer (Brinkmann Instruments, Westbury, NY), homogenates were centrifuged at 100,000g for 1 h, and supernatant preparations were stored at –80°C for use in the functional genomic studies.
Enzyme Assays and Substrate Kinetics. DCK catalyzes the formation of 2',2'-difluorodeoxycytidine 5'-monophosphate (gemcitabine monophosphate) from gemcitabine. DCK activity was measured using a modification of the assay of Usova and Eriksson (2002) with gemcitabine (Eli Lilly & Co., Indianapolis, IN) as a substrate. Our assay measured the product of the reaction using the extraction procedure and high-performance liquid chromatography separation conditions described by Ruiz van Haperen et al. (1994). CMPK catalyzes the formation of 2',2'-difluorodeoxycytidine 5'-diphosphate (gemcitabine diphosphate) from gemcitabine monophosphate. A modification of the assay described by van Rompay et al. (1999) was used to measure CMPK activity. This assay measured the depletion of gemcitabine monophosphate (Eli Lilly & Co.) during the reaction and used the extraction procedure and high-performance liquid chromatography separation described by Ruiz van Haperen et al. (1994).
Western Blot Analysis. Quantitative Western blot analyses were performed with recombinant DCK and CMPK allozymes after correction for transfection efficiency on the basis of the cotransfected β-galactosidase activity measured using the Promega β-galactosidase enzyme assay system. For DCK, these assays used rabbit polyclonal antibody generated by Cocalico Biologicals (Reamstown, PA) that was directed against the C-terminal 59 to 80 amino acids of DCK. A rabbit polyclonal antibody to purified recombinant CMPK was kindly provided by Dr. Yung-Chi Cheng, Yale University School of Medicine (New Haven, CT). To perform quantitative Western blot analysis, COS-1 cell cytosol was loaded on 12% Tris-HCl acrylamide gels on the basis of the cotransfected β-galactosidase activity, and proteins were separated by SDS-polyacrylamide gel electrophoresis before transfer to polyvinylidene difluoride membranes (BioRad, Hercules, CA). Immunoreactive proteins were detected using the ECL Western Blotting System (Amersham Pharmacia, Piscataway, NJ). The IPLab Gel H (Biosystemetica, Plymouth, UK) system and the National Institutes of Health image program (http://rsb.info.nih.gov/nih-image) were used to quantify immunoreactive proteins.
Rabbit Reticulocyte Lysate Studies. Radioactively labeled WT and variant CMPK allozyme proteins were synthesized using the TNT coupled rabbit reticulocyte lysate system (Promega, Madison, WI) as described by Wang et al. (2003). CMPK has two potential in-frame ATG translation initiation codons, resulting in an ORF encoding either a 228- or a 196-amino acid protein (Van Rompay et al., 1999).
Data Analysis. Values for θ, π, and Tajima's D were calculated as described by Tajima (1989). D', a measure of linkage disequilibrium that is independent of allele frequency, was calculated as described by Hartl and Clark (2000) and Hedrick (2000). Haplotype analysis was performed as described by Schaid et al. (2002) using the expectation maximization algorithm. Mean protein and Km values were compared using Student's t test. To construct models of the DCK and CMPK variants, the altered amino acids were computationally substituted within the 1.9-Å resolution crystal structure of the human DCK dimer bound to deoxycytidine and ADP (Protein Data Bank accession code 1P60) (Sabini et al., 2003) and the 2.1-Å resolution crystal structure of the human CMPK monomer (Protein Data Bank accession code 1TEV) (Segura-Peña et al., 2004) using the interactive graphics program O (Jones et al., 1991). Molecular figures were prepared with MolScript (Kraulis, 1991) and Raster3D (Merritt and Bacon, 1997).
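The diversity statistics named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses the standard Tajima (1989) formulas, takes fully phased, equal-length sequences as input, and reports totals per sequence rather than per site. The function name `tajima_summary` and the four toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajima_summary(seqs):
    """Return (S, theta_w, pi, D) for equal-length aligned sequences.

    S       : number of segregating sites
    theta_w : Watterson estimate of the population mutation parameter, S / a1
    pi      : mean number of pairwise differences
    D       : Tajima's D (None when it is undefined)
    Divide theta_w and pi by the sequence length for per-site values.
    """
    n = len(seqs)
    length = len(seqs[0])
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1
    # The variance term vanishes for n < 4, and D is undefined when S == 0.
    if n < 4 or seg_sites == 0:
        return seg_sites, theta_w, pi, None
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return seg_sites, theta_w, pi, (pi - theta_w) / sqrt(variance)


# Hypothetical toy alignment of four chromosomes
toy = ["ACGT", "ACGA", "ACTA", "TCTA"]
S, theta_w, pi, D = tajima_summary(toy)
print(S, round(theta_w, 3), round(pi, 3), round(D, 3))
```

A negative D arises when π is smaller than θ, that is, when there is an excess of rare variants relative to the neutral expectation.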
| Results |
|---|
We also determined "nucleotide diversity," a quantitative measure of genetic variation, adjusted for allele frequency, in all four ethnic groups by calculating θ, a population mutation measure that is theoretically equal to the neutral mutation parameter, and π, the average heterozygosity per site (Maring et al., 2005). Tajima's D values were also estimated as a test of the neutral mutation hypothesis (Table 2). For both genes, as anticipated, samples from AA subjects displayed the greatest nucleotide diversity. Although negative values for Tajima's D indicate a departure from neutrality, none of the values listed in Table 2 was statistically significant.
Haplotype and Linkage Disequilibrium Analysis. We also performed population-specific linkage disequilibrium and haplotype analysis for both genes. Haplotype designations were based on the encoded amino acid sequence of the allozyme, with the WT sequence designated as *1. For example, for DCK, the *2, *3, and *4 designations were used for sequences that encoded Val24, Gly119, and Ser122, respectively (Table 3). Alleles encoding DCK Val24, Gly119, and Ser122 as well as Val24/Ser122 and Val24/Gly119/Ser122 variant allozymes were also present. Therefore, we designated these multiple amino acid variant haplotypes as *5 and *6, respectively. *1A was observed with high frequency in all four populations, whereas the *4 haplotype was observed in only AA and HCA subjects. The *2, *3, *5, and *6 haplotypes were observed only in AA DNA (Table 3). For CMPK, the *2, *3, and *4 designations were used for sequences that encoded His48, Lys75, and Ser83, respectively. Combinations of these cSNPs were not observed or inferred in any of the DNA samples. CMPK haplotypes *1B, *1C, *1H, *1L, and *2A were observed in all populations, whereas the *3 haplotype was only seen in MA subjects (Table 3). We also calculated population-specific linkage disequilibrium based on pairwise D' values for all DCK and CMPK SNPs (data not shown).
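For reference, pairwise D' for two biallelic sites can be sketched as below. This is a simplified illustration, not the authors' procedure: it assumes that the haplotype frequency is already known (in the study, haplotypes were inferred with an expectation maximization algorithm), and it returns the absolute value of D'. The function name `d_prime` and the example frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute D' for two biallelic sites.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first site
    p_b  : frequency of allele B at the second site
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic site carries no LD information
    return abs(d) / d_max


# Two rare alleles that always occur on the same chromosome: complete LD
complete_ld = d_prime(p_ab=0.1, p_a=0.1, p_b=0.1)
# Haplotype frequency equal to the product of allele frequencies: no LD
no_ld = d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)
print(round(complete_ld, 3), round(no_ld, 3))
```

Dividing D by its maximum attainable value for the observed allele frequencies is what makes D' less dependent on allele frequency than D itself.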
Functional Genomic Studies. Expression constructs for the WT and variant allozymes for both DCK and CMPK were created to determine the effect of nonsynonymous SNPs on level of protein, level of enzyme activity, and substrate kinetics. Because our haplotype analysis had shown the presence of multiple variant DCK alleles, which encoded Val24/Ser122 and Val24/Gly119/Ser122, we also created expression constructs for those allozymes. Although the DCK double variants, Gly119/Ser122 and Val24/Gly119, were not observed in our samples, we also created those variants to help us understand the effect of combinations of the polymorphisms on enzyme activity. However, before expressing the CMPK constructs, one issue required clarification. CMPK contains two in-frame ATGs at the 5'-end of the gene, creating confusion with regard to which might be used in vivo as the translation initiation codon. To address that issue before studying the results obtained with our expression constructs, we created four CMPK "test" expression constructs to determine which ATG might be used to initiate translation. These additional constructs included a construct with a mutated initial ATG (Mut ATG1/WT ATG2), a construct with the second ATG mutated (WT ATG1/Mut ATG2), a construct with a perfect Kozak sequence (Kozak, 1986) for the initial ATG and the wild-type second ATG (Kozak ATG1/WT ATG2), and a final construct with a perfect Kozak sequence surrounding the initial ATG and a mutated second ATG (Kozak ATG1/Mut ATG2) (Fig. 2). The sequences of the ATG in these constructs are listed in Table 4. The top panel in Fig. 2 shows this situation diagrammatically, with the location of the two in-frame ATGs, separated by 99 nucleotides (33 putative codons), indicated. When these constructs were expressed in COS-1 cells, the 196-amino acid sequence was always produced in the presence of WT ATG2, but the 228-amino acid sequence was only seen if a perfect Kozak sequence was created for the initial (Kozak ATG1/WT ATG2), putative, translational initiation codon (Fig. 2B). Identical results were observed during rabbit reticulocyte lysate expression, with the 196-amino acid recombinant protein encoded by only the ORF that began with naturally occurring "ATG2" being expressed (Fig. 2C). These results indicated clearly that the second in-frame ATG (ATG2) was used as the translation initiation codon, so all subsequent data relate to the 196-amino acid CMPK protein.
The DCK and CMPK expression constructs were transfected into COS-1 cells, and the cells were cotransfected with β-galactosidase to make it possible to correct for transfection efficiency. Immunoreactive protein levels were then measured by quantitative Western blot analysis, and levels of enzyme activity were measured in the same samples. Each transfection was performed in triplicate and was repeated three times. Levels of DCK and CMPK immunoreactive protein, as well as the enzyme activity for all of these transfections, are shown in Fig. 3. Because DCK and CMPK are both widely expressed, it was necessary to include an empty vector control in all transfections, and the results for those samples are shown in the figure for levels of enzyme activity. These empty vector control samples displayed an average 8.1% of WT construct values for DCK and 18.7% for CMPK. However, no evidence of endogenous immunoreactive protein for either DCK or CMPK was detected when the quantitative Western blot studies were performed.
Immunoreactive DCK protein levels showed significant differences compared with those for the WT protein, specifically Val24 (67 ± 7%), Gly119 (51 ± 7%), Ser122 (75 ± 7%), Val24/Gly119 (50 ± 7%), and the combination of Val24/Ser122 (60 ± 3%) (Fig. 3A). The "triple" variant (*6), which included Val24/Gly119/Ser122 (99 ± 7%), and the combination Gly119/Ser122 (85 ± 8%) did not show significant differences from the WT protein. DCK allozyme enzyme activity levels compared with those for the WT were Val24 (70 ± 11%), Gly119 (42 ± 7%), Ser122 (66 ± 15%), Val24/Gly119/Ser122 (105 ± 7%), Gly119/Ser122 (82 ± 8%), Val24/Gly119 (44 ± 4%), and Val24/Ser122 (32 ± 5%). These results for activity, with gemcitabine as a substrate, correlated significantly with corresponding protein levels (Rp = 0.89, P = 0.0004) (Fig. 3B). Recombinant CMPK protein levels did not differ significantly from those for the WT for His48 (106 ± 9%) or Lys75 (88 ± 10%). However, the protein level for CMPK Ser83 was significantly lower than that for the WT (68 ± 11%) (Fig. 3C). Once again, levels of protein and enzyme activity were very similar (Rp = 0.82), but, in this case, the correlation was not significant (P = 0.095).
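The protein-activity relationship for DCK can be explored with a plain Pearson correlation over the mean values quoted above. This is only a sketch: it uses the seven variant means as given in the text, whereas the reported Rp may have been computed from a different set of points (for example, including WT or individual replicates), so it is not expected to reproduce the published value. The names `pearson_r` and `dck_means` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# (protein % of WT, activity % of WT): means quoted in the text for DCK
dck_means = {
    "Val24": (67, 70),
    "Gly119": (51, 42),
    "Ser122": (75, 66),
    "Val24/Gly119": (50, 44),
    "Val24/Ser122": (60, 32),
    "Gly119/Ser122": (85, 82),
    "Val24/Gly119/Ser122": (99, 105),
}
protein = [p for p, _ in dck_means.values()]
activity = [a for _, a in dck_means.values()]
r = pearson_r(protein, activity)
print(f"r = {r:.2f}")
```

Val24/Ser122 is the point that departs most from the trend (60% protein but 32% activity), which fits the observation that it was the only allozyme with an altered apparent Km.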
Because of the possibility that some of the differences in levels of enzyme activity that we had observed might have been due to alterations in substrate kinetics as a result of changes in the encoded amino acids, substrate kinetic studies were also performed. For those experiments, gemcitabine substrate concentrations for the DCK studies ranged from 1.25 to 75 µM and gemcitabine monophosphate concentrations for the CMPK studies ranged from 50 to 2000 µM. The apparent Km values observed are listed in Table 5. Only the DCK Val24/Ser122 apparent Km value differed significantly from that for the WT of the enzyme being studied.
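As an illustration of how an apparent Km can be estimated from such a concentration series, the sketch below applies a Hanes-Woolf linearization of the Michaelis-Menten equation, in which [S]/v plotted against [S] has slope 1/Vmax and intercept Km/Vmax. The fitting method is an assumption, since the text does not state which one was used, and the data are synthetic: they are generated from a hypothetical enzyme with Km = 250 µM at concentrations chosen within the stated CMPK range.

```python
def hanes_woolf_fit(substrate, velocity):
    """Estimate (Km, Vmax) by least-squares regression of [S]/v on [S]."""
    ratio = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mean_s = sum(substrate) / n
    mean_r = sum(ratio) / n
    slope = (
        sum((s - mean_s) * (r - mean_r) for s, r in zip(substrate, ratio))
        / sum((s - mean_s) ** 2 for s in substrate)
    )
    intercept = mean_r - slope * mean_s
    vmax = 1.0 / slope       # slope = 1 / Vmax
    km = intercept / slope   # intercept = Km / Vmax
    return km, vmax


# Synthetic, noise-free data (hypothetical): Km = 250 µM, Vmax = 1.0 arbitrary units
true_km, true_vmax = 250.0, 1.0
conc_um = [50.0, 100.0, 250.0, 500.0, 1000.0, 2000.0]
rates = [true_vmax * s / (true_km + s) for s in conc_um]

km_est, vmax_est = hanes_woolf_fit(conc_um, rates)
print(f"apparent Km = {km_est:.1f} µM, Vmax = {vmax_est:.2f}")
```

With real, noisy measurements a direct nonlinear fit of the Michaelis-Menten equation is usually preferred, because linearizations distort the error structure.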
DCK and CMPK Structural Models. High resolution X-ray crystal structures of human DCK and CMPK (Sabini et al., 2003; Segura-Peña et al., 2004) were used as starting scaffolds to map variant amino acids encoded by the naturally occurring nonsynonymous cSNPs in DCK and CMPK. We then assessed the computationally substituted variant residues for compatibility with the wild-type native protein structures to determine whether our observations with recombinant protein were plausible within the context of the structures of these enzymes (Fig. 4). The DCK monomer structure has a central β-sheet surrounded by helices. Ile24 is located in a β-strand which is first in the amino acid sequence, but physically in the middle of the β-sheet, and the Ile24 side chain is buried in a hydrophobic environment. Because the Val24 variant amino acid is smaller than Ile24, it could be accommodated structurally, but a small void resulting from the change in amino acid could result in a locally destabilizing conformational change. Ala119Gly is located in a loop near the N-terminus of a strand next to the Ile24 strand in the central β-sheet and far from the active site (~25 Å). The Ala119 carbonyl oxygen atom forms a hydrogen bond with the Arg20 side chain and tethers the Ala119 loop to the Ile24-containing strand. The Ala119 amide nitrogen atom also interacts with the carbonyl oxygen of Leu116 to stabilize the loop conformation. In addition, the Ala119 side chain is partially buried, as it packs against Pro122. The smaller Gly119 variant could be easily accommodated in the structure sterically, but Gly119 might introduce an increase in local flexibility because of its lack of a bulky side chain, altering the local conformation so that interactions between this loop and adjacent β-strands would be destabilized, resulting in the striking decrease in enzyme protein that we observed (Fig. 3A). DCK Pro122Ser is located at the N-terminus of the strand next to the Ile24 strand in the central β-sheet and distant from the active site (~22 Å). The Pro122 side chain is mostly buried in a hydrophobic environment formed by several residues, one of which is Ala119. In addition, the main chain Pro122 carbonyl oxygen atom forms a long hydrogen bond with the Lys22 main chain amide nitrogen (located in the adjacent Ile24 strand). The smaller Ser122 could be easily accommodated sterically, but it could introduce increased local flexibility as a result of loss of the rigid proline ring. This increased flexibility might alter the local conformation so that interactions between the Pro122 strand and the adjacent Ala119 loop and Ile24 strand would be destabilized. This analysis provides structural explanations for how the single DCK variants may result in the observed decreased protein quantity and activity. Unfortunately, it is not possible to reliably model the specific altered conformations of the single variants to provide a basis for a similar structural analysis of the double or triple variants.
All three CMPK variant residues are located far from the active site (Fig. 4B). However, Gln48His alters a highly exposed residue that forms a hydrogen bond with Arg74. His48 would be easily accommodated sterically, and it would allow the hydrogen bond to be retained. Glu75Lys affects an exposed residue that interacts only with a water molecule, so the Lys substitution would be accommodated easily. Asn83Ser also alters an exposed residue, and the Ser substitution would be easily accommodated. The structural predictions for these CMPK residues are compatible with the observed functional effects of all three SNPs, indicating modest to no consequences after variant substitution (Fig. 3C). Overall, the three DCK variant substitutions were less compatible with the native protein structure than were the three CMPK variants, results consistent with our experimental observations that the decreases in protein and activity levels for the DCK variants were more significant than those for the CMPK variants.
| Discussion |
|---|
In the present study, we set out to perform comprehensive studies of DCK and CMPK pharmacogenomics. We began by resequencing both genes using 240 DNA samples from four different ethnic groups. We observed a total of 28 SNPs in DCK, 16 of which were novel. Joerger et al. (2006) had previously discovered 6 DCK SNPs in their resequencing study of a mixed ethnic cohort of healthy volunteers (–243G>T, –135G>C, 261G>A, 364C>T, 727A>C, and IVS 6 + 41T>A). We observed two of those SNPs, 364C>T and IVS 6 + 41T>A, but not the other four in our samples. Recently, Lamba et al. (2007) studied polymorphisms in DCK, using genomic DNA from the 30 Centre d'Etude du Polymorphisme Humain (CA) and 30 Yoruban population (African) trios (90 subjects of each ethnicity in groups of three that included related parent-child trios) that had been used in the HapMap Project. Those investigators resequenced DCK exons and all of the first intron plus an additional 300 base pairs of the 5'-FR beyond what we resequenced. In the studies reported here, we have extended DCK resequencing to include previously unreported ethnic populations (HCA and MA) and performed haplotype analysis, genetic diversity determination, neutrality testing, and linkage disequilibrium analysis for all four (AA, CA, HCA, and MA) ethnic groups.
To determine the effect of nonsynonymous SNPs on protein and activity levels, eight DCK expression constructs were created, four of which included more than one SNP, three of which were double variants (Val24/Gly119, Val24/Ser122 and Gly119/Ser122), and one of which was a triple variant (Val24/Gly119/Ser122). Levels of immunoreactive protein and allozyme activity were determined when these constructs were used to transfect COS-1 cells. The common DCK 24A>G (Ile24Val) and DCK 122C>T (Pro122Ser) polymorphisms resulted in moderate decreases in levels of both activity and protein. The correlation between levels of immunoreactive protein and allozyme activity for all DCK variant allozymes was highly significant (Fig. 3B). It was of interest that the DCK "compound variants" with two or three of the naturally occurring variant amino acids displayed less striking alterations in activity and protein levels than did those with only a single variant amino acid (Fig. 3A). The mechanism(s) responsible will have to be the subject of future studies, but these results clearly indicate that the functional effects of nonsynonymous SNPs cannot be assumed to be purely "additive." Our substrate kinetics analyses indicated that, except for the compound variant Val24/Ser122, no variant DCK allozyme had a Km value significantly different from that of the WT. Our results contrasted with those of Lamba et al. (2007) who reported that not only were "single variant" DCK allozymes expressed at levels equivalent to those of the WT but that Gly119 and Ser122 had significantly lower Km values than the WT. However, no direct comparison of those results with our data can be made accurately, as mammalian COS-1 cells were used in our studies for the expression of DCK allozymes to ensure appropriate post-translational modification of expressed proteins as well as the presence of mammalian protein degradation systems, as opposed to the bacteria used by Lamba et al. In addition, we used the pyrimidine analog gemcitabine (2',2'-difluorodeoxycytidine) as the substrate in our deoxycytidine kinase kinetics studies, as opposed to the purine analog cladribine (2-chloro-2'-deoxyadenosine) used by Lamba et al.
The CMPK cDNA presented an interesting challenge as it has two possible in-frame translational start codons, one encoding a 228- and the other a 196-amino acid protein, assuming that the N-terminal methionine is retained (Liou et al., 2002). To determine which ATG was used in vitro, we mutated each ATG in turn. The results shown in Fig. 2 indicated that the 196-residue protein was that expressed in mammalian cells, consistent with previous reports (Bucurenci et al., 1996; Liou et al., 2002). Therefore, this 196-amino acid protein was used in our recombinant variant allozyme studies. We identified 17 novel CMPK SNPs among the 28 observed. Functional genomic studies of variant allozymes after COS-1 cell transfection showed that protein levels for the CMPK His48 and Lys75 allozymes were similar to that for the WT allozyme. However, activity and protein levels for CMPK Ser83 were decreased to 78 and 67% of those for the WT, respectively (p < 0.05) (Fig. 3). Our apparent Km value of 250 µM for recombinant WT CMPK with gemcitabine monophosphate as a substrate can be compared with a value of 450 µM that Van Rompay et al. (1999) observed using the same substrate. Analysis of the CMPK structure was compatible with our observation that these CMPK variants failed to alter apparent Km values because they were far from the active site.
In summary, we have performed comprehensive pharmacogenomic studies of genes encoding two enzymes required for the metabolic activation of antineoplastic cytidine antimetabolites. We identified a series of novel SNPs for these two genes in four ethnic groups by resequencing these genes, followed by comprehensive functional genomic characterization of variant allozymes. These observations may now contribute to translational pharmacogenomic studies of gemcitabine, AraC, and other cytidine antimetabolites.
| Acknowledgments |
|---|
| Footnotes |
|---|
N.A.K. and P.A. contributed equally to this work.
Article, publication date, and citation information can be found at http://dmd.aspetjournals.org.
ABBREVIATIONS: dFdC, 2'-deoxy-2',2'-difluorocytidine, gemcitabine; AraC, cytosine arabinoside; DCK, deoxycytidine kinase; CMPK, cytidine monophosphate kinase; dFdCDP, gemcitabine diphosphate; dFdCTP, gemcitabine triphosphate; AA, African-American; CA, Caucasian-American; HCA, Han Chinese-American; MA, Mexican-American; SNP, single nucleotide polymorphism; cSNP, coding single nucleotide polymorphism; PCR, polymerase chain reaction; 5'-FR, 5'-flanking region; MAF, minor allele frequency; ORF, open reading frame; WT, wild type; kb, kilobase(s).
The online version of this article (available at http://dmd.aspetjournals.org) contains supplemental material.
1 Current affiliation: University of Gazi, Faculty of Pharmacy, Department of Toxicology, Ankara, Turkey.
2 On leave of absence from Istanbul University, Pharmacy Faculty, Department of Biochemistry, Istanbul, Turkey.
3 Current affiliation: Inha University Hospital, Incheon, South Korea.
4 Current affiliation: Roswell Park Cancer Institute, Buffalo, New York.
Address correspondence to: Dr. Matthew M. Ames, Division of Medical Oncology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905. E-mail: ames.matthew{at}mayo.edu
| References |
|---|
rzu O, et al. (1996) CMP kinase from Escherichia coli is structurally related to other nucleoside monophosphate kinases. J Biol Chem 271: 2856–2862.