ACLARA BioSciences, Inc., Mountain View, California (M.T.C., M.P.);
and Protogene Laboratories, Inc., Menlo Park, California (D.D.,
F.F., L.S., T.B.)
It has become widely accepted that individual genetic variation is
a prime determinant in both disease susceptibility and toxic response
to therapeutic agents and xenobiotics. Emerging genetic sequence data
and phenotype association studies are expected to enable disease risk
prediction and guide subsequent therapeutic approaches in individual
cases. However, making a good match between an individual genetic
profile, disease risk prediction, and appropriate therapeutic
intervention will require genotyping many polymorphic sites in large
numbers of genes or single nucleotide polymorphism sites
throughout the genome. Additionally, each polymorphism will have to be
associated with a phenotype. Presumably, a composite phenotype may be
predicted by integrating anticipated contributions from each
polymorphism contributing to the complex genotype. Methods for
executing such large-scale genotyping studies are rapidly evolving and
becoming available. DNA microarray technology applied in
hybridization-based genotyping assays is particularly well suited to
respond to the accelerating pace of polymorphism discovery and the
associated demand for highly parallel genotyping capability.
Background
Nearly forty years ago, a dedicated cadre of
scientists observed that human response to xenobiotics is variable, and
they initiated lines of research that now form the foundations of
pharmacogenetics. During the last 10 years, in response to the Human
Genome Project, pharmacogenetics has undergone a revolution. It has now
expanded into the broader discipline of pharmacogenomics.
Pharmacogenomics encompasses the study of functional variability not
only in drug transport and metabolism but also in every aspect of human
genetics that affects drug disposition and response. This meeting's
agenda clearly reflects the impact the Human Genome Project is having on efforts to characterize variability in human response to xenobiotics.
During the past 10 years, the Human Genome Project has generated a
second revolution, one of a technical nature. Its product is the wide
variety of technological innovations that support pharmacogenomic
research. Pharmacogenetics at its outset primarily focused on
characterizing phenotypic drug responses in intact organisms using
biochemical or physical endpoints (Kalow, 1997). Today pharmacogenetics
has evolved into largely molecular characterization of drug responses
at the cellular level. Our new capability to broadly define human
variability at the genetic level and to associate genetic variability
with functional diversity has largely been enabled by this recent
technical revolution, including large scale functional genomic,
genetic, and proteomic analyses.
One seminal invention that has had a particularly transforming effect
on both genetic analysis and functional gene expression profiling is
the DNA microarray (Pease et al., 1994; Southern et al., 1994; Brown
and Botstein, 1999). Although there are several methods for assembling
DNA arrays, they all have a common origin in the DNA blotting methods
pioneered by Southern in the early 1970s (Southern, 1975). The common
elements of this approach to nucleic acid analysis are an immobilized
or tethered nucleic acid (DNA or RNA) species that is hybridized with a
second, solution phase DNA or RNA species. The sequence of the unknown
"target" nucleic acid is discerned by decoding its complementarity
with the nucleic acid "probe" of known sequence. Whether the probe or target nucleic acid is immobilized varies among the different array
methods, but most commonly, the probe is tethered to a surface, and the
target to be analyzed is in solution. This sequence analysis method is
broadly useful. It has been applied to detecting polymorphic forms of
consensus sequences, detecting and quantitating RNA transcripts, and
scoring genomic samples for specific, known polymorphic sequences, among
other examples (Cronin et al., 1996; Hacia et al., 1996; Lockhart et
al., 1996).
DNA microarrays are innovative in that they apply the principle of
hybridization analysis on a physically miniaturized scale while at the
same time vastly expanding parallel sampling and analysis capability.
Micro chemical synthesis and robotic microfluidic delivery techniques
permit manufacture of hybridization arrays capable of performing
hundreds of thousands of parallel analyses on a single sample in a
single assay. This makes it possible to scan an entire genome for known
polymorphic variants or to query a cell type for every messenger RNA
transcript expressed during the course of an experiment (DeRisi et al.,
1997; Heller et al., 1997; Wodicka et al., 1997; Wang et al., 1998).
Measurements of this scope are yielding vast databases of information
that provide high-resolution snapshots of cellular activity or
comprehensive images of genetic complexity for an entire tissue or
organism. Direct comparison of parallel tests on many organisms or
tissues permits population-wide genetic complexity to be measured or
population-wide polymorphism distributions and allele frequencies to be assessed.
Designing hybridization arrays capable of yielding large quantities of
high quality genetic information is as big a challenge as analyzing the
complex data derived from these arrays. The sheer magnitude of the
probe set size in array-based assays makes traditional probe design and
quality control strategies ineffective. Approaches to probe and primer
design developed for PCR primer selection
have not proven to be very effective for selecting sets of tethered
probes for hybridization array-based assays. The ideal hybridization
array probe set is one in which a single hybridization experiment
provides maximum data quantity of the highest possible quality.
Achieving this goal requires hybridization-based assays to be optimized
and validated just as any other clinical laboratory test must be. To
develop a functional microarray genotyping test, several components
must be optimized and integrated, and then performance must be
validated. The components include 1) the array probe set, which is
selected based on the information needed from the hybridization test;
2) the target preparation method, which includes isolation from the
sample source, amplification, and labeling; 3) a detection method that
must be coordinated with the target labeling strategy chosen; and
finally 4) an automated data analysis method.
Current array technologies are not generally compatible with efficient
assay performance optimization and validation due to high economic
penalties associated with manufacturing iterative and customized array
designs. A DNA microarray format is described here that overcomes this
limitation by combining in situ oligonucleotide synthesis with
on-the-fly design capability enabled by surface tension chemistry
localization. The standard, high-coupling yield phosphoramidite
synthesis chemistry it uses supports oligonucleotide manufacturing
quality comparable with traditional column-based oligonucleotide
synthesis. The combined result of using traditional chemistry and
surface tension localization is a unique array technology capable of
supporting efficient, economic assay design optimization for any
customized analysis application. Surface tension DNA microarrays are an
array platform that enables research to be done on how arrays
themselves are best configured; consequently, for the first time the
full potential of DNA microarrays can be exploited in specific analysis applications.
Despite significant technical innovation in methods for synthesizing,
immobilizing, and optimizing DNA oligonucleotide collections into array
formats, equally significant hurdles remain to be overcome before this
analytical method is capable of supporting routine, reliable,
high-throughput assays. Most of the challenges center on developing
effective methods for optimizing and standardizing array hybridization
performance. Sources of variability in array hybridization assays
include the purity of in situ synthesized or immobilized
oligonucleotides, efficiency of in situ synthesis or DNA
oligonucleotide immobilization chemistries, oligonucleotide density and
availability for hybridization, the effects of the array surface
chemistry on hybridization, and the relative uniformity of
hybridization efficiency for all the oligonucleotides in an array.
Arrays are effective research tools at this stage, but new commercial
synthesis, quality assurance, and quality control strategies
will have to be devised to make them useful in a clinical setting.
Although arrays still hold promise as the tool of choice for
high-density, high-throughput genotyping, other technologies remain as
candidates to supplement or supplant this technology. Today genotyping
continues to be supported by sequencing or primer extension type
biochemistries coupled with gel- or capillary-based separation.
Capillary electrophoresis offers speed and economy over gels but is
already evolving from traditional fused silica capillary systems to
microfluidic electrophoresis systems called "lab-on-a-chip"
devices. The high degree of parallelism and miniaturization possible
with these devices may compete quite effectively with hybridization
arrays. Microfluidic chip devices have the additional attribute of
being modular and adaptable to the continuously evolving landscape of
genomic and genetic information that must be sampled. These devices are
inherently more manufacturable and robust since they combine well
developed manufacturing methods with robust, well developed
biochemistries such as PCR, primer extension, and sequencing. It is not
likely that any single technology will have the attributes necessary to
satisfy the demands of every segment of the diverse genomics community;
however, arrays, whether they are hybridization arrays or microfluidic
arrays, are sure to play a significant role.
In a demonstration project, the polymorphic human NAT2
gene, a biomarker for cancer and drug metabolism, was selected
as a model system to develop general strategies for designing and
optimizing DNA microarrays. The goal was to develop homozygous and
heterozygous DNA genotyping tests to yield a maximum of high-confidence
genotype assignments. The approaches developed using this model can be generalized to design hybridization arrays of virtually any complexity for genotyping and other applications, including gene expression profiling. Furthermore, once an array is designed, it can be expanded in scope and complexity without sacrificing the development invested in
the original array.
Materials and Methods
The scope of the genotyping assay design process includes the
following steps: 1) selecting the polymorphisms to be genotyped; 2)
collecting the sequence contexts around each polymorphism; 3) selecting
the target labeling and hybridization conditions; 4) designing the
hybridization probes that discriminate among the possible genotypes; 5)
testing the assay performance with samples of known genotype; and 6)
redesigning the probe set until the maximum genotyping performance is
obtained under the chosen conditions. Once this process is complete,
final test validation requires that an unknown sample set be genotyped
using the microarray assay and that the results be confirmed using a
reference genotyping method such as direct Sanger sequencing.
Surface tension DNA array synthesis is the combination of two
processes, substrate surface preparation and in situ DNA synthesis. The
substrate preparation begins with glass cleaning followed by spin
coating with a layer of photoresist. Photolithographic patterning with
a mask to define the desired size and distribution of the array
features follows spin coating. This process is illustrated in Fig. 1.

Fig. 1. Schematic illustrating the initial steps required for surface
tension patterning the array substrates. A photoresist spin coated onto
a glass surface is patterned by exposing it to light through a mask.
Resist on the exposed surface is developed, leaving the intended array
features protected and the rest of the surface as bare glass.
The patterned arrays are developed, then immersed in a solution of
fluorosilane to generate a hydrophobic silane layer surrounding the
array features, which are still protected with photoresist. After the
fluorosilane is cured, acetone treatment removes the remaining
photoresist, exposing the array feature sites. The features are coated
with an aminosilane, then coupled with a linker molecule that will
support subsequent DNA oligonucleotide synthesis. These processed
substrates display the surface tension behavior for aqueous and polar
organic solvents shown in Fig. 2.

Fig. 2. Once the glass surface is coated with fluorosilane, the
remaining photoresist is removed and the array features are coated with
an aminosilane. Reagents delivered to the surface are repelled by the
fluorosilane and retained in the array features by surface tension.
The surface tension patterned substrates are aligned on a chuck in the
robotic array synthesizer where piezoelectric nozzles are used to
deliver solutions of activated standard DNA synthesis amidites as shown
in Fig. 3. Washing, deblocking, capping,
and oxidizing reagents are delivered by bulk flooding the reagent onto
the substrate surface and spinning the chuck mount to remove excess
reagents between reactions. The substrate surface is environmentally protected throughout the synthesis by a blanket of dry
N2 gas. Localizing and metering amidite delivery
is mediated by a computer command file that directs delivery of the
four amidites during each pass of the piezoelectric nozzle bank so the
predetermined oligonucleotide is synthesized at each array coordinate.
Array design iterations are accomplished by altering this synthesis command file.
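The role of the synthesis command file can be illustrated with a minimal Python sketch. It is not the synthesizer's actual file format, which is not described here. The function name `build_delivery_schedule`, the `layout` dictionary, and the coordinate convention are all hypothetical, and the sketch assumes that synthesis proceeds 3' to 5' from the surface, as is usual for standard phosphoramidite chemistry.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
Coord = Tuple[int, int]


def build_delivery_schedule(layout: Dict[Coord, str]) -> List[Dict[str, List[Coord]]]:
    """Return, for each coupling cycle, the features that receive each amidite.

    `layout` maps a (row, column) array coordinate to a probe sequence
    written 5'->3'. Assumption: the 3' base is coupled first, so each
    probe is read from its last base toward its first.
    """
    for coord, seq in layout.items():
        if not seq or set(seq) - set(BASES):
            raise ValueError(f"invalid probe sequence at {coord}: {seq!r}")

    n_cycles = max((len(seq) for seq in layout.values()), default=0)
    schedule = []
    for cycle in range(n_cycles):
        deliveries: Dict[str, List[Coord]] = {base: [] for base in BASES}
        for coord, seq in sorted(layout.items()):
            if cycle < len(seq):  # shorter probes simply finish earlier
                deliveries[seq[-1 - cycle]].append(coord)
        schedule.append(deliveries)
    return schedule


if __name__ == "__main__":
    # Two made-up probes of different lengths on a 1 x 2 array.
    for number, step in enumerate(build_delivery_schedule({(0, 0): "ACG", (0, 1): "TT"}), 1):
        print(number, step)
```

Under these assumptions, a design iteration amounts to editing the `layout` mapping and regenerating the schedule; no physical mask or presynthesized oligonucleotide has to change.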

Fig. 3. Activated phosphoramidite monomers are delivered to the array
features by piezoelectric nozzles under the control of a command file
that guides synthesis of the correct sequence at each array location.
Washing, deblocking, capping, and oxidizing reagents are flooded onto
the surface and centrifuged off between steps.
DNA array probe design was aimed at discriminating the seven
polymorphic sites in the human N-acetyltransferase gene
(NAT2, GenBank accession NM_000015) shown in Table 1. Specific
probes were designed by selecting sequences that overlapped each
polymorphic site and met additional design criteria such as length or
sequence composition. In each case, probe sets were assembled that
represented each polymorphism and sets of possible base pairing
mismatches for both the coding and noncoding DNA strands. Additional
5' and 3' probe sets were also selected for each polymorphism. An
example probe set in which all sequences have a constant length of 17
nucleotides is shown in Table 2.
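A probe set of this general kind can be sketched in Python as follows. The tiling pattern is an assumption made for illustration: five window offsets, four bases at the polymorphic position, and both strands, which happens to give 40 probes but is not claimed to reproduce the pattern of Table 2. The names `design_probe_set`, `Probe`, and the probe naming scheme are hypothetical, and the sequence in the usage example is made up rather than taken from NAT2.

```python
from typing import List, NamedTuple, Sequence

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Probe(NamedTuple):
    name: str      # "C" = coding strand sequence, "NC" = noncoding strand sequence
    sequence: str  # written 5'->3'


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_probe_set(
    context: str,
    site: int,
    length: int = 17,
    offsets: Sequence[int] = (-2, -1, 0, 1, 2),
) -> List[Probe]:
    """Tile fixed-length probes across a polymorphic site.

    `context` is the coding strand sequence around the polymorphism and
    `site` is the 0-based index of the polymorphic base within it. For
    every window offset, each of the four bases is placed at the site, so
    the set holds the matched probes for both alleles plus the mismatches.
    """
    probes = []
    for offset in offsets:
        start = site - length // 2 + offset
        end = start + length
        if start < 0 or end > len(context) or not start <= site < end:
            raise ValueError(f"window at offset {offset} does not fit the context")
        for base in "ACGT":
            window = context[start:site] + base + context[site + 1:end]
            # The base in the name always refers to the coding strand.
            probes.append(Probe(f"C_{offset:+d}_{base}", window))
            probes.append(Probe(f"NC_{offset:+d}_{base}", reverse_complement(window)))
    return probes


if __name__ == "__main__":
    made_up_context = "ACGT" * 7  # placeholder sequence, not NAT2
    for probe in design_probe_set(made_up_context, site=14)[:4]:
        print(probe.name, probe.sequence)
```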
TABLE 1. The seven common polymorphisms in the human NAT2 gene and the
combinations that define each of the most common alleles.
TABLE 2. A sample set of 40 oligonucleotide probes for the G191A
polymorphism taken from the first array design. Highlighted probe
sequences are perfectly complementary to both strands of the two
polymorphic forms of the NAT2 gene at the G191A polymorphic site. "C"
and "NC" in the probe name designate the coding and noncoding strand
sequences, respectively. Substituted nucleotide positions are in bold
type.
Hybridization targets were prepared by using PCR primers
(5'-GTCACACGAGGAAATCAAATGC-3' and 5'-GTTTTCTAGCATGAATCACTCTGC-3') that
amplify a 1.2-kb fragment from genomic DNA that contains the entire 870-nucleotide, single coding exon of NAT2 as well as 5' and 3' noncoding
sequences. The PCR product was purified, nicked with DNase to generate
random fragments of about 50 to 100 nucleotides, and end-labeled with
biotin-2',3'-dideoxy-ATP. This product was hybridized to the
microarrays under stringent, discriminating conditions. Following
washing, the biotin-labeled targets were stained with cyanine-3
fluorescent dye-streptavidin conjugates, and the array was covered with
a microscope slide coverslip before fluorescence imaging.
Results
The first array design investigated was constructed with probes
set at a constant 17-mer length. Twenty probes selected to identify
each polymorphism on the coding strand and 20 probes on the noncoding
strand were designed following the pattern shown in Table 2.
Hybridization with samples homozygous for the *4 NAT2
allele (Table 1), confirmed by Sanger sequencing, showed that
only a subset of these probes hybridized with enough stability to
provide adequate fluorescence intensity for reliable detection. The
average discrimination ratio of each *4 polymorphism over the alternate
polymorphism was about 4:1. Alternatively, when a microarray was
designed using probes chosen to have a common algorithmically
calculated melting temperature (Tm, the temperature at which 50% of a
DNA duplex denatures) of 63 ± 2°C (Breslauer et al., 1986; Rychlik and
Rhoads, 1989), all of the probes hybridized well enough to provide
adequate fluorescence intensity, but the average discrimination ratio
for the *4 polymorphisms decreased to approximately 3:1. The next array
design incorporated probes from this set that were empirically
lengthened and shortened to maximize the discrimination ratio of
hybridization signals between the two polymorphisms. After several
rounds of empirical optimization, this strategy resulted in a
microarray with an average discrimination ratio for the *4
polymorphisms of greater than 6:1. A series of typical images showing
this progression of array designs is shown in Fig. 4. Progressive
global improvements in array discrimination ratios are summarized in
Table 3.
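One way to compute a discrimination ratio from probe fluorescence intensities, and to turn it into a genotype call, is sketched below in Python. The exact calculation used for the arrays is not given in the text, so this is an assumption: the ratio is taken as the mean background-corrected signal of the probes matching one allele divided by that of the probes matching the alternate allele. The function names and the cutoff values (3.0 and 1.5) are hypothetical and would have to be set from validation data.

```python
from statistics import mean
from typing import Sequence


def discrimination_ratio(
    match_signals: Sequence[float],
    alternate_signals: Sequence[float],
    background: float = 0.0,
) -> float:
    """Mean matched-probe signal divided by mean alternate-probe signal."""
    match = mean(match_signals) - background
    alternate = mean(alternate_signals) - background
    if match <= 0 or alternate <= 0:
        raise ValueError("signals must exceed background to form a ratio")
    return match / alternate


def call_genotype(
    ratio: float,
    homozygous_cutoff: float = 3.0,  # hypothetical threshold
    heterozygous_max: float = 1.5,   # hypothetical threshold
) -> str:
    """Classify one polymorphic site from its discrimination ratio.

    Ratios near unity are read as heterozygous, large or small ratios as
    homozygous for one allele or the other, and anything between as no call.
    """
    if ratio >= homozygous_cutoff:
        return "homozygous reference"
    if ratio <= 1.0 / homozygous_cutoff:
        return "homozygous variant"
    if 1.0 / heterozygous_max <= ratio <= heterozygous_max:
        return "heterozygous"
    return "no call"


if __name__ == "__main__":
    ratio = discrimination_ratio([400, 420, 380], [100, 100, 100])
    print(ratio, call_genotype(ratio))
```

With these placeholder thresholds, a 4:1 ratio is called homozygous, while a ratio of 2:1 falls between the two windows and is left uncalled, which is one reason larger homozygous ratios are desirable.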

Fig. 4. Typical images taken from the series of array designs showing
the progressive improvement of hybridization discrimination for
specific positive probes relative to closely related mismatched probes.
Tm, temperature at which 50% of a DNA duplex denatures.
TABLE 3. Summarized characteristics for each of six array designs for
NAT2 genotyping. Average probe length, calculated Tm, and %GC
composition are given for each polymorphism for each design as well as
for the mismatch probes. Average discrimination ratios were calculated
for each polymorphism for the heterozygous and homozygous cases.
Ideally, heterozygote results should approach unity, and the homozygous
cases should be as large as possible.
Discussion
Oligonucleotide microarrays have most commonly been used to
profile gene expression patterns; however, they are now increasingly
used for genotyping applications. Experience with reverse dot blot
membranes, which have a long history of being used for genotyping
applications, shows that it is challenging to optimize oligonucleotide
probes to achieve global maximum discrimination of many genetic
variants simultaneously. This problem is amplified considerably in
complex DNA oligonucleotide hybridization arrays. Nevertheless, it is a
challenge that must be met to take full advantage of DNA array
potential in genotyping applications. In situ synthesized, surface
tension-defined DNA oligonucleotide microarrays offer new options
in approaching this problem. Because surface tension arrays do not
require complex lithographic masking schemes or large libraries of
presynthesized, chemically modified oligonucleotides, there is no
economic penalty for testing multiple array design options to arrive at
an optimal set of detection probes by an empirical process. This
process has been demonstrated by developing a simple genotyping
microarray assay modeled on the polymorphic human NAT2 gene.
The process described here can be generalized and captured by using
algorithms, then reapplied as an automated "intelligent" system to
design microarray probe sets based on the polymorphisms to be detected
and their local sequence compositions. Although it is unlikely that
algorithms will be 100% successful in anticipating optimal probe sets,
they are likely to offer a close approximation that can be completed using
empirical optimization. In addition, since hybridization probe
performance is very dependent on hybridization conditions and
discrimination performance is dependent on labeling and detection
strategies, it is also likely that these variables will introduce a
need for some additional empirical array design optimization before a
fully optimized assay is possible. Once a probe set has been selected
to detect a given set of polymorphic variants, new polymorphisms are
easily added to the set by reiterating essentially the same process.
The original probe set is kept constant, and a new candidate set of
matched probes is selected for the new set of
polymorphisms. Once these probes are fine tuned using the same
empirical process that guided selection of the original probe set, they
can be added, generating an expanded, optimized hybridization array. In
this way, hybridization assays of great complexity can be assembled for
increasingly large sets of polymorphisms by a standard, reproducible method.
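The algorithmic first pass described above can be sketched in Python as follows. The idea shown is to choose, for each polymorphic site, the probe length whose estimated melting temperature (Tm) is closest to a common target, leaving fine tuning to empirical optimization. This is a simplified stand-in: it uses the Wallace rule (2°C per A or T, 4°C per G or C) rather than the nearest-neighbor calculation cited in the Results, so its Tm values are not comparable with the 63°C figure given there. The function names, length limits, and target value are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate in degrees C for a short oligonucleotide (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only A, C, G, and T")
    return 2 * at + 4 * gc


def pick_probe_for_tm(
    context: str,
    site: int,
    target_tm: float,
    min_len: int = 13,  # hypothetical length limits
    max_len: int = 25,
) -> str:
    """Return the window around `site` whose estimated Tm is nearest the target.

    `context` is the sequence around the polymorphism and `site` is the
    0-based index of the polymorphic base. Windows are kept roughly
    centered on the site and grow one nucleotide at a time.
    """
    best_error, best_probe = None, None
    for length in range(min_len, max_len + 1):
        start = site - length // 2
        end = start + length
        if start < 0 or end > len(context):
            break  # longer windows would run off the available context
        probe = context[start:end]
        error = abs(wallace_tm(probe) - target_tm)
        if best_error is None or error < best_error:
            best_error, best_probe = error, probe
    if best_probe is None:
        raise ValueError("sequence context too short for the minimum probe length")
    return best_probe


if __name__ == "__main__":
    made_up_context = "ACGT" * 10  # placeholder sequence, not NAT2
    probe = pick_probe_for_tm(made_up_context, site=20, target_tm=52)
    print(probe, len(probe), wallace_tm(probe))
```

Because the original probe set is kept constant when new polymorphisms are added, a function like this would only be run for the new sites, and its output would serve as the starting point for the same empirical lengthening and shortening described in the Results.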
We thank Dr. Wendell Weber, University of Michigan, for providing us
with the NAT2 *4 homozygous sample; and Francois Chatelain, John Butler, and Albrecht Frauendorf, our colleagues at
Protogene, for assistance with substrate preparation and array production.