Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues
Introduction
Genome-wide expression profiling with DNA microarrays has been widely used to identify new cancer subtypes and expression signatures associated with prognosis [1], [2], [3], [4], [5]. Expression data for thousands of tumor samples, each characterized by the expression levels of up to ∼40,000 transcripts, are rapidly accumulating in public repositories (reviewed in [6]). Owing to technological limitations and the inherent complexity of gene regulatory mechanisms, such data are often noisy and highly multivariate, which makes them difficult to interpret.
Besides robust computational tools, integrating additional biological information about genes is essential for uncovering the molecular mechanisms underlying expression profiles. For example, functional categories from the Gene Ontology (GO) consortium [7], the KEGG database of molecular interaction pathways [8], and promoter sequences from the genome play important roles in interpreting clusters of genes defined by expression profiling. In this paper, we introduce another kind of information: each gene's expression pattern across a panel of normal tissues.
Only a small portion of the 30,000–40,000 protein-coding genes [9], [10] in the human genome are essential to the survival of individual cells and are hence constitutively expressed across different tissue types [11]. Transcription of most genes is regulated during cell differentiation and is therefore often highly variable among tissue/cell types and developmental stages. While ubiquitously expressed genes (so-called maintenance genes [11]) play key roles in basic cellular processes, tissue-specific genes are related to the functions of particular organs. Although it is still difficult to obtain expression profiles for the individual cell types that constitute normal organs, genome-wide expression profiling of bulk tissues has been carried out using serial analysis of gene expression (SAGE) and DNA microarrays [11], [12], [13], [14], [15], [16], [17]. For each gene, such studies define its breadth of expression (BOE) in normal tissues, which indicates where the gene is expressed under normal physiological conditions. Categorizing genes by BOE may therefore provide an additional source of information for deciphering the complex expression profiles observed in cancers.
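As a rough illustration of how BOE can be turned into gene categories, the sketch below counts the tissues in which a gene's signal reaches an expression threshold and labels the gene accordingly. The threshold (`EXPRESSION_THRESHOLD`), the cut-off for calling a gene tissue-specific (`SPECIFIC_MAX_TISSUES`), the function names, and the toy values are hypothetical choices for illustration, not the criteria used in this study.

```python
from typing import Dict

# HYPOTHETICAL parameters for illustration only; not the study's criteria.
EXPRESSION_THRESHOLD = 200.0  # signal at or above which a gene counts as "expressed"
SPECIFIC_MAX_TISSUES = 3      # expressed in at most this many tissues -> "tissue-specific"


def breadth_of_expression(profile: Dict[str, float],
                          threshold: float = EXPRESSION_THRESHOLD) -> int:
    """Number of tissues in which the gene's signal reaches the threshold."""
    return sum(1 for signal in profile.values() if signal >= threshold)


def categorize(profile: Dict[str, float]) -> str:
    """Assign a coarse BOE category to one gene's tissue profile."""
    boe = breadth_of_expression(profile)
    if boe == 0:
        return "not expressed"
    if boe == len(profile):
        return "ubiquitous"       # expressed in every tissue profiled
    if boe <= SPECIFIC_MAX_TISSUES:
        return "tissue-specific"
    return "intermediate"


# Toy profiles (made-up values for five tissues).
restricted = {"liver": 9000.0, "kidney": 50.0, "brain": 20.0, "lung": 30.0, "heart": 10.0}
broad = {tissue: 5000.0 for tissue in restricted}
print(categorize(restricted))  # -> tissue-specific
print(categorize(broad))       # -> ubiquitous
```

In practice, the thresholds would need to be calibrated to the array platform and normalization; a fixed signal cut-off is only the simplest possible choice.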
In this paper, we performed additional microarray experiments on normal tissues to search extensively for tissue-specific genes, and then systematically reanalyzed previously published DNA microarray data for various cancers. We used oligonucleotide microarrays to measure the expression of ∼20,000 transcripts in 3 fetal and 33 adult normal human tissues (the full list is given in Fig. 1A). Pooled RNA samples were used to maximize tissue coverage, which is important for defining tissue specificity. We retrieved data from a collection of previously published datasets of liver, brain, breast, and lung cancers, and then focused on genes that are specifically expressed in certain normal tissues but differentially expressed in tumors arising from the same anatomical sites. Our strategy is to create a small but carefully selected dataset of normal tissue expression profiles and use it as a seed for reanalyzing large datasets in the public domain.
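The gene-selection step described above amounts to intersecting two gene sets: genes specific to a normal tissue, and genes that vary among tumors from that tissue. The sketch below assumes, hypothetically, that "differentially expressed in tumors" is measured as a max/min fold range across tumor samples from one site; the fold cut-off (`min_fold`), the signal floor, the function names, and the toy gene names and values are all illustrative assumptions rather than the authors' procedure.

```python
from typing import Dict, List, Set


# HYPOTHETICAL criterion: a gene is "differentially expressed" among tumors
# when its max/min signal ratio across tumor samples reaches min_fold.
def variable_in_tumors(tumor_data: Dict[str, List[float]],
                       min_fold: float = 3.0,
                       floor: float = 1.0) -> Set[str]:
    """Return genes whose fold range across tumor samples is >= min_fold."""
    selected: Set[str] = set()
    for gene, values in tumor_data.items():
        # Floor the minimum so a zero or tiny signal cannot cause division by zero.
        low = max(min(values), floor)
        if max(values) / low >= min_fold:
            selected.add(gene)
    return selected


def select_candidates(specific_by_tissue: Dict[str, Set[str]],
                      tissue: str,
                      tumor_data: Dict[str, List[float]]) -> Set[str]:
    """Genes specific to `tissue` in normal samples AND variable in tumors of that site."""
    return specific_by_tissue.get(tissue, set()) & variable_in_tumors(tumor_data)


# Toy data (made-up gene names and values).
specific = {"liver": {"gene_a", "gene_b", "gene_c"}, "brain": {"gene_d"}}
liver_tumors = {
    "gene_a": [100.0, 900.0, 50.0],   # 18-fold range -> variable
    "gene_b": [200.0, 220.0, 210.0],  # 1.1-fold range -> stable
    "gene_c": [10.0, 400.0, 30.0],    # 40-fold range -> variable
    "gene_d": [5.0, 500.0, 10.0],     # variable, but not liver-specific
}
print(sorted(select_candidates(specific, "liver", liver_tumors)))
# -> ['gene_a', 'gene_c']
```

A statistical test across tumor groups, or a comparison against matched normal tissue, could replace the fold-range criterion; the set intersection stays the same.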
Expression profiling of normal tissues
Using the Affymetrix U133A array, we generated expression profiles of 36 common normal human tissues, each represented by a pooled RNA sample (see Supplementary Information for details). The raw data are available at our web site, http://www.genome.rcast.u-tokyo.ac.jp/normal/, and can also be queried through a graphical web interface at http://www.lsbm.org. After eliminating genes with little variation in expression among tissues (retaining only genes with max/min > 2 and max–min > 100; see Methods for details), we
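The variation filter above can be sketched as a per-gene test on its signals across tissues. This minimal version assumes that a gene is retained only when both conditions hold (max/min > 2 and max–min > 100) and that signals are non-negative; the function names and toy values are hypothetical, and the Methods section may handle details such as low or zero signals differently.

```python
from typing import Dict, List


def passes_variation_filter(values: List[float],
                            min_ratio: float = 2.0,
                            min_diff: float = 100.0) -> bool:
    """True if max/min > min_ratio and max - min > min_diff across tissues."""
    high, low = max(values), min(values)
    # For positive signals, high > min_ratio * low is equivalent to high / low > min_ratio;
    # the product form avoids dividing by a zero minimum.
    return high > min_ratio * low and (high - low) > min_diff


def filter_genes(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only genes (rows) that pass the variation filter."""
    return {gene: values for gene, values in matrix.items()
            if passes_variation_filter(values)}


# Toy signals across three tissues (made-up values).
toy = {
    "flat": [500.0, 520.0, 480.0],  # ratio ~1.08: fails
    "low": [10.0, 30.0, 15.0],      # ratio 3 but difference only 20: fails
    "var": [50.0, 400.0, 60.0],     # ratio 8, difference 350: passes
}
print(list(filter_genes(toy)))  # -> ['var']
```

Requiring both a fold change and an absolute difference guards against two failure modes: large ratios among near-background signals, and large absolute swings in very highly expressed genes that are proportionally small.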
Discussion
Through expression profiling of a spectrum of normal human tissues, we identified sets of tissue-specific genes and then studied their expression in cancers by analyzing a wealth of previously published DNA microarray datasets. By unsupervised clustering of the tissue-specific genes that were differentially expressed in tumors from the same anatomical site, we identified groups of coexpressed genes characteristic of different cell types within each organ, thereby revealing the cell lineages of tumor subtypes.
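To make the clustering step concrete, the sketch below groups genes whose profiles across tumor samples are correlated. The excerpt does not state which clustering algorithm was used, so this is an assumption: average-linkage agglomerative clustering with 1 − Pearson correlation as the distance, stopping at a hypothetical `max_distance` cut-off. The function names and toy profiles are illustrative only.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cluster_genes(profiles: Dict[str, Sequence[float]],
                  max_distance: float = 0.5) -> List[List[str]]:
    """Average-linkage agglomerative clustering with distance = 1 - Pearson r.

    Merging stops when the closest pair of clusters is farther apart than
    max_distance (a hypothetical cut-off).
    """
    genes = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in genes for b in genes if a != b}
    clusters: List[List[str]] = [[g] for g in genes]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster gene pairs.
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy profiles across four tumor samples (made-up values).
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.0],
}
print(cluster_genes(toy))  # -> [['g1', 'g2'], ['g3', 'g4']]
```

Correlation-based distance groups genes by the shape of their profiles rather than their absolute level, which suits the goal of finding coexpressed genes from one cell type whose abundance rises and falls together across tumors.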
Sample preparation
Twenty-five total RNA specimens were purchased from Clontech (Palo Alto, CA), Ambion (Austin, TX) and Stratagene (La Jolla, CA). To define breadth of expression accurately at a reasonable cost, we tried to cover as many tissue types as possible by using pooled RNA samples, each specimen representing one human organ. We used RNA samples pooled from 2 to 84 donors to minimize differences between individuals. However, many specimens from single donors were still included because of the difficulty
Acknowledgments
The authors are indebted to Hirokazu Taniguchi for help with tissue acquisition, Hiroko Meguro for technical assistance, Yoshitaka Hippo, Naoko Nishikawa, Chen Yongxin, and Guo Yongqiu for stimulating discussions and Jiang Fu for proofreading. This work was partially supported by Grants-in-Aid for Scientific Research (S) 16101006 from The Ministry of Education, Culture, Sports, Science and Technology, Japan (to H.A.), and Health and Labour Sciences Research Grants (to H.A.). This work has been
References (42)
- Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell (2002).
- Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet. (2003).
- Profiling gene expression using Onto-Express. Genomics (2002).
- Gene expression correlates of clinical prostate cancer behavior. Cancer Cell (2002).
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999).
- Gene expression profiling predicts clinical outcome of breast cancer. Nature (2002).
- Chemosensitivity prediction by transcriptional profiling. Proc. Natl. Acad. Sci. USA (2001).
- Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. (2002).
- Gene Ontology: tool for the unification of biology. Nat. Genet. (2000).
- The KEGG resources for deciphering the genome. Nucleic Acids Res. (2004).
- Initial sequencing and analysis of the human genome. Nature.
- The sequence of the human genome. Science.
- Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics.
- Analysis of human transcriptomes. Nat. Genet.
- A compendium of gene expression in normal human tissues. Physiol. Genomics.
- Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray. DNA Res.
- Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA.
- The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science.
- SAGEmap: a public gene expression resource. Genome Res.
- Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Res.
- Applied Nonparametric Statistical Methods (Texts in Statistical Science).