Elsevier

Genomics

Volume 86, Issue 2, August 2005, Pages 127-141

Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues

https://doi.org/10.1016/j.ygeno.2005.04.008

Abstract

A critical and difficult part of studying cancer with DNA microarrays is data interpretation. Beyond the need for data analysis algorithms, integrating additional information about genes may be useful. We performed genome-wide expression profiling of 36 types of normal human tissues and identified 2503 tissue-specific genes. We then systematically studied the expression of these genes in cancers by reanalyzing a large collection of published DNA microarray datasets. We observed that the expression level of liver-specific genes in hepatocellular carcinoma (HCC) correlates with the clinically defined degree of tumor differentiation. Through unsupervised clustering of tissue-specific genes differentially expressed in tumors, we extracted expression patterns characteristic of individual cell types, uncovering differences in cell lineage among tumor subtypes. We were able to detect the expression signature of hepatocytes in HCC, neurons in medulloblastoma, glial cells in glioma, basal and luminal epithelial cells in breast tumors, and various cell types in lung cancer samples. We also demonstrated that tissue-specific expression signatures are useful in locating the origin of metastatic tumors. Our study shows that integrating each gene's breadth of expression (BOE) in normal tissues is important for the biological interpretation of cancer expression profiles in terms of tumor differentiation, cell lineage, and metastasis.

Introduction

Genome-wide expression profiling with DNA microarrays has been widely used to identify new cancer subtypes and expression signatures associated with prognosis [1], [2], [3], [4], [5]. Expression data for thousands of tumor samples, each characterized by the expression levels of up to ∼40,000 transcripts, are rapidly accumulating in public repositories (reviewed in [6]). Owing to technological limitations and the inherent complexity of gene regulatory mechanisms, such data are often noisy and high-dimensional, which makes them difficult to interpret.

Beyond the need for robust computational tools, integration of additional biological information about genes is essential for uncovering the molecular mechanisms underlying expression profiles. For example, functional categories from the Gene Ontology (GO) consortium [7], the KEGG databases of molecular interaction pathways [8], and promoter sequences play important roles in interpreting clusters of genes defined by expression profiling. In this paper, we introduce another kind of information: each gene's expression pattern across a panel of normal tissues.

Only a small portion of the 30,000–40,000 protein-coding genes [9], [10] in the human genome is essential to the survival of individual cells and is therefore constitutively expressed across tissue types [11]. Transcription of most genes is regulated during cell differentiation and is therefore often highly variable among tissue and cell types and across developmental stages. While ubiquitously expressed genes (so-called maintenance genes [11]) play key roles in basic cellular processes, tissue-specific genes are related to the functions of particular organs. Although it is still difficult to obtain expression profiles for the individual cell types that make up normal organs, genome-wide expression profiling of bulk tissues has been carried out by serial analysis of gene expression (SAGE) and DNA microarrays [11], [12], [13], [14], [15], [16], [17]. For each gene, such studies define its breadth of expression (BOE) in normal tissues, which indicates where the gene is expressed under normal physiological conditions. Categorizing genes by BOE might therefore provide an additional source of information to help decipher the complex expression profiles observed in cancers.
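As a minimal illustration of how a BOE-based categorization could be computed, the Python sketch below counts, for one gene, the number of normal tissues in which its signal exceeds a detection threshold, and then assigns a coarse category. The threshold (200 signal units), the one-tissue cutoff for "tissue-specific", the category labels, and the function names are hypothetical choices for this example, not the criteria used in this study.

```python
from typing import Dict

# Hypothetical detection threshold in array signal units (illustrative only).
DETECTION_THRESHOLD = 200.0


def breadth_of_expression(profile: Dict[str, float],
                          threshold: float = DETECTION_THRESHOLD) -> int:
    """Return the number of tissues in which the gene's signal exceeds `threshold`."""
    return sum(1 for value in profile.values() if value > threshold)


def classify_by_boe(profile: Dict[str, float],
                    threshold: float = DETECTION_THRESHOLD,
                    max_specific: int = 1) -> str:
    """Assign a coarse BOE category to one gene's tissue profile.

    `max_specific` (hypothetical) is the largest number of expressing
    tissues for which a gene is still called tissue-specific.
    """
    boe = breadth_of_expression(profile, threshold)
    if boe == 0:
        return "not detected"
    if boe <= max_specific:
        return "tissue-specific"
    if boe == len(profile):
        return "ubiquitous"  # expressed in every tissue, like maintenance genes
    return "intermediate"


# Example: a gene detected only in liver (hypothetical values).
print(classify_by_boe({"liver": 5000.0, "kidney": 50.0, "brain": 30.0}))  # prints tissue-specific
```

In practice the profile would cover all tissues in the panel; a fixed absolute threshold is the simplest possible choice and ignores probe-specific background.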

In this paper, we performed additional microarray experiments on normal tissues to search extensively for tissue-specific genes, and then systematically reanalyzed previously published DNA microarray data for various cancers. We used oligonucleotide microarrays to measure the expression of ∼20,000 transcripts in 3 fetal and 33 adult normal human tissues (the full list is given in Fig. 1A). Pooled RNA samples were used to maximize tissue coverage, which is important for defining tissue specificity. We retrieved data from a collection of previously published datasets of liver, brain, breast, and lung cancers, and focused on genes that are specifically expressed in certain normal tissues but are differentially expressed in tumors arising from the same anatomical sites. Our strategy was to create a small but carefully selected dataset of normal-tissue expression profiles and to use it as a seed for reanalyzing large datasets in the public domain.
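The gene-selection step described above can be sketched as a set intersection between the tissue-specific genes of one organ and the genes differentially expressed in tumors from that organ. In the hypothetical Python below, a gene counts as differentially expressed if its mean tumor and mean normal signals differ at least twofold (|log2 fold change| ≥ 1, with a pseudocount of 1). This criterion, the pseudocount, and all function and gene names are illustrative assumptions, not the statistics used in the study.

```python
import math
from statistics import mean
from typing import Dict, List, Set


def differentially_expressed(tumor: Dict[str, List[float]],
                             normal: Dict[str, List[float]],
                             min_log2_fc: float = 1.0,
                             pseudocount: float = 1.0) -> Set[str]:
    """Genes whose mean tumor and mean normal signals differ by at least
    `min_log2_fc` on a log2 scale (hypothetical criterion)."""
    selected = set()
    for gene in tumor.keys() & normal.keys():
        # Pseudocount keeps the ratio finite for genes with near-zero signal.
        log2_fc = math.log2((mean(tumor[gene]) + pseudocount)
                            / (mean(normal[gene]) + pseudocount))
        if abs(log2_fc) >= min_log2_fc:
            selected.add(gene)
    return selected


def seed_genes(tissue_specific: Dict[str, Set[str]], site: str,
               de_genes: Set[str]) -> Set[str]:
    """Tissue-specific genes of `site` that are also differentially
    expressed in tumors arising from that site."""
    return tissue_specific.get(site, set()) & de_genes


# Example with hypothetical genes A, B, C and two samples per group.
tumor = {"A": [100.0, 120.0], "B": [500.0, 520.0], "C": [50.0, 60.0]}
normal = {"A": [400.0, 420.0], "B": [510.0, 500.0], "C": [55.0, 50.0]}
de = differentially_expressed(tumor, normal)
print(seed_genes({"liver": {"A", "B"}}, "liver", de))  # prints {'A'}
```

A real reanalysis would replace the fold-change rule with a proper statistical test across many samples; the intersection logic is the part this sketch is meant to show.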


Expression profiling of normal tissues

Using the Affymetrix U133A array, we obtained expression profiles of 36 common normal human tissues, each represented by a pooled RNA sample (see Supplementary Information for details). The raw data are available at our web site, http://www.genome.rcast.u-tokyo.ac.jp/normal/, and can also be queried through a graphical web interface at http://www.lsbm.org. After eliminating genes with little variation in expression among tissues (variation criteria: max/min > 2, max–min > 100; see Methods for details), we
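A minimal Python sketch of this kind of variation filter, assuming that a gene is retained only when both criteria hold (max/min > 2 and max–min > 100 across tissues). Whether the study combined the two criteria in exactly this way is an assumption here, as is the floor of 1 signal unit applied to the minimum before taking the ratio, which only serves to prevent division by zero.

```python
from typing import Dict, List


def passes_variation_filter(values: List[float],
                            min_ratio: float = 2.0,
                            min_diff: float = 100.0,
                            floor: float = 1.0) -> bool:
    """True if a gene varies enough across tissues to be kept.

    Assumes both criteria must hold: max/min > min_ratio AND
    max - min > min_diff. `floor` (hypothetical) keeps the ratio
    finite when the minimum signal is near zero.
    """
    highest, lowest = max(values), min(values)
    ratio = highest / max(lowest, floor)
    return ratio > min_ratio and highest - lowest > min_diff


def filter_genes(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only genes whose tissue profile passes the variation filter."""
    return {gene: vals for gene, vals in expr.items()
            if passes_variation_filter(vals)}


# Hypothetical profiles over three tissues.
print(filter_genes({"G1": [50.0, 400.0, 60.0], "G2": [500.0, 600.0, 550.0]}))
# prints {'G1': [50.0, 400.0, 60.0]}
```

Requiring both a ratio and an absolute difference discards genes that vary only in relative terms at very low signal, where microarray measurements are least reliable.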

Discussion

Through expression profiling of a spectrum of normal human tissues, we identified sets of tissue-specific genes and then studied their expression in cancers by analyzing a large body of previously published DNA microarray data. Through unsupervised clustering of tissue-specific genes differentially expressed in tumors from the same anatomical site, we identified groups of coexpressed genes characteristic of different cell types within each organ, thus revealing the cell lineage of tumor subtypes.
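One common way to perform this kind of unsupervised gene clustering is average-linkage hierarchical clustering with 1 − Pearson correlation as the distance between genes. The Python sketch below implements that scheme from scratch; the linkage, the distance, and the stopping cutoff (`max_distance = 0.5`) are assumptions for illustration and may differ from the method actually used.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant profile."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cluster_genes(expr: Dict[str, List[float]],
                  max_distance: float = 0.5) -> List[List[str]]:
    """Average-linkage agglomerative clustering of genes across tumor samples.

    Distance is 1 - Pearson r; merging stops once the closest pair of
    clusters is farther apart than `max_distance` (hypothetical cutoff).
    """
    genes = list(expr)
    dist = {}
    for g, h in combinations(genes, 2):
        d = 1.0 - pearson(expr[g], expr[h])
        dist[(g, h)] = dist[(h, g)] = d

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            d = mean(dist[(g, h)] for g in clusters[i] for h in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical expression of four genes across four tumor samples.
expr = {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [2.0, 4.0, 6.0, 8.0],
        "G3": [4.0, 3.0, 2.0, 1.0], "G4": [8.0, 6.0, 4.0, 2.0]}
print(cluster_genes(expr))
```

With this toy data, in which G1 and G2 rise across samples while G3 and G4 fall, the sketch returns two clusters, {G1, G2} and {G3, G4}. In the setting described above, each cluster would then be examined for the normal tissue or cell type in which its member genes are specifically expressed.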

Sample preparation

Twenty-five total RNA specimens were purchased from Clontech (Palo Alto, CA), Ambion (Austin, TX), and Stratagene (La Jolla, CA). To define breadth of expression accurately at a reasonable cost, we tried to cover as many tissue types as possible by using pooled RNA samples. Each specimen represents one human organ. We used RNA samples pooled from 2 to 84 donors to minimize differences between individuals. However, many specimens from single donors were still included because of the difficulty

Acknowledgments

The authors are indebted to Hirokazu Taniguchi for help with tissue acquisition, Hiroko Meguro for technical assistance, Yoshitaka Hippo, Naoko Nishikawa, Chen Yongxin, and Guo Yongqiu for stimulating discussions and Jiang Fu for proofreading. This work was partially supported by Grants-in-Aid for Scientific Research (S) 16101006 from The Ministry of Education, Culture, Sports, Science and Technology, Japan (to H.A.), and Health and Labour Sciences Research Grants (to H.A.). This work has been

References (42)

  • International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature (2001).
  • J.C. Venter et al. The sequence of the human genome. Science (2001).
  • J.A. Warrington et al. Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics (2000).
  • V.E. Velculescu et al. Analysis of human transcriptomes. Nat. Genet. (1999).
  • L.L. Hsiao et al. A compendium of gene expression in normal human tissues. Physiol. Genomics (2001).
  • A. Saito-Hisaminato et al. Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray. DNA Res. (2002).
  • A.I. Su et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA (2002).
  • H. Caron et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science (2001).
  • A.E. Lash et al. SAGEmap: a public gene expression resource. Genome Res. (2000).
  • Y. Hippo et al. Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Res. (2002).
  • P. Sprent et al. Applied Nonparametric Statistical Methods (Texts in Statistical Science) (2000).

DNA microarray data from this article have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession GSE2361.