Introduction

In human cell nuclei, genomic DNA is packed into nucleosomes. Approximately 150 bp are wrapped around a histone octamer, consisting of two copies of ‘core’ histones (H2A, H2B, H3 and H4), to form a single nucleosome.1 Nucleosomes are connected by linker DNA (50–100 bp long), a part of which may be bound with ‘linker’ histone H1.2 In a typical diploid human cell, 3 × 10⁷ nucleosomes are present to pack 6 × 10⁹ bp of genomic DNA, which is 2 m long when fully stretched, into a relatively small 5–10-μm diameter nucleus. This tight packaging would suggest that nucleosomes are barriers to gene expression as their positive charge sticks to the negative charge of DNA and inhibits the access of DNA-binding proteins such as transcription factors. Recent studies, however, have revealed that histones function both positively and negatively in the regulation of gene expression.3 This is mainly governed by post-translational modifications on specific amino acid residues on histones. Within a nucleosome, the H3–H4 tetramer particularly forms a tight complex with DNA.4 This stable tetramer allows post-translational histone modifications to be inheritable epigenetic marks, much like direct modifications on DNA. Compared with the limited diversity of DNA modifications, which is mostly 5-methylcytosine even though other modifications have recently been discovered,5, 6, 7 histones are subjected to a variety of modifications, including acetylation, methylation and phosphorylation, on many different amino acid residues.8, 9
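
The packaging figures quoted above can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration; the rise of 0.34 nm per base pair (B-form DNA) and the nucleosome repeat length of 200 bp (about 150 bp of core DNA plus about 50 bp of linker) are assumptions introduced here, not values stated in the text.

```python
# Back-of-the-envelope check of the DNA packaging numbers.
# Assumed values (not from the text): 0.34 nm rise per bp for B-form DNA,
# and a 200-bp nucleosome repeat (about 150 bp core + about 50 bp linker).

GENOME_BP = 6e9          # diploid human genome size, as given in the text
RISE_NM_PER_BP = 0.34    # assumed rise per base pair
REPEAT_BP = 200          # assumed nucleosome repeat length


def stretched_length_m(bp, rise_nm=RISE_NM_PER_BP):
    """Contour length in metres of `bp` base pairs of fully stretched DNA."""
    return bp * rise_nm * 1e-9


def nucleosome_count(bp, repeat_bp=REPEAT_BP):
    """Approximate number of nucleosomes needed to pack `bp` base pairs."""
    return bp / repeat_bp


length_m = stretched_length_m(GENOME_BP)
n_nucleosomes = nucleosome_count(GENOME_BP)
print(f"{length_m:.2f} m of DNA, {n_nucleosomes:.1e} nucleosomes")
```

Under these assumptions the result is about 2 m of DNA and about 3 × 10⁷ nucleosomes, in line with the figures above; a longer repeat length would give a proportionally smaller nucleosome count.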

All four core histones have long flexible N-terminal tails that are extruded from their central structured domains (Figure 1). Most, but not all, of the amino acids that are subjected to modification reside on these tails, where they are presumably more accessible to ‘reader’ proteins.10 Modifications on histones include acetylation, methylation and ubiquitination on lysine, methylation and citrullination on arginine, and phosphorylation on serine, threonine and tyrosine (Figure 2). In particular, acetylation and methylation on specific lysine residues are important for epigenetic gene regulation. Lysine methylation can occur at three different levels by replacing hydrogen on the primary amine, that is, monomethylation (me1), dimethylation (me2) and trimethylation (me3).

Figure 1

Schematic nucleosome structure. A nucleosome consists of two copies of each core histone (H2A, H2B, H3 and H4) and 150 bp DNA. The N-terminal tail of each histone is extruded from the nucleosome.

Figure 2

Modifications on core histones. Modification sites are indicated.

Major acetylation sites on histone H3 include K9, K14, K18, K23 and K27, and all acetylation is essentially correlated with transcriptional activation, being localized to transcription start sites and/or enhancers of (potentially) actively transcribed genes. In contrast, the correlation of methylation on H3 with transcription depends on its level and the residue. For example, trimethylation of H3K4 and H3K9 is associated with activation and repression of transcription, respectively. Different levels of methylation on the same residue affect transcription differently. These histone modifications are not necessarily prerequisite ‘codes’ for switching genes on and off, as some modifications are added as a consequence of gene activation and RNA polymerase elongation.11 Nevertheless, specific histone modifications may still serve as good epigenetic indicators of chromatin state (Figure 3).

Figure 3

Distribution of histone modifications. Distributions of six modifications with respect to genes are schematically illustrated. TSS, transcription start site; TES, transcription end site. H3K4me3 is enriched around TSSs. H3K4me1 is enriched around enhancers and more downstream. H3K27ac is enriched around active enhancers and TSSs. In undifferentiated stem cells, both H3K4me3 and H3K27me3 (active and inactive marks, respectively) are enriched around TSSs on many genes. H3K27me3 is enriched around inactive TSS in somatic cells. H3K9me3 is broadly distributed on inactive regions. H3K27me3 and H3K9me3 are usually not colocalized. TSSs are generally devoid of nucleosomes.

In this review, we describe six classes of histone modifications selected for epigenome profiling by the International Human Epigenome Consortium (http://www.ihec-epigenomes.org/), which aims to produce high-resolution reference human epigenome maps for various normal and disease cell types and to make them available to the research community. We then describe how histone modifications are commonly analyzed and conclude with a brief discussion of future research directions.

Marks around transcription start sites and enhancers of actively transcribed genes

Among histone modifications, trimethylated H3K4 (H3K4me3) is a good marker associated with actively transcribed genes. In active genes, H3K4me3 is enriched around transcription start sites (TSS) (Figure 3), whereas H3K4me2 peaks just downstream, followed by monomethylated H3K4 (H3K4me1) further downstream towards the gene body.3, 12, 13

Methylation of H3K4 is mediated through the balance between methyltransferases such as Set1/COMPASS-family proteins including MLL1–4 (Mixed-Lineage Leukemia 1–4) and demethylases such as KDM5A–D/JARID1A–D.14, 15, 16 The proper regulation of H3K4 methylation by these enzymes is important in normal cell function. Rearrangements of the MLL1 gene by translocation are often associated with aggressive acute leukemias.17 Dysregulation of KDM5 genes is also observed in cancer; a chimeric gene of KDM5A fused with the nucleoporin gene NUP98 is observed in leukemia, and overexpression of KDM5A and KDM5B is found in a variety of cancer types.18 In contrast, KDM5C and KDM5D are thought to have tumor-suppression functions.15

Unmethylated or methylated H3K4 is recognized by protein domains such as the chromodomain, Plant Homeo Domain finger and Tudor domain (see recent reviews10, 19 and references therein for the structural basis of modification–domain interactions). Unmethylated H3K4 and DNA methylation are closely associated to form silent chromatin, as unmethylated H3K4 is recognized by DNA methyltransferase 3L, a component of the DNA methyltransferase complex.20 In contrast, methylated H3K4 can directly recruit the transcription complex. For example, TAF3, a component of TATA-binding protein-associating factors, binds to H3K4me3.21 Moreover, this interaction is enhanced by acetylation on K9 and K14, which also function as transcriptional activators through the recruitment of bromodomain-containing proteins that assist in chromatin remodeling and RNA polymerase elongation. Acetylation can also be added preferentially to histone H3 harboring trimethylation on K4, which is a target of Sgf29, a component of the Spt-Ada-Gcn5 acetyltransferase.22 Thus, the transcriptionally inactive or active chromatin state can be robustly maintained through the crosstalk among different histone modifications and DNA methylation.

Active enhancers are generally marked by H3K4me1 and acetylated H3K27 (H3K27ac) (Figure 3).23, 24, 25 However, these modifications are also associated with other regions downstream of (H3K4me1) and around (H3K27ac) TSSs of actively transcribed genes (Figure 3).26 Therefore, co-occupancy of H3K4me1 and H3K27ac is a better indication of active enhancers than either mark alone. Active enhancers have also been shown to be associated with H3K4me3,27 which is reasonable because enhancers are often within the transcription unit of RNA polymerase II28 and there is a link between H3K4me3 and RNA polymerase II activity, as described above. The relatively lower enrichment of H3K4me3 compared with H3K4me1 in the active enhancers may be regulated by demethylation through KDM5C/JARID1C.29 H3K4me1 can also be a mark of poised enhancers, and repressive modifications like trimethylated H3K27 (H3K27me3) may be present together in some cases.26

To summarize, TSSs of actively transcribed genes are marked by H3K4me3 and H3K27ac, and active enhancers can be identified by enrichments of both H3K4me1 and H3K27ac (Figure 3).
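
The summary above can be expressed as a simple rule table. The following sketch is illustrative only: the function name `classify_region`, the set-based input and the category labels are hypothetical, and it assumes that the enrichment of each mark has already been called for the region (for example, by peak calling), which is not shown here.

```python
# Rule-of-thumb annotation of a region from the marks enriched on it.
# `classify_region` and its labels are hypothetical names for illustration;
# enrichment calls for each mark are assumed to be available already.

def classify_region(marks, at_tss):
    """Return a coarse label for a region.

    marks  -- iterable of mark names enriched in the region, e.g. {"H3K4me1"}
    at_tss -- True if the region overlaps a transcription start site
    """
    marks = set(marks)
    if at_tss:
        # Active TSSs carry both H3K4me3 and H3K27ac.
        if {"H3K4me3", "H3K27ac"} <= marks:
            return "active TSS"
        return "unclassified"
    # Away from TSSs, co-occupancy of H3K4me1 and H3K27ac indicates
    # an active enhancer better than either mark alone.
    if {"H3K4me1", "H3K27ac"} <= marks:
        return "active enhancer"
    if "H3K4me1" in marks:
        # H3K4me1 alone may reflect a poised enhancer or gene-body signal.
        return "H3K4me1 only (ambiguous)"
    return "unclassified"


print(classify_region({"H3K4me3", "H3K27ac"}, at_tss=True))
print(classify_region({"H3K4me1", "H3K27ac"}, at_tss=False))
print(classify_region({"H3K4me1"}, at_tss=False))
```

Such rules are only a first approximation; as noted above, H3K4me3 can also be present at active enhancers, so real annotation pipelines usually weigh signal levels rather than presence or absence alone.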

Histone modification within the body of transcribed genes

Trimethylated H3K36 (H3K36me3) is associated with gene bodies of actively transcribed genes30 (Figure 3) and is distributed on decondensed chromatin regions in the nucleus (Figure 4). Transcription-coupled H3K36 methylation is mediated through SETD2 histone methyltransferase, a tumor-suppressor candidate,31 which binds to elongating RNA polymerase II that is phosphorylated at the C-terminal domain of its largest subunit.32 H3K36me3 then recruits Rpd3S histone deacetylase complex that removes acetylation from histones within the gene body.33 This deacetylation is supposed to prevent aberrant gene body transcription, possibly by suppressing histone exchange.34

Figure 4

Distribution of histone modifications at the microscopic level. Distributions of DNA (4′,6-diamidino-2-phenylindole, DAPI), H3K36me3, H3K27me3 and H3K9me3 within the nucleus are shown. Human hTERT-RPE1 cells were immunolabeled with antibodies against individual modifications. All three marks occupy different areas in the nucleus. H3K36me3 is localized to transcriptionally active euchromatin, which is localized within the interior of the nucleus. H3K27me3 is localized preferentially to facultative inactive chromatin, including a part of the inactive X chromosome. H3K9me3 is localized preferentially to DAPI-dense heterochromatin around the nuclear periphery and nucleoli, and a part of the inactive X chromosome. Arrows indicate the inactive X chromosome.

H3K36me3 and splicing are also thought to influence each other. The distribution of H3K36me3 depends on splicing,35, 36 and splicing is regulated by the H3K36me3 mark.37, 38 The H3K36me3 signal can be transmitted to the splicing machinery through an interaction between a chromodomain-containing protein, MORF-related gene 15, which directly binds to H3K36me3, and polypyrimidine tract-binding protein, which inhibits exon inclusion.37 A splicing factor-interacting protein (Psip1; PC4 and SF2-interacting protein 1) is also shown to bind to H3K36me3.38

Marks with silent chromatin

Gene repression can be mediated through two distinct mechanisms involving H3K9me3 and H3K27me3. H3K9me3 is generally associated with constitutive heterochromatin, which was originally described as a cytologically distinct condensed chromatin structure but is now often regarded as transcriptionally repressed chromatin.8, 39, 40 As shown in Figure 4, H3K9me3 indeed overlaps with DNA-dense domains around the nuclear periphery and nucleoli. Although repressive chromatin marks such as DNA methylation and H3K9me3 were thought to be stable, enzymes that assist in removing these marks have been found,41, 42, 43 even though the turnover of H3K9me3 is much slower (t1/2 1.3 days) than that of H3K9me1, which is associated with actively transcribed genes (t1/2 0.3 days), in HeLa cells.44
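
To give a feel for what the two half-lives quoted above imply, the sketch below converts them into the fraction of a mark remaining after a given time. It assumes simple first-order (exponential) turnover, which is an assumption made here for illustration and not a model taken from the cited study; the function name is hypothetical.

```python
# Fraction of a modification remaining after t days, assuming first-order
# (exponential) turnover. This kinetic model is an illustrative assumption.

def fraction_remaining(t_days, half_life_days):
    """Fraction of the initial mark still present after `t_days`."""
    return 0.5 ** (t_days / half_life_days)


# Half-lives in HeLa cells, as quoted in the text.
HALF_LIFE_DAYS = {"H3K9me3": 1.3, "H3K9me1": 0.3}

for mark, t_half in HALF_LIFE_DAYS.items():
    print(f"{mark}: {fraction_remaining(1.0, t_half):.2f} remaining after 1 day")
```

Under this assumption, roughly 59% of H3K9me3 but only about 10% of H3K9me1 would remain after one day.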

Trimethylation on H3K9 is added by methyltransferases (like SUV39H1 and ESET/SETDB1) and removed by KDM4A/JMJD2A/JHDM3A.41, 42, 45 Different methyltransferases have distinct target genome loci; SUV39H and ESET are responsible for transcriptional repression from major satellite and endogenous retrovirus elements, respectively, in mouse cells.46, 47 Heterochromatin protein 1 family proteins have a key role in connecting H3K9me3 and gene silencing.48, 49 Heterochromatin protein 1 binds to H3K9me3 with its chromodomain and to a number of effector proteins, including histone methyltransferases and transcription repressors (like KAP1/TRIM28/TIF1beta), with its other domains,50 thereby maintaining chromatin silencing. Heterochromatin protein 1 is also likely to be involved in chromatin compaction, both at local and higher-order levels.51, 52 A typical H3K9me3-enriched region is mouse pericentromeric heterochromatin, or major satellite repeats.53 In humans, H3K9me3 is enriched in centromeric alpha-satellite repeats, but is also found in unique gene loci, with a broader distribution throughout the enriched domains without distinct peaks (Figure 3).12, 52

H3K27me3 is generally associated with facultative heterochromatin, which is developmentally regulated silent chromatin.54, 55 Trimethylation on H3K27 is added by Polycomb repressive complex (PRC) 2, which contains a methyltransferase, Enhancer of Zeste homolog 2. H3K27me3 can recruit another protein complex, PRC1, through chromodomain-containing proteins. A component of PRC1 (Ring1b) then mediates monoubiquitination of H2A at K119.56 This H2AK119ub1 is required for PRC-dependent silencing and functions to inhibit RNA polymerase II elongation.57 H3K27me3 is demethylated by KDM6A/UTX and KDM6B/JMJD3.54, 58 Dysregulation of either the methyltransferase or demethylase of H3K27me3 is associated with cancer.59, 60

Mutually exclusive silent marks

Whereas both H3K9me3 and H3K27me3 are associated with silent genes, these two marks are mutually exclusive and do not exist in the same gene loci.61 Their nonoverlapping distribution is also observed at the cytological level (Figure 4). Although it remains elusive how these modifications target specific loci, some mechanisms have been presented. H3K27me3 is generally localized to the gene-rich regions with low DNA methylation. PRCs can be recruited to specific DNA loci through binding to specific cis-elements (known as Polycomb response elements in Drosophila, but less defined in human) and/or through the interaction with noncoding RNA like HOTAIR or XIST.62, 63 H3K9me3 is supposed to be localized to tandem repeats in a small RNA-dependent manner.53 Besides repeat sequences, H3K9me3 is also distributed in genic regions and enriched in subdomains of human inactive X chromosomes, which are typical facultative heterochromatin.52, 64 Thus, cells can achieve gene silencing by using either H3K9me3- or H3K27me3-dependent pathways. Therefore, it is recommended to analyze both modifications to determine if chromatin at a given locus is in the transcriptionally repressed state.

Bivalent modifications of active and silent marks

In undifferentiated stem cells, TSSs of most genes are associated with H3K4me3, regardless of gene expression status.65 Repressed genes are further associated with H3K27me3, so that the same nucleosome harbors both active and inactive marks, known as bivalent modifications.65, 66 During differentiation, either H3K4me3 or H3K27me3 is removed to establish repressed or activated genes, respectively (Figure 3). Recent studies have shown a possible mechanism for how H3K27me3 spreads towards active chromatin regions during differentiation. A PRC2 component PHF19 binds to H3K36me3 and recruits a histone demethylase to erase the H3K36me3 mark on active transcription units.67, 68

Histone-modification analysis

Histone modifications can be analyzed in various ways, depending on the purpose. Mass spectrometry has proven useful for comparing the global levels of specific modifications and their combinations among different samples.69, 70 The analysis of enrichments on specific genome loci requires chromatin immunoprecipitation (ChIP) using an antibody directed against the site-specific modification (Figure 5).71 As core histones are tightly bound to DNA unlike most transcription factors, chromatin can be prepared from both unfixed and formaldehyde-fixed cells; the latter is generally required for ChIP with transcription factors. After fragmentation by sonication and/or micrococcal nuclease treatment, histone–DNA complexes harboring a specific modification can be immunoprecipitated using the specific antibody. After purification, co-immunoprecipitated DNA is subjected to quantitative PCR (qPCR), microarray analysis (ChIP–chip) or deep sequencing (ChIP–seq). If the target loci are known a priori, ChIP–qPCR is more convenient. The enrichment of the target loci by ChIP with the specific antibody is evaluated by the recovery efficiency (typically expressed as the percent of input chromatin used for immunoprecipitation). It is important to perform two experimental controls: ChIP with nonspecific immunoglobulin G and qPCR for another control locus that is supposed to be negative in the target modification. When the enrichment of the target locus by ChIP with the specific antibody is significantly higher than the controls, one can conclude that the modification recognized by the antibody is enriched in the locus. A technical limitation of ChIP is that it handles chromatin from cell populations, typically of 10⁵–10⁶ cells (minimally, 10³–10⁴ cells).71, 72, 73, 74 As different modifications could be present in the same loci in different cells, two overlapping modifications on the same loci cannot be straightforwardly interpreted as bivalent marks. To confirm bivalent modifications on the same nucleosomes, sequential ChIP using different antibodies (re-ChIP) is required.65
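
The recovery efficiency described above is commonly computed from qPCR threshold cycle (Ct) values. The sketch below shows one widely used form of the percent-input calculation; it assumes perfect doubling of product per cycle (100% amplification efficiency), and the function name, parameter names and example Ct values are hypothetical rather than taken from the cited protocols.

```python
import math

# Percent-input calculation for ChIP-qPCR.
# Assumes 100% amplification efficiency (product doubles every cycle).
# Function name, parameters and Ct values below are illustrative only.

def percent_input(ct_ip, ct_input, input_fraction):
    """Recovery of a locus as a percentage of the input chromatin.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample
    input_fraction -- fraction of the chromatin saved as input (e.g. 0.01)
    """
    # Correct the input Ct to what 100% of the chromatin would have given.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


# Hypothetical Ct values: specific antibody and nonspecific IgG at a target
# locus, plus the specific antibody at a control locus expected to be negative.
target_specific = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
target_igg = percent_input(ct_ip=32.0, ct_input=25.0, input_fraction=0.01)
negative_locus = percent_input(ct_ip=31.0, ct_input=25.0, input_fraction=0.01)

print(f"target, specific antibody: {target_specific:.4f}% of input")
print(f"target, IgG control:       {target_igg:.4f}% of input")
print(f"negative locus:            {negative_locus:.4f}% of input")
```

In this made-up example the specific antibody recovers 0.25% of input at the target locus, well above both controls. Whether such a difference is significant still has to be judged from replicate experiments.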

Figure 5

Scheme of ChIP. To analyze histone modification on genome loci, ChIP-based methods are useful. For ChIP with histones, either crosslinked or native chromatin can be used. After fragmentation either or both by sonication or micrococcal nuclease digestion, chromatin fragments harboring a specific modification can be precipitated with the specific antibody. Purified DNA that was associated with precipitated chromatin is now ready for qPCR, sequencing and microarray analyses.

Thanks to recent developments in next-generation DNA sequencers, the sequences of immunoprecipitated DNA can be determined and mapped to the reference genome. This ChIP–seq method is a powerful technique for determining the genome-wide distribution of specific modifications.3, 74 Although ChIP–seq was initially expensive and required bioinformatics knowledge, prices have become more reasonable and many analytical applications are now available to nonspecialists, including peak calling75 and aggregation plotting to transcription start and end sites (ngs.plot; http://code.google.com/p/ngsplot/). Whereas ChIP–qPCR measures the recovery rate of a specific locus, ChIP–seq measures the fold enrichment of each locus. Therefore, if the modification is broadly distributed (like silencing-associated H3K9me2 and H3K9me3), deeper reads are required compared with the modifications associated with active transcription (like H3K4me3 and H3K27ac), which produce distinct peaks like those of transcription factors. To see the enrichment of inactive marks, increasing the bin size is sometimes helpful.52
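
The effect of bin size on broadly distributed marks can be illustrated with a toy calculation. The sketch below is not the procedure of any particular tool: the function names, the pseudocount and the read counts are hypothetical, and the fold enrichment is simply the library-size-scaled ChIP count divided by the control (input) count in each bin.

```python
# Toy illustration of binning and fold enrichment for ChIP-seq read counts.
# Function names, the pseudocount and all counts are hypothetical.

def rebin(counts, factor):
    """Merge every `factor` consecutive bins by summing their read counts.

    A trailing group with fewer than `factor` bins is kept as a partial bin.
    """
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def fold_enrichment(chip, control, chip_total, control_total, pseudocount=1.0):
    """Per-bin ChIP/control ratio after scaling ChIP to the control depth."""
    scale = control_total / chip_total
    return [(c * scale + pseudocount) / (k + pseudocount)
            for c, k in zip(chip, control)]


# Made-up counts over eight small bins of a broad, weakly enriched domain.
chip_counts = [3, 1, 2, 4, 2, 3, 1, 4]
control_counts = [1, 2, 1, 1, 2, 1, 1, 1]
total_reads = 1_000_000  # assumed equal sequencing depth for both libraries

small_bins = fold_enrichment(chip_counts, control_counts,
                             total_reads, total_reads)
large_bins = fold_enrichment(rebin(chip_counts, 4), rebin(control_counts, 4),
                             total_reads, total_reads)

print([round(x, 2) for x in small_bins])
print([round(x, 2) for x in large_bins])
```

With small bins the ratio fluctuates from bin to bin because each bin holds only a few reads, whereas pooling four bins gives a steadier estimate across the domain, at the cost of spatial resolution.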

For ChIP experiments, the quality of antibody is an essential factor. Although many histone-modification-specific antibodies are available from different commercial sources, about 20–25% of ‘specific’ or ‘ChIP-grade’ antibodies were reported to be unspecific and/or inapplicable to ChIP.76 This is because an antibody needs to distinguish small changes, such as a single methylation on the same lysine (for example, me2 versus me3) or a few amino acid differences with the same modification (for example, H3K9 and H3K27, which share ARKS sequence; see Figure 6). Therefore, the specificity of each antibody needs to be carefully evaluated. The lot-to-lot variation becomes particularly problematic when polyclonal antibodies are used, as the nature of sera differs in different animals and also in different bleeds from the same animal, even though the injected antigen is the same.77, 78 Even though it has become more common that every lot is validated by the supplier, not all antibodies are extensively tested and users may need to validate by themselves using a set of synthetic peptides or a peptide array on which various peptides are spotted (Figure 6).

Figure 6

An example of antibody specificity validation. The specificity of a rabbit polyclonal antibody directed against H3K9me3 was evaluated using a peptide array on which various modified peptides are spotted in duplicate (MODified Histone Peptide Array; Active Motif). This antibody binds to peptides harboring H3K9me3 alone (A15) and those harboring H3K9me3 with other modifications at R2, K4, R8 or K14 (blue boxes), but does not bind to peptides harboring H3K9me3 with phosphorylated S10 or T11 (green boxes). The antibody also cross-reacts with peptides harboring H3K27me3 (red boxes).

These problems can be overcome by using monoclonal antibodies, which are expected to retain their specificity over time, although an intensive effort is needed to screen a good hybridoma clone.76, 78 Monoclonal antibodies have been criticized for having weak binding affinity (particularly those from mice), potential structure-specific epitope recognition and binding inhibition by neighboring modifications. Nevertheless, high-affinity monoclonal antibodies with dissociation constants from nano- to picomolar levels can be obtained even from mice.79 The inhibition or occlusion of antibody binding by neighboring modifications is not exclusive to monoclonal antibodies, but is also found in polyclonal antibodies.77 For example, all H3K9me3-specific antibodies tested so far were sensitive to phosphorylation of neighboring residues; that is, they hardly reacted with a peptide containing K9me3 and S10ph or T11ph (Figure 6 and unpublished observation). The structure dependence may also be minimized by screening monoclonal antibodies for different applications such as ELISA, immunoblotting, immunofluorescence and immunoprecipitation.78, 79 To reduce possible antibody-specific effects, cross-validation using different antibodies to the same modification might be helpful, but this is not efficient. Alternatively, making pooled monoclonal or oligoclonal antibodies consisting of individually well-characterized ones might be a good option to overcome these issues. Until a ‘gold standard’ antibody becomes available, one should be careful about the specificity of the antibody, which affects the interpretation of ChIP data.

Human genetics and epigenome analysis

Human genetics research stems from DNA sequence analysis to detect variations such as single nucleotide polymorphisms, copy number variations and chromosome rearrangements. Analysis of DNA methylation has been added as an extra epigenetic layer on human genetics. In general, high levels of DNA methylation in gene regulatory elements are anticorrelated with active genes and chromatin accessibility.3, 80 This has been well demonstrated through the studies of genomic imprinting, by which maternal or paternal allele-specific gene expression is regulated through specific methylation that is inherited from germ cells.7, 81 Similarly, on inactive X chromosomes in female cells, DNA methylation occurs on regulatory regions of silenced genes. In these cases, gene expression is repressed in an allele-specific manner, even though all essential components for the other allele are present. In other cases, methylation is added to both alleles to ensure gene silencing during differentiation or pathogenesis. Furthermore, DNA methylation at specific loci has become a cancer biomarker.82 In addition to DNA methylation, recent studies have shown that histones also have an important role in epigenetic gene regulation, and their modifications are useful for determining the gene activity state and defining regulatory elements, as described above. Thus, if cell samples with preserved histone–DNA contacts are available, evaluation of histone modifications at specific gene loci by ChIP–qPCR or genome-wide by ChIP–seq would be useful for deciphering the epigenome state of the samples. Recently, it has become possible to detect the association of specific histone modifications with gene loci in single cells within tissue sections by combining the proximity ligation assay with immunofluorescence and in situ hybridization.83 As the technology for studying the epigenome rapidly develops, more sensitive and high-throughput detection methods that can handle a relatively small number of cells may become available for human genetics research.

Finally, apart from the study of histone modifications themselves, the study of disease-associated mutations in factors involved in the regulation and output of histone modifications will also be interesting targets for future human genetics research. These include mutations in modification and demodification enzymes, specific binding proteins and regulatory RNA.84, 85, 86, 87