Abstract
The human genome encodes 48 nuclear receptor (NR) genes, whose translated products transform chemical signals from endo-xenobiotics into pleotropic RNA transcriptional profiles that refine drug metabolism. This review describes the remarkable diversification of the 48 human NR genes, which are potentially processed into over 1000 distinct mRNA transcripts by alternative splicing (AS). The average human NR expresses ∼21 transcripts per gene and is associated with ∼7000 single nucleotide polymorphisms (SNPs). However, the rate of SNP accumulation does not appear to drive the AS process, highlighting the resilience of NR genes to mutation. Here we summarize the altered tissue distribution/function of well characterized NR splice variants associated with human disease. We also describe a cassette exon visualization pictograph methodology for illustrating the location of modular, cassette exons in genes, which can be skipped in-frame, to facilitate the study of their functional relevance to both drug metabolism and NR evolution. We find cassette exons associated with all of the functional domains of NR genes including the DNA and ligand binding domains. The matrix of inclusion or exclusion for functional domain-encoding cassette exons is extensive and capable of significant alterations in cellular phenotypes that modulate endo-xenobiotic metabolism. Exon inclusion options are differentially distributed across NR subfamilies, suggesting group-specific conservation of resilient functionalities. A deeper understanding of this transcriptional plasticity expands our understanding of how chemical signals are refined and mediated by NR genes. This expanded view of the NR transcriptome informs new models of chemical toxicity, disease diagnostics, and precision-based approaches to personalized medicine.
SIGNIFICANCE STATEMENT This review explores the impact of alternative splicing (AS) on the human nuclear receptor (NR) superfamily and highlights the dramatic expansion of more than 1000 potential transcript variants from 48 individual genes. Xenobiotics are increasingly recognized for their ability to perturb gene splicing events, and here we explore the differential sensitivity of NR genes to AS and chemical exposure. Using the cassette exon visualization pictograph methodology, we have documented the conservation of splice-sensitive, modular, cassette exon domains among the 48 human NR genes, and we discuss how their differential expression profiles may augment cellular resilience to oxidative stress and fine-tune adaptive, metabolic responses to endo-xenobiotic exposure.
Introduction
The nuclear receptor (NR) concept describes a mediator or sensor that translates chemical changes into physiologic effects (Evans and Mangelsdorf, 2014). These translators bind specific chemical ligands and chromatin leading to regulation of RNA synthesis (Mueller et al., 1958). The NR superfamily of 48 human transcription factors is composed of a modular domain gene structure that includes a DNA binding domain (DBD), a ligand binding domain (LBD), a hinge domain located between the DBD and LBD, and a highly variable amino (N)-terminal transactivation domain. The functional diversity of NR genes is defined in part by the structural organization of their LBD and the role ligand binding plays in mediating cellular activity. Variations in the LBD’s structural scaffold allow NR genes to be subdivided into three functional classes, which include 1) ligand-dependent NRs, which encode a LBD that requires ligand binding to activate both genomic and nongenomic signaling events; 2) ligand-independent or constitutively active NRs, which encode a LBD that is stuck [via mutation or alternative splicing (AS)] in the “on position” in the absence of ligand binding; and 3) inactive or transcriptional repressor NRs, which function as dominant negatives that can no longer interact with coactivators due to the loss of key structural motifs in the LBD (Bridgham et al., 2010). Based on this diversity, ligand binding can trigger a diverse array of cellular responses in NR genes, with ligands being classified according to their role as agonists, antagonists, partial agonists, inverse agonists, superagonists, or selective NR modulator (Burris et al., 2013).
Nearly all human genes are alternately spliced (Lee and Rio, 2015). AS is a cotranscriptional process that generates protein diversity via the inclusion or exclusion of different exons within the mRNA transcripts of a gene. This phenomenon is complex, but it is associated with changes in spliceosome binding preference from weaker to stronger splice sites, alterations in the balance between positive and/or negative splicing factors, dynamic regulation of tissue-specific factors, and transitions in mRNA secondary structure stability (Smith and Valcarcel, 2000; Soller, 2006; Chen and Manley, 2009). Xenobiotics and chemotherapeutic agents (Zaharieva et al., 2012; Lambert et al., 2017), UV-B radiation (Sprung et al., 2011), oxidative stress (Cote et al., 2012; Melangath et al., 2017), and heavy metals (Jiang et al., 2017; Li et al., 2018; Wang et al., 2018) are also known to contribute to AS. These effectors are increasingly being recognized for their potential to alter human drug metabolism and health, via splicing-related mechanisms that expand the scope of their toxic properties.
The premise of this article is to document naturally occurring splice variation patterns in the 48 human NR genes, to clarify the role AS plays in modulating endo-xenobiotic metabolism. While AS is a feature common to most multiexon genes, it appears to play a pivotal role in mediating the diverse set of functions that NRs employ to transduce a discrete, chemical exposure cue into an adaptive, physiologic response. The modular structural domains of NR genes are well conserved across the superfamily, however this review highlights the diversity of “cassette exon” structures operating across the seven human NR gene subfamilies (NR0–NR6) and explores how these features may alter a gene’s susceptibility to environmentally responsive, AS events. In general, a cassette exon is defined as a splicing event whereby an interposing exon, between two other exons, can be either included or skipped during pre-mRNA processing to generate two distinct mRNA transcripts and ultimately, two protein isoforms (Cui et al., 2017). If a cassette exon is expressed at a 100% inclusion rate, it is considered a constitutive exon (Zhang et al., 2016), however, a cassette exon may more accurately be described as any exon that can be conditionally excluded from a mature mRNA transcript without altering the reading frame or protein coding potential. In this regard, many cassette exons are overlooked and considered constitutive exons, simply because they are commonly included in the gene’s reference transcript. In reality, any exon that can be removed without altering the proteins reading frame, can be considered a cassette exon, including those that encode extensions of the amino or carboxy terminus. Modern RNA sequencing advances have greatly improved our ability to study cassette exon utilization and the role AS plays in expanding mRNA transcript diversity, however improved sequencing methodologies are still required to address fundamental questions relating to the role of AS and transcriptome expansion in normal physiology, development, and disease (Wan and Larson, 2018). In 2017, we published a similar review focused on AS in the cytochrome P450 (P450) superfamily that highlighted the tremendous expansion of the human P450 transcriptome over the last decade, which now includes over 1000 unique mRNA transcripts generated from the 57 human P450 genes (Annalora et al., 2017). The goal of the current review is to explore similar trends in the NR superfamily, and to compare the complementary role alternative splicing may play in mediating the expansion of gene function among these highly associated gene superfamilies, as reviewed (Honkakoski and Negishi, 2000).
To explore the role AS plays in expanding NR gene functionality, we annotated all known human NR splice variants listed in the National Institutes of Health (NIH) PubMed, AceView, and Ensembl data bases (Thierry-Mieg and Thierry-Mieg, 2006; Cunningham et al., 2015). From this computational meta-analysis, we found that the 48 human NR genes encode a transcriptome comprising at least 1004 unique transcript variants (see Table 1). This number represents all currently known reference (or wild-type) and variant NR mRNAs, including those with retained introns, premature termination codons, and those subject to nonsense-mediated decay; we also included some experimentally derived transcript variants listed in PubMed citations that were not represented in the primary data bases; these were identified by searching PubMed for “nuclear receptor-, NR*-, or nuclear hormone receptor-variant,” “splice variant,” “alternative splicing,” or “alternative transcript.”
We discovered that NR transcripts exploit AS options at rates comparable to the P450 superfamily, and that cassette exon utilization in particular, may be a key factor influencing the functional plasticity of NR genes. Numerous splice variants capable of influencing homo- and heterodimer interactions, ligand binding, DNA binding, and subcellular trafficking events were identified or predicted, some of which may help to refine complex physiologic responses to endo-xenobiotic exposures. Endogenous gene splicing programs involving NR genes and their coregulatory proteins, some of which are components of the spliceosome, are also known to exist (Auboeuf et al., 2005). In this regard, NR signaling cascades couple gene transcription with pre-mRNA splicing, allowing a high level of coordination among ligand binding events, gene splicing, and protein translation. This central role allows NR genes to modulate an array of pleiotropic cellular functions, including neurogenesis and steroid hormone signaling, via discrete interactions with tissue-specific, NR coregulators (e.g., Cofactor of BRCA1 (COBRA1) or Arginine and glutamate-rich protein 1 (ARGLU1)) that direct gene-specific AS events. It is currently thought that this class of splicing regulation is more limited in scope than stochastic splicing cascades triggered by the direct effects of chemical mutagens, UV irradiation, or oxidative stress (Sun et al., 2007; Magomedova et al., 2019).
AS can also be mediated by small molecules (as reviewed, Taladriz-Sender et al., 2019) and nucleic acid drugs, generating new opportunities for modulating aberrant P450 and NR gene activity linked to disease. Antisense oligonucleotides that induce alternate exon splicing by interfering with pre-mRNA processing are called splice-switching oligonucleotides (SSOs) (as reviewed, Croft et al., 2000; Kole et al., 2012; van Roon-Mom and Aartsma-Rus, 2012). SSO technology has primarily been exploited as a therapeutic approach to rare genetic diseases such as Duchenne muscular dystrophy (Fall et al., 2006; McClorey et al., 2006a,b; Adams et al., 2007; Fletcher et al., 2007; Koo and Wood, 2013) as highlighted by the Food and Drug Administration approval of Eteplirsen in 2016 (Syed, 2016). The therapeutic utility of SSOs to modulate the immune response have also been explored (Mourich and Iversen, 2009; Mourich et al., 2014; Panchal et al., 2014), as has ligand-independent signaling in the vitamin D receptor (Annalora et al., 2019). The implications of splice-switching small molecules and SSOs as gene-directed therapeutics targeting the pre-mRNA of NR genes underlie the examination of NR splice variants presented here.
Results
Human Nuclear Receptor Associations with Pathology.
The 48 genes in the nuclear receptor superfamily are fundamentally linked to human health and homeostasis, and this review aims to clarify how cellular changes in NR gene expression can promote both disease and health resilience. For clarity, a complete listing of all 48 human NR genes, their common names and abbreviations are listed in the supplemental abbreviations section of the Supplemental Materials. Using public data bases, we identified 31 NR genes already connected to a human disease either from literature citations, links to single nucleotide polymorphisms (SNPs), or genome-wide associative studies (GWAS), as indicated in Table 2. Disease associations were observed for reproductive, nervous, cardiovascular, musculoskeletal, metabolic, sensory, and immune systems, highlighting the central role of NR genes in regulating human health. Numerous associations with a variety of cancers and rare genetic diseases were also identified, and some of these are associated with NR transcript variants listed in the NIH AceView data base (Thierry-Mieg and Thierry-Mieg, 2006). A summary of all known transcript variants identified by searching PubMed, AceView, and the Ensembl data base (Cunningham et al., 2015) are listed in Table 1, as well as all known SNPs listed in the GeneCards data base (Stelzer et al., 2011).
Meta-Analysis of Nuclear Receptor Transcript Variants and SNPs.
As summarized in Table 1, we cataloged all known transcript variants for the 48 human NR genes, plus one processed pseudogene, farnesoid X receptor β (FXRβ (NR1H5)), and annotated their known association with human disease and any available information regarding their molecular weight, tissue-specific expression profile, and subcellular trafficking. The stereo-radar plot shown in Fig. 1 highlights the expansion of NR transcript variants in humans and contrasts these results with the number of known SNPs for each gene. We found the number of mRNA transcript variants expressed for each human NR gene ranged from 1 to 62 variants, with liver X receptor α (LXRα) (62), constitutive androstane receptor (CAR) (60), and estrogen related receptor γ (ERRγ) (47) expressing the highest number of known mRNA transcripts. In contrast, FXRβ (1), small heterodimer partner-1 (SHP-1) (3), dosage-sensitive sex reversal, adrenal hypoplasia critical region, on chromosome X, gene 1 (Dax-1) (4) and RAR-related orphan receptor β (RORβ) (4) display the fewest number of transcript variants. A comparison of total variant transcripts and total SNPs reveals some divergence between these characteristics as retinoic acid receptor α (RARα) (49,231) and RORα (39,338) display the highest number of known polymorphisms, but not the highest number of transcript variants [RARα (39) and RORα (28)]. However, the orphan nuclear receptor ERRγ (NR3B3) showed both an extremely high number of transcript variants (47) and SNPs (33,772). If the number of transcript variants is normalized to the total number of SNPs, the xenobiotic sensing NR gene, CAR (NR1I3), possesses nearly seven times as many splice variants per SNP (0.06) than any other human NR gene, which on average expresses ∼0.009 splice variants per SNP, suggesting that the CAR gene, an important regulator of xenobiotic metabolism, may be more sensitive to mutational selection pressures than the average NR gene and/or more functionally dependent on AS mechanisms to expand its xenosensor role (Auerbach et al., 2005; Chen et al., 2010). However, because the evolutionary history of the CAR gene is complex compared with other NR genes, having descended into mammals from lobe-finned fishes and tetrapods (Sarcopterygii), but not into reptiles or most birds (Zhao et al., 2015), it is difficult to understand why this gene may have become more sensitive to AS than other NR genes. CAR may simply have experienced higher levels of selection pressure in humans due to its central role mediating adaptation to changes in both diet and environmental exposure.
A Brief Comparison with P450 Genes.
Based on our analysis, we found that NR genes express roughly 20.5 transcript variants per gene, compared with 16.7 for the average P450 superfamily gene member (Annalora et al., 2017) (Table 1). This is not a dramatic difference. However, we found 6985 SNPs per gene for NR genes, but only 848 SNPs per gene for P450s. This translates to 341 SNPs per transcript variant for NR genes, as compared with 50 SNPs per transcript variant in the P450s. Given the similar gene size and number of total splice variants per gene, but nearly 7-fold more SNPs per gene, it appears that NR genes are subject to greater mutational selection pressures than P450 genes. However, because the accumulation of single nucleotide variants in NR genes have been inversely correlated to a gene’s age (Popadin et al., 2014; Mackeh et al., 2017), P450 genes are likely more ancient than their cognate NR genes, which due to their younger age, may be more resilient to mutations that might further optimize their expression parameters, function, or interactome.
Nuclear Receptor Splice Variants Associated with Human Disease.
While SNPs can undoubtedly have a profound effect on NR gene function via multiple mechanisms (Prakash et al., 2015), there is evidence that the NR gene family is more resilient to mutations than other gene families, including the human leukocyte antigens and histone deacetylases (Mackeh et al., 2017). However, less is known about how SNPs alter the natural splicing patterns of NR genes, and whether these types of variations are more likely to promote disease in humans. Based on this meta-analysis, and a growing appreciation for the role that AS plays in expanding gene function, we speculate that NR transcript variant expression, which can increasingly be monitored by RNA sequencing (RNA-sSeq) methods, may provide an improved fingerprint for disease diagnosis than genomic studies focused only on SNPs that may or may not alter gene splicing and expression. In this regard, we have listed more than 25 NR transcript variants that have been linked to a human disease (see Table 3). Alternately spliced NR variants can express dominant negative phenotypes (e.g., receptors lacking a complete DBD) or constitutively active phenotypes (e.g., receptors lacking a complete LBD) capable of driving disease, as we have reported for the vitamin D receptor (VDR; NR1I1) (Annalora et al., 2019). This phenomenon is perhaps best exemplified by splice variants of the androgen receptor (AR; NR3C4) that promote castration-resistant forms of prostate cancer (Dehm et al., 2008; Antonarakis et al., 2014, Kohli et al., 2017). In this meta-analysis we identified 12 additional cancer-related NR gene variants [for DAX-1, SHP, peroxisome proliferator–activated receptor (PPAR) β/δ, RARα, RARγ, LXRα, LXRβ, mineralocorticoid receptor (MR), estrogen receptor (ER) α, nerve growth factor IB (NGFIB or nuclear receptor related 77 (Nur77)), nuclear receptor related 1 (NURR1), and steroidogenic factor 1 (SF1)] that lack a complete LBD and may thus be constitutively active. We also identified 14 cancer-related NR variants [for PPARα, PPARγ, RARB, V-erbA-related protein EAR-4 (Rev-erbβ), LXRα, LXRβ, VDR, pregnane X receptor (PXR), hepatocyte nuclear factor 4α (HNF4α), testicular receptor 4 (TR4), AR, ERRα, NGFIB, and NURR1] that lack a complete DBD and may thus represent dominant negatives (based on the NIH AceView data base in early 2018) (Thierry-Mieg and Thierry-Mieg, 2006).
There are currently no crystal structures for any NR splice variants listed here, making the structural consequences of such large-scale modifications to the NR scaffold difficult to predict. However, the elimination of key domains clearly has a profound effect on NR function, tissue distribution, and subcellular trafficking, as highlighted by our current knowledge of where NR splice variants are being expressed and trafficked (see Table 4). Many nuclear receptor ligands are also either substrates or products of P450 enzymes, and there is growing interest in understanding how different classes of NR ligands might alter the splicing patterns of NR target genes, by altering RNA stability directly, or by modulating the activity of the transcription activation complex, spliceosome, or ribosome (Auboeuf et al., 2005, Bhat-Nakshatri et al., 2013, Zhou et al., 2015). It was recently shown that the environmental pollutant 2,3,7,8-tetrachlorodibenzo-p-dioxin can profoundly modulate the AS patterns of aryl hydrocarbon receptor (AhR) target genes (e.g., CYP1A1) in a hepatic mouse model (Villaseñor-Altamirano et al., 2019). The AhR is a ligand-dependent transcription factor, related to NR genes, that is a member of the basic-helix-loop-helix-Per-ARNT-Sim superfamily. While the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin drives global changes in AS via the AhR remain unknown, changes in spliceosome assembly, alternative exon recognition, and tissue-specific, splicing factor expression are thought to be involved. Xenobiotics like phenobarbital and nifedipine are known to promote AS in related, drug-metabolizing enzymes, including CYP2B2 (Desrochers et al., 1996; Zangar et al., 1999), and future studies focused on defining the structure-activity relationships between ligands and receptors that drive the cotranscriptional, AS processes are needed.
Cassette Exon Utilization Links Nuclear Receptor Structure, Function, and Evolution.
Despite numerous advances in the structural characterization and modeling of NR genes (Rastinejad et al., 2013), improved understanding of NR protein structure/function is still required to fully understand how these genes evolved substrate specificity, and how they interact as a collection of individual, alternatively spliced genes. Nuclear receptors are a diverse group of transcriptional regulators that play key roles in development, physiology, and reproduction while also acting as sensors of xenobiotic exposure. How they accomplish such diversity in their functional roles while being constrained by a highly conserved structural scaffold, while exposed to a complex and variable cellular interactome remains a mystery. We speculate that the expansive NR gene function observed in humans may be tightly linked to the expression of splice variant products that can fine-tune biologic signaling pathways, through the coordinated expression of modular, cassette exon domains that expand gene function. For example, we have previously shown that exon 8 of the human VDR is a cassette exon that can be excluded from mature transcripts, without disrupting protein expression, and that splice variants lacking a portion of the LBD (encoded by exon 8) do not transduce the same ligand-dependent and/or ligand-independent signaling cascades as the wild-type receptor (Annalora et al., 2019). Here, we observed that each NR subfamily has its own unique repertoire of cassette exon structures that may represent an underappreciated level of gene structure/function complexity operating at the transcriptional level.
We speculate that most if not all eukaryotic genes have evolved cassette exons to augment tissue-specific protein function during bouts of elevated environmental stress or chemical exposure. Here we propose a new scheme for visualizing gene structures, defined by cassette exon organization, which emphasizes gene structure over sequence, to facilitate interpretations of how multiple transcript variants can arise from a single gene. As shown in Fig. 2A, our cassette exon visualization pictograph methodology, which we abbreviate as CEViP, stems from the observation that all coding exons exist in one of nine possible reading frames with respect to their intron/exon border, which we refer to as exon junction frame codes: 1–9 [see inset to Fig. 2A; frame code 1 = (0,0); 2 = (0,+1); 3 = (0,+2); 4 = (+1,0); 5 = (+1,+1); 6 = (+1,+2); 7 = (+2,0); 8 = (+2,+1); and 9 = (+2, +2)]. The CEViP modeling convention allows rapid identification of cassette exon structures embedded in transcriptomic sequences, which we define as any modular exonic component organized to be skipped in the mature mRNA transcript without altering the standard reading frame. The CEViP approach also provides new insights into the evolutionary relationships among distinct NR families and subfamilies, which would be more difficult to identify using traditional sequence alignments and hierarchical clustering methods generated by programs like MUSCLE or Clustal X version 2 (Edgar, 2004; Larkin et al., 2007). CEViP modeling provides valuable insight into the splicing potential of an exon and the cumulative sensitivity of a gene to AS mechanisms.
A schematic representation for each of the 48 human NR genes, generated using the CEViP methodology is shown in Supplemental Fig. 1, and a summary of these details is presented in Supplemental Table 1. In Fig. 2B, the four general classes of cassette exons are highlighted in comparison with a prototype transcript that encodes no cassette exons. These four classes include: 1) N- or carboxy (C)-terminal encoding cassette exons; 2) internal cassette exons with complete or 3) incomplete reading frames, and 4) multiexonic, internal cassettes that must be skipped in combination to retain the standard reading frame. The modeling output of the CEViP methodology can also be used to explore questions relating to NR structure/function as exemplified for HNF4α (NR2A1) in Fig. 3 and the VDR (NR1I1) in Fig. 4.
In Fig. 3A, the crystal structure of the full-length HNF4α gene is shown in cylindrical cartoon representation to highlight how structural domains in the NR protein are organized with respect to the gene’s nine primary coding exons (Chandra et al., 2013). Modular, cassette exon components of HNF4α encoded by exons 5, 7, 8, and 9, which allow the protein to be expressed in full length or as multiple transcript variants lacking cassette exon–encoded domains, are also highlighted (Fig. 3B). The four cassette exons found in the human HNF4α gene encode large segments of the LBD and C terminus, which may allow for the tissue-selective expression of at least seven unique HNF4α transcript variants under normal conditions, as highlighted in Fig. 3C. Changes in cassette exon utilization patterns, triggered by oxidative stress, chemical exposure, mutation, or disease, may provide the HNF4α gene with increased flexibility to bind and accommodate different classes of lipid ligands or to induce constitutively active variant forms with ligand-independent functionalities capable of modulating both basal and inducible gene expression patterns.
A second CEViP model is provided for the VDR in Fig. 4A to highlight the positioning of cassette exon 8 with respect to the gene’s DBD, LBD, hinge region, and N and C termini. Figure 4B shows the positioning of key secondary structural elements encoded by cassette exon 8 (i.e., α-helices H7 and H8) within the crystal structure of the VDR’s LBD (Rochel et al., 2000). When coupled with structural information, CEViP models allows rapid assessment of gene complexity and modularity based on the number and location of cassette exons within a gene’s genomic structure; this information allows predictions of structural plasticity potentially harbored in the alternative transcript pool while providing insight into the sensitivity and responsiveness a gene may have to the environmentally sensitive, AS mechanisms.
By applying the CEViP modeling method to all 48 human NR genes, we revealed that cassette exon utilization patterns are highly conserved within NR families and subfamilies. Indeed, cassette exon patterns offer a simple criterion for NR linkage that is not biased by sequence homology. In Fig. 5 the diversity of cassette exon structures among NR0 and NR1 family genes is highlighted. Primitive NR0 family genes (Dax-1 and SHP) that lack a prototypical DBD and contain no cassette exons are shown in Fig. 5A. These small NR genes may resemble the common ancestor for all modern NR genes, having linked a primitive DNA binding element to an endo-xenobiotic sensor/receptor for the first time (Reitzel et al., 2011). Beyond this early evolutionary advancement, more complex NR gene assemblies emerged, with longer and more complex N- and C-terminal transactivation domains, highly conserved DBD and LBD elements and, ultimately, cassette exon structures. Interestingly, some NR genes, like the NR1C subfamily (PPARα, PPARδ, and PPARγ; see Fig. 5B), lack any cassette exons in their reference gene coding structure, whereas related genes from the NR1A (TRα and TRβ), NR1B (RARα, RARβ, and RARγ), NR1D (Rev-erbα and Rev-erbβ), and NR1I (VDR, PXR, and CAR) subfamilies all contain at least one cassette exon within the LBD (see Supplemental Fig. 1). Even greater cassette exon complexity is observed in the NR1F (RORα, RORβ, and RORγ) and NR1H (LXRα, LXRβ, and FXRα) subfamilies, which contain modular domains at both the N and C terminus, and within the DBD, LBD and hinge/dimerization domains (Fig. 5C). Based on only traditional sequence homology criteria, the NR1H family is most highly related to the NR1I subfamily, which includes the VDR (see Supplemental Fig. 2). However, NR1I subfamily genes encode a single cassette exon in the LBD of their transcripts, rather than the multiple cassettes used by NR1H genes in the DBD, LBD, hinge, and N and C termini. This divergence in gene structural assembly would have been difficult to detect using primary sequences alignments alone. A cassette exon analysis suggests that the NR1H subfamily may have emerged from a different common ancestor, one that more closely resembles the NR isoforms found in the NR1B, NR1D, NR1F, or NR6A subfamilies. While these common features may have evolved independently of a common ancestor, based on convergent degeneracy within the original NR scaffold template, the strong conservation of cassette exon structures across NR gene families and subfamilies implies that functional differences imparted by modular gene assembly may have helped drive the dynamic expansion of the NR gene structure into 48 distinct, but overlapping biologic regimes.
In contrast to the NR1 family, all members of the NR2 subfamily have at least one cassette exon, with NR2F (Chicken ovalbumin upstream promoter transcription factor I (COUP-TFI), COUP-TFII, and V-erbA-related gene EAR-2(EAR-2)), NR2A (HNF4α and HNF4γ), and NR2E1 (Tailless homologue receptor (TLX)) subfamilies each containing cassette exons that encode portions of the hinge region or LBD. NR2B [retinoid X receptor (RXR) α, RXRβ, and RXRγ] genes contain one cassette exon in the DBD/hinge region only, whereas the NR2C subfamily genes (TR2, TR4) use cassette exons in both DBD and LBD (see Supplemental Fig. 1 or Supplemental Table 1). It is notable that the single cassette exon found in the DBD/hinge of NR2B subfamily members (RXRs) is highly conserved across the NR2C subfamily and in all NR3 family members (Supplemental Fig. 3). In addition, all NR3, NR5, and NR6 subfamily members have dual cassette exons encoding portions of the LBD and DBD, with NR3 genes having additional C-terminal cassette exons, in contrast to extra N-terminal cassettes in the NR5 and NR6 families. NR3 family genes are further distinguished from all other NR genes by their N-terminal coding exon, which is not a cassette exon, but which adds an additional ∼400–600 amino acids to the transactivation domain (see Supplemental Fig. 1). In contrast, the NR4 family is largely devoid of any cassette exons, outside of one N-terminal cassette found in NR4A2 (Nurr1), which would make truncations of the transactivation domain and DBD possible. Ultimately, the global NR gene organization provided in Supplemental Fig. 1 reflects the hierarchical complexity of cassette exon utilization that we propose to be operating across the seven NR gene families. It is tempting to speculate that cassette exon evolution has not occurred randomly, and that these cryptic cassette features represent an additional dimension of gene complexity that guide or direct differential responses to cellular stress, on a scale that has yet to be fully comprehended or elucidated. A comparison of possible evolutionary trajectories for the seven human NR gene families based on phylogeny (as depicted in Supplemental Fig. 2) and cassette exon usage (as depicted in Supplemental Fig. 1) is shown in Supplemental Fig. 4.
Utilizing a cassette exon framework, we hypothesize that a single gene can explore the use of multiple, structural scaffolds “on-the-fly” to better detect and respond to chemicals that a tissue may have never been exposed to previously, due to genetic (or epigenetic) changes that alter endogenous metabolic output or via direct exposure to novel, synthetic, or environmental chemicals. In this regard, cassette exon utilization may function to enhance the resilience of a cell’s “chemical immune system” by enabling novel crosstalk networks to emerge among NR gene variants and their coactivators. This process would allow the cell to rapidly reshape the expression patterns of phase 0, I, II, and III metabolic systems (including NRs, P450s, and drug transporters) linked to their signal transduction pathways, as highlighted in Fig. 6. We speculate that this adaptive feedback response helps the cell refine the interplay among cognate, phase I–III metabolic systems, optimizing physiologic responses to endo-xenobiotics through the selection of splice variant transcripts that promote cellular homeostasis and health resilience after a chemical exposure.
Discussion
AS is a complex cotranscriptional process that is a function of several factors, including but not limited to gene coding structure, and the expression of tissue-specific splicing factors that vary during aging and development, in genetic diseases, cancer, and infections, and during exposure to a changing environment. AS of pre-mRNA is tightly regulated by multiple cellular factors including a large family of RNA binding proteins known as serine/arginine-rich proteins, that include the heterogeneous nuclear ribonucleoproteins; these splicing factors recognize a specific set of splicing regulatory elements that include exonic and intronic splicing enhancers, and exonic and intronic splicing silencers (Busch and Hertel, 2012). RNA polymerase II recruitment of transcription splicing factors in conjunction with the speed of transcription elongation ensure that that each splice site is selected in a competitive environment that promotes condition-specific gene expression and splicing (Lee and Rio, 2015). However, when individual splicing factors are absent or dysfunctional in a cell, dysregulation of highly evolved, splicing regulatory networks can arise and promote cellular stress or disease (as reviewed, Ule and Blencowe, 2019).
For example, spinal muscular atrophy, a rare genetic spinal motor neuron disease, is associated with reduced expression of the survival motor neuron (SMN) protein, a master heterogeneous nuclear ribonucleoprotein assembler, which results in a broad spectrum of splicing alterations (Zhang et al., 2008). Developmentally regulated DNA methylation events have also been shown to promote the skipping of exon 5 in CD45 expression (Shukla et al., 2011), and epigenetic modes of AS regulation are increasingly recognized (Zhu et al., 2018b). However, nearly half of all age-related AS events (e.g., tissue deterioration, organ dysfunction, telomere attrition, and loss of stem cell renewal) are due to changes in the expression of splicing factors (Mazin et al., 2013). Overexpression of the human oncogenic transcription factor related to an oncogene carried by the avian virus, Myelocytomatosis (v-myc) (MYC) in tumors can also yield multiple oncogenic AS events due to an upregulation of splicing factors (Anczuków and Krainer, 2016; Das et al., 2012). Stem cell–specific AS programs have also been established, and are mediated by RNA binding fox-1 homolog 2 (RBFOX2), a key player in mesenchymal tissue splicing (Venables et al., 2013). Understanding of the scope and importance of tissue-specific AS and tissue-dependent regulation of AS in vertebrates is growing (Blencowe, 2006; Barbosa-Morais et al., 2012; Merkin et al., 2012), as many of these events are lineage-specific, providing new insights into how phenotypic differences arise among vertebrates. The cellular consequences of AS in the NR family are highlighted in Table 4; these events add to the complexity and refinement of expression and phenotype necessary to respond to chemical challenges in the environment.
AS is also associated with evolutionarily significant alterations to the genome (Modrek and Lee, 2003) including, exon duplication events (Letunic et al., 2002), retrotransposon (i.e., Alu element)–mediated exonization (Sorek et al., 2002), exon creation (Pan et al., 2008), and the introduction of premature protein termination codons (Xing and Lee, 2004). AS may also accelerate paths of evolution based on observations that nonsynonymous mutations (ka) are favored over synonymous mutations (ks) (Yang and Bielawski, 2000); ka/ks is an established selection pressure metric that participates in reading frame preservation by alternate splicing (Xing and Lee, 2005). Our observation that cassette exons may play discrete roles in directing proteome expansion for NR genes, is consistent with the proposed role for AS in fine-tuning elements of gene structure/function and evolution (Bush et al., 2017, Grau-Bové et al., 2018). However, the full extent to which cassette exons are functionally relevant and conserved across other mammalian species remains largely unknown (Sorek et al., 2004).
These insights into the importance of AS events in evolution lead to a sharpened focus on the RNA sequence motifs that may regulate gene splicing in both intronic and exonic locations. Recent studies indicate that there are at least 42 different consensus sequences that are cis-elements, which regulate splicing at consensus “GU/AG” splice-site donor/splice-site acceptor exon junctions (Qu et al., 2017b). Mutations that alter these cis-elements can profoundly affect gene splicing and expression, and gene-directed therapeutics that modulate aberrant splicing are increasingly being explored. For example, protein-truncating mutations define Duchenne muscular dystrophy (DMD), which compromises muscle fiber integrity (Emery, 1989), in contrast to reading frame–preserving mutations, associated with Becker muscular dystrophy, a condition with slower progression and milder symptoms (Heald et al., 1994). Apparent deletion hotspots lead to approximately two-thirds of all DMD cases (Aartsma-Rus et al., 2002), generating new interest in finding antisense therapeutics that induce skipping exons to preserve reading frame. The search for antisense oligonucleotides that can induce exon skipping in the dystrophin pre-mRNA revealed some exons to be refractory to SSO-induced exon skipping (Aartsma-Rus et al., 2005; Wilton et al., 2007). Conversely, multiexon skipping of exons 6 and 8 was accomplished in the canine dystrophin gene to restore reading frame (Yokota et al., 2009; Echigoya et al., 2017). Building from the success of SSO therapeutics such as Eteplirsen in treating DMD in humans (Lim et al., 2017; Charleston et al., 2018), we hypothesized that that single- and multicassette exon skipping approaches targeting nuclear receptor genes might also be feasible, based on the modular nature of the NR protein scaffold and the family-specific conservation of cassette exons within key functional domains.
We have now identified putative cassette exons operating within the LBD and DBD of numerous human NR genes, and their alternative usage can promote the expression of both gain-of-function (i.e., constitutively active) and loss-of-function (i.e., dominant negative) phenotypes that strongly support a significant role for AS in regulating NR gene function. For example, in the AR, aberrant skipping of the exon(s) that encodes the LBD promote the expression of a ligand-independent AR splice variant that drives prostate cancer progression (Dehm et al., 2008; Guo and Qiu, 2011; Daniel and Dehm, 2017; Antonarakis et al., 2014; Kohli et al., 2017). Constitutively active splice variants of the MR (NR3C2) and the AhR, a related basic-helix-loop-helix-Per-ARNT-Sim family nuclear receptor, have also been identified, and these variants lack discrete segments of the LBD (McGuire et al., 2001; Zennaro et al., 2001). Ligand-independent ER (NR3A1) variants have also been reported with implications for the treatment of breast cancer (Heldring et al., 2007; Zhu et al., 2018a; Al-Bader et al., 2011). More directly, our group recently showed that targeted skipping of cassette exon 8 in the human VDR (NR1I1) promotes a splice variant form (Dex8-VDR) that stimulates ligand-independent transcriptional activity capable of suppressing colon cancer cell growth (Annalora et al., 2019). Conversely, AS of exon 3, which encodes a portion of the DBD, promoted the expression of a dominant negative splice variant that disrupts the VDR’s ligand-independent, gene-repressor function (Annalora et al., 2019). Using the NIH AceView data base (Thierry-Mieg and Thierry-Mieg, 2006), we have already identified 12 NR genes previously linked to cancer that generate splice variants lacking a complete DBD; these include: PPARβ/δ, RARα, RARγ, LXRα, LXRβ, MR, ERα, NGF1B, NURR1, and Steroidogenic factor 1 (SF1). An additional 14 NR genes previously linked to cancer that express splice variants lacking a complete LBD have also been identified by RNAseq methods; these include PPARα, PPARγ, RARβ, Rev-erbβ, LXRα, LXRβ, VDR, PXR, HNF4α, TR4, AR, ERRα, NGF1B, and NURR1. Although the mechanisms driving NR splice variant expression in cancer remain poorly understood, we are intrigued by the potential for antisense therapeutics to correct or reprogram aberrant gene expression profiles linked to cancer and other diseases, as reviewed (Martinez-Montiel et al., 2018).
The potential for developing antisense approaches that allow for targeted expression of ligand-independent NR variants capable of signaling in the absence of an endogenous ligand is also intriguing (Annalora et al., 2019). Equally compelling would be the manipulation of the DNA binding domains to create “therapeutic” dominant negative variants capable of reprogramming genomic signaling pathways in the absence of ligands. Our work with the VDR has already demonstrated that splice variants lacking a complete LBD can fold properly and bind antibodies directed toward the reference protein, in addition to forming unique homodimer assemblies (with the reference protein) that migrate differentially on an SDS-PAGE gel (Annalora et al., 2019). However, improved biochemical and structural characterization of NR splice variants, including those lacking well conserved cassette exon structures, are needed to further improve our understanding of the role AS plays in modulating global NR function. While our current understanding of NR splice variant structure/function remains limited due to the lack of structural information, AS events that remodel the LBD of xenobiotic sensor genes, like PXR or CAR, have the potential to dramatically modify gene function by altering the determinants of ligand binding, cofactor recruitment, and signal transduction (Auerbach et al., 2005; Chen et al., 2010; Wallace and Redinbo, 2013).
It is also important to recognize that cassette exon utilization is also linked to the expression of covalently closed, single-stranded circular RNAs (circRNAs), which are noncoding transcripts consisting of introns and/or exons (Salzman et al., 2012). These transcripts are highly stable, broadly distributed in eukaryotic cells, and increasingly recognized for a role in regulating gene expression and splicing, as they can interact with serine/arginine-rich proteins, microRNA, and long noncoding RNAs. CircRNAs are now implicated for roles in the etiology of a range of human disorders, including cancer, heart disease, diabetes, and aging Qu et al., 2017a a), and they represent an important class of noncoding RNA that are increasingly studied. Connecting functional circRNA expression to their host genes, and linking their assembly to the skipping of discrete cassette exon structures may help to provide deeper insight into the complex structure of eukaryotic gene assembly and greater appreciation for the role that AS events play in guiding host resilience to changes in the genome, microbiome, diet, or environment.
The primary goal of this review is to highlight the dramatic expansion of the NR transcriptome that is observed when all known splice variants (1004 total; Table 1) are recognized. This insight highlights how each NR gene can yield, on average, ∼20 splice variants to augment their gene function in both a species- and tissue-specific manner, based on gene structure, the accumulation of SNPs, variations in splicing factor expression, and changes in cellular stress that may alter DNA and RNA stability. Thermodynamic determinants of gene splicing/expression that operate independently from other posttranscriptional or posttranslational modifications can also further fine-tune NR gene expression and function. Our desire to understand the complexity of cassette exon utilization across the NR superfamily, led us to develop the CEViP methodology shown in Fig. 2A, and it helps to highlight the remarkable level of conservation of modular cassette motifs operating across both NR family and subfamily. This approach provides additional details about the sequences defining each intron/exon border within a gene, facilitating the identification of cassette exons and the visualization of potential alternative transcripts. However, without a comparative analysis of cassette exon structures from related mammalian species, it is currently difficult to contextualize the importance of these conserved features from an evolutionary perspective. We do not currently know if each species has a different cassette exon reservoir for orthologous genes.
If we make predictions of NR evolution based on an analysis of human cassette structures only (see Supplemental Fig. 4), it implies that the NR0 family is the most primitive class of NRs, followed by the NR1C and NR4A subfamilies, which essentially lack cassette exons in the reference gene, followed by the NR2A family which contains up to two cassette exons that are not well conserved in any other subfamily. However, there is strong support in the literature that suggests NR2A1 (HNFα) is the primordial NR gene that emerged from our common metazoan ancestor (demosponges) (Bridgham et al., 2010). In humans, this gene appears structurally complex compared with other NR genes, having four cassette exons; however, we speculate that this may represent organism-specific evolution of the gene, rather than high-level complexity intrinsic to the primordial gene. Moreover, the conservation of cassette exon structures observed in the NR2A subfamily is particularly striking, because although the size and location of NR2A cassette exons appears to have drifted over time, the overall organization of cassette exons has not (Supplemental Fig. 1). Ultimately, our approach for analysis of cassette exon structures imply that the NR2E subfamily of genes (TLX and Photoreceptor-specific nuclear receptor (PNR)) may be more ancestral than the NR2A family, due to the lack of well conserved cassette exon structures in this family compared with all other NR2 family members (Supplemental Fig. 5). While these findings are not definitive, cassette exon visualization provides a new approach to study NR gene evolution across different gene families, species and time. This tool may help to address the remaining controversy regarding the early evolution of NR genes, from an ancestral form participating in endogenous signaling pathways devoid of hormones, to more modern versions that perform a xenobiotic sensor role for the cell (Holzer et al., 2017). Interestingly, there is some evidence that the oldest NR genes may have first arisen in Ctenophores, a more primitive phylum related to Cnidarians, which curiously express NR genes that cluster with the NR2A family, but contain only a consensus LBD, which is strikingly similar to the assembly of NR0B subfamily genes in humans (Reitzel et al., 2011). A phylogenetic analysis of the primary amino acid sequences for all 48 human NR genes (see Supplemental Fig. 2) places the NR0 family at the top of the ancestral tree, but this is likely biased by their lack of a conserved DBD element. Beyond the NR0 family, NR gene expansion appears to have proceeded primarily through the diversification of two gene scaffolds, as exemplified by the NR2E (TLX and PNR) and NR6A (Germ cell nuclear factor (GCNF)) subfamilies. Interestingly, these subgroups possess two of the most unique cassette exon signatures we have observed, particularly for the NR6A (GCNF) gene, which utilizes five cassette exons that are not fully conserved, or recapitulated, in any other NR subfamilies (Supplemental Fig. 5). One possible explanation is that the utilization of cassette exons was less functionally relevant when the NR2E and NR6A subfamily genes were emerging, and that modern cassette exon utilization patterns, which may have arisen from random events, were standardized later to help maintain the reading frame of duplicated genes. AS mechanisms may have placed different selection pressures on these cassette exon features over time, with some becoming important splice-sensitive drivers of structural adaptation which today enhances gene resilience by allowing for stress-related proteome expansion, and tissue-specific, fine-tuning of NR gene functions. Thus, a deeper analysis of cassette exon utilization patterns across multiple species similar to what has been done for primary coding sequences (Zhao et al., 2015), could help to clarify the organism-specific drivers guiding NR gene diversification in humans as well as other species.
We also recognize that the diversification of internal cassette exons is not the only mechanism by which NR genes adapt their function to changing selection pressures. AS that alters the expression of the activation function 1 (AF-1) and activation function 2 (AF-2) motifs found in either the N or C terminus of the protein can also profoundly alter reference gene function. This is exemplified by the NR3C subfamily (Glucocorticoid receptor (GR), MR, Progesterone receptor (PR), AR), which appears to have evolved a highly extended, N-terminal transactivation domain (∼400–600 amino acids in length), encoded by a single, noncassette exon, to augment their signaling functions (Supplemental Fig. 1). An additional set of NR genes (including CAR, VDR, and GR) are known to modulate their function via the expression of naturally occurring, C-terminal variants (van der Vaart and Schaaf, 2009). While the function of these truncated NR variants remains unclear, many are thought to be dominant negatives incapable of efficiently binding endogenous ligands, coregulatory proteins or canonical DNA response elements. However, work involving the role of the AR gene in cancer has demonstrated that some C-terminal splice variants (i.e., AR-V7 and AR-V9) can also promote constitutively active NR phenotypes as well (Dehm et al., 2008; Antonarakis et al., 2014, Kohli et al., 2017). Therefore, internal cassette exons structures that alter the expression of the DBD, LBD, and hinge regions may represent only one strategy NR genes use to augment their cellular function. For example, in the human NR1C subfamily (PPARs) the role of cassette exons may have become superfluous or counterproductive over time and therefore were completely eliminated.
In summary, NR genes are highly prone to AS events and their pattern of cassette exon utilization provides a new lens for interpreting how these biologic sensors have evolved to accommodate different ligands, which are often the products of metabolic or environmental exposure pathways subject to variation over time (Holzer et al., 2017). As highlighted in Fig. 6, the expansion of multiple transcript variants from a single gene provides a cell with the powerful ability to find novel protein-based and noncoding RNA–based solutions to biochemical challenges that disrupt homeostasis, through the expression of novel transcript variants. We previously described how splice variant expression in the P450 superfamily of genes provides an expanding phenotypic cloud of functional plasticity that can augment a P450 gene’s ability to promote cell survival and fitness (Annalora et al., 2017). After exploring similar trends in the interrelated NR superfamily, we continue to speculate that splice variant expression, working in concert with reference gene function, provides the basis for sophisticated biologic adaptation to novel chemical insults. Because NR genes coordinate the expression and splicing of virtually all metabolic genes (including P450s and drug transporters) involved in the “chemical immune system,” their own sensitivity to AS provides the cell with an additional synergistic layer of complexity for bootstrapping new and increasingly complex biologic responses to the environment. For example, transcript variants (T1–T3) of PXR (NR1I2), are known to promote differential expression of both CYP3A4 (Lin et al., 2009) and UDP-glucuronosyltransferase 1A isoforms (Gardner-Stephen et al., 2004). The ability of PXR variants to coordinate tissue-specific patterns in the expression of both phase I and II metabolic genes highlight the potential of alternatively spliced NR genes to optimize cellular feedback mechanisms linked to drug influx (phase 0) and all phases (I–III) of drug metabolism.
While there are still some uncertainties regarding the role that AS events play in proteome expansion and the fine-tuning of reference gene functions under normal cellular conditions (Chaudhary et al., 2019), there is increasingly strong biochemical evidence to indicate that the majority of splice variants are translated (Weatheritt et al., 2016) and that they help shape normal human physiology and development by contributing a vast diversity of “functional alloforms” to the human proteome ((Yang et al., 2016)). The role of NR variants in driving idiosyncratic differences in human drug metabolism are currently less well understood, but the ability of PXR splice variants (T1–T3) to fine-tune the expression of both phase I and phase II metabolic systems highlights the potential of NR variants to personalize individual responses to endo-xenobiotic exposure. While NR variants driving human disease are currently more well characterized and understood than those driving interindividual variability in human drug metabolism, there is ample reason to believe that future advances in RNAseq methodologies, coupled with improved computational analysis, will provide improved diagnostics of AS events that may underlie or drive common chemical toxicities, adverse drug events, and individual susceptibility to environmental disease.
Ultimately, the splicing-related mechanisms affecting NR gene function explored in this review correspond well with earlier observations that SNPs and epigenetic modifications that alter the NR gene expression also induce differential expression of drug-metabolizing enzymes and transporters, resulting in variable drug responses that may complicate or limit the success of precision medicine (Prakash et al., 2015). Improved understanding of the role adaptive splicing mechanisms play in fine-tuning drug metabolism is therefore needed to enhance future predictions of chemical toxicity and adverse drug events, and this information would also provide an improved roadmap for implementing precision medicine, which if guided by pharmacogenomics alone, may continue to miss the hidden complexities that render one person’s poison another person’s cure.
Acknowledgments
This research was supported in part by funds from the Agricultural Research Foundation (ARF) at Oregon State University.
Authorship Contributions
Participated in research design: Annalora, Marcus, Iversen.
Conducted experiments: Annalora, Iversen.
Contributed new reagents or analytic tools: Annalora, Iversen.
Performed data analysis: Annalora, Iversen.
Wrote or contributed to the writing of the manuscript: Annalora, Marcus, Iversen.
Footnotes
- Received August 16, 2019.
- Accepted December 31, 2019.
↵This article has supplemental material available at dmd.aspetjournals.org.
Abbreviations
- AhR
- aryl hydrocarbon receptor
- AR
- androgen receptor
- AS
- alternative splicing
- CAR
- constitutive androstane receptor
- CEViP
- cassette exon visualization pictograph
- circRNA
- circular RNA
- DBD
- DNA binding domain
- DMD
- Duchenne muscular dystrophy
- ER
- estrogen receptor
- GWAS
- genome-wide association study
- HNF4α
- hepatocyte nuclear factor 4α
- LBD
- ligand binding domain
- MR
- mineralocorticoid receptor
- NIH
- National Institutes of Health
- NR
- nuclear receptor
- P450
- cytochrome P450
- PPAR
- peroxisome proliferator–activated receptor
- PXR
- pregnane X receptor
- RNAsequ
- RNA sequencing
- RXR
- retinoid X receptor
- SNP
- single nucleotide polymorphism
- SSO
- splice-switching oligonucleotide
- VDR
- vitamin D receptor
- Copyright © 2020 by The American Society for Pharmacology and Experimental Therapeutics