Transcription and Epigenetic Regulation




Abstract


This chapter covers the broad field of transcription and epigenetic regulatory mechanisms to equip the gastrointestinal physiologist with the tools to understand gene expression. It reviews the composition of nucleic acids and the methods used to study their structure and function. It also discusses the effects of noncoding RNA species such as microRNAs and long noncoding RNAs. In addition, it considers how histone and DNA modifications, collectively known as epigenetic influences, affect gene expression in the context of the gut microenvironment.




Keywords

Polymerase 2, miRNA, Long-noncoding RNA, Chromatin, Histone, Nuclear export

 




Acknowledgment


The work was supported by Public Health Service NIH Grant R01-DK55732 and R37-DK45729 to JLM.


With the draft sequence of the human genome published in 2001, perhaps the most important lesson we have learned is that the clues to our genetic destiny are contained in more than just the primary sequence of DNA encoding ~20,500 proteins. What distinguishes humans from other life forms, including other mammals, appears to lie in the complex modifications, organization, and function of the 3.3 billion nucleotides (nt). Not only are many of these ~20,500 genes alternatively spliced, but their DNA, RNA, and protein products are also chemically modified in ways that change gene function. Therefore, rather than being controlled by a mere 20,500 genetic units, we are effectively controlled by something closer to 20,500 raised to the nth power. The exponent has yet to be determined, but it likely yields an enormous number of combinatorial genetic events. This chapter briefly summarizes our basic understanding of gene expression but focuses primarily on the new concepts and technologies of gene regulation in the postgenomic era. Arguably, the major advances since the 5th edition of this textbook continue to be the explosion in our understanding of noncoding RNAs, the impact of epigenetics, chromatin topology, and the refinement of high-throughput techniques.





Overview of Gene Organization



Nucleic Acids


The molecular definition of a eukaryotic gene is complex, but in the simplest terms, it is a nucleic acid sequence that encodes one messenger ribonucleic acid molecule (mRNA) and, in turn, one polypeptide. Genes are composed of “two intertwining polymers” of deoxyribonucleic acid (DNA) that are noncovalently associated with a variety of proteins, including histones and specialized proteins (e.g., polymerases and various accessory proteins). The association of DNA, histones, and specialized nuclear proteins is collectively called chromatin. Chromosomes are composed of continuous strands of chromatin that have been compacted by supercoiling and looping so as to fit into the nucleus ( Fig. 1.1 ). The steps governing the compaction and location of chromatin are now an area of intense investigation and will be discussed in Section 1.2 . Chromosomes are the basic heritable unit in the mammalian cell. In humans, there are 46 individual chromosomes, or 23 chromosome pairs. The smallest unit of the DNA polymer is a nucleotide: a base attached to the first carbon of a five-carbon sugar that is phosphorylated at its fifth carbon ( Fig. 1.2 ). Nucleosides lack phosphates linked to the pentose sugar, thus differing from nucleotides, which contain one, two, or three phosphate groups. The type of base distinguishes the four nucleotides found in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). They are called bases because of the nitrogen groups contained within their single-ring (thymine, cytosine, or uracil) or double-ring (adenine or guanine) structures. DNA contains the sugar deoxyribose, whereas RNA contains the sugar ribose and the base uracil (U) instead of thymine. CpG dinucleotides consist of a deoxycytidine in the 5′ position adjacent to a deoxyguanosine; the “p” indicates that one phosphate group separates these two nucleosides. These dinucleotides are “hot spots” for enzymes called DNA methyltransferases (DNMTs), which add a methyl group to the fifth carbon of the cytosine ring. This epigenetic mark, particularly when present in promoters, represses gene expression and is a mechanism frequently used by gastrointestinal (GI) cancers to silence genes that would otherwise restrain their proliferation.




Fig. 1.1


Chromatin structure and organization. Each chromosome exists in a haploid (germ cells) or diploid/tetraploid state depending on the stage of the cell cycle. The short arm of the chromosome relative to the centromere is the “p” arm and the long arm is the “q” arm. Chromosomes consist of the DNA double helix wrapped around core histones and further compressed and compacted.

(From the Language of medicine. 4th ed.)



Fig. 1.2


Nucleic acid structure. A nucleoside consists of a purine or pyrimidine base covalently linked to the first carbon of the pentose ring. Adding one, two, or three phosphate groups yields a nucleotide mono-, di-, or triphosphate. The type of sugar determines the type of nucleic acid: ribose in ribonucleic acids (RNAs) and deoxyribose in deoxyribonucleic acids (DNAs).

(Reprinted from Physiology of the gastrointestinal tract. 4th ed. 2006.)



Nucleic Acid Polymers: DNA, RNA


Nucleic acids are polymers of nucleotides (nucleoside mono-, di-, or triphosphates). They form when the phosphate group attached to the fifth carbon of one nucleotide's pentose sugar condenses with the hydroxyl group on the third carbon of the adjacent pentose, producing two ester bonds and water (a phosphodiester bond). Accordingly, the proximal end of each DNA strand (5′ end) carries a phosphate group at the fifth carbon of the deoxyribose residue, and the terminal nucleotide at the 3′ end carries a free hydroxyl group at the third carbon of the deoxyribose ring. By convention, nucleotide sequences are written 5′ to 3′, reading from left to right, with the sense strand presented as the upper strand. The antisense strand, written on the bottom, is antiparallel and complementary to the sense strand, so its 5′ to 3′ direction proceeds from right to left. Each nucleotide within the polymer is base paired with a particular nucleotide on the opposing strand by hydrogen bonds: adenine with thymine and guanine with cytosine. The DNA strand containing the same sequence as the messenger RNA (mRNA), with T in place of U, is designated the sense strand, and the strand that it pairs with is designated the antisense strand. The antisense strand serves as the template that RNA polymerase II (Pol II) transcribes into mRNA, which is subsequently translated into protein.
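The pairing and orientation conventions above can be made concrete with a short sketch. It assumes a plain A/C/G/T string written 5′ to 3′; the function names and the example sequence are hypothetical, and capping, splicing, and polyadenylation are ignored.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the paired strand, also written 5'->3' (hence reversed)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def transcribe_from_template(template: str) -> str:
    """Build mRNA (5'->3') from the antisense (template) strand given 5'->3'.

    The mRNA is complementary to the template, so it matches the sense
    strand except that uracil (U) replaces thymine (T).
    """
    return reverse_complement(template).replace("T", "U")


sense = "ATGGCCTAA"                      # hypothetical sense strand
antisense = reverse_complement(sense)
print(antisense)                         # TTAGGCCAT
print(transcribe_from_template(antisense))  # AUGGCCUAA
```

Reading the antisense strand right to left reproduces the antiparallel layout described in the text, which is why the sketch reverses before complementing.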


Most studies of transcriptional control focus on genes transcribed by Pol II, a 12-subunit enzyme; these genes are designated class II genes. Pol II is responsible for transcribing gene sequences into protein-encoding messenger RNA (mRNA). Only a few percent of total cellular RNA is mRNA. Many of the initial primary transcripts (hnRNA, for heterogeneous nuclear RNA) are further processed as discussed below. More broadly, about 98% of the nucleotides in the human genome do not reside in exons (sequences that encode proteins). Nevertheless, at least 50% of this noncoding sequence is transcribed, and much of it is thought to serve a function. Roughly 9% of cellular RNA consists of small nuclear RNAs (snRNAs, e.g., U2, involved in RNA splicing; 4%), small nucleolar RNAs (snoRNAs, e.g., U22; 1%), and mRNA with its hnRNA precursors (4%). An additional 1% of total cellular RNA is microRNA (miRNA), which guides the silencing of mature mRNA transcripts. RNA polymerase I (Pol I) transcribes all of the ribosomal RNA genes except the 5S gene. Ribosomal RNA represents about 75% of the RNA in the cell and is essential for translation. RNA polymerase III (Pol III) transcribes the 5S ribosomal gene and the genes encoding transfer RNA (tRNA), which represents about 15% of total cellular RNA. Pol I and Pol III transcribe noncoding RNAs that are not translated into peptides, although their primary transcripts are also processed before reaching the cytoplasm. Because Pol II transcribes genes encoding proteins and peptides as well as long noncoding RNAs (lncRNAs) and miRNAs, Pol II-regulated genes are the primary focus of this chapter.



Gene Composition


A gene is analogous to a long sentence read from left to right and composed of letters organized into words separated by spaces and punctuation. Specific DNA sequences “punctuate” the gene with important start and stop signals for transcription and translation. Several hundred to several thousand DNA base pairs (bp) may make up one gene. These bp (the alphabet) are organized into functional groups (phrases) according to whether a particular sequence is untranscribed, only transcribed (RNA), or both transcribed and translated (RNA and protein) ( Fig. 1.3 ). Exons are DNA sequences that are transcribed by Pol II and retained in the mature mRNA that exits the nucleus. Within the cytoplasm, exons may or may not be translated into peptides; those that are both transcribed and translated form the coding sequence (coding exons). In general, the term intron describes an intervening DNA sequence that is transcribed but is subsequently removed from the primary transcript by RNA splicing (RNA processing) before the mature transcript exits the nucleus. However, it is now clear that many transcribed DNA sequences generate noncoding RNA transcripts, such as miRNAs or lncRNAs, that can inhibit or modulate protein-coding genes in “cis” or “trans.” LncRNAs are commonly defined as transcripts longer than 200 nt that do not encode a protein, in contrast to the much shorter miRNAs.
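Intron removal can be pictured as keeping the exon segments of a primary transcript and joining them in order. In this minimal sketch the exon boundaries are supplied by hand as hypothetical coordinates; real splicing depends on splice-site signals and the spliceosome, which the code does not model.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a primary transcript, discarding introns.

    `exons` holds 0-based, half-open (start, end) coordinates; they are
    sorted so the exons are joined in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical transcript: exon 1, an intron (starting GU, ending AG), exon 2.
pre = "AUGGC" + "GUAAGUCCAG" + "CUUAA"
print(splice(pre, [(0, 5), (15, 20)]))   # AUGGCCUUAA
```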




Fig. 1.3


Gene structure, transcription, and posttranscriptional processing. A gene comprises several hundred to several thousand bp, subdivided into functional elements. The locations of 5′ and 3′ untranslated sequences, exons, and introns are shown. The 5′ flanking sequences contain specific DNA elements (e.g., TATA box). RNA polymerase II transcribes DNA into heterogeneous nuclear RNA (hnRNA). About 20 nt downstream of the AAUAAA sequence (transcribed from AATAAA), the transcript is cleaved and the polyadenylate (poly(A)) tail is added to the 3′ end. A methylated guanylate residue is added to the 5′ end of the mRNA through a triphosphate linkage. Before the mRNA exits the nucleus, intron segments are removed by splicing factors during posttranscriptional processing.

(Reprinted from Physiology of the gastrointestinal tract. 4th ed. 2006.)


DNA sequences or elements that regulate transcription but are not transcribed into mRNA usually reside in the 5′ portion of a gene, upstream (to the left) of the promoter. The promoter is a cluster of DNA sequences that binds Pol II in concert with accessory proteins to initiate the synthesis of mRNA. Accessory proteins control the accuracy and rate of polymerase binding. The first nucleotide transcribed into mRNA is assigned the number +1, and subsequent nucleotides (downstream, or to the right of the promoter) are assigned increasing positive numbers as transcription proceeds toward the 3′ end. Nucleotides preceding the transcription start site (upstream, or 5′) are assigned negative numbers. DNA sequences that encode a polypeptide (open reading frame) begin with the translational start codon ATG (encoding methionine) and end with one of three stop codons: TAA, TAG, or TGA. These start and stop codons are transcribed into mRNA as AUG and as UAA, UAG, and UGA, respectively. Because there are four different DNA bases and a triplet of three bases encodes one amino acid, there are 4³ = 64 possible codons for 20 amino acids. For this reason, the nucleotide code for proteins is considered “degenerate.” The redundant genetic code protects against the deleterious effects of mutations, as detailed in the next paragraph. In addition, two or three peptides can be encoded by overlapping codons simply by shifting the reading frame by 1 or 2 nt. Regulatory sequences that are transcribed but not translated reside at both the 5′ and 3′ ends of the mature RNA transcript. These 5′ and 3′ untranslated regulatory sequences, which range from 10 to several thousand nucleotides, contribute to the fidelity of translation and to mRNA stabilization or destabilization.
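The +1/−1 numbering convention, the 4³ = 64 codon count, and the dependence of the encoded peptide on reading frame can be checked with a small sketch. The compact codon table is the standard genetic code; the function names, the example sequence, and the choice to translate only from the first in-frame ATG are illustrative assumptions.

```python
from itertools import product

# Standard genetic code: codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
# (T, C, A, G at each position); '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def first_orf(seq: str, frame: int = 0) -> str:
    """Translate from the first in-frame ATG to the next stop codon (or the end)."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if "ATG" not in codons:
        return ""
    peptide = []
    for codon in codons[codons.index("ATG"):]:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # TAA, TAG, or TGA ends the reading frame
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def position_relative_to_tss(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to +1/-1 numbering (there is no position 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


print(len(CODON_TABLE))                          # 64
seq = "CCATGGCCTGGTAAGG"                         # hypothetical sense-strand sequence
print([first_orf(seq, f) for f in range(3)])     # ['', '', 'MAW']
print(position_relative_to_tss(9, tss_index=10))  # -1
```

Only the third frame contains an ATG here, so the same 16 nt yield a peptide in one frame and nothing in the other two.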


The degeneracy of the genetic code (several codon triplets encoding one amino acid) is why some bp changes (mutations) within an exon produce no deleterious phenotype. A bp change is designated synonymous (also known as a silent mutation) if the encoded amino acid is unchanged, or nonsynonymous if a different amino acid is substituted. Strictly speaking, a mutation is any bp change, whether or not the change affects the amino acid inserted into a peptide. Even a nonsynonymous mutation in the coding sequence might not change the physical characteristics (phenotype) of the organism or confer any phenotypic advantage or disadvantage. Changes in the genetic code that put an organism at a disadvantage and contribute to disease are what we commonly call “mutations.” Base-pair changes in DNA that are neutral or that confer an advantage or disadvantage on the organism are also known as single nucleotide polymorphisms (SNPs). SNPs can produce subtle differences in the way an organism responds to its environment or to other genetic influences ( Fig. 1.4 ). SNPs are a focus of intense investigation because of their use in genome-wide scans to identify genes contributing to common multigene disorders, for example, diabetes and hypertension.
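The synonymous/nonsynonymous distinction can be expressed as a codon lookup before and after a single-base change. This sketch assumes an in-frame coding sequence starting at its first base, uses the standard genetic code, and counts changes that create a stop codon as nonsynonymous; the function name and example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (codons ordered TTT ... GGG); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base change at 0-based `pos` of an in-frame coding sequence.

    Changes that create a stop codon (nonsense changes) are counted as
    nonsynonymous here, because the encoded residue changes.
    """
    start = pos - pos % 3                    # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos - start] + alt + ref_codon[pos - start + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


cds = "ATGCTGAAA"                              # hypothetical: Met-Leu-Lys
print(classify_substitution(cds, 5, "A"))     # CTG -> CTA, Leu -> Leu
print(classify_substitution(cds, 4, "C"))     # CTG -> CCG, Leu -> Pro
```

Third-position changes are the most likely to be synonymous, which is the practical meaning of a “degenerate” code.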




Fig. 1.4


Single nucleotide polymorphism (SNP). Schematic diagram of a SNP in which a protein encoding gene sequence differs between two individuals by one nucleotide.



RNA Species


RNA molecules that encode proteins (except most histone proteins) are distinguished from ribosomal and transfer RNA by the series of adenosines added to the 3′ end of the molecule, commonly referred to as the poly(A) tail ( Fig. 1.3 ). This feature provides a useful means to isolate mRNA from the more abundant RNA species (transfer and ribosomal RNA) and also marks the functional termination of the protein-encoding portion of the gene. During transcription, the primary RNA transcript is cleaved about 20 nt downstream of the AAUAAA site at the 3′ end, and ~150–200 adenine nucleotides are added to form the poly(A) tail. After the first ~30 nt have been synthesized, the 5′ end of the mRNA transcript receives a protective “cap” consisting of a guanylate residue methylated at the seventh position and linked to the first nucleotide of the RNA by three phosphates. The RNA cap serves as a high-affinity binding site that recruits ribosomes. It should be noted that the element AATAAA indicates the site of the poly(A) tail but is not necessarily the functional end of the gene. Rather, the 3′ untranslated region (3′UTR) and 3′ untranscribed regions may still contain regulatory elements that modulate gene expression. In fact, most miRNAs bind sequences in the 3′UTR. Therefore, like the 5′ end of a gene, the 3′ end of the gene must be determined empirically.
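The 3′-end processing step described above can be sketched as “find the signal, cut downstream, append adenines.” This is a simplification under stated assumptions: the cleavage site is placed a fixed `offset` past the signal (the text cites ~20 nt, while real sites vary and depend on additional downstream elements), the tail length is fixed, and the function name and sequence are hypothetical.

```python
def polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 200) -> str:
    """Cleave a transcript downstream of its first AAUAAA signal and add poly(A).

    Assumptions: cleavage occurs a fixed `offset` nt past the end of the
    signal, and the tail length is fixed (the text cites ~150-200 adenines).
    """
    signal = "AAUAAA"
    site = pre_mrna.find(signal)
    if site == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")
    cleavage = site + len(signal) + offset
    return pre_mrna[:cleavage] + "A" * tail_length


pre = "GGGAAUAAA" + "C" * 30                 # hypothetical 3' end of a transcript
mature = polyadenylate(pre, tail_length=5)
print(len(mature), mature.endswith("AAAAA"))   # 34 True
```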


Two classes of noncoding RNAs transcribed by Pol II have motivated the current expanded interest in RNA biology: miRNAs and long noncoding RNAs. miRNAs are a class of noncoding RNAs generated primarily from DNA sequences between genes (intergenic), within introns, or at the 3′ end of genes. They were originally identified in plants and worms as posttranscriptional regulators of gene silencing. Pol II, and sometimes Pol III, transcribes DNA to produce primary miRNA transcripts. In addition, transcription factors modulate the expression of miRNAs as they do for protein-encoding genes. For instance, extracellular signals acting through typical signal transduction pathways, as well as epigenetic mechanisms, regulate miRNA expression. The gene product is RNA rather than protein, and because of its small size and less stringent binding requirements, a single miRNA can act on its own locus as well as on multiple other loci. In this way, miRNAs are thought to regulate at least one-third of all human genes.


miRNAs are synthesized in the nucleus as a primary transcript (pri-miRNA) capable of forming several hairpin structures through internal complementarity ( Fig. 1.5 ). The Microprocessor complex, containing a nuclear RNase III endonuclease called Drosha and the DiGeorge syndrome critical region 8 protein (DGCR8), cleaves the pri-miRNA transcript. The Drosha complex removes the flanking segments, cutting the stem approximately 11 bp from the base of the hairpin. This step converts the pri-miRNA to a precursor miRNA (pre-miRNA). Pre-miRNAs are typically 60–70-nt-long hairpin RNAs with 2-nt overhangs at the 3′ end. The nuclear export receptor exportin-5 and RanGTP transport the pre-miRNA into the cytoplasm, where it is further processed by a complex containing another RNase III endonuclease called Dicer. Dicer partners with RNA-binding proteins to cleave the pre-miRNA into 21–25-nt duplexes. The miRNA/miRNA* duplex consists of a guide RNA strand and a passenger strand, indicated by an asterisk (miRNA*), that is discarded upon assembly of the RNA-induced silencing complex (RISC). Loading the miRNA/miRNA* duplex into RISC is a four-step process requiring ATP hydrolysis and the major RISC protein component, Argonaute (Ago proteins). Once the duplex unwinds, the miRNA* strand is discarded, leaving a single-stranded 21–25-nt RNA molecule available to silence specific clusters of genes by hybridizing to their 3′UTRs. Ago proteins bind miRNAs and, together with exosomes, protect them from degradation in biofluids such as blood and urine, making miRNAs potential biomarkers.
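Why a short miRNA can act on many genes becomes clearer with a toy target search. The sketch assumes the widely used “seed” convention (miRNA nucleotides 2–8 pairing perfectly with the 3′UTR), which this chapter does not state; the sequences and function names are hypothetical, and real target prediction also weighs site context, conservation, and imperfect pairing.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str, first: int = 2, last: int = 8) -> str:
    """mRNA sequence (5'->3') that pairs with miRNA nucleotides first..last (1-based)."""
    seed = mirna[first - 1:last]
    # Antiparallel pairing: reverse the seed, then complement each base.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """0-based start positions in a 3'UTR that pair perfectly with the seed."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


mirna = "UACGGAUCCAGGUAAUCGGAUC"            # hypothetical 22-nt mature miRNA
utr = "CCCCGAUCCGUAAAAGAUCCGU"              # hypothetical 3'UTR fragment
print(seed_site(mirna), find_seed_matches(mirna, utr))  # GAUCCGU [4, 15]
```

Because only 7 nt must match, a given seed site occurs by chance in many 3′UTRs, consistent with the “less stringent binding requirements” noted above.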




Fig. 1.5


Synthesis of microRNAs (miRNAs). miRNAs are synthesized as primary miRNAs (pri-miRNAs), which are then processed to pre-miRNAs. The RanGTP/exportin-5 complex transports the pre-miRNA to the cytoplasm, where it is further processed to the miRNA/miRNA* duplex. miRNA* indicates the passenger strand that is discarded upon assembly of the RNA-induced silencing complex (RISC). The Argonaute (Ago) proteins are the major protein component of the RISC. TRBP = TAR RNA-binding protein (aka PACT).

(Reproduced from Kwak PB, Iwasaki S, Tomari Y. The microRNA pathway and cancer. Cancer Sci 2010; 101 (11):2309–15. doi: 10.1111/j.1349-7006.2010.01683.x)


Long noncoding RNAs are transcripts longer than 200 nt that do not encode a protein. They are distinguished from miRNAs by their size (lncRNAs > 200 nt versus miRNAs ~22 nt) and by their more diverse functions. miRNAs typically suppress multiple gene targets, whereas lncRNAs typically regulate the gene from which they are transcribed, albeit by multiple mechanisms. Whole genome sequencing has identified more noncoding transcripts than coding ones, complicating efforts to define their function. lncRNAs can function in “cis” or “trans” and can circularize or remain linear. Moreover, lncRNAs can act as protein scaffolds that recruit regulatory complexes to genes, or behave as decoys, signaling molecules, or antisense interference transcripts. Through these diverse behaviors, lncRNAs exhibit pleiotropic functions such as genomic imprinting, chromosome shaping, and allosteric enzyme regulation. The function of most lncRNAs is unknown, and thus these transcripts have simply been named numerically. lncRNAs that have been assigned a function include XIST (X chromosome inactivation), HOTAIR (Hox transcript antisense RNA), and TERC (telomerase RNA component, telomere elongation).
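The size-based definition can be combined with a crude coding-potential check in a few lines. This is only a sketch: the 200-nt cutoff comes from the text, but the 100-codon open-reading-frame cutoff, the function names, and the restriction to one strand are assumptions, and real lncRNA annotation uses far more evidence (conservation, ribosome profiling, dedicated coding-potential tools).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length, in codons, of the longest ATG-initiated ORF on the given strand."""
    best = 0
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the end of the sequence.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                best = max(best, (j - i) // 3)
                i = j + 3                # resume scanning after this ORF
            else:
                i += 3
    return best


def classify_transcript(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> str:
    """Crude label: length cutoff from the text plus an assumed ORF-length cutoff."""
    if len(seq) <= min_length:
        return "too short for lncRNA"
    if longest_orf_codons(seq) >= max_orf_codons:
        return "possible protein-coding"
    return "lncRNA candidate"


print(classify_transcript("CT" * 150))   # lncRNA candidate
```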



Linking Gene Structure to Function


Previously, the 5′ border of a gene was identified by the promoter region (functionally determined, for example, by DNase I hypersensitivity sites) and by the first nucleotide transcribed into mRNA (cap site), determined empirically by various reverse transcriptase methods, for example, primer extension analysis or anchored polymerase chain reaction (PCR). These techniques used reverse transcriptase to synthesize complementary or copy DNA (cDNA) ( Fig. 1.6 ). Radiolabeled primers complementary to a sequence near the 5′ end of the mRNA were allowed to anneal to the mRNA. Reverse transcriptase then extended the primer, reading the mRNA template in the 3′ to 5′ direction (the cDNA itself grows 5′ to 3′). Synthesis of the cDNA terminated when the 5′ end of the mRNA was reached. Template mRNA molecules were removed by ribonucleases (RNases), and synthesis of a double-stranded cDNA was completed by DNA polymerase. Because the newly synthesized cDNA was radiolabeled at its 5′ end, the length of the cDNA (and hence the transcriptional start site) was determined by resolving the fragments on a denaturing polyacrylamide gel and comparing the observed length in nt with the known sequence.
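The arithmetic behind reading a start site from a primer extension gel is a single subtraction. This sketch assumes positions increase 5′ to 3′ along the sense strand and that the product runs from the base paired with the primer's 5′ end back to the cap site; the function name and the band sizes are hypothetical.

```python
def tss_from_primer_extension(primer_5prime_pos: int, product_length_nt: int) -> int:
    """Estimate the transcription start site from a primer extension product.

    `primer_5prime_pos` is the coordinate of the mRNA base paired with the
    primer's 5' end. The product spans every base from that position back
    to the cap site, so its length equals primer_5prime_pos - tss + 1.
    """
    return primer_5prime_pos - (product_length_nt - 1)


# Hypothetical gel: bands of 85 and 92 nt from a primer whose 5' end pairs
# with position 1,120 would point to two alternative start sites.
print([tss_from_primer_extension(1120, n) for n in (85, 92)])  # [1036, 1029]
```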




Fig. 1.6


Complementary DNA (cDNA). Primers complementary to a portion of the mRNA are allowed to anneal. For unknown sequences, as in the synthesis of cDNA libraries, a primer complementary to the poly(A) tail, i.e., oligo(dT), is used. Reverse transcriptase, added along with all four deoxynucleotides (dNTPs), reads the mRNA template in the 3′ to 5′ direction to make copy DNA. The mRNA template is removed by RNases, and double-stranded cDNA is made using DNA polymerase. In primer extension analysis, the 5′ end of mRNA (the cap site) is identified by annealing primers of known sequence near the 5′ end of mRNA.

(Reprinted from Physiology of the gastrointestinal tract. 4th ed. 2006.)


In the age of whole genome analysis, the characterization of gene function has lagged behind transcript mapping. In other words, genome-wide biochemical assays such as DNase-seq, ATAC-seq (assay for transposase-accessible chromatin), ChIP-seq, and 3C (chromosome conformation capture) do not by themselves provide an assessment of function. This gap has led to the development of high-throughput methods to identify changes in the transcription levels of both coding and noncoding genes, including RNA-seq and STARR-seq (self-transcribing active regulatory region sequencing). In addition, CRISPR/Cas9 methods for activating or silencing genes in situ have permitted the development of functional readouts of enhancer modification within the enhancer's endogenous environment.
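RNA-seq reports transcription levels as read counts, which are usually normalized before comparison. Transcripts per million (TPM) is one common normalization, not described in this chapter and shown here as an illustrative choice; the gene names, counts, and lengths below are hypothetical.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize read counts, then rescale to 1e6."""
    # Reads per kilobase of transcript removes the bias toward long transcripts.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    total = sum(rates.values())
    # Rescale so that the values within one sample sum to one million.
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical read counts and transcript lengths for three genes.
counts = {"geneA": 100, "geneB": 100, "geneC": 50}
lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 500}
print({g: round(v) for g, v in tpm(counts, lengths).items()})
# {'geneA': 400000, 'geneB': 200000, 'geneC': 400000}
```

geneA and geneB received the same number of reads, but geneB's transcript is twice as long, so its length-normalized abundance is half as large.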


We now know that DNA sequences lying outside the classically defined boundaries of a gene might encode noncoding RNAs that regulate gene expression, in addition to the well-described enhancer sequences. Specific DNA elements called insulator elements mark the boundaries of genes. These elements, originally identified in the globin locus, bind an 11-zinc-finger transcription factor called CTCF, which can block the spread of histone acetylation between adjacent genes. It is now understood that gene expression occurs in insulated neighborhoods generated by chromosomal loops anchored by CTCF, bound at its binding sites, together with the cohesin complex. Thus, enhancer or repressor sequences that are kilobases away from the transcriptional start site (TSS) can be brought close to the genes they regulate within gene-enhancer/repressor “neighborhoods” called topologically associated domains (TADs). It has recently been shown that mutations in CTCF-binding sites that prevent the formation of TADs can cause disease.
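A one-dimensional toy model captures the logic of insulated neighborhoods: an enhancer can reach a promoter only if no boundary lies between them. This is a deliberate simplification; real TADs are three-dimensional, nested, and only partially insulating, and the boundary positions and function name are hypothetical.

```python
def same_neighborhood(enhancer: int, promoter: int, boundaries: list[int]) -> bool:
    """True if no insulating boundary (e.g., a CTCF/cohesin loop anchor) separates the sites."""
    low, high = sorted((enhancer, promoter))
    return not any(low < b < high for b in boundaries)


boundaries = [100_000, 300_000]           # hypothetical CTCF-anchored boundaries (bp)
promoter, enhancer = 250_000, 350_000     # hypothetical positions
print(same_neighborhood(enhancer, promoter, boundaries))   # False: insulated
# A mutation that abolishes the boundary at 300,000 merges the two domains.
print(same_neighborhood(enhancer, promoter, [100_000]))    # True: ectopic contact possible
```

The second call illustrates how a CTCF-binding site mutation could expose a gene to an enhancer it normally never contacts.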


Given the requirement for ever larger pieces of DNA to recapitulate native expression in transgenic mouse models, techniques have been developed to clone and manipulate large pieces of DNA (over 50 kilobases), for example, yeast artificial chromosomes (YACs) and bacterial artificial chromosomes (BACs). Recombineering is a powerful technique performed in bacteria that permits the introduction of foreign DNA or point mutations into these large constructs before they are introduced into transgenic mice, but it has largely been superseded by a powerful newer technology called CRISPR/Cas9.


CRISPR/Cas represents the latest and, to date, the most powerful breakthrough in our ability to modify or manipulate the genome with precision. The term CRISPR stands for clustered regularly interspaced short palindromic repeats, and Cas is the abbreviation for CRISPR-associated protein. The technology originates from studying the bacterial immune system: CRISPR/Cas is a simple, RNA-guided mechanism by which bacteria and Archaea defend themselves against the DNA of invading bacteriophages (an adaptive immune mechanism). Cas9 is a nuclease that is directed to the specific DNA sequence to be modified by a guide RNA, which recognizes the target through Watson-Crick base pairing. The system therefore consists of two parts: a sequence-recognition component (the guide RNA) that identifies the sequence to be modified and an effector (the Cas9 nuclease) that makes a double-strand break in the DNA. The break is then repaired by the host cell's own DNA repair machinery, typically by nonhomologous end joining, which modifies the targeted sequence. The specificity of the technology lies in the ability to program the guide RNA. Prior to CRISPR/Cas, zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) were the primary methods used for programmable genome editing.
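Programming the guide RNA starts with finding candidate target sequences. This sketch assumes the convention for Streptococcus pyogenes Cas9, which is not spelled out in the text: a 20-nt protospacer immediately 5′ of an NGG protospacer-adjacent motif (PAM), with the cut 3 bp upstream of the PAM. It scans only the strand it is given, and the sequence and function name are hypothetical.

```python
def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> list[tuple[int, str, int]]:
    """List (start, protospacer, cut_index) for every NGG PAM on the given strand.

    `cut_index` is the 0-based position of the first base 3' of the cut,
    which falls 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    for start in range(len(seq) - protospacer_len - 2):
        pam = seq[start + protospacer_len:start + protospacer_len + 3]
        if pam[1:] == "GG":  # N-G-G
            cut_index = start + protospacer_len - 3
            sites.append((start, seq[start:start + protospacer_len], cut_index))
    return sites


target = "GTCACCTCCAATGACTAGGG" + "TGG"    # hypothetical protospacer + PAM
for start, protospacer, cut in find_spcas9_sites(target):
    # The guide RNA spacer has the protospacer sequence with U in place of T.
    print(start, protospacer.replace("T", "U"), cut)   # 0 GUCACCUCCAAUGACUAGGG 17
```

Scanning the opposite strand would require running the same search on the reverse complement.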





Epigenetic Influences


Epigenetics, which literally means “outside of or beyond genetics,” refers to the “study of genetic modifications that are mitotically and/or meiotically heritable yet do not change the DNA sequence.” Mutations or deletions, by contrast, change the DNA sequence itself and can thereby alter the primary sequence of the encoded protein. Epigenetic influences instead chemically modify nucleotide or amino acid structure, which in turn changes how that particular DNA or (histone) protein is recognized by nuclear proteins without changing the sequence itself. Because the completed sequence of the human genome contains only about 20,500 gene loci, the complexity of the genetic information encoded in human chromosomes must depend on other features of chromatin, and epigenetic influences on chromatin appear to be among the critical features that enhance genomic complexity. A major target of epigenetic changes is the histones, basic proteins that coat the naked DNA double helix. The N-terminal tails of histones (H1, H2A, H2B, H3, H4) are positively charged because of basic amino acids such as lysine. These positively charged histones bind to the negatively charged phosphate groups of the DNA backbone, and the ionic interaction weakens if the positive charge on the lysines is removed. Specific enzymes called histone acetyltransferases (HATs) acetylate the lysine side chain, effectively eliminating its positive charge ( Fig. 1.7 ). The loss of the ionic interaction between the histones and the phosphate groups of DNA permits greater access to the DNA helix by accessory proteins such as polymerases, transcription factors, and coactivators or repressors. Chromatin becomes “open,” accessible, and readily transcribed. By contrast, enzymes called histone deacetylases (HDACs) “close” chromatin by removing the acetyl groups from the lysines at the N-terminal tails of histone proteins. Removal of the acetyl group restores the positive charge to histones, allowing the ionic interaction between histones and DNA to be restored. Consequently, nonhistone proteins such as polymerases and transcription factors are excluded from DNA, transcription is silenced, and chromatin becomes inactive.
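The charge argument can be illustrated with simple counting. This sketch assumes each arginine and each unacetylated lysine contributes +1 and ignores histidine and the chain termini; the sequence is given as residues 1–20 of histone H4 for illustration, and the function name is hypothetical.

```python
def tail_charge(tail: str, acetylated_lysines=frozenset()) -> int:
    """Approximate net positive charge of a histone tail at physiological pH.

    Simplification: +1 for each arginine (R) and each unacetylated lysine (K);
    histidine and the chain termini are ignored. Positions in
    `acetylated_lysines` are 1-based residue numbers.
    """
    charge = 0
    for position, residue in enumerate(tail, start=1):
        if residue == "R" or (residue == "K" and position not in acetylated_lysines):
            charge += 1
    return charge


H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"               # residues 1-20 of histone H4
print(tail_charge(H4_TAIL))                     # 8
print(tail_charge(H4_TAIL, {5, 8, 12, 16}))     # 4 after acetylating K5, K8, K12, K16
```

Acetylating four lysines halves the approximate positive charge of this tail segment, which is the weakened histone-DNA attraction described above.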




Fig. 1.7


Nucleosome structure and histone modifications on histone tails. (A) The double-stranded DNA helix winds about twice around a complex of the four core histones assembled as dimers. Unacetylated histones are positively charged and adhere tightly to the negatively charged DNA, preventing access by transcription regulatory proteins. Acetylated histones are less positively charged and adhere less tightly to DNA, allowing regulatory proteins access to the DNA. The addition or removal of acetyl groups at the ends of histones is regulated by histone acetyltransferase (HAT) and histone deacetylase (HDAC) enzyme complexes. The short-chain fatty acid butyrate inhibits the activity of HDACs. (B) Shown are the amino-terminal histone residues modified by acetylation, methylation, and phosphorylation.

(Reprinted from Physiology of the gastrointestinal tract. 4th ed. 2006.)


Collectively, DNA and the histones and accessory proteins associated noncovalently with it form chromatin. Chromatin exists in two forms: euchromatin and heterochromatin. Euchromatin contains actively transcribed genes in a decondensed state and is located centrally in the nucleus. By contrast, heterochromatin contains transcriptionally silent genes that remain condensed at the periphery of the nucleus. The DNA sequences within heterochromatin are repetitive, and only 15% of nuclear chromatin is heterochromatin. The major epigenetic modifications in mammalian cells occur on DNA and histones and include covalent modifications such as methylation and acetylation, as well as the addition of other organic residues. DNA methylation is the most common epigenetic change and is the principal epigenetic modification known to occur on DNA itself. By contrast, histone proteins undergo over 100 types of epigenetic modifications, of which the most common are acetylation, methylation, and phosphorylation. Histones are frequent targets, but nuclear regulatory proteins, for example, transcription factors, can also be covalently modified, most commonly by phosphorylation. Epigenetic changes affect such events as chromatin folding, gene expression, X-chromosome inactivation, and genomic imprinting. They are essential for development and differentiation, in which clusters of genes must be activated or silenced at precisely timed intervals during an organism’s growth and maturation. In addition, epigenetic changes provide mechanisms by which the environment affects the genome, for example, in the context of the microbiota, immune disorders, and cancer.



DNA Methylation


DNA methylation is a postsynthetic modification that normal DNA undergoes after each round of replication. This modification is catalyzed by DNA methyltransferases (DNMTs) and occurs at the C-5 position of cytosine residues within CpG dinucleotides, including those located in gene promoters. There are three major DNMTs (DNMT1, DNMT3a, and DNMT3b), and each plays a distinct and critical role in cells. Murine knockouts of DNMT1 and DNMT3b exhibit embryonic lethality, and DNMT3a homozygous knockout mice appear normal at birth but die by 4 weeks of age. In humans, mutations of DNMT3b are linked to ICF syndrome (immunodeficiency, centromere instability, facial anomalies). DNMT1 serves as the “maintenance” methyltransferase: during cell division, it methylates the newly synthesized DNA strand at hemimethylated CpG sites, copying the methylation pattern of the parental strand. DNMT3a plays a central role in the methylation of neural-specific genes.
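The maintenance role of DNMT1 can be expressed as a copy rule over CpG sites. This is an idealized toy model: each CpG is treated as one symmetric site holding (parental, new) strand methylation, maintenance is assumed to be perfect, and de novo methylation by DNMT3a/3b is ignored. The site positions and function names are hypothetical.

```python
def replicate(parental_marks: dict[int, bool]) -> dict[int, tuple[bool, bool]]:
    """Pair each parental CpG mark with an unmethylated daughter strand.

    Values are (parental, new) methylation; sites methylated on the parent
    become hemimethylated after replication.
    """
    return {site: (mark, False) for site, mark in parental_marks.items()}


def dnmt1_maintenance(duplex: dict[int, tuple[bool, bool]]) -> dict[int, tuple[bool, bool]]:
    """Idealized DNMT1: methylate the new strand wherever the parental strand is methylated."""
    return {site: (parent, parent or new) for site, (parent, new) in duplex.items()}


parent = {10: True, 42: False, 57: True}       # hypothetical CpG sites
hemi = replicate(parent)
print(hemi)                     # {10: (True, False), 42: (False, False), 57: (True, False)}
print(dnmt1_maintenance(hemi))  # {10: (True, True), 42: (False, False), 57: (True, True)}
```

The pattern that emerges is exactly the parental one, which is why methylation-based silencing is heritable through cell division.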


Sixty percent of human genes contain a CpG island. CpG dinucleotides tend to be underrepresented in the genome, and where they are found, they often appear in clusters ranging from 0.5 to several kilobases with a GC content greater than 55%, although methylation can also occur in other parts of the gene. About 15% of CpG dinucleotides cluster in these short DNA segments, known as CpG islands. The remaining 85% of CpG dinucleotides are spread throughout the genome in repetitive, hypermethylated segments that are transcriptionally silent. Methylation of these dispersed CpGs is a late evolutionary development that maintains genome stability by repressing transposons and repetitive DNA elements.
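The island definition above translates into a sliding-window scan. The 500-bp window and 55% GC threshold come from the text; the observed/expected CpG ratio of at least 0.65 is an additional criterion commonly used by island-finding algorithms and is an assumption here, as are the step size, the function names, and the synthetic test sequence.

```python
def cpg_window_stats(window: str) -> tuple[float, float]:
    """GC fraction and observed/expected CpG ratio for one window."""
    seq = window.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")               # 'CG' cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / len(seq)
    expected = c * g / len(seq)         # CpGs expected if C and G occurred independently
    return gc_fraction, (cpg / expected if expected else 0.0)


def find_cpg_island_windows(seq, window=500, step=50, min_gc=0.55, min_obs_exp=0.65):
    """Start positions of windows that meet the assumed CpG island criteria."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = cpg_window_stats(seq[start:start + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            hits.append(start)
    return hits


test_seq = "AT" * 500 + "CG" * 300 + "AT" * 500   # synthetic: 600-bp CpG-rich core
hits = find_cpg_island_windows(test_seq)
print(hits[0], hits[-1], len(hits))               # 800 1300 11
```

Overlapping qualifying windows would normally be merged into one island; the sketch leaves that step out for brevity.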


DNA methylation is an important event in many processes, including transcriptional repression, X chromosome inactivation, and genomic imprinting. CpG islands are located in the promoter regions of genes about 60% of the time and are normally hypomethylated, particularly in germ cells. Collectively, these CpG clusters or islands cover only about 0.7% of the entire genome, which is still equivalent to roughly 23 million nucleotides. Hypermethylation at CpG islands induces transcriptional silencing that is stably inherited. Thus, as cells differentiate, a significant percentage of these CpG islands become methylated in a tissue-specific manner; typically, these are genes involved in cell renewal. As observed with HDACs and deacetylation, the methylation status of cancers might seem contradictory. Aberrant de novo hypermethylation of CpG islands is a hallmark of some human cancers and occurs early during carcinogenesis: tumor suppressor genes are locally hypermethylated by some cancers to silence their expression, whereas oncogenes might be hypomethylated.


The DNA of tumor cells is globally hypomethylated, a process that is linked to nutritional status, for example, vitamin B12 (cobalamin) absorption. Cobalamin is required for the synthesis of S-adenosylmethionine, the primary methyl donor in the cell. In this way, reduced cobalamin absorption, as sometimes observed in Crohn’s disease or pernicious anemia, would provide an environment favorable to cancer. Niacin, which is required to form NAD, the substrate for ADP-ribosylation of histones, also affects chromatin structure.


The most precise approach to assessing DNA methylcytosines is bisulfite sequencing. Treating DNA with sodium bisulfite converts unmethylated cytosines to uracil, which is read as thymine after PCR amplification and conventional DNA sequencing, whereas methylated cytosines are protected and are still read as cytosines. Although bisulfite sequencing is not as easy to scale up as genome-wide analysis by methylation-sensitive restriction enzyme (MSRE) digestion, sequencing is the most accurate way to map the methylated sites in DNA, or the methylome.
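The logic of reading a bisulfite sequence can be sketched in a few lines. This sketch assumes a single, fully converted top strand, considers only CpG cytosines when calling methylation, and uses hypothetical function names; real pipelines must also align reads and account for incomplete conversion.

```python
def bisulfite_convert(reference, methylated_positions):
    """Simulate bisulfite treatment plus PCR of one strand.

    An unmethylated C reads as T (via uracil); a C at a methylated position stays C.
    """
    converted = []
    for i, base in enumerate(reference.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_cpg_methylation(reference, read):
    """Compare a bisulfite read with its reference at each CpG cytosine.

    Returns {position: True (methylated) or False (unmethylated)};
    positions whose read base is neither C nor T are skipped.
    """
    reference, read = reference.upper(), read.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
    return calls


if __name__ == "__main__":
    ref = "ACGTTCGACCGA"                     # CpGs at positions 1, 5, and 9
    read = bisulfite_convert(ref, {1, 9})
    print(read)                              # ACGTTTGATCGA
    print(call_cpg_methylation(ref, read))   # {1: True, 5: False, 9: True}
```

In this example the cytosine at position 8 lies outside a CpG and is converted, as is the unmethylated CpG at position 5; the methylated CpGs at positions 1 and 9 still read as C.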


Genomic imprinting is established during gametogenesis and is necessary for development. Imprinting is the epigenetic phenomenon whereby expression of a gene depends on whether it is inherited from the mother or the father; it results from differential methylation of specific cytosine bases on the maternal versus the paternal alleles. Methylation also underlies X inactivation: one of the X chromosomes in females is not expressed because the inactive X chromosome is heavily methylated. Recent genome-wide analysis of genomic imprinting in the mouse identified 1300 loci that exhibit parental bias in the expression of specific mRNA transcripts. The loci identified control neural systems associated with feeding and behavior. In a separate article, the same authors showed preferential expression of the maternally inherited X chromosome over the paternally inherited one in glutamatergic neurons of the female cortex. The interleukin-18 gene was identified as an important locus controlling sex-specific preferences.



Histone Modifications


The basic repeating unit of chromatin is the nucleosome. Each nucleosome is composed of 147 bp of DNA wrapped nearly twice around a histone octamer consisting of two molecules of each of the four core histones (H2A, H2B, H3, and H4). The linker histone H1 binds the linker DNA between adjacent core nucleosomes, facilitating further compaction. Each histone contains a structured globular domain with a histone-fold motif important for nucleosome assembly and a highly charged, unstructured amino-terminal tail of 25–40 residues, which protrudes from the body of the nucleosome to latch onto the phosphate backbone. The amino-terminal tails are the major sites for histone modifications. Histones can be modified by acetylation, methylation, phosphorylation, ADP-ribosylation, ubiquitination, and sumoylation (Table 1.1). The combination of these covalent modifications creates a “code” on the surface of the histone molecule that is subsequently recognized by classes of chromatin-binding proteins, for example, bromodomain- and chromodomain-containing proteins, that mediate chromatin compaction, transcription, and DNA repair. Acetylation, methylation, ubiquitination, and sumoylation occur on lysine residues, and methylation also occurs on arginine residues. Phosphorylation occurs on serines and threonines, and ADP-ribosylation on glutamic acids. Most of these modifications, particularly acetylation, alter the charge distribution on the amino-terminus and thereby nucleosome structure, which can in turn regulate higher-order chromatin structure. Some covalent modifications act as molecular switches, enabling or disabling subsequent covalent modifications, which explains much of the functional complexity of epigenetic regulation. Each modification correlates with a specific physical state of chromatin. The next several sections highlight the most common histone modifications.



Table 1.1

Enzymes, Targets, and Effect of Epigenetic Modifications

Target | Covalently Modified Group | Adds | Removes | Effect on Gene Expression (a) | Enzyme Inhibitors
DNA | Methyl | DNMT | Gadd45 | ↑ increases or ↓ decreases | Azacytidine; RG-108
Histone | Acetyl | KAT/HAT | HDACs | ↑ increases or ↓ decreases | Butyrate, SAHA, trichostatin A, valproic acid
Histone lysines (K) | Methyl | KMT (SETs, PcG1, 2, TrG) | KDM Jumonji (JmjC, Jarid) | ↑ if H3K4me3, H3K36me3, H3K79me3; ↓ if H3K9me2,3, H3K27me3 | BIX-01294
Histone arginines (R) | Methyl | PRMTs (CARM1, PRMT1) | PADI4 | ? |
Histone H3 serine 10 (H3S10) | Phosphate | AurB | PP1 | |
Histone lysines (K) | Ubiquitin (76-aa peptide) | Ub ligases (Ring2) | Ub protease (USP) | |
Histone lysines (K) | SUMO (small ubiquitin-like modifier, ~76 aa) | Ubc9 | Ub protease (SUSP) | |

a “?” means unknown.




Histone Acetylation


Acetylation of histones occurs at the ε-amino group on the side chain of specific lysines within the N-terminal tails of histones. HATs transfer an acetyl group from the donor acetyl-CoA to these lysines. In hypoacetylated chromatin, the positive charges on unacetylated lysines are attracted to the negatively charged DNA, producing compact, closed chromatin, which represses transcription. By contrast, acetylation of the lysines neutralizes their positive charges, resulting in a less compact, open chromatin structure that facilitates gene transcription. Therefore, HAT activity and histone acetylation are linked mainly to transcriptional activation (Fig. 1.7). Removal of the acetyl group (deacetylation) by HDACs restores the positive charge on lysines, so chromatin becomes compacted and less accessible to the regulatory proteins required for transcription. Thus, HDACs and deacetylation are primarily associated with transcriptional repression (Fig. 1.7).
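The charge argument can be made concrete by counting residues. This is a crude sketch that assumes each lysine and arginine contributes +1 and each aspartate or glutamate −1, ignores histidine and the free amino terminus, and treats an acetylated lysine as neutral. The H3 sequence used (residues 1–40 of the H3 tail) is supplied here for illustration and is not taken from this chapter.

```python
# Residues 1-40 of the histone H3 N-terminal tail (supplied for illustration).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"


def approx_tail_charge(seq, acetylated_lysines=frozenset()):
    """Crude net charge: +1 per K/R, -1 per D/E, acetylated lysines neutral.

    `acetylated_lysines` holds 1-based residue positions (e.g. 9 for K9).
    Histidine and the free N-terminal amine are ignored.
    """
    for pos in acetylated_lysines:
        if seq[pos - 1] != "K":
            raise ValueError(f"residue {pos} is {seq[pos - 1]}, not lysine")
    positive = seq.count("K") + seq.count("R") - len(acetylated_lysines)
    negative = seq.count("D") + seq.count("E")
    return positive - negative


if __name__ == "__main__":
    print(approx_tail_charge(H3_TAIL))                    # unacetylated tail
    print(approx_tail_charge(H3_TAIL, {9, 14, 18, 23}))   # four lysines acetylated
```

Acetylating four lysines (K9, K14, K18, and K23 in this example) lowers the approximate charge of the tail from +13 to +9, weakening its attraction to the DNA phosphate backbone.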


The HATs are divided into five families: the p300/CBP HATs (p300 and CBP); the Gcn5-related acetyltransferases (GNATs, including Gcn5, PCAF, etc.); the MYST (MOZ, Ybf2/Sas3, Sas2, and Tip60)-related HATs; the general transcription factor HATs (TFIID subunit TAF250 and TFIIIC); and the nuclear hormone-related HATs (SRC1 and ACTR). The most consistent functional characteristic of HATs is that they are transcriptional coactivators. These proteins are components of large multisubunit complexes that do not bind DNA directly but instead form protein-protein interactions with DNA-binding transcription factors. The MYST proteins are the largest family of acetyltransferases. Gcn5 is now recognized as part of a complex called SAGA (Spt-Ada-Gcn5-Acetyltransferase). SAGA preferentially acetylates several N-terminal lysines within H3 and H2B in response to cellular stress, for example, low glucose, hypoxia, and UV damage. In addition to its HAT activity, SAGA also has deubiquitinase activity. In summary, two themes are consistently emerging: first, these histone-modifying enzymes are components of large complexes; and second, for every enzymatic complex that adds an organic residue to histones, there is a complementary enzymatic complex that can remove it (Table 1.1).


The more numerous mammalian HDACs have been grouped into four classes. Class I includes HDACs 1, 2, 3, and 8; class IIa includes HDACs 4, 5, 7, and 9; class IIb includes HDACs 6 and 10; and class IV consists of HDAC11. These classical HDACs (1–11) are zinc-dependent. The class III HDAC family consists of the conserved nicotinamide adenine dinucleotide (NAD)-dependent Sir2 family of deacetylases, or sirtuins, of which there are seven (SIRT1–7); the sirtuins are not zinc-dependent. Like HATs, HDACs do not bind directly to DNA but are recruited to genes by large multisubunit complexes, where they function primarily as corepressors of transcription.
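The classification above maps naturally onto a small lookup table. This sketch simply encodes the class memberships and cofactors described in the text (with HDAC9 listed under class IIa); the dictionary layout and function name are illustrative only.

```python
# Class memberships and catalytic cofactors as described in the text.
HDAC_CLASSES = {
    "I": ["HDAC1", "HDAC2", "HDAC3", "HDAC8"],
    "IIa": ["HDAC4", "HDAC5", "HDAC7", "HDAC9"],  # HDAC9 is also a class IIa member
    "IIb": ["HDAC6", "HDAC10"],
    "III": [f"SIRT{i}" for i in range(1, 8)],     # the seven sirtuins
    "IV": ["HDAC11"],
}
COFACTOR = {"I": "Zn2+", "IIa": "Zn2+", "IIb": "Zn2+", "III": "NAD+", "IV": "Zn2+"}


def hdac_class(name):
    """Return (class, cofactor) for an HDAC or sirtuin name (case-insensitive)."""
    key = name.upper()
    for cls, members in HDAC_CLASSES.items():
        if key in members:
            return cls, COFACTOR[cls]
    raise KeyError(f"unknown HDAC: {name}")


if __name__ == "__main__":
    for enzyme in ["HDAC1", "HDAC6", "SIRT1", "HDAC11"]:
        print(enzyme, hdac_class(enzyme))
```

Keeping the cofactor alongside the class makes the key distinction drawn above explicit: every class except the NAD+-dependent sirtuins is zinc-dependent.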


The function of HATs and HDACs is of particular relevance in the GI tract due to the effect of butyrate, a by-product of colonic bacterial fermentation, on histone acetylation ( Fig. 1.7 ). Epidemiologic studies uniformly concur that a diet high in fiber is protective against colon cancer. The short-chain fatty acid butyrate is one of several fiber-derived fermentation products capable of maintaining epithelial cell differentiation. The differentiation effects were initially revealed after treatment of erythroleukemic cells with butyrate. Subsequently, it was discovered that the induction of differentiation by butyrate correlated with histone hyperacetylation due to suppression of HDACs. Thus, the HDAC inhibitory effects of butyrate and resulting histone hyperacetylation might, in fact, be one mechanism by which dietary fiber exerts its anticancer effects. While butyrate is normally used by colonocytes as a carbon source under low glucose conditions, colon cancers use the Warburg effect when glucose is in abundance to generate ATP via glycolysis. The butyrate that is not converted by fatty acid oxidation in the mitochondria to produce ATP is taken up by the nucleus where it suppresses HDACs. Thus, the HDAC inhibitory effect of butyrate depends on the metabolic state of the cell.


Most reviews support the view that butyrate and HDAC inhibitors are potent anticancer agents. Early studies emphasized the global effects of butyrate on chromatin remodeling, but the molecular basis for the gene-specific effects of butyrate remains poorly defined. HDAC inhibitors regulate less than 10% of actively transcribed genes, and most of those are upregulated through GC-rich sites. In addition to histones, DNA-binding proteins are now known to become acetylated. Thus, one possible mechanism by which butyrate-induced hyperacetylation might target specific genes is through acetylation of specific transcription factors. The proposed functions of acetylated transcription factors vary and include increased or decreased DNA binding as well as altered protein stability. In many instances, the genetic targets of butyrate are GC-rich sequences that bind Sp1 and Sp3. Gamma glutamyl transferase, IGF-binding protein 3, G alpha (i2), galectin, Cox 1, and intestinal alkaline phosphatase are all upregulated by butyrate through Sp1 sites. Sp1-binding sites are also implicated in the butyrate induction of p21WAF1 gene expression. The HAT p300, recruited to the p21WAF1 promoter, cooperates with Sp1 and Sp3 to mediate the effects of butyrate. However, Sp1 does not cooperate directly with p300 but instead binds the histone deacetylase HDAC1. The Sp1-HDAC1 complex in turn forms complexes with other corepressors such as Sin3A. Thus, Sp1 appears to be the factor that confers p21WAF1 promoter repression by recruiting HDAC4 and corepressor complexes.
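Because many butyrate-responsive promoters act through GC-rich Sp1/Sp3 sites, one simple way to begin analyzing such a promoter is to locate candidate GC boxes. The sketch below scans both strands for the hexamer GGGCGG, which is assumed here as a simplified Sp1 core motif; it is not a substitute for a position weight matrix or for functional testing of the sites, and the function names are hypothetical.

```python
import re

GC_BOX_CORE = "GGGCGG"  # assumed simplified Sp1 core motif


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gc_boxes(seq, core=GC_BOX_CORE):
    """Return sorted (position, strand) pairs for the core motif on either strand.

    Positions are 0-based on the given (+) strand; a '-' hit is reported where
    the reverse-complemented motif starts on the + strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", core), ("-", reverse_complement(core))):
        # The zero-width lookahead lets overlapping occurrences be found.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    print(find_gc_boxes("ATGGGCGGTTCCGCCCA"))  # one site on each strand
```

Scanning both strands matters because a transcription factor site can be read in either orientation within a promoter.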


HDACs have complex, sometimes opposing, functions in cancer. They can prevent the activation of tumor suppressor genes and block the ability of a cancer cell to undergo apoptosis; consistent with this, HDAC2 silencing triggers apoptosis. Another important feature of HDACs is their interaction with the DNA methylation machinery. HDACs cooperate with DNMTs by removing acetyl groups from histone lysines, freeing those residues to serve as methylation targets.



Histone Methylation


There are two types of histone methylation, targeting either lysine or arginine residues. Histone methyltransferases (HMTs) perform these modifications using S-adenosylmethionine as the methyl group donor. Lysine methylation is implicated in changes in chromatin structure and gene regulation, whereas arginine methylation, like acetylation, correlates with the active state of transcription.



Histone Methylation at Lysines


Methylation of lysine residues (K) occurs on histone H3 primarily at K4, K9, and K27 and on H4 at K20 (Fig. 1.7B). The lysine residue can be mono-, di-, or trimethylated at the epsilon (ε) amino group. Methylation of H3, especially on lysines 4 and 36 (H3K4 and H3K36), is associated with an open chromatin configuration and transcribed chromatin. In contrast, methylation of H3K9, H3K27, and H4K20 is associated with condensed, repressed chromatin. Thus, the overall effect of histone methylation on gene expression depends on which lysine is methylated and to what degree (mono-, di-, or trimethylated).
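The shorthand used for these marks (histone, residue, position, and modification, as in H3K27me3) is regular enough to parse programmatically. This sketch assumes a simplified grammar covering only acetyl (ac) and methyl (me1 to me3) marks and classifies marks using the associations listed in Table 1.1, plus the general link between acetylation and activation; anything else is reported as context-dependent or unknown. The function names are hypothetical.

```python
import re

# Simplified grammar: histone, residue (K or R), position, modification.
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([KR])(\d+)(ac|me[123])$")

# Methylation associations taken from Table 1.1.
ACTIVATING_METHYL = {"H3K4me3", "H3K36me3", "H3K79me3"}
REPRESSIVE_METHYL = {"H3K9me2", "H3K9me3", "H3K27me3"}


def parse_mark(mark):
    """Split a mark such as 'H3K27me3' into (histone, residue, position, modification)."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {mark!r}")
    histone, residue, position, modification = match.groups()
    return histone, residue, int(position), modification


def predicted_effect(mark):
    """Expected effect on transcription under the simplified rules above."""
    _, _, _, modification = parse_mark(mark)
    if modification == "ac":
        return "activating"  # acetylation is linked mainly to activation
    if mark in ACTIVATING_METHYL:
        return "activating"
    if mark in REPRESSIVE_METHYL:
        return "repressive"
    return "context-dependent or unknown"


if __name__ == "__main__":
    for mark in ["H3K4me3", "H3K27me3", "H3K9ac", "H3K4me1"]:
        print(mark, predicted_effect(mark))
```

The H3K4me1 case illustrates the point made above: the same lysine can carry different degrees of methylation, and only some of those states have a well-defined association in Table 1.1.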


In general, there are at least four families of lysine methyltransferases. Most lysine methyltransferases (KMTs, or HMTs for histone methyltransferases) are distinguished by the presence of a SET domain. One family of these methyltransferases is further distinguished by an additional protein domain separate from the SET domain and will be discussed further in Section 1.2.3 on chromatin-binding proteins. SET domains are approximately 130 residues long and are homologous to amino acid segments in Su(var)3-9, Enhancer of Zeste, and Trithorax, three Drosophila proteins with intrinsic methyltransferase activity. The mammalian form of the prototypical lysine methyltransferase Su(var)3-9 is Suv39h, which stabilizes heterochromatin by trimethylating histone H3 at K9. Histone methylation at K9 is recognized by heterochromatin protein 1 (HP1α, β, or γ). These HP1 proteins use chromodomains to recognize the trimethyl imprint on H3K9. More generally, methylated or acetylated histone imprints are recognized by two classes of proteins: those with chromodomains, which recognize methyl group imprints, and those with bromodomains, which recognize acetyl group imprints. Transcriptional coactivators such as CBP, p300, and PCAF are HATs that contain bromodomains; they acetylate histones and other nuclear proteins and so, not surprisingly, also recognize the acetyl group imprint. These proteins are discussed in greater detail in Section 1.2.3. In summary, the initial and prototypical KMT family consists of the SET domain-containing Su(var)3-9 proteins, which target H3K9, and their mark is recognized by HP1 factors.


Proteins involved in histone demethylation underscore the fact that, like acetylation, histone methylation is a dynamic process. Jumonji domain-containing proteins demethylate histones: the Jumonji C (JmjC) family catalyzes the removal of methyl groups from lysines, while the Jumonji D (JmjD) family removes methyl groups from arginines. In addition, the JARID1 proteins (Jumonji C and ARID domain-containing) catalyze the removal of methyl groups from H3K4me3 and H3K4me2, whereas the related JARID2 can function as a corepressor and balances stem cell self-renewal with differentiation by affecting methylation at H3K27. Recent work has shown that JARID2 is a component of PRC2 and mediates transcriptional repression by recruiting PRC2, an H3K27 methyltransferase complex, to gene promoters.



Histone Methylation at Arginines


Methylation at arginines occurs within the tails of histone H3 (R2, R17, and R26) and H4 (R3) and is catalyzed in mammalian cells by CARM1 (coactivator-associated arginine methyltransferase 1) and PRMT1 (protein arginine N-methyltransferase 1), respectively (Fig. 1.7). Arginines can be either mono- or dimethylated (asymmetrically or symmetrically) on the guanidino nitrogens, and this process is antagonized by human PADI4 (peptidylarginine deiminase 4), which converts methyl-Arg to citrulline. Less is known about the fate of histones methylated at arginines. However, initial studies indicate that methylated arginines create an imprint recognized by coregulatory molecules, for example, p300 and SWI/SNF. CARM1 has been shown to inhibit alveolar cell proliferation and promote differentiation.



Histone Phosphorylation


Histone phosphorylation occurs on all four core histones: H2A (S1), H2B (S14), H3 (S10 and S28), and H4 (S1) (Fig. 1.7A). Phosphorylation of S10 in H3 is associated with transcriptional activation and with chromosome condensation during mitosis. It is also associated with the transduction of external signals to chromatin, leading to the transient expression of immediate-early (IE) genes. H3 phosphorylation is mediated by several specific kinases activated by distinct pathways. For example, mammalian mitotic H3 phosphorylation is associated with Aurora B/Ipl1 kinase, H3 phosphorylation by IKKα is important for NF-κB-dependent gene activation, and the IE gene response is mediated mainly by the mitogen- and stress-activated kinases MSK1 and MSK2. Histone H2B phosphorylation condenses chromatin and is involved in apoptosis. H2A phosphorylation by Bub1 kinase appears to be required for chromosome stability. By contrast, the effect of H4 phosphorylation is unknown.


Most covalent modifications of histones are known to be reversible. Consequently, if the presence of a modification influences transcription in a particular way, its removal might have the opposite effect, allowing the cell to respond effectively to changes in environmental cues. Moreover, some histone modifications are mechanistically linked. For example, phosphorylation of H3S10 enhances histone acetylation by Gcn5 (part of the SAGA complex), while H3K9 methylation inhibits phosphorylation at H3S10. Given the number of sites and the variety of possible covalent modifications, the combinatorial possibilities are extremely large. The combinatorial pattern of N-terminal modifications gives each nucleosome a distinct identity that the cell interprets as a readable code, relaying information from the genome to the cellular machinery that directs various processes. This concept is commonly referred to as the “histone code hypothesis.” The precise modification status of a specific histone tail on a given gene can also change during transcriptional regulation, and each combination of histone modifications may elicit distinct downstream transcriptional signals.
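A back-of-the-envelope count shows how quickly these combinations grow. This sketch makes loudly simplified assumptions: it considers only the first 40 residues of the H3 tail (sequence supplied here for illustration), allows each lysine five states (unmodified, acetyl, me1, me2, me3), each arginine four (unmodified, me1, asymmetric me2, symmetric me2), and each serine two (unmodified or phosphorylated), and it ignores threonines, ubiquitination, sumoylation, and any biological constraints between sites.

```python
from math import prod

# Residues 1-40 of the histone H3 N-terminal tail (supplied for illustration).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"

# Toy assumption: number of alternative states per residue type.
STATES = {
    "K": 5,  # unmodified, acetyl, me1, me2, me3
    "R": 4,  # unmodified, me1, asymmetric me2, symmetric me2
    "S": 2,  # unmodified, phosphorylated
}


def combinatorial_states(seq, states=STATES):
    """Number of distinct modification patterns; unlisted residues have 1 state."""
    return prod(states.get(residue, 1) for residue in seq)


if __name__ == "__main__":
    print(f"{combinatorial_states(H3_TAIL):,}")
```

With 8 lysines, 5 arginines, and 2 serines in this stretch, the count is 5^8 × 4^5 × 2^2 = 1.6 billion possible states for a single H3 tail, before considering the other histones in the octamer.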



Chromatin-Binding Proteins


The remaining histone methyltransferase families also recognize methyl groups on regulatory proteins other than histones and therefore are discussed here. The second group of SET domain methyltransferases is related to the Drosophila protein Enhancer of Zeste; the prototypical mammalian member is Ezh2. Ezh2 is part of a complex of proteins called the Polycomb group (PcG). The two variations of these complexes have been designated Polycomb repressive complexes 1 (PRC1) and 2 (PRC2). Ezh2 belongs to PRC2, which also includes Eed and Suz12, whereas PRC1 includes the Ring finger proteins (Ring1a/b, Rnf, Hpc, Edr, and Bmi1). Conditional deletion of Eed in the intestinal crypt resulted in crypt degeneration, indicating that the PRC2 complex is required for normal stem cell maintenance. Ezh2 has recently garnered significant attention because it is overexpressed and oncogenic in several epithelial cancers, including prostate and breast cancer, in contrast to its tumor suppressor function in some hematopoietic cancers. Consistent with its oncogenic role in epithelial cancers, Ezh2 is also overexpressed in colon, gastric, and liver cancer. A genome-wide analysis of prostate cancers recently revealed an androgen-dependent fusion protein called TMPRSS2-ERG that, in ChIP-Seq analysis, was found to transcriptionally target Ezh2. Bmi1 has received increased attention because it is an important marker of normal and cancerous hematopoietic stem cells. Bmi1 is also associated with the +4 reserve stem cell in the intestinal crypt. In addition, the PRC1 complex contains proteins that have E3 ubiquitin ligase activity. The Polycomb group proteins not only participate in histone lysine methylation through their SET domains, but both PRC1 and PRC2 complexes are also important in recognizing the methylated protein imprint.


The human homolog of the Drosophila Trithorax (Trx) protein is encoded by the mixed-lineage leukemia gene (MLL1). There are four human MLL homologs. MLL1 has been shown to be a specific methyltransferase of H3 at K4. It in turn forms protein-protein interactions with coactivators, for example, CBP, and with corepressor chromatin remodelers, for example, SWI/SNF. Other Trithorax homologs, for example, Ash1 and Trx, form complexes with different coregulators. Collectively, the Trithorax group (TrG) proteins can either activate or repress transcription depending on the coregulator with which they associate. Nevertheless, TrG proteins characteristically oppose the activity of the Polycomb group (PcG). The tumor suppressor protein menin (the positionally cloned product of the MEN type 1 locus) interacts with MLL1 and normally induces the cyclin-dependent kinase inhibitor p27Kip1.


RIZ (retinoblastoma protein-interacting zinc finger protein), SMYD3, and MDS-EVI1 are grouped as a fourth family of SET domain proteins because each has two isoforms with opposing functions: the isoform containing the SET domain has tumor suppressor function, while the isoform lacking the SET domain is cancer promoting. This “yin-yang” theory, put forth by Huang, is especially true for RIZ and MDS-EVI1, for which cancer, by an unclear mechanism, disturbs the normal ratio between the two isoforms. The SMYD3 protein contains another DNA-binding domain, called MYND, in addition to a SET domain and is overexpressed in colorectal and hepatocellular carcinomas.


Crosstalk exists between DNA methylation and histone modifications. These interactions were revealed by the observation that HDAC1 forms a complex with DNMT1 and a 5-methylcytosine-binding protein (MBP) on a methylated promoter to silence gene expression. Similar crosstalk occurs among HDACs, Suv39, and HP1; among HDACs, PRC2, and PRC1; and among HATs, MLL1, and BRM. The proteins that epigenetically modify the genome are categorized as those that add chromatin marks (“writers”), those that remove them (“erasers”), and those that recognize them (“readers”) (Table 1.2).



Table 1.2

Erasers, Writers, Readers

Erasers | Writers | Readers
HDACs | HATs | Bromodomain
NuRD | PRC1, PRC2 | Chromodomain
 | PARP | 
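The writer, eraser, and reader categories in Table 1.2 can be modeled as operations on a set of marks. In this hypothetical sketch a nucleosome is a dictionary from residue to modification; a residue is assumed to carry only one modification at a time, so an acetylated lysine must be erased before it can be methylated (an assumption that mirrors the idea that deacetylation can precede methylation); a bromodomain reader binds any acetyl mark; and an HP1-like chromodomain reader binds H3K9me2 or H3K9me3. These rules are simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Nucleosome:
    # residue (e.g. "H3K9") -> modification (e.g. "ac", "me3")
    marks: Dict[str, str] = field(default_factory=dict)


def write(nuc: Nucleosome, residue: str, modification: str) -> bool:
    """Writer: add a mark only if the residue is currently unmodified."""
    if residue in nuc.marks:
        return False
    nuc.marks[residue] = modification
    return True


def erase(nuc: Nucleosome, residue: str) -> None:
    """Eraser: remove whatever mark the residue carries (no-op if none)."""
    nuc.marks.pop(residue, None)


def bromodomain_binds(nuc: Nucleosome) -> bool:
    """Reader: bromodomains recognize acetyl marks."""
    return "ac" in nuc.marks.values()


def hp1_chromodomain_binds(nuc: Nucleosome) -> bool:
    """Reader: an HP1-like chromodomain recognizes methylated H3K9."""
    return nuc.marks.get("H3K9") in {"me2", "me3"}


if __name__ == "__main__":
    nuc = Nucleosome()
    write(nuc, "H3K9", "ac")                 # HAT acts
    print(write(nuc, "H3K9", "me3"))         # blocked: K9 is already acetylated
    erase(nuc, "H3K9")                       # HDAC acts
    write(nuc, "H3K9", "me3")                # KMT can now act
    print(hp1_chromodomain_binds(nuc), bromodomain_binds(nuc))
```

The example run shows the ordering dependence: methylation of K9 succeeds only after the acetyl group has been erased, after which the HP1-like reader, rather than the bromodomain reader, recognizes the nucleosome.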



Epigenetics and Development


The epigenetic control of gene expression is a fundamental feature of mammalian development, as indicated by developmental arrest or abnormalities in methylation- or acetylation-deficient mutants. X-chromosome inactivation is an example of sequence-identical alleles being stably maintained in different functional states. In humans, X-chromosome inactivation serves to equalize the expression of X-linked genes between females (XX) and males (XY). Mutations in genes that affect global epigenetic profiles can give rise to human diseases. For example, fragile X syndrome results when a CGG repeat in the 5′ regulatory region of FMR1 (fragile X mental retardation gene 1) expands and becomes methylated de novo, silencing the gene and creating a visible “fragile” site on the X chromosome under certain conditions. On a more global level, mutations in DNMT3b (which regulates DNA methylation) lead to ICF syndrome, and mutations in CBP (which has acetyltransferase activity) cause Rubinstein-Taybi syndrome (RSTS). Lysine-specific demethylases (LSDs), the first of which was discovered in 2004, appear to play an essential role in the balance between stem cell pluripotency and lineage specification.



Epigenetics and Cancer


Epigenetic changes play an important role in tumorigenesis. The major epigenetic changes that take place during cancer development are aberrant DNA methylation of tumor suppressor genes and altered histone modifications. Genomic methylation patterns are frequently altered in tumor cells, with global hypomethylation accompanying region-specific hypermethylation events. When hypermethylation occurs within the promoter of a tumor suppressor gene, it can silence expression of that gene and provide the cell with a growth advantage, in a manner similar to deletions or mutations. Thus, although cancer genomes are hypomethylated overall compared with normal tissues, many tumor suppressor genes are silenced in tumor cells by hypermethylation. This aberrant methylation occurs early in tumor development and increases progressively, eventually leading to the malignant phenotype. For example, a high percentage of patients with sporadic colorectal cancers with a microsatellite instability phenotype show methylation and silencing of the gene encoding MLH1 (MutL protein homolog 1). Other methylated tumor suppressor loci include CDKN2A (p16INK4a), p14ARF, Rb, E-cadherin, and BRCA1. Deregulation of genomic imprinting can also play a role in cancer development, as exemplified by loss of IGF2 imprinting in Wilms’ tumor.


Chromatin remodeling also plays an important role during tumorigenesis. Loss or misdirection of HATs has been linked to embryonic aberrations in mice and to human cancers. Misdirection of HAT activities as a result of chromosomal translocations is associated with multiple human leukemias. In acute promyelocytic leukemia, the oncogenic fusion protein PML-RARα (promyelocytic leukemia-retinoic acid receptor-α) recruits an HDAC to repress genes essential for the differentiation of hematopoietic cells. Similarly, in acute myeloid leukemia, AML1-ETO fusions recruit the repressive N-CoR-Sin3-HDAC1 complex that in turn inhibits normal myeloid development.


More recently, noncoding RNAs transcribed from intervening (intergenic) sequences have been linked to epigenetic changes, cell cycle regulation, immune surveillance, and cancer. These large intervening noncoding RNAs (originally called lincRNAs, now generally called lncRNAs) in some instances redirect polycomb repressive complex 2 (PRC2) to genes that promote cell renewal. In particular, the lncRNA HOTAIR is overexpressed in breast cancer and redirects PRC2, which methylates H3K27, an epigenetic change that tends to condense chromatin. Perhaps not surprisingly, many epigenetic marks target the developmentally relevant homeobox (Hox) class of transcription factors, which are master regulators of embryonic development and stem cell pluripotency and, when altered, can lead to disease.


The fact that many human diseases, including cancer, have an epigenetic etiology has encouraged the development of a new therapeutic option called “epigenetic therapy”. Many agents have been discovered that alter methylation patterns on DNA or the modification of histones, and several of these agents are currently being tested in clinical trials.





Anatomy of a Gene Promoter


The major advances in the area of transcriptional initiation since the prior edition of this textbook have occurred primarily in the explosion of information on epigenetic modifications. The prototypical epigenetic therapies include demethylating agents, for example, 5-azacytidine, for myelodysplastic disorders and histone deacetylase inhibitors, for example, SAHA, to treat a number of epithelial cancers. A major focus in the study of transcriptional elongation has been the role of the enzymes involved in epigenetic changes, for example, the HAT Gcn5. Moreover, the trithorax group factor MLL1 forms a lysine methyltransferase complex with the elongation factor ELL. Thus, the following section briefly summarizes the historical basis of gene promoter structure and transcriptional initiation. For more details, the reader is referred to recent reviews and the prior edition of this chapter.



DNA Elements


RNA polymerase II (Pol II) and its accessory factors bind to a DNA sequence called the promoter, located upstream of protein-coding sequences, to direct RNA transcription. Without the promoter, the genetic sequences that encode the information to make a functional peptide product will not be transcribed. Other 5′ flanking sequences, or DNA elements, that participate in transcription are sequence-specific binding sites for proteins that regulate the fidelity, rate, and timing of Pol II binding, formation of the preinitiation complex, and initiation of transcript elongation under basal and regulated conditions. These sequences are defined as cis-acting elements because they lie on the same (cis) DNA molecule as the gene they regulate. DNA elements are categorized according to their ability to regulate transcription as a function of their distance and orientation from the promoter. Sequences contained within the first 30–100 bp of the promoter are considered promoter-dependent cis-acting elements. Positive-acting elements that increase the rate of transcription are considered activating DNA elements, whereas negative-acting elements that decrease or repress the rate of transcription are considered repressor or silencer elements.


Pol II core promoters are of two types: focused and dispersed. Focused promoters contain either a single start site or a tight cluster of start sites spanning a few bp, whereas dispersed promoters contain several start sites spread over about 100 bp and are typically found in CpG islands. Critical promoter elements include TATA elements, which lie upstream of the transcription start site; the initiator sequence (Inr), which spans the start site; upstream regulatory elements that bind either transcriptional activators or repressors; and downstream poly(dA-dT) elements. The TATA element, or “TATA box,” has the DNA sequence TATA or variants thereof. In many Pol II promoters this sequence resides at a fixed distance 25–30 bp upstream of the transcriptional start site, and its function depends on its position and distance relative to the start site. However, many genes lack TATA sequences. These “TATA-less promoters” still depend on the TATA-binding protein (TBP) to assemble the preinitiation complex (PIC) at the promoter, but recruitment of TBP is not rate limiting.
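Because the TATA box sits at a roughly fixed distance from the start site, a simple scan can flag candidate TATA elements. This sketch assumes the commonly cited consensus TATAWAWR (W = A or T, R = A or G) and accepts matches whose first base lies 25 to 35 bp upstream of the start site, a tolerance chosen here around the 25–30 bp stated above; the function name and window are hypothetical. A hit is only a candidate, and the absence of a hit is consistent with a TATA-less promoter.

```python
import re

# Assumed consensus TATAWAWR; the zero-width lookahead allows overlapping matches.
TATA_MOTIF = re.compile(r"(?=(TATA[AT]A[AT][AG]))")


def find_tata_boxes(seq, tss, window=(-35, -25)):
    """Return (offset, motif) pairs for TATA-like motifs starting inside `window`.

    `tss` is the 0-based index of the transcription start site in `seq`;
    offsets are negative for positions upstream of the start site.
    """
    seq = seq.upper()
    hits = []
    for match in TATA_MOTIF.finditer(seq):
        offset = match.start() - tss
        if window[0] <= offset <= window[1]:
            hits.append((offset, match.group(1)))
    return hits


if __name__ == "__main__":
    # Synthetic promoter: TATAAAAG placed 30 bp upstream of a start site at index 40.
    promoter = "G" * 10 + "TATAAAAG" + "C" * 22 + "A" + "C" * 9
    print(find_tata_boxes(promoter, tss=40))   # [(-30, 'TATAAAAG')]
    print(find_tata_boxes("G" * 50, tss=40))   # [] (TATA-less)
```

Restricting hits to a window, rather than reporting every TATA-like string, encodes the position and distance dependence described above.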


Initiator elements (Inr), although initially identified in TATA-less promoters, have subsequently been found in both TATA-containing and TATA-less promoters. Their role appears to be to direct the accuracy of Pol II initiation. Inr elements reside within the first 60 bp of the transcriptional start site and directly overlap the start site itself, but they do not have a clearly defined consensus sequence. Many of the genes encoding gastrointestinal peptides (e.g., gastrin, somatostatin, cholecystokinin, glucagon, and secretin) contain TATA elements; however, the gene encoding the growth factor transforming growth factor alpha (TGFα) does not.





Methodology


This section summarizes some of the molecular techniques used to study the transcriptional control of genes. These methods are used to study either genetic structure or function. Three systems have been used to study function: reconstituted cell-free transcription assays, cell and tissue culture models, and whole-animal studies. Methods that analyze the structural interactions include those techniques that assess DNA-protein interactions and those that assess protein-protein interactions. More recently, studies of noncoding RNAs involve understanding RNA-DNA, RNA-RNA and RNA-protein interactions.



Structural Methods


Once functional regulatory DNA elements have been identified, assays that assess DNA-protein interactions are performed. Indeed, when a long sequence (> 50 bp) must be analyzed, it is simpler to identify DNA-protein interactions first and then determine whether these DNA elements are involved in transcriptional regulation. DNase I footprinting assays identify DNA elements bound by crude or purified nuclear proteins, because the bound proteins protect those sequences from chemical or enzymatic cleavage. Such assays are particularly well suited for studying cooperative interactions among proteins bound to adjacent DNA elements. The technique can be carried out in vivo or in vitro; however, in vivo footprinting has been superseded by chromatin immunoprecipitation (ChIP) assays, described below. Electrophoretic mobility shift assays (EMSA, gel shift, gel delay, or band shift assays) permit a more detailed analysis of (a) the type of protein complexes that bind to individual DNA elements and (b) the specificity of the protein interaction with specific bp (Fig. 1.8). This assay is also faster and easier to perform than footprinting. Methylation interference assays extend the power of the gel shift assay by identifying the specific nucleotide contacts required for DNA binding. DNA affinity precipitation assay (DAPA) uses a biotinylated DNA-binding site to isolate the proteins recruited to an element; immunoblots then identify the proteins that form both the protein-DNA and protein-protein interactions. Southwestern blot analysis uses specific DNA elements as probes to detect nuclear proteins that have been separated on a denaturing gel and transferred to nitrocellulose, or that have been produced by a phage expression library.
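The readout of a DNase I footprint is a stretch of positions where cleavage is reduced in the presence of protein. This sketch assumes hypothetical per-nucleotide cleavage intensities for a control lane (no protein) and a protein-bound lane, calls a position protected when the bound/control ratio falls below 0.5, and reports runs of at least three protected positions; both thresholds and the function name are illustrative assumptions rather than a standard analysis method.

```python
def find_footprints(control, bound, threshold=0.5, min_len=3):
    """Return (start, end) index pairs (inclusive) of protected runs.

    A position is protected when cleavage in the protein-bound lane is less
    than `threshold` times the cleavage in the control lane.
    """
    if len(control) != len(bound):
        raise ValueError("lanes must cover the same positions")
    protected = [c > 0 and b / c < threshold for c, b in zip(control, bound)]
    footprints = []
    start = None
    # A trailing False acts as a sentinel so a run reaching the end is closed.
    for i, is_protected in enumerate(protected + [False]):
        if is_protected and start is None:
            start = i
        elif not is_protected and start is not None:
            if i - start >= min_len:
                footprints.append((start, i - 1))
            start = None
    return footprints


if __name__ == "__main__":
    control = [10] * 20
    bound = [10] * 5 + [2] * 6 + [10] * 9   # cleavage suppressed at positions 5-10
    print(find_footprints(control, bound))  # [(5, 10)]
```

Requiring a minimum run length keeps isolated noisy positions from being reported as binding sites.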




Fig. 1.8


Electrophoretic mobility shift assay (EMSAs, gel shift). A DNA element ~30–100 bp in length is labeled and then incubated with crude nuclear extract or purified protein. A band on the autoradiogram is detected if the radiolabeled probe is retarded and does not migrate to the bottom of the gel. The specificity of binding is determined by competing with unlabeled DNA sequences. Competitor 1 is related to the probe sequence, whereas Competitor 2 is unrelated to the probe sequence.


Apr 21, 2019 | Posted by in ABDOMINAL MEDICINE | Comments Off on Transcription and Epigenetic Regulation
