Fig. 1.1
Associations of the IL12/IL23- and IL27-regulating genes with IBD in keeping with the TH1 and TH2/TH17 theory (Wang et al. [96]). Only the main proteins in these pathways are shown. For each gene, the most significant P value among SNPs closest to the gene was annotated
Meta-analysis
The associated common variants identified by single GWAS usually have modest individual effects, often with odds ratios smaller than 1.2 for binary traits or with explained variance of less than 1% for quantitative traits [55]. To discover common variants with even smaller effects, a sample size larger than that of any single study is required. Meta-analysis combines large data sets and is an economical way to increase sample size. An early meta-analysis of three genome-wide Crohn disease scans identified 21 new Crohn disease susceptibility loci. It increased the number of independent loci conclusively associated with Crohn disease to 32, explaining approximately 20% of Crohn disease heritability [56]. Including three additional GWAS scans, a recent meta-analysis added 39 new confirmed Crohn disease susceptibility loci [57]. These 39 new loci increase the proportion of explained heritability to only 23.2%, indicating their rather modest effects. While some of these newly identified loci contain a single gene, others contain multiple genes or none at all. Some functionally interesting candidate genes in the implicated regions, including STAT3, JAK2, ICOSLG, ITLN1, and SMAD3, are briefly described below.
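The gain in power from pooling studies can be illustrated with a fixed-effect, inverse-variance meta-analysis, one widely used way of combining per-study effect estimates. This is only a minimal sketch: the text does not state which method the cited meta-analyses used, and the function name and the example numbers below are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Combine per-study log odds ratios using inverse-variance weights.

    betas: per-study log odds ratios for the same SNP and the same allele.
    ses:   the corresponding standard errors.
    Returns (pooled_beta, pooled_se, z_score).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equally long")
    weights = [1.0 / se ** 2 for se in ses]  # precise studies count more
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Hypothetical example: three scans each estimate OR = 1.15 with SE = 0.05.
beta = math.log(1.15)
pooled_beta, pooled_se, z = fixed_effect_meta([beta] * 3, [0.05] * 3)
print(f"single-study z = {beta / 0.05:.2f}, pooled z = {z:.2f}")
```

With three equally sized studies the pooled standard error shrinks by a factor of the square root of three, which is why a variant with an odds ratio below 1.2 can reach genome-wide significance only after pooling.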
STAT3 (signal transducer and activator of transcription 3) and JAK2 (Janus kinase 2) both come from the JAK-STAT pathway. This major signaling pathway transmits information from cell surface receptors stimulated by cytokines and growth factors to the nucleus to regulate transcription of various genes. STATs play a central role in Th17 differentiation [58], while both STAT3 and JAK2 contribute to IL23R signaling [32]. ICOSLG (inducible T-cell co-stimulator ligand) is a co-stimulatory molecule expressed on intestinal (and other) epithelial cells. It has been suggested that ICOSLG may have a key role in controlling the effector functions of regulatory T cells [59]. There is direct evidence that maturing plasmacytoid dendritic cells express different sets of molecules, including ICOSLG, for T-cell priming [60]. ITLN1 (intelectin-1) is known to be expressed in human small bowel and colon. The lactoferrin receptor (LFR), which is structurally identical to human ITLN1, seems critical in membrane stabilization, preventing loss of digestive enzymes and protecting the glycolipid microdomains from pathogens [61]. SMAD3 (SMAD family member 3) binds the TRE element in the promoter region of many genes that are regulated by transforming growth factor beta (TGF-β) and, on formation of the SMAD3/SMAD4 complex, activates transcription. It has been demonstrated that SMAD3 deficiency enhances Th17 differentiation during the TGF-β-mediated induction of Foxp3+ regulatory T cells [62].
Impact of the Immunochip
Common immune disorders such as ankylosing spondylitis, celiac disease, multiple sclerosis, psoriasis, rheumatoid arthritis, systemic lupus erythematosus, and type 1 diabetes often share overlapping susceptibility loci in GWAS [63]. Motivated by this observation, the Immunochip Consortium was formed to produce an inexpensive genotyping array that could be used to analyze hundreds of thousands of samples in autoimmune disease. The chip interrogates approximately 200,000 SNPs at 186 loci to enable dense genotyping, so that closely spaced SNPs in the loci of interest, including those at low allele frequencies, can be included in analyses [64]. The results of this effort played a large role in the meta-analysis of Jostins et al., which raised the tally of IBD-associated loci to 163 [65]. The Jostins study revealed that 113 of the 163 loci are shared with other complex diseases, including 66 loci shared with other autoimmune diseases [63]. The low cost of the Immunochip allowed so many samples to be genotyped that loci which had shown only marginal significance in previous meta-analyses could be identified at genome-wide significance.
A further goal of the Immunochip effort is to fine-map variants so that by using Bayesian statistical analyses, the individual causal variant can be identified rather than a large ensemble of variants that are in linkage disequilibrium with each other [66]. For instance, this fine-mapping can be used to show that amino acid substitutions in NOD2 and IL23R are the causal SNPs that drive the genetic association signal.
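One way to picture Bayesian fine-mapping is as a ranking of the SNPs at a locus by their posterior probability of being causal, followed by selection of the smallest set that reaches a chosen coverage (a "credible set"). The sketch below assumes a single causal variant per locus and uses Wakefield's approximate Bayes factor with a prior effect-size standard deviation of 0.2. Both choices are assumptions made for illustration and are not necessarily the method of reference [66]; the SNP names and effect sizes are hypothetical.

```python
import math

def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favor of association (Wakefield)."""
    v = se ** 2        # sampling variance of the effect estimate
    w = prior_sd ** 2  # assumed prior variance of the true effect
    z = beta / se
    return 0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w)

def credible_set(snps, coverage=0.95):
    """snps maps a SNP name to (beta, se).

    Returns the smallest list of (name, posterior) pairs whose posteriors
    sum to at least `coverage`, assuming exactly one causal SNP.
    """
    logs = {name: log_abf(b, s) for name, (b, s) in snps.items()}
    peak = max(logs.values())  # subtract the maximum to avoid overflow
    weights = {name: math.exp(v - peak) for name, v in logs.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, weight in ranked:
        posterior = weight / total
        chosen.append((name, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical locus: two SNPs in strong LD and one weakly associated SNP.
locus = {"snp_a": (0.40, 0.05), "snp_b": (0.39, 0.05), "snp_c": (0.10, 0.05)}
result = credible_set(locus)
for name, posterior in result:
    print(f"{name}: posterior = {posterior:.3f}")
```

In this toy locus the two strongly associated SNPs cannot be separated, so both enter the credible set, while the weakly associated SNP is excluded. Denser genotyping and larger samples sharpen the posteriors and shrink the set.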
GWAS Meta-analysis in Ulcerative Colitis
Recent GWAS and candidate gene association studies have identified 18 susceptibility loci for UC, which explain approximately 11% of the heritability for this disease. A recent meta-analysis combining data from six GWAS identified 29 additional UC risk loci, increasing the number of confirmed associations to 47 [67]. Examination of the gene content of the 47 associated regions shows that three regions each contain a single gene, most (35 out of 47) contain multiple genes, and nine contain no genes. Some noteworthy candidate genes including PRDM1, TNFRSF14, TNFRSF9, IL1R2, IL8RA, and IL8RB are briefly described below.
PRDM1 (PR domain containing 1, with ZNF domain) is the master transcriptional regulator of plasma cells and acts as a transcriptional repressor of the IFN-β promoter by binding specifically to the PRDI element. It drives the maturation of B lymphocytes into Ig-secreting cells. TNFRSF14 (tumor necrosis factor receptor superfamily, member 14) has an important role in preventing intestinal inflammation in a T-cell transfer model of colitis [68]. TNFRSF9 (tumor necrosis factor receptor superfamily, member 9) is a co-stimulator in the regulation of peripheral T-cell activation, with enhanced proliferation and IL-2 secretion. This factor is expressed by dendritic cells, granulocytes, and endothelial cells at inflammation sites. IL1R2 (interleukin 1 receptor, type II) can reduce IL1B activity by binding IL1B competitively, preventing its binding to IL1R1. IL1B production by lamina propria macrophages is increased in patients with ulcerative colitis [69]. IL8RA and IL8RB (chemokine (C-X-C motif) receptor 1/2) are two receptors for interleukin-8, which is a powerful neutrophil chemotactic factor. Binding of IL-8 to its receptor also activates neutrophils. Expression of IL8RA, but not IL8RB, is increased in macrophages, lymphocytes, and epithelium in ulcerative colitis. It has been suggested that IL8RA may help IL-8 to play a role beyond neutrophil recruitment in mediating the immune response in UC [70].
Trans-ancestry Association Studies
The vast majority of genetic studies in IBD have been conducted in European ancestry populations. However, the expansion of these studies into Asian populations has yielded some insights. In the Japanese population, the well-known NOD2 polymorphisms are virtually absent [71]. GWAS in Japan has shown that the single largest association signal is located in the TNFSF15 gene encoding the pro-inflammatory cytokine TL1A [72].
Liu et al. conducted a trans-ethnic meta-analysis including 86,640 individuals of European ancestry and 9,846 individuals from East Asia, India, or Iran [4]. This study implicated 38 new loci, raising the tally to 200 total loci, and determined that there were significant differences in the frequency of risk alleles in the different populations. Nevertheless, the direction and magnitude of the effect at the shared loci were very similar between ancestries, suggesting that the causal variants are likely to be common (minor allele frequency greater than 5%). Besides the large impact of TL1A in the Asian population, the HLA locus was also found to have a greater influence, particularly in ulcerative colitis [4].
The Debut of Next-Generation Sequencing
The traditional method of DNA sequencing was developed by Sanger using dideoxy-nucleotides as chain terminators [73]. This technology has become quite efficient and can be run on an automated instrument to generate 700-bp sequence reads with fluorescently labeled terminators. In the last 10 years, a new generation of DNA sequencing technology has emerged which uses sequencing by synthesis on a massively parallel scale to generate hundreds of gigabases of raw sequence per day, that is, 1800 whole human genomes per year on a single instrument (Illumina Inc., San Diego, CA). This technology has revolutionized the field of Mendelian genetics, that is, rare monogenic diseases, by enabling the identification of rare variants in a family setting. Interestingly, inflammatory bowel diseases can have Mendelian mimics that can be detected by next-generation sequencing, particularly in the very early-onset (VEO) patients [74, 75]. More attention will be given to the diagnosis of these genetic phenocopies and the management of the very young patients in the chapter of this textbook on very early-onset IBD.
Sequencing in High-Risk Individuals and Families
With the level of technology available as of this writing, the most cost-effective approach to massively parallel sequencing in IBD patients is to target the exome, that is, the 1% of the genome that encodes the amino acids of proteins. Congenital deficiency of the receptor for the immunomodulatory cytokine IL-10 was the first monogenic defect identified as causative of VEO-IBD, in 2009. While refractory to medical therapy, these patients responded to bone marrow transplant [76]. Exome sequencing has revealed additional patients with IL-10 receptor deficiency [75]. Since that time, multiple other monogenic defects have been identified through exome sequencing. An early example of the success of this approach was seen in a 15-month-old child who presented with perianal fistulae and failure to thrive that were unresponsive to standard treatments and progressed to pancolitis. The patient underwent many surgical procedures and genetic tests, none of which led to resolution of his disease. Exome sequencing revealed that this patient carried an exceedingly rare mutation on the X chromosome in the XIAP gene, a potent regulator of the inflammatory response [77]. Since this protein acts in cells of the hematopoietic lineage, he was treated with a bone marrow transplant, resulting in resolution of his disease. Other monogenic cases of VEO-IBD have been identified and have resulted in lifesaving therapy.
Features that suggest a patient may be a candidate for exome sequencing include early onset of disease, unusual severity, familial pattern of transmission, and refractory response to standard therapies. It is recommended to obtain DNA samples from the parents in addition to the proband so that Mendelian errors in allele transmission can be identified since there is an error rate inherent in next-generation sequencing. The trio of exomes is also useful in identifying de novo mutations which may be pathogenic [78].
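The value of sequencing the parents can be made concrete with a simple transmission check. The sketch below assumes autosomal, biallelic sites with genotypes coded as the number of alternate alleles (0, 1, or 2); sex chromosomes, missing calls, and genotype quality are deliberately ignored, and the function names are hypothetical. A child genotype that cannot arise from the parental genotypes is flagged either as a candidate de novo mutation (both parents homozygous reference, child heterozygous) or as another kind of Mendelian error. In practice a candidate de novo call still needs confirmation, since it may itself be a sequencing error.

```python
def transmissible(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def classify_site(child, mother, father):
    """Classify one autosomal biallelic site in a parent-child trio."""
    possible = {m + f
                for m in transmissible(mother)
                for f in transmissible(father)}
    if child in possible:
        return "consistent"
    if mother == 0 and father == 0 and child == 1:
        return "de_novo_candidate"
    return "mendelian_error"

# Hypothetical trio genotypes at four sites: (child, mother, father).
sites = [(1, 0, 1), (1, 0, 0), (0, 2, 2), (2, 1, 1)]
labels = [classify_site(*site) for site in sites]
print(labels)
```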
Next-Generation Sequencing as a Research Tool
It is commonly believed that some fraction of the heritability of complex genetic disorders, such as IBD and particularly VEO-IBD, is due to rare or low-frequency variants [79]. Due to their rarity, these variants are not in strong linkage disequilibrium with proxy SNPs, which is required to make the GWAS approach feasible. Therefore, discovery of additional genes and low-frequency variants will require direct sequencing of tens of thousands of genomes [80]. The cost of whole genome sequencing is still prohibitive on this scale, so some research groups have focused on the exome as discussed above. Another approach to finding rare or coding variation has been to sequence specific genes in a large cohort based on the gene’s status as a GWAS candidate. Rivas et al. identified additional coding mutations in NOD2 and IL23R as well as novel coding variants in CARD9, IL18RAP, CUL2, C1orf106, PTPN22, and MUC19 [81]. Beaudoin et al. performed amplicon sequencing on 55 genes in 200 cases and 150 controls for ulcerative colitis. They confirmed the previous associations with CARD9 and IL23R, as well as a novel association in RNF186 [82].
Efforts are currently underway to extend sequencing to thousands of exomes to search for pathogenic coding variants. A difficulty with this approach is that any individual variant is so rare that there is insufficient statistical power to identify it at genome-wide significance. As a result, many statistical methods have been developed that aggregate all the discovered variants in a gene into a single supervariant to test the burden of rare variants between cases and controls [83]. Although the outcome of this large-scale exome sequencing in IBD is still pending, similar studies in complex autoimmune disease give some early signs that the yield of novel associations will be low. The BGI (formerly Beijing Genomics Institute) performed discovery exome sequencing of psoriasis in 781 cases and 676 controls, followed by sequencing-based replication in 9,946 cases and 9,906 controls with a panel of 1,326 targeted genes [84]. They found missense SNVs in IL23R, GJB2, LCE3D, ERAP1, CARD14, and ZNF816A based on single-SNP association statistics; notably, none of these variants was truly rare, as all ranged from low frequency to common. Analysis of the data using most of the known gene-based association tests (burden tests) did not reveal any novel associations, leading the group to conclude that coding variants in the targeted genes account for little of the genetic risk [84]. This scenario could hold for IBD as well.
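The idea of a supervariant can be sketched with the simplest form of burden test: each individual is scored as a carrier if he or she has at least one rare allele anywhere in the gene, and carrier counts are then compared between cases and controls. The sketch below uses a two-sided Fisher exact test for that comparison. This is one of many possible burden tests and is not claimed to be the one used in the studies cited; the function names and the toy genotypes are hypothetical.

```python
from math import comb

def collapse_carriers(genotypes):
    """Count individuals carrying at least one rare allele in the gene.

    genotypes: one list per individual holding rare-allele counts for
    each variant in the gene.
    """
    return sum(1 for person in genotypes if any(n > 0 for n in person))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    a, b: case carriers and case non-carriers.
    c, d: control carriers and control non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical gene with three rare variants, six cases and six controls.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
controls = [[0, 0, 0]] * 5 + [[0, 1, 0]]
case_carriers = collapse_carriers(cases)
control_carriers = collapse_carriers(controls)
p_value = fisher_exact_two_sided(case_carriers, len(cases) - case_carriers,
                                 control_carriers,
                                 len(controls) - control_carriers)
print(case_carriers, control_carriers, round(p_value, 3))
```

No single variant in this toy gene is seen in more than two individuals, yet collapsing yields four case carriers against one control carrier. With real cohort sizes, this pooling is what gives a gene-level test a chance of reaching significance when per-variant tests cannot.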
Risk Prediction in IBD
Encouraged by the notable success of GWAS in Crohn disease and ulcerative colitis, it is logical to ask whether these advances can deliver sufficiently accurate predictions to make targeted intervention realistic. Several efforts have been made, but most results have been modest or even negative. For example, in a recent study, Kang and colleagues reported a best AUC (area under the receiver operating characteristic curve) of 0.72 in predicting CD risk using GWAS genotype data [85]. This best AUC is obtained assuming that the optimal number of predictors is given. The practical AUC may be even lower because the optimal number of predictors is usually unknown and has to be inferred from the data itself. However, these early efforts usually used small or modest sample sizes. As in meta-analysis, it is possible to compile a large sample by combining as many cohorts as possible, yielding a boost in prediction performance. Using the large sample size and wide variant spectrum of the Immunochip data set in combination with advanced machine learning methods, Wei et al. were able to achieve an AUC of 0.86 for Crohn disease and 0.83 for ulcerative colitis [86]. These statistical methods may prove useful in disease classification as deep whole genome sequence data become available.
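For readers less familiar with the AUC, it can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. An AUC of 0.5 therefore corresponds to chance and 1.0 to perfect separation. The minimal sketch below computes the AUC directly from that definition; the function name and the risk scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks for three cases and three controls.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC = {example_auc:.3f}")
```

The pairwise loop is quadratic in the number of individuals, which is acceptable for illustration; rank-based formulas give the same value more efficiently on large cohorts.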
Genotype-Phenotype Correlations in Pediatric IBD
Disease Type and Location
The discovery of genetic polymorphisms in IBD has afforded investigators the opportunity to identify predictive correlations between specific variants and phenotypic disease characteristics. Analyses of adult populations have demonstrated that carriage of NOD2 risk alleles predicts disease onset at an earlier age and ileal disease location in a dose-dependent manner. Subsequently, a meta-analysis of 16 case-control studies confirmed the association of NOD2 carriage with ileal disease location and also identified a correlation with fibrostenosing behavior and family history of IBD [16].
The majority of pediatric studies have concurred with findings from adult counterparts that carriage of NOD2 variants is associated with ileal disease. Estimates suggest that 20–65% of children with ileal Crohn disease possess at least one NOD2 mutation; consistent phenotypic associations have not been seen for other regions of the gastrointestinal tract [43, 87–91]. In contrast to the adult literature, pediatric studies correlating NOD2 variants with fibrostenosing disease have yielded conflicting results [19, 87, 88, 90, 91]. Two large studies from the USA and Scotland found that 34–45% of Crohn disease patients possessing NOD2 polymorphisms had evidence of fibrostenosing disease, especially those with the 1007fs and R702W variants [88, 91]. Three other pediatric studies, however, found no correlation of NOD2 with fibrostenosing disease [19, 87, 90].
Growth Parameters
As growth failure is an important feature of pediatric IBD, several groups have investigated the relationship between anthropometric parameters and NOD2 status. A study of 101 Crohn disease patients demonstrated that 44% of participants possessing a NOD2 polymorphism were below the 5th percentile for weight at the time of diagnosis, compared with only 15% of those without a genetic variant [87]. Although similar trends were seen for height, these results did not reach statistical significance. Another study of 93 Crohn disease patients, however, did not show any correlation between NOD2 status and height or weight Z scores at disease onset or for the lowest Z score during childhood [90]. Rather, disease severity was the strongest predictor of impaired growth, and ileal involvement was associated with height retardation at disease onset and with the lowest Z score during follow-up. Finally, a German group did not find any statistically significant difference in mean body mass index (BMI) or mean height percentiles at diagnosis between patients with and without NOD2 variants [92]. The authors did note a nonsignificant trend toward more patients with NOD2 polymorphisms being below the 3rd percentile for BMI. These data imply that while NOD2 variants may be associated with poor growth, this effect may be more a reflection of malnutrition secondary to ileal location and disease severity than an inherent genetic effect.
Association with Risk of Surgery
Results of pediatric studies correlating NOD2 status with the need for small bowel surgical resection have consistently shown a positive association. Russell et al. estimated the odds ratio for surgery among children with Crohn disease possessing any NOD2 mutation to be 4.45 [91]. Pediatric Crohn disease patients with the 1007fsInsC variant appeared to have a greater likelihood of requiring surgery, with an odds ratio of 4.8. Among US Caucasian Crohn disease patients, hazard ratios for surgery indicate that children possessing the 3020insC variant are at sixfold greater risk of requiring surgical intervention [88]. Furthermore, these children also showed a trend toward earlier surgery, at a median of 14 months versus 23 months after diagnosis.
Large-Scale Phenotypic Correlations
Cleynen et al. analyzed subphenotypes of IBD in 34,819 patients who were genotyped on the Immunochip [93]. For Crohn disease, the phenotypes examined were age at diagnosis, disease location, disease behavior (penetrating, stricturing, inflammatory), and requirement for surgery. For ulcerative colitis, the phenotypes examined were age of onset, disease extent, and colectomy. Across all 186 loci on the Immunochip, only SNPs in NOD2, the HLA locus, and 3p21 (MST1) were found to have genome-wide significance, influencing all subphenotypes [93]. The disease location was essentially fixed over time and was the main independent determinant of the patient’s disease process, while disease behavior and requirement for surgery were largely markers of disease progression. A composite genetic risk score based on the 163 known loci was associated with all disease subphenotypes, but only the three loci named above were individually significant. The authors concluded that the binary classification of IBD into Crohn disease and ulcerative colitis is not supported by genetic data and that a ternary classification should be used: ulcerative colitis, colonic Crohn disease, and ileal Crohn disease [93].
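A composite genetic risk score is commonly built as a weighted sum of risk-allele counts, with each locus weighted by the natural logarithm of its odds ratio. The sketch below shows that common construction only; the exact weighting used by Cleynen et al. is not described here, and the locus names, odds ratios, and function name are hypothetical.

```python
import math

def genetic_risk_score(dosages, odds_ratios):
    """Weighted sum of risk-allele counts, weighted by log odds ratio.

    dosages:     locus name -> number of risk alleles carried (0, 1, or 2).
    odds_ratios: locus name -> per-allele odds ratio from association data.
    """
    missing = set(dosages) - set(odds_ratios)
    if missing:
        raise KeyError(f"no odds ratio for: {sorted(missing)}")
    return sum(count * math.log(odds_ratios[locus])
               for locus, count in dosages.items())

# Hypothetical three-locus example (real scores sum over many more loci).
odds_ratios = {"locus_a": 1.5, "locus_b": 1.2, "locus_c": 1.1}
patient = {"locus_a": 2, "locus_b": 1, "locus_c": 0}
score = genetic_risk_score(patient, odds_ratios)
print(f"risk score = {score:.3f}")
```

Because each locus contributes little on its own, such a score can be associated with a subphenotype even when almost no individual locus reaches significance, which matches the pattern reported above.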
Genetic Sharing Between Pediatric Age of Onset IBD and Other Autoimmune Diseases
As the Immunochip genotyping effort amply demonstrated, there is a shared genetic architecture for a wide variety of autoimmune diseases. Li et al. performed GWAS in 6,035 cases of ten different pediatric autoimmune diseases and 10,718 shared controls. This effort identified 27 genome-wide significant loci which had shared risk among multiple pediatric autoimmune diseases, for instance, a novel role for CD40LG in Crohn disease, ulcerative colitis, and celiac disease [94]. The main pathways identified as responsible for this shared risk were cytokine signaling, antigen presentation, T-cell activation, JAK-STAT signaling, and helper T-cell cytokine signaling [94]. A study of SNP-h², also called narrow-sense heritability, across these ten pediatric autoimmune diseases showed that the additive heritability explained by genotyped and imputable SNPs was 0.454 for Crohn disease and 0.386 for ulcerative colitis [95]. In pairwise analysis, Crohn disease and ulcerative colitis showed the strongest correlation of all pairwise combinations of the ten autoimmune diseases (0.69) [95].
Summary
Both family and twin-based studies lend strong support for a genetic basis of IBD. This is further supported by observations of racial/ethnic variations in disease prevalence. The recent advent of GWAS has markedly advanced the identification of well-replicated IBD associations, substantiated this concept at a molecular level, and catapulted the field of IBD genetics into a new realm of discovery. Future sequencing studies are likely to identify rarer variants that confer greater risks at the individual level and may help uncover new gene interactions and networks that contribute to the pathogenesis of IBD, allowing for stratification of IBD patients into different therapeutic pathways and interventions.
Acknowledgment
We are most grateful to Dr. Judy H. Cho, Dr. Nancy McGreal, Dr. Zhi Wei, and Steve Baldassano (MD/PhD student) who wrote earlier versions of this chapter.
References
1.
2.
3.
4. Liu JZ, van Sommeren S, Huang H, et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet. 2015;47:979–86.
5.