Abstract
Genetic studies of kidney diseases over the past two decades have provided novel insights into underlying pathophysiologic mechanisms and spurred progress toward the goal of precision medicine. This chapter provides a primer on genetic variation, methods for gene discovery of monogenic kidney diseases (linkage analysis and next-generation sequencing), and identification of risk loci for kidney diseases with complex inheritance (genome-wide association studies [GWAS]). The chapter also briefly discusses the challenges in interpretation of identified variants from next-generation sequencing (NGS).
Keywords
genetic, DNA sequencing, variant, APOL1, GWAS, SNP, chronic kidney disease, focal segmental glomerulosclerosis
Genomic Variation
Across an individual’s genome, nucleotide positions can vary from the “normal” reference sequence. The simplest form of variation involves substitution of a single nucleotide base for another, known as a “single nucleotide variant” (SNV). Historically, rare SNVs that were sufficient to cause disease were called “mutations,” whereas more common SNVs, occurring in greater than 1% of the population, were called “single nucleotide polymorphisms (SNPs).” The more recent SNV terminology, as it is related to disease, allows us to report SNVs as a function of their frequency in a reference population. Rare SNVs are responsible for mendelian disease, while more common SNVs increase risk for common disease. In addition to SNVs, genomic variation can include structural variation from small insertions or deletions of ten to hundreds of nucleotides (indels) to duplications or deletions of larger chromosomal segments. These larger structural variants, known as copy number variants (CNVs), are increasingly being recognized for their role in human disease.
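The frequency-based terminology above can be illustrated with a minimal sketch. It assumes genotype counts at a biallelic site are available for a reference population and uses the 1% cutoff mentioned in the text. The function names and example counts are hypothetical and chosen only for illustration.

```python
def alt_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the alternate allele at a biallelic site.

    Each individual carries two alleles, so heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    return (n_het + 2 * n_hom_alt) / total_alleles


def frequency_class(allele_frequency, common_threshold=0.01):
    """Label an SNV as 'common' (above the threshold) or 'rare'.

    The 1% default mirrors the historical SNP cutoff described in the text;
    other cutoffs are used in practice, so treat it as an adjustable assumption.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return "common" if allele_frequency > common_threshold else "rare"


# Hypothetical counts: 90 reference homozygotes, 9 heterozygotes, 1 alternate homozygote
freq = alt_allele_frequency(90, 9, 1)
print(freq, frequency_class(freq))
```

The label depends entirely on the reference population chosen, which is why a variant that is rare in one ancestry group may be common in another.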
Mendelian Gene Discovery
Genetic discovery through the 1980s and 1990s required recruitment of families with multiple affected individuals who shared a defined phenotype transmitted in a mendelian inheritance pattern (autosomal recessive, autosomal dominant, or X-linked). Family members (with and without disease) were genotyped, and linkage studies, which test for the segregation of a phenotypic trait alongside genetic markers with known chromosomal locations, were performed. Markers cosegregating with the phenotype “mapped” the candidate gene to a specified chromosomal region, and subsequent fine-resolution mapping narrowed the candidate region, highlighting potential genes for direct sequencing and functional analysis. Linkage studies played a pivotal role in the discovery of numerous mendelian genes, unraveling the genetic basis for kidney diseases such as polycystic kidney disease (PKD1, PKHD1), congenital nephrotic syndrome (NPHS1), and Alport syndrome (COL4A5).
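Evidence for cosegregation in linkage studies is commonly summarized as a logarithm of the odds (LOD) score. The sketch below assumes the simplest textbook case of phase-known, fully informative meioses, in which the likelihood at recombination fraction theta is compared with the likelihood under no linkage (theta = 0.5). The function name and example counts are illustrative assumptions. Real pedigree analyses rely on dedicated software and more general likelihood models.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    where n is the number of informative meioses and r the number of recombinants.
    """
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinants must be between 0 and the number of meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = n_meioses - n_recombinants
    if theta == 0.0:
        # With complete linkage, any observed recombinant has zero likelihood.
        if n_recombinants > 0:
            return float("-inf")
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = (n_recombinants * math.log10(theta)
                                 + non_recombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = n_meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical family data: 10 informative meioses, 1 recombinant
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(10, 1, theta), 3))
```

By convention, a LOD score of about 3 or more is usually taken as evidence for linkage, and scores are typically summed across families to reach that level.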
Two major advances in the 2000s revolutionized the potential for gene discovery. First, completion of the Human Genome Project increased the utility of short fragment reads because they could be aligned to the published reference genome to identify genetic variation in tested samples. Second, development of NGS technology permitted sequencing of massive quantities of short DNA fragments in parallel, increasing sequencing output by orders of magnitude. In concert, these two advances enabled whole-genome sequencing (WGS) studies, which can provide almost complete sequence data for an individual.
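The idea of aligning a short read to a reference and reading off differences can be shown with a deliberately simplified sketch. It assumes ungapped alignment, a single read, and error-free base calls. The sequences and function names are invented for illustration. Production aligners and variant callers use indexed, gapped alignment together with base qualities and read depth, none of which is modeled here.

```python
def best_alignment(reference, read):
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches.

    Ties are resolved in favor of the leftmost placement.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    best_offset, best_mismatches = 0, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for ref_base, read_base in zip(window, read)
                         if ref_base != read_base)
        if best_mismatches is None or mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def candidate_snvs(reference, read):
    """List (1-based position, reference base, read base) for each mismatch."""
    offset, _ = best_alignment(reference, read)
    return [(offset + i + 1, reference[offset + i], base)
            for i, base in enumerate(read)
            if reference[offset + i] != base]


# Toy sequences (hypothetical): the read differs from the reference at one base
reference = "ACGTTGCAAGCTTAGC"
read = "TGCATGCT"
print(best_alignment(reference, read))
print(candidate_snvs(reference, read))
```

A single mismatching read is not sufficient to call a variant in practice. Confidence comes from many overlapping reads supporting the same difference.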
Although sequencing costs dropped exponentially in the first decade following development of NGS technology, the massive quantity of sequence data generated from WGS requires considerable computational resources and analysis. Thus, particularly for genetic discovery studies focusing on rare mendelian diseases, targeted sequencing strategies, such as whole-exome sequencing (WES), have been used as a method of choice for high-throughput sequencing. WES uses enrichment strategies to sequence only the exome, the approximately 1% of the genome that encodes proteins. Linkage analysis has been coupled with WES to identify novel disease genes. For example, in two families with hereditary atypical hemolytic uremic syndrome, WES determined disease association with the gene DGKE, which is independent of the complement pathway, identifying a potentially novel disease mechanism. In another example, WES and genome-wide linkage analysis identified a gene (DSTYK) implicated in congenital urinary tract malformations that may be mechanistically important for urogenital embryologic development. In nephrotic syndrome (NS), application of NGS technology has accelerated gene discovery, and over 30 genes have now been implicated in mendelian monogenic disease. Such studies highlight the genetic heterogeneity within a single clinical entity and thus the potential for improved disease classification from genetic studies.
To date, over 200 genes have been identified for monogenic kidney diseases, and these genes can underlie common causes of chronic kidney disease (CKD) that often present during childhood, including steroid-resistant NS and cystic disease. The discovery of these genes has revolutionized our understanding of underlying disease pathogenesis, such as the role of cilia in cystic diseases and the role of the podocyte in NS. In addition, genetic testing can, alongside the appropriate clinical, biochemical, and/or histologic data, provide a definitive diagnosis for a patient. Results from genetic testing may enable counseling for family planning or diagnosis of affected family members. Lastly, genetic testing may facilitate personalized medical decision making, such as prognosis or therapeutic optimization.
Variant Interpretation
Interrogation of sequencing data from high-throughput NGS has revealed the presence of significant genetic variation throughout the genome, including millions of SNVs. Initial results from the 1000 Genomes Project, a large-scale sequencing project, demonstrated that each healthy person harbors around 250 to 300 loss-of-function variants and 50 to 100 variants implicated in inherited disorders. The sheer volume of inherent genetic variation increases the false-positive discovery rate when screening for disease-associated variants. In turn, improper attribution of causality to a gene or variant can have significant research and clinical implications. From a research perspective, guidelines have been proposed that incorporate genetic, experimental, and bioinformatic evidence to assess confidence in attributing causality at the gene level or variant level.
From a clinical perspective, multiple academic and commercial laboratories have begun genetic sequencing of isolated genes, selected gene panels, or whole exomes to aid in medical decision making. Identification of variants in sequencing results is only the first step in clinical interpretation because variants can fall along a spectrum of pathogenicity. In 2015, the American College of Medical Genetics and Genomics (ACMG) published updated Standards and Guidelines for the interpretation of identified sequence variants. A five-tier system (“pathogenic,” “likely pathogenic,” “uncertain significance,” “likely benign,” and “benign”) was introduced for classification of mendelian variants. A variety of tools are used to annotate (predict the effect of) the variant, which assists the classification of pathogenicity. Population databases (e.g., Exome Aggregation Consortium [ExAC]) can provide variant allele frequencies, which are informative because variants responsible for mendelian disease are expected to be very rare in the population. Disease databases (e.g., ClinVar) can provide historical reports of the clinical significance of identified variants. For variants resulting in nonsynonymous changes to the amino acid sequence, computational (“in silico”) analyses with various software programs (e.g., SIFT, PolyPhen, CADD) can be performed to predict impact on protein function or structure.
Despite the aforementioned methods to assess pathogenicity, classification of variants remains imperfect, often necessitating other evidence to confidently define mendelian disease. For example, the presence of a variant of uncertain significance alongside a pathogenic variant in a recessive disease-causing gene raises diagnostic uncertainty, particularly because current sequencing studies cannot determine if two identified variants are located in cis on the same chromosome or in trans on different chromosomes. In such situations, genetic sequencing of the parents can provide further evidence for segregation of identified variants. Consultation with a medical geneticist can be invaluable when doubt arises regarding the interpretation of sequencing results.
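The logic of using parental sequencing to resolve cis versus trans configuration can be sketched as follows. The example assumes a child who is heterozygous for two variants in the same recessive disease gene, confirmed biological parentage, and accurate parental genotypes. The function name and the carrier-status encoding are hypothetical conventions introduced only for this illustration, not part of any clinical guideline.

```python
def infer_phase(mother_carries, father_carries):
    """Infer whether two heterozygous variants in a child lie in cis or in trans.

    mother_carries, father_carries: (variant_1, variant_2) booleans giving each
    parent's carrier status. Returns "trans", "cis", or "undetermined".
    """
    def parent_of_origin(index):
        in_mother, in_father = mother_carries[index], father_carries[index]
        if in_mother and not in_father:
            return "mother"
        if in_father and not in_mother:
            return "father"
        # Both parents carry it, or neither does (possibly de novo): ambiguous.
        return None

    origin_1, origin_2 = parent_of_origin(0), parent_of_origin(1)
    if origin_1 is None or origin_2 is None:
        return "undetermined"
    # One variant from each parent means opposite chromosomes (trans);
    # both from the same parent means the same chromosome (cis).
    return "trans" if origin_1 != origin_2 else "cis"


# Mother carries only variant 1 and father carries only variant 2
print(infer_phase((True, False), (False, True)))
# Mother carries both variants and father carries neither
print(infer_phase((True, True), (False, False)))
```

Only the trans configuration affects both copies of the gene, which is what a recessive diagnosis requires. An "undetermined" result is one of the situations in which additional testing or consultation with a medical geneticist is warranted.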
Moving Beyond Monogenic Disease
As noted above, initial genetic studies in kidney disease focused on identifying rare causal SNVs with large effect that are responsible for disease with a mendelian inheritance pattern. However, by their nature, mendelian causes of kidney disease are rare. Thus, a parallel goal in genetic nephrology research has focused on identifying low-frequency and common SNVs that may increase the risk of kidney disease. GWAS have emerged as a method to achieve this goal. In GWAS, SNPs are genotyped to determine whether the frequency of any of these markers is significantly different between unrelated cases and unrelated controls (Fig. 37.1).
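The case-control comparison at a single SNP can be sketched as a chi-square test on a 2 × 2 table of allele counts. This is a minimal illustration under stated assumptions: a simple allelic test with one degree of freedom, no adjustment for ancestry or covariates, and the conventional genome-wide significance threshold of 5 × 10⁻⁸, which is not stated in this chapter. The function name and allele counts are hypothetical.

```python
import math

# Conventional genome-wide threshold; an assumption here, not taken from the chapter.
GENOME_WIDE_ALPHA = 5e-8


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 df) comparing allele counts in cases versus controls.

    Returns (chi_square, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_alt, col_ref = a + c, b + d
    if 0 in (row_cases, row_controls, col_alt, col_ref):
        raise ValueError("every row and column total must be non-zero")
    chi_square = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_alt * col_ref)
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi_square, p_value, odds_ratio


# Hypothetical allele counts for one SNP
chi_square, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(chi_square, p_value, odds_ratio, p_value < GENOME_WIDE_ALPHA)
```

Because hundreds of thousands to millions of SNPs are tested, a stringent threshold is needed to limit false-positive associations, and this is why GWAS generally require large numbers of cases and controls.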