Journal: Nature Genetics
The timing of puberty is a highly polygenic childhood trait that is epidemiologically associated with various adult diseases. Using 1000 Genomes Project-imputed genotype data in up to ∼370,000 women, we identify 389 independent signals (P < 5 × 10⁻⁸) for age at menarche, a milestone in female pubertal development. In Icelandic data, these signals explain ∼7.4% of the population variance in age at menarche, corresponding to ∼25% of the estimated heritability. We implicate ∼250 genes via coding variation or associated expression, demonstrating significant enrichment in neural tissues. Rare variants near the imprinted genes MKRN3 and DLK1 were identified, exhibiting large effects when paternally inherited. Mendelian randomization analyses suggest causal inverse associations, independent of body mass index (BMI), between puberty timing and risks for breast and endometrial cancers in women and prostate cancer in men. In aggregate, our findings highlight the complexity of the genetic regulation of puberty timing and support causal links with cancer susceptibility.
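The abstract above does not say which Mendelian randomization estimator was used. As a minimal sketch, assuming the common inverse-variance-weighted (IVW) approach, the causal effect can be estimated from per-variant effects on the exposure (puberty timing) and on the outcome (disease risk). The function name `ivw_estimate` and all numbers below are hypothetical illustrations, not values from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted MR estimate and its standard error.

    Each variant contributes a ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for two variants (illustration only).
estimate, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2],    # effect on the exposure
    beta_outcome=[-0.05, -0.10], # effect on the outcome (log odds)
    se_outcome=[0.01, 0.02],     # standard error of the outcome effect
)
```

A negative estimate in this toy example corresponds to an inverse association of the kind the abstract describes; a real analysis would also need to check the instrument assumptions, which this sketch does not do.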
Intelligence is associated with important economic and health-related life outcomes. Despite intelligence having substantial heritability (0.54) and a confirmed polygenic nature, initial genetic studies were mostly underpowered. Here we report a meta-analysis for intelligence of 78,308 individuals. We identify 336 associated SNPs (METAL P < 5 × 10⁻⁸) in 18 genomic loci, of which 15 are new. Around half of the SNPs are located inside a gene, implicating 22 genes, of which 11 are new findings. Gene-based analyses identified an additional 30 genes (MAGMA P < 2.73 × 10⁻⁶), of which all but one had not been implicated previously. We show that the identified genes are predominantly expressed in brain tissue, and pathway analysis indicates the involvement of genes regulating cell development (MAGMA competitive P = 3.5 × 10⁻⁶). Despite the well-known difference in twin-based heritability for intelligence in childhood (0.45) and adulthood (0.80), we show substantial genetic correlation (rg = 0.89, LD score regression P = 5.4 × 10⁻²⁹). These findings provide new insight into the genetic architecture of intelligence.
Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
Despite the important role that monozygotic twins have played in genetics research, little is known about their genomic differences. Here we show that monozygotic twins differ on average by 5.2 early developmental mutations and that approximately 15% of monozygotic twins have a substantial number of these early developmental mutations specific to one of them. Using the parents and offspring of twins, we identified pre-twinning mutations. We observed instances where a twin was formed from a single cell lineage in the pre-twinning cell mass and instances where a twin was formed from several cell lineages. CpG>TpG mutations increased in frequency with embryonic development, coinciding with an increase in DNA methylation. Our results indicate that allocations of cells during development shape genomic differences between monozygotic twins.
The koala, the only extant species of the marsupial family Phascolarctidae, is classified as ‘vulnerable’ due to habitat loss and widespread disease. We sequenced the koala genome, producing a complete and contiguous marsupial reference genome, including centromeres. We reveal that the koala’s ability to detoxify eucalypt foliage may be due to expansions within a cytochrome P450 gene family, and its ability to smell, taste and moderate ingestion of plant secondary metabolites may be due to expansions in the vomeronasal and taste receptors. We characterized novel lactation proteins that protect young in the pouch and annotated immune genes important for response to chlamydial disease. Historical demography showed a substantial population crash coincident with the decline of Australian megafauna, while contemporary populations had biogeographic boundaries and increased inbreeding in populations affected by historic translocations. We identified genetically diverse populations that require habitat corridors and the institution of translocation programs to aid the koala’s survival in the wild.
Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.
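The link between twin correlations and heritability in the abstract above can be illustrated with Falconer's classical approximations. This is a minimal sketch under the standard ACE assumptions, not the meta-analytic procedure of the study. The function names, the tolerance `tol`, and the example correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer's approximations from twin correlations.

    r_mz: correlation between monozygotic twins
    r_dz: correlation between dizygotic twins
    Returns (a2, c2, e2): additive genetic, shared-environment and
    non-shared-environment variance proportions.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


def consistent_with_additive_only(r_mz, r_dz, tol=0.05):
    """True when r_mz is close to 2 * r_dz, the pattern expected if twin
    resemblance is due solely to additive genetic variation.
    The tolerance is an arbitrary illustrative choice."""
    return abs(r_mz - 2.0 * r_dz) <= tol


# Hypothetical correlations (not taken from the study).
a2, c2, e2 = falconer_ace(r_mz=0.6, r_dz=0.3)
additive_only = consistent_with_additive_only(0.6, 0.3)
```

With these made-up inputs the shared-environment term is zero, which is the pattern the abstract reports for a majority of traits.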
Despite strong evidence supporting the heritability of major depressive disorder (MDD), previous genome-wide studies were unable to identify risk loci among individuals of European descent. We used self-report data from 75,607 individuals reporting clinical diagnosis of depression and 231,747 individuals reporting no history of depression through 23andMe and carried out meta-analysis of these results with published MDD genome-wide association study results. We identified five independent variants from four regions associated with self-report of clinical diagnosis or treatment for depression. Loci with a P value <1.0 × 10⁻⁵ in the meta-analysis were further analyzed in a replication data set (45,773 cases and 106,354 controls) from 23andMe. A total of 17 independent SNPs from 15 regions reached genome-wide significance after joint analysis over all three data sets. Some of these loci were also implicated in genome-wide association studies of related psychiatric traits. These studies provide evidence for large-scale consumer genomic data as a powerful and efficient complement to data collected from traditional means of ascertainment for neuropsychiatric disease genomics.
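The abstract above combines results across data sets but does not state the weighting scheme. As a hedged sketch, assuming a fixed-effect inverse-variance-weighted meta-analysis of one SNP, the combined effect and its two-sided P value could be computed as follows. The function name `fixed_effect_meta` and the input numbers are hypothetical; 5 × 10⁻⁸ is the conventional genome-wide significance threshold.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def fixed_effect_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    betas: per-study effect estimates
    ses:   per-study standard errors
    Returns (beta, se, z, p) for the combined estimate.
    """
    if len(betas) != len(ses):
        raise ValueError("betas and ses must have the same length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-study estimates for a single SNP (illustration only).
beta, se, z, p = fixed_effect_meta(betas=[0.1, 0.1], ses=[0.02, 0.02])
is_genome_wide_significant = p < GENOME_WIDE_ALPHA
```

In this toy example neither study alone reaches the threshold, while the combined estimate does, which is the general motivation for joint analysis.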
A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation¹. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature²⁻⁵, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk⁶. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
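A polygenic score is, at its core, a weighted sum of risk-allele counts. The sketch below illustrates only that idea, together with selecting the highest-scoring fraction of a cohort. It does not reproduce how the study derived its weights or risk thresholds, which the abstract does not specify. The function names, weights and genotypes are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction(scores, fraction):
    """Indices of individuals whose scores fall in the top `fraction`."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


# Hypothetical per-variant weights (e.g. log odds ratios) and genotypes.
weights = [0.10, -0.05, 0.20]
genotypes = [
    [0, 1, 2],
    [2, 2, 0],
    [1, 0, 1],
    [2, 0, 2],
]
scores = [polygenic_score(g, weights) for g in genotypes]
high_risk = top_fraction(scores, fraction=0.25)  # top quarter of the cohort
```

In practice the score would be computed over a genome-wide set of variants and the tail would be defined against a reference distribution rather than within a four-person toy cohort.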
Characterized primarily by a low body-mass index, anorexia nervosa is a complex and serious illness¹, affecting 0.9-4% of women and 0.3% of men²⁻⁴, with twin-based heritability estimates of 50-60%⁵. Mortality rates are higher than those in other psychiatric disorders⁶, and outcomes are unacceptably poor⁷. Here we combine data from the Anorexia Nervosa Genetics Initiative (ANGI)⁸,⁹ and the Eating Disorders Working Group of the Psychiatric Genomics Consortium (PGC-ED) and conduct a genome-wide association study of 16,992 cases of anorexia nervosa and 55,525 controls, identifying eight significant loci. The genetic architecture of anorexia nervosa mirrors its clinical presentation, showing significant genetic correlations with psychiatric disorders, physical activity, and metabolic (including glycemic), lipid and anthropometric traits, independent of the effects of common variants associated with body-mass index. These results further encourage a reconceptualization of anorexia nervosa as a metabo-psychiatric disorder. Elucidating the metabolic component is a critical direction for future research, and paying attention to both psychiatric and metabolic components may be key to improving outcomes.
Explaining the genetics of many diseases is challenging because most associations localize to incompletely characterized regulatory regions. Using new computational methods, we show that transcription factors (TFs) occupy multiple loci associated with individual complex genetic disorders. Application to 213 phenotypes and 1,544 TF binding datasets identified 2,264 relationships between hundreds of TFs and 94 phenotypes, including androgen receptor in prostate cancer and GATA3 in breast cancer. Strikingly, nearly half of systemic lupus erythematosus risk loci are occupied by the Epstein-Barr virus EBNA2 protein and many coclustering human TFs, showing gene-environment interaction. Similar EBNA2-anchored associations exist in multiple sclerosis, rheumatoid arthritis, inflammatory bowel disease, type 1 diabetes, juvenile idiopathic arthritis and celiac disease. Instances of allele-dependent DNA binding and downstream effects on gene expression at plausibly causal variants support genetic mechanisms dependent on EBNA2. Our results nominate mechanisms that operate across risk loci within disease phenotypes, suggesting new models for disease origins.