Concept: Genetic genealogy
This study examines genetic diversity among 102 registered English bulldogs used for breeding, based on maternal and paternal haplotypes, allele frequencies at 33 highly polymorphic short tandem repeat (STR) loci on 25 chromosomes, STR-linked dog leukocyte antigen (DLA) class I and II haplotypes, and the number and size of genome-wide runs of homozygosity (ROH) determined from high-density SNP arrays. The objective was to assess whether the breed retains enough genetic diversity to correct the genotypic and phenotypic abnormalities associated with poor health, to allow for the elimination of deleterious recessive mutations, or to make further phenotypic changes in body structure or coat. An additional 37 English bulldogs presented to the UC Davis Veterinary Clinical Services for health problems were also genetically compared with the 102 registered dogs, based on the perception that sickly English bulldogs are products of commercial breeders or puppy mills and are genetically different and inferior.
Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for more than 200,000 years, were probably well adapted to this environment and its local pathogens. It is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three Toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, and the third haplotype is most similar to the Denisovan genome. The Toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi, and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.
Previous studies that pooled Indian populations from a wide variety of geographical locations have reached contradictory conclusions about how the Varna caste system was established and about its genetic impact on the origins and demographic histories of Indian populations. To investigate these questions further, we took advantage of the fact that both the Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10-30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4-6 Kya and that there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events on the population structure of the whole of modern India.
Autoimmune thyroid disease (AITD), including Graves' disease (GD) and Hashimoto’s thyroiditis (HT), is one of the most common of the immune-mediated diseases. To further investigate the genetic determinants of AITD, we conducted an association study using a custom-made single-nucleotide polymorphism (SNP) array, the ImmunoChip. The SNP array contains all known and genotype-able SNPs across 186 distinct susceptibility loci associated with one or more immune-mediated diseases. After stringent quality control, we analysed 103 875 common SNPs (minor allele frequency >0.05) in 2285 GD and 462 HT patients and 9364 controls. We found evidence for seven new AITD risk loci (P < 1.12 × 10⁻⁶; a permutation test derived significance threshold): five at loci previously associated with other immune-mediated diseases and two at loci whose association with such diseases awaits confirmation.
Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated taking different information sources from the reference population into account: 1) allele frequencies; 2) LD pattern; 3) haplotypes; 4) haploid chromosomes; 5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted in two ways: using genomic relationships among 529 reference individuals and their relationships with selection candidates, and using a deterministic formula in which the number of effective chromosome segments (M_e) was estimated from genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002±0.0001 (allele frequencies), 0.015±0.001 (LD pattern), 0.018±0.001 (haplotypes), 0.100±0.008 (haploid chromosomes) and 0.318±0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated M_e. Thus, reliabilities can be predicted accurately using empirically estimated M_e, and the level of relationship with reference individuals has a much greater effect on the reliability than linkage disequilibrium per se. Furthermore, the accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.
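The abstract above does not state its deterministic formula. As a minimal sketch only, the code below assumes a commonly used form in which reliability is r² = N·h² / (N·h² + M), with N the number of reference individuals, h² the heritability and M the number of effective chromosome segments. The function names `predicted_reliability` and `implied_m_e` are hypothetical and chosen for illustration; whether the study used exactly this form is an assumption.

```python
def predicted_reliability(n_ref: int, h2: float, m_e: float) -> float:
    """Reliability under the ASSUMED form r^2 = N*h2 / (N*h2 + M_e).

    n_ref: number of reference individuals (N)
    h2:    heritability of the trait, between 0 and 1
    m_e:   number of effective chromosome segments (must be positive)
    """
    if n_ref < 0 or not 0.0 <= h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_ref >= 0, 0 <= h2 <= 1 and m_e > 0")
    return n_ref * h2 / (n_ref * h2 + m_e)


def implied_m_e(n_ref: int, h2: float, reliability: float) -> float:
    """Invert the same assumed formula: M_e = N*h2*(1 - r^2) / r^2."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return n_ref * h2 * (1.0 - reliability) / reliability


if __name__ == "__main__":
    # Values taken from the abstract: 529 reference animals, h2 = 0.6,
    # reliability 0.318 for candidates with real family relationships.
    m_e = implied_m_e(529, 0.6, 0.318)
    print(f"implied M_e under the assumed formula: {m_e:.1f}")
    print(f"round-trip reliability: {predicted_reliability(529, 0.6, m_e):.3f}")
```

Under this assumed form, a smaller M (closer relationships, longer shared segments) raises the predicted reliability for a fixed reference size, which matches the direction of the reported results.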
Sex chromosomes are an ideal system to study processes connected with suppressed recombination. We found evidence of microsatellite expansion on the relatively young Y chromosome of the dioecious plant sorrel (Rumex acetosa, XY1Y2 system), but no such expansion on the more ancient Y chromosomes of liverwort (Marchantia polymorpha) and human. The motifs that expanded most were AC and AAC, which also showed periodicity of array length, indicating the importance of the beginnings and ends of arrays. Our data indicate that the abundance of microsatellites in genomes depends on the inherent expansion potential of specific motifs, which could be related to their stability and ability to adopt unusual DNA conformations. We also found that the abundance of microsatellites is higher in the neighborhood of transposable elements (TEs), suggesting that microsatellites are probably targets for TE insertions. This evidence suggests that microsatellite expansion is an early event shaping the Y chromosome where this process is not opposed by recombination, while accumulation of TEs and chromosome shrinkage predominate later.
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
The human Y-chromosome does not recombine across its male-specific part and is therefore an excellent marker of human migrations. It also plays an important role in male fertility. However, its evolution is difficult to fully understand because of repetitive sequences, inverted repeats and the potentially large role of gene conversion. Here we perform an evolutionary analysis of 62 Y-chromosomes of Danish descent sequenced using a wide range of library insert sizes and high coverage, thus allowing large regions of these chromosomes to be well assembled. These include 17 father-son pairs, which we use to validate variation calling. Using a recent method that can integrate variants based on both mapping and de novo assembly, we genotype 10898 SNVs and 2903 indels (max length of 27241 bp) in our sample and show by father-son concordance and experimental validation that the non-recurrent SNP and indel variation on the Y chromosome tree is called very accurately. This includes variation called in a 0.9 Mb centromeric heterochromatic region, which is by far the most variable in the Y chromosome. The variation also includes longer sequence stretches not present in the reference genome but shared with the chimpanzee Y chromosome. We analyzed 2.7 Mb of large inverted repeats (palindromes) for variation patterns between the two palindrome arms and identified 603 mutation and 416 gene conversion events. We find clear evidence for GC-biased gene conversion in the palindromes (and a balancing AT mutation bias), but irrespective of this, also a strong bias of gene conversion towards the ancestral state, suggesting that palindromic gene conversion may alleviate Muller’s ratchet. Finally, we also find a large number of large-scale gene duplications and deletions in the palindromic regions (at least 24) and find that such events can consist of complex combinations of simultaneous insertions and deletions of long stretches of the Y chromosome.
This study focuses on the descendants of the royal Inka family. The Inkas ruled Tawantinsuyu, the largest pre-Columbian empire in South America, which extended from southern Colombia to central Chile. The origin of the royal Inkas is currently unknown. While the mummies of the Inka rulers could have been informative, most were destroyed by Spaniards and the few remaining disappeared without a trace. Moreover, no genetic studies have been conducted on present-day descendants of the Inka rulers. In the present study, we analysed uniparental DNA markers in 18 individuals predominantly from the districts of San Sebastian and San Jerónimo in Cusco (Peru), who belong to 12 families of putative patrilineal descent from Inka rulers, according to documented registries. We used single-nucleotide polymorphisms and short tandem repeat (STR) markers of the Y chromosome (Y-STRs), as well as mitochondrial DNA D-loop sequences, to investigate the paternal and maternal descent of the 18 alleged Inka descendants. Two Q-M3* Y-STR clusters descending from different male founders were identified. The first cluster, named AWKI-1, was associated with five families (eight individuals). By contrast, the second cluster, named AWKI-2, was represented by a single individual; AWKI-2 was part of the Q-Z19483 sub-lineage that was likely associated with a recent male expansion in the Andes, which probably occurred during the Late Intermediate Period (1000-1450 AD), overlapping the Inka period. Concerning the maternal descent, different mtDNA lineages associated with each family were identified, suggesting a high maternal gene flow among Andean populations, probably due to changes in the last 1000 years.
Genetics can provide invaluable information on the ancestry of the current inhabitants of Cyprus. A Y-chromosome analysis was performed to (i) determine paternal ancestry among the Greek Cypriot (GCy) community in the context of the Central and Eastern Mediterranean and the Near East; and (ii) identify genetic similarities and differences between GCy and Turkish Cypriots (TCy). Our haplotype-based analysis has revealed that GCy and TCy patrilineages derive primarily from a single gene pool and show very close genetic affinity (low genetic differentiation) to Calabrian Italian and Lebanese patrilineages. In terms of more recent (past millennium) ancestry, as indicated by Y-haplotype sharing, GCy and TCy share many more haplotypes with each other than with any surrounding population (7-8% of total haplotypes shared), while TCy also share around 3% of haplotypes with mainland Turks, and to a lesser extent with North Africans. In terms of Y-haplogroup frequencies, GCy and TCy again show very similar distributions, with the predominant haplogroups in both being J2a-M410, E-M78, and G2-P287. Overall, GCy also have a similar Y-haplogroup distribution to non-Turkic Anatolian and Southwest Caucasian populations, as well as Cretan Greeks. TCy show a slight shift towards Turkish populations, due to the presence of Eastern Eurasian Y-haplogroups (some of which are of possible Ottoman origin). Overall, the Y-chromosome analysis performed, using both Y-STR haplotype and binary Y-haplogroup data, puts Cypriots in the middle of a genetic continuum stretching from the Levant to Southeast Europe and reveals that, despite some differences in haplotype sharing and haplogroup structure, Greek Cypriots and Turkish Cypriots share primarily a common pre-Ottoman paternal ancestry.
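The haplotype-sharing percentages quoted above can be illustrated with a small sketch. The abstract does not define how "total haplotypes shared" was computed, so the code below assumes one simple definition: the number of distinct Y-STR haplotypes seen in both samples divided by the number of distinct haplotypes seen in either. The function name `shared_haplotype_fraction` and the toy repeat counts are hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple

# A Y-STR haplotype is represented here as a tuple of repeat counts,
# one per marker, with all samples typed at the same markers in the same order.
Haplotype = Tuple[int, ...]


def shared_haplotype_fraction(pop_a: Iterable[Haplotype],
                              pop_b: Iterable[Haplotype]) -> float:
    """Fraction of distinct haplotypes found in both samples.

    ASSUMED definition: |A ∩ B| / |A ∪ B| over distinct haplotypes.
    Other definitions (for example, weighting by carriers) are possible.
    """
    distinct_a = set(pop_a)
    distinct_b = set(pop_b)
    union = distinct_a | distinct_b
    if not union:
        return 0.0  # two empty samples share nothing
    return len(distinct_a & distinct_b) / len(union)


if __name__ == "__main__":
    # Toy three-marker haplotypes, invented purely for illustration.
    sample_one = [(14, 13, 29), (15, 12, 28), (14, 13, 29)]
    sample_two = [(14, 13, 29), (16, 12, 30)]
    fraction = shared_haplotype_fraction(sample_one, sample_two)
    print(f"shared fraction: {fraction:.3f}")
```

Exact matching across all markers is a deliberately strict choice; it tends to capture recent shared ancestry, which is the sense in which the abstract uses haplotype sharing.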