BACKGROUND: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. RESULTS: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. CONCLUSIONS: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
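As a rough illustration of how per-length probabilities can be turned into a call and a quality score, the following is a minimal sketch of an unweighted hurdle Poisson distribution over homopolymer lengths. It is not HPCall's implementation: the function names, the parameters `p_zero` and `lam`, and the Phred-style transform are assumptions made for this example, and in practice such parameters would be estimated from flow signals and other noise predictors.

```python
import math


def hurdle_poisson_pmf(k, p_zero, lam):
    """P(length == k) under a hurdle Poisson model (hypothetical parameters).

    p_zero is the probability of a zero-length call (the "hurdle");
    positive lengths follow a zero-truncated Poisson with rate lam.
    """
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))


def call_homopolymer(p_zero, lam, max_len=20):
    """Return (called length, quality, P(undercall), P(overcall)).

    The quality is a Phred-style score, -10 * log10(P(call is wrong)).
    An undercall (true length longer than the call) corresponds to a
    deletion error; an overcall corresponds to an insertion error.
    """
    probs = [hurdle_poisson_pmf(k, p_zero, lam) for k in range(max_len + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise after truncating at max_len
    best = max(range(len(probs)), key=probs.__getitem__)
    p_error = max(1.0 - probs[best], 1e-10)  # avoid log10(0)
    quality = -10.0 * math.log10(p_error)
    p_undercall = sum(probs[best + 1:])
    p_overcall = sum(probs[:best])
    return best, quality, p_undercall, p_overcall


if __name__ == "__main__":
    length, q, p_del, p_ins = call_homopolymer(p_zero=0.01, lam=2.5)
    print(length, round(q, 2), round(p_del, 3), round(p_ins, 3))
```

Splitting the error probability into an undercall part and an overcall part is what makes such a score informative about whether a deletion or an insertion is the more likely error.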
It is still debated whether pre-existing minority drug-resistant HIV-1 variants (MVs) affect the virological outcomes of first-line NNRTI-containing ART.
Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.
Haptophytes are a key phylum of marine protists, including ~300 described morphospecies and 80 morphogenera. We used 454 pyrosequencing on large subunit ribosomal DNA (LSU rDNA) fragments to assess the diversity from size-fractioned plankton samples collected in the Bay of Naples. One group-specific primer set targeting the LSU rDNA D1/D2 region was designed to amplify Haptophyte sequences from nucleic acid extracts (total DNA or RNA) of two size fractions (0.8-3 or 3-20 μm) and two sampling depths [subsurface, at 1 m, or deep chlorophyll maximum (DCM) at 23 m]. 454 reads were identified using a database covering the entire Haptophyta diversity currently sequenced. Our data set revealed several hundred Haptophyte clusters. However, most of these clusters could not be linked to taxonomically known sequences: considering OTUs(97%) (clusters built at a sequence identity level of 97%) on our global data set, less than 1% of the reads clustered with sequences from cultures, and less than 12% clustered with reference sequences obtained previously from cloning and Sanger sequencing of environmental samples. Thus, we highlighted a large uncharacterized environmental genetic diversity, which clearly shows that currently cultivated species poorly reflect the actual diversity present in the natural environment. The Haptophyte community appeared to be significantly structured according to depth. The highest diversity and evenness were obtained in samples from the DCM, and samples from the large size fraction (3-20 μm) taken at the DCM shared a lower proportion of common OTUs(97%) with the other samples. Reads from the species Chrysoculter romboideus were notably found at the DCM, while they could be detected at the subsurface. The highest proportion of totally unknown OTUs(97%) was collected at the DCM in the smallest size fraction (0.8-3 μm).
Overall, this study emphasized several technical and theoretical barriers inherent to the exploration of the large and largely unknown diversity of unicellular eukaryotes.
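To make the notion of OTUs(97%) concrete, the following is a minimal sketch of greedy clustering at a 97% identity threshold. It is an illustration only, not the pipeline used in the study: the function names are hypothetical, and identity is computed position by position on reads assumed to be pre-aligned and of equal length, whereas real tools compute identity from pairwise alignments.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first cluster whose seed it matches at >= threshold.

    A read that matches no existing seed becomes the seed of a new cluster.
    The result depends on input order, as with any greedy scheme.
    """
    seeds = []
    clusters = []
    for read in reads:
        for i, seed in enumerate(seeds):
            if identity(read, seed) >= threshold:
                clusters[i].append(read)
                break
        else:
            seeds.append(read)
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    base = "A" * 100
    close = "A" * 98 + "CC"        # 98% identical to base
    distant = "A" * 90 + "C" * 10  # 90% identical to base
    for cluster in greedy_otus([base, close, distant]):
        print(len(cluster), cluster[0][:10] + "...")
```

Under this scheme, a read that falls below the threshold against every cultured or reference sequence seeds its own cluster, which is how clusters with no known representative arise.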
We analyzed the diversity of bacterial epibionts and trophic ecology of a new species of Kiwa yeti crab discovered at two hydrothermal vent fields (E2 and E9) on the East Scotia Ridge (ESR) in the Southern Ocean using a combination of 454 pyrosequencing, Sanger sequencing, and stable isotope analysis. The Kiwa epibiont communities were dominated by Epsilon- and Gammaproteobacteria. 454 sequencing of the epibionts on 15 individual Kiwa specimens revealed large regional differences between the two hydrothermal vent fields: at E2, the bacterial community on the Kiwa ventral setae was dominated (up to 75%) by Gammaproteobacteria, whereas at E9 Epsilonproteobacteria dominated (up to 98%). Carbon stable isotope analysis of both Kiwa and the bacterial epibionts also showed distinct differences between E2 and E9 in mean and variability. Both stable isotope and sequence data suggest a dominance of different carbon fixation pathways of the epibiont communities at the two vent fields. At E2, epibionts were putatively fixing carbon via the Calvin-Benson-Bassham and reverse tricarboxylic acid cycles, while at E9 the reverse tricarboxylic acid cycle dominated. Co-varying epibiont diversity and isotope values at E2 and E9 also present further support for the hypothesis that epibionts serve as a food source for Kiwa.
Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.
Next generation sequencing technology has enabled characterization of metagenomes through massively parallel genomic DNA sequencing. The complexity and diversity of environmental samples such as the human gut microflora, combined with the sustained exponential growth in sequencing capacity, have led to the challenge of identifying microbial organisms by DNA sequence. We sought to validate a Scalable Metagenomics Alignment Research Tool (SMART), a novel searching heuristic for shotgun metagenomics sequencing results.
MOTIVATION: In recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole genome sequencing and metagenomics. The latter application in particular is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the estimation of diversity and composition of a sample. To date, there are several tools that aim to remove both sequencing noise and duplicates. Nevertheless, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information. RESULTS: With the novel tool JATAC, we present an approach towards a more accurate duplicate removal by analyzing flow values directly. Making use of previous findings on 454 flow data characteristics, we combine read clustering with Bayesian distance measures. Finally, we provide a benchmark with an existing algorithm. AVAILABILITY: JATAC is freely available under the General Public License from http://malde.org/ketil/jatac/. CONTACT: Ketil.Malde@imr.no SUPPLEMENTARY INFORMATION: Supplementary Material is available at Bioinformatics online.
The endangered marine gastropod, Lobatus gigas, is an important fishery resource in the Caribbean region. Microbiological and parasitological research on this species has been poorly addressed despite its role in ecological fitness, conservation status and prevention of potential pathogenic infections. This study identified taxonomic groups associated with orange colored protrusions in the muscle of queen conchs using histological analysis, 454 pyrosequencing, and a combination of PCR amplification and automated Sanger sequencing. The molecular approaches indicate that the etiological agent of the muscle protrusions is a parasite belonging to the subclass Digenea. Additionally, the scope of the molecular technique allowed the detection of bacterial and fungal clades in the assignment analysis. This is the first evidence of a digenean infection in the muscle of this valuable Caribbean resource.
The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina data sets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve within standard 97% similarity OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S tags differing by as little as one nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly. The ISME Journal advance online publication, 11 July 2014; doi:10.1038/ismej.2014.117.