We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
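The idea of estimating error from duplicate reads can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `positional_error_estimate`, the grouping of reads by an identical prefix, and the `prefix_len` and `min_bin_size` parameters are all assumptions made for illustration. Reads that share a prefix are treated as one set of putative duplicates, and any base that disagrees with the per-position consensus of its set is counted as an error.

```python
from collections import Counter, defaultdict


def positional_error_estimate(reads, prefix_len=4, min_bin_size=3):
    """Estimate a per-position error rate from putative duplicate reads.

    Assumption for illustration: reads sharing the same first `prefix_len`
    bases are duplicates of one template, so disagreement with the bin
    consensus after the prefix is attributed to sequencing error.
    """
    # Group reads into bins keyed by their shared prefix.
    bins = defaultdict(list)
    for read in reads:
        if len(read) >= prefix_len:
            bins[read[:prefix_len]].append(read)

    mismatches = Counter()  # position -> bases that differ from consensus
    totals = Counter()      # position -> bases examined

    for members in bins.values():
        if len(members) < min_bin_size:
            continue  # too few duplicates to form a meaningful consensus
        longest = max(len(r) for r in members)
        for pos in range(prefix_len, longest):
            bases = [r[pos] for r in members if len(r) > pos]
            _, consensus_count = Counter(bases).most_common(1)[0]
            totals[pos] += len(bases)
            mismatches[pos] += len(bases) - consensus_count

    return {pos: mismatches[pos] / totals[pos] for pos in sorted(totals)}


if __name__ == "__main__":
    toy_reads = ["ACGTAA", "ACGTAA", "ACGTAC", "ACGTAA", "TTTTGG"]
    print(positional_error_estimate(toy_reads))
```

In the toy input, the four reads that begin with `ACGT` form one bin and the single `TTTT` read is ignored. A per-position profile of this kind is the sort of output that could inform read trimming, while averaging over positions would give a whole-sample figure.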
BACKGROUND: PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. RESULTS: We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. CONCLUSIONS: DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
In sports such as golf and darts it is important that one can produce ballistic movements of an object towards a goal location with as little variability as possible. A factor that influences this variability is the extent to which motor planning is updated from movement to movement based on observed errors. Previous work has shown that for reaching movements, our motor system uses the learning rate (the proportion of an error that is corrected for in the planning of the next movement) that is optimal for minimizing the endpoint variability. Here we examined whether the learning rate is hard-wired and therefore automatically optimal, or whether it is optimized through experience. We compared the performance of experienced dart players and beginners in a dart task. A hallmark of the optimal learning rate is that the lag-1 autocorrelation of movement endpoints is zero. We found that the lag-1 autocorrelation of experienced dart players was near zero, implying a near-optimal learning rate, whereas it was negative for beginners, suggesting a larger than optimal learning rate. We conclude that learning rates for trial-by-trial motor learning are optimized through experience. This study also highlights the usefulness of the lag-1 autocorrelation as an index of performance in studying motor-skill learning.
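The lag-1 autocorrelation mentioned above can be computed directly from a series of endpoints. The sketch below uses the standard sample estimator; the function name `lag1_autocorrelation` and the example values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def lag1_autocorrelation(endpoints):
    """Sample lag-1 autocorrelation of a sequence of movement endpoints."""
    if len(endpoints) < 2:
        raise ValueError("need at least two endpoints")
    mean = fmean(endpoints)
    deviations = [x - mean for x in endpoints]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("endpoints have zero variance")
    # Sum of products of each deviation with the deviation that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical endpoints that alternate around the target.
    alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    print(lag1_autocorrelation(alternating))
```

A series that alternates between overshoot and undershoot gives a negative value, which is the pattern the abstract associates with a larger than optimal learning rate; a slowly drifting series gives a positive value.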
- Database : the journal of biological databases and curation
- Published about 5 years ago
Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex.
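The voting rule and the reported figures can be checked with a short sketch. The function names `aggregate_votes` and `f_score` and the relation identifiers are hypothetical; the four-vote threshold, the precision and recall values, and the cost figures come from the abstract. This is an illustration of the arithmetic, not the code in the linked repository.

```python
def aggregate_votes(votes_by_relation, min_votes=4):
    """Predict a relation as true when at least `min_votes` workers said yes.

    `votes_by_relation` maps a relation identifier to a list of binary
    judgments (1 = the text supports the relation, 0 = it does not).
    """
    return {
        relation: sum(votes) >= min_votes
        for relation, votes in votes_by_relation.items()
    }


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical judgments from five workers per candidate relation.
    votes = {
        "relation_a": [1, 1, 1, 1, 0],
        "relation_b": [1, 1, 1, 0, 0],
    }
    print(aggregate_votes(votes))
    print(round(f_score(0.475, 0.540), 3))  # figures from the abstract
    print(round(1290.67 / 500, 2))          # cost per abstract in dollars
```

With five workers, a threshold of four votes means a single dissenting judgment still allows a relation to be accepted, while two dissenting judgments reject it.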
The 1999 Institute of Medicine (IOM) report To Err Is Human transformed thinking about patient safety in U.S. health care. On its 15th anniversary, a topic largely missing from that report is finally getting its due. With its new report, Improving Diagnosis in Health Care, the IOM has acknowledged the need to address diagnostic error as a “moral, professional, and public health imperative.”(1) The new report emphasizes that diagnostic errors may be one of the most common and harmful of patient-safety problems. Why has it taken so long for the patient-safety movement to recognize the importance of diagnostic errors? Perhaps . . .
Our actions often do not match our intentions when there are external disturbances such as turbulence. We derived a novel modeling approach for determining this motor intent from targeted reaching motions that are disturbed by an unexpected force. First, we demonstrated how to mathematically invert both feedforward (predictive) and feedback controls to obtain an intended trajectory. We next examined the model’s sensitivity to a realistic range of parameter uncertainties, and found that the expected inaccuracy due to all possible parameter mis-estimations was less than typical movement-to-movement variations seen when humans reach to similar targets. The largest sensitivity arose mainly from uncertainty in joint stiffnesses. Because humans cannot change their intent until they acquire sensory feedback, we tested the hypothesis that a straight-line intent should be evident for at least the first 120 milliseconds following the onset of a disturbance. As expected, the intended trajectory showed no change from undisturbed reaching for more than 150 milliseconds after the disturbance onset. Beyond this point, however, we detected a change in intent in five out of eight subjects, surprisingly even when the hand was already near the target. Knowing such an intent signal is broadly applicable: enhanced human-machine interaction, the study of impaired intent in neural disorders, the real-time determination (and manipulation) of error in training, and complex systems that embody planning such as brain machine interfaces, team sports, crowds, or swarms. In addition, observing intent as it changes might act as a window into the mechanisms of planning, correction, and learning.
Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and can therefore follow incorrect practices that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry out a successful project in computational biology and related sciences.
A wide variety of research studies suggest that breakdowns in the diagnostic process result in a staggering toll of harm and patient deaths. These include autopsy studies, case reviews, surveys of patients and physicians, voluntary reporting systems, studies using standardised patients, second reviews, diagnostic testing audits and closed claims reviews. Although these different approaches provide important information and unique insights regarding diagnostic errors, each has limitations and none is well suited to establishing the incidence of diagnostic error in actual practice, or the aggregate rate of error and harm. We argue that being able to measure the incidence of diagnostic error is essential to enable research studies on diagnostic error, and to initiate quality improvement projects aimed at reducing the risk of error and harm. Three approaches appear most promising in this regard: (1) using ‘trigger tools’ to identify from electronic health records cases at high risk for diagnostic error; (2) using standardised patients (secret shoppers) to study the rate of error in practice; (3) encouraging both patients and physicians to voluntarily report errors they encounter, and facilitating this process.
BACKGROUND: We sought to characterise the frequency, health outcomes and economic consequences of diagnostic errors in the USA through analysis of closed, paid malpractice claims. METHODS: We analysed diagnosis-related claims from the National Practitioner Data Bank (1986-2010). We describe error type, outcome severity and payments (in 2011 US dollars), comparing diagnostic errors to other malpractice allegation groups and inpatient to outpatient within diagnostic errors. RESULTS: We analysed 350 706 paid claims. Diagnostic errors (n=100 249) were the leading type (28.6%) and accounted for the highest proportion of total payments (35.2%). The most frequent outcomes were death, significant permanent injury, major permanent injury and minor permanent injury. Diagnostic errors more often resulted in death than other allegation groups (40.9% vs 23.9%, p<0.001) and were the leading cause of claims-associated death and disability. More diagnostic error claims were outpatient than inpatient (68.8% vs 31.2%, p<0.001), but inpatient diagnostic errors were more likely to be lethal (48.4% vs 36.9%, p<0.001). The inflation-adjusted, 25-year sum of diagnosis-related payments was US$38.8 billion (mean per-claim payout US$386 849; median US$213 250; IQR US$74 545-484 500). Per-claim payments for permanent, serious morbidity that was 'quadriplegic, brain damage, lifelong care' (4.5%; mean US$808 591; median US$564 300), 'major' (13.3%; mean US$568 599; median US$355 350), or 'significant' (16.9%; mean US$419 711; median US$269 255) exceeded those where the outcome was death (40.9%; mean US$390 186; median US$251 745). CONCLUSIONS: Among malpractice claims, diagnostic errors appear to be the most common, most costly and most dangerous of medical mistakes. We found roughly equal numbers of lethal and non-lethal errors in our analysis, suggesting that the public health burden of diagnostic errors could be twice that previously estimated. 
Healthcare stakeholders should consider diagnostic safety a critical health policy issue.
Recently Veugelers and Ekwaru published data indicating that, in its dietary reference intakes for calcium and vitamin D, the Institute of Medicine (IOM) had made a serious calculation error. Using the same data set as had the IOM panel, these investigators showed that the Recommended Dietary Allowance (RDA) for vitamin D had been underestimated by an order of magnitude. Veugelers and Ekwaru, using the IOM’s data, calculated an RDA of 8895 IU per day. They noted that there was some uncertainty in that estimate, inasmuch as this value required an extrapolation from the available data, which did not include individuals receiving daily vitamin D inputs above 2400 IU/day.[…].