Concept: Bayes' theorem
An important problem in reproductive medicine is deciding when people who have failed to become pregnant without medical assistance should begin investigation and treatment. This study describes a computational approach to determining what can be deduced about a couple’s future chances of pregnancy from the number of menstrual cycles over which they have been trying to conceive. The starting point is that a couple’s fertility is inherently uncertain. This uncertainty is modelled as a probability distribution for the chance of conceiving in each menstrual cycle. We have developed a general numerical computational method, which uses Bayes' theorem to generate a posterior distribution for a couple’s chance of conceiving in each cycle, conditional on the number of previous cycles of attempted conception. When various metrics of a couple’s expected chances of pregnancy were computed as a function of the number of cycles over which they had been trying to conceive, we found good fits to observed data on time to pregnancy for different populations. The commonly used standard of 12 cycles of non-conception as an indicator of subfertility was found to be reasonably robust, though a larger or smaller number of cycles may be more appropriate depending on the population from which a couple is drawn and the precise subfertility metric that is most relevant, for example the probability of conception in the next cycle or the next 12 cycles. We have also applied our computational method to model the impact of female reproductive ageing. Results indicate that, for women over the age of 35, it may be appropriate to start investigation and treatment more quickly than for younger women. Ignoring reproductive decline during the period of attempted conception added up to two cycles to the computed number of cycles before a subfertility metric was reached.
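The numerical Bayesian updating described above can be sketched on a grid: a prior over the per-cycle conception probability is multiplied by the likelihood of n consecutive non-conceptions. The Beta(3, 7)-shaped prior and the choice of 12 cycles below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

# Grid over possible per-cycle conception probabilities p.
p = np.linspace(1e-6, 1, 1000)

# Hypothetical prior: Beta(3, 7)-shaped density (illustrative only).
prior = p**(3 - 1) * (1 - p)**(7 - 1)
prior /= prior.sum()

n = 12  # cycles of attempted conception without success
likelihood = (1 - p)**n          # probability of n consecutive non-conceptions
posterior = prior * likelihood
posterior /= posterior.sum()     # Bayes' theorem on the grid

# Two subfertility metrics mentioned in the abstract:
p_next = (posterior * p).sum()                    # chance of conceiving next cycle
p_next12 = (posterior * (1 - (1 - p)**12)).sum()  # chance within the next 12 cycles
```

With these assumed numbers, 12 failed cycles pull the expected per-cycle chance well below the prior mean, which is exactly the mechanism behind using cycle counts as a subfertility indicator.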
In a number of applications there is a need to determine the most likely pedigree for a group of persons based on genetic markers. Adequate models are needed to reach this goal. The markers used to perform the statistical calculations can be linked and there may also be linkage disequilibrium (LD) in the population. The purpose of this paper is to present a graphical Bayesian Network framework to deal with such data. Potential LD is normally ignored and it is important to verify that the resulting calculations are not biased. Even if linkage does not influence results for regular paternity cases, it may have substantial impact on likelihood ratios involving other, more extended pedigrees. Models for LD influence likelihoods for all pedigrees to some degree and an initial estimate of the impact of ignoring LD and/or linkage is desirable, going beyond mere rules of thumb based on marker distance. Furthermore, we show how one can readily include a mutation model in the Bayesian Network; extending other programs or formulas to include such models may require considerable amounts of work and will in many cases not be practical. As an example, we consider the two STR markers vWa and D12S391. We estimate probabilities for population haplotypes to account for LD using a method based on data from trios, while an estimate for the degree of linkage is taken from the literature. The results show that accounting for haplotype frequencies is unnecessary in most cases for this specific pair of markers. When doing calculations on regular paternity cases, the markers can be considered statistically independent. In more complex cases of disputed relatedness, for instance cases involving siblings or so-called deficient cases, or when small differences in the LR matter, independence should not be assumed. (The networks are freely available at http://arken.umb.no/~dakl/BayesianNetworks.)
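The independence assumption discussed for regular paternity cases can be illustrated with a toy likelihood-ratio calculation. The allele frequencies below are made up, not the vWa/D12S391 data analysed in the paper, and the real Bayesian Network handles linkage, LD and mutation that this sketch ignores.

```python
# Hypothetical single-marker paternity calculation (illustrative allele
# frequencies only). Mother a/b, child a/c, alleged father c/d:
# the paternal allele must be c.
freq_c = 0.10                  # assumed population frequency of allele c
lr_marker1 = 0.5 / freq_c      # P(evidence | father) / P(evidence | unrelated man)

# Under the independence assumption, per-marker LRs simply multiply.
lr_marker2 = 0.5 / 0.25        # a second, hypothetical marker
lr_combined = lr_marker1 * lr_marker2

prior_odds = 1.0               # neutral prior odds of paternity
posterior_prob = lr_combined * prior_odds / (1 + lr_combined * prior_odds)
```

When markers are linked or in LD, the combined LR is no longer this simple product, which is why the deficient and sibling cases mentioned above need the full network.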
Background Although transcatheter aortic-valve replacement (TAVR) is an accepted alternative to surgery in patients with severe aortic stenosis who are at high surgical risk, less is known about comparative outcomes among patients with aortic stenosis who are at intermediate surgical risk. Methods We evaluated the clinical outcomes in intermediate-risk patients with severe, symptomatic aortic stenosis in a randomized trial comparing TAVR (performed with the use of a self-expanding prosthesis) with surgical aortic-valve replacement. The primary end point was a composite of death from any cause or disabling stroke at 24 months in patients undergoing attempted aortic-valve replacement. We used Bayesian analytical methods (with a margin of 0.07) to evaluate the noninferiority of TAVR as compared with surgical valve replacement. Results A total of 1746 patients underwent randomization at 87 centers. Of these patients, 1660 underwent an attempted TAVR or surgical procedure. The mean (±SD) age of the patients was 79.8±6.2 years, and all were at intermediate risk for surgery (Society of Thoracic Surgeons Predicted Risk of Mortality, 4.5±1.6%). At 24 months, the estimated incidence of the primary end point was 12.6% in the TAVR group and 14.0% in the surgery group (95% credible interval [Bayesian analysis] for difference, -5.2 to 2.3%; posterior probability of noninferiority, >0.999). Surgery was associated with higher rates of acute kidney injury, atrial fibrillation, and transfusion requirements, whereas TAVR had higher rates of residual aortic regurgitation and need for pacemaker implantation. TAVR resulted in lower mean gradients and larger aortic-valve areas than surgery. Structural valve deterioration at 24 months did not occur in either group. Conclusions TAVR was a noninferior alternative to surgery in patients with severe aortic stenosis at intermediate surgical risk, with a different pattern of adverse events associated with each procedure. 
(Funded by Medtronic; SURTAVI ClinicalTrials.gov number, NCT01586910.)
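The noninferiority conclusion can be illustrated with a simple beta-binomial Monte Carlo sketch. The arm sizes and event counts below are back-calculated to roughly match the reported 12.6% and 14.0% rates and are not the trial's actual data; the trial's own Bayesian model was more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts roughly matching the reported 24-month rates.
n_tavr, events_tavr = 864, 109    # ~12.6%
n_surg, events_surg = 796, 111    # ~14.0%

# Beta(1, 1) priors give beta posteriors for each arm's event rate.
p_tavr = rng.beta(1 + events_tavr, 1 + n_tavr - events_tavr, 200_000)
p_surg = rng.beta(1 + events_surg, 1 + n_surg - events_surg, 200_000)

diff = p_tavr - p_surg
# Noninferiority: posterior probability that the rate difference is below 0.07.
prob_noninferior = (diff < 0.07).mean()
ci = np.percentile(diff, [2.5, 97.5])   # 95% credible interval for the difference
```

Because the observed difference favours TAVR, essentially all posterior mass for the difference sits far below the 0.07 margin, mirroring the reported posterior probability of noninferiority.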
Human behaviour is highly individual by nature, yet statistical structures are emerging which seem to govern the actions of human beings collectively. Here we search for universal statistical laws dictating the timing of human actions in communication decisions. We focus on the distribution of the time interval between messages in human broadcast communication, as documented in Twitter, and study a collection of over 160,000 tweets for three user categories: personal (controlled by one person), managed (typically PR agency controlled) and bot-controlled (automated system). To test our hypothesis, we investigate whether it is possible to differentiate between user types based on tweet timing behaviour, independently of the content in messages. For this purpose, we developed a system to process a large amount of tweets for reality mining and implemented two simple probabilistic inference algorithms: (1) a naive Bayes classifier, which distinguishes between two and three account categories with classification performance of 84.6% and 75.8%, respectively; and (2) a prediction algorithm to estimate the time of a user’s next tweet with an [Formula: see text]. Our results show that we can reliably distinguish between the three user categories as well as predict the distribution of a user’s inter-message time with reasonable accuracy. More importantly, we identify a characteristic power-law decrease in the tail of inter-message time distribution by human users which is different from that obtained for managed and automated accounts. This result is evidence of a universal law that permeates the timing of human decisions in broadcast communication and extends the findings of several previous studies of peer-to-peer communication.
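A naive Bayes classifier over timing features can be sketched as follows. The single log-interval feature and the class-conditional Gaussian parameters below are made up for illustration; the paper's fitted distributions (including the power-law tail for humans) are richer.

```python
import math

# Toy naive Bayes over one feature: log inter-message time (seconds).
# (mean, std) of the log interval per class; hypothetical values.
classes = {
    "personal": (math.log(3600), 2.0),   # humans: broad, heavy-tailed timing
    "managed":  (math.log(7200), 1.0),   # PR-managed: moderate regularity
    "bot":      (math.log(600),  0.3),   # bots: tightly scheduled
}

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def classify(intervals):
    """Pick the class maximizing the summed log-likelihood (uniform priors)."""
    scores = {
        c: sum(log_gauss(math.log(t), mu, s) for t in intervals)
        for c, (mu, s) in classes.items()
    }
    return max(scores, key=scores.get)
```

Under these assumed parameters, tightly clustered intervals look bot-like, moderately regular intervals look managed, and widely dispersed intervals look human, which is the content-free separation the abstract describes.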
Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events, which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings.
Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
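One ingredient of such a reconstruction, the mutation model linking sequence differences to elapsed time, can be sketched with a grid posterior. The Poisson likelihood, the exponential prior, and all rates below are illustrative assumptions, not phybreak's actual model or any of the five datasets.

```python
import numpy as np

# Toy posterior for the time between two linked infections, given the number
# of mutations separating their pathogen sequences (Poisson mutation model).
mu = 2.0            # assumed expected mutations per year along a lineage
k = 3               # assumed observed SNP differences between the sequences

t = np.linspace(0.01, 5, 2000)            # candidate intervals (years)
prior = np.exp(-t)                        # exponential prior on the interval
like = (mu * t)**k * np.exp(-mu * t)      # Poisson likelihood (constant dropped)
post = prior * like
post /= post.sum()
t_mean = (post * t).sum()                 # posterior mean interval
```

The full method embeds terms like this, together with transmission, observation and within-host components, inside MCMC proposal steps over entire transmission trees.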
To test the hypothesis that the performance of first-trimester screening for pre-eclampsia (PE) by a method that uses Bayes' theorem to combine maternal factors with biomarkers is superior to that defined by current National Institute for Health and Care Excellence (NICE) guidelines.
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
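The core mechanics the article describes can be shown with one minimal worked example of Bayes' rule. The sensitivity, specificity and base rate below are hypothetical numbers chosen for illustration, not from the article.

```python
# Minimal worked Bayes' rule example with hypothetical numbers: a screening
# test with 90% sensitivity, 95% specificity and a 1% base rate.
prior = 0.01
sens, spec = 0.90, 0.95

p_positive = sens * prior + (1 - spec) * (1 - prior)  # total P(positive test)
posterior = sens * prior / p_positive                 # P(condition | positive)
```

Even a seemingly accurate test leaves the posterior well below 50% here, because the low base rate dominates; this is the kind of directly interpretable output the article argues Bayesian analysis delivers.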
A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method.
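The Markov chain Monte Carlo sampling described above can be sketched with a minimal random-walk Metropolis loop. The one-parameter Gaussian target (centred on a silicon-like lattice parameter in Å) is a stand-in, not the paper's crystallographic model with its heteroskedastic, correlated error structure.

```python
import math
import random

def log_post(theta):
    # Toy Gaussian log-posterior for a single lattice parameter (illustrative).
    return -0.5 * ((theta - 5.431) / 0.002) ** 2

rng = random.Random(0)
theta, samples = 5.43, []
for _ in range(20000):
    prop = theta + rng.gauss(0, 0.003)   # symmetric random-walk proposal
    if rng.random() < math.exp(min(0.0, log_post(prop) - log_post(theta))):
        theta = prop                     # Metropolis accept
    samples.append(theta)

burned = samples[5000:]                  # discard burn-in
mean = sum(burned) / len(burned)
sd = (sum((x - mean) ** 2 for x in burned) / len(burned)) ** 0.5
```

The retained samples approximate the posterior, so the empirical mean and spread play the role of the estimate-plus-uncertainty summaries the abstract contrasts with Rietveld refinement.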
- The British journal of mathematical and statistical psychology
In this paper we implement a Markov chain Monte Carlo algorithm based on the stochastic search variable selection method of George and McCulloch (1993) for identifying promising subsets of manifest variables (items) for factor analysis models. The suggested algorithm is constructed by embedding in the usual factor analysis model a normal mixture prior for the model loadings with latent indicators used to identify not only which manifest variables should be included in the model but also how each manifest variable is associated with each factor. We further extend the suggested algorithm to allow for factor selection. We also develop a detailed procedure for the specification of the prior parameters values based on the practical significance of factor loadings using ideas from the original work of George and McCulloch (1993). A straightforward Gibbs sampler is used to simulate from the joint posterior distribution of all unknown parameters and the subset of variables with the highest posterior probability is selected. The proposed method is illustrated using real and simulated data sets.
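One conditional step of such a spike-and-slab scheme, in the spirit of George and McCulloch (1993), is the Bernoulli draw for a latent inclusion indicator given a loading. All parameter values below are illustrative, not the paper's prior specification.

```python
import math
import random

# A loading lam is modelled as N(0, tau^2) if excluded (spike) or
# N(0, (c*tau)^2) if included (slab); illustrative values throughout.
def sample_indicator(lam, tau=0.05, c=10.0, p_include=0.5, rng=random.Random(1)):
    def normal_pdf(x, sd):
        return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    w1 = p_include * normal_pdf(lam, c * tau)    # slab: practically significant
    w0 = (1 - p_include) * normal_pdf(lam, tau)  # spike: negligible loading
    return 1 if rng.random() < w1 / (w0 + w1) else 0
```

A loading far outside the spike's scale is essentially always flagged for inclusion, while a near-zero loading is usually assigned to the spike; iterating such draws inside a Gibbs sampler yields the posterior inclusion probabilities used to select variables and factors.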
Bayesian speckle tracking. Part I: an implementable perturbation to the likelihood function for ultrasound displacement estimation
- IEEE transactions on ultrasonics, ferroelectrics, and frequency control
Accurate and precise displacement estimation has been a hallmark of clinical ultrasound. Displacement estimation accuracy has largely been considered to be limited by the Cramér-Rao lower bound (CRLB). However, the CRLB only describes the minimum variance obtainable from unbiased estimators. Unbiased estimators are generally implemented using Bayes' theorem, which requires a likelihood function. The classic likelihood function for the displacement estimation problem is not discriminative and is difficult to implement for clinically relevant ultrasound with diffuse scattering. Because the classic likelihood function is not effective, a perturbation is proposed. The proposed likelihood function was evaluated and compared against the classic likelihood function by converting both to posterior probability density functions (PDFs) using a noninformative prior. Example results are reported for bulk motion simulations using a 6λ tracking kernel and 30 dB SNR for 1000 data realizations. The classic likelihood function assigned the true displacement a mean probability of only 0.070 ± 0.020, whereas the new likelihood function assigned the true displacement a much higher probability of 0.22 ± 0.16. The new likelihood function shows improvements at least for bulk motion, acoustic radiation force induced motion, and compressive motion, and at least for SNRs greater than 10 dB and kernel lengths between 1.5 and 12λ.
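The conversion from likelihood to posterior PDF with a noninformative prior can be sketched as follows. The Gaussian likelihood shape, grid, and true displacement below are assumptions for illustration, not the paper's perturbed likelihood function.

```python
import numpy as np

# Candidate displacements (in samples) and an assumed likelihood shape.
disp = np.linspace(-5, 5, 201)
true_disp = 1.2
like = np.exp(-0.5 * ((disp - true_disp) / 0.8) ** 2)  # illustrative likelihood

posterior = like / like.sum()          # flat (noninformative) prior: normalize
map_est = disp[np.argmax(posterior)]   # maximum a posteriori displacement
```

Comparing the probability such a posterior assigns to the true displacement is exactly how the abstract scores the classic likelihood function against the proposed perturbation.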