Concept: Ronald Fisher
This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions to the high prevalence of reporting inconsistencies could be to encourage data sharing, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.
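At its core, statcheck recomputes a p-value from the reported test statistic and degrees of freedom and compares it with the reported p-value. The sketch below illustrates that idea in Python for two-tailed z-tests only; it is not statcheck's R implementation or API. The function names are hypothetical, and the rounding rule is deliberately simplified (the real package also handles other test types, such as t and F, and treats rounding more carefully). Here a result counts as a gross inconsistency when the recomputed p-value falls on the other side of α = .05 from the reported one.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal (z) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_result(z: float, reported_p: float, relation: str = "=",
                   decimals: int = 3, alpha: float = 0.05):
    """Classify a reported z-test result, loosely in the spirit of statcheck.

    Hypothetical helper. relation is the comparison printed in the article,
    "=" or "<". For "=", the recomputed p is rounded to `decimals` places
    (an assumed simplification). Returns (label, recomputed_p).
    """
    computed = two_tailed_p_from_z(z)
    if relation == "=":
        consistent = round(computed, decimals) == round(reported_p, decimals)
        reported_significant = reported_p < alpha
    elif relation == "<":
        consistent = computed < reported_p
        reported_significant = reported_p <= alpha
    else:
        raise ValueError("relation must be '=' or '<'")

    if consistent:
        return "consistent", computed
    # Gross inconsistency: the decision at alpha flips between reported and recomputed p
    if reported_significant != (computed < alpha):
        return "gross inconsistency", computed
    return "inconsistent", computed


# Hypothetical reported results: (z, reported p, relation)
for z, p, rel in [(2.5, 0.012, "="), (2.5, 0.01, "<"), (1.5, 0.05, "<")]:
    print(f"z = {z}, p {rel} {p}: {check_z_result(z, p, rel)[0]}")
```

The three examples cover each label: z = 2.5 gives p ≈ .0124, which matches "p = .012" but not "p < .01", while z = 1.5 gives p ≈ .134, so reporting it as "p < .05" would change the conclusion.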
What are the statistical practices of articles published in journals with a high impact factor? Do they differ from those of articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of the limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues published in 2011 in four journals with high impact factors (Science, Nature, The New England Journal of Medicine and The Lancet) and three journals with relatively lower impact factors (Neuropsychology, Journal of Experimental Psychology: Applied and the American Journal of Public Health). Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect sizes, prospective power or model estimation is the prevalent statistical practice in articles published in Nature (89%), followed by articles published in Science (42%). By contrast, in all other journals, with both high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means of improving statistical practices in journals with high or low impact factors.
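As a concrete contrast with reporting a bare p-value, the sketch below reports a two-group mean difference together with an effect size (Cohen's d) and an approximate 95% confidence interval. It is an illustration only: the function name and data are hypothetical, and the interval uses a normal approximation rather than the t distribution, which would be preferable for small samples.

```python
import math
import statistics


def describe_difference(group_a, group_b, z_crit=1.96):
    """Report a mean difference with Cohen's d and an approximate 95% CI.

    Hypothetical helper. The CI uses a normal (z) approximation with an
    unpooled standard error; a t critical value would suit small samples better.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    # Cohen's d: mean difference divided by the pooled standard deviation
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    cohens_d = mean_diff / pooled_sd

    # Unpooled standard error of the difference and the approximate interval
    se = math.sqrt(var_a / n_a + var_b / n_b)
    ci95 = (mean_diff - z_crit * se, mean_diff + z_crit * se)
    return {"mean_diff": mean_diff, "cohens_d": cohens_d, "ci95": ci95}


# Hypothetical scores for two groups
print(describe_difference([5.1, 6.0, 6.8, 7.2, 5.9], [4.2, 4.9, 5.5, 5.0, 4.4]))
```

Reporting all three numbers lets a reader judge the size and precision of an effect, not only whether it crossed a significance threshold.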
Questions over the clinical significance of cannabis withdrawal have hindered its inclusion as a discrete cannabis-induced psychiatric condition in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV). This study aims to quantify the functional impairment of normal daily activities caused by cannabis withdrawal and examines the factors that predict this impairment. In addition, the study tests the influence of functional impairment from cannabis withdrawal on cannabis use during and after an abstinence attempt.
Much has been written about p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a “near-significant” p-value to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called “fiddling”) in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has a low type I error rate and that its power approaches acceptable levels as the number of p-values studied approaches 1000.
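The abstract does not describe how the method works, so the sketch below does not reproduce it. As a plainly different stand-in, it implements a caliper test, a simple check for an excess of p-values just below .05 relative to just above it, and estimates that test's type I error by simulating unmanipulated (uniform) p-values, loosely mirroring the kind of simulation the abstract mentions. The window width, function names, and simulation settings are all assumptions.

```python
import math
import random


def caliper_test(p_values, threshold=0.05, width=0.005):
    """Caliper test (a stand-in, not the article's method).

    With no manipulation and a locally flat p-value density, a value inside
    the combined window is equally likely to fall on either side of the
    threshold, so the count below follows Binomial(n, 0.5).
    Returns (below, above, one-sided exact p-value).
    """
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold <= p < threshold + width)
    n = below + above
    if n == 0:
        return below, above, 1.0
    # Exact one-sided tail: P(X >= below) for X ~ Binomial(n, 0.5)
    tail = sum(math.comb(n, k) for k in range(below, n + 1)) / 2 ** n
    return below, above, tail


def simulate_type_i_error(n_pvalues=1000, n_reps=500, alpha=0.05, seed=1):
    """Rejection rate of the caliper test on uniform p-values (no fiddling)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        sample = [rng.random() for _ in range(n_pvalues)]
        if caliper_test(sample)[2] < alpha:
            rejections += 1
    return rejections / n_reps


print(caliper_test([0.046, 0.047, 0.048, 0.049, 0.051]))  # (4, 1, 0.1875)
print(simulate_type_i_error())
```

Because the exact binomial test is discrete, the simulated rejection rate tends to sit at or below the nominal α when only a few p-values land near the threshold.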
Data analysis is used to test the hypothesis that “hitting is contagious”. A statistical model is described to study the effect of a hot hitter on his teammates' batting during a consecutive-game hitting streak. Box score data for entire seasons, comprising [Formula: see text] streaks of length [Formula: see text] games and a total of [Formula: see text] observations, were compiled. Treatment and control sample groups ([Formula: see text]) were constructed from the core lineups of players on the streaking batter’s team. The percentile method bootstrap was used to calculate [Formula: see text] confidence intervals for statistics representing differences in the mean distributions of two batting statistics between groups. Batters in the treatment group (hot streak active) showed statistically significant improvements in hitting performance compared with the control group. Mean [Formula: see text] for the treatment group was found to be [Formula: see text] to [Formula: see text] percentage points higher during hot streaks (mean difference increased [Formula: see text] points), while the batting heat index [Formula: see text] introduced here was observed to increase by [Formula: see text] points. For each performance statistic, the null hypothesis was rejected at the [Formula: see text] significance level. We conclude that the evidence suggests the potential existence of a “statistical contagion effect”. Psychological mechanisms that may underlie the empirical results are suggested, as several studies in the scientific literature lend credence to contagious phenomena in sports. Causal inference from these results is difficult, but we suggest and discuss several latent variables that may contribute to the observed results and offer possible directions for future research.
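The percentile method bootstrap mentioned above can be sketched as follows for a difference in means between a treatment and a control group. This is a minimal, unpaired version under assumed inputs (hypothetical per-player batting averages); the study's actual model, statistics, and resampling scheme may differ. If the resulting 95% interval excludes zero, the null hypothesis of no difference is rejected at the 5% level.

```python
import random
import statistics


def percentile_bootstrap_ci(treatment, control, n_boot=5000, seed=0):
    """95% percentile-bootstrap CI for mean(treatment) - mean(control).

    Hypothetical helper. Each group is resampled with replacement
    independently; the bounds are the 2.5th and 97.5th percentiles of the
    bootstrap distribution of the difference.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t = rng.choices(treatment, k=len(treatment))
        c = rng.choices(control, k=len(control))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    # With n=40, quantiles() returns cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(diffs, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical batting averages: teammates during vs outside hot streaks
during = [0.300, 0.320, 0.350, 0.280, 0.310]
outside = [0.250, 0.270, 0.240, 0.260, 0.280]
print(percentile_bootstrap_ci(during, outside))
```

The percentile method makes no normality assumption about the difference; it simply reads the interval off the empirical bootstrap distribution.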
BACKGROUND: Better knowledge of suprascapular notch anatomy may help to prevent, and to assess more accurately, suprascapular nerve entrapment syndrome. Our purposes were to verify the reliability of the existing data, to assess differences between the two genders, to verify the correlation between the dimensions of the scapula and those of the suprascapular notch, and to investigate the relationship between the suprascapular notch and the postero-superior limit of the safe zone for the suprascapular nerve. METHODS: We examined 500 dried scapulae, measuring seven distances related to the scapular body and the suprascapular notch; the scapulae were also catalogued by gender, age and side. The suprascapular notch was classified according to Rengachary’s method. For each class, we also considered the width/depth ratio. In addition, Pearson’s correlation was calculated. RESULTS: The frequencies were: Type I 12.4%, Type II 19.8%, Type III 22.8%, Type IV 31.1%, Type V 10.2%, Type VI 3.6%. Width and depth did not differ significantly by gender or side; however, mean depth differed significantly between groups divided at the median age (73 years). Correlation coefficients were weak or not statistically significant. The differences among the postero-superior limits of the safe zone across the six notch types were not statistically significant. CONCLUSIONS: Patients’ characteristics (gender, age and scapular dimensions) are not related to the characteristics of the suprascapular notch (dimensions and type); our data suggest that entrapment syndrome is more likely to be associated with a Type III notch because of its specific features.
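Pearson's correlation, used above to relate scapular dimensions to notch dimensions, can be computed together with its t statistic on n - 2 degrees of freedom as in the sketch below. The function name and the example measurements are hypothetical; a p-value would come from the t distribution (for example via SciPy), which is left out here to keep the sketch dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient and its t statistic (df = n - 2).

    Hypothetical helper; assumes neither sample is constant.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # t = r * sqrt((n - 2) / (1 - r^2)); infinite for a perfect correlation
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t


# Hypothetical measurements (mm): a scapular dimension vs notch depth
scapular = [98.0, 102.5, 95.1, 110.3, 104.8, 99.6]
notch_depth = [5.2, 6.1, 4.8, 5.9, 6.4, 5.0]
print(pearson_r(scapular, notch_depth))
```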
Conclusive evidence for sexual dimorphism in non-avian dinosaurs has been elusive. Here it is shown that dimorphism in the shape of the dermal plates of Stegosaurus mjosi (Upper Jurassic, western USA) does not result from non-sex-related individual, interspecific, or ontogenetic variation and is most likely a sexually dimorphic feature. One morph possessed wide, oval plates 45% larger in surface area than the tall, narrow plates of the other morph. Intermediate morphologies are lacking, as principal component analysis supports a marked size- and shape-based dimorphism; in contrast, many non-sex-related individual variations would be expected to show intermediate morphologies. Taphonomy of a new quarry in Montana (JRDI 5ES Quarry) shows that at least five individuals were buried in a single horizon and were not brought together by water or scavenger transport. This new site demonstrates coexistence, and possibly suggests sociality, between two morphs that differ only in their plates. Without evidence for niche partitioning, it is unlikely that the two morphs represent different species. Histology of the new specimens, in combination with studies of previous specimens, indicates that both morphs occur in fully grown individuals. Therefore, the dimorphism is not a result of ontogenetic change. Furthermore, the two morphs of plates do not simply come from different positions on the back of a single individual: plates from all positions on the body can be classified as one of the two morphs, and previously discovered, isolated specimens possess only one morph of plates. Based on the seemingly display-oriented morphology of the plates, female mate choice, rather than male-male competition, was likely the driving evolutionary mechanism. Dinosaur ornamentation possibly served functions similar to those of ornamentation in modern species. Comparisons to ornamentation involved in sexual selection in extant species, such as the horns of bovids, may be appropriate in predicting the function of some dinosaur ornamentation.
In some insect species, females may base their choice of a suitable mate on male odor. In the red mason bee, Osmia bicornis, female choice is based on a male’s odor bouquet as well as its thorax vibrations and its relatedness to the female, a putative form of optimal outbreeding. Interestingly, O. bicornis occurs as two distinct color morphs in Europe, which are thought to represent subspecies and between which we hypothesize that female discrimination may be particularly marked. Here we investigated (i) whether these two color morphs do indeed represent distinct, reproductively differentiated populations, (ii) how the odor bouquets of male O. bicornis vary within and between populations, and (iii) whether variation in male odor correlates with genetic distance, which might represent a cue by which females could optimally outbreed. Using GC and GC-MS analysis of male odors and microsatellite analysis of males and females from 9 populations, we show that in Denmark, an area of subspecies sympatry, the two color morphs at any one site differ neither in odor bouquet nor in population genetic differentiation. Yet populations across Europe are distinct in their odor profiles as well as being genetically differentiated. Odor differences do not, however, mirror genetic differentiation between populations. We hypothesize that populations from Germany, England and Denmark may be under sexual selection through female choice for local odor profiles, which are not related to color morph but which could ultimately lead to population divergence and speciation.
BACKGROUND: Nutritional epidemiology is a highly prolific field. Debates on associations of nutrients with disease risk are common in the literature and attract attention in public media. OBJECTIVE: We aimed to examine the conclusions, statistical significance, and reproducibility in the literature on associations between specific foods and cancer risk. DESIGN: We selected 50 common ingredients from random recipes in a cookbook. PubMed queries identified recent studies that evaluated the relation of each ingredient to cancer risk. Information regarding author conclusions and relevant effect estimates was extracted. When >10 articles were found, we focused on the 10 most recent articles. RESULTS: Forty ingredients (80%) had articles reporting on their cancer risk. Of 264 single-study assessments, 191 (72%) concluded that the tested food was associated with an increased (n = 103) or a decreased (n = 88) risk; 75% of the risk estimates had weak (0.05 > P ≥ 0.001) or no (P > 0.05) statistical significance. Statistically significant results were more likely than nonsignificant findings to be reported in the study abstract rather than only in the full text (P < 0.0001). Meta-analyses (n = 36) presented more conservative results; only 13 (26%) reported an increased (n = 4) or a decreased (n = 9) risk (6 had more than weak statistical support). The median RRs (IQRs) for studies that concluded an increased or a decreased risk were 2.20 (1.60, 3.44) and 0.52 (0.39, 0.66), respectively. The RRs from the meta-analyses were on average null (median: 0.96; IQR: 0.85, 1.10). CONCLUSIONS: Associations with cancer risk or benefits have been claimed for most food ingredients. Many single studies highlight implausibly large effects, even though the evidence is weak. Effect sizes shrink in meta-analyses.
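The abstract reports effects as relative risks (RRs) and notes that effect sizes shrink in meta-analyses. The sketch below shows one standard way a meta-analysis combines RRs, a fixed-effect inverse-variance average of log RRs, with each study's standard error recovered from its 95% confidence interval. The included meta-analyses may well have used random-effects or other models, and the input studies here are hypothetical.

```python
import math


def pool_relative_risks(studies, z_crit=1.96):
    """Fixed-effect, inverse-variance pooling of relative risks.

    Hypothetical helper. Each study is (rr, ci_lower, ci_upper) with a 95% CI.
    Pooling happens on the log scale, where SE = (ln(upper) - ln(lower)) / (2 * 1.96).
    Returns (pooled RR, lower 95% bound, upper 95% bound).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        log_rr = math.log(rr)
        se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
        weight = 1.0 / se ** 2  # more precise studies get more weight
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z_crit * pooled_se),
            math.exp(pooled_log + z_crit * pooled_se))


# Hypothetical single studies pointing in different directions
print(pool_relative_risks([(2.0, 1.0, 4.0), (0.5, 0.25, 1.0), (1.1, 0.8, 1.5)]))
```

When imprecise studies with large effects in opposite directions are combined, the pooled estimate is drawn toward the null and toward the most precise studies, which is one mechanism behind the shrinkage the abstract describes.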
The primary objective of this study was to investigate whether dog bite characteristics differ between breeds listed under breed-specific legislation in Ireland (legislated breeds) and non-legislated breeds (age when bitten, anatomical bite locations, triggers for biting, victim’s relationship with the dog, geographical location and owner presence, history of aggression, reporting of the bite incident to authorities, medical treatment required following the bite, and type of bite inflicted). A second objective was to investigate dog control officers’ enforcement and perceptions of current legislation. Data for statistical analyses were collated through a nationally advertised survey, with Pearson’s chi-square and Fisher’s exact tests used for the analyses. A total of 140 incident surveys were assessed, comprising non-legislated (n = 100) and legislated (n = 40) dog bite incidents.
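Both tests named above can be sketched for a single 2x2 comparison, for instance legislated versus non-legislated incidents crossed with whether medical treatment was required. The sketch is a minimal pure-Python version; the function names and the counts in the usage line are hypothetical, not data from the survey. The chi-square p-value uses the closed form for one degree of freedom and omits Yates' continuity correction, and the two-sided Fisher p-value follows the common convention of summing all tables with the same margins that are no more probable than the observed one.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and its p-value, df = 1.

    Assumes every row and column total is positive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric P(top-left cell = x) given the margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical counts: rows = legislated (40), non-legislated (100);
# columns = medical treatment required (yes, no)
counts = [[12, 28], [18, 82]]
print(pearson_chi_square_2x2(counts), fisher_exact_2x2(counts))
```

Fisher's exact test is usually preferred when expected cell counts are small, which is a likely situation when a sample of 140 incidents is split across many bite characteristics.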