
Concept: Cohen's kappa


BACKGROUND: Systematic reviews have been challenged to consider effects on disadvantaged groups. A priori specification of subgroup analyses is recommended to increase the credibility of these analyses. This study aimed to develop and assess inter-rater agreement for an algorithm for systematic review authors to predict whether differences in effect measures are likely for disadvantaged populations relative to advantaged populations (only relative effect measures were addressed). METHODS: A health equity plausibility algorithm was developed using clinimetric methods, with three items based on literature review, key informant interviews and methodology studies. The three items dealt with the plausibility of differences in relative effects across sex or socioeconomic status (SES) due to: 1) patient characteristics; 2) intervention delivery (i.e., implementation); and 3) comparators. Thirty-five respondents (clinicians, methodologists and research users) used these questions to assess the likelihood of differences across sex and SES for ten systematic reviews. We assessed inter-rater reliability using Fleiss multi-rater kappa. RESULTS: The proportion agreement was 66% for patient characteristics (95% confidence interval: 61% to 71%), 67% for intervention delivery (95% confidence interval: 62% to 72%) and 55% for the comparator (95% confidence interval: 50% to 60%). Fleiss kappa ranged from 0 to 0.199, representing very low agreement beyond chance. CONCLUSIONS: Users of systematic reviews rated important differences in relative effects across sex and socioeconomic status as plausible for a range of individual and population-level interventions. However, there was very low inter-rater agreement for these assessments. There is an unmet need for discussion of the plausibility of differential effects in systematic reviews.
Increased consideration of external validity and applicability to different populations and settings is warranted in systematic reviews to meet this need.
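The Fleiss multi-rater kappa used in this study compares the observed pairwise agreement across all raters with the agreement expected by chance from the overall category proportions. A minimal numpy sketch of the standard formula (an illustration, not the study's code):

```python
import numpy as np

def fleiss_kappa(table):
    """Fleiss' multi-rater kappa for an N-subjects x k-categories count table.
    Each row holds the number of raters assigning that subject to each
    category, and every row must sum to the (constant) number of raters."""
    table = np.asarray(table, dtype=float)
    n_sub, _ = table.shape
    n_rat = table[0].sum()
    # Per-subject agreement: proportion of rater pairs that agree.
    p_i = (np.square(table).sum(axis=1) - n_rat) / (n_rat * (n_rat - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = table.sum(axis=0) / (n_sub * n_rat)
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

# Three raters, unanimous on every subject: kappa = 1.
k_perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0]])
```

Values near 0, as reported above, mean the raters agreed barely more often than chance would predict even when their raw percentage agreement looks respectable.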

Concepts: Evidence-based medicine, Assessment, Interval finite element, Meta-analysis, Cohen's kappa, Inter-rater reliability, Contract, Fleiss' kappa


Purpose: The Thai PPS Adult Suandok tool was translated from the Palliative Performance Scale (PPSv2) and has been used in Chiang Mai, Thailand for several years. Aim: To test the reliability and validity of the Thai translation of PPSv2. Design: A set of 22 palliative cases was used to determine a PPS score at Time-1, repeated two weeks later as Time-2. A survey questionnaire was also completed for qualitative analysis. Participants: A total of 70 nurses and physicians from Maharaj Nakorn Hospital in Chiang Mai participated. Results: The Time-1 intraclass correlation coefficient (ICC) for absolute agreement was 0.911 (95% CI 0.86-0.96) and for consistency was 0.92 (95% CI 0.87-0.96). The Time-2 ICC for agreement was 0.905 (95% CI 0.85-0.95) and for consistency was 0.912 (95% CI 0.86-0.96). These findings indicate good agreement among participants, and agreement was somewhat higher in the Time-2 re-test phase. Cohen’s kappa was 0.55, demonstrating moderate agreement. Thematic analysis of the surveys showed that 91% felt the PPS to be a valuable clinical tool overall, rating it ‘very useful’ or ‘useful’ in several areas, including care planning (78% and 20%), disease monitoring (69% and 27%) and prognostication (61% and 31%), respectively. Some respondents noted difficulty in determining appropriate scores for paraplegic patients or those with feeding tubes, while others found the instructions long or difficult. Conclusion: The Thai PPS Adult Suandok translated tool has good inter- and intra-rater reliability and can be used regularly in clinical care.
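The distinction above between the ICC for absolute agreement and for consistency comes from the two-way ANOVA decomposition of the score matrix: consistency ignores systematic offsets between raters, while absolute agreement penalizes them. A sketch of the single-rater formulas in the McGraw & Wong framing (illustrative, not the study's code):

```python
import numpy as np

def icc_single(scores):
    """Single-rater ICCs from an n-subjects x k-raters score matrix.
    Returns (agreement, consistency): ICC(A,1) and ICC(C,1) under the
    two-way model, computed from the ANOVA mean squares."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    m = x.mean()
    ssr = k * np.square(x.mean(axis=1) - m).sum()   # between subjects
    ssc = n * np.square(x.mean(axis=0) - m).sum()   # between raters
    sse = np.square(x - m).sum() - ssr - ssc        # residual
    msr, msc = ssr / (n - 1), ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency

# Rater 2 scores one point higher everywhere: consistency stays perfect,
# absolute agreement drops.
a, c = icc_single([[1, 2], [2, 3], [3, 4]])
```

A constant rater offset is exactly the case where the two coefficients diverge, which is why studies like this one report both.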

Concepts: Psychometrics, Reliability, Covariance and correlation, Thailand, Cohen's kappa, Inter-rater reliability, Translation, Chiang Mai


BACKGROUND: Physical activity is assumed to be important in the prevention and treatment of frailty. It is, however, unclear to what extent frailty can be influenced, because an outcome instrument is lacking. OBJECTIVES: An Evaluative Frailty Index for Physical activity (EFIP) was developed based on the Frailty Index Accumulation of Deficits, and its clinimetric properties were tested. DESIGN: The content of the EFIP was determined in a written Delphi procedure. Intra-rater reliability, inter-rater reliability, and construct validity were determined in an observational study (n=24), and to determine responsiveness, the EFIP was used in a physical therapy intervention study (n=12). METHOD: Intra-rater and inter-rater reliability were calculated using Cohen’s kappa; construct validity was determined by correlating scores on the EFIP with those on the Timed Up & Go Test (TUG), the Performance Oriented Mobility Assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Responsiveness was calculated by means of the Effect Size (ES), the Standardized Response Mean (SRM), and a paired-sample t-test. RESULTS: Fifty items were included in the EFIP. Inter-rater (Cohen’s kappa: 0.72) and intra-rater reliability (Cohen’s kappa: 0.77 and 0.80) were good. A moderate correlation with the TUG, POMA, and CIRS-G was found (0.68, 0.66, and 0.61, respectively; P < 0.001). Responsiveness was moderate to good (ES: -0.72; SRM: -1.14) for an intervention with a significant effect (P < 0.01). LIMITATIONS: The clinimetric properties of the EFIP were tested in a small sample, and anchor-based responsiveness could not be determined. CONCLUSIONS: The EFIP is a reliable, valid, and responsive instrument for evaluating the effect of physical activity on frailty in research and clinical practice.
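The two responsiveness statistics reported here have simple definitions: the Effect Size is the mean change divided by the standard deviation at baseline, and the Standardized Response Mean is the mean change divided by the standard deviation of the change scores. A short sketch with made-up paired scores (not the study's data):

```python
import numpy as np

def responsiveness(pre, post):
    """Effect Size (mean change / baseline SD) and Standardized Response
    Mean (mean change / SD of change) for paired pre/post scores."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    change = post - pre
    es = change.mean() / pre.std(ddof=1)
    srm = change.mean() / change.std(ddof=1)
    return es, srm

# Hypothetical frailty scores before and after an intervention
# (lower = less frail): both statistics are negative for improvement.
es, srm = responsiveness([10, 12, 14], [8, 9, 10])
```

Because the two statistics share a numerator but use different denominators, the SRM exceeds the ES whenever change scores vary less than baseline scores, as in the abstract's -1.14 versus -0.72.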

Concepts: Scientific method, Psychometrics, Student's t-test, Reliability, Cohen's kappa, Inter-rater reliability, Jacob Cohen, Fleiss' kappa


BACKGROUND: In Canada, new models of orthopaedic care involving advanced practice physiotherapists (APPs) are being implemented. In these new models, aimed at improving the efficiency of care for patients with musculoskeletal disorders, APPs diagnose, triage and conservatively treat patients. Formal validation of the efficiency and appropriateness of these emerging models is scarce. The purpose of this study was to assess the diagnostic agreement of an APP compared with orthopaedic surgeons, as well as treatment concordance, healthcare resource use, and patient satisfaction in this new model. METHODS: 120 patients presenting for an initial consult for hip or knee complaints at an outpatient orthopaedic hospital clinic in Montreal, Canada, were independently assessed by an APP and by one of three participating orthopaedic surgeons. Each health care provider independently diagnosed the patients and provided triage recommendations (conservative or surgical management). Proportion of raw agreement and Cohen’s kappa were used to assess inter-rater agreement for diagnosis, triage, treatment recommendations and imaging tests ordered. Chi-square tests were used to compare the types of conservative treatment recommendations made by the APP and the surgeons, and Student t-tests to compare patient satisfaction between the two types of care. RESULTS: The majority of patients assessed were female (54%), mean age was 54.1 years, and 91% consulted for a knee complaint. The raw agreement proportion for diagnosis was 88% and diagnostic inter-rater agreement was very high (kappa=0.86; 95% CI: 0.80-0.93). The raw agreement proportion for triage recommendations (conservative or surgical management) was 88% and inter-rater agreement for triage recommendation was high (kappa=0.77; 95% CI: 0.65-0.88). No differences were found between providers with respect to imaging tests ordered (p ≥ 0.05).
In terms of conservative treatment recommendations, the APP gave significantly more education and prescribed more NSAIDs, joint injections, exercises and supervised physiotherapy (p<0.05). Patient satisfaction was significantly higher for APP care than for the surgeons’ care (p<0.05). CONCLUSION: The diagnoses and triage recommendations made by the APP for patients with hip and knee disorders were similar to those of the orthopaedic surgeons. These results provide evidence supporting the APP model of orthopaedic care.
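Both agreement measures used in this study come straight from the two providers' paired ratings: raw agreement is the fraction of cases where they match, and Cohen's kappa discounts the matches expected by chance from each rater's marginal proportions. A minimal sketch with invented triage labels (not the study's data):

```python
import numpy as np

def raw_agreement_and_kappa(r1, r2):
    """Proportion of raw agreement and Cohen's kappa for two raters'
    categorical ratings of the same cases."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    po = np.mean(r1 == r2)                                # observed agreement
    pe = sum((r1 == c).mean() * (r2 == c).mean() for c in cats)  # chance
    return po, (po - pe) / (1 - pe)

# Hypothetical triage calls by an APP and a surgeon on four patients.
app     = ["surgical", "conservative", "surgical", "conservative"]
surgeon = ["surgical", "conservative", "conservative", "conservative"]
po, k = raw_agreement_and_kappa(app, surgeon)
```

Note how kappa (0.5 here) sits well below the raw agreement (0.75): with only two common categories, chance agreement is substantial, which is why studies report both numbers.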

Concepts: Health care, Health care provider, Patient, Diagnosis, Hospital, Physician, Illness, Cohen's kappa


Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson’s disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation locations were manually classified by two raters (DH, JN), and Cohen’s kappa coefficients were used to determine inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. Inter-rater reliability for classifying the fixation location was high for both the PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), indicating a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.

Concepts: Scientific method, Reliability, Cohen's kappa, Inter-rater reliability, Fleiss' kappa


Objectives The standard approach to the assessment of occupational exposures is through the manual collection and coding of job histories. This method is time-consuming and costly and makes it potentially unfeasible to perform high quality analyses on occupational exposures in large population-based studies. Our aim was to develop a novel, efficient web-based tool to collect and code lifetime job histories in the UK Biobank, a population-based cohort of over 500 000 participants. Methods We developed OSCAR (occupations self-coding automatic recording) based on the hierarchical structure of the UK Standard Occupational Classification (SOC) 2000, which allows individuals to collect and automatically code their lifetime job histories via a simple decision-tree model. Participants were asked to find each of their jobs by selecting appropriate job categories until they identified their job title, which was linked to a hidden 4-digit SOC code. For each occupation a job title in free text was also collected to estimate Cohen’s kappa (κ) inter-rater agreement between SOC codes assigned by OSCAR and an expert manual coder. Results OSCAR was administered to 324 653 UK Biobank participants with an existing email address between June and September 2015. Complete 4-digit SOC-coded lifetime job histories were collected for 108 784 participants (response rate: 34%). Agreement between the 4-digit SOC codes assigned by OSCAR and the manual coder for a random sample of 400 job titles was moderately good [κ=0.45, 95% confidence interval (95% CI) 0.42-0.49], and improved when broader job categories were considered (κ=0.64, 95% CI 0.61-0.69 at a 1-digit SOC-code level). Conclusions OSCAR is a novel, efficient, and reasonably reliable web-based tool for collecting and automatically coding lifetime job histories in large population-based studies. Further application in other research projects for external validation purposes is warranted.
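The improvement in kappa from the 4-digit to the 1-digit SOC level reflects the classification's hierarchical structure: truncating codes to their leading digit merges fine categories into major groups, so near-misses become matches. A sketch with entirely hypothetical SOC codes (invented for illustration, not UK Biobank data):

```python
import numpy as np

def kappa(a, b):
    """Cohen's kappa for two aligned sequences of category labels."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)
    pe = sum((a == c).mean() * (b == c).mean() for c in np.union1d(a, b))
    return (po - pe) / (1 - pe)

# Hypothetical 4-digit SOC codes from the automatic coder and the expert:
# two disagreements at the 4-digit level, none at the 1-digit level.
auto   = ["2315", "2315", "3537", "9226", "1136"]
expert = ["2315", "2314", "3539", "9226", "1136"]

k4 = kappa(auto, expert)                                   # exact code match
k1 = kappa([c[0] for c in auto], [c[0] for c in expert])   # major group only
```

Here the two mismatched pairs differ only in their final digits, so agreement at the major-group level is perfect (k1 = 1) while the 4-digit kappa is much lower, mirroring the 0.45 versus 0.64 pattern in the abstract.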

Concepts: Estimator, United Kingdom, Code, Source code, Cohen's kappa, Inter-rater reliability, Collected, Fleiss' kappa


Clinical evaluation of scapular dyskinesis (SD) aims to identify abnormal scapulothoracic movement, underlying causal factors, and the potential relationship with shoulder symptoms. The literature proposes different methods of dynamic clinical evaluation of SD, but improved reliability and agreement values are needed. The present study aimed to evaluate the intrarater and interrater agreement and reliability of three SD classifications: 1) the 4-type classification, 2) the Yes/No classification, and 3) the scapular dyskinesis test (SDT). Seventy-five young athletes, including 45 men and 30 women, were evaluated. Two raters trained to diagnose SD classified each athlete using the three methods during one series of 8-10 cycles of forward flexion and abduction with an external load. The evaluation protocol was repeated after 3 h for intrarater analysis. The agreement percentage was calculated by dividing the observed agreement by the total number of observations. Reliability was calculated using Cohen’s kappa coefficient, with a 95% confidence interval (CI) defined as the kappa coefficient ± 1.96 times the standard error of measurement. The interrater analyses showed an agreement percentage between 80% and 95.9% and almost perfect reliability (κ > 0.81) for the three classification methods in all test conditions, except the 4-type and SDT classification methods, which had substantial reliability (κ < 0.80) in shoulder abduction. Intrarater analyses showed agreement percentages between 80.7% and 89.3% and substantial reliability (0.67 to 0.81) for both raters in the three classifications. CIs ranged from the moderate to almost perfect categories.
These results indicate that the three SD classification methods showed high reliability for both intrarater and interrater evaluation, under a protocol that included SD evaluation training for the raters and several repetitions of arm movements with an external load during live assessment.
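The confidence interval construction described above (kappa ± 1.96 × standard error, with the usual large-sample standard error for kappa) can be sketched directly. The data below are invented ratings, not the study's; the SE formula is the common approximation based on observed and chance agreement:

```python
import numpy as np

def kappa_with_ci(r1, r2, z=1.96):
    """Cohen's kappa with an approximate 95% CI: kappa +/- z * SE,
    where SE = sqrt(po * (1 - po) / (n * (1 - pe)**2))."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    cats = np.union1d(r1, r2)
    po = np.mean(r1 == r2)
    pe = sum((r1 == c).mean() * (r2 == c).mean() for c in cats)
    k = (po - pe) / (1 - pe)
    se = np.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return k, (k - z * se, k + z * se)

# Hypothetical 3-category SD classifications by two raters on ten athletes.
k, ci = kappa_with_ci([1, 1, 2, 2, 3, 3, 1, 2, 3, 1],
                      [1, 1, 2, 2, 3, 1, 1, 2, 3, 3])
```

With only ten cases the interval is wide, which is why even "substantial" point estimates can have CIs spanning from moderate to almost perfect categories, as the abstract reports.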

Concepts: Scientific method, Observation, Normal distribution, Shoulder, Cohen's kappa, Inter-rater reliability, Multiplication, Fleiss' kappa


Background. A culture of stringent drug policy, one-size-fits-all treatment approaches, and drug-related stigma has clouded clinical HIV practice in the United States. The result is a series of missed opportunities in the HIV care environment. An approach that may address the broken relationship between patient and provider is harm reduction, which removes judgment and operates at the patient’s stage of readiness. Harm reduction is not a routine part of care; rather, it exists outside clinic walls, exacerbating the divide between compassionate, stigma-free services and the medical system. Methods. Qualitative, phenomenological, semi-structured, individual interviews with patients and providers were conducted in three publicly funded clinics in Chicago, located in areas of high HIV prevalence and drug use and serving African-American patients (N = 38). A deductive thematic analysis guided the process, including the creation of an index code list, transcription and verification of interviews, manual coding, notation of emerging themes and refinement of code definitions, two more rounds of coding within ATLAS.ti, calculation of Cohen’s kappa for interrater reliability, queries of major codes, and analysis of additional common themes. Results. Thematic analysis indicated that the majority of patients felt receptive to harm reduction interventions (safer injection counseling, safer stimulant use counseling, overdose prevention information, supply provision) from their provider, and expressed anticipated gratitude for harm reduction information and/or supplies within the HIV care visit, although some were reluctant to talk openly about their drug use. Provider results were mixed, with more receptivity reported by advanced practice nurses and more barriers cited by physicians. Notable barriers included role perceptions, limited time, inadequate training, and the patients themselves. Discussion.
Patients are willing to receive harm reduction interventions from their HIV care providers, while provider receptiveness is mixed. The findings reveal critical implications for diffusion of harm reduction into HIV care, including the need to address cited barriers for both patients and providers to ensure feasibility of implementation. Strategies to address these barriers are discussed, and recommendations for further research are also shared.

Concepts: Patient, Physician, Illness, Patience, Code, Cohen's kappa, Inter-rater reliability, Advanced practice nurse


Equitable access to programs and health services is essential to achieving national and international health goals, but it is rarely assessed because of perceived measurement challenges. One of these challenges concerns the complexities of collecting the data needed to construct asset or wealth indices, which can involve asking as many as 40 survey questions, many with multiple responses. To determine whether the number of variables and questions could be reduced to a level low enough for more routine inclusion in evaluations and research without compromising programmatic conclusions, we used data from a program evaluation in Honduras that compared a pro-poor intervention with government clinic performance as well as data from a results-based financing project in Senegal. In both, the full Demographic and Health Survey (DHS) asset questionnaires had been used as part of the evaluations. Using the full DHS results as the “gold standard,” we examined the effect of retaining successively smaller numbers of variables on the classification of the program clients in wealth quintiles. Principal components analysis was used to identify those variables in each country that demonstrated minimal absolute factor loading values for 8 different thresholds, ranging from 0.05 to 0.70. Cohen’s kappa statistic was used to assess agreement between the resulting quintile classifications. We found that the 111 asset variables and 41 questions in the Honduras DHS could be reduced to 9 variables, captured by only 8 survey questions (kappa statistic, 0.634), without substantially altering the wealth quintile distributions for either the pro-poor program or the government clinics or changing the resulting policy conclusions. In Senegal, the 103 asset variables and 36 questions could be reduced to 32 variables and 20 questions (kappa statistic, 0.882) while maintaining a consistent mix of users in each of the 2 lowest quintiles.
Less than 60% of the asset variables in the 2 countries' full DHS asset indices overlapped, and in none of the 8 simplified asset index iterations did this proportion exceed 50%. We conclude that substantially reducing the number of variables and questions used to assess equity is feasible, producing valid results and providing a less burdensome way for program implementers or researchers to evaluate whether their interventions are pro-poor. Developing a standardized, simplified asset questionnaire that could be used across countries may prove difficult, however, given that the variables that contribute the most to the asset index are largely country-specific.
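The reduction procedure can be sketched end to end: build a wealth index from the first principal component of the asset variables, drop variables whose absolute loading falls below a threshold, rebuild the index from the survivors, and compare quintile assignments with Cohen's kappa. Everything below is simulated stand-in data with an arbitrary threshold, not the DHS variables or the study's thresholds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for DHS data: 500 households x 12 binary asset
# indicators, all driven by a latent wealth factor plus noise.
wealth = rng.normal(size=500)
assets = (wealth[:, None] * rng.uniform(0.3, 1.0, 12)
          + rng.normal(size=(500, 12))) > 0
X = (assets - assets.mean(0)) / assets.std(0)          # standardize

def pc1_loadings(M):
    """Loadings of the first principal component (via SVD)."""
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

def quintiles(score):
    """Wealth quintile 0..4 for each household's index score."""
    return np.searchsorted(np.quantile(score, [.2, .4, .6, .8]), score)

def kappa(a, b):
    """Cohen's kappa for two aligned label sequences."""
    po = np.mean(a == b)
    pe = sum((a == c).mean() * (b == c).mean() for c in np.union1d(a, b))
    return (po - pe) / (1 - pe)

# Full index: household scores on PC1 of all variables.
full_load = pc1_loadings(X)
full_index = X @ full_load

# Reduced index: keep variables with |loading| above a threshold
# (0.2 is arbitrary here), then refit PC1 on the survivors.
keep = np.abs(full_load) >= 0.2
reduced_index = X[:, keep] @ pc1_loadings(X[:, keep])
if np.corrcoef(full_index, reduced_index)[0, 1] < 0:   # PC sign is arbitrary
    reduced_index = -reduced_index

k = kappa(quintiles(full_index), quintiles(reduced_index))
```

The sign flip matters because SVD determines each component only up to sign; without it, the quintile orderings of the two indices could be reversed and the kappa meaningless.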

Concepts: Evaluation, Non-parametric statistics, Cohen's kappa, Inter-rater reliability, Jacob Cohen, Fleiss' kappa, Scott's Pi, Joseph L. Fleiss


The aim of this study was to create and evaluate the validity, reliability and feasibility of the Regional Anaesthesia Procedural Skills (RAPS) tool, designed for the assessment of all peripheral and neuraxial blocks using all nerve localisation techniques. The first phase was construction of a 25-item checklist by five regional anaesthesia experts using a Delphi process. This checklist was combined with a global rating scale to create the tool. In the second phase, initial validation by 10 independent anaesthetists using a test-retest methodology was successful (Cohen’s kappa ≥ 0.70 for inter-rater agreement; no difference in scores between test and retest, paired t-test, p > 0.12). In the third phase, 70 clinical videos of trainees were scored by three blinded international assessors. The RAPS tool exhibited face validity (p < 0.026), construct validity (p < 0.001), feasibility (mean time to score < 3.9 min), and overall reliability (intraclass correlation coefficient 0.80, 95% CI 0.67-0.88). The RAPS tool is thus a valid and reliable instrument for scoring trainee performance in regional anaesthesia.

Concepts: Statistics, Psychometrics, Validity, Reliability, Covariance and correlation, Test-retest, Cohen's kappa, Inter-rater reliability