On the 12th Day of Christmas, a Statistician Sent to Me . . .

The BMJ’s statistical editors relish a quiet Christmas, so make their wish come true and pay attention to the list of common statistical faux pas presented here by Riley and colleagues

The weeks leading up to Christmas are a magical time for medical research. The approaching holiday season creates a dramatic upsurge in productivity, with researchers finding time to finish off statistical analyses, draft manuscripts, and respond to reviewers’ comments. This activity leads to a plethora of submissions to journals such as The BMJ in December, so that researchers can finish the year with a sense of academic achievement and enjoy the festivities with their loved ones. Indeed, with optimism fuelled by mulled wine and mince pies, researchers may even anticipate their article’s acceptance by early January, at the end of the 12 days of Christmas.

A collective, however, works against this season of publication goodwill and cheer: a small but influential group of statisticians with very shiny noses for detail, seeking “all is right” rather than “all is bright” and emphasising no, no, no rather than ho, ho, ho. The statisticians’ core belief is that a research article is for life, not just for Christmas, and they deliver statistical reviews that promote high standards of methodological rigour and transparency. So you can imagine how busy they are during the Christmas period with its influx of submissions. Even before they can eat, drink, and be merry, these individuals are working tirelessly to detect submissions with erroneous analysis methods that should be roasting on an open fire, dubious statistical interpretations as pure as yellow snow, and half-baked reporting of study details that bring zero comfort and joy. Bah humbug!

Each year The BMJ’s statistical editors review more than 500 articles. For about 30 years, the statistical team was led by Martin Gardner and Doug Altman,12 both of whom saw similarities between statisticians and the Christmas star, with the statisticians lighting a path of research integrity, promoting methodology over metrics,34 and encouraging statistical principles to “save science and the world.”5

To elicit the most common issues encountered during statistical peer review, an internal survey was administered to The BMJ’s statistical editors. Twelve items were identified, and each is described here. There is one item for each of the 12 days of Christmas, the period between 25 December and 5 January when the statisticians conduct their reviews in the mindset of the Grinch,6 but with the kind heart of Miracle on 34th Street.


Every December The BMJ’s statistical editors meet for a day, when they discuss common statistical concerns, problematic submissions (including those that slipped through the net, the so-called sin bin articles), and how to improve the review process, before unwinding at The BMJ’s Christmas party. At the meeting on 18 December 2019, the statisticians agreed that an article showcasing common statistical issues would be helpful for authors of future article submissions, and an initial set of items was discussed. When reminded about this article at subsequent Christmas meetings on 17 December 2020 and 16 December 2021, the statisticians explained that progress was being delayed, ironically because of the number of statistical reviews that needed to be prioritised in The BMJ’s system.

After further procrastination, on 28 June 2022 a potential list of items was shared among the statistical editors by email, and everyone was asked to include any further issues they regularly encountered during statistical review. The findings were collated and discussed (by email) and a final list of the most important items agreed for wider dissemination. Twelve items were selected, to match the number of days of Christmas in the well-known song (and thereby increase the chance of publication in The BMJ’s Christmas issue). Sensitivity analyses, including shallow and deep learning approaches, led to the same 12 items being selected. An automated artificial intelligence algorithm quickly identified that all the statistical editors were guilty of similar statistical faux pas in some of their own research articles, and so are not whiter than snow.

The 12 days of statistical review

To help drive them home for Christmas, the 12 identified items are briefly explained. Consider them as stocking fillers for you, The BMJ reader and potential future author. Allowing for sizeable Christmas meals, digest one item each day between 25 December and 5 January and make a New Year’s resolution to follow the guidance.

On the first day of Christmas, a statistician sent to me:

Clarify the research question

Christmas is a time for reflection on the meaning of life and future expectations. Similarly, in their reviews, statisticians will often encourage authors to reflect on their research question and clarify their objectives. For example, in an observational study, the authors may need to clarify the extent to which their research is descriptive or causal, prognostic factor identification or prediction model development, or exploratory or confirmatory. For causal research, authors may be asked to express the underlying premise (causal pathway or model), for example, in terms of a directed acyclic graph. In systematic reviews of intervention studies, authors might need to state their research question using the Population, Intervention, Comparison, and Outcome system (the PICO structure).

A related request would be to clarify the estimand, that is, the study’s target measure for estimation.7 In a randomised trial, for example, the estimand is a treatment effect, but a statistician might request better definitions for the population, treatments being compared, outcomes, summary measure (eg, risk ratio or risk difference, conditional or marginal effect), and other features.78 Similarly, in a meta-analysis of randomised trials the estimand must be defined in the context of potential heterogeneity of study characteristics. In a meta-analysis of hypertension trials with different lengths of follow-up, for example, if the estimand is a treatment effect on blood pressure, clarity is needed about whether this relates to one time point (eg, one year), each of several time points (eg, one year and five years), or some average across a range of time points (eg, six months to two years).

On the second day of Christmas, a statistician sent to me:

Focus on estimates, confidence intervals, and clinical relevance

Just as under-cooked turkeys are sent back, so will articles that focus solely on P values and “statistical significance” to determine whether a finding is important. It is important to consider the estimates (eg, mean differences, risk ratios, or hazard ratios corresponding to the specified estimands from the first day of Christmas), corresponding 95% confidence intervals, and potential clinical relevance of findings. Statistical significance often does not equate to clinical significance. If, for example, a large trial estimates a risk ratio of 0.97 and a 95% confidence interval of 0.95 to 0.99, then the treatment effect is potentially small, even though the P value is much less than 0.05. Conversely, absence of evidence does not mean evidence of absence.9 For example, if a small trial estimates a risk ratio of 0.70 and a 95% confidence interval of 0.40 to 1.10, then the magnitude of effect is still potentially large, even though the P value is greater than 0.05. Hence, the statistical editors will ask authors to clarify phrases such as “significant finding,” be less definitive when confidence intervals are wide, and consider results in the context of clinical relevance or impact. A bayesian approach may be helpful,10 to express probabilistic statements (eg, there is a probability of 0.85 that the risk ratio is <0.9).
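The two trial examples above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each 95% confidence interval is a Wald-type interval that is symmetric on the log scale, and the function name `p_value_from_ci` is hypothetical rather than taken from any package.

```python
import math
from statistics import NormalDist

def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided P value for a ratio estimate from its confidence interval.

    Assumption: the interval is symmetric around the estimate on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Large trial: small effect, yet P is well below 0.05
large_trial_p = p_value_from_ci(0.97, 0.95, 0.99)
# Small trial: potentially large effect, yet P is above 0.05
small_trial_p = p_value_from_ci(0.70, 0.40, 1.10)

print(f"Large trial: P = {large_trial_p:.3f}")
print(f"Small trial: P = {small_trial_p:.3f}")
```

The point of the sketch is that the P value mostly reflects precision: the large trial reaches "significance" with a 3% relative risk reduction, whereas the small trial does not with a 30% reduction.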

On the third day of Christmas, a statistician sent to me:

Carefully account for missing data

Missing values occur in all types of medical research,11 both for covariates and for outcomes. Authors need not only to acknowledge the completeness of their data but also to quantify and report the amount of missing data and explain how such data were handled in analyses. It is spooky how many submissions fail to do this: the ghost of Christmas articles past, present, and future.

If it transpires that participants with missing data were simply excluded (ie, a complete case analysis was carried out), then authors may be asked to revise their analyses by including those participants, using an appropriate approach for imputing the missing values. A complete case analysis is rarely recommended, especially in observational research, as discarding patients usually reduces statistical power and precision to estimate relationships and may also lead to biased estimates.12 The best approach for imputation is context specific and too nuanced for detailed interrogation here. For example, strategies for handling missing baseline values in randomised trials might include replacing with the mean value (for continuous variables), creating a separate category of a categorical predictor to indicate the presence of a missing value (ie, the missing indicator method), or multiple imputation performed separately by randomised group.1314 For observational studies examining associations, mean imputation and missing indicator approaches can lead to biased results,15 and so a multiple imputation approach is often (though not always16) preferred. Under a missing at random assumption, this involves missing values being imputed (on multiple occasions to reflect the uncertainty in the imputation) conditional on the observed values of other study variables.17 When using multiple imputation, the methods used to do this need to be described, including the set of variables used in the imputation process. An introduction to multiple imputation is provided elsewhere,12 and there are textbooks dedicated to missing data.18
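To show why imputing "on multiple occasions" reflects uncertainty, here is a minimal sketch of how estimates from several imputed datasets are commonly combined using Rubin's rules. The article does not name a pooling method, so this is an illustrative assumption, and the five log odds ratios and variances are invented numbers, not results from any study.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Pool estimates from m imputed datasets using Rubin's rules.

    estimates: one point estimate per imputed dataset
    within_variances: the squared standard error of each estimate
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)   # average sampling variance
    between = variance(estimates)     # spread across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical log odds ratios and variances from five imputed datasets
log_ors = [0.42, 0.47, 0.39, 0.45, 0.41]
se_squared = [0.010, 0.011, 0.009, 0.010, 0.010]

pooled_log_or, pooled_se = pool_rubin(log_ors, se_squared)
print(f"Pooled log odds ratio {pooled_log_or:.3f}, standard error {pooled_se:.3f}")
```

The pooled standard error is larger than the average within-imputation standard error because the between-imputation variance carries the extra uncertainty caused by the missing values.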

On the fourth day of Christmas, a statistician sent to me:

Do not dichotomise continuous variables

Santa likes dichotomisation (you are either naughty or nice), but statisticians would be appalled if authors chose to dichotomise continuous variables, such as age and blood pressure, by splitting them into two groups defined by being above and below some arbitrary cut point, such as a systolic blood pressure of 130 mm Hg. Dichotomisation should be avoided,1920 as it wastes information and is rarely justifiable compared with analysing continuous variables on their continuous scale (see the stocking filler for the fifth day of Christmas). Why should an individual with a value just below the cut point (in this instance 129 mm Hg) be considered completely different from an individual with a value just above it (131 mm Hg)? Conversely, the values for two individuals within the same group may differ greatly (let us say 131 mm Hg and 220 mm Hg) and so why should they be considered the same? In this context, dichotomisation might be considered unethical. Study participants agree to contribute their data for research on the proviso it is used appropriately; discarding information by dichotomising covariate values violates this agreement.

Dichotomisation also reduces statistical power to detect associations between a continuous covariate and the outcome,192021 and it attenuates the predictive performance of prognostic models.22 In one example, dichotomising at the median value led to a reduction in power akin to discarding a third of the data,23 whereas in another example, retaining the continuous scale explained 31% more outcome variability than dichotomising at the median.20 Cut points also lead to data dredging and the selection of “optimal” cut points to maximise statistical significance.21 This leads to bias and lack of replication in new data and hinders meta-analysis because different studies adopt different cut points. Dichotomisation of continuous outcomes also reduces power and may result in misleading conclusions.2425 An example is a randomised trial in which the required sample size was reduced from 800 to 88 after the outcome (Beck score) changed from being analysed as dichotomised to being analysed on its continuous scale.26
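The "discarding a third of the data" figure can be illustrated with a small simulation. The sketch below assumes a normally distributed covariate with a linear association with the outcome; under that assumption, splitting at the median is known to shrink the correlation by a factor of sqrt(2/pi), about 0.80, so the variance explained falls to about 64% of its original value. The sample size, correlation, and seed are arbitrary choices for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2022)
n, true_r = 50_000, 0.5

# Simulate a continuous covariate x and an outcome y correlated with it
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_r * xi + math.sqrt(1 - true_r ** 2) * random.gauss(0, 1) for xi in x]

# Dichotomise x at its median
median_x = sorted(x)[n // 2]
x_split = [1.0 if xi > median_x else 0.0 for xi in x]

attenuation = pearson(x_split, y) / pearson(x, y)
theoretical = math.sqrt(2 / math.pi)      # expected attenuation for a normal covariate
relative_efficiency = theoretical ** 2    # share of explained variance retained

print(f"Simulated attenuation of correlation: {attenuation:.2f}")
print(f"Theoretical attenuation: {theoretical:.2f}")
print(f"Relative efficiency after median split: {relative_efficiency:.2f}")
```

Other distributions, cut points, or non-linear associations would give different numbers, so this is a sketch of the mechanism rather than a general rule.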

On the fifth day of Christmas, a statistician sent to me:

Consider non-linear relationships

At Christmas dinner, some family relationships are simple to handle, whereas others are more complex and require greater care. Similarly, some continuous covariates have a simple linear relationship with an outcome (perhaps after some transformation of the data, such as a natural log transformation), whereas others have a more complex non-linear relationship. A linear relationship (association) assumes that a 1 unit increase in the covariate has the same effect on the outcome across the entire range of the covariate’s values. The assumption is, for example, that the impact of a change in age from 30 to 31 years is the same as a change in age from 90 to 91 years. In contrast, a non-linear association allows the impact of a 1 unit increase in the continuous covariate to vary across the spectrum of predictor values. For example, a change in age from 30 to 31 years may have little impact on risk, whereas a change in age from 90 to 91 years may be important. The two most common approaches to non-linear modelling are cubic splines and fractional polynomials.272829303132
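To give a feel for what a cubic spline adds to a model, the sketch below builds one common form of restricted cubic spline basis (the truncated power form, in which the curve is constrained to be linear beyond the outer knots). The function name `rcs_basis` and the age knots at 40, 60, and 80 years are hypothetical; in practice knots are usually placed at percentiles of the observed data, and statistical packages provide tested implementations.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x (truncated power form).

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. A regression on these
    terms is cubic between knots and linear beyond the first and last knots.
    """
    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")
    t_last, t_prev = t[-1], t[-2]
    scale = (t_last - t[0]) ** 2  # keeps the spline terms on the scale of x
    terms = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t_prev) * (t_last - t[j]) / (t_last - t_prev)
                + cube_plus(x - t_last) * (t_prev - t[j]) / (t_last - t_prev))
        terms.append(term / scale)
    return terms

age_knots = [40, 60, 80]  # hypothetical knot positions in years
for age in (30, 50, 70, 90):
    print(age, rcs_basis(age, age_knots))
```

With three knots the covariate enters the model as two terms instead of one, which lets the estimated effect of a 1 unit increase differ between, say, ages 30 and 90 without cutting age into categories.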

Aside from categorisation, most submissions to The BMJ only consider linear relationships. The statistical reviewers therefore may ask the researchers to consider non-linear relationships, to avoid important associations not being fully captured or even missed.33 The study by Johannesen and colleagues is an example of non-linear relationships being examined.34 The authors used restricted cubic splines to show that the association between low density lipoprotein cholesterol levels and the risk of all cause mortality is U-shaped, with low and high levels associated with an increased risk of all cause mortality in the general population of Denmark. Figure 1 illustrates the findings for the overall population, and for subgroups defined by use of lipid lowering treatment, with the relationship strongest in those not receiving treatment.

Fig 1

Non-linear association derived using restricted cubic splines of individuals from the Copenhagen General Population Study followed for a mean 9.4 years, from Johannesen et al.34 Multivariable adjusted hazard ratios for all cause mortality are shown according to levels of low density lipoprotein cholesterol (LDL-C) on a continuous scale. 95% confidence intervals are derived from restricted cubic spline regressions with three knots. Reference lines for no association are shown at a hazard ratio of 1.0. Arrows indicate concentration of LDL-C associated with the lowest risk of all cause mortality. Analyses were adjusted for baseline age, sex, current smoking, cumulative number of cigarette pack years, systolic blood pressure, lipid lowering treatment, diabetes, cardiovascular disease, cancer, and chronic obstructive pulmonary disease

On the sixth day of Christmas, a statistician sent to me:

Quantify differences in subgroup results

Many submitted articles include results for subgroups, such as those defined by sex or gender, or those who do and do not eat Brussels sprouts. A common mistake is to conclude that the results for one subgroup are different from the results of another subgroup, without actually quantifying the difference. Altman and Bland considered this eloquently,35 showing treatment effect results for two subgroups, the first of which was statistically significant (risk ratio 0.67, 95% confidence interval 0.46 to 0.98; P=0.03), whereas the second was not (0.88, 0.71 to 1.08; P=0.2). A naïve interpretation is to conclude that the treatment is beneficial for the first subgroup but not for the second subgroup. However, actually comparing the results between the two subgroups reveals a wide confidence interval (ratio of risk ratios 0.76, 95% confidence interval 0.49 to 1.17; P=0.2), which suggests further research is needed before concluding a subgroup effect. A related mistake is to make conclusions about whether subgroups differ based solely on whether their separate 95% confidence intervals overlap.36 Hence, if researchers examine subgroups in their study, the statistical editors will check for quantification of differences in subgroup results, and, if not done, ask for this to be addressed. Even when genuine differences exist between subgroups, the (treatment) effect may still be important for each subgroup, and therefore this should be recognised in study conclusions.
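The comparison of the two subgroups can be reproduced from the reported risk ratios and confidence intervals alone. The sketch below assumes the two subgroups are independent and that each interval is symmetric on the log scale; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def log_se_from_ci(lower, upper):
    """Standard error of a log risk ratio recovered from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

def compare_subgroups(rr1, ci1, rr2, ci2):
    """Ratio of two independent risk ratios, with 95% CI and two-sided P value."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(log_se_from_ci(*ci1) ** 2 + log_se_from_ci(*ci2) ** 2)
    ratio = math.exp(diff)
    lower = math.exp(diff - Z95 * se)
    upper = math.exp(diff + Z95 * se)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return ratio, lower, upper, p

# Subgroup results quoted in the text above
ratio, lower, upper, p = compare_subgroups(0.67, (0.46, 0.98), 0.88, (0.71, 1.08))
print(f"Ratio of risk ratios {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f}); P = {p:.1f}")
```

Under these assumptions the calculation agrees with the ratio of risk ratios, interval, and P value quoted in the text, which shows how little extra work is needed to quantify a subgroup difference rather than comparing P values.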

Examining differences between subgroups is complex, and a broader topic is the modelling of interactions between (treatment) effects and covariates.37 Problems include the scale used to measure the effect (eg, risk ratio or odds ratio),38 ensuring subgroups are not arbitrarily defined by dichotomising a continuous covariate,39 and allowing for potentially non-linear relationships (see our stocking fillers for the fourth day and fifth day of Christmas).40

On the seventh day of Christmas, a statistician sent to me:

Consider accounting for clustering

At The BMJ’s Christmas party, the statistical editors tend to cluster in a corner, avoiding interaction and eye contact with non-statisticians whenever possible for fear of being asked to conduct a postmortem examination of rejected work. Similarly, a research study may contain data from multiple clusters, including observational studies that use e-health records from multiple hospitals or practices, cluster or multicentre randomised trials,414243444546 and meta-analyses of individual participant data from multiple studies.47 Sometimes the analysis does not account for this clustering, which can lead to biased results or misleading confidence intervals.48495051 Ignoring clustering makes a strong assumption that outcomes for individuals within different clusters are similar to each other (eg, in terms of the outcome risk), which may be difficult to justify when clusters such as hospitals or studies have different clinicians, procedures, and patient case mix.

Thus, if, in the data analysis, a submitted article ignores obvious clustering that needs to be captured or considered, the statistical editors will ask for justification of this or for a reanalysis accounting for clustering using an approach suitable for the estimand of interest (see our stocking filler for the first day of Christmas).525354 A multilevel or mixed effects model might be recommended, for example, as this allows cluster specific baseline risks to be accounted for and enables between cluster heterogeneity in the effect of interest to be examined.
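One rough way to see why ignoring clustering can give misleading confidence intervals is the design effect used in cluster randomised trials. This is an illustrative assumption that the article itself does not discuss: it supposes equal cluster sizes and a single intracluster correlation coefficient (ICC), and the numbers below are invented. It is not a substitute for the multilevel or mixed effects models mentioned above.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def inflate_se(naive_se, cluster_size, icc):
    """Standard error after allowing for clustering via the design effect."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical example: 20 patients per hospital, ICC of 0.05
naive_se = 0.10
deff = design_effect(20, 0.05)
adjusted_se = inflate_se(naive_se, 20, 0.05)

print(f"Design effect {deff:.2f}")
print(f"Naive SE {naive_se:.3f} becomes {adjusted_se:.3f} after allowing for clustering")
```

Even a small ICC nearly doubles the variance in this hypothetical case, so a confidence interval that ignores clustering would be too narrow.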

On the eighth day of Christmas, a statistician sent to me:

Interpret I² and meta-regression appropriately

Systematic reviews and meta-analyses are popular submissions to The BMJ. Most of them include the I² statistic55 but interpret it incorrectly, which gives the statisticians a recurring nightmare before (and after) Christmas. I² describes the percentage of variability in (treatment) effect estimates that is due to between study heterogeneity rather than chance. The impact of between study heterogeneity on the summary treatment effect estimate is small if I² is close to 0%, and it is large if I² is close to 100%. A common mistake is for authors to interpret I² as a measure of the absolute amount of heterogeneity (ie, to consider I² as an estimate of the between study variance in true effects), and to erroneously use it to decide whether to use a random effects meta-analysis model. This is unwise, as I² is a relative measure and depends on the size of the within study variances of effect estimates, not just the size of the between study variance of true effects (also known as τ²). For example, if all the included studies are small, and thus within study variances of effect estimates are large, I² can be close to 0% even when the between study variance is large and important.56 Conversely, I² may be large even when the between study variance is small and unimportant. Statistical reviews will ask authors to correct any misuse of I², and to also present the estimate of between study variance directly.
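The dependence of I² on within study variances can be made concrete. The sketch below uses one standard way of expressing I², as τ² divided by the sum of τ² and a "typical" within study variance, and holds τ² fixed while changing only the size of the studies. The value of τ² and the within study variances are hypothetical.

```python
def typical_within_variance(variances):
    """'Typical' within study variance across k studies (weights are 1/variance)."""
    k = len(variances)
    w = [1 / v for v in variances]
    return (k - 1) * sum(w) / (sum(w) ** 2 - sum(wi ** 2 for wi in w))

def i_squared(tau2, variances):
    """I² (as a percentage) implied by a between study variance tau2."""
    s2 = typical_within_variance(variances)
    return 100 * tau2 / (tau2 + s2)

tau2 = 0.04                   # hypothetical between study variance of log risk ratios
small_studies = [0.50] * 8    # eight small studies: large within study variances
large_studies = [0.005] * 8   # eight large studies: small within study variances

print(f"I² with small studies: {i_squared(tau2, small_studies):.0f}%")
print(f"I² with large studies: {i_squared(tau2, large_studies):.0f}%")
```

The between study variance is identical in both scenarios, yet I² moves from close to 0% to close to 100%, which is why τ² should be reported alongside it.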

Meta-regression is often used to examine the extent to which study level covariates (eg, mean age, dose of treatment, risk of bias rating) explain between study heterogeneity, but generally the statistical editors will ask authors to interpret meta-regression results cautiously.57 Firstly, the number of trials is often small, and then meta-regression is affected by low power to detect study level characteristics that are genuinely associated with changes in the overall treatment effect in a trial. Secondly, confounding across trials is likely, and so making causal statements about the impact of trial level covariates is best avoided. For example, those trials with a higher risk of bias might also have the highest dose or be conducted in particular countries, thus making it hard to disentangle the effect of risk of bias from the effect of dose and country. Thirdly, the trial level association of aggregated participant level covariates (eg, mean age, proportion men) with the overall treatment effect should not be used to make inferences about how values of participant level covariates (eg, age, sex, biomarker values) interact with treatment effect. Aggregation bias may lead to dramatic differences in observed relationships at the trial level from those at the participant level,5859 as shown in figure 2.

Fig 2

Aggregation bias when using meta-regression of study level results rather than individual participant data meta-analysis of treatment-covariate interactions. The research question was whether blood pressure lowering treatment is more effective among women than men. Evidence is shown from a meta-analysis of 10 trials of antihypertensive treatment, comparing the across trial association of treatment effect and proportion men (solid line), which is steep and statistically significant, with participant level interactions of sex and treatment effect in each trial (dashed lines), which are flat and neither clinically nor statistically important. This case study is based on previous work.475860 Each block represents one trial, with block size proportional to trial size. Across trial association is denoted by gradient of solid line, derived from a meta-regression of the trial treatment effects against proportion of men, which suggests a large effect of a 15 mm Hg (95% confidence interval 8.8 to 21 mm Hg) greater reduction in systolic blood pressure in trials with only women compared with only men. However, the treatment-sex interaction based on participant level data is denoted by gradient of dashed lines within each trial, and on average these suggest only a 0.8 mm Hg (−0.5 to 2.1 mm Hg) greater treatment effect for women than for men, which is neither clinically nor statistically significant

On the ninth day of Christmas, a statistician sent to me:

Assess calibration of model predictions

Clinical prediction models estimate outcome values (for continuous outcomes) or outcome risks (for binary or time-to-event outcomes) to inform diagnosis and prognosis in individuals. Articles developing or validating prediction models often fail to fully evaluate model performance, which can have important consequences because inaccurate predictions can lead to incorrect decisions and harmful communication to patients, such as giving false reassurance or hope. For models that estimate outcome risk, predictive performance should be evaluated in terms of discrimination, calibration, and clinical utility, as described elsewhere.616263

However, the majority of submissions focus only on model discrimination (as quantified by, for example, the C statistic or area under the curve28). When this is done, an incomplete impression is created, just as with that unfinished 1000 piece jigsaw from last Christmas. Figure 3 shows a published calibration plot for a prediction model with a promising C statistic of 0.81, but there is clear (albeit perhaps small) miscalibration of predicted risks in the range of predicted risks between 0.05 and 0.2.64 This miscalibration may impact the clinical utility of the model, especially if decisions, such as about treatment or monitoring strategies, are dictated by risk thresholds in that range of predicted risks, which can be investigated in a decision curve analysis.65 Conversely, miscalibration does not necessarily indicate the model has no clinical utility, as it depends on the magnitude of miscalibration and where it occurs in relation to decision thresholds.

Fig 3

Example of a calibration plot to examine agreement between observed risks and estimated (predicted) risks from a prediction model.64 The study developed prediction models to estimate the risk of mortality in individuals who experienced subarachnoid haemorrhage from ruptured intracranial aneurysm. Circles are estimated and observed risks grouped by 10ths of estimated risks, and the yellow dashed line is a loess smoother to capture agreement across the range of estimated risks. AUROC=area under the receiver operating characteristic
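A toy example shows why discrimination alone gives an incomplete picture. In the sketch below, halving every predicted risk leaves the C statistic untouched (the ranking of individuals is unchanged) while calibration is badly off. The predicted risks and outcomes are invented, and the observed to expected ratio is used here only as a simple summary of overall calibration; it does not replace a calibration plot.

```python
def c_statistic(risks, outcomes):
    """Proportion of event/non-event pairs where the event has the higher risk.

    Tied predictions count as half.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = sum(1.0 if e > n else 0.5 if e == n else 0.0
                     for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))

def observed_expected_ratio(risks, outcomes):
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)

# Hypothetical predicted risks and observed outcomes for eight individuals
risks = [0.05, 0.10, 0.15, 0.30, 0.35, 0.45, 0.70, 0.90]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]
halved = [r / 2 for r in risks]  # same ranking, systematically too low

for label, preds in (("original", risks), ("halved", halved)):
    print(f"{label}: C statistic {c_statistic(preds, outcomes):.2f}, "
          f"O/E ratio {observed_expected_ratio(preds, outcomes):.2f}")
```

Both sets of predictions discriminate equally well, but the halved predictions would underestimate risk twofold, which could matter for decisions made at risk thresholds.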

Statistical editors may also suggest that researchers of model development studies undertake a reanalysis using penalisation or shrinkage methods (eg, ridge regression, lasso, elastic net), which reduce the potential for overfitting and help improve calibration of predictions in new data.6667 Penalisation methods, such as Firth’s correction,68 can also be important in non-prediction situations (eg, randomised trials estimating treatment effects) with sparse data, as standard methods (such as logistic regression) may give biased effect estimates in this situation.69

On the tenth day of Christmas, a statistician sent to me:

Carefully consider the variable selection approach

A common area of criticism in statistical reviews is the use of variable selection methods (eg, selection of covariates based on the statistical significance of their effects).70 If these methods are used, statistical editors will ask authors for justification. Depending on the study, statistical editors might even suggest authors avoid these approaches entirely, just as you would that last remaining turkey sandwich on New Year’s Day. For example, variable selection methods are best avoided in prognostic factor studies, as the typical aim is to provide an unbiased estimate of how a particular factor adds prognostic value over and above other (established) prognostic factors.71 Therefore, a regression model forcing in all the existing factors is needed to examine the prognostic effect of the new factor after accounting for the effect of existing prognostic factors. Similarly, in causal research based on observational data, the confounding factors to include as adjustment factors should be chosen based on the causal pathway, for example, as expressed using directed acyclic graphs (with consideration of potential mediators between covariates and outcome72), not statistical significance based on automated selection methods.

In the development of clinical prediction models, variable selection (through shrinkage) may be incorporated using methods such as lasso or elastic net, which start with a full model including all candidate predictors for potential inclusion. A common but inappropriate approach is to use univariable screening, when decisions for predictor inclusion are based on P values for observed unadjusted effect estimates. This is not a sensible strategy,73 as what matters is the effect of a predictor after adjustment for other predictors, because in practice the relevant predictors are used (by healthcare professionals and patients) in combination. When, for example, a prognostic model was being developed for risk of recurrent venous thromboembolism, the researchers found that the unadjusted prognostic effect of age was not statistically significant from univariable analysis but that the adjusted effect was significant and in the opposite direction from multivariable analysis.74

On the eleventh day of Christmas, a statistician sent to me:

Assess the impact of any assumptions

Everyone agrees that It’s a Wonderful Life is a Christmas movie, but whether this applies to Die Hard is debatable. Similarly, statistical editors might debate authors’ die-hard analysis assumptions, and even ask them to examine whether results change if the assumptions change (a sensitivity analysis). For example, in submitted trials with time-to-event data, such as time to recurrence or death, it is common to report the hazard ratio, assuming it is constant over the whole follow-up period. If this assumption is not justified in an article, authors may be asked to address this, for example, by graphically presenting how the hazard ratio changes over time (perhaps based on a survival model that includes an interaction between the covariate of interest and (log) time).75 Another example is in submissions with bayesian analyses, where prior distributions are labelled as “vague” or “non-informative” but may still be influential. In this situation, authors may be asked to demonstrate how results change when other plausible prior distributions are chosen.
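A sensitivity analysis to the prior distribution can be sketched with a simple conjugate calculation. The example below is an illustration under stated assumptions, not a recommended analysis: it treats the log risk ratio from the small trial described on the second day of Christmas (risk ratio 0.70, 95% confidence interval 0.40 to 1.10) as normally distributed, and the two normal priors (labelled "vague" and "sceptical") are hypothetical choices.

```python
import math
from statistics import NormalDist

def posterior_log_rr(prior_mean, prior_sd, estimate, se):
    """Conjugate normal update for a log risk ratio (normal prior, normal likelihood)."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return NormalDist(post_mean, math.sqrt(post_var))

# Trial result on the log scale, with the standard error recovered from the 95% CI
estimate = math.log(0.70)
se = (math.log(1.10) - math.log(0.40)) / (2 * 1.96)

# Hypothetical priors for the log risk ratio: (mean, standard deviation)
priors = {"vague": (0.0, 10.0), "sceptical": (0.0, 0.2)}

prob_rr_below_09 = {}
for label, (prior_mean, prior_sd) in priors.items():
    posterior = posterior_log_rr(prior_mean, prior_sd, estimate, se)
    prob_rr_below_09[label] = posterior.cdf(math.log(0.9))
    print(f"{label} prior: P(risk ratio < 0.9) = {prob_rr_below_09[label]:.2f}")
```

The posterior probability that the risk ratio is below 0.9 changes noticeably between the two priors, which is the kind of dependence authors may be asked to report.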

On the twelfth day of Christmas, a statistician sent to me:

Use reporting guidelines and avoid overinterpretation

Altman once said, “Readers should not have to infer what was probably done, they should be told explicitly. Proper methodology should be used and be seen to have been used.”76 Incompletely reported research is indefensible and creates confusion, just as with those unlabelled presents under the Christmas tree. Readers need to know the rationale and objectives of a reported study, the study design, methods used, participant characteristics, results, certainty of evidence, research implications, and so forth. If any of these elements are missing, authors will be asked to clarify them.

Make use of reporting guidelines. They provide a checklist of items to be reported (Santa suggests checking this twice), which represent the minimum detail required to enable readers (including statistical editors) to understand the research and critically appraise its findings. Reporting guidelines are listed on The EQUATOR Network website, which maintains a comprehensive collection of guidelines and other materials related to health research reporting.77 Table 1 shows examples, including the CONSORT statement for randomised trials79 and the TRIPOD guideline for prediction model studies.8081 The BMJ requires authors to complete the checklist within the relevant guideline (and include it with a submission), indicating on which page of the submitted manuscript each item has been reported.

Table 1

Examples of reporting guidelines and their extensions for different study designs

Another common part of the statistical editors’ review process, related to reporting, is to query overinterpretation of findings, and even spin,82 such as unjustified claims of causality, generalisability of results, or immediate implications for clinical practice. Incorrect terminology is another bugbear, in particular the misuse of multivariate (rather than multivariable) to refer to a regression model with multiple covariates (variables), and the misuse of quantiles to refer to groups rather than the cut points used to create the groups (eg, deciles are the nine cut points used to create 10 equal sized groups called 10ths).83
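The distinction between cut points and groups can be checked directly. The short Python sketch below uses made-up data (the integers 1 to 100, assumed purely for illustration) and the standard library to show that the deciles are nine values, which in turn define 10 groups.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

# Made-up data for illustration only: the integers 1 to 100.
values = list(range(1, 101))

# The deciles are the cut points: n=10 asks for the 9 values that split the data into 10 parts.
deciles = quantiles(values, n=10)

# The 10ths are the groups: each value is assigned to a group (0 to 9) by its position
# relative to the cut points.
tenths = [bisect_right(deciles, v) for v in values]
group_sizes = Counter(tenths)

if __name__ == "__main__":
    print("Number of deciles (cut points):", len(deciles))
    print("Number of 10ths (groups):", len(group_sizes))
    print("Group sizes:", sorted(group_sizes.items()))
```

So a sentence such as “participants in the highest decile” is better written as “participants in the highest 10th”, since the decile is the boundary and not the group.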


This list of 12 statistical issues routinely encountered during peer review of articles submitted to The BMJ will hopefully help authors of future submissions. Last Christmas statistical editors tweeted this list, but the very next day they got poor submissions anyway. This year, to save them from tears, they’ve tailored it for someone special: you, The BMJ reader.

Authors should address this list before rushing to submit papers to The BMJ next Christmas, in order to bring joy to the world by reducing the length of statistical reviews and allowing the statistical editors to spend more time with their significant (yes, pun intended) others over the festive period. If authors did adhere to this guidance, the “On the twelfth Day of Christmas” song would change to the very positive “On the twelfth Day of Christmas Review” with lyrics reflecting feedback from a happy statistician (perhaps join in using the song sheet in figure 4).

Fig 4

Song sheet for “On the twelfth Day of Christmas Review, a Happy Statistician despatched to me . . .”

Ultimately, The BMJ wants to publish the gold not the mould, the frankincense not the makes-no-sense, and the myrrh not the urrgghh. Many other topics could have been mentioned, and for further guidance readers are directed to the BMJ Statistics Notes series (written primarily by Doug Altman and Martin Bland), the Research Methods and Reporting section of The BMJ,84 and other overviews of common statistical errors.8586


This article is dedicated to Doug Altman and Martin Gardner, who led by example as chief statistical editors at The BMJ for over 30 years. We also thank the researchers who have responded politely to our statistical reviews over many years.


  • Contributors: RR conceived the paper and generated the Christmas theme and initial set of 12 stocking fillers. RR and GC produced the first draft of the article, including the explanatory text for each item and examples. All authors provided comments and suggested changes, which were then addressed by RR and GC. RR revised the article on the basis of reviewer comments, followed by suggestions and final approval by all authors.

  • Competing interests: We have read and understood the BMJ Group policy on declaration of interests and declare: none.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
