Risk of Bias Assessment Template for Systematic Reviews

Gatsbi

A strong systematic review does not just collect studies; it evaluates how much trust readers should place in each result. That is the role of risk of bias assessment. In contemporary evidence synthesis, the core tools are domain-based and question-driven: RoB 2 for randomized trials, ROBINS-I for non-randomized studies of interventions, and QUADAS-2 or, for new diagnostic reviews, increasingly QUADAS-3 for diagnostic test accuracy studies. These tools are built around structured domains, signaling questions, and transparent justifications rather than numerical “quality scores.” Their output should directly shape how you present findings, grade certainty with GRADE, and decide whether meta-analysis results need restriction, stratification, or sensitivity analysis. 

This article is therefore structured to do three things at once: help readers understand RoB assessment, give them a ready-to-use template, and show them how to apply it accurately. Downloadable templates are included below in spreadsheet and CSV form, and the embedded tables are designed to be printed or pasted into Excel, Google Sheets, Word, Notion, or RevMan workflows. The downloadable workbook and CSV files in this post are simplified working templates aligned to official domain structures; for final review conduct, authors should still consult the full official guidance for the selected tool.

Why Risk of Bias Assessment Matters in Systematic Reviews

Risk of bias is about systematic error in a study result, not about whether authors wrote the paper clearly, and not about random error or imprecision. A study can be precise but biased, or imprecise but methodologically sound. Cochrane distinguishes bias from both imprecision and external validity, and PRISMA separates reporting of study risk of bias from reporting bias due to missing results in a synthesis. 

That distinction matters because modern RoB tools are designed to assess the result being used in the review, not merely a study “as a whole.” RoB 2 explicitly focuses on a specific result from a randomized trial. ROBINS-I does the same for a specific result from a non-randomized intervention study, and QUADAS-3 now moves diagnostic accuracy appraisal toward estimate-level assessment as well. This result-level orientation is one of the biggest practical shifts in contemporary evidence synthesis, because one outcome or timepoint from the same study may be at low risk of bias while another is not. 

RoB assessment is also not optional if the review aims to be methodologically credible. Cochrane’s handbook treats it as a core step in review conduct, PRISMA 2020 requires transparent reporting of how it was done, and GRADE uses risk of bias as one of the five domains that can lower certainty of evidence for an outcome. WHO’s guideline-development handbook similarly embeds evidence appraisal and GRADE-based reasoning into decision-making. 

In practice, a good RoB assessment improves a review in three ways. First, it prevents overconfident conclusions from flawed studies. Second, it helps readers understand differences between results that are methodologically stronger versus weaker. Third, it provides the bridge from study appraisal to synthesis decisions, such as whether the main meta-analysis should be restricted to low-risk studies, whether subgroup analyses by bias status are justified, and whether certainty should be downgraded in the Summary of Findings table. 

Which Risk of Bias Tool Should You Use

The short answer is simple: use the tool that matches the study design and question type. For intervention effects in randomized trials, use RoB 2. For non-randomized intervention studies, use ROBINS-I. For diagnostic test accuracy studies, QUADAS-2 remains widely used, but the University of Bristol’s QUADAS project now identifies QUADAS-3 as the current recommended version. RoB 2 also has specific variants for cluster-randomized and crossover trials. Beyond primary-study tools, ROB-ME evaluates bias due to missing evidence in a synthesis, ROBIS evaluates risk of bias in systematic reviews, and PROBAST addresses diagnostic/prognostic prediction model studies. The Newcastle–Ottawa Scale remains common in legacy observational reviews, but it is a star-based assessment tool, not an algorithm-guided result-level bias tool like RoB 2 or ROBINS-I. 

Concise comparison table

| Tool | Best use case | Core domains | Signaling questions | Typical output |
| --- | --- | --- | --- | --- |
| RoB 2 | Individually randomized parallel-group trials | Randomization; deviations from intended interventions; missing outcome data; measurement of outcome; selection of reported result | Yes, algorithm-guided | Low risk / Some concerns / High risk |
| RoB 2 for cluster trials | Cluster-randomized trials, including stepped-wedge contexts | RoB 2 domains plus recruitment/identification timing issues specific to clusters | Yes, algorithm-guided | Low risk / Some concerns / High risk |
| RoB 2 for crossover trials | Two-period crossover RCTs | RoB 2 framework with crossover-specific attention to carryover and period effects | Yes, algorithm-guided | Low risk / Some concerns / High risk |
| ROBINS-I | Non-randomized studies of interventions | Confounding; selection into study; classification of interventions; deviations from intended interventions; missing data; measurement of outcomes; selection of reported result | Yes, algorithm-guided | Low / Moderate / Serious / Critical / No information |
| QUADAS-2 | Diagnostic test accuracy studies | Patient selection; index test; reference standard; flow and timing; plus applicability concerns in first three domains | Yes, tailored to review question | Domain judgments for risk of bias and applicability |
| QUADAS-3 | New diagnostic accuracy reviews and estimate-level DTA appraisal | Participants; index test; target condition; analysis; plus applicability in first three domains | Yes, tailored and estimate-level | Domain and overall judgments for risk of bias and applicability |
| ROB-ME | Bias due to missing evidence in pairwise synthesis/meta-analysis | Missing studies and missing results within studies at the synthesis level | Structured judgments | Risk of bias due to missing evidence in a synthesis |
| ROBIS | Overviews/umbrella reviews appraising existing systematic reviews | Review-level concerns with review process and review bias | Structured, phased | Review-level risk of bias |
| PROBAST | Diagnostic and prognostic prediction model studies | Participants; predictors; outcome; analysis | Yes | Risk of bias and applicability |
| Newcastle–Ottawa Scale | Legacy cohort/case-control appraisal workflows | Selection; comparability; exposure/outcome | Item checklist with stars | Star-based summary |

Table note. RoB 2 structure and judgments are summarized from Cochrane and Cochrane Handbook guidance; cluster and crossover variants from Cochrane Chapter 23 and riskofbias.info; ROBINS-I from Cochrane Chapter 25; QUADAS-2 and QUADAS-3 from the Bristol QUADAS project and original publications; ROB-ME, ROBIS, PROBAST, and NOS from their official project sites or foundational papers. 

A practical selection rule works well for most reviews. If your review question asks, “What is the effect of an intervention?” and the included studies are randomized, choose RoB 2. If allocation was not randomized, choose ROBINS-I and explicitly specify the target trial and key confounders before you begin. If the review asks, “How accurate is this test?” use QUADAS-2 if you need continuity with older reviews, but for a new review begun now, consider QUADAS-3 because it is the current recommended iteration and shifts appraisal from study-level to estimate-level judgments.
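The selection rule above can be written down as a small lookup, which is sometimes useful when a review protocol needs to document the choice explicitly. This is only a sketch: the function name `select_rob_tool` and its argument values are hypothetical, and it covers only the three cases named in the paragraph, not cluster or crossover variants.

```python
def select_rob_tool(question: str, randomized: bool = False, new_review: bool = True) -> str:
    """Suggest a tool from the question type and study design.

    `question` is a hypothetical label: "intervention" or "diagnostic_accuracy".
    """
    if question == "intervention":
        # Randomized allocation -> RoB 2; otherwise ROBINS-I.
        return "RoB 2" if randomized else "ROBINS-I"
    if question == "diagnostic_accuracy":
        # New reviews may prefer QUADAS-3; updates of older reviews may keep QUADAS-2.
        return "QUADAS-3" if new_review else "QUADAS-2"
    raise ValueError(f"No selection rule for question type: {question!r}")


print(select_rob_tool("intervention", randomized=True))
print(select_rob_tool("intervention", randomized=False))
print(select_rob_tool("diagnostic_accuracy", new_review=False))
```

A lookup like this does not replace judgment; it only records which branch of the rule a review team followed.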

Downloadable and Printable Risk of Bias Assessment Template

You can download the templates used in this article here:

The templates below are intentionally fillable, printable, and tool-aligned. They preserve official domain names and judgment categories, avoid homemade scoring systems, and include fields for the exact result being assessed, support for judgment, and reviewer consensus. That structure reflects PRISMA 2020 reporting expectations and the logic of RoB 2, ROBINS-I, and QUADAS.

Master fields to include in every template

| Field | What to enter |
| --- | --- |
| Review ID | Internal review name or registration ID |
| Study ID | Citation, trial registry, or author-year |
| Report ID | If multiple reports exist for the same study |
| Study design | RCT, cluster RCT, crossover RCT, cohort, case-control, DTA, etc. |
| Result assessed | Outcome, timepoint, analysis population, and effect estimate |
| Tool and version | RoB 2, ROBINS-I, QUADAS-2, QUADAS-3, etc. |
| Reviewer 1 | Initial judgment |
| Reviewer 2 | Initial judgment |
| Consensus judgment | Final agreed judgment |
| Support for judgment | Short quotation or evidence note |
| Notes | Protocol deviations, author contact, assumptions |
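For teams that keep assessments in CSV form, the master fields can be mirrored in a small record type. The sketch below is one possible layout, not the structure of the downloadable files: the class name `RobRecord`, the snake_case column names, and the example values are assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RobRecord:
    """One row per assessed result; field names mirror the master fields table."""
    review_id: str
    study_id: str
    report_id: str
    study_design: str
    result_assessed: str
    tool_and_version: str
    reviewer_1: str
    reviewer_2: str
    consensus_judgment: str
    support_for_judgment: str
    notes: str = ""


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RobRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical example row.
row = RobRecord(
    review_id="Diabetes-SR-2026",
    study_id="Smith 2025",
    report_id="Primary report",
    study_design="RCT",
    result_assessed="HbA1c at 24 weeks",
    tool_and_version="RoB 2",
    reviewer_1="Some concerns",
    reviewer_2="Low",
    consensus_judgment="Some concerns",
    support_for_judgment="No publicly accessible SAP",
)
print(to_csv([row]))
```

Keeping one row per result, rather than one row per study, matches the result-level orientation of the tools.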

Template for randomized controlled trials

RoB 2 contains five mandatory domains for individually randomized parallel-group trials. Assessment is result-specific, and answers to signaling questions are mapped by algorithm to domain judgments of Low risk, Some concerns, or High risk. The overall judgment is at least as severe as the worst domain judgment, unless reviewers justify an override.

| Section | Fillable field | Example entry |
| --- | --- | --- |
| Study details | Review ID | Diabetes-SR-2026 |
| Study details | Study ID / citation | Smith 2025 |
| Study details | Outcome / timepoint | HbA1c at 24 weeks |
| Study details | Effect of interest | Effect of assignment to intervention |
| Domain 1 | Bias arising from the randomization process | Low |
| Domain 1 support | Support for judgment | Allocation sequence computer-generated; central allocation concealment |
| Domain 2 | Bias due to deviations from intended interventions | Some concerns |
| Domain 2 support | Support for judgment | Participants unblinded; limited evidence of differential co-interventions |
| Domain 3 | Bias due to missing outcome data | Some concerns |
| Domain 3 support | Support for judgment | 11% attrition; reasons partially balanced but outcome related to dropout plausible |
| Domain 4 | Bias in measurement of the outcome | Low |
| Domain 4 support | Support for judgment | Lab-based HbA1c measured identically across groups |
| Domain 5 | Bias in selection of the reported result | Some concerns |
| Domain 5 support | Support for judgment | No publicly accessible SAP; multiple timepoints mentioned |
| Overall | Overall RoB 2 judgment | Some concerns |
| Overall | Likely direction of bias | Unclear |

How to fill it in. Enter the exact result first, then answer each domain using the official signaling-question logic. Keep justifications brief and evidence-based. If the trial is cluster-randomized, add the cluster-specific recruitment/identification domain; if it is crossover, explicitly inspect carryover and period effects. 
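The worst-domain rule for the overall RoB 2 judgment can be checked with a few lines of code. This is a minimal sketch rather than the official RoB 2 algorithm: it only computes the floor (the most severe domain judgment), and reviewers may still justify a more severe overall judgment. The names `SEVERITY` and `overall_floor` are hypothetical.

```python
# Severity order for RoB 2 judgments, least to most severe.
SEVERITY = {"Low": 0, "Some concerns": 1, "High": 2}


def overall_floor(domain_judgments):
    """Return the most severe domain judgment, i.e. the minimum overall rating."""
    if not domain_judgments:
        raise ValueError("At least one domain judgment is required")
    return max(domain_judgments.values(), key=lambda judgment: SEVERITY[judgment])


# Domain judgments taken from the example table above.
example = {
    "Randomization process": "Low",
    "Deviations from intended interventions": "Some concerns",
    "Missing outcome data": "Some concerns",
    "Measurement of the outcome": "Low",
    "Selection of the reported result": "Some concerns",
}
print(overall_floor(example))
```

For the example table this yields Some concerns, which agrees with the overall judgment recorded there.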

Template for non-randomized studies of interventions

ROBINS-I demands more front-end specification than RoB 2. Cochrane recommends that reviewers define important confounding domains and co-interventions in the protocol and describe a target trial that the non-randomized study is trying to emulate. It also emphasizes that NRSI appraisal requires both methodological expertise and subject-matter expertise. 

| Section | Fillable field | Example entry |
| --- | --- | --- |
| Study details | Review ID | Asthma-SR-2026 |
| Study details | Study ID / citation | Lee 2024 |
| Study details | Target trial specification | Adults with uncontrolled asthma initiating biologic A vs biologic B |
| Study details | Outcome / timepoint / estimate | Exacerbations at 12 months; adjusted HR |
| Pre-intervention | Bias due to confounding | Serious |
| Support | Support for judgment | Disease severity and eosinophil count incompletely adjusted |
| Pre-intervention | Bias in selection of participants into the study | Moderate |
| Support | Support for judgment | New-user design used, but index date capture imperfect |
| At intervention | Bias in classification of interventions | Low |
| Support | Support for judgment | Pharmacy and prescribing records concordant |
| Post-intervention | Bias due to deviations from intended interventions | Moderate |
| Support | Support for judgment | Differential add-on therapy possible after start of follow-up |
| Post-intervention | Bias due to missing data | Low |
| Support | Support for judgment | Outcome capture via claims database near-complete |
| Post-intervention | Bias in measurement of outcomes | Low |
| Support | Support for judgment | Hospitalized exacerbation objectively recorded |
| Post-intervention | Bias in selection of the reported result | Moderate |
| Support | Support for judgment | Multiple adjustment sets reported; analysis plan unavailable |
| Overall | Overall ROBINS-I judgment | Serious |

How to fill it in. Start by stating the causal contrast and the target trial. Then document whether confounders were measured validly, whether follow-up begins at intervention start, how interventions were classified, whether post-baseline deviations matter for your effect of interest, and whether the reported estimate may have been selected from multiple analyses. ROBINS-I uses Low, Moderate, Serious, Critical, and No information. An overall Critical judgment means the study should not contribute useful evidence to the synthesis. 
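A similar check can be sketched for ROBINS-I, with two differences: there are five categories, and a Critical overall judgment flags the study as unsuitable for synthesis. The handling of No information below is an assumption for illustration (it is returned only when no rated domain is Serious or Critical and at least one domain lacks information); consult the official guidance before relying on it. The function names are hypothetical.

```python
# Severity order for rated ROBINS-I judgments, least to most severe.
ORDER = ["Low", "Moderate", "Serious", "Critical"]


def robins_i_overall(domain_judgments):
    """Propose an overall judgment from domain-level judgments (sketch only)."""
    rated = [j for j in domain_judgments.values() if j != "No information"]
    if not rated:
        return "No information"
    worst = max(rated, key=ORDER.index)
    has_gap = len(rated) < len(domain_judgments)
    # Assumption: missing information only changes the result when nothing
    # already points to Serious or Critical risk of bias.
    if has_gap and worst in ("Low", "Moderate"):
        return "No information"
    return worst


def eligible_for_synthesis(overall):
    """Critical studies should not contribute to the synthesis."""
    return overall != "Critical"


# Domain judgments taken from the example table above.
example = {
    "Confounding": "Serious",
    "Selection of participants": "Moderate",
    "Classification of interventions": "Low",
    "Deviations from intended interventions": "Moderate",
    "Missing data": "Low",
    "Measurement of outcomes": "Low",
    "Selection of the reported result": "Moderate",
}
overall = robins_i_overall(example)
print(overall, eligible_for_synthesis(overall))
```

As with RoB 2, the output is a proposal that reviewers confirm or override with a written justification.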

Template for diagnostic accuracy studies

QUADAS-2 evaluates four domains—patient selection, index test, reference standard, and flow and timing—and also judges applicability for the first three domains. It is applied in four phases: summarize the review question, tailor the tool and guidance, build a flow diagram, and assess bias/applicability. The Bristol QUADAS program now states that QUADAS-3 is the current recommended version, but QUADAS-2 remains common and still useful, especially when updating older reviews or matching existing review methods. 

| Section | Fillable field | Example entry |
| --- | --- | --- |
| Study details | Review ID | Sepsis-DTA-2026 |
| Study details | Study ID / citation | Patel 2023 |
| Study details | Index test | Plasma biomarker X |
| Study details | Reference standard | Clinical diagnosis by adjudication committee |
| Study details | Accuracy estimate assessed | Sensitivity and specificity at prespecified threshold |
| Risk of bias | Patient selection | High |
| Support | Support for judgment | Case-control sampling with exclusions of indeterminate cases |
| Risk of bias | Index test | Low |
| Support | Support for judgment | Threshold prespecified before analysis |
| Risk of bias | Reference standard | Unclear or High (QUADAS-2 rates each domain Low / High / Unclear) |
| Support | Support for judgment | Adjudication may have partial awareness of biomarker data |
| Risk of bias | Flow and timing | High |
| Support | Support for judgment | Not all participants received the same reference standard; delayed verification |
| Applicability | Patient selection applicability | High |
| Applicability | Index test applicability | Low |
| Applicability | Reference standard applicability | Unclear |
| Overall | Overall summary | High risk of bias; applicability concerns in patient spectrum |

How to fill it in. Tailor the signaling questions to the review’s test, setting, and target condition before study appraisal begins. If this is a brand-new diagnostic review, consider whether QUADAS-3 is the better primary framework because it introduces the “ideal test accuracy trial” concept and estimate-level assessment. 

How to Apply the Template and Turn It Into Better Evidence Synthesis

A disciplined RoB workflow begins before you ever open the first article PDF. PRISMA 2020 expects reviewers to prespecify the tool and version, describe the domains used, state how many reviewers assessed each study, indicate whether they worked independently, and explain how disagreements were resolved. Cochrane also recommends deciding in advance which results matter most, especially those intended for the Summary of Findings table. 

A disciplined RoB workflow

That workflow is especially important because RoB 2 and ROBINS-I are structured, but not mechanical. The signaling questions guide judgment; they do not replace it. Cochrane notes that reviewers may override algorithm-generated proposals when justified, and for ROBINS-I the process is more involved because confounding and causal structure must be considered explicitly. 

Common pitfalls to avoid

The most common error is assessing the study in general instead of the specific result used in the review. A second frequent error is mixing up risk of bias with reporting quality, certainty, or applicability. A third is inventing a numeric total score from multiple domains. Official tools are domain-based and designed to preserve the mechanism of bias rather than collapse diverse problems into a single homemade score. For diagnostic reviews, the parallel mistake is to ignore applicability concerns even though QUADAS explicitly separates risk of bias from applicability.

Another major pitfall in non-randomized studies is starting ROBINS-I without defining confounders and the target trial. Cochrane is explicit that review teams should specify important confounding domains and co-interventions in the protocol and that both methodological and content expertise are recommended. Without that setup, consensus conversations turn subjective very quickly. 

For cluster trials and crossover trials, the common mistake is to assume the standard RoB 2 worksheet is enough. Cluster trials require explicit attention to identification/recruitment after cluster randomization and to unit-of-analysis issues. Crossover trials require attention to carryover and period effects, and Cochrane recommends that review authors state clearly how crossover data were handled and consider sensitivity analyses when assumptions are uncertain. 

Tips for better inter-rater reliability

The most effective reliability strategy is calibration before full extraction. Have both reviewers assess the same two to five studies first, then reconcile wording, thresholds, and what counts as sufficient evidence for each domain. PRISMA 2020 supports this independence-and-reconciliation model by asking authors to state whether multiple reviewers worked independently and how disagreements were resolved. 

Cochrane also gives a practical tip that is often overlooked: if you calculate agreement statistics for signaling-question responses, treat Yes and Probably yes as the same response, and No and Probably no as the same response. It also advises that signaling questions be answered independently rather than allowing one answer to drive another except where the tool explicitly requires branching. 
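The collapsing rule can be applied before computing an agreement statistic. The sketch below uses Cohen's kappa as one common choice; the source does not prescribe a particular statistic, and the reviewer responses shown are hypothetical.

```python
from collections import Counter

# Collapse "Probably" answers into their definite counterparts.
COLLAPSE = {
    "Yes": "Yes",
    "Probably yes": "Yes",
    "No": "No",
    "Probably no": "No",
    "No information": "No information",
}


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters after collapsing signaling-question responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of responses")
    a = [COLLAPSE[x] for x in rater_a]
    b = [COLLAPSE[x] for x in rater_b]
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical responses to six signaling questions.
reviewer_1 = ["Yes", "Probably yes", "No", "Probably no", "No information", "Yes"]
reviewer_2 = ["Probably yes", "Yes", "Probably no", "No", "No information", "No"]
print(round(cohen_kappa(reviewer_1, reviewer_2), 3))
```

Without collapsing, the first four pairs in this example would count as disagreements even though the reviewers reached the same practical answer.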

Good reliability also depends on a clean evidence trail. Store the quotation, registry entry, protocol excerpt, supplement, or author correspondence that supports each judgment. That is consistent with Cochrane’s insistence on “support for judgment” and it makes arbitration faster, more transparent, and much easier to defend during peer review. 

How to incorporate RoB into GRADE and meta-analysis

RoB should directly influence synthesis, not sit in an appendix. Cochrane Chapter 7 recommends several strategies when risks of bias vary across studies: restrict the primary analysis to low-risk studies; stratify or subgroup analyses by overall RoB or specific domains; present all studies but explicitly account for RoB in certainty judgments; or, less commonly, use statistical bias adjustment methods. Cochrane does not currently recommend formal weighting schemes that mechanically assign study weights based on RoB categories, because those methods are not sufficiently developed for routine review use. 

In many intervention reviews, the most defensible workflow is to present the all-studies meta-analysis, then run a sensitivity analysis restricted to low-risk studies and report whether conclusions materially change. If enough studies exist, you can also use subgroup analysis or meta-regression to test whether results differ by RoB category, while recognizing that these analyses are often underpowered. Lack of a statistically significant subgroup difference does not prove absence of bias. 
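The all-studies versus low-risk-only comparison can be illustrated with a small calculation. This sketch assumes a fixed-effect inverse-variance model purely to keep the arithmetic short; the source does not specify a pooling model, and the three studies and their numbers are hypothetical.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / (s["se"] ** 2) for s in studies]
    total = sum(weights)
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical effect estimates (e.g. mean differences) with overall RoB judgments.
studies = [
    {"id": "Study A", "effect": -0.5, "se": 0.1, "rob": "Low"},
    {"id": "Study B", "effect": -0.3, "se": 0.1, "rob": "Low"},
    {"id": "Study C", "effect": -0.9, "se": 0.2, "rob": "High"},
]

# Primary analysis: all studies.
all_est, all_se = pool_fixed_effect(studies)

# Sensitivity analysis: restrict to studies judged at low risk of bias.
low_only = [s for s in studies if s["rob"] == "Low"]
low_est, low_se = pool_fixed_effect(low_only)

print(f"All studies:   {all_est:.3f} (SE {all_se:.3f})")
print(f"Low risk only: {low_est:.3f} (SE {low_se:.3f})")
```

In this made-up example the pooled estimate moves from about -0.46 to -0.40 when the high-risk study is dropped; whether a shift of that size is material is a judgment the review team has to report and justify.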

In GRADE, risk of bias is one of the five downgrade domains. Cochrane Chapter 14 explicitly states that study-level assessments should feed directly into the GRADE “risk of bias” domain. For randomized trials, “Low” RoB generally means no limitation, “Some concerns” may or may not justify one level of downgrading depending on likely impact, and “High” RoB can indicate serious or very serious limitations. For ROBINS-I, a “Critical” judgment indicates extremely serious limitations; Cochrane further notes that critically biased non-randomized studies should not be included in synthesis. 

At the synthesis level, do not forget missing evidence. Individual-study RoB tools do not replace assessment of reporting bias or missing studies. Cochrane’s ROB-ME was developed specifically for risk of bias due to missing evidence in a pairwise synthesis or meta-analysis. For advanced review authors, ROB-ME is worth knowing because it clarifies the difference between “selection of the reported result” within a study and missing results across the synthesis.

Cochrane also recommends visually displaying RoB judgments alongside meta-analysis results where possible. Forest plots stratified by RoB and traffic-light plots can help readers see whether the most influential studies are also the most trustworthy. The robvis tool and package were created specifically to generate publication-quality traffic-light and bar plots formatted for tools such as RoB 2, ROBINS-I, and QUADAS-2.
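Before plotting, it can help to tabulate how many results fall into each judgment per domain, since that is the information a summary bar plot displays. The sketch below is plain Python and is not robvis or its API; the function name, the data layout, and the three assessments are hypothetical.

```python
from collections import Counter


def summarize_by_domain(assessments):
    """Count judgments per domain across assessed results."""
    summary = {}
    for assessment in assessments:
        for domain, judgment in assessment["domains"].items():
            summary.setdefault(domain, Counter())[judgment] += 1
    return summary


# Hypothetical RoB 2 assessments, shortened to two domains for brevity.
assessments = [
    {"study": "Study A", "domains": {"Randomization": "Low", "Missing data": "Some concerns"}},
    {"study": "Study B", "domains": {"Randomization": "Low", "Missing data": "High"}},
    {"study": "Study C", "domains": {"Randomization": "High", "Missing data": "Some concerns"}},
]

for domain, counts in summarize_by_domain(assessments).items():
    total = sum(counts.values())
    shares = ", ".join(f"{j}: {c}/{total}" for j, c in sorted(counts.items()))
    print(f"{domain}: {shares}")
```

A table like this also makes it easy to spot which domain drives most of the High or Some concerns judgments before deciding on stratified analyses.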

Conclusion

A risk of bias assessment template is more than a checklist—it is the bridge between collecting studies and making trustworthy conclusions. In a systematic review, every included study should be judged not only by what it reports, but by how reliably its methods support the result being used. Choosing the right tool, such as RoB 2 for randomized trials, ROBINS-I for non-randomized intervention studies, or QUADAS for diagnostic accuracy reviews, helps reviewers make transparent, consistent, and defensible judgments.

A well-designed template keeps this process organized. It records the study details, the exact result being assessed, domain-level judgments, reviewer notes, and the evidence behind each decision. This makes the review easier to audit, easier to reproduce, and easier to report under PRISMA and GRADE expectations. Most importantly, it prevents risk of bias assessment from becoming a formality. When used properly, it directly informs sensitivity analyses, certainty-of-evidence ratings, and the final interpretation of findings.

For researchers, students, and evidence synthesis teams, the best approach is simple: select the correct risk of bias tool before extraction begins, assess results independently, document every judgment clearly, and use the findings to shape the synthesis—not just the appendix. A clear risk of bias assessment template helps turn a systematic review from a collection of studies into a credible, transparent, and decision-ready body of evidence.

References

Higgins JPT, Savović J, Page MJ, Elbers RG, Sterne JAC. Cochrane Handbook Chapter 8: Assessing risk of bias in a randomized trial. Current Cochrane Handbook guidance on RoB 2.

Sterne JAC, Hernán MA, McAleenan A, Reeves BC, Higgins JPT. Cochrane Handbook Chapter 25: Assessing risk of bias in a non-randomized study. Current Cochrane guidance on ROBINS-I. 

Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies. Annals of Internal Medicine. 2011. 

Whiting PF, Tomlinson E, Rutjes AWS, et al. QUADAS-3: A Revised Tool for the Quality Assessment of Diagnostic Test Accuracy Studies. Annals of Internal Medicine. 2026. 

Page MJ, McKenzie JE, Bossuyt PM, et al. PRISMA 2020 statement and checklist resources. 

Cochrane Handbook Chapter 14: Completing Summary of Findings tables and grading the certainty of the evidence. 

Cochrane Handbook Chapter 7: Considering bias and conflicts of interest among included studies. Guidance on incorporating RoB into analyses. 

Cochrane Chapter 23 and riskofbias.info resources for cluster-randomized and crossover RoB 2 variants. 

ROB-ME resources from Cochrane Bias. 

McGuinness LA, Higgins JPT. robvis: risk-of-bias visualization tool and package.