This assistant allows you to perform Risk of Bias ratings according to the Metapsy
guidelines for psychological intervention trials (Miguel et al., 2025).
These guidelines offer help in applying Cochrane's Risk of Bias tool 2.0 specifically in psychological treatment research.
1. Was the allocation sequence random?
Answer ‘Yes/PY’ if a random component was used in the sequence generation process. Examples include:
Computer-generated random numbers
Reference to a random number table
Coin tossing
Shuffling cards or envelopes
Throwing dice
Drawing lots
Minimization is generally implemented with a random element (at least when the scores are equal), so
an allocation sequence generated using minimization should generally be considered random.
Answer ‘No/PN’ if no random element was used in
generating
the allocation sequence or the sequence is predictable. Examples include alternation; methods based
on dates (of birth or admission); patient record numbers; allocation decisions made by clinicians or
participants; allocation based on the availability of the intervention; or any other systematic or
haphazard method.
Answer ‘No information’ if the only information about
randomization methods is a statement that the study is randomized.
2. Was the allocation sequence concealed?
Answer ‘Yes/PY’ if the trial used any form of remote or
centrally administered method to allocate interventions to participants, where the process of
allocation is controlled by an external unit or organization, independent of the enrolment personnel
(e.g. independent central pharmacy, telephone or internet-based randomization service providers).
Answer ‘Yes/PY’ if envelopes or drug containers were
used
appropriately. Envelopes should be opaque, sequentially numbered, sealed with a tamper-proof seal
and opened only after the envelope has been irreversibly assigned to the participant.
Answer ‘No/PN’ if there is reason to suspect that the
enrolling investigator or the participant had knowledge of the forthcoming allocation.
3. Is the number of randomized participants balanced between groups?
Imbalance in the number of randomized participants can be examined through the proportion assigned
to one group and its 99% confidence interval. If the interval does not cover the expected proportion
(50% for a 1:1 allocation ratio), there is suspicion of baseline imbalance.
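As a minimal sketch of this check in Python (the function name `allocation_imbalance` is hypothetical, and the normal-approximation interval with z = 2.576 for 99% coverage is our own choice; the actual Metapsy calculator may use a different method, such as an exact binomial interval):

```python
from math import sqrt

def allocation_imbalance(n1, n2, expected=0.5, z=2.576):
    """Check whether the proportion randomized to group 1 is plausibly
    equal to the expected proportion, via a 99% normal-approximation CI.

    Hypothetical helper. Returns (proportion, (lower, upper), suspected).
    """
    n = n1 + n2
    p = n1 / n
    se = sqrt(p * (1 - p) / n)            # standard error of the proportion
    lower, upper = p - z * se, p + z * se
    # Imbalance is suspected when the CI does not cover the expected value
    return p, (lower, upper), not (lower <= expected <= upper)

# Example: 70 vs. 30 participants under an intended 1:1 allocation
p, ci, imbalanced = allocation_imbalance(70, 30)   # CI excludes 0.5 -> suspected
```

With 70 vs. 30 participants, the 99% interval runs roughly from 0.58 to 0.82 and excludes 0.5, so imbalance is suspected; a 52 vs. 48 split would not be flagged.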
A calculator to test for imbalance in the number of randomized participants is provided below.
Using the form below, you can assess whether the number of randomized participants can be considered
balanced between the two groups.
You have to insert the intended allocation ratio (for most trials, this will be 1:1), as well as the
reported number of patients that were randomized to each group.
The rating above will be adjusted based on the submitted values.
Provide Trial Information:
4. Is baseline clinical severity balanced between groups?
Imbalance in baseline symptom severity is examined by the SMD at baseline and its 99% confidence
intervals. An SMD > 0.2 (with 99% CI lower limit > 0) will be suggestive of baseline imbalance. In
the Metapsy RoB addendum, we focus on baseline severity only because it is consistently reported
across trials, can be examined objectively, and it is a known predictor of the outcome (for
depression interventions). P-values and confidence intervals resulting from multiple
significance testing (e.g., due to multiple clinical severity measures) can be adjusted using
Bonferroni corrections.
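The SMD check can be sketched as follows (an illustrative Python sketch only; the official tooling is the `testBaselineImbalance` function in the metapsyTools R package, and the function name and large-sample standard-error formula below are our own choices):

```python
from math import sqrt

def baseline_smd(n1, m1, sd1, n2, m2, sd2, z=2.576):
    """Standardized mean difference (pooled-SD Cohen's d) at baseline,
    with a 99% CI from a large-sample normal approximation.

    Hypothetical helper. Returns (smd, (lower, upper), suspected).
    """
    # Pooled standard deviation across the two groups
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    # Approximate standard error of d (Hedges & Olkin-style formula)
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    lower, upper = d - z * se, d + z * se
    # Guideline rule: SMD > 0.2 with the 99% CI excluding zero; we check
    # both directions since the ordering of the groups is arbitrary
    suspected = abs(d) > 0.2 and (lower > 0 or upper < 0)
    return d, (lower, upper), suspected
```

For example, baseline means of 28.0 vs. 24.0 (SD 4.0, n = 100 per arm) give d = 1.0 with a 99% CI excluding zero, flagging imbalance; means of 25.0 vs. 25.5 (SD 5.0, n = 50 per arm) give d = -0.1 and are not flagged.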
Depending on the clinical field and mental health disorder, there might be key prognostic factors
that should be added to this examination. In that case, the judgment of the reviewers can override
the algorithm, and the proposed items and further variables can be examined for baseline imbalances.
A calculator to test for imbalance in one relevant baseline variable is provided below. A test of
multiple relevant baseline variables can be conducted using the testBaselineImbalance function included in the metapsyTools package.
Using the form below, you can assess whether a variable measuring clinical severity at baseline can
be considered balanced between the two groups.
You have to provide the reported number of patients that were randomized to each group, as well as
the baseline means and standard deviations of the variable in both groups.
The rating above will be adjusted based on the submitted values.
Provide Trial Information:
Notes
5. Were participants masked to the intervention?
In almost all psychotherapy trials, this will be ‘No/PN’.
Exceptions may occur in trials comparing two active psychological interventions
(e.g., both active interventions are presented to participants as equally effective, the trial is
designed as a non-inferiority trial, etc.).
6. Were carers and people delivering the interventions masked?
In almost all psychotherapy trials, this will be ‘No/PN’.
Exceptions may occur in trials comparing two active psychological interventions
(e.g., both active interventions are presented to participants as equally effective, the trial is
designed as a non-inferiority trial, etc.).
7. Did the interventions closely align with the intended plan, without any significant deviations that could affect the results?
When participants and carers are not blinded, there could be deviations from the
intended interventions associated with this lack of blinding. An example of such
a deviation is when participants who are told that they were on the waitlist
seek the intervention of the experimental arm, or other interventions, outside the trial.
Another example is that, due to the trial context, participants in the psychotherapy
group have easier access to the prescription of antidepressant medication than
participants in the control group.
Deviations from the intended interventions that would arise
even if the intervention
took place outside the trial
do not constitute a risk of bias with regard to
the
intention-to-treat analysis. An example could be drop-outs due to side effects of
antidepressants: such drop-outs are deviations from the intended intervention but
would occur even if the patients were taking antidepressants outside the trial context.
To pose a risk, these deviations should be influential for the outcome. For example,
participants taking benzodiazepines in a depression psychotherapy trial might not be
influential, but receiving the psychotherapy of interest or antidepressant medication
is influential.
If there are deviations related to the outcome but they are balanced between the groups,
bias will be less likely. However, if these influential deviations are unbalanced
between the groups, it is more likely that the intervention effect estimate is biased.
For example, when a substantially larger proportion of participants allocated to the
psychotherapy group are taking antidepressants compared to the control group.
Reviewers should make agreements on how an imbalance between groups is defined.
In the previous example, an imbalance of 20% could be taken as a rough indication.
In these cases, the trial can be judged as high risk, by giving a
“No” answer to this item.
8. Was an appropriate analysis used to estimate the effect of assignment to intervention?
This item evaluates whether trialists report having adhered to intention-to-treat (ITT) principles,
or to the so-called modified ITT (where the authors drop randomized participants because
their outcomes are missing).
Dropping randomized participants because they did not receive the intervention they were
allocated to, or analyzing participants according to the intervention they actually received
rather than the one they were allocated to, is inappropriate. In some cases,
it might be sensible to assume that trialists adhered to ITT principles even if this is not
explicitly stated (e.g., when the authors used statistical analyses that are implicitly associated
with the ITT principle).
9. Is the inappropriate analysis unlikely to have an impact on the result?
Only required if "was an appropriate analysis used to estimate
the effect of assignment to intervention?" (8) was answered 'No/PN' or
'No Information'.
If no appropriate analysis was conducted (or there is not enough information),
reviewers should examine whether there was likely an impact of not using an appropriate
analysis to estimate the effect of assignment to intervention. In cases in which excluded
participants (e.g., due to not receiving a minimum number of sessions) or participants
analyzed in the wrong group (e.g., analyzed in the waitlist group because in the
end they did not receive the intervention) make up less than 5%, part 2 of domain 2 will be
rated as “Some concerns”. High risk will be given for proportions above 5% or for
“No information”. Missing outcome data (the proportion of participants who did not
answer the questionnaires) is not assessed here, but rather in domain 3 (item 10).
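The decision rule above can be expressed as a small sketch (hypothetical helper; the 5% threshold and the mapping to ratings are taken directly from the text):

```python
def item9_rating(n_affected, n_randomized, info_available=True):
    """Part 2 of domain 2: rate by the proportion of participants excluded
    or analyzed in the wrong group. Hypothetical helper, not official code."""
    if not info_available:
        return "High risk"            # "No information" leads to high risk
    proportion = n_affected / n_randomized
    # Less than 5% affected -> "Some concerns"; otherwise -> "High risk"
    return "Some concerns" if proportion < 0.05 else "High risk"
```

For instance, 4 affected participants out of 100 randomized would yield "Some concerns", while 10 out of 100 would yield "High risk".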
Notes
10. Was data available for nearly all participants?
For meta-analyses focused on continuous outcomes: if the proportion of available data is above
95%, this domain is directly rated as low risk and the next questions do not need to be
answered. Note that this refers to the proportion of participants who have endpoint data (i.e., who
completed the questionnaires), and not the LOCF or imputed data.
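As a sketch of this rule for continuous outcomes (hypothetical helper; the 95% threshold comes from the text):

```python
def low_risk_by_availability(n_randomized, n_with_endpoint_data):
    """Item 10, continuous outcomes: the domain is directly low risk when
    more than 95% of randomized participants have endpoint data.
    Hypothetical helper, not official code."""
    return n_with_endpoint_data / n_randomized > 0.95
```

For example, 96 of 100 randomized participants with endpoint data clears the threshold, while 95 of 100 does not.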
For meta-analyses focused on dichotomous outcomes: the required proportion is directly linked
to the risk of the event. If the observed number of events is much greater than the number of
participants with missing outcome data, the bias would necessarily be small.
11. Was an appropriate analysis used for handling the impact of missing data?
Only required if "was data available for nearly all
participants?" was answered 'No/PN' or 'No
Information'.
For example, the following approaches are considered appropriate for handling the impact of
missing data:
MMRMs (mixed models for repeated measures, also known as mixed models, growth curve
analyses) based on two or more measurements after baseline (e.g., mid-treatment, post-test,
follow-ups, etc.).
Multiple imputation, if the following criteria are met:
A) “Rubin's rules” or other methods to account for imputation uncertainty were applied, and
either (B.1 and B.2) or C.1 applies:
B.1) Important auxiliary variables were used in the imputation model, such as intermediate
outcome assessments.
B.2) Imputations were generated separately by RCT group (“by-group” imputation).
C.1) Controlled/reference-based imputation (e.g., “jump-to-reference”) was used, with
appropriate external information.
Sensitivity analyses corresponding with a range of plausible reasons for missingness to
confirm the primary analyses.
When data are missing not at random, most of the aforementioned models will be biased.
“Last observation carried forward” is not
considered an appropriate approach for handling missing data.
12. Is the trial of moderate to large size?
Only required if "was an appropriate analysis used for
handling the impact of missing data?" was answered 'No/PN' or 'No Information'.
A trial is considered to be of moderate to large size when it includes at least 40 participants
per arm.
13. Did enough participants in each of the arms complete the post-treatment assessments?
Only required if "was an appropriate analysis used for
handling the impact of missing data?" was answered 'No/PN' or 'No Information'.
This is fulfilled when at least 70% of the randomized participants in each of the arms
completed the assessments at post-treatment.
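The two thresholds from items 12 and 13 above can be sketched together (hypothetical helpers; 40 participants per arm and 70% completion are the values given in the text):

```python
def moderate_to_large(arm_sizes):
    """Item 12: the trial is of moderate to large size when every arm
    includes at least 40 randomized participants. Hypothetical helper."""
    return all(n >= 40 for n in arm_sizes)

def enough_completers(randomized, completers):
    """Item 13: at least 70% of the randomized participants in every arm
    completed the post-treatment assessments. Hypothetical helper."""
    return all(c / n >= 0.70 for n, c in zip(randomized, completers))
```

For example, arms of 45 and 52 participants qualify as moderate to large; arms of 50 and 50 with 40 and 36 completers (80% and 72%) satisfy the completion criterion, while 40 and 33 completers (80% and 66%) do not.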
14. Is there indirect evidence indicating that missingness is unrelated to the outcome?
Only required if "was an appropriate analysis used for
handling the impact of missing data?" was answered 'No/PN' or 'No Information'.
This item examines the following sources of indirect evidence that might signal risk of bias:
a) whether the overall reported reasons for missing data indicate that missingness might depend on
its true value (e.g., if the general reason for missing data is termination of funding, this would
not be indicative of risk of bias);
b) large imbalances in the proportion of missing data between the groups: as a rule of thumb, the
difference in study drop-out between the groups should be less than 20% (e.g., a drop-out of 10% in
the intervention arm versus 20% in the wait-list control would be acceptable);
c) differing reasons for missing data between the groups (e.g., if more participants from the
intervention group abandon the trial because they are more critically symptomatic, this could
indicate that missingness is related to the outcome).
If any of the three is present, reviewers should answer ‘No’, given that missing data might be related to the outcome.
A lack of information for evaluating at least one of these three sources also leads to a high risk of bias.
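Source (b) above lends itself to a quick check (hypothetical helper; the 20-percentage-point rule of thumb is taken from the text):

```python
def dropout_imbalanced(n1, dropouts1, n2, dropouts2):
    """Item 14, source (b): flag when the difference in study drop-out
    between the groups reaches 20 percentage points or more.
    Hypothetical helper, not official code."""
    return abs(dropouts1 / n1 - dropouts2 / n2) >= 0.20
```

For example, drop-outs of 10% vs. 20% (a 10-point difference) would not be flagged, whereas 5% vs. 30% (a 25-point difference) would be.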
Notes
15. Is the method of measurement appropriate?
The instrument used to quantify the outcome should be identical in both groups, and should
reliably measure the outcome of interest.
This will be the case for most psychometrically validated outcome rating scales.
16. Were self-report measures used?
In the context of psychological interventions, blinding is often not feasible or even impossible to
achieve.
Empirical evidence shows that self-reports are not associated with overestimated treatment effects
despite the lack of participant blinding (Cuijpers et al., 2010).
Therefore, if there is no evidence that participants under-reported or there is a safeguard to prevent
this (e.g., therapists did not have access to the self-reports), reviewers may consider self-reports
as low risk of bias.
If it is likely that patients under-reported their severity (for example, to please the therapists or
the researchers when self-reports are filled out in front of the therapist at the last session),
reviewers should consider setting this item to ‘No/PN’,
which
will lead to a high risk of bias for this domain.
17. Were assessor-rated instruments used?
Assessor-rated instruments are filled out by the therapist, independent evaluators, or research
personnel. These are usually administered as a structured interview with the participant or are
completed by the evaluator through observation of the participant's behavior.
18. Were the assessors masked to treatment allocation?
If "were self-report measures used?" was answered 'Yes/PY', please consider the important information in the explanation
section
below.
This item evaluates whether the assessors evaluating the outcome were masked to treatment allocation. This item should be rated as 'Yes' when trialists clearly report that the assessors were masked. If this is not reported, it might be sensible to assume that no masking procedures were implemented ('No/PN').
Meta-analysts might be interested in including all available results for one outcome domain within a trial. For example, in a depression trial, there might be post-test data from the Hamilton Depression Rating Scale (assessor-rated) and the Beck Depression Inventory (self-report). There could be two strategies for these cases, depending on the meta-analysis protocol and analysis approach:
Separate risk of bias ratings could be made for each outcome and numeric result, or
Two or more ratings could be combined into an aggregated score, conservatively rating the aggregated score as high risk when one of the measurements is at high risk.
For the Metapsy depression database we use the second strategy, combining ratings into an aggregated score.
Notes
19. Is there a trial registration, protocol, or statistical analysis plan (SAP) available?
If there is no access to documents describing the trial's protocol, pre-defined outcome
measures, or analysis plan, this domain should be directly evaluated as “Some concerns”.
If reviewers have access to such documents, the next questions have to be answered.
20. Write down the registration number, protocol, or link to SAP
Only required if "is there a trial registration, protocol, or
statistical analysis plan (SAP) available?" was answered 'Yes/PY'.
21. Were outcomes (scale and time point) pre-specified before unblinded outcome data were available?
Only required if "is there a trial registration, protocol, or
statistical analysis plan (SAP) available?" was answered 'Yes/PY'.
This item evaluates whether the outcome measures in a trial were
pre-specified before the start of
data collection
(i.e., start of participant enrolment). This is the case, for example, when the trial was registered
(and the outcomes specified) before the start of data collection, which is known as prospective
registration. Outcome domain, scale, time point, metric, and analysis should ideally be
specified,
although reviewers can answer this item as “Yes” if at least scale and time point were
specified.
Retrospective registrations (registering the trial after start of data collection) or lack of
information should lead to a “No” answer in this item,
which results in “Some concerns” for this domain.
22. Are all outcomes of interest for the meta-analysis fully reported in the paper or available through other means?
Only required if "were outcomes (scale and time point)
pre-specified before unblinded outcome data were available?" was answered 'Yes/PY'.
This will partly depend on the meta-analysis protocol. If meta-analysts aim to include all
available instruments for a given outcome (e.g., depression severity), the article should
report all instruments in full (with enough data to be entered in the meta-analysis).
If meta-analysts have a pre-specified hierarchy for instrument inclusion
(e.g., select HAM-D over BDI), then selective reporting will be judged regarding
the availability of outcomes based on the pre-specified hierarchy.
It will also be rated as high risk when a non-registered outcome is added
in the publication of the paper. Reviewers should consider whether data
for the outcome of interest can be made available through other
means (e.g., by contacting the authors).
'No information' on this item directly leads to a rating of
“Some concerns” for this domain.
23. Was the analysis plan pre-specified?
Only required if "is there a trial registration, protocol, or
statistical analysis plan (SAP) available?" was answered 'Yes/PY'.
"Pre-specified" is interpreted as "before unblinded data is available".
Dates of publication of the analysis plan should precede participant enrollment.
24. Was data analyzed in accordance with the pre-specified plan?
Only required if "was the analysis plan pre-specified?" was
answered 'Yes/PY'.
Any deviation from the original plan should be reported in the final publication and
should be justified. If properly justified, meta-analysts can rate this signalling question
as 'Yes/PY', after careful assessment of how
this deviation can affect the studied meta-analytic result.
As part of the Metapsy initiative, we do not consider this question in the algorithm
unless there is evidence of unjustified deviations from a pre-specified analysis plan,
in which case the entire domain is downgraded to "high risk".
Notes
Assessment
Overall Rating
Randomization Process
1. Was the allocation sequence random?
2. Was the allocation sequence concealed until
participants were enrolled and assigned to
interventions?
3. Is the number of randomized participants
balanced between groups?
4. Is baseline clinical severity balanced
between groups?
Intervention Deviations
5. Were participants masked to the
intervention?
6. Were carers and people delivering the
interventions masked?
7. Did the interventions closely align with
the intended plan, without any significant deviations
that could affect the results?
8. Was an appropriate analysis used to
estimate the effect of assignment to intervention?
9. Is the inappropriate analysis unlikely to
have an impact on the result?
Missing Data
10. Was data available for nearly all
participants?
11. Was an appropriate analysis used for
handling the impact of missing data?
12. Is the trial of moderate to large size?
13. Did enough of the participants in each of
the arms complete the assessments at post-treatment?
14. Is there indirect evidence indicating that
missingness is unrelated to the outcome?
Outcome Measurement
15. Is the method of measurement appropriate
and applied using similar procedures for both groups?
16. Were self-report measures used?
17. Were assessor-rated instruments used?
18. Were the assessors masked to treatment
allocation?
Selection of Results
19. Is there a trial registration, protocol,
or statistical analysis plan (SAP) available?
20. Write down the
registration number, protocol, or link to SAP
21. Were outcomes (scale and time point)
pre-specified before unblinded outcome data were available?
22. Are all outcomes of interest for the
meta-analysis fully reported in the paper or
available through other means?
23. Was the analysis plan pre-specified?
24. Was data analyzed in accordance with the
pre-specified plan?