Psilocybin for Depression
About this page
Welcome to the landing page for our living systematic review and meta-analysis on psilocybin for depression symptoms (pre-registered here)! Follow the links in our documentation section for relevant information including our code-walkthrough (reproducibility guide). Here we provide an overview on our latest results, currently up-to-date with our most recent preprint.
Eligibility Criteria
We included randomized controlled clinical trials published in English in peer-reviewed journals comparing psilocybin for depressive symptoms with a comparator in adult (>18 years old) populations. Eligible interventions included any dose and formulation (natural or synthetic) of psilocybin or other prodrugs of psilocin intended to produce an alteration of subjective experience in the patient, with or without the conjunctive use of therapy. Eligible comparators included any form of placebo with or without the conjunctive use of therapy, including low doses of the intervention drug and any dose of other psychotropics intended to improve blinding (without known therapeutic efficacy for depression). Eligible comparators also included control for spontaneous improvement via waitlist or usual care. To make our database more comprehensive, we also conducted sensitivity analyses where we expanded these criteria to include reports found in gray literature (Krempien 2023), studies presenting only a mixture of pre- and post-crossover data (Grob 2011), and studies comparing psilocybin with comparators with known therapeutic efficacy (Carhart-Harris 2021). However, these studies were not included in the primary analyses.
Study selection
We identified 5,069 studies from our searches. Of these, 82 passed initial title and abstract screening, while 12 reports (711 participants) were included after full-text review, with 9 reports (529 participants) meeting inclusion criteria for our primary model.
Figure 1: PRISMA flow diagram showing study selection process
Study Characteristics
Our database consists of 201 effect sizes generated from 12 studies, covering multiple depression instruments, outcome types (e.g. endpoint, change-score, or dichotomous), and time-points (from baseline to over 6 months). As shown in Table 1, primary endpoints ranged from 2 to 7 weeks, with an average endpoint of 4 weeks. The primary outcome instrument was chosen hierarchically, with the primary outcome identified by the study receiving preference if it measured depression, followed by a preference for clinician-rated depression measures.
Table 1: Summary of RCTs on psilocybin for depressive symptoms.
N given is the number of participants randomized. Study endpoints are the reported primary endpoints for each study and given in weeks since the final dose. Summary statistics across studies (blue rows) were calculated as weighted averages. Overall mean age was calculated using the median age from Grob 2011. aN for these studies reflect the number of participants in the high and low dose groups. Goodwin 2022 and Krempien 2023 contained 75 and 18 participants in the medium dose arms, respectively, bringing the total N for the database to 711. These arms are included in the database, but not used in the present analyses, except the Goodwin 2022 medium dose arm in the alternate dosing for Goodwin et al. sensitivity analysis. US = United States; UK = United Kingdom; CH = Switzerland; CA = Canada; Multi = multi-site. Goodwin 2022 was conducted at sites across the US, CA, UK, and European Union. MDD = major depressive disorder; TRD = treatment-resistant depression; AUD = alcohol use disorder.
Risk of Bias
We assessed study bias using Cochrane’s Risk of Bias 2.0 tool, which is the standard approach for bias assessment in randomized controlled trials. We focused our assessment specifically on primary outcome variables. In studies with multiple primary outcomes or where the primary outcome was not a depression measure, we evaluated the outcome selected for our primary meta-analysis. Furthermore, we limited our bias assessment to pre-crossover data only. If information pertinent to assessment of risk of bias was not available, authors were contacted via email.
The RoB 2.0 tool examines five potential bias sources: randomization procedures, deviations from the intended protocol, missing outcome data, outcome measurement, and selective reporting. Each domain contains several questions rated on a four-level scale ranging from ‘yes’ to ‘no’, in addition to an option for insufficient information. Domain-level and overall study bias ratings are classified as low, medium, or high according to a predetermined algorithm. While this algorithm guides assessment, evaluators may justifiably override automated bias determinations when specific concerns warrant greater or lesser emphasis than the algorithm suggests.
Table 2: Additional study characteristics and risk of bias assessments using Cochrane’s Risk of Bias 2.0 tool domains.
Randomization = bias due to randomization process;
Deviations = bias due to deviations from intended interventions;
Missingness = bias due to missing outcome data;
Measurement = bias due to measurement of the outcome;
Selection = bias due to selection of the reported results;
Overall = overall risk of bias;
BDI = Beck Depression Inventory; GRID-HAMD = Grid Hamilton Rating Scale for Depression; MADRS = Montgomery-Asberg Depression Rating Scale; QIDS-SR = Quick Inventory of Depressive Symptomatology–Self-Report.
Overall, 3 studies were determined to have a high risk of bias, 5 studies had some concerns, and 3 studies were deemed to have an overall low risk of bias (Table 2). Rosenblat 2024 received some concerns for randomization due to the significant imbalance in baseline depression severity between the two arms. This imbalance favors the comparator in our effect size calculation for this study.
Results
Psilocybin treatment significantly reduces depression symptoms compared with control conditions
The analysis on continuous outcomes in the 9 studies included in the primary model showed a statistically significant reduction in depression scores with psilocybin treatment compared to control conditions (Figure 2; Hedges’ g = -0.91 [-1.35; -0.48], p = 0.0013, k = 9, n = 501), with moderate between-study heterogeneity (tau2 = 0.13 [0.01; 1.48], I2 = 58.1% [12.2%; 80.0%]).
Figure 2: Meta-analysis on continuous outcome variables.
Boxes represent the standardized mean difference (Hedges’ g) for each study, and the lines extending from the box represent the 95% confidence interval around each effect size, while the size of each box is proportional to its weight.
The diamond at the bottom represents the pooled effect size (meta-analytic mean).
The gray line at the bottom represents the prediction interval of the expected range of true effects in a new study. HK = Knapp-Hartung adjustment.
Visual inspection of a funnel plot (SI Figure 1) revealed limited asymmetry, and an Egger’s test did not find small study effects (intercept = -1.73 [-4.54; -1.08], t = -1.21, p = 0.27), implying minimal evidence of publication bias.
SI Figure 1: Funnel plot of 9 studies in primary meta-analytic model.
Psilocybin’s effects are rapid and consistent over several weeks
Our three-level CHE model revealed an overall significant decrease in depression scores under psilocybin compared to control conditions (Hedges’ g = -0.90 [-1.23; -0.57], p < 0.001, k = 9, n = 529, tau2 = 0.18 [0.03; 0.47], I2 = 74% [31%; 88%]), ensuring that our results were not sensitive to the time point studied. Adding time since final dose as a continuous predictor to our model (SI Figure 2) revealed significant effect favoring psilocybin immediately following dosing (intercept = -0.92 [-1.26; -0.58], p < 0.0001) that was stable over time (slope = 0.0009 [-0.0023; 0.0041], p = 0.57).
SI Figure 2: Meta-regression of time since final dose on the three-level CHE model.
Higher response and remission rates after psilocybin treatment
In addition to differences in symptoms of depression, we found evidence for statistically significant greater treatment response with psilocybin compared to control conditions (SI Figure 3; RR = 2.77 [1.93; 3.97], p = 0.001, k = 5, n = 380), with low between-study heterogeneity (tau2 = 0.00 [0.00; 0.96]; I2 = 0.00% [0.00%; 79.20%]).
SI Figure 3: Meta-analysis on response to treatment.
Boxes represent the standardized mean difference (Hedges’ g) for each study, and the lines extending from the box represent the 95% confidence interval around each effect size, while the size of each box is proportional to its weight.
The diamond at the bottom represents the pooled effect size (meta-analytic mean).
The gray line at the bottom represents the prediction interval of the expected range of true effects in a new study.
We also found that there were significantly higher remission rates with psilocybin compared to control conditions (SI Figure 4; RR = 4.13 [3.39; 5.04], p < 0.001, k = 5, n = 380), with low between-study heterogeneity (tau2 = 0.00 [0.00; 0.00]; I2 = 0.00% [0.00%; 79.20%]).
SI Figure 4: Meta-analysis on remission rates.
Boxes represent the standardized mean difference (Hedges’ g) for each study, and the lines extending from the box represent the 95% confidence interval around each effect size, while the size of each box is proportional to its weight.
The diamond at the bottom represents the pooled effect size (meta-analytic mean).
The gray line at the bottom represents the prediction interval of the expected range of true effects in a new study.
Subgroup and sensitivity analyses
To assess the robustness of our primary findings and explore potential sources of heterogeneity, we conducted a series of 5 subgroup analyses for continuous outcomes (Figure 3). These analyses allowed us to evaluate the consistency of treatment effects across different study characteristics and methodological approaches. Each subgroup analysis described below employed the same meta-analytic model parameters as our primary analysis.
- Depression as primary diagnosis: To assess whether study design moderated treatment effects, we conducted subgroup analyses limited to studies where participants had depression as their primary diagnosis rather than comorbid with other conditions (excluding Griffiths 2016, Ross 2016, and Rieser 2025).
- Exclude open-label: We evaluated the impact of excluding open-label waitlist control studies (Davis, 2021; Rosenblat, 2024).
- Exclude high RoB: To assess the impact of study quality on outcomes, we excluded studies rated as having high risk of bias (Griffiths 2016 and Rosenblat 2016).
- Parallel design: We conducted a separate analysis that only included parallel group studies (Goodwin 2022, Raison 2023, von Rotz 2023, Back 2024, and Rieser 2025).
- Crossover design: Similarly, we evaluated an analysis that only included crossover design studies (Griffiths 2016, Ross 2016, Davis 2021, and Rosenblat 2024).
The following sensitivity analyses were also performed, again employing the same meta-analytic model parameters as our primary analysis (except for the fixed effects models):
- Expanded inclusion criteria: We first conducted an analysis with expanded eligibility criteria that incorporated all studies from the main model plus three additional studies that were excluded from the primary analysis: one study from grey literature (Krempien 2023; high-dose versus control), one with an active comparator (Carhart-Harris 2021), and one where pre- and post-crossover data were combined together (Grob 2021).
- Excluding outliers: We repeated our primary meta-analysis on continuous outcomes after removing the statistical outlier studies (e.g. those whose effect size confidence intervals do not overlap with the confidence interval of the pooled effect; Davis 2021).
- Fixed effects model: We ran fixed-effects models as a sensitivity analysis to compare to the random effects models. The fixed effects model assumes that the between-study variance (tau2) is 0, such that all studies share a common true effect size. For our continuous model, we used a standard inverse-variance weighting fixed-effects model on standardized mean differences (Hedges’ g). For our dichotomous model, we used a standard inverse-variance weighting fixed-effects model on the log risk ratio.
- Alternate dosing in Goodwin 2022: Given that Goodwin 2022 employed a three-arm design comparing psilocybin at 25 mg and 10 mg doses against a 1 mg control, we conducted a subgroup analysis substituting the 10 mg intervention arm for the 25 mg arm used in our main analysis.
- Clinician-rated outcomes: To examine whether the method of assessment influenced observed effects, we conducted a sensitivity analysis that only included clinician-administered depression assessments. MADRS was chosen as the preferred clinician-administered instrument for this analysis, followed by the GRID-HAM-D. Studies included were: Goodwin 2022, Raison 2023, von Rotz 2023, Back 2024, and Rosenblat 2024 using the MADRS, and Griffiths 2016 and Davis 2021 using the GRID-HAM-D.
- Self-report outcomes: Similarly, we conducted a sensitivity analysis that only included studies reporting self-report depression measures, regardless of whether these were the primary outcome measures. BDI was the preferred self-report instrument which was reported in all self-report studies. Studies included were: Ross 2016, Griffiths 2016, Davis 2021, von Rotz 2023, and Rieser 2025.
Subgroup analyses reveal open-label and cross-over designs as high sources of heterogeneity
Our 5 subgroup analyses produced results largely in line with our primary model results. Hedges’ g values of each subgroup analysis were largely similar to the primary model, with 4 out of 5 analyses showing significant results (Figure 3). Notably, subgroup analyses excluding either open-label waitlist control studies or crossover studies had substantially lower between-study heterogeneity (tau2 = 1.3x10-6 [0; 0.68] and 7.7x10-8 [0.00, 1.05], respectively). Analyzing crossover studies on their own resulted in both higher heterogeneity (tau2 = 0.68 [0.30; 3.65]) and a non-significant effect of psilocybin (-1.12 [-2.64; 0.40]). It is worth noting, however, that the four studies included here vary in other important factors as well (k = 2 open-label waitlist control, k = 2 advanced cancer).
Figure 3: Pooled effect sizes for subgroup and sensitivity analyses.
Box and whiskers represent the meta-analytic mean and corresponding 95% confidence intervals for each subgroup analysis.
K and tau2 represent the number of studies included in each analysis and heterogeneity for each analysis, respectively.
The pooled effect size from the main model is presented at the top for comparison purposes.
Sensitivity analyses show significant and comparable results to the main model
We performed a series of sensitivity analyses that largely supported our primary results. First, our model using expanded inclusion criteria showed a significant and comparable effect size (SI Figure 5; Hedges’ g = -0.89 [-1.27; -0.51], p <0.001 , k = 12, tau2 = 0.15 [0.03; 1.21], n = 602). We also ran a model removing statistical outliers (Davis 2021), resulting in a significant effect size comparable to the primary model (Hedges’ g = -0.80 [-1.08; -0.52], p <0.001, k = 8, tau2 = 0.0075 [0.00; 0.61], n =477). Furthermore, we replicated our primary analyses using a fixed effects model on both continuous outcomes (Hedges’ g = -0.85 [-1.03; -0.66], p < 0.001, k = 9, n = 501), response outcomes (RR = 2.8 [2.05; 3.83], p < 0.001, k = 5, n = 380), and remission outcomes (RR = 4.13 [2.66; 6.42], p <0.001, k = 5, n = 380). In contrast, a model evaluating self-report outcomes yielded insignificant results with substantial heterogeneity (Hedges’ g = -1.11 [-2.58, 0.35], p = 0.102, k = 5 , tau2 = 1.03 [0.22 13.1], n = 190). It should be noted, however, that the 5 studies included in the self-report analysis vary substantially by treatment population (k = 2 MDD, k = 2 advanced cancer, and k = 1 AUD). All remaining sensitivity analyses showed significant and comparable results (Figure 3).
SI Figure 5: Forest plot of 12 studies in the expanded model. Boxes represent the standardized mean difference (Hedges’ g) for each study, and the lines extending from the box represent the 95% confidence interval around each effect size, while the size of each box is proportional to its weight. The diamond at the bottom represents the pooled effect size (meta-analytic mean). The gray line at the bottom represents the prediction interval of the expected range of true effects in a new study. HK = Knapp-Hartung adjustment.
Conclusions
Our initial meta-analytic results show promise for the efficacy of psilocybin in treating depressive symptoms, however these results should be interpreted with caution. As more RCTs are published, we will regularly update our SYPRES website and dashboard in a reproducible and transparent manner. This living systematic review, in conjunction with the associated open-science resources, aims to provide a valuable and transparent resource for researchers, clinicians, policymakers, and the public.
References
- Grob, C., Danforth, A., Chopra, G., Hagerty, M., McKay, C., Halberstadt, A., & Greer, G. (2011). Pilot Study of Psilocybin Treatment for Anxiety in Patients With Advanced-Stage Cancer. ARCHIVES OF GENERAL PSYCHIATRY, 68(1), 71–78. https://doi.org/10.1001/archgenpsychiatry.2010.116
- Griffiths, R. R., Johnson, M. W., Carducci, M. A., Umbricht, A., Richards, W. A., Richards, B. D., Cosimano, M. P., & Klinedinst, M. A. (2016). Psilocybin produces substantial and sustained decreases in depression and anxiety in patients with life-threatening cancer: A randomized double-blind trial. Journal of Psychopharmacology, 30(12), 1181–1197. https://doi.org/10.1177/0269881116675513
- Ross, S., Bossis, A., Guss, J., Agin-Liebes, G., Malone, T., Cohen, B., Mennenga, S. E., Belser, A., Kalliontzi, K., Babb, J., Su, Z., Corby, P., & Schmidt, B. L. (2016). Rapid and sustained symptom reduction following psilocybin treatment for anxiety and depression in patients with life-threatening cancer: A randomized controlled trial. Journal of Psychopharmacology, 30(12), 1165–1180. https://doi.org/10.1177/0269881116675512
- Carhart-Harris, R., Giribaldi, B., Watts, R., Baker-Jones, M., Murphy-Beiner, A., Murphy, R., Martell, J., Blemings, A., Erritzoe, D., & Nutt, D. J. (2021). Trial of Psilocybin versus Escitalopram for Depression. New England Journal of Medicine, 384(15), 1402–1411. https://doi.org/10.1056/NEJMoa2032994
- Davis, A. K., Barrett, F. S., May, D. G., Cosimano, M. P., Sepeda, N. D., Johnson, M. W., Finan, P. H., & Griffiths, R. R. (2021). Effects of Psilocybin-Assisted Therapy on Major Depressive Disorder: A Randomized Clinical Trial. JAMA Psychiatry, 78(5), 481–489. https://doi.org/10.1001/jamapsychiatry.2020.3285
- Goodwin Guy M., Aaronson Scott T., Alvarez Oscar, Arden Peter C., Baker Annie, Bennett James C., Bird Catherine, Blom Renske E., Brennan Christine, Brusch Donna, Burke Lisa, Campbell-Coker Kete, Carhart-Harris Robin, Cattell Joseph, Daniel Aster, DeBattista Charles, Dunlop Boadie W., Eisen Katherine, Feifel David, … Malievskaia Ekaterina. (2022). Single-Dose Psilocybin for a Treatment-Resistant Episode of Major Depression. New England Journal of Medicine, 387(18), 1637–1648. https://doi.org/10.1056/NEJMoa2206443
- Krempien, S., Inamdar, A., Shenouda, M., Johnson, K., Maass-Robinsin, S., Johnson, M., Kelman, A., Nathan, P., Belser, A., Pathare, P., House-Gecewicz, A., Sorie, A., Bartlone, A., Varty, G., Morgan, M., Avery, K., Muhammad, A., Nivorozhkin, A., & Palfreyman, M. (2023). Randomized, Double-Blind, Placebo-Controlled Study to Evaluate the Safety and Efficacy of CYB003, a Deuterated Psilocybin Analog in Patients With Major Depressive Disorder. NEUROPSYCHOPHARMACOLOGY, 48, 195–196.
- von Rotz, R., Schindowski, E. M., Jungwirth, J., Schuldt, A., Rieser, N. M., Zahoranszky, K., Seifritz, E., Nowak, A., Nowak, P., Jäncke, L., Preller, K. H., & Vollenweider, F. X. (2023). Single-dose psilocybin-assisted therapy in major depressive disorder: A placebo-controlled, double-blind, randomised clinical trial. eClinicalMedicine, 56. https://doi.org/10.1016/j.eclinm.2022.101809
- Raison, C. L., Sanacora, G., Woolley, J., Heinzerling, K., Dunlop, B. W., Brown, R. T., Kakar, R., Hassman, M., Trivedi, R. P., Robison, R., Gukasyan, N., Nayak, S. M., Hu, X., O’Donnell, K. C., Kelmendi, B., Sloshower, J., Penn, A. D., Bradley, E., Kelly, D. F., … Griffiths, R. R. (2023). Single-Dose Psilocybin Treatment for Major Depressive Disorder: A Randomized Clinical Trial. JAMA, 330(9), 843–853. https://doi.org/10.1001/jama.2023.14530
- Back, A. L., Freeman-Young, T. K., Morgan, L., Sethi, T., Baker, K. K., Myers, S., McGregor, B. A., Harvey, K., Tai, M., Kollefrath, A., Thomas, B. J., Sorta, D., Kaelen, M., Kelmendi, B., & Gooley, T. A. (2024). Psilocybin Therapy for Clinicians With Symptoms of Depression From Frontline Care During the COVID-19 Pandemic: A Randomized Clinical Trial. JAMA Network Open, 7(12), e2449026. https://doi.org/10.1001/jamanetworkopen.2024.49026
- Rosenblat, J. D., Meshkat, S., Doyle, Z., Kaczmarek, E., Brudner, R. M., Kratiuk, K., Mansur, R. B., Schulz-Quach, C., Sethi, R., Abate, A., Ali, S., Bawks, J., Blainey, M. G., Brietzke, E., Cronin, V., Danilewitz, J., Dhawan, S., Di Fonzo, A., Di Fonzo, M., … McIntyre, R. S. (2024). Psilocybin-assisted psychotherapy for treatment resistant depression: A randomized clinical trial evaluating repeated doses of psilocybin. Med, 5(3), 190-200.e5. https://doi.org/10.1016/j.medj.2024.01.005
- Rieser, N. M., Bitar, R., Halm, S., Rossgoderer, C., Gubser, L. P., Thévenaz, M., Kreis, Y., Rotz, R. von, Nordt, C., Visentini, M., Moujaes, F., Engeli, E. J. E., Ort, A., Seifritz, E., Vollenweider, F. X., Herdener, M., & Preller, K. H. (2025). Psilocybin-assisted therapy for relapse prevention in alcohol use disorder: A phase 2 randomized clinical trial. eClinicalMedicine, 82. https://doi.org/10.1016/j.eclinm.2025.103149