Will Bayesian Statistics Transform Trials?
FDA has finally published its Bayesian guidance. Will it matter?
In January 2026, FDA released draft guidance to industry on the use of Bayesian statistics. The news made a big splash, and many speculated on what it might mean for the future of clinical trial design. For a good general introduction to the significance of this guidance, I recommend this Vox article (not incidentally, I was quoted in it). Witold Więcek has also offered up some helpful thoughts.
I’d like to offer some thoughts of my own, particularly around the timing of this guidance and what it might mean. FDA’s drug center has been facing pressure to embrace Bayesian statistics for decades. I worked at the FDA for over a decade, and saw frequent pushes to get a Bayesian guidance released. So why is the guidance coming out now? And will it actually change how clinical trials are designed?
Why go Bayesian?
Before we dig into those questions, it’s helpful to understand the statistical approach drug companies have traditionally used to prove that their drugs work. For decades, companies have used an approach rooted in frequentist (non-Bayesian) statistics: they conduct two “pivotal” randomized controlled trials, each designed to prove that the drug has a clinically meaningful benefit. The trials are considered successful if each produces a two-sided p-value no greater than 0.05. While FDA has never officially required trials to meet this standard, they have endorsed this approach in the past.
Advocates for the use of Bayesian statistics would point out several problems with this traditional frequentist approach. First, there is the p-value itself, which is defined as “the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis [of no drug effect] is correct”. It’s a concept that strains the comprehension of even expert scientists (I had to copy the definition from Wikipedia to make sure I got it right, which is what I suspect most responsible scientists do when asked what a p-value is). P-values and frequentist statistics are useful in drug regulation, but Bayesian methods give us a simpler, more intuitive metric: the probability, given the data observed in the study, that the drug meets a threshold of effectiveness. That is a quantity that is easier to understand and often more useful for decision-making.
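To make the contrast concrete, here is a minimal sketch in Python with made-up numbers (the observed effect, its standard error, and the “clinically meaningful” threshold are all hypothetical, and the flat prior is chosen purely for simplicity). It computes both summaries from the same data:

```python
from scipy import stats

effect_hat = 2.0  # hypothetical observed treatment effect (e.g., points on a symptom scale)
se = 1.0          # hypothetical standard error of that estimate

# Frequentist summary: two-sided p-value against the null of no effect.
z = effect_hat / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

# Bayesian summary: under a flat prior, the posterior for the true effect is
# approximately Normal(effect_hat, se). Report the probability that the true
# effect exceeds a clinically meaningful threshold.
threshold = 1.0
prob_meaningful = 1 - stats.norm.cdf(threshold, loc=effect_hat, scale=se)

print(f"two-sided p-value: {p_value:.3f}")                       # ~0.046
print(f"P(effect > {threshold} | data): {prob_meaningful:.3f}")  # ~0.841
```

The second number speaks directly to the question that reviewers, clinicians, and patients actually ask; with an informative prior it would shift accordingly.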
A more significant shortcoming of the frequentist approach, from a Bayesian perspective, is its tendency to discard useful information. When evaluating whether a drug should be approved, the FDA has the opportunity to evaluate multiple sources of evidence: There are, of course, the drug company’s pivotal studies. But they might also examine clinical studies from earlier phases, studies of similar drugs for the same condition, or studies of the same drug for similar conditions. If the drug is already being used in the clinic, FDA might also want to look at patients’ experiences on the drug in the real world. All of these sources of evidence are valuable, but the traditional frequentist paradigm discards much of it, relying solely on those pivotal studies to provide evidence of the drug’s effectiveness.
Bayesian statistics can help with these problems: It lets reviewers consider all of the data about a drug – not just the data collected in a given study – and it produces results that are easier to interpret and more relevant to regulators, clinicians, and patients. These benefits come with caveats: the results of a Bayesian study depend on modeling decisions made upfront, including the choice of the Bayesian prior (more on that below). That’s why FDA’s guidance suggests that Bayesian studies should demonstrate strong “frequentist operating characteristics” and stress-test assumptions. But with rigor and careful planning, Bayesian statistics can help us make better use of the information we have.
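To illustrate what checking “frequentist operating characteristics” can involve, here is a minimal sketch: simulate the design many times under the null hypothesis of no effect and measure how often the Bayesian success criterion fires, i.e., estimate the design’s type I error rate. Every number below (sample size, outcome variability, prior, success threshold) is hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n = 100                          # patients per arm (hypothetical)
sigma = 10.0                     # known outcome standard deviation (hypothetical)
prior_mean, prior_sd = 0.0, 5.0  # hypothetical, mildly skeptical prior on the effect
se = sigma * np.sqrt(2 / n)      # standard error of the difference in means

def posterior_prob_positive(effect_hat):
    """Conjugate normal-normal update; returns P(effect > 0 | data)."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + effect_hat / se**2)
    return 1.0 - norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))

# Simulate trials under the null (true effect = 0) and count how often the
# Bayesian success criterion P(effect > 0 | data) > 0.975 is met anyway.
effect_hats = rng.normal(0.0, se, size=20_000)
type_i_error = np.mean(posterior_prob_positive(effect_hats) > 0.975)
print(f"simulated type I error: {type_i_error:.3f}")  # ~0.021 with these settings
```

Here the simulated rate lands slightly below the conventional 0.025 because the skeptical prior shrinks estimates toward zero; an optimistic prior would push it the other way, which is exactly why regulators ask for these simulations.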
So why has this taken so long?
The proximate reason we are seeing this guidance now is that FDA committed to release the guidance in its 2022 user fee negotiations with industry. But the truth is, this guidance could probably have been released sooner. FDA has been discussing the use of Bayesian statistics for decades – FDA’s medical device center published its own Bayesian guidance nearly 20 years ago. I think the real reason this guidance is coming out is that, for the agency, the benefit of applying a Bayesian approach to drugs finally outweighs the downsides.
And despite its merits, Bayesian statistics does have a big downside: it requires the construction of the Bayesian prior probability distribution, or prior. The prior describes the range of effect sizes we think are plausible before the trial begins based on existing evidence, and is crucial in determining how the study data will be interpreted. While there are best practices in constructing the prior, it inherently requires some degree of subjective judgment on the part of the study designers.
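To give a flavor of the judgment calls involved, here is a minimal sketch of one common tactic for building a prior: start from an earlier estimate and deliberately discount it. The phase 2 numbers and the discount weight are hypothetical.

```python
import numpy as np

# Suppose a hypothetical phase 2 trial estimated a treatment effect of 3.0
# with a standard error of 1.5.
phase2_mean, phase2_se = 3.0, 1.5

# Inflating the variance weakens the prior so the new trial's data dominate.
# A weight of w = 0.5 treats the phase 2 result as worth half its face value;
# choosing w is precisely the kind of subjective judgment discussed above.
w = 0.5
prior_mean = phase2_mean
prior_sd = phase2_se / np.sqrt(w)  # variance scaled by 1/w

print(f"prior: Normal({prior_mean}, sd = {prior_sd:.2f})")  # Normal(3.0, sd = 2.12)
```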
It’s easy to see how the prior might raise red flags for regulators. If the prior is subjective, how do you decide what it should be? And how do you keep drug companies from gaming the system by choosing a favorable prior? Yet while these concerns are real, I don’t think they’re the main reason that FDA has been hesitant to embrace Bayesian statistics. After all, there is always an element of subjectivity in drug review. FDA tries to make its standards clear, but they acknowledge that they must make subjective judgments on what evidence should be collected and how it should be weighed. And the FDA is perfectly willing and able to scrutinize study design choices and the study data itself. They’re not likely to be fooled by drug companies who play games with statistics – even if the statistics are Bayesian.
If I had to guess, I suspect the biggest problem for FDA was not the subjectivity of the prior; it was the exercise of putting numbers behind those subjective judgments. For fans of Bayesian statistics (and legions of statistics nerds), the fact that the Bayesian prior can be quantified is one of its greatest strengths. Capturing relevant information in a Bayesian prior feels more rigorous and rational than relying solely on subjective judgment – and it’s much better than simply throwing that relevant information away.
But I suspect this is not how most normal people – FDA reviewers included – feel. The FDA has long stressed the importance of “clinical judgment” in reviews. The Bayesian prior threatens that. After all, if a reviewer’s clinical judgment is a key factor in review decisions, any attempt to translate that clinical judgment into numbers risks diluting and distorting that judgment.
In the past, FDA has been even more explicit in rejecting the quantification of its review decisions. In a 2013 report, they considered and rejected the idea of using numerical weights of benefits and risks to help them review drugs. The process of assigning weights, they argued, involved “numerous judgments that are at best debatable and at worst arbitrary.” While the construction of a Bayesian prior does not preclude reviewers from exercising judgment, it probably elicits a similarly negative reaction.
And yet, here we are, with the Bayesian guidance in hand. The guidance even goes so far as to suggest that companies could explicitly quantify benefits and risks in their trial in the form of a Bayesian “loss function.”
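To make that idea concrete, here is a minimal sketch of a decision rule built on a loss function: assign a numeric loss to each kind of wrong decision, then pick whichever action has the smaller expected loss under the posterior. The posterior and the loss values below are hypothetical, and a real regulatory loss function would be far richer.

```python
from scipy.stats import norm

# Hypothetical posterior for the treatment effect after the trial.
post_mean, post_sd = 1.5, 0.8

# Hypothetical losses: approving an ineffective drug (effect <= 0) costs 10;
# rejecting an effective drug (effect > 0) costs 3; correct decisions cost 0.
loss_approve_ineffective = 10.0
loss_reject_effective = 3.0

# Posterior probability that the drug is effective.
p_effective = 1.0 - norm.cdf(0.0, loc=post_mean, scale=post_sd)

# Expected loss of each action under the posterior.
expected_loss_approve = loss_approve_ineffective * (1.0 - p_effective)
expected_loss_reject = loss_reject_effective * p_effective

decision = "approve" if expected_loss_approve < expected_loss_reject else "reject"
print(f"P(effective) = {p_effective:.3f} -> {decision}")  # ~0.970 -> approve
```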
Why the shift? Perhaps it’s because the alternative approach has grown less tenable.
FDA’s existing statistical approach no longer made sense
Against the “debatable and arbitrary” assumptions required by Bayesian statistics, the perceived simplicity, objectivity, and consistency of a frequentist approach must have seemed appealing to the FDA. In exchange for ignoring prior data, the frequentist approach offers a uniform statistical procedure with a consistent interpretation. That must have been particularly appealing to FDA in the 1960s, when it first introduced its efficacy standard. At that time, the agency feared being barraged by poorly conducted studies of questionable products. Indeed, FDA articulated this fear as recently as 1998 in its guidance on how it evaluates drug effectiveness:
The inherent variability in biological systems may produce a positive trial result by chance alone. This possibility is acknowledged, and quantified to some extent, in the statistical evaluation of the result of a single efficacy trial. It should be noted, however, that hundreds of randomized clinical efficacy trials are conducted each year with the intent of submitting favorable results to FDA. Even if all drugs tested in such trials were ineffective, one would expect one in forty of those trials to “demonstrate” efficacy by chance alone at conventional levels of statistical significance.
In other words, the agency needed to filter the junk out of the “hundreds” of randomized studies it might see each year, and it relied on frequentist statistics, hypothesis testing, and replication to do so. (The “one in forty” is simple arithmetic: a two-sided p-value threshold of 0.05 leaves a 2.5% chance, or 1 in 40, that an ineffective drug produces a spuriously favorable result.)
Nowadays, the circumstances the agency faces are different. If the agency ever faced a barrage of “hundreds” of randomized trials, that is no longer the case. In 2024, FDA approved 50 new molecular entities on the basis of only 75 trials. And it probably felt unfair to assume that most or all of the drugs that underwent these trials were ineffective (even if many would ultimately fail); many of those trials were done only after extensive study, including prior use and research in humans.
The frequentist framework no longer made sense. Even before the Bayesian guidance was released, FDA understood that, despite their reliance on frequentist statistics, these trials could not be reviewed in an evidentiary vacuum. Since the 1990s and early 2000s, FDA has found ways to “borrow” from prior information to support efficacy determinations: it has made use of “seamless” phase 2/3 trials in cancer, natural history studies, and even mechanistic data. They usually described this as relying on “confirmatory evidence” or “the totality of evidence”. Now, FDA has departed further from the traditional approach, requiring only one pivotal trial instead of two.
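To see what structured borrowing looks like, here is a minimal sketch of a “power prior,” one common borrowing device: historical data enter the analysis at a fraction of their full weight. The response counts and the weight a0 are hypothetical.

```python
from scipy.stats import beta

# Hypothetical historical trial: 30 responders out of 50 patients.
hist_success, hist_n = 30, 50
a0 = 0.5  # power-prior weight: count the historical data at half strength

# With a beta-binomial model, raising the historical likelihood to the power
# a0 simply scales the historical counts (starting from a uniform Beta(1, 1)).
prior_a = 1 + a0 * hist_success
prior_b = 1 + a0 * (hist_n - hist_success)

# Hypothetical current trial: 18 responders out of 40 patients.
cur_success, cur_n = 18, 40
posterior = beta(prior_a + cur_success, prior_b + (cur_n - cur_success))

# Posterior probability that the response rate exceeds 40%.
print(f"P(rate > 0.40 | data) = {1 - posterior.cdf(0.40):.3f}")  # ~0.96
```

Dial a0 to 0 and the historical data vanish; dial it to 1 and they count as much as the new trial. Making that dial explicit, rather than burying it in qualitative judgments about “confirmatory evidence,” is the structure Bayesian methods offer.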
More broadly, the key regulatory question has shifted: instead of asking “how do we filter out the junk science?”, a goal well served by frequentist methods, FDA and drug companies are trying to use scarce clinical data as efficiently as possible to make informed determinations about each drug that crosses the agency’s desk. That’s where Bayesian statistics shines.
The choice FDA now faces is not whether to rely on prior evidence; it’s whether to do so awkwardly, within its current frequentist framework, where that borrowing is implicit and unstructured, or explicitly, through the more structured approach offered by Bayesian statistics. The publication of this guidance suggests that FDA is starting to embrace the more structured approach.
Will this guidance lead to change?
Are we about to embark on a new Bayesian era at FDA? Perhaps, but first, drug companies need to actually adopt the methods described in the guidance.
I expect the process to be slow. Despite FDA’s endorsement, this is still just a draft guidance, and few drug companies will want to be among the first to try out new Bayesian approaches with agency review teams. Drug companies have famously risk-averse cultures, particularly when it comes to running their most expensive and high-stakes pivotal trials. And even though FDA has signaled that it is amenable to Bayesian approaches, most companies will wait to adopt them until they see FDA approving applications that use them.
Adoption will also be slowed by a lack of capacity and expertise. Drug companies and the FDA will both need to train their staffs and get familiar with the technical details of designing and approving Bayesian studies, negotiating over priors, and understanding the benefits and limits of the approach. In the meantime, drug companies may find themselves more comfortable working within the existing frequentist approach, knowing that the agency has grown increasingly comfortable with accepting “confirmatory evidence” of effectiveness in lieu of data from trials.
But in the longer term, I am betting that Bayesian approaches will become far more common. Drug companies will find it difficult to resist the opportunity to make more efficient use of limited clinical evidence, particularly in rare diseases, pediatric indications, and other areas where evidence is especially scarce. If enough pioneering companies are willing to take a chance on something new, others will follow. And if we find new ways to strengthen the quality of prior evidence, whether through clinically validated AI-driven “virtual cells” or simply faster and more abundant early-phase human trials, the case for Bayesian methods will grow even stronger.
I’d like to thank my co-bloggers Witold Więcek, Manjari Narayan, Saloni Dattani, and Ruxandra Teslo for their input on this piece.
Comments
I am so skeptical about Bayesian analysis because the priors will be manipulated.
There is a live example: the Tigris trial of polymyxin hemoperfusion for sepsis. They took one of the many negative RCTs, post-hoc cherry-picked a “positive” subgroup, and declared it their prior. The Tigris study itself enrolled just enough patients that its own data could not turn the tide.
Great piece. I agree Bayesian CTs will become a much bigger part of drug approvals. The advantage of Bayesian methods when using adaptive designs will also be a factor.
However, I disagree that the Bayesian prior threatens the idea of reviewer clinical judgement. The subjectivity in Bayesian priors allows for more clinical judgement not less. At the design stage the reviewers can have their clinical judgements incorporated (and quantified) into the supplementary and sensitivity analysis sections of the statistical analysis plan in the form of different priors and prior weights etc. At authorisation application stage the reviewers still have clinical judgement, e.g., where Bayesian dynamic borrowing is used, whether 65% borrowing or 70% borrowing is acceptable relies on clinical input more than whether .048 or .055 (or .027) is acceptable imo.