David Brown wrote an interesting piece in the Washington Post two weeks ago, “The press-release conviction of a biotech CEO and its impact on scientific research.” Excerpt:
“The press release described a clinical trial of interferon gamma-1b (sold as Actimmune) in 330 patients with a rapidly fatal lung disease. What’s unusual is that everyone agrees there weren’t any factual errors in the four-page document. The numbers were right; it’s the interpretation of them that was deemed criminal. (Former InterMune CEO W. Scott) Harkonen was found guilty of wire fraud in 2009 for disseminating the press release electronically.
In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either.
Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. It needs to be 0.05 or smaller to be considered “statistically significant” under the conventions of medical research.
Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statisticians to look.
It turns out that people with mild to moderate cases of the disease (as measured by lung function) had a dramatic difference in survival. Only 5 percent of those taking the drug died, compared with 16 percent of those on placebo. The P value was 0.004 — highly significant.
But there was a problem. This mild-to-moderate subgroup wasn’t one the researchers said they would analyze when they set up the study. Subdividing patients after the fact and looking for statistically significant results is a controversial practice. In its most extreme form, it’s scorned as “data dredging.” The term suggests that if you drag a net through a bunch of numbers enough times, you’ll come up with something significant sooner or later.
Exactly what Harkonen was thinking isn’t known, as he didn’t testify at his trial. Nevertheless, the press release’s two headlines focused all the attention on the mild-to-moderate subgroup.
“InterMune Announces Phase III Data Demonstrating Survival Benefit of Actimmune in IPF,” the document said in bold-face letters. Following it in italics was this sentence: “Reduces Mortality by 70% in Patients with Mild to Moderate Disease.”
Those two sentences were Harkonen’s crime.
‘No falsification of data’
In the trial, much was made of P values and the issue of after-the-fact analyses.
Two of the government’s experts testified that if a study misses all its “primary endpoints” (as this one did), then it’s improper to draw conclusions about a drug’s effect in subgroups identified later. The press release acknowledged missing the primary endpoints, but it didn’t indicate that the featured subgroup was identified after the study’s data were collected.
The prosecutors also emphasized that Harkonen had a financial motive for spinning the study in the most positive way. This wasn’t hard to find. The third paragraph of the press release said: “We believe these results will support use of Actimmune and lead to peak sales in the range of $400-$500 million per year, enabling us to achieve profitability in 2004 as planned.”
There was some talk that if Harkonen had just admitted more uncertainty in the press release — using the verb “suggest” rather than “demonstrate” — he might have avoided prosecution. (The U.S. attorney’s office for Northern California declined to talk about the case. The prosecution’s chief statistical expert, Thomas Fleming of the University of Washington, didn’t answer two e-mail requests for an interview.)
What’s unusual is that everything in the press release was correct. What was lacking, the prosecutor, jury, judge and appeals court concluded, was context.”
Evaluating news and information about health care research every day, I see a lot of research results being spun, with important context left out. The words matter. This was an important story.
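For readers who want to check the arithmetic behind the quoted figures, here is a minimal Python sketch. It is not from Brown's article or the trial analysis. The function names are my own, the inputs are the rounded percentages from the excerpt (so the result only approximates the headline's 70%), and the "data dredging" calculation assumes each subgroup look is an independent test of a drug with no real effect, which real subgroups generally are not.

```python
def absolute_risk_reduction(control_rate, treated_rate):
    """Difference in event rates as a fraction (0.11 means 11 percentage points)."""
    return control_rate - treated_rate


def relative_risk_reduction(control_rate, treated_rate):
    """Share of the control group's risk that is absent in the treated group."""
    return (control_rate - treated_rate) / control_rate


def chance_of_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests of a true
    null hypothesis comes out 'significant' at the given alpha.
    Independence is a simplifying assumption."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Rounded death rates quoted in the excerpt: (placebo, drug).
    groups = {
        "all patients": (0.17, 0.10),
        "mild-to-moderate subgroup": (0.16, 0.05),
    }
    for name, (placebo, drug) in groups.items():
        arr = absolute_risk_reduction(placebo, drug)
        rrr = relative_risk_reduction(placebo, drug)
        print(f"{name}: absolute {arr * 100:.0f} points, relative {rrr:.0%}")

    # How quickly a fluke becomes likely as more subgroups are examined.
    for k in (1, 5, 10, 20):
        print(f"{k} subgroup looks: {chance_of_false_positive(k):.0%} chance of a fluke")
```

With the rounded subgroup figures, the relative reduction comes out at roughly 69 percent, while the absolute difference is 11 percentage points. Under the independence assumption, ten after-the-fact looks at a drug that does nothing would turn up at least one "significant" result about 40 percent of the time. Both are the kind of context a headline can leave out.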