David Brown wrote an interesting piece in the Washington Post two weeks ago, “The press-release conviction of a biotech CEO and its impact on scientific research.” Excerpt:
“The press release described a clinical trial of interferon gamma-1b (sold as Actimmune) in 330 patients with a rapidly fatal lung disease. What's unusual is that everyone agrees there weren't any factual errors in the four-page document. The numbers were right; it's the interpretation of them that was deemed criminal. (Former InterMune biotech company CEO W. Scott) Harkonen was found guilty of wire fraud in 2009 for disseminating the press release electronically.
In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn't statistically significant, either.
Specifically, the so-called P value, a mathematical measure of the strength of the evidence that there's a true difference between a treatment and placebo, was 0.08. It needs to be 0.05 or smaller to be considered statistically significant under the conventions of medical research.
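To illustrate what a test like this computes, here is a minimal Python sketch of a two-sided two-proportion z-test on the survival numbers. The per-arm counts are assumptions (the article gives only the 330-patient total and the 10 versus 17 percent mortality figures), and the trial's reported P = 0.08 came from a time-to-event analysis, so this simple proportion test will not reproduce it exactly:

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions
    (normal approximation with a pooled standard error)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, via erf
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical split: 165 patients per arm; 16 deaths (~10%) on drug,
# 28 deaths (~17%) on placebo -- illustrative counts, not trial data
p = two_proportion_p_value(16, 165, 28, 165)
print(f"p = {p:.3f}")  # lands above the 0.05 threshold: "not significant"
```

Even with a visible-looking mortality gap, the p-value sits just above the conventional cutoff, which is exactly the situation the excerpt describes.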
Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statisticians to look.
It turns out that people with mild to moderate cases of the disease (as measured by lung function) had a dramatic difference in survival. Only 5 percent of those taking the drug died, compared with 16 percent of those on placebo. The P value was 0.004, highly significant.
But there was a problem. This mild-to-moderate subgroup wasn't one the researchers said they would analyze when they set up the study. Subdividing patients after the fact and looking for statistically significant results is a controversial practice. In its most extreme form, it's scorned as "data dredging." The term suggests that if you drag a net through a bunch of numbers enough times, you'll come up with something significant sooner or later.
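The net-dragging intuition can be made quantitative. If you examine many independent subgroups of pure noise, each at the conventional 0.05 level, the chance that at least one comes out "significant" by accident grows quickly. A short Python sketch (the subgroup counts are arbitrary illustrations, not figures from the trial):

```python
def chance_of_false_positive(n_subgroups, alpha=0.05):
    """Probability that at least one of n independent tests on pure
    noise reaches 'significance' at level alpha: 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_subgroups

for k in (1, 5, 10, 20):
    print(f"{k:2d} subgroups examined -> "
          f"{chance_of_false_positive(k):.0%} chance of a false positive")
```

With 20 after-the-fact subgroups, the chance of dredging up at least one spurious "significant" result is roughly 64 percent, which is why unplanned subgroup findings are treated with such suspicion.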
Exactly what Harkonen was thinking isn't known, as he didn't testify at his trial. Nevertheless, the press release's two headlines focused all the attention on the mild-to-moderate subgroup.
"InterMune Announces Phase III Data Demonstrating Survival Benefit of Actimmune in IPF," the document said in bold-face letters. Following it in italics was this sentence: "Reduces Mortality by 70% in Patients with Mild to Moderate Disease."
Those two sentences were Harkonen's crime.
No falsification of data
In the trial, much was made of P values and the issue of after-the-fact analyses.
Two of the government's experts testified that if a study misses all its primary endpoints (as this one did), then it's improper to draw conclusions about a drug's effect in subgroups identified later. The press release acknowledged missing the primary endpoints, but it didn't indicate that the featured subgroup was identified after the study's data were collected.
The prosecutors also emphasized that Harkonen had a financial motive for spinning the study in the most positive way. This wasn't hard to find. The third paragraph of the press release said: "We believe these results will support use of Actimmune and lead to peak sales in the range of $400-$500 million per year, enabling us to achieve profitability in 2004 as planned."
There was some talk that if Harkonen had just admitted more uncertainty in the press release, using the verb "suggest" rather than "demonstrate," he might have avoided prosecution. (The U.S. attorney's office for Northern California declined to talk about the case. The prosecution's chief statistical expert, Thomas Fleming of the University of Washington, didn't answer two e-mail requests for an interview.)
What's unusual is that everything in the press release was correct. What was lacking, the prosecutor, jury, judge and appeals court concluded, was context.”
In evaluating news and information about health care research every day, I see a lot of spinning of research results. Important context is left out. The words matter. This was an important story.