VaxGen’s Claims of Vaccine Efficacy Evanesce in Autumn’s Last Light
The Bonferroni Blunder
On Monday February 24th, 2003, the biotech company VaxGen publicly announced the results of the first ever phase III efficacy trial of an HIV vaccine candidate, igniting an unexpected firestorm of controversy that has continued to smolder ever since. Now, a new analysis of data from the trial — presented publicly for the first time on September 17, at a meeting of the National Institutes of Health’s AIDS Vaccine Research Working Group — may extinguish further debate.
Prior to the announcement of the results, there was little optimism among scientists and activists that the vaccine construct (dubbed AIDSVAX and comprising two versions of HIV’s ever-mutating gp120 envelope protein) would protect against infection, and the overall outcome was therefore not a surprise: of the 5,009 study participants who received more than three shots of vaccine or placebo, 5.7% of those in the vaccine group became infected with HIV versus 5.8% of those in the placebo group. But VaxGen’s press release contained an unanticipated claim: the vaccine, according to the company, showed statistically significant protective efficacy among Blacks (the term African-Americans was not used because participants were recruited in Puerto Rico and the Netherlands as well as the U.S.) and an arbitrarily combined subgroup of “Blacks, Asians and Others.”
Many activists, aware of the small numbers of non-White participants that VaxGen had recruited into the trial, immediately suspected that the claims were statistically shaky. But some mainstream journalists swallowed VaxGen’s interpretation whole, aided and abetted by a company press release that trumpeted a “less than 2% possibility” that the result had occurred by chance. Steve Sternberg in USA Today headlined his story “Vaccine for AIDS Appears to Work.” In response, several advocacy organizations — including TAG, AIDS Vaccine Advocacy Coalition, Project Inform and GMHC — rapidly issued statements outlining their concerns that the data were being wildly overplayed, potentially as a deliberate face-saving and stock price-saving strategy (appearing on CNN’s Financial News Network the morning of the announcement, VaxGen CEO Lance Gordon described the subgroup data as a “marvelous result”). The immediate fall-out was unpleasant: some members of the African American and Asian communities wondered if mainstream AIDS organizations were simply dismissing the results out of a lack of concern for the communities in which VaxGen was claiming some efficacy.
But when the smoke began to clear, the depths of VaxGen’s public relations depravity became apparent. A basic statistical tenet of conducting multiple analyses of trial data is that the prospect of finding a significant result by chance increases every time the data are sliced a different way. VaxGen’s statistical analyses of efficacy among demographic subgroups should therefore have been adjusted to account for the multiple comparisons the company was making but, surprise, surprise, they were not. This basic error was first spotted by Larry Peiperl from the University of California at San Francisco’s HIVinsite Web site, who wrote about his concerns in a little-noticed article released the same day as the VaxGen announcement: “the p value suggests less than a 2% likelihood that the result in black participants was due to chance rather than vaccine efficacy. But if one looks for many possible correlations — say in Hispanics, in people over 30, in men, in women, in people in urban centers, etc. — it becomes increasingly likely that one will find a positive result due to chance, as every separate attempt to find a correlation becomes another opportunity for chance to play a role. … It is not clear from the press conference how the analysis was done, and how or whether statistical adjustments for possible multiple subgroup analyses were performed.”
Peiperl’s point was quickly echoed in a widely circulated e-mail from a highly respected researcher at the Los Alamos HIV Database. By the Thursday of that week, The New York Times had picked up on the story: “AIDS vaccine numbers off, statistician says — Effectiveness for minorities may be overstated” ran the headline, above an article in which independent statistician Steve Self explained the problem. The response from VaxGen’s President and AIDSVAX pied piper Don Francis? It’s “a tangential issue.” Science magazine’s Jon Cohen finally wrestled a confession from VaxGen’s Senior Vice President Marc Gurwith later that day: “The p values that were in the press release were not adjusted,” admitted Gurwith, in an article on the ScienceNow Web site entitled “VaxGen’s Sketchy Statistics.” In the absence of appropriate corrections, it was now unclear whether the results in any of the individual subgroups analyzed were statistically significant, as VaxGen originally claimed. Ironically, at the same time that Cohen’s story was being released, VaxGen CEO Lance Gordon was standing on the podium at an investors conference in New York City telling the crowd that the trial results had been adjusted for multiple comparisons using a standard statistical tool called a Bonferroni correction (he later had to admit his error).
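For readers who want to see what a Bonferroni correction does to a headline figure like the “less than 2%” in the press release, here is a minimal sketch in Python. The unadjusted p value of 0.02 is taken as an illustrative upper bound from the press release wording, and the numbers of comparisons are hypothetical; this is not a reconstruction of any analysis VaxGen or the NIH actually ran.

```python
def bonferroni_adjust(p_value, n_comparisons):
    """Bonferroni-adjusted p value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p_value * n_comparisons)


ALPHA = 0.05          # conventional significance threshold (an assumption here)
UNADJUSTED_P = 0.02   # illustrative: the "less than 2%" figure from the press release

# Hypothetical numbers of subgroup comparisons, for illustration only.
for n_comparisons in (1, 3, 5, 10):
    adjusted = bonferroni_adjust(UNADJUSTED_P, n_comparisons)
    verdict = "significant" if adjusted < ALPHA else "not significant"
    print(f"{n_comparisons:>2} comparisons: adjusted p = {adjusted:.2f} ({verdict})")
```

Under these assumptions, a p value of 0.02 stops clearing the 0.05 bar as soon as three comparisons are accounted for, which is why the question of whether the press release figures were adjusted mattered so much.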
Despite the rapid collapse of VaxGen’s statistical house of cards, company co-founder Phil Berman presented an update on the trial results on March 31st at the Keystone vaccine research meeting, and continued to claim that the subgroup efficacy data were real (although his slides now noted that the p values were “unadjusted”). He did divulge, however, that AIDSVAX had shown no effect on the secondary trial endpoints of post-infection viral load and CD4 T cell counts, thus giving the lie to teasers contained in slides released as part of the original February announcement (which hinted at “atypical” immunological control of viral load in some vaccinated subjects; at the time of writing, these slides remain available online).
At this point, it fell to the National Institutes of Health to step in and attempt to resolve the issue. On April 24, a panel of independent researchers and representatives from some community organizations (including the AIDS Vaccine Advocacy Coalition and the African American AIDS Policy Institute) was brought together to hear yet more details on the VaxGen data. Attendees at the meeting were surprised to be told at the outset that they had been brought together to decide whether the NIH should fund an additional phase III efficacy trial in minorities, and they quickly articulated their unwillingness to make any such decision. On the other hand, few people at the meeting seemed inclined to entirely write the data off, either. Instead, it was decided to solicit a more detailed analysis of the VaxGen data from a committee made up of representatives from NIH (Dean Follman, Jorge Flores), the SCHARP Statistical Center at the University of Washington (Peter Gilbert, Steve Self), CDC (Martha Ackers, Dale Hu) and VaxGen (Marc Gurwith, Vladimir Popovic).
Originally due on July 1, the new report finally made its public debut at the AIDS Vaccine Research Working Group meeting held on September 17, the first day of the recent AIDS Vaccine 2003 Conference in New York. NIH’s Dean Follman presented an overview of the results, starting with the multiple analyses required to evaluate the AIDSVAX results in demographic subgroups. Instead of outlining whether the subgroup data remained statistically significant using various statistical correction methods, Follman presented the committee’s take on the likelihood of obtaining significant results by chance. When 12 subgroups were compared, a significant result (p<0.01) could be obtained by chance about 8% of the time. If the number of subgroups analyzed was increased to 15, the chance of obtaining a fluke result with a p value of <0.05 also increased, occurring about 22% of the time.
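As a rough cross-check on the scale of these figures, the chance of at least one fluke result can be approximated with the textbook formula 1 − (1 − α)^k, which assumes the k comparisons are statistically independent. That assumption is ours, not the committee’s: the committee’s method is not described in this article, and subgroups that overlap (for example, “Blacks” and “Blacks, Asians and Others”) are not independent. A minimal sketch in Python:

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests,
    each run at significance threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The two scenarios described above: (threshold, number of subgroups).
for alpha, n_tests in [(0.01, 12), (0.05, 15)]:
    rate = familywise_error_rate(alpha, n_tests)
    print(f"{n_tests} subgroups at p<{alpha}: about {rate:.0%} chance of a fluke")
```

Under independence this gives roughly 11% and 54%, both higher than the committee’s 8% and 22%. Lower figures are what one would expect if the comparisons overlap and are therefore positively correlated, but that reading is an inference rather than something stated in the presentation.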
The committee also looked for reasons, other than chance, that might explain the trial results. There was some evidence that people at higher risk for HIV infection might have been less likely to become infected if they received vaccination. However, risk behavior among participants was equivalent in the White and Black subgroups. Also, because the overall trial result showed no protection, the fact that higher risk individuals who received AIDSVAX showed a slightly reduced risk of HIV infection means that lower risk individuals who were vaccinated had an increased risk of infection compared to those lower risk participants who received placebo.
This unlikely scenario is hard to explain, unless it also reflects the play of chance. Another line of evidence suggested that antibody levels (measured at the peak of the response, shortly after vaccination) tended to be higher among Black male participants compared to their White counterparts, and among women compared to men. Furthermore, participants with a higher antibody response to AIDSVAX appeared to be at a lower risk for acquiring HIV infection during the study. However, once again, the fact that the overall study result was a wash means that individuals with low antibody responses to the vaccine faced a higher risk of infection than those receiving placebo.
This result is also inexplicable, unless — as Follman concluded — the antibody response to immunization is simply acting as a surrogate for a more robust immune response, and individuals with a more robust immune response were less likely to acquire HIV infection. This is arguably an encouraging finding, in that it does suggest that the quality of a person’s immune response to HIV may affect their susceptibility to infection.
After reviewing the data, Follman reported that the conclusion of the committee was that the subgroup results were “likely spurious.” In other words, in the absence of any plausible biological explanation, the unadjusted p values that suggested statistically significant protection among certain subgroups were likely to be a product of chance. Whether VaxGen is ready to concede the point is unclear — at the ICAAC conference in Chicago just prior to the NYC vaccine meeting, the company presented a study which stated that “analysis of the data by race and gender suggests the possibility of protection against HIV infection in women and Blacks.” VaxGen’s version of the trial outcome has apparently been submitted to a journal for publication.
Meanwhile, the NIH is convening a series of meetings in order to work out how to publicize the countervailing conclusions of Follman’s committee. Further complicating matters, the results of a second AIDSVAX efficacy trial (this one conducted among intravenous drug users in Thailand) are due to be announced in the fourth quarter of 2003. In a move that rather undermines its confident public stance, VaxGen is refusing to pay for the analyses that are required to complete the Thai trial. Instead, the results will be analyzed by an independent panel led by an as yet unnamed CDC staffer, and the funding will be provided partly by the NIH (to the tune of around $600,000 or more) and partly by a private non-profit entity that most people assume to be the Gates Foundation (which will be required to stump up around $1 million).
By the year’s end, as more data from both efficacy trials (including the new analyses presented by Dean Follman) are published and become available, a clearer picture of the AIDSVAX debacle should emerge. One key lesson for future vaccine efficacy trials is writ large already: the participation of both men and women from diverse ethnic backgrounds is vital, not just from a perspective of equity, but to ensure that comparisons of vaccine effects by gender and race can be made with confidence.