Observational Databases Promise to Solve Clinical Trials Lag, But Experts Warn All Data Not Created Equal
March 2000
“Built-In Bias”
Four years into the era of HAART-for-life, as the feasibility of eradication recedes to the vanishing point, the Division of AIDS at the National Institutes of Health (NIH) sponsored a workshop on “long-term effectiveness research” in HIV disease. If the disease is going to be chronic, how can researchers make it manageable? Over-optimistic assumptions about adherence, eradication, and toxicity governed initial recommendations to “Hit early, hit hard.” As a plethora of new and bizarre side effects became apparent, the wisdom of hitting early became somewhat eroded — especially as the immune system displayed a greater ability to reconstitute itself than had previously been expected. Hence the new interest in hitherto heretical questions such as “When to start?” and “How to change?” antiretroviral therapy. Some notes from the workshop follow.
NIAID Director Anthony S. Fauci opened the workshop, saying “I’m here because it’s important. Over 200,000 Americans with HIV are unaware of their infection, and 40,000 become infected each year. The burden of treatment [in the USA] that’s ahead of us is going to be greater than all we’ve treated to date. If we don’t need to treat everyone every day — e.g., with structured treatment interruptions (STIs) — it’s possible we could treat some people in developing countries abroad who wouldn’t otherwise be able to be treated . . .”
John Bartlett, Fauci’s co-chair on the HHS Antiviral Guidelines Committee, summarized his view of the data standards used in developing and updating the treatment guidelines. “For the Guidelines, we always use randomized, controlled trials. Sample size is unspecified. The duration of trials is between 24 and 48 weeks. For analysis, we use intent-to-treat and on-treatment. The endpoints we look at include viral load <50, <500, and a separate analysis for those entering with viral loads over 100,000 copies/mL. We also look at adverse drug events, and tolerance. We’d like a comparison to the regimens in the preferred category.”
“Simply put, what we do in the clinic is different than what we say in the Guidelines. Among the main concerns with current guidelines: long-term outcomes are not available from most of the studies; initiation of ‘When to start?’ is arbitrary; the need for individualization of regimens; the outstanding quandary of protease-sparing initial regimens; the benefit of partial viral suppression vs. the rapid squandering of future therapeutic options; the continuing threat of drug resistance issues.”
“Things in this field change with great speed. In most areas of medicine, the average time for clinical trial results to affect practice is 10 years; in HIV it is two years. Clinical trials have a participant bias. Fifty to eighty percent of individuals in clinical trials achieve a viral load below 50 copies/mL; in clinical practice this rate is 20-40%. Enrollment and retention are problems in randomized clinical trials. This is largely driven by Medicaid policies, which are state-specific. Where you have a good Medicaid, you have a disincentive for enrollment: look at Maryland, Massachusetts, and New York. Moreover, none of the big trials to date has assessed cost.”
No one in Bartlett’s clinic is now on a single protease inhibitor (as part of a 4+ drug combo), with the possible exception of nelfinavir. Bartlett pointed out that, in MACS data (Mellors, Ann Intern Med 1997), there was little difference in the incidence of AIDS at 3-5 years between those with CD4 counts over 500 and those with counts of 350-500. Similarly, the Swiss HIV cohort did not see major differences in progression between those who started HAART later (injecting drug users and others) and early starters (Junghans, AIDS 1999).
Gwen Scott of University of Miami School of Medicine spoke about the Pediatric Guidelines. There is a sub-population of children whose immune systems are preserved, and it is not known whether they should be started on antiretroviral therapy (ART). Other important questions on the pediatric front include: Which combination therapy can best preserve growth and development? What is the mechanism of neurologic disease in children and how can it be prevented? How should resistance testing be best interpreted and incorporated into clinical practice? What is the role of cesarean section in women with undetectable viral load on HAART in preventing perinatal transmission?
Trip Gulick of the Cornell ACTU pointed out that in the AIDS field, a three-year study is considered “long-term.” [Actually, in HIV infection a one-year study is considered long-term.] Clinical trial demographics do not always represent clinic populations, and study outcomes do not always match clinic outcomes.
Carlton Hogan of the CPCRA Statistical Center gave an activist perspective. He assumes that most patients initiating therapy in the next five years will be able to get their plasma viral load beneath the limit of quantitation — at least in the short-term. Therefore, drug side effects are likely to outweigh AIDS progression events. Hogan also predicts we won’t have many novel drug classes or compounds, and that combination ART will still be necessary.
What events are relevant? AIDS opportunistic infections (OIs) have decreased dramatically since the advent of potent combination therapy. Most future acute OI incidents are likely to be in individuals with barriers to care or unaware of their HIV seropositivity. These groups will be difficult to study. In patients on antiretroviral therapy, rates of adverse events — some life-threatening (e.g., cardiovascular) — are climbing. These may well become more common than OIs themselves. We cannot imagine (much less predict) the potential consequences of 10-20 years of antiretroviral therapy, complicated by interactions with cardiovascular drugs and aging. Other questions not raised in this workshop: when/if to stop antiretroviral therapy? when/if to restart? what to do with patients with discordant viral load and CD4 responses to antiretroviral therapy? Melanie Thompson of the Atlanta CPCRA unit cited some pithy quotes: “A physician is a person who pours drugs of which she knows little into a body of which she knows less. A doctor is someone who kills you today to prevent you from dying tomorrow.”
Randomized, Controlled Trials vs. Observational Cohort Studies
Mike Saag of the University of Alabama at Birmingham reviewed some observational cohort studies. The MACS has 5,000 individuals enrolled; EuroSIDA, 7,300; the Swiss Cohort, 3,400; the Frankfurt cohort, 1,100; the Royal Free Hospital, 500; the Vancouver cohort, 1,750; the Glaxo-sponsored CHORUS study, 5,000. The adult ACTG study A5001 (Adult Linked Longitudinal Randomized Trials, or ALLRT protocol), just opening, will follow 2,500-3,000 individuals co-enrolled in selected randomized adult ACTG studies.
Saag asked how the ACTG site will get the adverse event and hospitalization data if it’s not providing primary care. Using data from the UAB cohort, Saag pointed out the complexities of current care. Among 143 patients whose viral load rose to over 5,000 copies/mL, there were 1,067 “regimen events,” 242 unique regimens, and 107 unique regimen sequences. Nearly every patient followed a unique regimen sequence. Eighty-nine regimens included at least four drugs. Median time on each regimen was four months, with most changes due to toxicity.
Lawrence Friedman of the National Heart, Lung & Blood Institute (NHLBI) discussed analogous experiences from cardiovascular disease. He focused on coronary heart disease and selected risk factors. Similar to HIV infection, cardiovascular disease is chronic, takes decades to develop, and has many interacting risk factors. There are many ways for adverse consequences to develop: heart failure, aneurysm, embolism, plaque rupture, and others. Further, there are many interventions: lower LDL, raise HDL, stop smoking, lower blood pressure, modify diet, increase exercise, give antiplatelet drugs or antioxidants. Treatment approaches include anticoagulants, ACE-inhibitors, beta blockers, defibrillators, bypass surgery, angioplasty, heart transplant, and others.
In 1972 the NHLBI recommended reducing blood pressure; in 1985 the institute recommended reducing cholesterol. Both of these approaches were initially based on risk evaluation from observational cohort data — without much data from large, randomized clinical trials to go by. Later the statin trials produced results, proving that lowering cholesterol saves lives; however, there were some other not-so-happy examples.
Anti-arrhythmic therapy was recommended based on observational studies. Yet the randomized clinical trials showed that, rather than prolonging life, the anti-arrhythmics actually increased mortality. Similarly, observational cohort studies showed a high level of cardiovascular benefit from estrogen replacement therapy, while randomized clinical trials are, if anything, showing a negative effect.
Friedman looked at the salt controversy. Should you reduce salt intake to reduce hypertension? Observational data are inconsistent. Most randomized clinical trials have been small and short. The few larger longer-term trials show quite modest effects on blood pressure. No trials have looked at clinical outcomes. Meta-analyses show statistically significant but quite modest differences. Some advocate reducing salt only in high-risk individuals, since the effect on blood pressure is minimal for most, and salt reduction reduces quality of life. Others favor a population strategy, claiming that even small shifts could have great public health importance. NHLBI held a workshop, which waffled: “The evidence that salt intake contributes to high blood pressure continues to increase. Americans take too much salt. The population strategy could affect cardiovascular mortality as much or more than a high-risk strategy.”
Friedman concluded by stating that cardiovascular medicine has been revolutionized by trials. Not all important trials have been done. Randomized clinical trials were interpreted along with observational cohort study (OCS) data.
Caroline Sabin of the Royal Free Hospital in London further discussed the value and limitations of randomized clinical trials versus OCS. We have randomized clinical trials with clinical endpoints, randomized clinical trials with surrogate marker data, and OCS, case control studies, etc. Before HAART, progression was faster, clinical endpoints were possible within a 2-3 year time frame, treatment options were limited, drop-outs were “relatively simple” to deal with, and results were easier to interpret. Even then, however, clinical endpoint trials seemed to take too long. Surrogate marker-based trials were then adopted based on the biology of disease and the need to speed up “answers.” First CD4 cell counts were used, then plasma viral load.
There are some benefits to observational studies. They reflect routine clinical practice, look at lots of regimens, sometimes have longer follow-up, and are perceived to be quicker. They are still too small to look at rarer, more complex regimens or adverse events, however, and treatment comparisons can come with built-in bias. Compare patients starting regimen A or B without randomization: Do they really share the same prognosis? What determines the choice of regimen A and regimen B? If the choice between A and B is made based on subjective clinical or laboratory evaluation — as it often should be — then the treatment outcomes will necessarily reflect these differences as well as any differences between A and B.
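The built-in bias Sabin describes can be made concrete with a small arithmetic sketch. Everything in it is a hypothetical assumption for illustration: the failure risks, the patient counts, and the names `FAILURE_RISK` and `crude_failure_rate` are invented, and none of the figures come from the workshop. Regimens A and B are given identical failure risk within each prognosis group, so any gap in the crude comparison comes only from who was assigned to which regimen.

```python
# Hypothetical illustration of built-in bias (confounding by indication).
# All numbers are invented; regimens A and B are assumed to be equally
# effective, so failure risk depends only on the patient's prognosis group.
FAILURE_RISK = {"sicker": 0.40, "healthier": 0.10}


def crude_failure_rate(patients_by_group):
    """Overall failure rate for one arm, given {prognosis group: patient count}."""
    total = sum(patients_by_group.values())
    failures = sum(n * FAILURE_RISK[g] for g, n in patients_by_group.items())
    return failures / total


# Observational cohort: clinicians steer sicker patients toward regimen A.
observational = {
    "A": {"sicker": 800, "healthier": 200},
    "B": {"sicker": 200, "healthier": 800},
}

# Randomized trial: assignment by chance balances prognosis across the arms.
randomized = {
    "A": {"sicker": 500, "healthier": 500},
    "B": {"sicker": 500, "healthier": 500},
}

obs_gap = crude_failure_rate(observational["A"]) - crude_failure_rate(observational["B"])
rct_gap = crude_failure_rate(randomized["A"]) - crude_failure_rate(randomized["B"])

print(f"observational A minus B failure rate: {obs_gap:.2f}")
print(f"randomized A minus B failure rate: {rct_gap:.2f}")
```

Under these assumed numbers the observational comparison makes regimen A look 18 percentage points worse than B (34% vs. 16% failure), while the randomized comparison shows no difference, even though the two regimens are identical by construction. Comparing A and B within each prognosis group would remove the gap here, but only because this toy example has a single, fully measured confounder.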
Sabin showed data, generated earlier by Andrew Phillips, comparing three randomized clinical trials and three observational cohort studies looking at the relative benefits of AZT vs. AZT/ddI and AZT/ddC. Here, the three randomized clinical trials (ACTG 175, CPCRA 007, and Delta) and the three observational cohort studies (EuroSIDA, the French Hospital Cohort, and the Swiss Cohort) showed comparable results. The databases agreed. This was nice.
However, in a second comparison — using ACTG 320 as the randomized clinical trial and the same cohort studies — ACTG 320, EuroSIDA and the Swiss Cohort all agreed. But the gargantuan French Hospital Cohort study (N = 70,000) went in exactly the opposite direction, suggesting that you’d be better off adding just 3TC, not 3TC and indinavir, to your AZT. Has the database given us the wrong answer to the correct question, the correct answer to a different question, or what?
Observational cohort studies are also of limited use unless the strategies are in current practice. They can be quick if retrospective data exist, but they will take nearly the same amount of time as a randomized clinical trial if data must be collected prospectively. OCS are also confounded by the amount of prior treatment received. For example, the earliest individuals to go on HAART are likely to have been on mono or dual nucleoside therapy the longest, whereas later HAART starters probably had less prior nucleoside exposure. No surprise, then, that the late HAART group experienced greater benefit than the earlier group. With pulsed therapy studies, because pulsed therapy has only recently emerged as a strategy, OCS data will, like randomized clinical trial data, need to be collected prospectively.
Finally, when can we justifiably rely on OCS data? Only in cases where the treatment effect is large? Where many independent, well-run OCSs are consistent? When there is no apparent confounding in studies? Or when the confounding occurs in opposite directions in different studies? On balance, long-term studies are essential to consider clinical events and toxicities and long-term surrogate marker values. If randomized clinical trials are feasible and ethical, we should do them.