While working on earlier studies of aging populations and patients with kidney disease, Statistics Professor Bin Nan came across a set of emerging issues that could not be resolved with standard methods. So, he decided to explore new approaches, becoming the Principal Investigator on a grant titled “Cutting Edge Survival Methods for Epidemiological Data.” The National Institutes of Health (NIH) recently awarded Nan and his co-investigators — University of Michigan Professors Yi Li and Siobán Harlow — $1.2 million to develop the methods over the next four years.
“The standard methods people are using cannot solve many important questions that investigators are interested in,” says Nan. In particular, he is working with his colleagues to address three main problems in longitudinal cohort studies: delayed entry, covariate censoring and modeling terminal events.
Accounting for Unobserved Events
When you’re studying how long a patient survives, or how long it takes for a disease to occur, you cannot include patients who experienced the event before enrolling in the study. Patients with longer at-risk periods therefore have higher probabilities of being selected, yielding a biased sample. “The general terminology for this delayed entry is left truncation,” explains Nan, noting that it is difficult to account for. Standard methods for handling left truncation exist, but they usually rely on strong assumptions that are often violated in practice; Nan and his colleagues are proposing an approach that relies on fewer assumptions and is nonetheless more efficient.
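To see the selection effect in numbers, here is a minimal simulation sketch (not part of the grant’s work) in which subjects enter a study some time after disease onset and only those still event-free at entry are enrolled. The exponential survival times, the entry-time distribution, and the sample size are arbitrary illustrative choices, and the adjustment shown exploits the memoryless property of the exponential distribution, so it is a special case rather than a general correction.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.1                 # true event hazard; mean time to event = 10 years (invented)
n = 200_000

# Full population: time from disease onset to event, and an independent study entry time
time_to_event = rng.exponential(1 / rate, n)
entry_time = rng.uniform(0, 15, n)          # delayed entry ("left truncation") times

# Only subjects still event-free at their entry time are observed in the cohort
enrolled = time_to_event > entry_time

print(f"true mean time to event:         {1 / rate:5.2f}")
print(f"full-population sample mean:     {time_to_event.mean():5.2f}")
print(f"naive mean among the enrolled:   {time_to_event[enrolled].mean():5.2f}  (biased upward)")

# For exponential times, memorylessness implies (T - entry) given T > entry has the
# same distribution as T, so conditioning on the entry time removes the bias here.
residual = time_to_event[enrolled] - entry_time[enrolled]
print(f"entry-adjusted mean (T - entry): {residual.mean():5.2f}")
```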
Measuring the Effects of the Unmeasurable
Nan encountered another problem when studying how hormone levels are linked to health in peri- or post-menopausal women. The issue is that current assay technology can only quantify hormones above a certain level. According to Nan, “below a certain level, the variability is much bigger relative to the actual amount you’re trying to measure, so the measurement is not reliable — if it’s even available.” He explains that this classical problem is called the “limit of detection,” and it leads to covariate censoring — the variable can’t be determined below a certain level — in regression analysis. This issue also appears in environmental health studies when measuring the concentration of a pollutant in the air or water. The usual approach of simple substitution for measures below the detection limit can lead to biased results. “It’s not good,” says Nan. “Particularly for the study I was involved with, where certain hormones, even at a very low level, can play an important role.”
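The bias from plugging in a constant for non-detects is easy to reproduce. Below is a small, purely illustrative simulation (the numbers are invented, not from the study): a covariate is censored at a detection limit, values below the limit are replaced with half the limit, and the fitted slope is noticeably attenuated relative to the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1 = 1.0, 2.0                          # true intercept and slope (invented)

x = rng.normal(10.0, 3.0, n)                     # true covariate values (illustrative scale)
y = beta0 + beta1 * x + rng.normal(0.0, 2.0, n)  # outcome

lod = 9.0                                        # detection limit (~37% of values fall below it)
x_sub = np.where(x < lod, lod / 2.0, x)          # common practice: substitute LOD/2 for non-detects

slope_full = np.polyfit(x, y, 1)[0]              # fit using the true covariate
slope_sub = np.polyfit(x_sub, y, 1)[0]           # fit after substitution

print(f"true slope:                  {beta1:.2f}")
print(f"fit with true covariate:     {slope_full:.2f}")
print(f"fit with LOD/2 substitution: {slope_sub:.2f}  (attenuated)")
```

The size of the bias depends on the censoring fraction and the substitution rule; the point is simply that a plugged-in constant carries no information about where the true value falls below the limit.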
So, Nan and his colleagues are working to develop more robust methods for estimating the effects of such variables. They plan to use measurement errors routinely determined in lab work but rarely reported. “We’ll come up with different strategies to incorporate the measurement error,” he says. Based on the measurement error model, they’ll develop strategies to account for the errors and the limit of detection simultaneously. Nan hopes to create a statistical package with different approaches that other practitioners can apply in their own work.
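One standard way such error information gets used, shown below as a sketch rather than as the grant’s actual approach, is the classical additive measurement-error model: replicate lab measurements give an estimate of the error variance, and the naive regression slope, which is attenuated by the reliability ratio, can be rescaled accordingly. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
beta1 = 1.5                      # true slope (invented)
sigma_u = 0.8                    # measurement-error SD, assumed known from lab replicates

x = rng.normal(5.0, 2.0, n)                      # true (unobservable) exposure
w = x + rng.normal(0.0, sigma_u, n)              # observed, error-contaminated measurement
y = 0.5 + beta1 * x + rng.normal(0.0, 1.0, n)    # outcome depends on the true exposure

naive_slope = np.polyfit(w, y, 1)[0]             # attenuated toward zero

# Method-of-moments correction: reliability ratio = var(X) / var(W),
# with var(X) estimated as var(W) minus the known error variance.
reliability = (w.var() - sigma_u**2) / w.var()
corrected_slope = naive_slope / reliability

print(f"true slope:      {beta1:.2f}")
print(f"naive slope:     {naive_slope:.2f}")
print(f"corrected slope: {corrected_slope:.2f}")
```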
Modeling that Makes Up for “Missing” Data
The final problem Nan is working to address concerns terminal events, a common issue in studies on aging. “Suppose we’re interested in understanding how bone mineral density affects people’s health — if or why they fall or have trouble walking, and so on,” says Nan. If a study participant passes away during the study, there are different ways to handle the resulting data. He says that one commonly used approach is joint modeling, which links the longitudinal data and the terminal-event data through a latent variable. Another approach treats the longitudinal data as a partially observed stochastic process stopped by the terminal event.
An implicit assumption of these methods is that the longitudinal variables would hold the same relationship if not stopped by the terminal event, so the data is treated as incomplete if a terminal event occurs — in other words, it’s usually viewed as though “you’re missing data after death,” explains Nan. “We’re taking the opposite view,” he says. “If I observe everything until death, then I don’t have any missing data. But if someone is still alive, then I have missing data, because I don’t know the data from now to the terminal event.” The goal is to use the complete information (data collected all the way to the terminal event) to build a model in which the parameters of interest are directly affected by how long people live. Nan adds that a future grant stemming from this work might address more challenging problems, such as handling a time to disease onset, rather than a longitudinal process, in the presence of a terminal event; for now, he is focused on solving these three main issues.
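A toy numerical sketch of the “complete until death” viewpoint (illustrative only, not the model being developed): among decedents, the time remaining until death is fully observed at every visit, so a longitudinal outcome can be related directly to time before death; survivors are the ones with missing data, because their death times, and hence that index, are not yet known. All quantities below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
followup = 10.0                                   # years of study follow-up (invented)

death_time = rng.exponential(8.0, n)              # time from enrollment to death (invented)
visit_time = rng.uniform(0.0, followup, n)        # one measurement time per subject, for simplicity

# Outcome declines as death approaches: it depends on the time remaining until death
time_to_death = death_time - visit_time
outcome = 50.0 + 2.0 * time_to_death + rng.normal(0.0, 3.0, n)

# "Complete" subjects: measured while alive AND died within follow-up,
# so their time-to-death index is fully observed.
complete = (visit_time < death_time) & (death_time <= followup)

slope = np.polyfit(time_to_death[complete], outcome[complete], 1)[0]
print(f"subjects with complete data (died during follow-up): {complete.sum()}")
print(f"estimated change in outcome per year of remaining lifetime: {slope:.2f}  (true value: 2.00)")
```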
“I’m pretty confident we can achieve the goals, based on our preliminary research,” says Nan. “It will just take time.”
— Shani Murray