A news release from the Cincinnati Children’s Hospital Medical Center touts a new artificial intelligence algorithm that is “up to 93 percent accurate” in distinguishing people who are suicidal from mentally ill patients who are not suicidal and from control patients, at three different hospital emergency rooms in the Midwest.
The release provides some details of the structure of the study. But the release overreaches, as readers don’t get a hint of crucial context and caveats — including the limitations, alternatives, possible harms, potential conflicts of interest, and more.
Unfortunately, being able to predict “when someone will commit suicide has been nearly impossible,” state the authors of the new study. What researchers can probe, and have probed, is spoken and written evidence (e.g. suicide notes) to assess the risk of someone killing himself or herself. Reviewing all the evidence in real time, where minutes or hours can count, is no simple task, so researchers have turned to artificial intelligence for help. Speech recognition and text analysis have grown extremely advanced in recent years, making them an interesting avenue for identifying suicide risk and preventing deaths, though much more work is needed before practical and minimally invasive screening tools can be put into the real world. To date, no evidence exists that this program can actually decrease suicides or attempted suicides.
No costs are presented.
We’re told the algorithm can identify someone who is actually suicidal with “great accuracy.” The release also throws out a few numbers, including that the algorithm is “up to 93 percent accurate” (from a read of the study, this is in a combined group of kids and adults when assessed on speech) and “85 percent accurate in identifying a person who is suicidal, has a mental illness but is not suicidal, or neither.” But these numbers in and of themselves don’t provide enough information to make sense of the findings: What are they being compared against? How was it decided that these measurements are correct?
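To illustrate why a single accuracy figure needs context, here is a minimal sketch using entirely hypothetical counts (they are not taken from the study or the release). It shows how a screening tool could post a high overall accuracy while still missing many of the people who are actually at risk, which is why readers need to know what the tool was compared against and how errors were distributed.

```python
# Hypothetical illustration only: none of these counts come from the study.
def summarize(tp, fn, fp, tn):
    """Return accuracy, sensitivity, and specificity from a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all calls that were right
        "sensitivity": tp / (tp + fn),      # share of at-risk people correctly flagged
        "specificity": tn / (tn + fp),      # share of not-at-risk people correctly cleared
    }

# Imagine 100 patients, 10 of whom are truly at risk (assumed numbers).
result = summarize(tp=5, fn=5, fp=2, tn=88)

for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

In this made-up scenario the overall accuracy looks strong, yet half of the at-risk patients are missed. Whether anything like this applies to the algorithm in the study is exactly what the release leaves unanswered.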
This isn’t covered in the release, but a look at the study leaves open a concerning question for us. Suicide prevention work is sensitive work: It’s tricky for caregivers and stressful for patients, and while the study notes an institutional review board approved the study, it doesn’t make clear how or whether the patients were counseled or debriefed before or after the study intervention (i.e. asking them a bunch of open-ended questions and recording audio of them speaking). This should have been made clear in the study as well as the release. There is also the potential harm of a false negative result — in which case caregivers may receive false reassurance regarding someone who is, in fact, suicidal.
Readers get a pretty solid overview of how the study was done, including the recruitment of 379 people at three different hospitals, and the procedure each patient went through. But some important strengths and limitations of the work aren’t noted here. One strength the release missed is that the hospital ERs chosen gave a decent sampling of patients across economic, age, and environmental lines. One weakness it failed to mention is that the fused text-speech algorithm couldn’t satisfactorily distinguish adult suicidal patients from mentally ill, non-suicidal patients. Simply noting there was a risk of false positives in mental health patients and false negatives in adults would have made this release stronger. We’ll also note that the urban hospital had a much lower participation rate (only 123 out of 530 people approached in the ER chose to take part), which leaves open the question of whether some important data and context are missing in this study’s analysis and, by extension, the release.
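For scale, the urban hospital’s participation rate can be worked out directly from the two figures cited above. This is a simple arithmetic sketch; the variable names are ours.

```python
# Figures cited above for the urban hospital ER.
approached = 530   # people approached in the ER
enrolled = 123     # people who chose to take part

participation_rate = enrolled / approached
print(f"Participation rate: {participation_rate:.1%}")  # prints "Participation rate: 23.2%"
```

In other words, fewer than one in four people approached at that site took part.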
We didn’t see any problematic descriptions of suicide here. That matters especially for this topic, given the evidence for suicide contagion.
This isn’t made clear in the release, but it also isn’t clear in the study itself. As one saving grace, though, the lead author — who is a professor at Cincinnati Children’s Hospital Medical Center — is clearly identified.
The release notes standardized behavioral rating scales, and the fact that clinicians and caregivers are involved in screening patients for suicide risk. But it lacks details on what those screenings are called, how and when they are done, and what qualitative and health record-related risks they look for (e.g. mental health conditions, acute stress, history of suicide attempts, behavior changes, and more). Having at least a basic description of how patients are typically screened for suicide risk and then observed would have made this release a lot stronger.
Availability of the algorithm isn’t made clear in the release, though the study notes that an “unexpected” lack of predictive power means “additional research” is required.
The release touts novel applications, such as screening in schools and other public places, but that’s a notional, anticipated use of the research. It does, however, establish novelty pretty clearly here: “When you look around health care facilities, you see tremendous support from technology, but not so much for those who care for mental illness. Only now are our algorithms capable of supporting those caregivers.”
Another novelty here, per our reading of the study, is combining promising speech and text machine-learning algorithms in hopes of classifying a more diverse group of people, varying in age and mental health status.
Nothing in this release raised a red flag for us.