Our recent paper about predicting risk of suicidal behavior following mental health visits prompted questions from clinicians and health system leaders regarding the practical utility of risk predictions.  Our clinical partners asked, “Are machine learning algorithms accurate enough to replace clinicians’ judgement?”  My answer was, “No, but they are accurate enough to direct clinicians’ attention.”

Our 12-year-old Subaru has now been passed down to my son, and we bought a 2018 model.  The new car has a “blind spot” warning system built in.  When it senses another vehicle in my likely blind spot, a warning light appears on the outside rearview mirror.  If I start to merge in that direction anyway, the light starts blinking and a warning bell chimes.  I like this feature a lot.  Even if I already know about the other car behind and to my right, the warning light isn’t too annoying or distracting.  The warning light may go on when there isn’t a car in my blind spot – a false positive.  Or it may not light up when there is a car in that dangerous area – a false negative.  But the warning system doesn’t fall asleep or get distracted by something up ahead.  It keeps its “eye” on one thing, all the time.

I hope that machine learning-based suicide risk prediction scores could work the same way.  A high-risk score would alert me and my clinical colleagues to a potentially unsafe situation that we might not have noticed.  If we’re already aware of the risk, the extra notice doesn’t hurt.  If we’re not aware of risk, then we’ll be prompted to pay attention and ask some additional questions.  There will be false positives and false negatives.  But risk prediction scores don’t get distracted or forget relevant facts.

We are clear that suicide risk prediction models are not the mental health equivalent of a driverless car.  We anticipate that alerts or warnings based on calculated risk scores would always be delivered to actual humans.  Those humans might include receptionists who would notice when a high-risk patient cancels an appointment, nurses who would call when a high-risk patient fails to attend a scheduled visit, or treating clinicians who would be alerted before a visit that a more detailed clinical assessment is probably indicated.  A calculated risk score alone would not be enough to prompt any “automated” clinical action – especially a clinical action that might be intrusive or coercive.  I hope and expect that our risk prediction methods will improve.  But I am skeptical they will ever replace a human driver.

My new car’s blind spot warning system does not have control of the steering wheel, the accelerator, or the brake pedal.  If the warning is a false positive, I can choose to ignore or override it.  But that blinking light and warning chime will get my attention.  If that alarm goes off, I’ll take a close look behind me before deciding that it’s actually safe to change lanes.