Publications

2026

ProbML ICML MOSS

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

*M Lu, *Y Liu, M Nock, and Y Yacoby

Accepted @ ProbML 2026
Previous version accepted @ ICML MOSS 2025

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, as it is irregularly sampled, noisy, and partially observed. But SDE-based models suffer from two key limitations. (a) These models often violate domain constraints, undermining scientific validity and clinical trust of the model. (b) Training is numerically unstable without ad hoc fixes (e.g. oversimplified dynamics) that are ill-suited for high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. In this work, (1) we show why chain-rule based constructions of SDEs on compact domains fail, theoretically and empirically; (2) we derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and (3) we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series and extend applications of SDE-based methods (e.g. diffusion models) to domains with hard state constraints.

SIGCSE

Teaching Probabilistic Machine Learning in the Liberal Arts: Empowering Socially and Mathematically Informed AI Discourse

Y Yacoby

Accepted @ SIGCSE 2026 Oral Presentation

We present a new undergraduate ML course at our institution, a small liberal arts college serving students minoritized in STEM, designed to empower students to critically connect the mathematical foundations of ML with its sociotechnical implications. We propose a "framework-focused" approach, teaching students the language and formalism of probabilistic modeling while leveraging probabilistic programming to lower mathematical barriers. We introduce methodological concepts through a whimsical yet realistic theme, the "Intergalactic Hypothetical Hospital," to make the content both relevant and accessible. Finally, we pair each technical innovation with counter-narratives that challenge its value using real, open-ended case studies to cultivate dialectical thinking. By encouraging creativity in modeling and highlighting unresolved ethical challenges, we help students recognize the value of, and need for, their unique perspectives, empowering them to participate confidently in AI discourse as technologists and critical citizens.

JPCS

Using smartphone surveys to predict next-week suicide attempts

M Nock, E Kleiman, K Bentley, R Fortgang, A Millner, K Zuromski, A Bear, A Christie, M Daniel, D DeMarco, L Follet, F Kelly, H Neveux, O Obi-Obasi, J Ricard, N Ramlal, T Tambedou, Y Yacoby, S Bird, R Buonopane, A Donovan, P Mair, J Onnela, R Picard, and J Smoller

Accepted @ Journal of Psychopathology and Clinical Science 2026

Clinicians are tasked with predicting and preventing suicidal behavior among their patients; however, there is currently no method for accurately predicting whether a person will make a suicide attempt (SA) in the near future. We tested whether brief, smartphone-based surveys, combined with passively collected survey metadata, could predict the occurrence of suicidal behavior over the next 7 days among those at elevated risk. Participants were 619 patients presenting to the hospital with suicidal thoughts/behavior. They were sent brief (20-item) smartphone-based surveys 6 times/day for 3 months. Survey responses (N = 79,448) and metadata (e.g., time since last survey submission) were used as predictors of next-week SA and suicide-related event (SRE; which also included hospitalization to prevent an SA) in a series of machine learning models. The most accurate predictions were achieved using bidirectional long short-term memory (LSTM) and simple lasso-penalized logistic regression models. With specificity at .90, the best-performing model, a bidirectional LSTM, predicted SREs with area under the curve = .94, sensitivity = .87, and positive predictive value = .30, and SAs with area under the curve = .90, sensitivity = .74, and positive predictive value = .16. Prediction accuracy was higher than has been achieved in prior studies and was strongest for models that predicted SREs (vs. SAs), included more sources of data, focused on adults (vs. adolescents), and included participants’ own data in the model training process (vs. holding it out). The strongest and most consistent predictors of next-week SA included within-study history of SREs (from adult lasso regression: OR = 1.47) and self-reported agitation (OR = 1.11), whereas odds of next-week SA were decreased for surveys submitted on weekends (OR = 0.87) and in the context of feelings that one could resist suicidal urges (OR = 0.88–0.96).
Brief smartphone-based surveys can predict next-week SAs/SREs with a fairly high degree of accuracy. Future work is needed to further improve accuracy and test just-in-time interventions targeting high-risk periods.

2025

NeurIPS TS4H

Improving Forecasts of Suicide Attempts for Patients with Little Data

G Hang, A Chen, H Neveux, M Nock, and Y Yacoby

Accepted @ NeurIPS TS4H 2025

Ecological Momentary Assessment provides real-time data on suicidal thoughts and behaviors, but predicting suicide attempts remains challenging due to their rarity and patient heterogeneity. We show that single models fit to all patients perform poorly, while individualized models overfit with limited data. To address this, we introduce a Latent Similarity Gaussian Process (LSGP) that models patient heterogeneity, enabling those with little data to leverage similar patients’ trends. Preliminary results show improved sensitivity over baselines and offer new understanding of patient similarity.

UAI ICML HAS

Transparent Trade-offs between Properties of Explanations

H Tadesse, A Hüyük, Y Yacoby, W Pan, and F Doshi-Velez

Accepted @ UAI 2025
Previous version accepted @ ICML HAS 2024

When explaining machine learning models, it is important for explanations to have certain properties like faithfulness, robustness, smoothness, low complexity, etc. However, many properties are in tension with each other, making it challenging to achieve them simultaneously. For example, reducing the complexity of an explanation can make it less expressive, compromising its faithfulness. The ideal balance of trade-offs between properties tends to vary across different tasks and users. Motivated by these varying needs, we aim to find explanations that make optimal trade-offs while allowing for transparent control over the balance between different properties. Unlike existing methods that encourage desirable properties implicitly through their design, our approach optimizes explanations explicitly for a linear mixture of multiple properties. By adjusting the mixture weights, users can control the balance between those properties and create explanations with precisely what is needed for their particular task.

Science

Preference-based Assistance Optimization for Lifting and Lowering with a Soft Back Exosuit

P Arens, A Quirk, W Pan, Y Yacoby, F Doshi-Velez, and C Walsh

Accepted @ Science Advances 2025

Wearable robotic devices have become increasingly prevalent in both occupational and rehabilitative settings, yet their widespread adoption remains inhibited by usability barriers related to comfort, restriction, and noticeable functional benefits. Acknowledging the importance of user perception in this context, this study explores preference-based controller optimization for a back exosuit that assists lifting. Considering the high mental and metabolic effort discrete motor tasks impose, we used a forced-choice Bayesian Optimization approach that promotes sampling efficiency by leveraging domain knowledge about just noticeable differences between assistance settings. Optimizing over two control parameters, preferred settings were consistent within and uniquely different between participants. We discovered that overall, participants preferred asymmetric parameter configurations with more lifting than lowering assistance, and that preferences were sensitive to user anthropometrics. These findings highlight the potential of perceptually guided assistance optimization for wearable robotic devices, marking a step toward more pervasive adoption of these systems in the real world.

2024

AABI

Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders

Y Yacoby, W Pan, and F Doshi-Velez

Accepted @ Workshop at AABI 2024

Inference for Variational Autoencoders (VAEs) consists of learning two models: (1) a generative model, which transforms a simple distribution over a latent space into the distribution over observed data, and (2) an inference model, which approximates the posterior of the latent codes given data. The two components are learned jointly via a lower bound to the generative model’s log marginal likelihood. In early phases of joint training, the inference model poorly approximates the latent code posteriors. Recent work showed that this leads optimization to get stuck in local optima, negatively impacting the learned generative model. As such, recent work suggests ensuring a high-quality inference model via iterative training: maximizing the objective function relative to the inference model before every update to the generative model. Unfortunately, iterative training is inefficient, requiring heuristic criteria for reverting from iterative to joint training for speed. Here, we suggest an inference method that trains the generative and inference models independently. It approximates the posterior of the true model a priori; fixing this posterior approximation, we then maximize the lower bound relative to only the generative model. By conventional wisdom, this approach should rely on the true prior and likelihood of the true model (which are unknown) to approximate its posterior. However, we show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model’s posterior. We then use MAPA to develop a proof-of-concept inference method. We present preliminary results on low-dimensional synthetic data showing that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines. Lastly, we present a roadmap for scaling the MAPA-based inference method to high-dimensional data.

Nature

Building personalized machine learning models using real-time monitoring data to predict idiographic suicidal thoughts

S Wang, R Genugten, Y Yacoby, W Pan, K Bentley, S Bird, R Buonopane, A Christie, M Daniel, D DeMarco, A Haim, L Follet, R Fortgang, F Kelly, E Kleiman, A Millner, O Obi-Obasi, J Onnela, N Ramlal, J Ricard, J Smoller, T Tambedou, K Zuromski, and M Nock

Accepted @ Nature Mental Health 2024

Suicide risk is highest immediately after psychiatric hospitalization, but the field lacks methods for identifying which patients are at greatest risk, and when. We built personalized models predicting suicidal thoughts after psychiatric hospital visits (N = 89 patients), using ecological momentary assessment (EMA; average EMA responses per participant = 311). We built several idiographic models, including baseline autoregressive and elastic net models (using single train/test split) and Gaussian Process (GP) models (using an iterative rolling-forward prediction method). Simple GP models provided the best prediction of suicidal urges (average R² of 0.17), outperforming baseline autoregressive (average R² of 0.10) and elastic net (average R² of 0.07) models. Similarly, simple GP models provided the best prediction of suicidal intent (average R² of 0.12) compared to autoregressive (average R² of 0.08) and elastic net (average R² of 0.06). Findings suggest idiographic prediction of suicidal thoughts is possible, though accuracy currently is modest. Building GP models that iteratively update and learn symptom dynamics over time could provide important information to inform development of just-in-time adaptive interventions.

2023

SIGCSE

Empowering First-Year Computer Science Ph.D. Students to Create a Culture that Values Community and Mental Health

Y Yacoby, J Girash, and D Parkes

Accepted @ SIGCSE 2023 Oral Presentation

Doctoral programs often have high rates of depression, anxiety, isolation, and imposter phenomenon. Consequently, graduating students may feel inadequately prepared for research-focused careers, contributing to an attrition of talent. Prior work identifies an important contributing factor to maladjustment: that, even with prior exposure to research, entering Ph.D. students often have problematically idealized views of science. These preconceptions can become obstacles for students in their own professional growth. Unfortunately, existing curricular and extracurricular programming in many doctoral programs does not include mechanisms to systematically address students’ misconceptions of their profession. In this work, we describe a new initiative at our institution that aims to address Ph.D. mental health via a mandatory seminar for entering doctoral students. The seminar is designed to build professional resilience in students by (1) increasing self-regulatory competence, and (2) teaching students to proactively examine academic cultural values, and to participate in shaping them. Our evaluation indicates that students improved in both areas after completing the seminar.

2022

NeurIPS ICBINB

An Empirical Analysis of the Advantages of Finite- vs. Infinite-Width Bayesian Neural Networks

J Yao, Y Yacoby, B Coker, W Pan, and F Doshi-Velez

Accepted @ NeurIPS ICBINB 2022

Comparing Bayesian neural networks (BNNs) with different widths is challenging because, as the width increases, multiple model properties change simultaneously, and inference in the finite-width case is intractable. In this work, we empirically compare finite- and infinite-width BNNs, and provide quantitative and qualitative explanations for their performance difference. We find that when the model is mis-specified, increasing width can hurt BNN performance. In these cases, we provide evidence that finite-width BNNs generalize better partially due to the properties of their frequency spectrum that allow them to adapt under model mismatch.

JMLR ICML UDL

Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables

*Y Yacoby, *W Pan, and F Doshi-Velez

Accepted @ JMLR 2022
Previous version accepted @ ICML UDL 2019 Spotlight Talk

Bayesian Neural Networks with Latent Variables (BNN+LVs) capture predictive uncertainty by explicitly modeling model uncertainty (via priors on network weights) and environmental stochasticity (via a latent input noise variable). In this work, we first show that BNN+LVs suffer from a serious form of non-identifiability: explanatory power can be transferred between the model parameters and latent variables while fitting the data equally well. We demonstrate that as a result, in the limit of infinite data, the posterior mode over the network weights and latent variables is asymptotically biased away from the ground truth. Due to this asymptotic bias, traditional inference methods may in practice yield parameters that generalize poorly and misestimate uncertainty. Next, we develop a novel inference procedure that explicitly mitigates the effects of likelihood non-identifiability during training and yields high-quality predictions as well as uncertainty estimates. We demonstrate that our inference method improves upon benchmark methods across a range of synthetic and real datasets.

HCOMP CHI HCXAI

"If it didn’t happen, why would I change my decision?": How Judges Respond to Counterfactual Explanations for the Public Safety Assessment

Y Yacoby, B Green, C Griffin, and F Doshi-Velez

Accepted @ HCOMP 2022
Previous version accepted @ CHI HCXAI 2022 Oral Presentation

Many researchers and policymakers have expressed excitement about algorithmic explanations enabling more fair and responsible decision-making. However, recent experimental studies have found that explanations do not always improve human use of algorithmic advice. In this study, we shed light on how people interpret and respond to counterfactual explanations (CFEs) – explanations that show how a model’s output would change with marginal changes to its input(s) – in the context of pretrial risk assessment instruments (PRAIs). We ran think-aloud trials with eight sitting U.S. state court judges, providing them with recommendations from a PRAI that includes CFEs. We found that the CFEs did not alter the judges’ decisions. At first, judges misinterpreted the counterfactuals as real – rather than hypothetical – changes to defendants. Once judges understood what the counterfactuals meant, they ignored them, stating their role is only to make decisions regarding the actual defendant in question. The judges also expressed a mix of reasons for ignoring or following the advice of the PRAI without CFEs. These results add to the literature detailing the unexpected ways in which people respond to algorithms and explanations. They also highlight new challenges associated with improving human-algorithm collaborations through explanations.

arXiv ICML UDL

Failure Modes of Variational Autoencoders and Their Effects on Downstream Tasks

Y Yacoby, W Pan, and F Doshi-Velez

Full paper on arXiv 2022
Previous version accepted @ ICML UDL 2020

Variational Auto-encoders (VAEs) are deep generative latent variable models that are widely used for a number of downstream tasks. While it has been demonstrated that VAE training can suffer from a number of pathologies, existing literature lacks characterizations of exactly when these pathologies occur and how they impact downstream task performance. In this paper, we concretely characterize conditions under which VAE training exhibits pathologies and connect these failure modes to undesirable effects on specific downstream tasks, such as learning compressed and disentangled representations, adversarial robustness, and semi-supervised learning.

arXiv ICML UDL

Uncertainty-Aware (UNA) Bases for Deep Bayesian Regression Using Multi-Headed Auxiliary Networks

*S Thakur, *C Lorsung, *Y Yacoby, F Doshi-Velez, and W Pan

Full paper on arXiv 2022
Previous version accepted @ ICML UDL 2020

Neural Linear Models (NLMs) are deep Bayesian models that produce predictive uncertainties by learning features from the data and then performing Bayesian linear regression over these features. Despite their popularity, few works have focused on methodically evaluating the predictive uncertainties of these models. In this work, we demonstrate that traditional training procedures for NLMs drastically underestimate uncertainty on out-of-distribution inputs, and that they therefore cannot be naively deployed in risk-sensitive applications. We identify the underlying reasons for this behavior and propose a novel training framework that captures useful predictive uncertainties for downstream tasks.

2020

ICML UDL

BACOUN: Bayesian Classifiers with Out-of-Distribution Uncertainty

T Guénais, D Vamvourellis, Y Yacoby, F Doshi-Velez, and W Pan

Accepted @ ICML UDL 2020

Traditional training of deep classifiers yields overconfident models that are not reliable under dataset shift. We propose a Bayesian framework to obtain reliable uncertainty estimates for deep classifiers. Our approach consists of a plug-in "generator" used to augment the data with an additional class of points that lie on the boundary of the training data, followed by Bayesian inference on top of features that are trained to distinguish these "out-of-distribution" points.

ICML WHI

CRUDS: Counterfactual Recourse using Disentangled Subspaces

M Downs, J Chu, Y Yacoby, F Doshi-Velez, and W Pan

Accepted @ ICML WHI 2020

Algorithmic recourse is the task of generating a set of actions that will allow individuals to achieve a more favorable outcome under a given algorithmic decision system. Using the Conditional Subspace Variational Autoencoder (CSVAE), we propose a novel algorithmic recourse generation method, CRUDS, that generates multiple recourses satisfying the underlying structure of the data as well as end-user specified constraints. We evaluate our method qualitatively and quantitatively on several synthetic and real datasets, demonstrating that CRUDS proposes recourses that are more realistic and actionable than baselines.

2019

AABI PMLR

Characterizing and Avoiding Problematic Global Optima of Variational Autoencoders

Y Yacoby, W Pan, and F Doshi-Velez

Accepted @ AABI 2019 Spotlight Talk

Additionally selected for publication @ PMLR 2019 Top 33%

Variational Auto-encoders (VAEs) are deep generative latent variable models consisting of two components: a generative model that captures a data distribution p(x) by transforming a distribution p(z) over latent space, and an inference model that infers likely latent codes for each data point. Recent work shows that traditional training methods tend to yield solutions that violate modeling desiderata: (1) the learned generative model captures the observed data distribution but does so while ignoring the latent codes, resulting in codes that do not represent the data; (2) the aggregate of the learned latent codes does not match the prior p(z). This mismatch means that the learned generative model will be unable to generate realistic data with samples from p(z). In this paper, we demonstrate that both issues stem from the fact that the global optima of the VAE training objective often correspond to undesirable solutions. Our analysis builds on two observations: (1) the generative model is unidentifiable: there exist many generative models that explain the data equally well, each with different (and potentially unwanted) properties; and (2) the VAE objective is biased: it may prefer generative models that explain the data poorly but have posteriors that are easy to approximate. We present a novel inference method, LiBI, that mitigates the problems identified in our analysis. On synthetic datasets, we show that LiBI can learn generative models that capture the data distribution and inference models that better satisfy modeling assumptions when traditional methods struggle to do so.

ASRM

The application of machine learning methods to evaluate predictors of live birth in programmed thaw cycles

D Vaughan, W Pan, Y Yacoby, E Seidler, A Leung, F Doshi-Velez, and D Sakkas

Accepted @ Fertility and Sterility 2019
