Large Language Models

Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

This work evaluates how LLMs handle mental health crises, introducing a unified taxonomy, a benchmark dataset, and an expert-based evaluation protocol, revealing both supportive capabilities and significant safety risks.