In tests of 20 open-source and proprietary large language models, the software was more often tricked by mistakes in realistic-looking doctors’ discharge notes than by mistakes in social media conversations, researchers reported in The Lancet Digital Health.

