The latest example of bias permeating artificial intelligence comes from the medical field. A new study analyzed real case notes written by social workers about 617 adults receiving care in the UK and found that when large language models summarized these records, they were more likely to omit words such as “disabled,” “incapable,” or “difficult” when the patient was identified as female, a pattern that could result in women receiving inadequate or inaccurate medical care.
In the study, conducted by the London School of Economics and Political Science, researchers ran the same case records through two LLMs, Meta’s Llama 3 and Google’s Gemma, swapping only the patient’s gender, and the AI tools often produced two very different descriptions of the same patient. While Llama 3 showed no gender differences on the metrics studied, Gemma produced significant examples of such bias. Google’s model generated contrasts as stark as “Mr. Smith is an 84-year-old man who lives alone, has a complex medical history, receives no care, and has limited mobility” for the male patient, while the same case records for the female patient yielded: “Mrs. Smith is an 84-year-old woman who lives independently. Despite her limitations, she is independent and capable of caring for herself.”
Recent studies have revealed bias against women in the medical field, both in clinical research and in patient diagnosis. The statistics are also worse for racial and ethnic minorities and for the LGBTQ community. This is yet another stark reminder that LLMs are only as good as the information they are trained on and the people who decide how to train them. A particularly alarming finding of this study is that British authorities are using LLMs in care practice but rarely disclose which models are being deployed or to what extent.
“We know that these models are used very widely, and what is concerning is that we found very significant differences between the bias scores in different models,” said lead author Dr. Sam Rickman, noting that the Google model is particularly prone to ignoring women’s mental and physical health issues. “Since the amount of medical care is determined based on perceived need, this could lead to women receiving less medical care if biased models are used in practice. But we don’t really know which models are currently being used.”