It is important to be aware of the technology's limitations; it is better to consult experts and share feedback / Pixabay
ChatGPT, the artificial intelligence chatbot developed by OpenAI, has become a popular source of quick answers to all kinds of questions, and, depending on the application and objectives, a support tool as well. Researchers at Brigham and Women’s Hospital set out to evaluate the consistency of ChatGPT’s cancer treatment recommendations. The findings have given experts, users of the tool and its developers pause.
The team’s results, published in JAMA Oncology, show that in one-third of cases ChatGPT provided an inappropriate (or inconsistent) recommendation. This figure should alert us to the need to take the technology’s limitations into account before making decisions.
“Patients should feel free to learn about their medical conditions, but they should always talk to their doctor. Don’t take resources on the Internet at face value,” said Danielle Bitterman, a radiation oncologist, researcher and instructor at Harvard Medical School.
The study’s author commented that ChatGPT responses can sound remarkably human-like and convincing. “But when it comes to clinical decision-making, there are many subtleties in each patient’s unique situation. A correct answer can be very nuanced, and ChatGPT or another language model cannot necessarily account for that,” she said.
Many factors can influence clinical decision-making. Bitterman and her colleagues chose to evaluate how consistent ChatGPT’s recommendations are with the National Comprehensive Cancer Network (NCCN) guidelines used by US physicians. They focused on the three most common cancers (breast, prostate and lung) and asked ChatGPT to provide a treatment approach for each, depending on the severity of the disease.
ChatGPT Cancer Recommendations
For the study, researchers from Mass General Brigham, Sloan Kettering and Boston Children’s Hospital tested ChatGPT with 104 different prompts asking the chatbot for treatment recommendations. According to the Harvard Gazette, almost all responses (98%) included at least one treatment approach consistent with NCCN guidelines. However, the researchers found that 34% of these responses also included one or more inconsistent recommendations that were difficult to detect.
An inconsistent treatment recommendation was defined as one that was only partially correct: for example, recommending surgery alone for locally advanced breast cancer without specifying any other form of treatment. Notably, complete agreement with the guidelines occurred in only 62% of cases, underscoring both the complexity of the NCCN guidelines and the extent to which ChatGPT’s output can be ambiguous or difficult to interpret.
The researchers determined that nearly 13% of the answers were “hallucinations”: content that seemed plausible but was completely false or not directly relevant. This type of misinformation can distort patients’ expectations about treatment and strain the doctor-patient relationship.
“This is very concerning. Hallucinations can lead to misinformation and potentially harmful decisions for patients,” said Harvey Castro, an emergency physician and artificial intelligence expert in Coppell, Texas. “For example, a patient with advanced lung cancer could receive a recommendation for a treatment that is not endorsed by NCCN guidelines. This can lead to delays in getting the right care,” he said.
Is the information correct or incorrect?
Danielle Bitterman argues that “ChatGPT and many other large language models are trained primarily to act as chatbots, not to provide objectively correct and reliable information.”
“The model speaks fluently and imitates human language, but it does not distinguish correct information from incorrect information,” she noted.
She also said that, while reading the responses, she was surprised at how seamlessly the right treatment options were mixed in with the wrong ones. Still, the fact that nearly all responses contained some valid information points to the models’ future potential to help communicate information in conjunction with clinician input.
A major limitation of the study is that the researchers evaluated only one LLM at a single snapshot in time. They believe the findings highlight legitimate concerns and the need for further research. The study used GPT-3.5; OpenAI released a newer model, GPT-4, after the research was completed.