Google Research and DeepMind have introduced Med-PaLM, a large language model geared toward the medical domain.
According to Interesting Engineering, “It is meant to generate safe and helpful answers in the medical field. It combines HealthSearchQA, a new free-response dataset of medical questions searched for online, with six existing open question-answering datasets covering professional medical exams, research, and consumer queries.
Med-PaLM is evaluated on multiple-choice questions and on questions posed by medical professionals and non-professionals, drawn from several datasets: MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and MMLU. A new dataset of curated, frequently searched medical questions, HealthSearchQA, was added to round out MultiMedQA.
The HealthSearchQA dataset consists of 3,375 frequently asked consumer questions, collected using seed medical diagnoses and their related symptoms. Med-PaLM was built on PaLM, a 540-billion-parameter LLM, and its instruction-tuned variant Flan-PaLM, which were evaluated against MultiMedQA.
Google reports that Med-PaLM performs particularly well compared to Flan-PaLM, though it has yet to match the judgment of a human medical expert. So far, a panel of healthcare professionals found that 92.6 percent of Med-PaLM’s long-form responses were in line with clinical consensus, nearly on par with clinician-generated answers (92.9 percent).
This is a striking improvement: only 61.9 percent of long-form Flan-PaLM answers were judged to be in line with clinical consensus. Meanwhile, just 5.8 percent of Med-PaLM answers were rated as potentially contributing to harmful outcomes, compared with 6.5 percent of clinician-generated answers and 29.7 percent of Flan-PaLM answers, meaning Med-PaLM’s replies are much safer...
This isn’t the first time Google has ventured into AI-based healthcare. In May of 2019, Google teamed up with medical researchers to train a deep-learning model to detect lung cancer in CT scans; it performed as well as or better than trained radiologists, achieving just over 94 percent accuracy.
In May of 2021, Google rolled out a diagnostic AI for skin conditions on smartphones, which would allow every smartphone owner to have an idea of what their diagnosis might be. The app did not replace the role of a professional dermatologist, but it was a significant step forward for the field of AI healthcare.”
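To make the benchmark described in the quoted passage more concrete, here is a minimal Python sketch of how a MultiMedQA-style collection might be assembled. The dataset names and the HealthSearchQA details come from the coverage above; the record layout, the builder function, and the per-dataset descriptions are illustrative assumptions, not Google’s actual evaluation code.

```python
# A minimal sketch of assembling a MultiMedQA-style benchmark.
# Dataset names and the HealthSearchQA size come from the article above;
# the record structure and descriptions are assumptions for illustration.

from dataclasses import dataclass, field
from typing import List


@dataclass
class QADataset:
    name: str
    format: str                     # "multiple_choice" or "long_form"
    source: str                     # where the questions come from
    questions: List[str] = field(default_factory=list)


def build_multimedqa() -> List[QADataset]:
    """Combine six existing QA datasets with the new HealthSearchQA set."""
    existing = [
        QADataset("MedQA", "multiple_choice", "professional medical exams"),
        QADataset("MedMCQA", "multiple_choice", "medical entrance exams"),
        QADataset("PubMedQA", "multiple_choice", "biomedical research abstracts"),
        QADataset("LiveQA", "long_form", "consumer health questions"),
        QADataset("MedicationQA", "long_form", "consumer medication questions"),
        QADataset("MMLU clinical topics", "multiple_choice", "professional exams"),
    ]
    # HealthSearchQA: 3,375 frequently asked consumer questions, collected
    # from seed medical diagnoses and their related symptoms (per the article).
    health_search_qa = QADataset(
        "HealthSearchQA", "long_form", "frequently searched consumer queries"
    )
    return existing + [health_search_qa]


if __name__ == "__main__":
    for ds in build_multimedqa():
        print(f"{ds.name:<22} {ds.format:<16} {ds.source}")
```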
According to a physician blog published on Medium, “While Med-PaLM’s performance was impressive and certainly superior to other NLP models investigated to date, it was still inferior to clinicians, particularly in incorrect retrieval of information (16.9% for Med-PaLM vs. 3.6% for human clinicians), evidence of incorrect reasoning (10.1% vs. 2.1%), and inappropriate/incorrect content of responses (18.7% vs. 1.4%).
Bottom line: a huge step forward, both in moving the needle toward a viable LLM for clinical knowledge and in establishing frameworks to evaluate such models.
The high bar of safety that will be expected of such models before they can be used in practice, along with the need to investigate and shed light on bias and fairness in their functioning, means more work needs to be done. However, health-data-trained LLMs, such as Med-PaLM and the amusingly named GatorTron, are paving the way.”
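As a rough illustration of the evaluation framework behind the percentages quoted above, the sketch below tallies per-answer rater flags into rates along axes such as incorrect retrieval and incorrect reasoning. The axis names mirror the physician’s figures; the data structures, label values, and toy ratings are assumptions for illustration, not the study’s actual rating pipeline.

```python
# A hedged sketch of turning per-answer clinician judgments into the kinds
# of percentages quoted above. Axis names mirror the article; the record
# format and the toy ratings are assumptions, not the study's real data.

from typing import Dict, List

# One rater record per model answer: True means the rater flagged the issue.
Judgment = Dict[str, bool]


def error_rates(judgments: List[Judgment], axes: List[str]) -> Dict[str, float]:
    """Percentage of answers flagged on each evaluation axis."""
    n = len(judgments)
    return {ax: 100.0 * sum(j[ax] for j in judgments) / n for ax in axes}


if __name__ == "__main__":
    axes = ["incorrect_retrieval", "incorrect_reasoning", "inappropriate_content"]
    # Toy ratings for three answers; the real study used panels of clinicians.
    ratings = [
        {"incorrect_retrieval": False, "incorrect_reasoning": False,
         "inappropriate_content": False},
        {"incorrect_retrieval": True, "incorrect_reasoning": False,
         "inappropriate_content": True},
        {"incorrect_retrieval": False, "incorrect_reasoning": True,
         "inappropriate_content": False},
    ]
    for axis, rate in error_rates(ratings, axes).items():
        print(f"{axis:<24} {rate:5.1f}%")
```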