A recent study published in JAMA Network Open investigated the accuracy and reliability of nutrition information provided by two versions of the Chat Generative Pre-trained Transformer (ChatGPT) chatbot.
The findings indicate that while chatbots cannot take the place of nutritionists, with further refinement they could improve communication between health professionals and patients.
Study: Consistency and Accuracy of Artificial Intelligence for Providing Nutritional Information.
Background
Many people today depend on the internet to access health, medicine, food, and nutrition information. However, studies have indicated that nearly half of the nutrition information online is low quality or inaccurate.
Artificial intelligence (AI) chatbots have the potential to streamline how users navigate the vast array of publicly available scientific knowledge by providing conversational, easy-to-understand explanations of complex topics.
Previous research has evaluated how well chatbots can disseminate medical information, but their reliability in providing nutrition information remains relatively unexplored.
About the study
In this cross-sectional study, the researchers followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. They assessed the accuracy of the information that ChatGPT-3.5 and ChatGPT-4 provided on the macronutrient (protein, carbohydrate, and fat) and energy content of 222 foods in two languages, Traditional Chinese and English.
They used a prompt that asked the chatbot to generate a table containing the nutritional profile of each food in its uncooked form. The searches were conducted in September and October 2023.
Each search was conducted five times to assess consistency; the coefficient of variation (CV) was calculated across these five measurements for each food.
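For illustration only, here is a minimal sketch (not taken from the paper) of how a per-food CV might be computed from five repeated chatbot estimates, assuming CV is defined as the standard deviation divided by the mean, expressed as a percentage; the example values and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(estimates):
    """CV (%) across repeated estimates: standard deviation / mean * 100."""
    return pstdev(estimates) / mean(estimates) * 100

# Hypothetical example: five repeated energy estimates (kcal per 100 g) for one food
energy_estimates = [52, 50, 55, 51, 53]
print(f"CV = {coefficient_of_variation(energy_estimates):.1f}%")  # prints "CV = 3.3%"
```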
The accuracy of the chatbots’ responses was judged by cross-referencing them with the recommendations of nutritionists, which were based on the food composition database maintained by the Food and Drug Administration of Taiwan.
A response was considered accurate if the chatbot’s estimate of energy (in kilocalories) or a macronutrient (in grams) fell within 10% to 20% of the value provided by the nutritionists.
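As a rough illustration of this criterion, the sketch below treats 10% and 20% as two possible tolerance bands around the nutritionists’ reference value; the exact definition used in the paper may differ, and the numbers shown are hypothetical.

```python
def within_tolerance(chatbot_value, reference_value, tolerance=0.10):
    """True if the chatbot's estimate falls within the given fractional
    tolerance (e.g., 0.10 or 0.20) of the nutritionists' reference value."""
    return abs(chatbot_value - reference_value) <= tolerance * reference_value

# Hypothetical energy estimates (kcal) compared against a reference of 105 kcal
print(within_tolerance(95, 105, tolerance=0.10))   # True  (about 9.5% off)
print(within_tolerance(80, 105, tolerance=0.20))   # False (about 23.8% off)
```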
The researchers also assessed whether the chatbots’ responses differed significantly from the nutritionists’ recommendations and whether the two versions of ChatGPT differed from each other.
Findings
There were no significant differences between the estimates provided by the chatbots and the nutritionists for the fat, carbohydrate, and energy content of eight menus for adults. However, protein estimates differed significantly. Chatbot responses were considered accurate for energy content in 35-48% of the 222 included foods and showed a CV below 10%. ChatGPT-4, the more recent version, performed better than ChatGPT-3.5 overall but tended to overestimate protein levels.
Conclusions
The study shows that chatbot responses compare well with nutritionists’ recommendations in certain respects, but the chatbots can overestimate protein levels and remain inaccurate for a substantial share of foods.
As chatbots become widely available, they have the potential to be a convenient tool for people who want to look up the macronutrient and energy content of common foods and do not know which resources to consult.
However, the authors stress that chatbots are not a replacement for nutritionists; rather, they can improve communication between patients and public health professionals by providing additional resources and translating complex medical language into conversational, easy-to-follow terms.
They also note that some of the foods included in the searches may not be frequently consumed, which may limit how well the findings apply to everyday diets.
AI chatbots cannot provide users with personalized dietary advice or precise portion sizes, nor can they generate specific dietary and nutrition-related guidelines. Moreover, chatbots may be unable to tailor their responses to the region where the user resides.
Portion sizes and consumption units differ greatly from country to country, as well as by the type of food and how it is prepared. Chatbots cannot factor in crucial cultural and geographic differences or provide the relevant household units for each consumer.
Arguably, the most important limitation is that ChatGPT is a general-purpose chatbot – not one trained specifically on dietetics and nutrition.
The cutoff for ChatGPT’s training data was September 2021, so more recent research is not reflected in its responses. Users should not mistake chatbots for search engines, as their responses are a product of their training datasets as well as the wording of the prompts.
However, considering the immense popularity of chatbots and other forms of generative AI, future products may well overcome these limitations and provide increasingly accurate, up-to-date, relevant, and practical information on diet and nutrition.
Journal reference:
- Hoang, Y.N., Chen, Y.C., Ho, D.K.N.H., Chiu, W., Cheah, K., Mayasari, N.R., Chang, J. (2023). Consistency and accuracy of artificial intelligence for providing nutritional information. JAMA Network Open. doi:10.1001/jamanetworkopen.2023.50367. https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2813295