ChatGPT may be getting worse at coding, other tasks over time: Study

The study evaluated ChatGPT on four tasks: solving math problems, answering sensitive/dangerous questions, generating code, and visual reasoning.

July 20, 2023 13:14 IST

These tasks were chosen to represent “diverse and useful capabilities” of ChatGPT-like LLMs | Image from Bloomberg

A new study published by researchers from Stanford University and the University of California, Berkeley, suggests that the popular large language model (LLM) ChatGPT may be getting worse at coding. The study, which evaluated the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four tasks, found that the models’ performance on code generation tasks had declined significantly over the three-month period.

The study evaluated ChatGPT on four tasks: solving math problems, answering sensitive/dangerous questions, generating code, and visual reasoning. These tasks were chosen to represent “diverse and useful capabilities” of ChatGPT-like LLMs. The researchers found that ChatGPT’s performance on the code generation task declined significantly from March 2023 to June 2023.

According to the report, in March 2023 GPT-4 identified prime numbers with an accuracy of 97.6%. By June 2023, however, its accuracy on the same task had dropped to 2.4%. Over the same period, GPT-3.5 showed a notable improvement in its ability to identify prime numbers. Another intriguing observation is the change in behaviour when answering sensitive questions. In June 2023, both GPT-4 and GPT-3.5 were less willing to respond to sensitive queries than they had been in March 2023. Additionally, formatting mistakes in code generation increased notably for both GPT-4 and GPT-3.5 in June 2023 compared with March 2023. The researchers found that the generated code was “more verbose and less directly executable.”
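For readers curious how an accuracy figure like the ones above can be computed, here is a minimal Python sketch. It is not the researchers' code: the `is_prime` and `accuracy` helpers, the yes/no reply format, and the sample replies are all assumptions made for illustration.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy(replies: dict[int, str]) -> float:
    """Fraction of replies that match the ground truth.

    `replies` maps each number to a model's answer, assumed here
    to be a plain "yes" or "no" (hypothetical format).
    """
    if not replies:
        raise ValueError("no replies to score")
    correct = sum(
        1
        for n, reply in replies.items()
        if (reply.strip().lower() == "yes") == is_prime(n)
    )
    return correct / len(replies)


# Hypothetical replies for illustration only, not data from the study.
sample = {7: "Yes", 9: "Yes", 11: "yes", 15: "No"}
print(accuracy(sample))  # 0.75 (the reply for 9 is wrong)
```

Running the same scoring against replies collected from two snapshots of a model, using an identical list of numbers, would give the kind of before-and-after comparison the report describes.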

While this study echoes the view of many ChatGPT users who feel that its performance has declined over time, OpenAI has denied these claims, saying that it makes each new version better than the previous one.

“No, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one. Current hypothesis: When you use it more heavily, you start noticing issues you didn’t see before,” read a recent tweet from OpenAI VP of Product Peter Welinder.
