In a development that is sending ripples through the academic world, OpenAI’s latest AI model, “ChatGPT o3,” has achieved a near-perfect score on the notoriously challenging JEE Advanced 2025 examination. The Joint Entrance Examination (JEE) Advanced is India’s most competitive university entrance exam, a gateway to the prestigious Indian Institutes of Technology (IITs).
Anushka Aashvi, an IIT Kharagpur engineer, spearheaded the experiment, which began as a casual inquiry. The results, however, were anything but ordinary: ChatGPT o3 scored an astounding 327 out of a possible 360 marks, a feat that would have secured it an All India Rank (AIR) of 4 in the actual exam.
Aashvi meticulously recreated realistic test conditions for the AI. In her blog, “Heltar,” she detailed how the model was prompted to “act like a JEE aspirant” and to solve each question independently, without recourse to web searches or external Python tools. To eliminate any memory bias, each question was presented in a fresh chat session, and no corrections or hints were provided during the process.
Despite these stringent restrictions, ChatGPT o3 demonstrated remarkable proficiency. The AI notably achieved perfect 60s in both Chemistry and Mathematics during the second phase of the simulated exam, dropping only a few marks in Physics and earlier sections.
This unprecedented performance by an AI chatbot on such a high-stakes, human-centric examination underscores the rapidly evolving capabilities of artificial intelligence and prompts crucial discussions about its potential impact on education, competitive assessments, and the very definition of “intelligence.”
Meanwhile, a separate investigation led by Apple researchers sheds light on the limitations of leading AI systems such as ChatGPT o3, Claude, and DeepSeek. Despite producing confident, articulate responses, these models often falter under the weight of genuinely difficult tasks.
In a newly published research paper titled “The Illusion of Thinking,” Apple’s team argues that even today’s most advanced language models may not engage in true reasoning, as is widely assumed. Their study reveals that, although these models can simulate intelligence convincingly, their capabilities break down significantly when confronted with deeply complex challenges.
