Jonathan Ross, the CEO of AI chip startup Groq, has pointed to two fundamental issues that he believes are holding back OpenAI’s ChatGPT and similar Large Language Models (LLMs) from reaching their full potential — latency and the high cost of compute. Speaking at a recent industry event, the former Google engineer, who was instrumental in designing the original Tensor Processing Unit (TPU), argued that while AI models have become incredibly intelligent, the physical infrastructure supporting them is struggling to keep pace with human expectations.
AI chatbot’s ‘speed gap’ and latency issues
According to Ross, the first major hurdle for ChatGPT is latency, or the delay between a user’s prompt and the AI’s response. He noted that for AI to become a truly seamless part of human workflows, it needs to operate at the speed of thought.
“The problem with current models is that they are still too slow for real-time interaction,” Ross explained. He suggested that the “typing” effect seen in ChatGPT is often a way to mask the time the system takes to process data. Ross contends that until AI can respond instantaneously, it will remain a tool rather than a fluid collaborator. This is where Groq’s own Language Processing Units (LPUs) aim to disrupt the market; the company claims inference speeds significantly faster than those of traditional GPUs.
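The latency Ross describes can be made concrete by timing the gap between sending a prompt and receiving the first streamed token, which is the delay a user actually feels before the “typing” begins. Below is a minimal sketch assuming the official OpenAI Python client (openai>=1.0); the model name and prompt are placeholders, not details from Ross’s remarks.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain inference latency in one sentence."}],
    stream=True,  # tokens arrive as they are generated
)

first_token_at = None
chunks = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            # Time to first token: the pause before the "typing" starts.
            first_token_at = time.perf_counter()
        chunks += 1

total = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total generation:    {total:.2f}s over {chunks} chunks")
```

Time to first token approximates perceived responsiveness, while total generation time reflects the raw inference throughput that hardware vendors compete on.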
Sustainability of compute costs
The second problem that Ross highlighted is the sheer economic and environmental cost of running these models. As LLMs scale, the power and hardware required to generate every single word rise sharply.
Ross warned that the current trajectory of AI development is “economically unsustainable” for most companies. He argued that the industry is over-reliant on general-purpose graphics processing units (GPUs) that aren’t optimised for the specific “inference” tasks AI models perform millions of times a day. Without a shift toward more efficient, specialised hardware, Ross believes the high “cost-per-query” will eventually limit the accessibility and democratisation of advanced AI tools.
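Ross’s “cost-per-query” concern is, at bottom, simple arithmetic: serving cost scales with the number of tokens generated per query multiplied by the cost of generating each token. The back-of-the-envelope sketch below illustrates the scale involved; every figure is a hypothetical assumption for illustration, not a number from Ross or Groq.

```python
# Back-of-the-envelope cost-per-query model.
# All inputs are hypothetical assumptions, not measured figures.

queries_per_day = 100_000_000      # assumed daily query volume
tokens_per_query = 500             # assumed average response length
cost_per_million_tokens = 2.00     # assumed blended $ cost (hardware + power)

cost_per_token = cost_per_million_tokens / 1_000_000
cost_per_query = tokens_per_query * cost_per_token
daily_cost = queries_per_day * cost_per_query

print(f"cost per query: ${cost_per_query:.4f}")  # $0.0010
print(f"daily cost:     ${daily_cost:,.0f}")     # $100,000
```

Even at a tenth of a cent per query, a high-volume service faces a six-figure daily bill, which is why the per-token efficiency of the underlying hardware dominates the economics.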
Competitive hardware landscape
Ross’s critique comes at a time when Groq is positioning itself as a formidable challenger to Nvidia’s dominance in the AI hardware space. By focusing specifically on the speed of inference—how fast a model “thinks” once it is trained—Ross hopes to solve the very problems he has diagnosed in ChatGPT.
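To see why inference speed translates directly into user experience, consider how tokens-per-second maps onto the wait for a complete answer. The rates below are illustrative assumptions, not vendor benchmarks.

```python
# How generation rate translates into user-facing wait time.
# Rates and response length are illustrative assumptions.

response_tokens = 300  # a typical multi-paragraph answer (assumed)

for tokens_per_second in (30, 100, 500):
    seconds = response_tokens / tokens_per_second
    print(f"{tokens_per_second:>4} tok/s -> {seconds:5.1f}s per response")
# 30 tok/s waits 10.0s; 100 tok/s waits 3.0s; 500 tok/s waits 0.6s
```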
While OpenAI continues to refine the “brains” of its AI with newer GPT models, Ross’s comments are a reminder that the body, the underlying hardware, may prove just as critical in the race for AI supremacy.
