Sridhar Vembu, Founder and Chief Scientist of Zoho Corporation, has shared his views on a breakthrough in AI hardware that promises to slash inference costs dramatically and bring basic artificial intelligence directly into everyday consumer devices.

In a widely shared post on X (formerly Twitter), Vembu reacted to a demonstration by Taalas, a company developing specialised AI chips, which has hardwired a large language model directly into silicon. The Taalas HC1 chip reportedly delivers 16,960 tokens per second on an 8-billion-parameter Llama 3.1 model, making it roughly 10 times faster and 20 times cheaper to run than Nvidia’s flagship B200 GPU, while requiring no high-bandwidth memory (HBM) or liquid cooling.

In his appreciation, Vembu wrote, “AI model hardwired into silicon is a fantastic idea… This will dramatically cut inference cost. So much, that I believe a basic level of ‘intelligence’ can be built into every product! This is the path to machines that cook and so on (you have a ‘recipe-based code generator’ chip and the code then drives the cooking machine).”

He initially referred to the model as 3.1 billion parameters but quickly corrected it to the 8-billion-parameter Llama 3.1.

Vembu speaks on embedded AI in daily life

Vembu sees this technology as a turning point that moves AI from massive cloud GPU clusters to compact, low-power “AI appliances” that can run locally in devices. He envisions specialised, fixed-purpose models — such as a dedicated cooking assistant that understands recipes, steps, timings, and substitutions — permanently baked into cheap silicon chips that could last for years without frequent updates.
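As a purely illustrative sketch of the "recipe-based code generator" idea Vembu describes — where a fixed-purpose model emits instructions that drive a cooking machine — the pipeline might look like the following. Every name, function, and instruction format here is hypothetical; the model is stubbed out, since the point is the shape of the pipeline, not a real on-chip API.

```python
# Illustrative sketch (all names hypothetical): a fixed-purpose model
# turns a recipe request into structured steps, and a simple controller
# translates those steps into commands for the cooking hardware.

def recipe_model(prompt: str) -> list[dict]:
    """Stand-in for an on-chip recipe model; returns structured cooking steps."""
    return [
        {"action": "heat", "target": "pan", "temp_c": 180},
        {"action": "add", "ingredient": "oil", "amount_ml": 15},
        {"action": "stir", "duration_s": 120},
    ]

def run_on_machine(steps: list[dict]) -> list[str]:
    """Toy controller: converts each step into a machine command string."""
    log = []
    for step in steps:
        params = ", ".join(f"{k}={v}" for k, v in step.items() if k != "action")
        log.append(f"{step['action'].upper()}({params})")
    return log

commands = run_on_machine(recipe_model("stir-fried vegetables"))
```

Because the model's task is fixed (recipes in, machine steps out), it needs no retraining loop or cloud connection, which is the property that makes hardwired silicon attractive for this class of appliance.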

Responding to replies that questioned how hard-wired models could ever be updated, Vembu noted that many real-world applications, such as cooking and basic household tasks, do not require constant retraining, making the trade-off acceptable and cost-effective.

Zoho is investing in silicon capabilities

Asked in a follow-up comment whether Zoho plans to pursue similar technology, Vembu confirmed, “We are investing in silicon design capability.”

The statement is the first public indication that the Indian software giant, traditionally focused on cloud-based business applications, is actively building in-house expertise in custom AI hardware design.

The Taalas demonstration has generated significant excitement in the AI community. A performance chart shared alongside the announcement shows the Taalas HC1 outperforming not only Nvidia’s H200 and B200 GPUs but also specialised inference chips from Groq, SambaNova, and Cerebras by a wide margin on tokens-per-second-per-user metrics.

Industry observers see this as part of a broader shift toward purpose-built AI accelerators that prioritise extreme efficiency and edge deployment over general-purpose GPU flexibility.