Etched recently secured $500M at a $5B valuation. Cerebras inked a massive $10B agreement with OpenAI. A new wave of challengers to NVIDIA's inference business is scaling fast, and each is betting on a very different vision of the future. Here's how they stack up.
Etched is going all-in on specialization. They designed a custom ASIC that does one thing only: transformer inference. There’s no general-purpose compute at all—just transformers etched directly into silicon. According to the company, a single Sohu server can replace roughly 160 H100 GPUs. Their underlying thesis is simple: transformers have already won, so why pay for flexibility you don’t need?
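Taken at face value, that replacement claim implies a hefty per-chip multiple. A quick back-of-envelope, assuming an 8-accelerator Sohu server (the server configuration is my assumption, not a confirmed spec):

```python
# What "one server replaces ~160 H100s" implies per chip.
# NOTE: 8 accelerators per server is an assumption, not a confirmed spec.
h100s_replaced = 160
sohu_chips_per_server = 8

print(f"~{h100s_replaced / sohu_chips_per_server:.0f}x H100s per Sohu chip")  # ~20x
```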
Cerebras took the opposite approach by maximizing scale. Their WSE-3 is a wafer-scale processor with over 900,000 cores—around 50× the core count of an H100. They’ve demonstrated throughput exceeding 2,000 tokens per second on large language models. The core idea is that GPU clusters eventually choke on networking overhead, so collapsing everything onto one enormous chip removes the interconnect bottleneck entirely. They’ve reportedly finalized a $10B OpenAI deal covering 750MW of compute and are targeting an IPO around 2026.
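The core-count comparison is easy to sanity-check against public specs, with the caveat that a wafer-scale core and a CUDA core are not equivalent units of compute, so this measures scale rather than speed:

```python
# Core-count ratio behind the "around 50x an H100" comparison.
# H100 ships 16,896 FP32 CUDA cores; WSE-3 is specced at ~900,000 AI cores.
wse3_cores = 900_000
h100_cuda_cores = 16_896

print(f"~{wse3_cores / h100_cuda_cores:.0f}x the core count")  # ~53x
```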
AMD’s strategy centers on software economics rather than radical hardware form factors. The belief here is that NVIDIA’s strongest defense is CUDA lock-in, and that open ecosystems will gradually weaken that advantage. AMD’s accelerators are competitive, and ROCm—while still incomplete—continues to mature. Major labs like OpenAI and Meta already use AMD hardware, and OpenAI reportedly took a 10% equity stake as part of a broader compute partnership. Character.AI has also announced that it serves roughly a billion daily queries using ROCm.
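That portability thesis is concrete at the framework level: ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface, so typical inference code runs unmodified on either vendor's hardware. A minimal sketch:

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs surface through the same
# torch.cuda API, so this runs unmodified on an MI300X or an H100.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)

with torch.inference_mode():
    y = model(x)

print(f"ran on {device}: output shape {tuple(y.shape)}")
```

The same pattern extends up the stack: inference frameworks such as vLLM ship ROCm backends, which is what makes the "lock-in erodes gradually" argument plausible.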
Tenstorrent is placing the boldest open-stack wager. The company raised $693M at a $2B pre-money valuation and claims to have closed around $150M in commercial contracts. It’s already shipping Wormhole PCIe cards and developer systems designed around fully open software tooling. Its upcoming Blackhole chip is built for Ethernet-native scale-out: 745 TFLOPS FP8, 32GB of GDDR6, and ten 400Gbps links, enabling cluster builds without specialized networking gear.
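Those link specs translate into a serious amount of commodity fabric per chip. A rough tally, ignoring protocol overhead (which would lower usable bandwidth):

```python
# Aggregate Ethernet fabric per Blackhole chip, from the quoted specs.
links_per_chip = 10
gbps_per_link = 400

total_gbps = links_per_chip * gbps_per_link   # 4,000 Gbps
total_gb_per_s = total_gbps / 8               # ~500 GB/s

print(f"{total_gbps} Gbps ≈ {total_gb_per_s:.0f} GB/s of fabric per chip")
```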
FuriosaAI is focused squarely on inference efficiency within realistic data center power limits. Reports suggest Meta offered $800M to acquire the company, which Furiosa declined. Their bet is that inference will become power-constrained faster than expected, and the winning accelerators will be those that deliver strong performance without blowing power budgets.
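To see why efficiency compounds, consider how many accelerators fit inside a fixed rack budget. A sketch with purely hypothetical numbers (none of these are Furiosa or GPU specs):

```python
# Illustrative rack-level math behind the power-constrained thesis.
# All numbers are hypothetical placeholders.
rack_budget_w = 15_000            # a common air-cooled rack limit
overhead_fraction = 0.3           # CPUs, NICs, cooling, conversion losses
accelerator_tdp_w = {"high-power GPU": 700, "efficiency-focused ASIC": 180}

usable_w = rack_budget_w * (1 - overhead_fraction)
for name, tdp in accelerator_tdp_w.items():
    print(f"{name}: ~{int(usable_w // tdp)} accelerators per rack")
```

Under these placeholder assumptions, the lower-TDP part fits roughly four times as many accelerators into the same rack, which is the whole thesis in one number.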
Timing is critical. The inference market is expanding at roughly a 19% CAGR, growing from about $106B in 2025 to an estimated $255B by 2030.
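Those endpoints pin down the growth rate directly:

```python
# Implied CAGR from the 2025 and 2030 market-size estimates quoted above.
start_b, end_b, years = 106, 255, 5

cagr = (end_b / start_b) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~19.2%
```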
That said, NVIDIA still dominates with roughly 86–92% market share, while AMD sits near 4%. Players like Etched and Cerebras have yet to prove large-scale deployment.
Happy Investing!
Disclaimer: The information provided here is for general informational purposes only and should not be considered as professional financial or investment advice. Before making any financial decisions, including investments, it is essential to seek advice from a qualified financial advisor or professional.