Semiconductor Series Part III: Chip Designers and Hyperscalers
Part III, the final installment of the semiconductor series, covers the designers behind the world’s advanced microchip ecosystem and their hyperscaler customers. Notable chip designers include Intel, AMD, Apple, Qualcomm, and, of course, Nvidia. The engineers at these companies use software from EDA providers like Synopsys and Cadence to design the chips that go into smartphones, computers, and datacenters. On the datacenter side, hyperscalers like Microsoft, Amazon, Google, and Facebook collectively pour hundreds of billions of dollars of capex every year into outfitting their datacenters with the latest CPUs and GPUs.
Most of the world’s chips are designed by so-called fabless chip companies. Companies like Nvidia, AMD, and Qualcomm don’t own or operate the factories that produce their products. This lets them focus solely on designing the fastest and most efficient chips while remaining asset-light. As a result, Nvidia sports an eye-wateringly low debt-to-equity ratio of just 13%. IDMs like Intel, on the other hand, have to run on two tracks at once: their survival depends on maintaining market leadership in both foundry and design capabilities. A company like Intel must continually bet on itself, taking on substantial risk to finance hundred-billion-dollar capex initiatives while also designing attractive products that keep its production lines running at full capacity.
The rise of the fabless, and the death of Intel
Ten years ago, AMD and Nvidia were far from household names, but both companies now dwarf the historical chip king Intel in market cap. Their rise is as much a tale of disruptive innovation as it is one of Intel’s C-suite hubris. Intel’s downfall can be attributed to many different mistakes, but perhaps its biggest misstep was its aversion to EUV lithography. TSMC had already made the costly transition to EUV on its 7nm-class nodes, and while each of these new EUV machines cost in the low nine figures, the bet handed Intel rival AMD, fabbing at TSMC, a technological advantage over Intel for the first time in decades. Intel resisted EUV adoption because it believed it could save money and extend the useful life of its DUV machines.
Former Intel CEO Bob Swan stated the company’s technological woes were “somewhat a function of what we've been able to do in the past, which in essence was defying the odds. At a time when it was getting harder and harder, we set a more and more aggressive goal. From that, it just took us longer.”
Upper management believed that its history of innovation was an indicator of its future capabilities, and that easy node advances would always be there. This was a false premise, and it was one that killed the company. Intel, whose design team could use only its internal fabrication plants, was structurally handicapped. Intel reached 14nm a full year ahead of the competition, yet by the time it got to its Intel 7 node, it was already two years behind TSMC. The difference? TSMC had fully adopted EUV by then, while Intel, out of pure hubris, still believed it could keep printing smaller and smaller features at acceptable yields using DUV multipatterning.
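To see why multipatterning is such a yield tax, a rough back-of-the-envelope illustration helps (the numbers here are hypothetical, purely to show the mechanism): if a single litho-etch pass on a critical layer yields 99%, splitting that layer into four DUV passes (quadruple patterning) compounds to roughly 0.99^4 ≈ 96% for that one layer. Across ten such multipatterned layers, that is 0.96^10 ≈ 67% of die surviving those steps, versus roughly 0.99^10 ≈ 90% if each layer needed only a single EUV exposure. Multipatterning also multiplies mask counts, process steps, and cycle time, which is why “saving money” by skipping EUV turned out to be so expensive.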
The x86 chip market is a commoditized duopoly between Intel and AMD. Both companies’ products are functionally equivalent, so each must distinguish itself on the raw performance of its chips. This makes for extremely low switching costs: a customer can use Intel for one datacenter and migrate to a more performant AMD system for the next. As such, AMD, whose talented designers could tap TSMC’s superior process technology, began to win back share in the PC and datacenter CPU markets.

Nvidia’s Rise
Nvidia benefits from the same fabless business model, but more importantly, it operates in a different market segment than Intel. When a company chooses to buy an AMD CPU, it is a direct loss for Intel; the CPU market is largely a zero-sum game in which share gains come at a rival’s expense rather than TAM growth lifting all parties. Nvidia’s primary market, on the other hand, is GPUs. GPUs were mostly used in consumer gaming PCs, professional creative work, and the occasional supercomputer. Nvidia GPUs were also the go-to choice for machine learning workloads, but that TAM was relatively small until ChatGPT hit the market. This was one of the biggest “a-ha!” moments for the market in years and is why the phrase “XYZ is having its ChatGPT moment” was coined. The product thoroughly impressed the public with its natural language capabilities and showed investors that the payoff from AI R&D could come far sooner than expected.
All of a sudden, Nvidia became the picks and shovels of the AI boom. Transformer scaling laws dictate that the more compute and data you throw at a model, the better it gets, and as the primary supplier of that compute, Nvidia held the gateway to greater model performance. With a first-mover advantage in machine learning software infrastructure and a developer-ecosystem network effect as its moat, Nvidia was in prime position to be the dominant supplier of compute for AI datacenters for the foreseeable future. This is exactly how it played out. Since 2022, Nvidia, a hardware company, has increased gross profits by more than 500%. In 2023 alone, the company more than doubled its revenue largely on the back of price increases, which, of course, went straight to the bottom line. In CY23, Nvidia posted an 85% incremental EBITDA margin. Margins like these are unheard of for a company selling a physical product; they are akin to the profitability of a SaaS business after operating leverage has fully kicked in.
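For reference, the scaling-law point above can be written down concretely. In the Chinchilla-style formulation (the exact constants vary by study and are not specific to any one lab), model loss falls predictably as parameter count N and training tokens D grow:

Loss(N, D) ≈ E + A / N^α + B / D^β,   with training compute C ≈ 6 · N · D floating-point operations.

Pushing loss lower therefore means scaling both N and D, and the compute bill scales with their product, which in practice means buying more accelerators.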
Hyperscalers
The big three hyperscalers in the US are Amazon, Google, and Microsoft. These companies operate huge cloud businesses in which they rent out access to compute. Beyond these three, there are other large players like Meta, Oracle, Tesla, Apple, and Alibaba, all chomping at the bit to buy more Nvidia GPUs. Each of these hyperscalers has its own strategy for the AI game.
Microsoft
Microsoft, for example, has a tight partnership with OpenAI, and although it does have an internal AI research team, the company relies heavily on OpenAI’s state-of-the-art models for its own products like Copilot. Microsoft’s strength is distribution. It has billions of devices on the Windows platform and billions more users of services that run on Azure. The company can’t justify investing tens of billions of dollars and shifting corporate focus onto internal AI research because it views that space as inevitably commoditized. Instead, the partnership with OpenAI gives it access to state-of-the-art models while reducing the financial and execution risk of an internal effort. The company has made modest efforts to develop its own AI accelerators, but its datacenters remain overwhelmingly filled with Nvidia GPUs.
Google
Google, on the other hand, has gone for a more vertically integrated approach. Google has historically been a leader in AI, investing heavily in AI development through projects like DeepMind and Waymo. Google is unique in that it has had an ongoing Tensor Processing Unit (TPU) development project since 2015. TPUs are designed from the ground up for the dense matrix multiplication at the heart of AI workloads, and they are a direct substitute for Nvidia’s pricey GPUs. That said, AI accelerator projects like this are difficult to pull off. The hardware itself can be optimized for those matrix operations, but porting existing machine learning libraries and making cutting-edge training and inference work reliably at scale has proven difficult. This is where companies like Cerebras, Groq, Etched, and, dare I say, AMD find themselves right now: on paper their hardware can run machine learning workloads faster than an Nvidia chip, but commercializing it has been difficult given the lack of robust software support. Google is the exception; it invested in TPUs early, and they have proven capable of state-of-the-art model training today. Not only does Google have its chip situation sorted, it also employs some of the world’s best AI researchers. Its language models like Gemini are nothing to scoff at, and the company’s image and video models, Imagen 3 and Veo 2, are head and shoulders above anything else on the market. Lastly, in addition to internally developed chips and top AI talent, Google has the distribution network to back it all up: billions of devices running Android and billions more touching services hosted on Google Cloud, analogous to Microsoft’s distribution capabilities.
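To make the software point concrete, here is a minimal sketch (array sizes are illustrative) of what a transformer building block looks like on Google’s own JAX/XLA stack, the software layer that makes TPUs usable in practice; the same few lines compile for TPU, GPU, or CPU, which is precisely the portability rival accelerator vendors have struggled to replicate:

import jax
import jax.numpy as jnp

@jax.jit  # XLA traces this once and compiles it for whichever backend is attached (TPU, GPU, or CPU)
def dense_layer(x, w, b):
    # The core of a transformer block is a large matrix multiply plus a cheap elementwise op,
    # exactly the workload a TPU's matrix units are built for.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
kx, kw = jax.random.split(key)
x = jax.random.normal(kx, (1024, 4096))   # a batch of activations (sizes are illustrative)
w = jax.random.normal(kw, (4096, 4096))   # a weight matrix
b = jnp.zeros((4096,))

print(jax.devices())                 # lists TPU devices on a TPU VM, CPU/GPU devices elsewhere
print(dense_layer(x, w, b).shape)    # (1024, 4096)

Everything below the decorator is ordinary array math; the hard part, which Google spent years on and newer accelerator vendors are still chasing, is the compiler and runtime that map it efficiently onto the chip.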
Amazon
While Microsoft has gone for the horizontal partnership approach and Google has gone for vertical integration, Amazon has found itself somewhere in the middle. Amazon’s internal AI research team isn’t particularly strong, but it does have a formidable chip design team and a very strong distribution network of its own. Amazon’s chip designers have been making great strides, and the recent Trainium 2 chip has picked up notable design wins. Apple is evaluating Trainium 2 for use, and Anthropic is due to receive a supercluster of 400k Trainium 2 chips by the end of the year. That would make Amazon the fourth company whose AI accelerators are deployed at scale, after Nvidia, AMD, and Google. Amazon’s partnership with Anthropic also runs quite deep; Amazon has been stuffing Trainium 2s down Anthropic’s throat, pushing the company to train and run inference for its future models on Amazon’s own chips. This will help build a better field-tested software ecosystem for Amazon’s AI accelerator platform and serve as an on-ramp toward broader commercialization of its internally developed chips. Amazon’s other strength is its distribution network. AWS is the largest public cloud provider; estimates point to a roughly ten-percentage-point market share lead over Azure. Clients whose applications are already stuck in AWS will necessarily run their new AI-enabled apps on AWS as well. Nor can you forget Amazon’s physical distribution capabilities: its massive warehouse and shipping infrastructure around the world positions Amazon as a major beneficiary of physical AI products like humanoid robots and autonomous vehicles.