What is Inference - Bull Briefing

This CNBC interview with Ray Wang, chairman of Constellation Research, offers several high-level insights into the 2026 tech landscape and the evolution of the “AI trade.”

Here is a breakdown of the key themes from the video:

1. Top Stock Picks for 2026

Ray Wang identified several companies he expects to outperform the NASDAQ this year, categorizing them based on their current momentum:

Continued Winners: Nvidia, Google, AMD, Intel, and Palantir are expected to maintain their growth momentum [01:21].
Potential Turnarounds: He noted that companies like Apple, Meta, and Oracle—which lagged in 2025—have the potential to “survive and make it” in the current market [01:10].

2. Nvidia’s Shift: From Training to Inference

A major reason for continued confidence in Nvidia is its strategic acquisition of Groq [01:41].

The Strategy: Wang explains that while the market has been focused on “training” AI models, the focus is now shifting toward inference (running the models).
TPU Market: According to Wang, the acquisition allows Nvidia to enter the TPU (Tensor Processing Unit) market, which he sees as critical for inference [01:54].

3. The Rise of “AI Exponentials”

One of the most notable segments is the discussion of “Tiny Teams,” companies that achieve massive scale with very few employees [00:46].

Efficiency: He highlights companies with as few as 10–20 employees generating $200 million in Annual Recurring Revenue (ARR) [03:25].
Market Disruption: Wang predicts that within the next three years, a 1,000-person company could take down a giant like Accenture, and a 100-person company could disrupt major SaaS winners [04:08].
Examples: He mentions Midjourney, Lovable, and Cursor as examples of these high-impact, low-headcount organizations [03:36].
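
To put the efficiency claim above in per-person terms, the arithmetic can be sketched as follows. The helper name `revenue_per_employee` is hypothetical, and the inputs are simply the $200 million ARR and the 10 to 20 headcount range cited in the summary, not independently verified figures.

```python
def revenue_per_employee(arr_usd, headcount):
    """Annual recurring revenue divided evenly across the team."""
    return arr_usd / headcount

ARR_USD = 200_000_000  # ARR figure as cited in the interview summary

# Headcount range as cited in the interview summary (10 to 20 employees).
for headcount in (10, 20):
    per_person = revenue_per_employee(ARR_USD, headcount)
    print(f"{headcount} employees -> ${per_person:,.0f} ARR per employee")
```

Under those figures, each employee would correspond to roughly $10 million to $20 million in ARR.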

4. Google’s Vertical Integration

Wang argues that Google holds a unique advantage because it is “vertically integrated” [02:30].

They control the entire stack: from the TPU chips to the LLM (Gemini 3) and the final agent level [02:40].
This integration, combined with their vast data sources, allows them to continuously improve their models at a scale others might struggle to match [02:45].

5. Oracle’s “NeoCloud” Ambitions

Despite some skepticism, Wang believes Oracle is a “NeoCloud” winner [04:54].

Cost Advantage: He claims some clients are paying one-fifth of what they would pay at AWS or Microsoft by switching to Oracle [05:12].
Key Partnerships: Oracle’s relationship with TikTok and its standing with the current administration are cited as major tailwinds [05:01].

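In the AI field, the term **inference** corresponds to the Korean word **'추론'**.

Put simply, it is the process in which an AI model, having finished its studying, is put to work and produces results. To understand what Ray Wang meant in the video when he said that inference is about to take off, it helps to compare the following two stages.

1. Training vs. Inference

Training: This is like a student studying for an exam. Huge amounts of data are fed into the model to teach it things like "how to tell dogs from cats" or "how to speak." This stage demands enormous computing power, which is why Nvidia's high-performance GPUs are treated as essential.
Inference: This is the stage where the model, having finished studying, actually takes the exam. When we ask ChatGPT a question and it generates an answer, or upload a photo and it identifies who is in it, that moment is inference.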
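
The split between the two stages can be illustrated with a deliberately tiny sketch. It assumes a toy nearest-centroid classifier on made-up one-dimensional data. Real models are trained very differently and at vastly larger scale, and the function names `train` and `infer` are illustrative only.

```python
def train(samples, labels):
    """Training: pass over labeled data once to learn a parameter (here, a threshold)."""
    class0 = [x for x, y in zip(samples, labels) if y == 0]
    class1 = [x for x, y in zip(samples, labels) if y == 1]
    centroid0 = sum(class0) / len(class0)
    centroid1 = sum(class1) / len(class1)
    # The learned "model" is the midpoint between the two class centroids.
    # This toy setup assumes class 1 has the larger feature values.
    return (centroid0 + centroid1) / 2

def infer(threshold, x):
    """Inference: apply the frozen parameter to a new, unseen input."""
    return 1 if x > threshold else 0

# Made-up data: label 0 for small feature values, label 1 for large ones.
samples = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = train(samples, labels)                       # done once, up front
predictions = [infer(threshold, x) for x in (1.5, 8.5)]  # repeated for every request
print(threshold, predictions)
```

The point of the sketch is the shape of the workload: `train` runs once over the whole dataset, while `infer` is a cheap call that runs again for every new input.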

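2. Why Ray Wang Says Inference Matters

Until now, big tech companies around the world have concentrated on building AI models (training), so demand for training chips exploded. Now the industry is moving into a phase in which **countless services and apps built on those models are pouring out (real-world usage)**.

Surging usage: Training a model is largely a one-off effort, but when hundreds of millions of users worldwide use AI services every day, inference runs around the clock.
Efficiency first: At the inference stage, what matters more than simply buying the most expensive chip is specialized hardware, such as TPUs (Tensor Processing Units), that returns answers quickly while drawing less power.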
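
The usage argument above comes down to a one-time cost versus a recurring one, which a short back-of-the-envelope sketch can make concrete. Every number below is a hypothetical placeholder chosen for round arithmetic, not a figure from the interview, and the function names are illustrative.

```python
import math

def daily_inference_cost(queries_per_day, cost_per_1000_queries):
    """Recurring spend: scales with how many requests users send each day."""
    return queries_per_day / 1000 * cost_per_1000_queries

def breakeven_day(training_cost, daily_cost):
    """First day on which cumulative inference spend reaches the one-time training cost."""
    return math.ceil(training_cost / daily_cost)

# All three inputs are hypothetical placeholders, not figures from the interview.
TRAINING_COST_USD = 100_000_000
QUERIES_PER_DAY = 500_000_000
COST_PER_1000_QUERIES_USD = 2.0

daily = daily_inference_cost(QUERIES_PER_DAY, COST_PER_1000_QUERIES_USD)
day = breakeven_day(TRAINING_COST_USD, daily)
print(f"Daily inference spend: ${daily:,.0f}")
print(f"Cumulative inference spend matches the training cost on day {day}")
```

Under these assumed inputs, inference spending catches up with the training bill in about 100 days and keeps growing afterward, which is why per-query efficiency (cost and power per answer) becomes the figure that matters.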