TL;DR

  • Large language models (or LLMs, as we like to call them) like ChatGPT can’t “see” most video embeds because they ignore JavaScript, iframes, and images.
  • ChatGPT’s browsing tools are blocked from accessing YouTube.
  • LLMs use two data sources: pretraining (a frozen-in-time snapshot of the web) and live browsing. 
  • Transcripts expose video content to LLMs and make your videos discoverable.
  • Marketers should balance SEO and LLM optimization when planning 2026 content strategies.
  • The fix: Make videos machine-readable without killing the user experience (UX).

The New Search Reality

Sam Altman, the guy behind OpenAI (the company that makes ChatGPT), said that they’ve reached 800M weekly active users.

This is pretty damn impressive if you ask me, especially since Google’s Gemini only has half those users. That’s surprising, given that Gemini is built into Google Workspace, Pixel devices, and so much more… Maybe Apple will give Gemini a boost by using Google’s AI model to run Siri.

Returning to the numbers, having close to 1 billion active users is pretty remarkable, and something that wasn’t conceivable just a few years ago. 

What is this telling us? User behaviour and search are shifting, and that raises a critical question: when an LLM searches the web to answer a prompt, does it see your best content?

What LLMs Actually See

When you ask ChatGPT a question, the model first checks its internal memory: Do I already know this? 

If so, it answers based on its pretraining data (a frozen snapshot of the internet). If not, it launches a live search.

That search isn’t like what a human does in a browser. GPT isn’t just Googling and clicking everywhere. 

Instead, it’s a text-only sweep of pages returned by search providers (mostly Bing, because, you know, Microsoft holds a 27% stake in OpenAI). LLMs retrieve the base HTML: no JavaScript execution, no images, no CSS, no iframe content. Just the raw text.

That’s why most embedded videos, which rely on JavaScript or iframes, are invisible to AI search. What the model sees is just the embed code, not the actual content inside.
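To make that concrete, here’s roughly what a standard YouTube embed looks like in a page’s source (the video ID and title below are placeholders):

```html
<!-- A typical video embed as it appears in raw HTML.
     VIDEO_ID and the title are placeholders. -->
<iframe
  src="https://www.youtube.com/embed/VIDEO_ID"
  title="Product demo"
  width="560" height="315"
  allowfullscreen>
</iframe>
<!-- This is everything a text-only crawler gets: a URL and a title attribute.
     Nothing the speaker actually says ever appears in the markup. -->
```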

Can YouTube Boost Visibility?

What about YouTube? It’s free and the second-largest search engine in the world. Problem fixed! Right!? RIGHT!???

Well, not so fast. Who owns YouTube again? Google. 

What happens when you ask ChatGPT to search for a video on YouTube? Go ahead, try it! 

Or if you feel a bit lazy, here’s a screenshot for you:

ChatGPT, OpenAI’s flagship LLM, is blocked from directly accessing YouTube. When it attempts to do so, it returns an error. At best, the model guesses from titles and descriptions. At worst, it makes up a summary.

So if your 2026 content plan revolves around YouTube as your AI-friendly distribution channel, think again.

Transcripts to the Rescue

The fix is surprisingly simple: expose the transcript.

Pages using “LLM-friendly” embeds, where the transcript is quietly included in the HTML, are fully readable to ChatGPT. The same page with a standard video embed looks like a black box.

It’s not about gaming algorithms. It’s about accessibility, for machines. 

This is similar to how alt text makes images accessible to screen readers; transcripts make videos accessible to LLMs. 

But the goal isn’t to plaster transcripts all over your site. The smarter approach is to embed them behind the scenes, so that users still see a clean layout while LLMs receive the context they need. 
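Here’s a rough sketch of what that can look like (placeholder markup, not a copy of any real embed code): the transcript lives in the base HTML, visually hidden from users, while CSS and JavaScript keep the layout clean.

```html
<!-- Sketch of an "LLM-friendly" embed. VIDEO_ID, class names, and the
     transcript text are placeholders for illustration only. -->
<div class="video-embed">
  <iframe
    src="https://www.youtube.com/embed/VIDEO_ID"
    title="Demo video"
    allowfullscreen>
  </iframe>

  <!-- The transcript is present in the base HTML, so a text-only crawler
       can read it. Hide it visually with a CSS utility class, or enhance
       it with JavaScript into an interactive, auto-scrolling viewer. -->
  <div class="video-transcript visually-hidden">
    <h2>Transcript</h2>
    <p>In this demo, we walk through how the product answers questions
       using content from your own index…</p>
  </div>
</div>
```

Because the text is part of the initial HTML response, it shows up in the same text-only sweep described above, no JavaScript required.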

At Coveo, we do a bit of both. On a page that’s built specifically for video consumption, such as our demo video about Coveo Relevance Generative Answering or our Agentic Masterclass, we load the transcript in HTML and then format everything with JavaScript to offer a good UX (proper formatting, auto-scrolling, and interactive features).

For regular pages, such as our Commerce webpage, we use an LLM-friendly embed code that preloads the transcript text behind the video. The user never sees it, but ChatGPT does.

Pretraining vs Browsing

AI models learn in two phases: 

  1. Pretraining: A frozen snapshot of the web, refreshed roughly monthly, that captures text from across the internet. The data eventually becomes the inherent knowledge of future models. 
  2. Browsing: Real-time searches for information the model doesn’t already know.

This means LLM optimization has both short- and long-term payoffs. In the short term, exposed transcripts help models see your content when browsing. In the long term, they’ll become part of the datasets future models train on.

So yes, a transcript might eventually become inherent knowledge, but only if it’s visible in the first place.

How This Changes My 2026 Plan

Here’s how I’m rethinking our content and search strategy for 2026:

  1. Expose transcripts behind video embeds. Make sure every page with a major video also includes readable text for LLMs.
  2. Audit JavaScript-heavy content. Identify which pages render key content dynamically and consider server-side fallbacks.
  3. Diversify beyond YouTube. Keep YouTube for reach, but don’t depend on it for discoverability in AI search (other than maybe Gemini).
  4. Measure new referral patterns. Start tracking visits and conversions attributed to AI search assistants.
  5. Revisit metadata. Treat structured, readable text as the connecting element between traditional SEO and LLM visibility (one option is sketched right after this list). 
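For that last point, one concrete option (a sketch, with placeholder values) is schema.org structured data: VideoObject supports a transcript property, so the same text that makes your video readable to LLMs can also ship as machine-readable metadata.

```html
<!-- Sketch of VideoObject structured data with a transcript.
     All URLs, dates, and text below are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Product demo video",
  "description": "A short walkthrough of the product.",
  "contentUrl": "https://example.com/videos/demo.mp4",
  "embedUrl": "https://example.com/embed/demo",
  "uploadDate": "2026-01-15",
  "transcript": "In this demo, we walk through how the product answers questions using content from your own index…"
}
</script>
```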

This isn’t about chasing algos; it’s about being findable and understandable in the next era of search.