Published: 05/20/2025

Hey there! I recently tackled a neat challenge: pulling the transcript from a YouTube video and cleaning it up to feed into an AI like ChatGPT or Grok for a quick summary. The goal was to grab all the text from a segments-container element on YouTube’s page, strip out the noise (like timestamps and extra spaces), and copy it to the clipboard for easy pasting into an AI tool. To get there, I needed to click a couple of elements to reveal the transcript. Here’s how I pulled it off with a slick vanilla JavaScript script.

The Mission

The plan was to extract the full transcript text from a YouTube video’s segments-container element, which holds the caption data. YouTube buries this behind a couple of clicks: an element with the text “…more” to expand the description and another with aria-label="Show transcript" to display the transcript. The text needed to be clean, with no newlines, no digits or colons (the stuff timestamps are made of), and no runs of extra spaces, so it’s ready to paste into an AI for summarization. The script also had to be robust, handling cases where elements might be missing, and it needed to run automatically when pasted into the browser console.

Why I Went with a Custom Script

When I set out to scrape YouTube transcripts, I could’ve leaned on third-party services, APIs, or something like a Greasemonkey script, but I wasn’t feeling those options. Third-party services often come with strings attached—subscriptions, rate limits, or sketchy data practices that make you wonder who’s peeking at your info. APIs, like YouTube’s official one, are powerful but require setup, authentication, and sometimes fees, which felt like overkill for a quick transcript grab. Greasemonkey scripts? They’re cool for automation, but installing a browser extension or user script can be a hassle, and you never know if some random script is sneaking in spyware or tracking junk.

Instead, I wanted something lean, self-contained, and under my control—a pure vanilla JavaScript snippet I could paste into the console and run on the spot. No dependencies, no external servers, just a few lines of code doing exactly what I need: click a couple of elements, clean the text, and copy it for AI summarization. This approach keeps things fast, transparent, and spyware-free, so I can trust what’s happening every step of the way. Plus, it’s easy to tweak if YouTube changes their layout. Total win.

How It Works

Here’s the breakdown of the script, step by step:

Step 1: Clicking the “…more” Element 🔍

First, we need to expand the video description to access the transcript option. I used document.querySelectorAll('*') to search all elements for one with the exact text “…more”. It could be a button, div, or anything else, so we keep it flexible. If found, we click it and log a confirmation. If not, we note it in the console and move on—no big deal, the transcript might still be accessible.
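If you’d like the Step 1 search as a reusable piece, here’s a minimal sketch. `findByExactText` and `MORE_LABELS` are hypothetical names, and matching both three ASCII dots and the single “…” character is an assumption, since it isn’t obvious which one YouTube renders. Because the helper accepts any iterable, you can feed it `document.querySelectorAll("*")` in the browser or plain objects in a test.

```javascript
// Hypothetical helper: return the first element whose trimmed textContent
// exactly matches one of the given labels, or null if none does.
// Works on any iterable of nodes, e.g. document.querySelectorAll("*").
function findByExactText(elements, labels) {
  const wanted = new Set(labels);
  for (const el of elements) {
    const text = (el.textContent ?? "").trim();
    if (wanted.has(text)) return el;
  }
  return null;
}

// Assumption: YouTube may render either three dots or the ellipsis character.
const MORE_LABELS = ["...more", "\u2026more"];

// Browser usage (not run here):
// const more = findByExactText(document.querySelectorAll("*"), MORE_LABELS);
// if (more instanceof HTMLElement) more.click();
```

One caveat on the design: `querySelectorAll` returns elements in document order and `textContent` includes descendant text, so the first match can be a wrapper around the real button. A click on a wrapper doesn’t reach handlers attached to its children, so if nothing happens, a deeper match (for example, the last of several nested elements sharing the same text) is the safer target. Keeping the original `instanceof HTMLElement` check in the browser guarantees `.click()` exists.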

Step 2: Waiting for the Page to Catch Up ⏳

Clicking “…more” might trigger some dynamic content loading, so we pause for 3 seconds using setTimeout. This gives YouTube’s DOM time to update before we look for the next element.
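The nested setTimeout calls work fine, but the same pauses read more naturally as a promise plus async/await. Here’s a minimal sketch; `sleep` and `expandThenOpenTranscript` are hypothetical names, and the two click functions are stand-ins for the Step 1 and Step 3 logic.

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds,
// so a pause reads as `await sleep(3000)` instead of a nested callback.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Illustrative flow: clickMore and clickShowTranscript are hypothetical
// stand-ins for the click logic in Steps 1 and 3.
async function expandThenOpenTranscript(clickMore, clickShowTranscript) {
  clickMore();
  await sleep(3000); // same pause the script uses after "...more"
  clickShowTranscript();
  await sleep(1000); // same pause the script uses after "Show transcript"
}
```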

Step 3: Clicking the “Show transcript” Element 📜

Next, we hunt for an element with aria-label="Show transcript", again using a broad selector ([aria-label="Show transcript"]) to catch any element type. If it’s there, we click it and log the action. If it’s missing, we log a warning and keep going, as the transcript might already be visible in some cases.
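Here’s a sketch of that find-then-click pattern as a small helper that never throws. The name `clickIfFound` is hypothetical, and the optional `log` parameter exists only so the helper can be tested without spamming the console. One assumption worth flagging: `aria-label` text likely follows YouTube’s interface language, so the `"Show transcript"` selector probably only matches on an English UI.

```javascript
// Hypothetical helper: click `element` if it exists, log what happened,
// and never throw. Returns true only when the click went through.
function clickIfFound(element, label, log = console) {
  if (!element) {
    log.warn(`"${label}" not found`);
    return false;
  }
  try {
    element.click();
    log.log(`Clicked "${label}"`);
    return true;
  } catch (err) {
    log.warn(`Clicking "${label}" failed:`, err);
    return false;
  }
}

// Browser usage (not run here):
// clickIfFound(document.querySelector('[aria-label="Show transcript"]'), "Show transcript");
```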

Step 4: Brief Pause for Transcript Loading ⏱️

After clicking “Show transcript,” we wait 1 second to ensure the transcript loads into the segments-container. This short delay helps avoid grabbing incomplete data.

Step 5: Extracting and Cleaning the Text 🧹

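Now we target the segments-container element by its ID. If it’s not found, we log an error and stop. If it’s there, we use textContent to grab all the text, then clean it with a couple of regex passes: every run of newlines, carriage returns, digits, and colons is stripped out (that takes care of the timestamps), any remaining run of whitespace is collapsed to a single space, and the ends are trimmed. Finally, the cleaned text gets a short “Please summarize the following YouTube Transcript:” prompt in front of it.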
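To see exactly what the regex cleanup does, here it is as a standalone function on a plain string, next to a timestamp-only alternative. Both function names are hypothetical. In this version, matches are replaced with a space rather than deleted, so a line break sitting between two words can’t glue them together. The alternative is not what the script does: it swaps the blanket digit removal for a pattern that only targets m:ss and h:mm:ss style tokens, so numbers spoken in the video survive.

```javascript
// Same cleanup as the script, on a plain string: strip newlines, digits,
// and colons, then collapse whitespace. Matches become a space so words
// on either side of a line break stay separate.
function cleanTranscriptText(raw) {
  return raw
    .replace(/[\n\r0-9:]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Alternative (not what the script does): remove only timestamp-shaped
// tokens such as 0:05, 12:34, or 1:02:03, keeping other numbers intact.
function stripTimestampsOnly(raw) {
  return raw
    .replace(/\b\d{1,2}(?::\d{2}){1,2}\b/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const sample = "0:00\n  We tested 3 models\n0:07\n  in 2024\n";
console.log(cleanTranscriptText(sample)); // → We tested models in
console.log(stripTimestampsOnly(sample)); // → We tested 3 models in 2024
```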

Step 6: Error Handling 🛡️

If the “…more” element fails or is missing, we proceed to “Show transcript.” If that fails, we still try to grab the text. This ensures we get as far as possible, even on quirky pages.
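That best-effort behavior can be expressed as a tiny runner that logs a failing step and moves on. This is a sketch with hypothetical names (`runBestEffort` and the `name`/`run` step shape); it returns whatever the last step produced, or `null` if that step failed.

```javascript
// Hypothetical runner: execute steps in order, log any failure, and keep
// going, so a missing "...more" button never prevents the text grab.
// Resolves with the last step's result, or null if that step threw.
async function runBestEffort(steps, log = console) {
  let last = null;
  for (const { name, run } of steps) {
    try {
      last = await run();
    } catch (err) {
      log.warn(`Step "${name}" failed; continuing:`, err);
      last = null;
    }
  }
  return last;
}
```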

The Code

Below is the full script. It’s an Immediately Invoked Function Expression (IIFE), so it runs as soon as you paste it into your browser’s console and press Enter. Just open a YouTube video, paste this in the console, and after about four seconds the cleaned transcript will be copied to your clipboard, prefixed with a short “please summarize” prompt so it’s ready to paste into an AI tool.

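(function processYouTubeTranscript() {
  // Timeout durations in milliseconds
  const MORE_CLICK_TIMEOUT = 3000; // Wait after "...more" click
  const TRANSCRIPT_CLICK_TIMEOUT = 1000; // Wait after "Show transcript" click

  // Accept both three ASCII dots and the single ellipsis character
  const MORE_LABELS = ["...more", "\u2026more"];

  // Reusable function to clean text from a container: strip newlines,
  // digits, and colons (timestamps), then collapse runs of whitespace.
  // Matches become a space so words split by a line break stay separate.
  function cleanTextFromContainer(container) {
    if (!container) return "";
    return container.textContent
      .replace(/[\n\r0-9:]+/g, " ")
      .replace(/\s+/g, " ")
      .trim();
  }

  // Click an element if it exists; log the outcome instead of throwing
  function safeClick(element, label) {
    if (!element) {
      console.warn(`"${label}" not found`);
      return;
    }
    try {
      element.click();
      console.log(`Clicked "${label}"`);
    } catch (err) {
      console.warn(`Clicking "${label}" failed:`, err);
    }
  }

  // Step 1: click the element whose text is "...more", if it exists
  const moreElement = Array.from(document.querySelectorAll("*")).find(
    (el) => el instanceof HTMLElement && MORE_LABELS.includes(el.textContent.trim())
  );
  safeClick(moreElement, "...more");

  // Steps 2-3: wait, then click the element with aria-label="Show transcript"
  setTimeout(() => {
    safeClick(document.querySelector('[aria-label="Show transcript"]'), "Show transcript");

    // Steps 4-5: wait, then extract, clean, and copy the text
    setTimeout(() => {
      const container = document.getElementById("segments-container");
      const text = cleanTextFromContainer(container);
      if (!text) {
        console.error("No text in segments-container");
        return;
      }
      const aiPrompt = `Please summarize the following YouTube Transcript: ${text}`;
      // Promise.resolve() turns a missing navigator.clipboard into a rejection too
      Promise.resolve()
        .then(() => navigator.clipboard.writeText(aiPrompt))
        .then(() => console.log("Transcript copied to clipboard"))
        .catch((err) => {
          // Clipboard access was refused: print the text for manual copying
          console.warn("Clipboard copy failed; copy the text below manually.", err);
          console.log(aiPrompt);
        });
    }, TRANSCRIPT_CLICK_TIMEOUT);
  }, MORE_CLICK_TIMEOUT);
})();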

Why It’s Solid

This script is built to handle YouTube’s quirks. It doesn’t care if the “…more” or “Show transcript” elements are buttons, links, or random divs; it will find and click them. If either is missing, it keeps going, so you still get the transcript if it’s already visible. The try-catch blocks catch any weird errors (like a click failing due to page restrictions), and the delays (3 seconds, then 1 second) usually give YouTube enough time to load the transcript, though a slow connection may need longer ones.

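The regex cleanup makes the text AI-ready by ditching timestamps and formatting junk, and you can paste the result straight into ChatGPT or Grok for a clean summary. Keep in mind that it removes every digit, so numbers spoken in the video disappear along with the timestamps. If the clipboard copy fails (for example, because the browser refuses clipboard access while focus is in DevTools rather than on the page), you’ll still see the text in the console to grab manually.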

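Tips for Use

Open the video page and let it finish loading, then open DevTools (F12) and switch to the Console tab, or jump straight there with Ctrl+Shift+J (Cmd+Option+J on a Mac) in Chrome. Paste the script and press Enter. Recent versions of Chrome may ask you to type “allow pasting” before the console accepts pasted code. The whole run takes about four seconds (the 3-second and 1-second waits). Clipboard access is usually only granted while the page itself has focus, so if the copy is refused, try clicking an empty spot on the page during that wait, or just copy the text from the console. And if either element isn’t found, open the description and the transcript panel by hand and run the script again; it will still grab whatever is in segments-container.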

What’s Next?

This script is a great starting point, but you could level it up. If YouTube’s loading times vary, you could replace setTimeout with a polling loop to wait for segments-container to update. Or, if you’re summarizing a bunch of videos, you could turn this into a browser extension for one-click scraping. Got ideas for more features? Drop a comment, and let’s keep the conversation going!
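For the polling idea, a minimal sketch might look like this. `waitFor` and its `intervalMs`/`timeoutMs` options are hypothetical, and the 250 ms interval and 10-second timeout are arbitrary defaults, not values tested against YouTube. The timeout matters: without it, a video that has no transcript would keep the loop running forever.

```javascript
// Hypothetical polling helper: call `check` every `intervalMs` until it
// returns a truthy value, then resolve with that value. Rejects after
// `timeoutMs`, or immediately if `check` itself throws.
function waitFor(check, { intervalMs = 250, timeoutMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      let value;
      try {
        value = check();
      } catch (err) {
        reject(err);
        return;
      }
      if (value) {
        resolve(value);
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error(`Timed out after ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Browser usage (not run here): wait until segments-container has text
// instead of sleeping for a fixed second.
// const container = await waitFor(() => {
//   const el = document.getElementById("segments-container");
//   return el && el.textContent.trim() ? el : null;
// });
```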

Thanks for reading, and happy scraping!