The ask

I’ve been curating a TikTok collection of AI and Claude Code content. Tips, tutorials, and sharp takes from creators across the space. 123 videos. I wanted them all transcribed, summarized, and filed into a structured knowledge base I could search and build on.

Simple enough. Download audio, run Whisper, summarize with an LLM, organize by topic.

What followed was an hour of real infrastructure problem-solving. Not the kind you plan in advance. The kind that happens when you commit to “all 123” and figure it out as you go.

Starting point

The starting point was a fork of grandamenium/short-form-video-transcriber, a Python project for scraping TikTok metadata (via yt-dlp), downloading audio, running Whisper, and organizing output. It worked for single videos through Claude Code’s slash commands. But 123 videos needed something more automated.

The collection URL went into yt-dlp’s --flat-playlist mode to extract all video IDs. Cross-referenced against existing transcripts. 123 new, zero overlap. This collection contained entirely different creators from what I’d previously scraped.
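That extraction and cross-reference step is small enough to sketch. A rough version, assuming transcripts are saved as transcripts/{video_id}.txt as described below (function names are illustrative, not the project's actual API):

```python
import subprocess
from pathlib import Path

def list_collection_ids(collection_url):
    """Ask yt-dlp for every video ID in the collection, no downloads."""
    out = subprocess.run(
        ["yt-dlp", "--flat-playlist", "--print", "id", collection_url],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]

def filter_new(collection_ids, transcripts_dir="transcripts"):
    """Drop any ID that already has a transcript on disk."""
    existing = {p.stem for p in Path(transcripts_dir).glob("*.txt")}
    return [vid for vid in collection_ids if vid not in existing]
```

Flat-playlist mode skips per-video metadata resolution, so listing a 123-video collection takes seconds rather than minutes.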

Attempt 1: Anthropic API

The pipeline’s summarizer used the Claude API. First problem: no API key in the project. Found it in the Anthropic Console, added it to my dotfiles and GitHub secrets.

Wrote a batch processing script that kept Whisper loaded in memory (avoiding model reload per video), processed sequentially, and committed results every 10 videos for checkpointing:

def main():
    # Load Whisper once — reloading the model per video is the slow part
    whisper_model = whisper.load_model(args.whisper_model)

    total = len(new_videos)
    batch_count = 0
    for i, video in enumerate(new_videos):
        audio_path = download_audio(video["url"], video["id"])
        transcript = transcribe_audio(whisper_model, audio_path)
        save_transcript(video, transcript)

        raw = llm_generate(args.llm, video, transcript)
        topic, filename, title, summary_body = parse_summary(raw, video)
        save_summary(video, topic, filename, title, summary_body, transcript)

        # Checkpoint every N videos so a crash loses at most one batch
        batch_count += 1
        if batch_count >= args.commit_every:
            git_commit_and_push(f"Batch: {i + 1}/{total}")
            batch_count = 0

Set up a GitHub Actions workflow to run it unattended. Then the credits ran out.

Pivot: local LLM

I have a home server running Ollama with three models: qwen2.5:7b, llama3.1:8b, and mistral. Same machine already runs n8n and other self-hosted infrastructure.

Added Ollama as an LLM backend. One function, switchable via CLI flag:

def llm_generate(llm_backend, video, transcript):
    # host, model, and user_content come from config and the prompt builder
    if llm_backend == "ollama":
        resp = requests.post(f"{host}/api/chat", json={
            "model": model,
            "stream": False,
            "messages": [
                {"role": "system", "content": SUMMARIZE_PROMPT},
                {"role": "user", "content": user_content},
            ],
        }, timeout=120)
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    elif llm_backend == "anthropic":
        # Claude API path
        ...

--llm ollama|anthropic|none. Three backends, same output format.
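The flag itself is ordinary argparse. A minimal sketch of what the CLI might look like — the defaults here are my guesses, not the project's:

```python
import argparse

def build_parser():
    """CLI for the batch script: backend, Whisper size, checkpoint interval."""
    parser = argparse.ArgumentParser(
        description="Batch transcribe and summarize a video collection")
    parser.add_argument("--llm", choices=["ollama", "anthropic", "none"],
                        default="ollama",
                        help="Summarizer backend; 'none' skips summaries")
    parser.add_argument("--whisper-model", default="base",
                        help="Whisper model size to load once at startup")
    parser.add_argument("--commit-every", type=int, default=10,
                        help="git commit and push after this many videos")
    return parser
```

The choices list means a typo'd backend fails at parse time instead of halfway through a 123-video run.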

The GitHub Actions saga

This is where the real infrastructure work happened.

Problem 1: Workflow dispatch returned HTTP 500. The repo had been transferred from a personal account to an organization. GitHub’s internal state was broken. Every dispatch attempt returned a server error. New workflow ID, renamed file, different inputs. Nothing worked. The fix was deleting the repo and recreating it fresh. Secrets re-added, code re-pushed, dispatch worked immediately.

Problem 2: pip install refused to run on Ubuntu 24.04. The cause was Python’s PEP 668 externally-managed-environment restriction: the runner’s system Python won’t allow pip install without --break-system-packages. Fixed by moving everything into a venv in the workflow.

Problem 3: Slow startup. Every run spent 5+ minutes installing PyTorch, Whisper, and dependencies. That’s when containers became the obvious answer.

Containerizing

A minimal Dockerfile:

FROM python:3.12-slim

RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg git && \
    rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir openai-whisper yt-dlp requests anthropic \
    pydantic pydantic-settings tenacity

# Pre-download whisper base model
RUN python -c "import whisper; whisper.load_model('base')"

WORKDIR /app

CPU-only PyTorch (the server doesn’t use GPU for Whisper), all Python deps, and the Whisper model baked in. Built on the self-hosted runner, pushed to GitHub Container Registry.

The workflow dropped from 5+ minutes of setup to seconds.

Problem 4: Docker networking. --network host in the container options conflicted with GitHub Actions’ auto-created bridge network. Removed it and pointed Ollama at Docker’s bridge gateway (172.17.0.1) instead of localhost. Two-line fix.
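A small helper captures that two-line fix: default to the bridge gateway on Ollama's standard port (11434), but let an environment variable override it for runs outside a container. The variable name here is illustrative:

```python
import os

def ollama_host():
    """Resolve the Ollama base URL for the current environment.

    Inside a GitHub Actions job container, 'localhost' is the container
    itself, not the machine running Ollama, so the default points at
    Docker's bridge gateway. An explicit OLLAMA_HOST always wins.
    """
    return os.environ.get("OLLAMA_HOST", "http://172.17.0.1:11434")
```

Note that 172.17.0.1 is only the gateway of Docker's default bridge; custom networks get different subnets, which is part of why --network host conflicted in the first place.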

Self-hosted runners

Installed a GitHub Actions runner on the Ollama server. Then added four more. One loop, registration tokens from the API, systemd services:

for i in 1 2 3 4; do
  TOKEN=$(gh api orgs/myorg/actions/runners/registration-token \
    -X POST --jq '.token')
  ssh server "cd ~/actions-runner-$i && \
    ./config.sh --unattended --url https://github.com/myorg \
      --token $TOKEN --name runner-$i --replace && \
    sudo ./svc.sh install && sudo ./svc.sh start"
done

Five runners, all on the same machine, all with access to Ollama at localhost. Ready for parallel workloads.
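The pipeline doesn't shard yet, but with five runners the natural fan-out is one contiguous slice of the video list per runner, fed to a workflow matrix. A sketch of the split:

```python
def shard(videos, num_runners):
    """Split a list into roughly equal contiguous shards, one per runner.

    Earlier shards get the remainder, so sizes differ by at most one.
    """
    base, extra = divmod(len(videos), num_runners)
    shards, start = [], 0
    for i in range(num_runners):
        size = base + (1 if i < extra else 0)
        shards.append(videos[start:start + size])
        start += size
    return shards
```

Contiguous slices keep each runner's git checkpoints from interleaving with its neighbours' as badly as a round-robin split would.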

The output

Each video produces two files:

Transcript (transcripts/{video_id}.txt):

Video ID: 7603802867664751902
Title: New /insights command tells you exactly how to improve...
URL: https://www.tiktok.com/@deonnahodges/video/7603802867664751902
Duration: 45s

--- TRANSCRIPT ---

Here's how the new insights command works...

Summary (summaries/{topic}/{descriptive-name}.md):

---
video_id: 7603802867664751902
title: Claude Code Insights Command for Code Improvement
url: https://www.tiktok.com/@deonnahodges/video/...
topic: claude-code-tips
---

Followed by structured sections: Summary, Key Tips, Details, Full Transcript. All auto-categorized by topic, auto-named with descriptive slugs.
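The auto-naming and frontmatter are simple to sketch. These helpers are illustrative, not the project's exact code — the real summarizer derives title and topic from the LLM output:

```python
import re

def slugify(title):
    """Turn a summary title into a descriptive filename slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def frontmatter(video_id, title, url, topic):
    """Render the YAML frontmatter block that heads each summary file."""
    return (
        "---\n"
        f"video_id: {video_id}\n"
        f"title: {title}\n"
        f"url: {url}\n"
        f"topic: {topic}\n"
        "---\n"
    )
```

The topic doubles as the directory name, which is what makes the knowledge base browsable without any index.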

The architecture

Your machine (Claude Code)
    │
    ├── Triggers workflow via gh workflow run
    │
    └── GitHub Actions (self-hosted runner on home server)
            │
            ├── Docker container (pre-baked Whisper + deps)
            │     ├── yt-dlp → downloads audio
            │     ├── Whisper → transcribes
            │     └── Ollama (via Docker bridge gateway) → summarizes
            │
            └── git commit + push every 10 videos

Total cost: zero. Whisper is open-source. Ollama runs locally. GitHub Actions is free for self-hosted runners. The server was already running.

The meta-point

None of this was planned. The Anthropic credits running out forced the Ollama pivot. The repo transfer bug forced the recreation. Docker networking forced the bridge gateway workaround. Each failure was diagnosed and fixed in about two minutes.

This is what infrastructure automation actually looks like in practice. Not a clean architecture diagram designed upfront. A series of practical decisions made under real constraints. No API credits, broken GitHub state, Docker quirks, Python packaging politics.

The tools matter less than the pattern: keep the feedback loop tight, make failures cheap, commit progress early, and swap components when one breaks.

Claude Code wrote the scripts, the workflows, the Dockerfile. I made the decisions about architecture, where to pivot, and when to stop fighting a broken tool and recreate it. That division of labour (AI handles the implementation, human handles the judgment) is the one that actually scales.

123 videos. One session. Processing at roughly one per minute on a machine in my apartment.


Oh, and then Claude wrote this and posted it for me.