Status: Implemented 2026-02-22

Medium digest articles are behind a paywall. The Medium Digest utility fetched them with httpx and no authentication, so requests returned truncated, paywalled content instead of full articles; even with a Medium subscription available, the fetcher had no way to use it.

Use Playwright with a saved storage state (cookies + local storage) from an authenticated Medium session. Playwright loads pages as if the user were logged in, so paywalled content is accessible.
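A minimal sketch of such an authenticated fetch (the real helper is `_fetch_html_playwright()`; the function name here is illustrative, and Playwright plus its Chromium build are assumed to be installed):

```python
def fetch_html_authenticated(url: str, storage_state_path: str) -> str:
    # Imported lazily so this module still loads when Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # storage_state restores the saved cookies + local storage, so Medium
        # treats the headless browser as the logged-in user.
        context = browser.new_context(storage_state=storage_state_path)
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
    return html
```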

  1. Run uv run medium-digest capture-session
  2. Browser opens; log into Medium (if not already)
  3. Visit a paywalled article to confirm access
  4. Return to terminal, press Enter to save the session
  5. Edit config/config.yaml: set medium_storage_state_path: medium_session.json
  6. Normal runs use this session for authenticated fetches

When the session expires (usually after days/weeks), re-run capture-session.
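The capture flow in steps 1–4 can be sketched as follows (the real module is capture_session.py; the exact prompt text and landing URL are assumptions):

```python
def capture_session(output_path: str = "medium_session.json") -> str:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # headed, so you can log in
        context = browser.new_context()
        context.new_page().goto("https://medium.com")
        input("Log into Medium, open a paywalled article, then press Enter... ")
        context.storage_state(path=output_path)  # cookies + local storage -> JSON
        browser.close()
    return output_path
```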

Install Chromium for Playwright (one-time):

```sh
uv run playwright install chromium
```
Changes in source_summarizer:

  • Added _fetch_html_playwright() for authenticated fetches via stored session
  • _fetch_html() now accepts optional storage_state_path; uses Playwright when set
  • On Playwright failure (e.g. expired session), falls back to httpx with a warning
  • summarize_url() accepts storage_state_path parameter
  • CLI: added --storage-state for testing single URLs with auth
  • Dependency: playwright>=1.40.0
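The fallback behavior can be sketched with injectable fetchers (the real code calls _fetch_html_playwright and httpx directly; the indirection here is only so the logic is self-contained):

```python
import logging

def fetch_with_fallback(url, storage_state_path, playwright_fetch, httpx_fetch):
    """Try the authenticated Playwright path first; fall back to plain httpx."""
    if storage_state_path:
        try:
            return playwright_fetch(url, storage_state_path)
        except Exception as exc:  # e.g. expired session, browser missing
            logging.warning("Playwright fetch failed (%s); falling back to httpx", exc)
    return httpx_fetch(url)
```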
Changes in medium_digest:

  • New command: medium-digest capture-session — launches headed browser, user logs in, saves session on Enter
  • New module: capture_session.py
  • Orchestrator passes storage_state_path from config to summarize_url
  • Dependency: playwright
Config and docs changes:

  • config.yaml: added medium_storage_state_path (e.g. medium_session.json)
  • config.py: resolves medium_storage_state_path relative to config dir
  • .gitignore: ignores medium_session.json and *_session.json
  • README: paywall setup, capture-session usage, playwright install chromium

Testing: YouTube (Nate B Jones AI insights)


For testing purposes, include YouTube as a source — specifically Nate B Jones for his AI insights. This would extend the source_summarizer to handle YouTube URLs (transcript extraction) and/or the medium_digest orchestrator to pull from additional feeds beyond the Medium digest.

Status: Implemented — see Upgrade v2 below.

Fallback behavior:

  • If medium_storage_state_path is unset or file missing: uses httpx (original behavior)
  • Session will eventually expire; re-run capture-session when needed

Digest Upgrade v2 — Filters, Dedup, Signal Rubric, YouTube


Status: Implemented 2026-02-22

```
Gmail digest → extract links (medium) ──┐
YouTube channels → fetch recent videos ─┼─ Merge → Pre-filter → Dedup → Summarize → Sort → Write Obsidian
URL override ───────────────────────────┘
```

New: medium_digest/src/medium_digest/filters.py

Rules applied before any LLM calls (saves time and cost):

  • domain_whitelist — if non-empty, only allow URLs from listed domains
  • url_blocklist_patterns — skip URLs matching any pattern (substring)
  • min_title_length — skip titles shorter than N characters

Configured in config.yaml under filters:. Defaults block newsletter links, membership pages, and tag/topic index pages.
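An illustrative pre-filter matching those three rules (function and parameter names are assumptions; the real implementation lives in filters.py):

```python
from urllib.parse import urlparse

def passes_filters(url: str, title: str,
                   domain_whitelist: list[str],
                   url_blocklist_patterns: list[str],
                   min_title_length: int) -> bool:
    host = urlparse(url).netloc.lower()
    # Whitelist applies only when non-empty; allow subdomains of listed domains.
    if domain_whitelist and not any(host == d or host.endswith("." + d)
                                    for d in domain_whitelist):
        return False
    # Blocklist patterns are plain substring matches, per the config docs.
    if any(pat in url for pat in url_blocklist_patterns):
        return False
    if len(title.strip()) < min_title_length:
        return False
    return True
```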

New: medium_digest/src/medium_digest/dedup.py

  • Persists {url: "YYYY-MM-DD"} in medium_digest/data/processed_urls.json
  • Auto-created on first run; gitignored
  • Before each run, URLs already in the store are skipped
  • After writing output, newly processed URLs are added to the store

Configured via dedup_store_path in config.yaml (optional; default path used if null).
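A minimal sketch of the store's lifecycle (load, mark, persist), assuming the `{url: "YYYY-MM-DD"}` JSON shape described above; helper names are illustrative:

```python
import json
from datetime import date
from pathlib import Path

def load_store(path: Path) -> dict[str, str]:
    # Missing file means first run: start empty, auto-created on first save.
    if path.exists():
        return json.loads(path.read_text())
    return {}

def mark_processed(store: dict[str, str], urls: list[str]) -> None:
    today = date.today().isoformat()
    for url in urls:
        store[url] = today  # value records when the URL was processed

def save_store(store: dict[str, str], path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(store, indent=2))
```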

Updated: medium_digest/config/business_context.md — added Signal Rubric section with four weighted criteria:

| Criterion | Weight | Description |
| --- | --- | --- |
| mission_alignment | 40% | Directly advances $MART DEBT / client-first leverage / advisor education |
| actionability | 25% | Can be applied to Strategy/Dev/Marketing this week |
| specificity | 20% | Real data, metrics, named case studies |
| novelty | 15% | Non-obvious, challenges conventional thinking |

Updated LLM prompt in source_summarizer/src/source_summarizer/core.py:

  • Asks for rubric sub-scores alongside business_relevance_score
  • ArticleResult dataclass gains rubric: dict field
  • Markdown output shows: - **Signal:** mission 8 · action 7 · specific 6 · novel 5
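If you want to collapse the four sub-scores (0–10) into one weighted number, the rubric weights combine as below. Whether core.py actually computes a combined score is not stated above, so treat this as a sketch:

```python
# Weights from the Signal Rubric table (must sum to 1.0).
WEIGHTS = {"mission_alignment": 0.40, "actionability": 0.25,
           "specificity": 0.20, "novelty": 0.15}

def weighted_signal(rubric: dict[str, float]) -> float:
    """Weighted average of rubric sub-scores; missing criteria count as 0."""
    return round(sum(WEIGHTS[k] * rubric.get(k, 0) for k in WEIGHTS), 2)
```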

New: source_summarizer/src/source_summarizer/youtube_fetcher.py

  • is_youtube_url(url) — detects YouTube URLs
  • extract_video_id(url) — handles watch, youtu.be, shorts, embed formats
  • fetch_youtube_transcript(url) — uses youtube-transcript-api (no API key); returns (title, transcript_text); prefers English, falls back to auto-generated

New: medium_digest/src/medium_digest/youtube_source.py

  • fetch_channel_videos(channel_id, max_videos) — fetches YouTube channel RSS feed (no API key)
  • fetch_all_youtube_sources(cfg) — iterates youtube_sources in config
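The no-API-key trick is that YouTube publishes an Atom feed per channel at a fixed URL. A sketch using only the stdlib (the real module may parse differently; the returned dict shape is an assumption):

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id={cid}"
NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_feed(xml_text: str, max_videos: int) -> list[dict]:
    root = ET.fromstring(xml_text)
    return [{"title": e.findtext("atom:title", namespaces=NS),
             "url": e.find("atom:link", NS).attrib["href"]}
            for e in root.findall("atom:entry", NS)[:max_videos]]

def fetch_channel_videos(channel_id: str, max_videos: int = 3) -> list[dict]:
    with urllib.request.urlopen(FEED_URL.format(cid=channel_id)) as resp:
        return parse_feed(resp.read().decode("utf-8"), max_videos)
```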

Updated: source_summarizer/src/source_summarizer/core.py

  • summarize_url() now detects YouTube URLs and routes to transcript fetcher instead of httpx/Playwright
  • source parameter added; shown in each article’s markdown section as - **Source:** ...
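The routing decision itself is a simple branch on the URL's host; a self-contained sketch (is_youtube_url mirrors the function named above, but this exact implementation is an assumption):

```python
def is_youtube_url(url: str) -> bool:
    # Covers watch pages, youtu.be short links, shorts, and embeds.
    host = url.split("/")[2] if "://" in url else ""
    return host.endswith("youtube.com") or host == "youtu.be"

def route(url: str) -> str:
    # summarize_url() branches like this before any fetching happens.
    return "transcript" if is_youtube_url(url) else "html"
```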

Config addition in config.yaml:

```yaml
youtube_sources:
  - name: "Nate B Jones"
    channel_id: "UCF0pVplsI8R5kcAqgtoRqoA"  # @NateBJones
    max_videos: 3
    enabled: true
```
Output changes in the Obsidian digest note:

  • H1 heading: # Daily Digest – YYYY-MM-DD (was “Medium Daily Digest”)
  • Each article section now shows - **Source:** medium or - **Source:** YouTube – Nate B Jones
  • Signal rubric line added when available
| File | Change |
| --- | --- |
| medium_digest/src/medium_digest/filters.py | New — pre-filter rules |
| medium_digest/src/medium_digest/dedup.py | New — URL deduplication store |
| medium_digest/src/medium_digest/youtube_source.py | New — YouTube channel RSS fetcher |
| source_summarizer/src/source_summarizer/youtube_fetcher.py | New — YouTube transcript fetcher |
| medium_digest/config/config.yaml | Added filters, dedup_store_path, youtube_sources |
| medium_digest/config/business_context.md | Added Signal Rubric section |
| medium_digest/src/medium_digest/orchestrator.py | Full rewrite — multi-source, filters, dedup |
| source_summarizer/src/source_summarizer/core.py | Rubric in prompt/output, YouTube routing, source field |
| source_summarizer/pyproject.toml | Added youtube-transcript-api |
| .gitignore | Ignore medium_digest/data/ |

To add another YouTube channel, you need its channel ID:

  1. Open the channel page in a browser
  2. View Page Source (Ctrl+U)
  3. Search for "channelId" — copy the value
  4. Add to youtube_sources in config.yaml

Once this is robust in testing, schedule a daily run of medium-digest (via cron or Windows Task Scheduler) for shortly after the digest email usually arrives.
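For example, a crontab entry (repo path, run time, and log path are placeholders):

```
# Run the digest daily at 08:30, assuming the email lands by 08:00
30 8 * * * cd /path/to/repo && uv run medium-digest >> ~/medium-digest.log 2>&1
```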