Next: Medium Digest Paywall Support
Status: Implemented 2026-02-22
Problem
Medium digest articles are behind a paywall. The Medium Digest utility fetched them with httpx and no authentication, so requests returned truncated, paywalled content instead of full articles. Even with a Medium subscription available, the fetcher had no way to use it.
Solution: Playwright + Stored Session
Use Playwright with a saved storage state (cookies + local storage) from an authenticated Medium session. Playwright loads pages as if the user were logged in, so paywalled content is accessible.
User Workflow (One-Time Setup)
- Run `uv run medium-digest capture-session`
- Browser opens; log into Medium (if not already logged in)
- Visit a paywalled article to confirm access
- Return to the terminal and press Enter to save the session
- Edit `config/config.yaml`: set `medium_storage_state_path: medium_session.json`
- Normal runs use this session for authenticated fetches
When the session expires (usually after days/weeks), re-run capture-session.
Prerequisites
Install Chromium for Playwright (one-time):

    uv run playwright install chromium

Implementation Summary
source_summarizer
- Added `_fetch_html_playwright()` for authenticated fetches via stored session
- `_fetch_html()` now accepts optional `storage_state_path`; uses Playwright when set
- On Playwright failure (e.g. expired session), falls back to httpx with a warning
- `summarize_url()` accepts `storage_state_path` parameter
- CLI: added `--storage-state` for testing single URLs with auth
- Dependency: `playwright>=1.40.0`
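The fetch-with-fallback behavior listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function bodies, the injectable `authenticated_fetch` / `plain_fetch` parameters, the `wait_until` setting, and the timeout are all assumptions made for illustration. Playwright and httpx are imported lazily so the module can be loaded without either installed.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fetch_html_playwright(url: str, storage_state_path: str) -> str:
    """Load a page in headless Chromium using a saved storage state."""
    # Imported lazily so httpx-only setups do not need Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # The storage state restores the cookies and local storage
            # captured from the logged-in session.
            context = browser.new_context(storage_state=storage_state_path)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()


def fetch_html_httpx(url: str) -> str:
    """Unauthenticated fetch (the original behavior)."""
    import httpx

    response = httpx.get(url, follow_redirects=True, timeout=30.0)
    response.raise_for_status()
    return response.text


def fetch_html(
    url: str,
    storage_state_path: Optional[str] = None,
    authenticated_fetch: Callable[[str, str], str] = fetch_html_playwright,
    plain_fetch: Callable[[str], str] = fetch_html_httpx,
) -> str:
    """Use the stored session when the file exists; otherwise use httpx."""
    if storage_state_path and Path(storage_state_path).is_file():
        try:
            return authenticated_fetch(url, storage_state_path)
        except Exception as exc:  # e.g. browser not installed, navigation timeout
            logger.warning("Playwright fetch failed (%s); falling back to httpx", exc)
    return plain_fetch(url)
```

Passing the two fetchers as parameters is only a convenience for testing the fallback path without launching a browser.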
medium_digest
- New command: `medium-digest capture-session` (launches a headed browser, the user logs in, and the session is saved on Enter)
- New module: `capture_session.py`
- Orchestrator passes `storage_state_path` from config to `summarize_url`
- Dependency: `playwright`
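For orientation, a session-capture command of this kind can be sketched with Playwright's synchronous API as below. The function name, its parameters, the start URL, and the prompt text are hypothetical; only the overall flow (headed browser, manual login, save on Enter) comes from the description above.

```python
from pathlib import Path


def capture_session(output_path: Path, start_url: str = "https://medium.com/") -> Path:
    """Open a headed browser, wait for the user to log in, then save the session."""
    # Imported lazily so importing this module does not require Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Headed mode: the user has to interact with the login page.
        browser = p.chromium.launch(headless=False)
        try:
            context = browser.new_context()
            page = context.new_page()
            page.goto(start_url)
            input("Log into Medium, open a paywalled article, then press Enter here... ")
            # Writes the context's cookies and local storage to a JSON file.
            context.storage_state(path=str(output_path))
        finally:
            browser.close()
    return output_path
```

The saved file contains live login cookies, which is why the session file is kept out of version control.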
Config and security
- `config.yaml`: added `medium_storage_state_path` (e.g. `medium_session.json`)
- `config.py`: resolves `medium_storage_state_path` relative to config dir
- `.gitignore`: ignores `medium_session.json` and `*_session.json`
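Resolving the configured path relative to the config directory might be done along these lines. The helper name `resolve_storage_state_path` is hypothetical, and treating an empty or null value as "no session configured" is an assumption consistent with the fallback behavior described elsewhere in this document.

```python
from pathlib import Path
from typing import Optional


def resolve_storage_state_path(raw_value: Optional[str], config_dir: Path) -> Optional[Path]:
    """Return an absolute-or-config-relative path, or None when unset."""
    if not raw_value:
        return None  # unset or null: callers fall back to httpx
    path = Path(raw_value).expanduser()
    if not path.is_absolute():
        # Relative values are interpreted relative to the config directory,
        # not the current working directory.
        path = config_dir / path
    return path
```

With this approach `medium_session.json` in `config.yaml` refers to a file next to the config file, regardless of where the command is run from.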
Documentation
- README: paywall setup, `capture-session` usage, `playwright install chromium`
Testing: YouTube (Nate B Jones AI insights)
For testing purposes, include YouTube as a source, specifically Nate B Jones for his AI insights. This would extend the source_summarizer to handle YouTube URLs (transcript extraction) and/or the medium_digest orchestrator to pull from additional feeds beyond the Medium digest.
Status: Implemented — see Upgrade v2 below.
Fallback
- If `medium_storage_state_path` is unset or the file is missing: uses httpx (original behavior)
- The session will eventually expire; re-run `capture-session` when needed
Digest Upgrade v2 — Filters, Dedup, Signal Rubric, YouTube
Status: Implemented 2026-02-22
Architecture

    Gmail digest → extract links (medium)
    YouTube channels → fetch recent videos ─┐
                                            ├─ Merge → Pre-filter → Dedup → Summarize → Sort → Write Obsidian
    URL override                           ─┘

1. Pre-filter Rules and Whitelists
New: `medium_digest/src/medium_digest/filters.py`
Rules applied before any LLM calls (saves time and cost):
- `domain_whitelist`: if non-empty, only allow URLs from listed domains
- `url_blocklist_patterns`: skip URLs matching any pattern (substring)
- `min_title_length`: skip titles shorter than N characters

Configured in `config.yaml` under the `filters:` key. Defaults block newsletter links, membership pages, and tag/topic index pages.
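The three rules can be expressed as a single predicate. The sketch below is an illustration under assumptions: the function name `passes_filters` is hypothetical, subdomains of a whitelisted domain are treated as allowed, and the example patterns in the usage note are made up rather than taken from the project's defaults.

```python
from typing import Any, Mapping
from urllib.parse import urlparse


def passes_filters(url: str, title: str, filters: Mapping[str, Any]) -> bool:
    """Return True when a link survives all configured pre-filter rules."""
    whitelist = filters.get("domain_whitelist") or []
    blocklist = filters.get("url_blocklist_patterns") or []
    min_title_length = filters.get("min_title_length") or 0

    host = (urlparse(url).hostname or "").lower()
    if whitelist:
        # Assumption: a subdomain of a whitelisted domain is also allowed.
        allowed = any(
            host == domain.lower() or host.endswith("." + domain.lower())
            for domain in whitelist
        )
        if not allowed:
            return False

    # Blocklist patterns are plain substrings, not regular expressions.
    if any(pattern in url for pattern in blocklist):
        return False

    if len(title.strip()) < min_title_length:
        return False
    return True
```

For example, with `{"url_blocklist_patterns": ["/tag/"]}` a link to `https://medium.com/tag/ai` is dropped before any LLM call is made, while an empty `filters` mapping lets everything through.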
2. Cross-run Deduplication
New: `medium_digest/src/medium_digest/dedup.py`
- Persists `{url: "YYYY-MM-DD"}` in `medium_digest/data/processed_urls.json`
- Auto-created on first run; gitignored
- Before each run, URLs already in the store are skipped
- After writing output, newly processed URLs are added to the store

Configured via `dedup_store_path` in `config.yaml` (optional; default path used if null).
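A JSON-backed store with this shape can be sketched in a few functions. The names `load_store`, `skip_processed`, and `record_processed` are hypothetical, as is the optional `today` parameter (added here only to make the example deterministic).

```python
import json
from datetime import date
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def load_store(path: Path) -> Dict[str, str]:
    """Load the {url: "YYYY-MM-DD"} mapping; a missing file means an empty store."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def skip_processed(urls: Iterable[str], store: Dict[str, str]) -> List[str]:
    """Keep only URLs that no earlier run has recorded."""
    return [url for url in urls if url not in store]


def record_processed(
    path: Path,
    store: Dict[str, str],
    urls: Iterable[str],
    today: Optional[date] = None,
) -> None:
    """Stamp newly processed URLs with the run date and write the store to disk."""
    stamp = (today or date.today()).isoformat()
    for url in urls:
        store[url] = stamp
    # Creating the parent directory here is what makes the store
    # "auto-created on first run".
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(store, indent=2, sort_keys=True), encoding="utf-8")
```

Calling `record_processed` only after the output has been written means a run that fails midway does not mark its URLs as done.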
3. Explicit Signal Rubric
Updated: `medium_digest/config/business_context.md`: added a Signal Rubric section with four weighted criteria:
| Criterion | Weight | Description |
|---|---|---|
| mission_alignment | 40% | Directly advances $MART DEBT / client-first leverage / advisor education |
| actionability | 25% | Can be applied to Strategy/Dev/Marketing this week |
| specificity | 20% | Real data, metrics, named case studies |
| novelty | 15% | Non-obvious, challenges conventional thinking |
Updated LLM prompt in `source_summarizer/src/source_summarizer/core.py`:
- Asks for `rubric` sub-scores alongside `business_relevance_score`
- `ArticleResult` dataclass gains `rubric: dict` field
- Markdown output shows: `- **Signal:** mission 8 · action 7 · specific 6 · novel 5`
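To make the weights and the output line concrete, here is a small sketch that formats the Signal line and combines sub-scores using the table's weights. It assumes the `rubric` dict is keyed by the criterion names from the table and holds numeric scores; the function names are hypothetical. The source does not state whether a weighted total is computed in code or left to the LLM, so `weighted_signal` is purely illustrative.

```python
from typing import Mapping

# (rubric key, short label used in the markdown line, weight from the table)
RUBRIC = (
    ("mission_alignment", "mission", 0.40),
    ("actionability", "action", 0.25),
    ("specificity", "specific", 0.20),
    ("novelty", "novel", 0.15),
)


def weighted_signal(rubric: Mapping[str, float]) -> float:
    """Weighted sum of the sub-scores; missing criteria count as 0."""
    total = sum(weight * rubric.get(key, 0) for key, _, weight in RUBRIC)
    return round(total, 2)


def format_signal_line(rubric: Mapping[str, float]) -> str:
    """Render the markdown Signal line, or an empty string if no scores exist."""
    parts = [f"{label} {rubric[key]}" for key, label, _ in RUBRIC if key in rubric]
    if not parts:
        return ""
    return "- **Signal:** " + " · ".join(parts)
```

With the example scores from the output line (8, 7, 6, 5), the weighted sum works out to 0.40·8 + 0.25·7 + 0.20·6 + 0.15·5 = 6.9.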
4. YouTube Multi-source Support
New: `source_summarizer/src/source_summarizer/youtube_fetcher.py`
- `is_youtube_url(url)`: detects YouTube URLs
- `extract_video_id(url)`: handles watch, youtu.be, shorts, embed formats
- `fetch_youtube_transcript(url)`: uses `youtube-transcript-api` (no API key); returns `(title, transcript_text)`; prefers English, falls back to auto-generated
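The URL handling for the four formats can be sketched with the standard library alone. The host list and the 11-character video ID pattern below are assumptions for illustration, and the bodies are not the project's actual implementation; transcript fetching is left out because it depends on the installed `youtube-transcript-api` version.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
# Assumption: these hosts cover the URLs that appear in the digest.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    return (urlparse(url).hostname or "").lower() in _YOUTUBE_HOSTS


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch, youtu.be, shorts, and embed URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in _YOUTUBE_HOSTS:
        return None

    segments = [segment for segment in parsed.path.split("/") if segment]
    candidate = None
    if host == "youtu.be":
        # https://youtu.be/<id>
        candidate = segments[0] if segments else None
    elif parsed.path == "/watch":
        # https://www.youtube.com/watch?v=<id>
        candidate = parse_qs(parsed.query).get("v", [None])[0]
    elif len(segments) >= 2 and segments[0] in ("shorts", "embed"):
        # https://www.youtube.com/shorts/<id> or /embed/<id>
        candidate = segments[1]

    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```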
New: `medium_digest/src/medium_digest/youtube_source.py`
- `fetch_channel_videos(channel_id, max_videos)`: fetches the YouTube channel RSS feed (no API key)
- `fetch_all_youtube_sources(cfg)`: iterates `youtube_sources` in config
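YouTube publishes a per-channel feed in Atom format at `https://www.youtube.com/feeds/videos.xml?channel_id=...`, which needs no API key. The sketch below only builds that URL and parses feed text; the helper names and the returned dict shape are hypothetical, the download step is omitted, and it assumes the feed lists the newest videos first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# XML namespaces used by the channel feed.
ATOM = "{http://www.w3.org/2005/Atom}"
YT = "{http://www.youtube.com/xml/schemas/2015}"


def channel_feed_url(channel_id: str) -> str:
    return f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def parse_channel_feed(xml_text: str, max_videos: int) -> List[Dict[str, str]]:
    """Extract title, watch URL, and publish time for the first max_videos entries."""
    root = ET.fromstring(xml_text)
    videos = []
    for entry in root.findall(f"{ATOM}entry")[:max_videos]:
        video_id = entry.findtext(f"{YT}videoId", default="")
        videos.append(
            {
                "title": entry.findtext(f"{ATOM}title", default=""),
                "url": f"https://www.youtube.com/watch?v={video_id}",
                "published": entry.findtext(f"{ATOM}published", default=""),
            }
        )
    return videos
```

The resulting watch URLs can then go through the same merge, filter, and dedup steps as the Medium links.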
Updated: `source_summarizer/src/source_summarizer/core.py`
- `summarize_url()` now detects YouTube URLs and routes them to the transcript fetcher instead of httpx/Playwright
- `source` parameter added; shown in each article's markdown section as `- **Source:** ...`
Config addition in config.yaml:

    youtube_sources:
      - name: "Nate B Jones"
        channel_id: "UCF0pVplsI8R5kcAqgtoRqoA"  # @NateBJones
        max_videos: 3
        enabled: true

5. Output Changes
- H1 heading: `# Daily Digest – YYYY-MM-DD` (was “Medium Daily Digest”)
- Each article section now shows `- **Source:** medium` or `- **Source:** YouTube – Nate B Jones`
- Signal rubric line added when available
File Changes Summary
| File | Change |
|---|---|
| `medium_digest/src/medium_digest/filters.py` | New: pre-filter rules |
| `medium_digest/src/medium_digest/dedup.py` | New: URL deduplication store |
| `medium_digest/src/medium_digest/youtube_source.py` | New: YouTube channel RSS fetcher |
| `source_summarizer/src/source_summarizer/youtube_fetcher.py` | New: YouTube transcript fetcher |
| `medium_digest/config/config.yaml` | Added `filters`, `dedup_store_path`, `youtube_sources` |
| `medium_digest/config/business_context.md` | Added Signal Rubric section |
| `medium_digest/src/medium_digest/orchestrator.py` | Full rewrite: multi-source, filters, dedup |
| `source_summarizer/src/source_summarizer/core.py` | Rubric in prompt/output, YouTube routing, source field |
| `source_summarizer/pyproject.toml` | Added `youtube-transcript-api` |
| `.gitignore` | Ignore `medium_digest/data/` |
Dependency: Find Any YouTube Channel ID
To add another YouTube channel, you need its channel ID:
- Open the channel page in a browser
- View Page Source (Ctrl+U)
- Search for `"channelId"` and copy the value
- Add it to `youtube_sources` in `config.yaml`
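If the page source has been saved to a file, the search step can be scripted. This is a rough sketch under assumptions: the helper name `find_channel_id` is hypothetical, and the pattern assumes the ID appears as a JSON string of the form `UC` plus 22 characters, as in the configured example. A page may mention more than one channel, so the first match is worth checking by eye.

```python
import re
from typing import Optional

# Assumption: channel IDs look like "UC" followed by 22 URL-safe characters.
_CHANNEL_ID = re.compile(r'"channelId"\s*:\s*"(UC[A-Za-z0-9_-]{22})"')


def find_channel_id(page_source: str) -> Optional[str]:
    """Return the first channelId value found in the page source, if any."""
    match = _CHANNEL_ID.search(page_source)
    return match.group(1) if match else None
```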
Next: Scheduling
Once this is robust in testing, add a cron job or Windows Task Scheduler task to run `medium-digest` once per day, after the digest usually arrives.