MujRozhlas.cz audiostream downloader.
debian feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
.gitignore feat: Add initial Debian packaging files for the mujdownloader project and update gitignore. 2026-02-09 14:33:21 +01:00
browser_extract.js feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
download_radiobook.py feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
episode_api.json feat: Add initial Debian packaging files for the mujdownloader project and update gitignore. 2026-02-09 14:33:21 +01:00
extract_episodes.py feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
MANUAL_EXTRACTION.md feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
README.md feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00
requirements.txt Fix radiobook downloader: update selectors and expansion logic 2026-02-09 14:29:46 +01:00
setup.sh feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection. 2026-02-10 01:55:20 +01:00

MujRozhlas Radiobook Downloader

A Python script that downloads all episodes of a radiobook series from mujrozhlas.cz. It uses Playwright to load the page so that JavaScript-rendered content is available.

Prerequisites

1. Install Playwright

Check whether python3-playwright is available in the Debian repositories:

apt-cache search python3-playwright

If it is not available, either build a Debian package for it yourself or install Playwright via pip:

pip3 install playwright
playwright install firefox

2. Install Python dependencies

sudo apt install python3-requests

Usage

Basic usage:

./download_radiobook.py "https://www.mujrozhlas.cz/radiokniha/zbynek-fiser-egon-bondy-statni-bezpecnost-pohled-do-zakulisi-sledovani-donaseni-vydirani"

With custom output directory:

./download_radiobook.py "https://www.mujrozhlas.cz/radiokniha/..." episodes/

How It Works

  1. Scraping: Uses Playwright to load the page with full JavaScript execution
  2. Detection: Tries multiple strategies to find audio URLs:
    • Looks for <audio> elements
    • Intercepts network requests for MP3 files
    • Extracts JSON data from <script> tags
    • Simulates clicking play buttons to trigger audio loading
  3. Debug Output: Saves HTML and JSON data for manual inspection if auto-detection fails
  4. Download: Downloads all found episodes with progress tracking
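
Two of the detection strategies above (intercepting network requests and reading <audio> elements) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in download_radiobook.py: the names looks_like_audio and collect_audio_urls are hypothetical, and only the .mp3 extension is checked because that is the only format this README mentions.

```python
from urllib.parse import urlparse

# Assumption: only MP3 is matched, since that is the format named in this README.
AUDIO_EXTENSIONS = (".mp3",)


def looks_like_audio(url: str) -> bool:
    """Return True when the URL path (ignoring the query string) ends in .mp3."""
    return urlparse(url).path.lower().endswith(AUDIO_EXTENSIONS)


def collect_audio_urls(page_url: str, headless: bool = True) -> list[str]:
    """Load the page in Firefox and gather audio URLs from two sources."""
    # Imported lazily so looks_like_audio works without Playwright installed.
    from playwright.sync_api import sync_playwright

    found: list[str] = []

    def remember(url: str) -> None:
        # Keep each matching URL once, in the order it was first seen.
        if url and looks_like_audio(url) and url not in found:
            found.append(url)

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=headless)
        page = browser.new_page()
        # Strategy: inspect every network request the page issues.
        page.on("request", lambda request: remember(request.url))
        page.goto(page_url, wait_until="networkidle")
        # Strategy: read the src of <audio> elements and their <source> children.
        for src in page.eval_on_selector_all(
            "audio, audio source", "els => els.map(e => e.src)"
        ):
            remember(src)
        browser.close()
    return found
```

Calling collect_audio_urls with a series URL would return whatever MP3 URLs showed up during page load. Pages that only request audio after a play button is clicked would need the click simulation step as well.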

Expected Output

  • Episode MP3 files named as: 01_Episode_Title.mp3, 02_Next_Episode.mp3, etc.
  • Debug files:
    • page_debug.html - Full rendered HTML
    • script_data_*.json - Extracted JSON data from page
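
The naming pattern above can be illustrated with a small helper. This is a sketch of one way to produce such names; the function episode_filename and its sanitizing rules are assumptions for illustration and may differ from what the script actually does.

```python
import re


def episode_filename(number: int, title: str, width: int = 2) -> str:
    """Build a name such as 01_Episode_Title.mp3 from a number and a title."""
    # Drop characters other than letters, digits, underscores, spaces and hyphens.
    # \w is Unicode-aware in Python 3, so accented Czech letters are kept.
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    # Collapse each run of whitespace into one underscore.
    slug = re.sub(r"\s+", "_", cleaned) or "episode"
    # Zero-pad the episode number so files sort in playback order.
    return f"{number:0{width}d}_{slug}.mp3"


print(episode_filename(1, "Episode Title"))
print(episode_filename(2, "Next Episode"))
```

Zero-padding matters once a series has ten or more episodes, because otherwise 10_... would sort before 2_... in most file managers.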

Troubleshooting

Cloudflare Protection

If the script is blocked by Cloudflare, you may need to:

  • Run with a visible browser (set headless=False in the script)
  • Add delays between requests
  • Use a residential proxy
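
The three mitigations above might be combined roughly as shown here. It is a hedged sketch, not the script's own code: visit_politely, jittered_delay, the delay values, and the proxy_server parameter are all hypothetical, and none of this guarantees that Cloudflare will let the request through.

```python
import random
import time
from typing import Optional


def jittered_delay(base: float = 3.0, jitter: float = 2.0) -> float:
    """Return a wait time in seconds between base and base + jitter."""
    return base + random.uniform(0.0, jitter)


def visit_politely(
    urls: list[str],
    headless: bool = False,
    proxy_server: Optional[str] = None,
) -> dict[str, str]:
    """Open each URL in Firefox, pausing between loads; return page titles."""
    # Imported lazily so jittered_delay can be used without Playwright.
    from playwright.sync_api import sync_playwright

    launch_kwargs: dict = {"headless": headless}
    if proxy_server:
        # Playwright expects proxy settings as a dict with a "server" key.
        launch_kwargs["proxy"] = {"server": proxy_server}

    titles: dict[str, str] = {}
    with sync_playwright() as p:
        browser = p.firefox.launch(**launch_kwargs)
        page = browser.new_page()
        for url in urls:
            page.goto(url)
            titles[url] = page.title()
            # Random pause so requests do not arrive at a fixed rhythm.
            time.sleep(jittered_delay())
        browser.close()
    return titles
```

With headless=False a Firefox window opens, which also makes it possible to solve a challenge page by hand if one appears.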

No Episodes Found

Check the debug files in the downloads directory:

  • Inspect page_debug.html to see what was loaded
  • Check script_data_*.json for episode data structure
  • Manually extract audio URLs and modify the script
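
Searching the saved JSON by hand can be tedious, so a small helper that walks the data and lists anything resembling an MP3 URL may help. The sketch below makes no assumption about the JSON layout; the function names and the default "downloads" directory are illustrative.

```python
import json
from pathlib import Path


def find_mp3_urls(node) -> list[str]:
    """Recursively collect string values that look like MP3 URLs."""
    if isinstance(node, str):
        is_url = node.startswith(("http://", "https://"))
        return [node] if is_url and ".mp3" in node.lower() else []
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return []  # numbers, booleans and None carry no URLs
    urls: list[str] = []
    for child in children:
        urls.extend(find_mp3_urls(child))
    return urls


def scan_debug_dir(directory: str = "downloads") -> list[str]:
    """Scan every script_data_*.json file in the directory for MP3 URLs."""
    urls: list[str] = []
    for path in sorted(Path(directory).glob("script_data_*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # skip files that do not contain valid JSON
        for url in find_mp3_urls(data):
            if url not in urls:
                urls.append(url)
    return urls


if __name__ == "__main__":
    for url in scan_debug_dir():
        print(url)
```

Any URLs it prints can then be used for manual extraction.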

Manual Extraction

If automatic detection fails, you can add episodes by hand: modify the scrape_episodes() method so that it populates the self.episodes list directly, for example:

self.episodes = [
    {'url': 'https://...episode1.mp3', 'title': 'Episode 1', 'number': 1},
    {'url': 'https://...episode2.mp3', 'title': 'Episode 2', 'number': 2},
    # ...
]
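
Each entry carries the three fields a downloader needs: url, title, and number. As a rough illustration of how one such entry could be consumed, the sketch below streams it to disk with requests and prints progress. The functions download_episode and format_progress are hypothetical and are not taken from the script.

```python
def format_progress(done: int, total: int) -> str:
    """Render progress as a percentage, or as a byte count if size is unknown."""
    if total > 0:
        return f"{100 * done / total:5.1f}%"
    return f"{done} bytes"


def download_episode(episode: dict, target_path: str, chunk_size: int = 65536) -> int:
    """Stream one episode to disk and return the number of bytes written."""
    # Imported lazily; on Debian it is provided by python3-requests.
    import requests

    with requests.get(episode["url"], stream=True, timeout=30) as response:
        response.raise_for_status()
        # Content-Length may be missing, in which case total stays 0.
        total = int(response.headers.get("Content-Length", 0))
        done = 0
        with open(target_path, "wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                handle.write(chunk)
                done += len(chunk)
                print(f"\r{episode['title']}: {format_progress(done, total)}", end="")
    print()  # finish the progress line
    return done
```

Streaming in chunks keeps memory use flat regardless of episode length, which is why stream=True is used instead of reading response.content in one go.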

Target Series

Current target: Zbyněk Fišer / Egon Bondy a Státní bezpečnost

  • 15 episodes total
  • Released daily from January 21, 2026
  • ~25-27 minutes per episode