MujRozhlas.cz audiostream downloader.


Vítězslav Dvořák 32f22b6862 Add AppStream stock icon and install metainfo/icon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>		2026-05-05 22:06:37 +02:00
debian	Add AppStream stock icon and install metainfo/icon	2026-05-05 22:06:37 +02:00
.gitignore	feat: Add initial Debian packaging files for the mujdownloader project and update gitignore.	2026-02-09 14:33:21 +01:00
browser_extract.js	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00
download_radiobook.py	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00
episode_api.json	feat: Add initial Debian packaging files for the mujdownloader project and update gitignore.	2026-02-09 14:33:21 +01:00
extract_episodes.py	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00
MANUAL_EXTRACTION.md	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00
mujdownloader.svg	Add AppStream stock icon and install metainfo/icon	2026-05-05 22:06:37 +02:00
README.md	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00
requirements.txt	Fix radiobook downloader: update selectors and expansion logic	2026-02-09 14:29:46 +01:00
setup.sh	feat: Implement manual URL extraction methods and switch Playwright to Firefox due to Cloudflare bot protection.	2026-02-10 01:55:20 +01:00

README.md

MujRozhlas Radiobook Downloader

A Python script that downloads all episodes of a radiobook series from mujrozhlas.cz. It uses Playwright to handle JavaScript-rendered content.

Prerequisites

1. Install Playwright

Check if python3-playwright is available in Debian repos:

apt-cache search python3-playwright

If it is not available, either build a Debian package yourself or install Playwright via pip:

pip3 install playwright
playwright install firefox

2. Install Python dependencies

sudo apt install python3-requests

Usage

Basic usage:

./download_radiobook.py "https://www.mujrozhlas.cz/radiokniha/zbynek-fiser-egon-bondy-statni-bezpecnost-pohled-do-zakulisi-sledovani-donaseni-vydirani"

With custom output directory:

./download_radiobook.py "https://www.mujrozhlas.cz/radiokniha/..." episodes/

How It Works

Scraping: Uses Playwright to load the page with full JavaScript execution
Detection: Tries multiple strategies to find audio URLs:
- Looks for <audio> elements
- Intercepts network requests for MP3 files
- Extracts JSON data from <script> tags
- Simulates clicking play buttons to trigger audio loading
Debug Output: Saves HTML and JSON data for manual inspection if auto-detection fails
Download: Downloads all found episodes with progress tracking
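
Two of the detection strategies above (looking for <audio> elements and extracting JSON from <script> tags) can be sketched with the standard library alone. This is a minimal illustration that assumes the rendered HTML is already available as a string; the class name AudioSourceParser and the sample markup are hypothetical and are not taken from download_radiobook.py.

```python
import json
from html.parser import HTMLParser


class AudioSourceParser(HTMLParser):
    """Collect src attributes of <audio>/<source> tags and JSON found in <script> tags."""

    def __init__(self):
        super().__init__()
        self.audio_urls = []   # src values in document order
        self.json_blobs = []   # parsed contents of JSON-only <script> tags
        self._in_script = False
        self._script_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Note: <source> also appears inside <video>; a real scraper may want to filter further.
        if tag in ("audio", "source") and attrs.get("src"):
            self.audio_urls.append(attrs["src"])
        elif tag == "script":
            self._in_script = True
            self._script_text = []

    def handle_data(self, data):
        if self._in_script:
            self._script_text.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            text = "".join(self._script_text).strip()
            try:
                self.json_blobs.append(json.loads(text))
            except ValueError:
                pass  # Ordinary JavaScript or an empty tag, not JSON; skip it.


# Hypothetical sample page used only to demonstrate the parser.
sample = """
<audio src="https://example.org/a.mp3"></audio>
<script type="application/json">{"episodes": [{"url": "https://example.org/b.mp3"}]}</script>
<script>console.log("not json");</script>
"""
parser = AudioSourceParser()
parser.feed(sample)
print(parser.audio_urls)
print(len(parser.json_blobs))
```

Network interception and simulated clicks need a live browser, so they are left out of this sketch.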

Expected Output

Episode MP3 files named as: 01_Episode_Title.mp3, 02_Next_Episode.mp3, etc.
Debug files:
- page_debug.html - Full rendered HTML
- script_data_*.json - Extracted JSON data from page
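
A naming scheme like 01_Episode_Title.mp3 could be produced by a helper along these lines. The function name episode_filename is hypothetical, and stripping diacritics from Czech titles is an assumption; the actual script may keep them or sanitize titles differently.

```python
import re
import unicodedata


def episode_filename(number, title, ext="mp3"):
    """Build a file name such as 01_Episode_Title.mp3 from an episode number and title."""
    # Assumption: diacritics are dropped so the result is ASCII-only.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters other than letters and digits with one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_title).strip("_")
    # Fall back to a generic name if nothing usable is left of the title.
    return f"{number:02d}_{slug or 'episode'}.{ext}"


print(episode_filename(1, "Episode Title"))  # 01_Episode_Title.mp3
print(episode_filename(2, "Next Episode"))   # 02_Next_Episode.mp3
```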

Troubleshooting

Cloudflare Protection

If Cloudflare blocks the script, you may need to:

Run with a visible browser (set headless=False in the script)
Add delays between requests
Use a residential proxy
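
As a rough sketch, the three workarounds map onto Playwright's browser launch options headless, slow_mo, and proxy. The helper names build_launch_kwargs and fetch_rendered_html are hypothetical, and whether download_radiobook.py exposes these settings in this way is an assumption. Note that slow_mo delays each Playwright operation rather than each HTTP request, so it only approximates "delays between requests".

```python
def build_launch_kwargs(headless=False, slow_mo_ms=250, proxy_server=None):
    """Assemble keyword arguments for Playwright's BrowserType.launch()."""
    kwargs = {
        "headless": headless,   # False opens a visible browser window
        "slow_mo": slow_mo_ms,  # milliseconds to wait between Playwright operations
    }
    if proxy_server:
        # Placeholder proxy address supplied by the caller, e.g. "http://host:port".
        kwargs["proxy"] = {"server": proxy_server}
    return kwargs


def fetch_rendered_html(url, **launch_kwargs):
    """Load a page in Firefox and return the rendered HTML."""
    # Imported lazily so build_launch_kwargs() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(**launch_kwargs)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


print(build_launch_kwargs())
```

A call such as fetch_rendered_html(url, **build_launch_kwargs(proxy_server="http://127.0.0.1:8080")) would combine all three options; the proxy address here is only a placeholder.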

No Episodes Found

Check the debug files in the downloads directory:

Inspect page_debug.html to see what was loaded
Check script_data_*.json for episode data structure
Manually extract audio URLs and modify the script
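
When inspecting the script_data_*.json files by hand is tedious, a small recursive search can list candidate audio URLs. This is a sketch under the assumption that the URLs appear as plain strings ending in .mp3 somewhere in the JSON; the function names find_audio_urls and scan_debug_files are hypothetical, and the real data structure may differ.

```python
import json
from pathlib import Path


def find_audio_urls(node, suffixes=(".mp3",)):
    """Recursively collect string values that look like audio URLs."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_audio_urls(value, suffixes))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_audio_urls(item, suffixes))
    elif isinstance(node, str):
        # Ignore any query string when checking the file extension.
        if node.startswith("http") and node.split("?", 1)[0].lower().endswith(suffixes):
            found.append(node)
    return found


def scan_debug_files(directory):
    """Search every script_data_*.json file in a directory for audio URLs."""
    urls = []
    for path in sorted(Path(directory).glob("script_data_*.json")):
        with open(path, encoding="utf-8") as fh:
            urls.extend(find_audio_urls(json.load(fh)))
    return urls


# Hypothetical data shaped like a debug file, used only for demonstration.
data = {"items": [{"link": "https://example.org/x.mp3?token=1"}, "text"], "page": "https://example.org/page"}
print(find_audio_urls(data))
```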

Manual Extraction

If automatic detection fails, you can add episodes by hand: modify the scrape_episodes() method so that it populates the self.episodes list directly:

self.episodes = [
    {'url': 'https://...episode1.mp3', 'title': 'Episode 1', 'number': 1},
    {'url': 'https://...episode2.mp3', 'title': 'Episode 2', 'number': 2},
    # ...
]
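
If you have collected the MP3 URLs by hand, a list in the shape shown above can be generated rather than typed out. This sketch assumes only the three keys from the example (url, title, number); the helper name episodes_from_urls is hypothetical, and deriving the title from the URL's file name is an illustrative choice.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def episodes_from_urls(urls):
    """Turn an ordered list of audio URLs into episode dicts with url, title, and number."""
    episodes = []
    for number, url in enumerate(urls, start=1):
        # Use the decoded file name without its extension as a stand-in title.
        stem = PurePosixPath(unquote(urlparse(url).path)).stem
        episodes.append({"url": url, "title": stem or f"Episode {number}", "number": number})
    return episodes


# Placeholder URLs used only for demonstration.
for episode in episodes_from_urls([
    "https://example.org/audio/first-part.mp3",
    "https://example.org/audio/second%20part.mp3?x=1",
]):
    print(episode)
```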

Target Series

Current target: Zbyněk Fišer / Egon Bondy a Státní bezpečnost

15 episodes total
Released daily from January 21, 2026
~25-27 minutes per episode