# Bundescrawler
This repository contains the scraper (I just liked the name Bundescrawler), which collects the available information from the websites of the representatives of the German Parliament (Bundestag).
## How to use
- Clone the repository.
- Install the dependencies with `pip install .`.
- Initialize a git repository in the directory where you want to save the information.
- Run `python3 crawler.py -o <output directory>`.
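Put together, a full session might look like the sketch below. The clone URL and the data directory are placeholders, not taken from the repository:

```shell
# Clone the scraper (URL is a placeholder)
git clone https://github.com/<user>/bundescrawler.git
cd bundescrawler

# Install the dependencies declared in pyproject.toml
pip install .

# Initialize a separate git repository to hold the scraped data
mkdir -p ~/bundestag-data
git -C ~/bundestag-data init

# Run the crawler, writing its output into that repository
python3 crawler.py -o ~/bundestag-data
```

Keeping the data in its own git repository lets you commit after each run and diff successive crawls to see what changed on the representatives' pages.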