
28 commits

SHA1 Message Date
fb38bb5894 generate descriptive commit messages by diffing against previous run
Loads the previous raw.json before saving, compares against current
crawl, and generates commit messages listing: new/departed representatives,
party changes, new disclosures, and total profiles updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:37:50 +02:00
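The diffing described in this commit can be sketched roughly as follows. This log does not show the real layout of raw.json, so the sketch assumes a hypothetical schema that maps each representative's name to a dict with `party` and `disclosures` keys. `load_previous` and `build_commit_message` are illustrative names, not the project's own functions.

```python
import json
from pathlib import Path


def load_previous(path: Path) -> dict:
    """Return the previous crawl, or an empty dict on the very first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def build_commit_message(old: dict, new: dict) -> str:
    """Summarise the differences between two crawls keyed by representative name.

    Hypothetical schema: {name: {"party": str, "disclosures": [str, ...], ...}}
    """
    added = sorted(new.keys() - old.keys())
    departed = sorted(old.keys() - new.keys())
    lines = []
    if added:
        lines.append("New: " + ", ".join(added))
    if departed:
        lines.append("Departed: " + ", ".join(departed))
    # Party changes and new disclosures only make sense for people in both crawls.
    for name in sorted(new.keys() & old.keys()):
        before, after = old[name], new[name]
        if before.get("party") != after.get("party"):
            lines.append(f"Party change: {name} ({before.get('party')} -> {after.get('party')})")
        fresh = set(after.get("disclosures", [])) - set(before.get("disclosures", []))
        if fresh:
            lines.append(f"New disclosures: {name} ({len(fresh)})")
    # A profile counts as updated if it is new or differs in any field.
    updated = sum(1 for name in new if old.get(name) != new[name])
    subject = f"update {updated} profiles"
    return "\n".join([subject, ""] + lines) if lines else subject
```

Calling `load_previous` before the new crawl overwrites raw.json is what makes the comparison possible; on the first run it returns `{}`, so every representative is reported as new.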
e5a43977c1 add collected votes directory with persistence
Creates Abstimmungen/ with one markdown file per vote topic, sorted by
party. Uses a JSON backing store so votes are preserved even after they
are removed from the Bundestag website.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:31:05 +02:00
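A minimal sketch of the persistence idea, under stated assumptions: votes are keyed by topic title (used directly as the file name, so titles are assumed to be filename-safe), and each topic maps party to representative to vote. `merge_votes`, `render_vote` and `persist` are hypothetical helpers; only the `Abstimmungen/` directory name comes from the commit.

```python
import json
import os
from pathlib import Path


def merge_votes(stored: dict, crawled: dict) -> dict:
    """Union of stored and freshly crawled votes; fresh data wins on conflicts.

    Topics missing from the current crawl are kept, which is what preserves
    votes after they disappear from the website.
    """
    merged = dict(stored)
    merged.update(crawled)
    return merged


def render_vote(title: str, votes_by_party: dict) -> str:
    """Render one vote topic as markdown, with parties in alphabetical order."""
    lines = [f"# {title}", ""]
    for party in sorted(votes_by_party):
        lines.append(f"## {party}")
        for name, vote in sorted(votes_by_party[party].items()):
            lines.append(f"- {name}: {vote}")
        lines.append("")
    return "\n".join(lines)


def persist(store_path: Path, out_dir: Path, crawled: dict) -> dict:
    """Merge into the JSON backing store, then regenerate one file per topic."""
    stored = json.loads(store_path.read_text(encoding="utf-8")) if store_path.exists() else {}
    merged = merge_votes(stored, crawled)
    store_path.write_text(json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8")
    os.makedirs(out_dir, exist_ok=True)
    for title, by_party in merged.items():
        (out_dir / f"{title}.md").write_text(render_vote(title, by_party), encoding="utf-8")
    return merged
```

The markdown files are derived output; the JSON store is the source of truth, so regenerating every file on each run is safe.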
14670538f6 clean up: explicit utf-8 encoding, proper exception handling, remove dead code
- Add encoding="utf-8" to all file writes
- Catch requests.RequestException instead of bare except
- Use raise_for_status() to also retry on HTTP errors
- Use removeprefix/removesuffix instead of lstrip/rstrip
- Use makedirs(exist_ok=True)
- Remove unused common_suffix function and commonprefix import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:27:03 +02:00
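The switch from lstrip/rstrip to removeprefix/removesuffix fixes a real bug class: the strip methods take a set of characters, not a literal string. The URL path and file name below are hypothetical examples.

```python
PREFIX = "/abgeordnete/"  # hypothetical URL prefix
path = "/abgeordnete/biografien"

# lstrip() keeps removing leading characters while they are in the *set*
# {"/", "a", "b", "g", "e", "o", "r", "d", "n", "t"}, so it also eats the "b".
print(path.lstrip(PREFIX))        # iografien
# removeprefix() (Python 3.9+) removes the exact prefix once, if present.
print(path.removeprefix(PREFIX))  # biografien

# The same trap exists at the end of a string.
print("tel.html".rstrip(".html"))        # te
print("tel.html".removesuffix(".html"))  # tel
```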
1a80fe1647 fix replacement character warnings by using decoded response text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:21:40 +02:00
6a6c478b43 fix crash when biography page elements are missing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 20:09:36 +02:00
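A common way to avoid this kind of crash, sketched under the assumption that the crawler uses BeautifulSoup: `soup.find(...)` returns `None` when nothing matches, and calling `.get_text()` on `None` raises `AttributeError`. The helper name `text_or_default` and the selector in the usage comment are hypothetical.

```python
def text_or_default(element, default: str = "") -> str:
    """Return the stripped text of a parsed element, or `default` if it is missing.

    `element` is whatever a lookup such as BeautifulSoup's `soup.find(...)`
    returned; `find` returns None when no element matches.
    """
    if element is None:
        return default
    return element.get_text(strip=True)


# Usage with BeautifulSoup (not executed here; the selector is hypothetical):
# soup = BeautifulSoup(html, "html.parser")  # explicit parser, avoids the parser warning
# job = text_or_default(soup.find("div", class_="job"), default="unknown")
```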
eba6cc7fb9 remove disclaimer 2025-12-02 09:20:41 +01:00
641dd19b8d only commit if there are changes 2025-11-20 11:04:06 +01:00
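One way to implement "only commit if there are changes", assuming the crawler shells out to the git CLI: after staging, `git status --porcelain` prints nothing when the working tree is clean. `has_changes` and `commit_if_changed` are hypothetical names.

```python
import subprocess


def has_changes(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed path, nothing if clean."""
    return bool(porcelain_output.strip())


def commit_if_changed(repo: str, message: str) -> bool:
    """Stage everything and commit only if something actually changed."""
    subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    if not has_changes(status):
        return False  # nothing to commit; `git commit` would exit non-zero here
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
    return True
```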
21d968fbac sort functions 2025-11-20 10:56:30 +01:00
b8eda046dd improve handling of missing jobs 2025-11-18 20:00:07 +01:00
801c38d985 handle resigned reps 2025-11-17 17:41:48 +01:00
026adbc0ea add parser feature to beautifulsoup constructor to silence warnings 2025-11-17 17:28:52 +01:00
0fcd725192 add job 2025-11-17 17:26:10 +01:00
3787f756f3 improve no disclosure handling 2025-11-16 13:01:56 +01:00
5501f07cf7 add disclosure files 2025-11-16 10:34:41 +01:00
6ef8fcc993 extract graceful url handling into separate function 2025-11-16 10:12:39 +01:00
d47627643f sort functions alphabetically also in raw 2025-11-15 15:38:56 +01:00
be2371e0f5 sort functions alphabetically and handle rate limit more gracefully 2025-11-15 15:35:11 +01:00
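Graceful rate-limit handling is often done with retries and exponential backoff. The sketch below is an assumption about the approach, not the project's code: `fetch_with_retry` is a hypothetical helper that takes the request as a callable, so it can be exercised without network access. The commented usage shows how it could combine with `requests`, where `raise_for_status()` turns 4xx/5xx responses (such as 429) into a `requests.HTTPError`, which is a subclass of `requests.RequestException`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on a listed exception wait base_delay * 2**attempt and retry.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 1")


# Usage with requests (not executed here):
# def get(url):
#     def once():
#         response = requests.get(url, timeout=30)
#         response.raise_for_status()  # 429/5xx become requests.HTTPError
#         return response.text
#     return fetch_with_retry(once, retry_on=(requests.RequestException,))
```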
3f0130c75b fix disclosure handling 2025-11-15 15:30:07 +01:00
93310a8030 fix handling of empty speeches 2025-11-15 14:43:31 +01:00
384bd83b20 more formatting fixes 2025-11-14 12:49:55 +01:00
62b1443798 fix formatting 2025-11-14 12:46:11 +01:00
1abf37d175 change formatting for individual output 2025-11-14 12:38:11 +01:00
7358b47384 individual output 2025-11-14 12:31:39 +01:00
d2fac39099 fix function decoding 2025-11-14 12:09:08 +01:00
aaa372fe21 git functionality 2025-11-14 11:40:07 +01:00
19cdfb486d save raw data as json 2025-11-14 11:28:39 +01:00
cb3186e00e full crawler functionality 2025-11-14 10:10:24 +01:00
f8b33e1d6b some basic functionality 2025-11-13 21:56:15 +01:00