Commit graph

30 commits

Author SHA1 Message Date
ddfdc29395 add README, party index, and letter indexes to output repo
- README.md with total count, party breakdown table, and directory links
- Parteien/<party>.md listing all members with links to their profiles
- Abgeordnete/<letter>/index.md listing all representatives per letter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:58:27 +02:00
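
The README table and per-letter indexes described in this commit could be built along these lines. This is a minimal sketch, not the repository's code: the function names `party_breakdown` and `letter_index` and the record shape (`{"name": ..., "party": ...}`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def party_breakdown(reps):
    """Render a total count plus a party table as markdown (hypothetical layout)."""
    counts = Counter(rep["party"] for rep in reps)
    lines = [f"Total: {len(reps)}", "", "| Party | Members |", "| --- | --- |"]
    # Largest party first; ties are broken alphabetically for stable output.
    for party, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"| {party} | {n} |")
    return "\n".join(lines)


def letter_index(reps):
    """Group representative names by the first letter of their name."""
    by_letter = defaultdict(list)
    for rep in reps:
        by_letter[rep["name"][0].upper()].append(rep["name"])
    return {letter: sorted(names) for letter, names in sorted(by_letter.items())}
```

Sorting both outputs keeps the generated files stable between runs, so unchanged data should not produce spurious diffs in the output repo.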
64c75f9b1f fix: stage files before checking for changes
Previously git diff checked unstaged changes, so new untracked files
(like the new Abstimmungen/ directory) would not trigger a commit.
Now stages first, then checks the staged diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:51:01 +02:00
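
The difference this fix describes can be sketched as follows, assuming the crawler shells out to the `git` CLI. The helper names `_git`, `has_unstaged_changes`, and `stage_and_check` are hypothetical and not taken from the repository.

```python
import subprocess
from pathlib import Path


def _git(repo, *args):
    """Run a git command inside `repo` and return the completed process."""
    return subprocess.run(
        ["git", "-C", str(repo), *args], capture_output=True, text=True
    )


def has_unstaged_changes(repo):
    # `git diff --quiet` exits with 1 only when tracked files differ from the
    # index. Untracked files are ignored, which is the behaviour being fixed.
    return _git(repo, "diff", "--quiet").returncode == 1


def stage_and_check(repo):
    # Stage everything first (including new untracked files), then compare
    # the index against HEAD.
    _git(repo, "add", "--all")
    return _git(repo, "diff", "--cached", "--quiet").returncode == 1
```

With a brand-new `Abstimmungen/` directory in the working tree, `has_unstaged_changes` reports nothing to commit, while `stage_and_check` detects the new files.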
fb38bb5894 generate descriptive commit messages by diffing against previous run
Loads the previous raw.json before saving, compares against current
crawl, and generates commit messages listing: new/departed representatives,
party changes, new disclosures, and total profiles updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:37:50 +02:00
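
A rough sketch of how such a message might be derived from two crawl snapshots. The shape of `raw.json` is not shown in this log, so the structure assumed here (a dict keyed by representative name, each with `party` and `disclosures`) and the function name `describe_changes` are illustrative only.

```python
def describe_changes(previous, current):
    """Build a commit message body from two crawl snapshots (assumed shape)."""
    lines = []
    new = sorted(current.keys() - previous.keys())
    departed = sorted(previous.keys() - current.keys())
    if new:
        lines.append("new representatives: " + ", ".join(new))
    if departed:
        lines.append("departed representatives: " + ", ".join(departed))

    updated = 0
    for name in sorted(current.keys() & previous.keys()):
        old, cur = previous[name], current[name]
        if old == cur:
            continue
        updated += 1
        if old.get("party") != cur.get("party"):
            lines.append(
                f"party change: {name} ({old.get('party')} -> {cur.get('party')})"
            )
        old_disclosures = old.get("disclosures", [])
        added = [d for d in cur.get("disclosures", []) if d not in old_disclosures]
        if added:
            lines.append(f"new disclosures: {name} ({len(added)})")

    # In this sketch only representatives present in both runs count as updated.
    lines.append(f"profiles updated: {updated}")
    return "\n".join(lines)
```

The previous snapshot has to be loaded before the new one is written, as the commit message notes; otherwise both sides of the comparison would be the current crawl.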
e5a43977c1 add collected votes directory with persistence
Creates Abstimmungen/ with one markdown file per vote topic, sorted by
party. Uses a JSON backing store so votes are preserved even after they
are removed from the Bundestag website.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:31:05 +02:00
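
The persistence idea can be illustrated with a merge into a JSON file. This is a sketch under assumptions: the store layout (topic mapped to representative mapped to vote), the file name, and the function `merge_votes` are hypothetical, not the repository's actual format.

```python
import json
from pathlib import Path


def merge_votes(store_path, crawled):
    """Merge freshly crawled votes into the JSON store and return the result.

    Topics already in the store are kept even when `crawled` no longer
    contains them, so votes survive removal from the source website.
    """
    path = Path(store_path)
    stored = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for topic, votes in crawled.items():
        stored.setdefault(topic, {}).update(votes)
    path.write_text(
        json.dumps(stored, ensure_ascii=False, indent=2, sort_keys=True),
        encoding="utf-8",
    )
    return stored
```

The markdown files in `Abstimmungen/` would then be rendered from the merged store rather than from the current crawl alone.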
14670538f6 clean up: explicit utf-8 encoding, proper exception handling, remove dead code
- Add encoding="utf-8" to all file writes
- Catch requests.RequestException instead of bare except
- Use raise_for_status() to also retry on HTTP errors
- Use removeprefix/removesuffix instead of lstrip/rstrip
- Use makedirs(exist_ok=True)
- Remove unused common_suffix function and commonprefix import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:27:03 +02:00
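
The `lstrip`/`rstrip` change in this cleanup matters because those methods strip a set of characters, not a literal prefix or suffix. A small illustration follows; the file name and the helper `profile_slug` are made up for the example, and `removeprefix`/`removesuffix` require Python 3.9 or later.

```python
def profile_slug(filename):
    """Strip a literal 'profile_' prefix and '.md' suffix from a file name."""
    return filename.removeprefix("profile_").removesuffix(".md")


def profile_slug_buggy(filename):
    # lstrip/rstrip treat their argument as a set of characters, so they can
    # eat leading or trailing letters that belong to the actual name.
    return filename.lstrip("profile_").rstrip(".md")
```

For `"profile_paul_demand.md"`, the buggy variant also removes the leading `p` of `paul` and the trailing `d` of `demand`, while the `removeprefix`/`removesuffix` variant returns `"paul_demand"`.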
1a80fe1647 fix replacement character warnings by using decoded response text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:21:40 +02:00
6a6c478b43 fix crash when biography page elements are missing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 20:09:36 +02:00
eba6cc7fb9 remove disclaimer 2025-12-02 09:20:41 +01:00
641dd19b8d only commit if there are changes 2025-11-20 11:04:06 +01:00
21d968fbac sort functions 2025-11-20 10:56:30 +01:00
b8eda046dd improve handling of missing jobs 2025-11-18 20:00:07 +01:00
801c38d985 handle resigned reps 2025-11-17 17:41:48 +01:00
026adbc0ea add parser feature to beautifulsoup constructor to silence warnings 2025-11-17 17:28:52 +01:00
0fcd725192 add job 2025-11-17 17:26:10 +01:00
3787f756f3 improve no disclosure handling 2025-11-16 13:01:56 +01:00
5501f07cf7 add disclosure files 2025-11-16 10:34:41 +01:00
6ef8fcc993 extract graceful url handling into separate function 2025-11-16 10:12:39 +01:00
d47627643f sort functions alphabetically also in raw 2025-11-15 15:38:56 +01:00
be2371e0f5 sort functions alphabetically and handle rate limit more gracefully 2025-11-15 15:35:11 +01:00
3f0130c75b fix disclosure handling 2025-11-15 15:30:07 +01:00
93310a8030 fix handling of empty speeches 2025-11-15 14:43:31 +01:00
384bd83b20 more formatting fixes 2025-11-14 12:49:55 +01:00
62b1443798 fix formatting 2025-11-14 12:46:11 +01:00
1abf37d175 change formatting for individual output 2025-11-14 12:38:11 +01:00
7358b47384 individual output 2025-11-14 12:31:39 +01:00
d2fac39099 fix function decoding 2025-11-14 12:09:08 +01:00
aaa372fe21 git functionality 2025-11-14 11:40:07 +01:00
19cdfb486d save raw data as json 2025-11-14 11:28:39 +01:00
cb3186e00e full crawler functionality 2025-11-14 10:10:24 +01:00
f8b33e1d6b some basic functionality 2025-11-13 21:56:15 +01:00