LLM Pipeline for Legal Coding
Designed and evaluated a two-stage LLM pipeline to support legal coding on freedom of expression. The goal was a reliable AI-assisted workflow that helps researchers structure large legal corpora while preserving human oversight.
- Processed a corpus of 164 laws across 62 countries
- Combined GPT-based extraction, BERT fine-tuning with LoRA, retrieval-based verification, and streamlined outputs
- Evaluated model outputs against human-coded data, analyzing precision-recall tradeoffs and failure modes
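
A minimal sketch of the kind of evaluation described in the last bullet, not the project's actual code. It compares model confidence scores with binary human codes at several thresholds and lists the disagreements for failure-mode review. The function names, the toy labels, and the scores are all hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def precision_recall(human: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """Precision and recall of thresholded scores against human codes (1 = provision present)."""
    tp = fp = fn = 0
    for label, score in zip(human, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
    # Convention assumed here: report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


def disagreements(human: List[int], scores: List[float], threshold: float) -> List[int]:
    """Indices where the thresholded prediction differs from the human code."""
    return [i for i, (label, score) in enumerate(zip(human, scores))
            if (score >= threshold) != bool(label)]


# Hypothetical toy data: six coded items, not drawn from the real corpus.
human_codes = [1, 0, 1, 1, 0, 0]
model_scores = [0.9, 0.6, 0.7, 0.4, 0.2, 0.55]

for t in (0.3, 0.5, 0.65):
    p, r = precision_recall(human_codes, model_scores, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} "
          f"review={disagreements(human_codes, model_scores, t)}")
```

Raising the threshold trades recall for precision. The indices returned by `disagreements` are the items a human reviewer would inspect, which keeps oversight in the loop.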