Raúl Alejandro Pérez Saucedo

I’m interested in applied NLP and building AI systems for policy, law, and sustainability.

I work on causal NLP and multilingual systems applied to legal and political text, where I've focused on transforming complex texts into structured data.

GitHub ↗ Hugging Face ↗ LinkedIn ↗ Resume ↗

Background

B.S. in Data Science and Mathematics

Tecnológico de Monterrey (ITESM)

About

I’m a data scientist and applied AI/NLP researcher with a background in Data Science and Mathematics from Tecnológico de Monterrey. Currently, at The Carter Center, I build and evaluate LLM-based workflows for comparative legal research, including freedom of expression and association laws across countries and languages.

Previously at Concordia University, I developed an LLM pipeline for extracting and classifying NGO regulation data from multilingual NGO laws, work presented at ARNOVA 2025. I’ve also worked at CEMEX, building SQL-based data systems and dashboards for operational and sustainability metrics.

I’m interested in how recent advances in NLP and emerging agentic AI workflows can help people reason over public-interest information, from law and governance to environmental data, especially in Latin America.

Current direction

Applied AI and NLP systems for public-interest analysis.

Long-term goal

Build, evaluate, and deploy systems for institutions and companies working on public-interest problems that affect our institutions and the planet.

Focus

I want to help ensure that advances in AI include Latin America and contribute to stronger technical capacity and competitiveness in the region.

Research & Projects

LLM Pipeline for Legal Coding

Technical Report · Co-authored

Designed and evaluated a two-stage LLM pipeline to support legal coding on freedom of expression. The goal was a reliable AI-assisted workflow that helps researchers structure large legal corpora while preserving human oversight.

Processed a corpus of 164 laws across 62 countries
Combined GPT-based extraction, BERT fine-tuning with LoRA, retrieval-based verification, and streamlined outputs
Evaluated model behavior against human-coded data, analyzing precision-recall tradeoffs and failure modes
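As a rough illustration of the evaluation step, the sketch below computes precision and recall for a single binary coding question by comparing model output with human codes. The function name and the label lists are hypothetical and are not taken from the project code; the actual evaluation may use different metrics, label schemes, or tooling.

```python
def precision_recall(human, model):
    """Compare binary model codes against human codes for the same items."""
    if len(human) != len(model):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(human, model))
    true_pos = sum(1 for h, m in pairs if h and m)
    false_pos = sum(1 for h, m in pairs if not h and m)
    false_neg = sum(1 for h, m in pairs if h and not m)
    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical example: 1 = provision coded as present, 0 = absent.
human_codes = [1, 0, 1, 1, 0, 1]
model_codes = [1, 1, 1, 0, 0, 1]
precision, recall = precision_recall(human_codes, model_codes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting the two numbers separately, rather than a single accuracy score, makes it easier to see whether the model tends to over-code or under-code a provision.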

Report ↗ Code ↗

Comparative NGO Regulations with LLMs

Conference Paper · ARNOVA 2025 · Co-authored

I developed and tested an LLM-based workflow for analyzing NGO-related laws across countries, focused on how prompting strategy and model design affect reliability in comparative legal research.

Evaluated multilingual performance across 10 countries and 3 languages
Compared reliability across legal coding tasks

Paper ↗ Code ↗

Freedom of Expression in Venezuela and El Salvador

Policy Brief · The Carter Center · Co-authored

I contributed to a policy brief comparing freedom of expression trajectories in Venezuela and El Salvador, combining de jure legal coding with de facto indicators to study how democratic backsliding can follow different institutional pathways.

Combined an Institutional Grammar-coded legal panel with V-Dem freedom of expression indicators

PM2.5 Air Quality Sensor Calibration

Environmental Data

A project to improve the reliability of low-cost PM2.5 sensors using time-series regression and calibration against reference-grade monitors.

Built calibration models for low-cost air quality sensors
Improved measurement reliability for environmental data used in one of Mexico’s most industrialized regions
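A minimal sketch of the calibration idea, assuming a simple one-variable linear fit of reference-monitor readings on raw sensor readings. The function name and the sample values are hypothetical; the project's actual models are described only as time-series regression and may include additional predictors or temporal structure.

```python
def fit_linear_calibration(raw, reference):
    """Ordinary least squares fit of reference = slope * raw + intercept."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired readings in micrograms per cubic meter.
raw_sensor = [10.0, 20.0, 30.0, 40.0]
reference_monitor = [8.0, 13.0, 18.0, 23.0]

slope, intercept = fit_linear_calibration(raw_sensor, reference_monitor)
calibrated = [slope * x + intercept for x in raw_sensor]
print(slope, intercept, calibrated)
```

In practice the fit would be checked on held-out periods, since low-cost sensors can drift over time.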

Current focus

Right now, I’m working on LLM-assisted legal coding and sharpening the evaluation framework behind it.

Interests

Applied NLP systems · AI safety and accessibility · AI governance · Public-interest technology · Environmental sustainability