Software
I develop open-source software to support data curation and computational research in the social sciences. All packages are freely available on GitHub: github.com/jaeyk
- MapAgora
- Summary: An R package for retrieving and processing tax records (e.g., IRS Form 990), website data, and social media handles for U.S. nonprofit organizations. Built to support large-scale analysis of civic infrastructure.
 
- Use case: Used in “MapAgora, Civic Opportunity Datasets for the Study of American Local Politics and Public Policy”, Nature: Scientific Data (2025), and “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
 
- autotextclassifier
- Summary: An R package for automated text classification using the tidymodels framework. Supports supervised learning pipelines with minimal setup for civic tech and policy research applications.
 
- Use case: Used in “MapAgora, Civic Opportunity Datasets for the Study of American Local Politics and Public Policy”, Nature: Scientific Data (2025), and “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
 
- validatednamesr
- Summary: An R package for accessing and analyzing the validated names dataset used in race and ethnicity experiments. Includes utilities for name filtering, trait inspection, and integration with survey tools.
 
- Use case: Featured in “Validated Names for Experimental Studies on Ethnicity and Race” (Nature: Scientific Data, 2023), with Charles Crabtree, S. Michael Gaddis, John B. Holbein, Cameron Guage, and William Marx.
- Collaborator: Charles Crabtree
 
- tidytweetjson
- Summary: An R package that transforms raw Twitter JSON files into clean, analysis-ready datasets. Useful for studying political discourse, social movements, and real-time reactions to events.
 
- Use case: Used in “COVID-19 and Asian Americans: How Elite Messaging and Social Exclusion Shape Partisan Attitudes”, Perspectives on Politics (2021), with Nathan Chan and Vivien Leung.
 
- tidyethnicnews
- Summary: An R package for scraping, cleaning, and organizing ethnic newspaper articles from the Ethnic NewsWatch database. Designed to support media analysis and political communication research.
 
- Use case: Used in “Integrating Human and Machine Coding to Measure Political Issues in Ethnic Newspaper Articles”, Journal of Computational Social Science (2021).
 