Software
I develop open-source software to support data curation and computational research in the social sciences. All packages are freely available on GitHub: github.com/jaeyk.
- MapAgora
- Summary: An R package for retrieving and processing tax records (e.g., IRS Form 990), website data, and social media handles for U.S. nonprofit organizations. Built to support large-scale analysis of civic infrastructure.
- Use case: Used in “MapAgora, Civic Opportunity Datasets for the Study of American Local Politics and Public Policy”, Nature Scientific Data (2025), and “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
- autotextclassifier
- Summary: An R package for automated text classification using the tidymodels framework. Supports supervised learning pipelines with minimal setup for civic tech and policy research applications.
- Use case: Used in “MapAgora, Civic Opportunity Datasets for the Study of American Local Politics and Public Policy”, Nature Scientific Data (2025), and “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
- validatednamesr
- Summary: An R package for accessing and analyzing the validated names dataset used in race and ethnicity experiments. Includes utilities for name filtering, trait inspection, and integration with survey tools.
- Use case: Featured in “Validated Names for Experimental Studies on Ethnicity and Race” (Nature Scientific Data, 2023), with Charles Crabtree, S. Michael Gaddis, John B. Holbein, Cameron Guage, and William Marx.
- Collaborator: Charles Crabtree
- tidytweetjson
- Summary: An R package that transforms raw Twitter JSON files into clean, analysis-ready datasets. Useful for studying political discourse, social movements, and real-time reactions to events.
- Use case: Used in “COVID-19 and Asian Americans: How Elite Messaging and Social Exclusion Shape Partisan Attitudes”, Perspectives on Politics (2021), with Nathan Chan and Vivien Leung.
- tidyethnicnews
- Summary: An R package for scraping, cleaning, and organizing ethnic newspaper articles from the Ethnic NewsWatch database. Designed to support media analysis and political communication research.
- Use case: Used in “Integrating Human and Machine Coding to Measure Political Issues in Ethnic Newspaper Articles”, Journal of Computational Social Science (2021).