Software
I develop open-source software to support data curation and computational research in the social sciences. All packages are freely available on GitHub: github.com/jaeyk
- MapAgora
- Summary: An R package for retrieving and processing tax records (e.g., IRS Form 990), website data, and social media handles for U.S. nonprofit organizations. Built to support large-scale analysis of civic infrastructure.
- Use case: Used in “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
- autotextclassifier
- Summary: An R package for automated text classification using the tidymodels framework. Supports supervised learning pipelines with minimal setup for civic tech and policy research applications.
- Use case: Also used in “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
- Collaborator: Milan de Vries
- validatednamesr
- Summary: An R package for accessing and analyzing the validated names dataset used in race and ethnicity experiments. Includes utilities for name filtering, trait inspection, and integration with survey tools.
- Use case: Featured in “Validated Names for Experimental Studies on Ethnicity and Race” (Nature Scientific Data, 2023), with Charles Crabtree, S. Michael Gaddis, John B. Holbein, Cameron Guage, and William Marx.
- Collaborator: Charles Crabtree
- tidytweetjson
- Summary: An R package that transforms raw Twitter JSON files into clean, analysis-ready datasets. Useful for studying political discourse, social movements, and real-time reactions to events.
- Use case: Used in “COVID-19 and Asian Americans: How Elite Messaging and Social Exclusion Shape Partisan Attitudes”, Perspectives on Politics (2021), with Nathan Chan and Vivien Leung.
- tidyethnicnews
- Summary: An R package for scraping, cleaning, and organizing ethnic newspaper articles from the Ethnic NewsWatch database. Designed to support media analysis and political communication research.
- Use case: Used in “Integrating Human and Machine Coding to Measure Political Issues in Ethnic Newspaper Articles”, Journal of Computational Social Science (2021).