Software

I develop open-source software to support data curation and computational research in the social sciences. All packages are freely available on GitHub: github.com/jaeyk

  1. MapAgora
    • Summary: An R package for retrieving and processing tax records (e.g., IRS Form 990), website data, and social media handles for U.S. nonprofit organizations. Built to support large-scale analysis of civic infrastructure.
    • Use case: Used in “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
    • Collaborator: Milan de Vries
  2. autotextclassifier
    • Summary: An R package for automated text classification using the tidymodels framework. Supports supervised learning pipelines with minimal setup for civic tech and policy research applications.
    • Use case: Also used in “The Unequal Landscape of Civic Opportunity in America”, Nature Human Behaviour (2023), with Milan de Vries and Hahrie Han.
    • Collaborator: Milan de Vries
  3. validatednamesr
    • Summary: An R package for accessing and analyzing the validated names dataset used in race and ethnicity experiments. Includes utilities for name filtering, trait inspection, and integration with survey tools.
    • Use case: Featured in “Validated Names for Experimental Studies on Ethnicity and Race” (Nature Scientific Data, 2023), with Charles Crabtree, S. Michael Gaddis, John B. Holbein, Cameron Guage, and William Marx.
    • Collaborator: Charles Crabtree
  4. tidytweetjson
  5. tidyethnicnews