I am a PhD candidate in Political Science and a D-Lab Senior Data Science Fellow at UC Berkeley, as well as a Visiting Student Fellow at the SNF Agora Institute’s P3 Lab at Johns Hopkins University. I study political learning, organizing, and mobilization among marginalized populations using big data and data science. I also build tools that make social science research more efficient and reproducible.
I am currently working with Hahrie Han and Milan de Vries on the Mapping the Modern Agora project, which develops infrastructure for automatically collecting, parsing, and organizing data on civic organizations in the US based on more than three million tax reports.
My dissertation examines how government policies influenced U.S. minority coalition formation in the 1960s and 1970s. This project won WPSA’s Don T. Nakanishi Award for Distinguished Scholarship and Service in Asian Pacific American Politics. Portions of this work appear in Studies in American Political Development, Political Research Quarterly and the Journal of Computational Social Science.
My research on intersectional bias in hate speech and abusive language datasets appears in Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop. I also have working papers (under review) on how threats influence political learning among marginalized populations and how social exclusion shapes Asian American partisanship.
I authored or co-authored and maintain six R packages for parsing large and complex data (tweets, newspaper articles, tax forms, websites, etc.) and making research computationally reproducible.
Teaching Computational Social Science
I am a proud recipient of the “Outstanding Graduate Student Instructor Award” and have taught computational social science at both graduate and undergraduate levels in semester-long courses and short workshops. I also served as a Data Science Education Program Fellow at UC Berkeley and co-organized the Summer Institute in Computational Social Science in the San Francisco Bay Area. I am currently working on an open textbook project titled “Computational Thinking for Social Scientists.”
I love getting emails.
I am on the job market this year and will graduate in May 2021.
PhD Candidate in Political Science, 2016~
MA in Political Science, 2016
This project examines the impact of the War on Poverty programs on the formation of Asian American and Latino community organizations, drawing on original large-scale organizational and text data.
This project turns historical ethnic newspaper articles into data to trace how issues varied among minority groups in the era of civil rights.
This project combines computational text analysis and a natural experiment to identify the causal effects of threats on information seeking among minority group members.
This project unpacks how different racial minority groups experience marginalization differently by applying factor analysis to one of the largest multi-racial surveys conducted in the US.
This project analyzes how survey respondents interpret questions on racial solidarity using a within-group survey experimental design.
This project investigates to what extent and how South Koreans hold biased attitudes towards North Korean refugees by embedding list experiments in a nation-wide mobile survey.
Kim, Jae Yeon. Why Teaching Social Scientists How To Code Like A Professional Is Important. UC Berkeley D-Lab. September 23, 2020.
Haber, Jaren, Jae Yeon Kim, and Nick Camp. BAY-SICSS: Bridging Computational Social Scientists and Practitioners for Social Good. Berkeley Institute of Data Science. September 15, 2020.
Kim, Jae Yeon. Five Principles to Get Undergraduates Involved in Real-world Data Science Projects. SAGE Ocean. June 24, 2020.
Kim, Jae Yeon. How I Accidentally Became Interested in Data Science. UC Berkeley D-Lab. February 24, 2020.
tidytweetjson: R package for turning Tweet JSON files into a cleaned and wrangled dataset. The package takes 4 minutes to turn 2 million tweets into a tidy dataframe.
tidyethnicnews: R package for turning search results from one of the largest databases on ethnic newspapers and magazines published in the United States into a cleaned and wrangled dataset. The package takes 12.34 seconds to turn 5,684 articles into a tidy dataframe.
makereproducible: R package for making a project computationally reproducible before sharing it.
ParseIRS990: R package for parsing tax return forms filed with the U.S. Internal Revenue Service (with Milan de Vries)
GetAboutPages: R package for scraping an about page from an organization website (with Milan de Vries)
GetSocialMediaHandles: R package for finding a social media handle of an organization (with Milan de Vries)
TidyChaseBankStatements: R package for turning Chase Bank Statements into a tidy dataframe.
I was born and raised in South Korea, but by the time I finished college, I had also lived in Hong Kong and Taiwan. I moved to California in 2014 as a graduate student in political science at UC Berkeley. Prior to my graduate studies, I worked in the tech industry in South Korea. I was a strategy manager at a software startup and served on the advisory board of Naver, “The Google of South Korea.” When I am not writing or coding, I enjoy running, cooking, and reading. I run 10k almost every morning.