Jae Yeon Kim

Computational social scientist

UC Berkeley

I am a PhD candidate in Political Science and a D-Lab Senior Data Science Fellow at UC Berkeley, as well as a Visiting Student Fellow at the SNF Agora Institute’s P3 Lab at Johns Hopkins University. I study political learning, organizing, and mobilization among marginalized populations using big data and data science methods. I also build tools that make social science research more efficient and reproducible.

Ongoing Research

I am currently working with Hahrie Han and Milan de Vries on the Mapping the Modern Agora project, which develops infrastructure for automatically collecting, parsing, and organizing data on US civic organizations drawn from more than three million tax reports.

Dissertation Research

My dissertation examines how government policies influenced U.S. minority coalition formation in the 1960s and 1970s. This project won WPSA’s Don T. Nakanishi Award for Distinguished Scholarship and Service in Asian Pacific American Politics. Portions of this work appear in Studies in American Political Development, Political Research Quarterly, and the Journal of Computational Social Science.

Other Research

My research on intersectional bias in hate speech and abusive language datasets appears in Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop. I also have working papers (under review) on how threats influence political learning among marginalized populations and how social exclusion shapes Asian American partisanship.

Research Software

I have authored or co-authored, and currently maintain, six R packages for parsing large and complex data (tweets, newspaper articles, tax forms, websites, etc.) and for making research computationally reproducible.

Teaching Computational Social Science

I am a proud recipient of the “Outstanding Graduate Student Instructor Award” and have taught computational social science at both the graduate and undergraduate levels, in semester-long courses and short workshops. I also served as a Data Science Education Program Fellow at UC Berkeley and co-organized the Summer Institute in Computational Social Science in the San Francisco Bay Area. I am currently working on an open textbook project titled “Computational Thinking for Social Scientists.”

Contact Me

I love getting emails.


I am on the job market this year and will graduate in May 2021.


  • Computational social science
  • Racial and ethnic politics
  • Historical social science
  • Political behavior


  • PhD Candidate in Political Science, 2016–present

    UC Berkeley

  • MA in Political Science, 2016

    UC Berkeley



Large-scale Twitter Analysis on COVID-19 and Anti-Asian Climate

This project traces how COVID-19 shaped an anti-Asian climate on Twitter, drawing on more than 1 million US-located tweets.

Dataset Bias in Machine Learning

This project identifies and estimates intersectional bias in datasets of hate speech and abusive language.

Policy Impact on Community Organizing

This project examines the impact of the War on Poverty programs on the formation of Asian American and Latino community organizations drawing on original large-scale organizational and text data.

Turning Ethnic Newspapers into Data

This project turns historical ethnic newspaper articles into data to trace how issues varied among minority groups in the era of civil rights.

Causal Inference Using Text Data and Natural Experiments

This project combines computational text analysis and a natural experiment to identify the causal effects of threats on information seeking among minority group members.

Categorizing Marginalized Experiences Using Survey Data

This project unpacks how different racial minority groups experience marginalization differently by applying factor analysis to one of the largest multiracial surveys conducted in the US.

Making Survey Questions More Interpretable

This project analyzes how survey respondents interpret questions on racial solidarity using a within-group survey experimental design.

Detecting Sensitive Attitudes

This project investigates to what extent and how South Koreans hold biased attitudes toward North Korean refugees by embedding list experiments in a nationwide mobile survey.

Peer-reviewed articles

(2020). Intersectional Bias in Hate Speech and Abusive Language Datasets. Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop.


Other publications

Kim, Jae Yeon. Why Teaching Social Scientists How To Code Like A Professional Is Important. UC Berkeley D-Lab. September 23, 2020.

Haber, Jaren, Jae Yeon Kim, and Nick Camp. BAY-SICSS: Bridging Computational Social Scientists and Practitioners for Social Good. Berkeley Institute for Data Science. September 15, 2020.

Kim, Jae Yeon. Five Principles to Get Undergraduates Involved in Real-world Data Science Projects. SAGE Ocean. June 24, 2020.

Kim, Jae Yeon. How I Accidentally Became Interested in Data Science. UC Berkeley D-Lab. February 24, 2020.


For research

  1. Stable
  • (a) tidytweetjson: R package for turning tweet JSON files into a cleaned and wrangled dataset. The package takes 4 minutes to turn 2 million tweets into a tidy dataframe.
  • (b) tidyethnicnews: R package for turning search results from one of the largest databases of ethnic newspapers and magazines published in the United States into a cleaned and wrangled dataset. The package takes 12.34 seconds to turn 5,684 articles into a tidy dataframe.
  • (c) makereproducible: R package for making a project computationally reproducible before sharing it.
  2. Still contains known or unknown bugs
  • (a) ParseIRS990: R package for parsing tax return forms filed with the U.S. Internal Revenue Service (with Milan de Vries)
  • (b) GetAboutPages: R package for scraping the About page of an organization's website (with Milan de Vries)
  • (c) GetSocialMediaHandles: R package for finding an organization's social media handles (with Milan de Vries)

For fun


Teaching Awards and Training

Graduate Seminars:

Undergraduate Lectures:

  • Graduate student instructor for Laura Stoker, Introduction to Empirical Analysis and Quantitative Methods, Department of Political Science, UC Berkeley, Fall 2016



Please view my CV.


I was born and raised in South Korea, but by the time I finished college, I had also lived in Hong Kong and Taiwan. I moved to California in 2014 as a graduate student in political science at UC Berkeley. Before my graduate studies, I worked in the tech industry in South Korea: I was a strategy manager at a software startup and served on the advisory board of Naver, “The Google of South Korea.” When I am not writing or coding, I enjoy running, cooking, and reading. I run 10K almost every morning.