Jae Yeon Kim

Computational social scientist

UC Berkeley

I am a PhD candidate in Political Science and a D-Lab Senior Data Science Fellow at UC Berkeley, as well as a Visiting Student Fellow at the P3 Lab at the SNF Agora Institute at Johns Hopkins University. I study political learning, organizing, and mobilization among marginalized populations using big data and data science. I also build tools that make social science research more efficient and reproducible.

Dissertation Research

My dissertation provides an original large-scale dataset and research software that enable data-intensive historical research on the politics of racial and ethnic minority groups. This project won WPSA’s Don T. Nakanishi Award for Distinguished Scholarship and Service in Asian Pacific American Politics. Portions of this work are published in Studies in American Political Development and conditionally accepted at the Journal of Computational Social Science.

Other Research

My research on bias in machine learning applications appears in Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop.

Research Software

I authored and maintain three R packages: tidytweetjson, tidyethnicnews, and makereproducible.

Teaching Computational Social Science

I am a proud recipient of the “Outstanding Graduate Student Instructor Award” and have taught computational social science at both graduate and undergraduate levels in semester-long courses and short workshops. I served as a Data Science Education Program Fellow at UC Berkeley and advised more than 40 applied data science projects, working with community partners and undergraduate students. I also co-organized the Summer Institute in Computational Social Science in the San Francisco Bay Area with a thematic focus on using computational social science for social good. I am currently working on an open textbook project titled “Computational Thinking for Social Scientists.”

Contact Me

I love getting emails.


I am on the job market this year and will graduate in May 2021.


  • Computational social science
  • Racial and ethnic politics
  • Historical social science
  • Political behavior


  • PhD Candidate in Political Science, 2016~

    UC Berkeley

  • MA in Political Science, 2016

    UC Berkeley



Large-scale Twitter Analysis on COVID-19 and Anti-Asian Climate

This project traces how COVID-19 shaped an anti-Asian climate on Twitter, drawing on more than 1 million US-located tweets.

Dataset Bias in Machine Learning

This project identifies and estimates intersectional bias in datasets of hate speech and abusive language.

Policy Impact on Community Organizing

This project examines the impact of the War on Poverty programs on the formation of Asian American and Latino community organizations, drawing on original large-scale organizational and text data.

Turning Ethnic Newspapers into Data

This project turns historical ethnic newspaper articles into data to trace how issues varied among minority groups in the era of civil rights.

Causal Inference Using Text Data and Natural Experiments

This project combines computational text analysis and a natural experiment to identify the causal effects of threats on information seeking among minority group members.

Categorizing Marginal Experience Using Survey

This project unpacks how different racial minority groups experience marginalization differently by applying factor analysis to one of the largest multi-racial surveys conducted in the US.

Making Survey Questions More Interpretable

This project analyzes how survey respondents interpret questions on racial solidarity using a within-group survey experimental design.

Detecting Sensitive Attitudes

This project investigates to what extent and how South Koreans hold biased attitudes towards North Korean refugees by embedding list experiments in a nationwide mobile survey.

Peer-reviewed articles

(2020). Intersectional Bias in Hate Speech and Abusive Language Datasets. Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop.

PDF Code Slides

Other publications

Kim, Jae Yeon. Why Teaching Social Scientists How To Code Like A Professional Is Important. UC Berkeley D-Lab. September 23, 2020.

Haber, Jaren, Jae Yeon Kim, and Nick Camp. BAY-SICSS: Bridging Computational Social Scientists and Practitioners for Social Good. Berkeley Institute of Data Science. September 15, 2020.

Kim, Jae Yeon. Five Principles to Get Undergraduates Involved in Real-world Data Science Projects. SAGE Ocean. June 24, 2020.

Kim, Jae Yeon. How I Accidentally Became Interested in Data Science. UC Berkeley D-Lab. February 24, 2020.


  • tidytweetjson: R package for turning Tweet JSON files into a cleaned and wrangled dataset. The package takes 4 minutes to turn 2 million tweets into a tidy dataframe.
  • tidyethnicnews: R package for turning search results from one of the largest databases on ethnic newspapers and magazines published in the United States into a cleaned and wrangled dataset. The package takes 12.34 seconds to turn 5,684 articles into a tidy dataframe.
  • makereproducible: R package for making a project computationally reproducible before sharing it.


This original dataset traces the founding of Asian American and Latino advocacy and community service organizations over the last century. The dataset includes about 299 Asian American and 519 Latino advocacy and community service organizations. Each observation includes the organization title, the founding year, the physical address, and whether the organization operates as an advocacy, a community service, or a hybrid (active in both types of work) organization. Source materials were mainly collected from the following four databases: the Encyclopedia of Associations – National Organizations of the U.S., the Encyclopedia of Associations – Regional, State, and Local Organizations of the U.S., the National Directory of Nonprofit Organizations, and the National Center for Charitable Statistics. The funding information comes from Foundation Directory Online, which has compiled data from all 140,000 U.S. philanthropic foundations (2003-2017) and federal agencies (2014-2017).


Teaching Awards and Training

Graduate Seminars:

Undergraduate Lectures:

  • Graduate student instructor for Laura Stoker, Introduction to Empirical Analysis and Quantitative Methods, Department of Political Science, UC Berkeley, Fall 2016




  • Consultant, D-Lab, UC Berkeley, May 2019 - Present
    • Expertise: R, Python, applied statistics, machine learning, and data management
    • Please use this link to make a consulting request and use this link to schedule a meeting.


Please view my CV or resume.


I was born and raised in South Korea, but by the time I finished college, I had also lived in Hong Kong and Taiwan. I moved to California in 2014 as a graduate student in political science at UC Berkeley. Prior to coming to the States, I worked in the tech industry in South Korea. I was a strategy manager at a software startup and served on the advisory board of Naver, “The Google of South Korea,” as its youngest member at age 27. I also published a popular-press book on how to get the best out of college (in Korean), which sold more than 10,000 copies. My passion is finding patterns and building tools. When I don’t write or code, I enjoy being remarkably mediocre at running and cooking, among other things.


Mostly online resources (indicated by *) I find useful for learning data science, applied statistics, programming, data visualization, machine learning, database management, and math.

Data science

Applied statistics


R programming

Python programming

UNIX programming

Programming in general

Data visualization

Machine learning

Intro to Machine Learning

Intro to Deep learning

Machine learning in R