I am a PhD candidate in Political Science and a D-Lab Senior Data Science Fellow at UC Berkeley, as well as a Visiting Fellow at the P3 Lab at the SNF Agora Institute at Johns Hopkins University. I use data science to study political learning, organizing, and mobilization among marginalized populations. I also build tools that make social science research more efficient and reproducible.
Dissertation Research
My dissertation provides an original large-scale dataset and research software that enable data-intensive historical research on the politics of racial and ethnic minority groups. This project won WPSA’s Don T. Nakanishi Award for Distinguished Scholarship and Service in Asian Pacific American Politics. Portions of this work are published in Studies in American Political Development and conditionally accepted at the Journal of Computational Social Science.
Other Research
My research on bias in machine learning applications appears in Proceedings of the Fourteenth International Conference on Web and Social Media (ICWSM), Data Challenge Workshop.
Research Software
I authored and maintain three R packages: tidytweetjson, tidyethnicnews, and makereproducible.
Teaching Computational Social Science
I am a proud recipient of the “Outstanding Graduate Student Instructor Award” and have taught computational social science at both the graduate and undergraduate levels, in semester-long courses and short workshops. I have served as a Data Science Education Program Fellow at UC Berkeley and advised more than 40 applied data science projects, working with community partners and undergraduate students. I also co-organized the Summer Institute in Computational Social Science in the San Francisco Bay Area, with a thematic focus on using computational social science for social good. I am currently working on a book project titled “Computational Thinking for Social Scientists.”
I am on the job market this year and will graduate in May 2021. Please feel free to contact me via email at jaeyeonkim@berkeley.edu.
PhD Candidate in Political Science, 2016-present
UC Berkeley
MA in Political Science, 2016
UC Berkeley
This project examines the impact of the War on Poverty programs on the formation of Asian American and Latino community organizations, drawing on original large-scale organizational and text data.
This project turns historical ethnic newspaper articles into data to trace how issues varied among minority groups during the civil rights era.
This project combines computational text analysis and a natural experiment to identify the causal effects of threats on information seeking among minority group members.
This project unpacks how different racial minority groups experience marginalization differently by applying factor analysis to one of the largest multiracial surveys conducted in the US.
This project analyzes how survey respondents interpret questions on racial solidarity using a within-group survey experimental design.
This project investigates to what extent and how South Koreans hold biased attitudes towards North Korean refugees by embedding list experiments in a nationwide mobile survey.
Kim, Jae Yeon. Why Teaching Social Scientists How To Code Like A Professional Is Important. UC Berkeley D-Lab. September 23, 2020.
Haber, Jaren, Jae Yeon Kim, and Nick Camp. BAY-SICSS: Bridging Computational Social Scientists and Practitioners for Social Good. Berkeley Institute for Data Science. September 15, 2020.
Kim, Jae Yeon. Five Principles to Get Undergraduates Involved in Real-world Data Science Projects. SAGE Ocean. June 24, 2020.
Kim, Jae Yeon. How I Accidentally Became Interested in Data Science. UC Berkeley D-Lab. February 24, 2020.
tidytweetjson: R package for turning Tweet JSON files into a cleaned and wrangled dataset. The package takes 4 minutes to turn 2 million tweets into a tidy dataframe.
tidyethnicnews: R package for turning search results from one of the largest databases of ethnic newspapers and magazines published in the United States into a cleaned and wrangled dataset. The package takes 12.34 seconds to turn 5,684 articles into a tidy dataframe.
makereproducible: R package for making a project computationally reproducible before sharing it.
This original dataset traces the founding of Asian American and Latino advocacy and community service organizations over the last century. The dataset includes about 299 Asian American and 519 Latino advocacy and community service organizations. Each observation includes the organization's title, its founding year, its physical address, and whether it operates as an advocacy, a community service, or a hybrid (active in both types of work) organization. Source materials were mainly collected from the following four databases: the Encyclopedia of Associations – National Organizations of the U.S., the Encyclopedia of Associations – Regional, State, and Local Organizations of the U.S., the National Directory of Nonprofit Organizations, and the National Center for Charitable Statistics. The funding information comes from Foundation Directory Online, which has compiled data from all 140,000 U.S. philanthropic foundations (2003-2017) and federal agencies (2014-2017).
I was born and raised in South Korea, but by the time I finished college, I had also lived in Hong Kong and Taiwan. I moved to California in 2014 as a graduate student in political science at UC Berkeley. Before coming to the States, I worked in the tech industry in South Korea. I was a strategy manager at a software startup and, at age 27, served as the youngest member of the advisory board of Naver, “The Google of South Korea.” I also published a popular-press book (in Korean) on how to get the best out of college, which sold more than 10,000 copies. I love working with real, messy data, finding rigorous and practical ways to analyze it, and communicating the results with visual illustrations and plain words. My passion is finding patterns and building tools.
A collection of resources, mostly online (indicated by *), that I find useful for learning data science, applied statistics, programming, data visualization, machine learning, database management, and math.