Chapter 1 Welcome

print("Hello, World!")

## [1] "Hello, World!"

Make simple things simple, and complex things possible. - Alan Kay

This is the website for Computational Thinking for Social Scientists, an open-access book designed to help social scientists think computationally and develop proficiency with modern computational tools and techniques. Mastering these skills enables researchers to collect, wrangle, analyze, and interpret data more efficiently—and with more enjoyment. It also empowers them to pursue research questions that might once have seemed impossible.

Horace Mann, the first great advocate of public education in the United States, once said:

“Education, then, beyond all other devices of human origin,
is a great equalizer of the conditions of men—the balance wheel of the social machinery.”

I believe in the transformative power of education. At the same time, I recognize that access to high-quality education remains deeply unequal. Too often, historically disadvantaged groups face greater barriers to both education and technology. This book is my small contribution toward realizing the democratic promise of education—especially in the emerging field of computational social science.

That said, this book is not a comprehensive guide to computational social science or any specific programming language, tool, or technique. For a broader introduction to the field, I highly recommend Matthew Salganik’s Bit by Bit (2017), which is comprehensive, accessible, and pedagogically rich.

This book is organized into two parts—Fundamentals and Applications—and includes eight main sessions:

1.1 Part I: Fundamentals

1.2 Part II: Applications

This book primarily uses R, with occasional use of Bash and Python.

1.3 Why R?

R is free, accessible (especially thanks to the tidyverse and RStudio), and cross-platform (Mac/Windows/Linux). It’s fast (thanks to Rcpp), extensible (with over 16,000 packages on CRAN), and supported by a large, inclusive community (see #rstats).

1.4 Why R + Python + Bash?

“For R and Python—Python is first and foremost a programming language. That has many strengths, but it usually means that to do data science in Python, you must first learn to program. With R, you can get up and running faster because a lot is built in—you don’t need to learn as many programming concepts. You can focus on being a great political scientist, or whatever you do, and just learn enough R to get things done.”
— Hadley Wickham

That said, the R ecosystem presents some challenges:

“Compared to other programming languages, the R community tends to focus more on results than on process. Software engineering practices such as source control or automated testing are not widely adopted. Inconsistencies persist across contributed packages and even within base R. The language reflects more than 20 years of evolution. R is not particularly fast, and poorly written R code can be painfully slow. It’s also memory-hungry.”
— Hadley Wickham

Still, the RStudio (now Posit) and tidyverse teams have made significant progress in addressing these limitations. This book will introduce some of those advances and show how combining R with Python and Bash can enhance your workflow.

If you’re serious about programming or software development, I strongly recommend learning Python as well. It can fill important gaps in your software engineering skills, which will in turn improve your R proficiency.

1.5 Special Thanks

This book is as much collected as it is authored. It is a remix of PS239T, a graduate-level course in computational methods at UC Berkeley originally developed by Rochelle Terman (now Assistant Professor of Political Science at the University of Chicago), and later revised by Rachel Bernhard (now Associate Professor at Nuffield College and the University of Oxford).

I taught PS239T as a TA in Spring 2018, as lead instructor in Spring 2019, and as co-instructor with Nick Kuipers (now Assistant Professor of Politics at Princeton) in Spring 2020.

Other materials are adapted from workshops I developed for D-Lab, the Data Science Discovery Program at UC Berkeley, and the Summer Institute in Computational Social Science at Howard University and Mathematica.

I have cited every source I am aware of, including books, articles, slides, blog posts, and videos.

1.6 Suggestions, Questions, or Comments

Please feel free to create an issue on the GitHub repository. If you find any typos, errors, or missing citations, I’d be grateful if you reported them there.

1.7 License

This work is licensed under a Creative Commons Attribution 4.0 International License.