Introduction
Working with Web Data and APIs
Record Linkage
Databases
Scaling up through Parallel and Distributed Computing
Information Visualization
Machine Learning
Text Analysis
Networks: The Basics
Data Quality and Inference Errors
Bias and Fairness
Privacy and Confidentiality
Workbooks
Ian Foster, PhD, is a professor of computer
science at the University of Chicago as well as a senior scientist
and distinguished fellow at Argonne National Laboratory. His
research addresses innovative applications of distributed,
parallel, and data-intensive computing technologies to scientific
problems in such domains as climate change and biomedicine. Methods
and software developed under his leadership underpin many large
national and international cyberinfrastructures. He is a fellow of
the American Association for the Advancement of Science, the
Association for Computing Machinery, and the British Computer
Society. He earned a PhD in computer science from Imperial College
London.
Rayid Ghani is a professor in the Machine Learning
Department (in the School of Computer Science) and the Heinz
College of Information Systems and Public Policy at Carnegie Mellon
University. His research focuses on developing and using Machine
Learning, AI, and Data Science methods for solving high-impact
social good and public policy problems in a fair and equitable way
across criminal justice, education, healthcare, energy,
transportation, economic development, workforce development and
public safety. He is also the founder and director of the “Data
Science for Social Good” summer program for aspiring data
scientists to work on data mining, machine learning, big data, and
data science projects with social impact. Previously, Rayid Ghani
was a faculty member at the University of Chicago and, prior to
that, served as Chief Scientist for Obama for America (the Obama
2012 campaign).
Ron Jarmin, PhD, is the Deputy Director at the
U.S. Census Bureau. He earned a PhD in economics from the
University of Oregon and has published in the areas of industrial
organization, business dynamics, entrepreneurship, technology and
firm performance, urban economics, Big Data, data access and
statistical disclosure avoidance. He oversees the Census Bureau’s
large portfolio of data collection, research and dissemination
activities for critical economic and social statistics including
the 2020 Decennial Census of Population and Housing.
Frauke Kreuter, PhD, is a professor at the
University of Maryland in the Joint Program in Survey Methodology,
Professor of Statistics and Methodology at the University of
Mannheim and head of the Statistical Methods group at the Institute
for Employment Research in Nuremberg, Germany. She is the founder of
the International Program in Survey and Data Science, co-founder of
the Coleridge Initiative, fellow of the American Statistical
Association (ASA), and recipient of the WSS Cox and the ASA Links
Lecture Awards. Her research focuses on data quality, privacy, and
the effects of bias in data collection on statistical estimates and
algorithmic fairness.
Julia Lane, PhD, is a professor at the NYU Wagner
Graduate School of Public Service. She is also an NYU Provostial
Fellow for Innovation Analytics. She co-founded the Coleridge
Initiative as well as UMETRICS and STAR METRICS programs at the
National Science Foundation, established a data enclave at
NORC/University of Chicago, and co-founded the Longitudinal
Employer-Household Dynamics Program at the U.S. Census Bureau and
the Linked Employer Employee Database at Statistics New Zealand.
She is the author/editor of 10 books and the author of more than 70
articles in leading journals, including Nature and Science. She is
an elected fellow of the American Association for the Advancement
of Science and a fellow of the American Statistical
Association.
"Like the first edition, the new edition will continue to play an
important role for the intended audience and a wider professional
community. The much-needed second edition is timely and showcases a
wide range of examples and application examples from different
areas of the social sciences to demonstrate how the methods are
implemented using several real datasets. As expected with this kind
of book, the topics of this text are diverse in nature, but
interesting none the less. As it is well known, machine learning
techniques are subject to inherited bias in model selection and
consequently negatively impacts post estimation and prediction.
This new edition includes a new chapter on dealing with bias and
fairness in machine learning models, a much-needed fair and welcome
edition! Further, the authors have done an excellent job in
expanding the material on machine learning and text analysis. Like
the first edition, the main strength of the book is that it offers
a wide variety of applications that are based on real datasets
emerging from social science perspectives and useful for both
academic and professional communes. As Jupyter has become more
popular as the data scientists’ computational notebook of choice,
the book has new and improved hands-on Jupyter notebooks to
complement each chapter’s material. In conclusion, this new edition
has an impressive collection of material on useful and interesting
topics on big data. The book will be equally useful to graduate
students and researchers interested in gaining perspectives and
knowledge on this important topic. The new volume comprises of a
wealth of information, a kind of one-stop shop, and can be served
as a textbook and research reference book."
- S. Ejaz Ahmed, Brock University, Canada
Praise For First Edition
"This book builds a nice bridge connecting social science
and big data methodology. Big data such as social media and
electronic health records, empowered by the advances in information
technology, are an emerging phenomenon in recent years and present
unprecedented opportunities for social science research. This book
was written by pioneering scientists in applying big data methods
to address social science problems. As shown by numerous examples
in the book, social science could benefit significantly by
embracing the new mode of big data and taking advantage of the
technical progress in analysing such data. If you work in social
science and would like to explore the power of big data, this book
is clearly for you. Indeed, if you do not have previous experience
in dealing with big data, you should read this book first, before
implementing a big-data project.
As indicated by the title, this book acts as a practical guide and
targets readers with minimum big data experience, hence it is very
hands-on. … It covers all necessary steps to finish a big data
project: collecting raw data, cleaning and preprocessing data,
applying various modelling tools to analyze the data, evaluating
results, protecting privacy, and addressing ethical problems. … All
the important topics concerning big data are covered, making this
book a good reference that you should always keep on your
desk."
— Guoqiang Yu, Virginia Tech, in Journal of the American
Statistical Association, July 2017
"…In summary, although there is a
growing number of books related to social science and big data,
this volume contains several non-trivial aspects which make it
worth to have in the library, possibly along with other similar
textbooks as a good complement to them."
—Stefano M. Iacus, University of Milan, in Journal of Statistical
Software, June 2017
"This is a well-written book and showcases a
good number of examples and applications to demonstrate how the
methods are actually used in real life situation using real
datasets. Further, topics at hand are motivated by social science
data. … The chapters are nicely structured, well presented and
motivated by data examples. The main strength of the book is that
it still offers a good number of applications that are based on
real datasets emerging from social science perspectives. The book
will be useful to students, practitioners, and data analyst in the
respective fields. The editors did a very good job introducing the
book, it aims and goals, intendent audience, clarifying underneath
concepts and phrases, a must read before moving to other
chapters."
—S. Ejaz Ahmed, in Technometrics, April 2017
"Economists and Social
Scientist have a lot to learn from Machine Learning, and Engineers
have a lot to learn from Econometricians and Statisticians. This
two way sharing is long overdue and it is time to start the
conversation. This book is a tour-de-force for anyone interested in
participating in such a discussion."
—Roberto Rigobon, Society of Sloan Fellows Professor of Applied
Economics, MIT
"This ambitious sweep through data science techniques
provides an invaluable introduction to the toolbox of big data
methodologies, as applied to social science data. It provides
tremendous value not only to beginners in the field, but also to
experienced data scientists wishing round out their knowledge of
this broad and dynamic field."
—Kenneth Benoit, Department of Methodology, London School of
Economics and Political Science
"Most social scientists would agree
that ‘big data’ – the term we use to encapsulate the huge amount of
electronic information we generate in our everyday lives – provide
the potential for path-breaking research not just into our
economic, social, and political lives but also the physical
environment we create and inhabit. However, few have the knowledge,
or critically, the tools that equip them to realize this potential.
This book provides a bridge between computer science, statistics,
and the social sciences, demonstrating this new field of ‘data
science’ via practical applications. The book is remarkable in many
ways. It originates from classes taught by leading practitioners in
this area to federal agency research staff, drawing in particular
upon the example of a hugely successful project that linked federal
research spending to outcomes in terms of patents, job creation,
and the subsequent career development of researchers. By making
these workbooks accessible, the book takes the novice on a
step-by-step journey through complex areas such as database
dynamics, data linkage, text analysis, networks and data
visualization. The book is a treasure trove of information. It
leads the field in the important task of bringing together computer
science, statistics, and social science. I strongly recommend that
all social scientists with an interest in ‘big data’ immerse
themselves in this book."
—Professor Peter Elias CBE, University of Warwick
"The explosive
growth in big data and in new technologies to analyze these data is
transforming the practice of research in a variety of fields.
Foster, et al. provides a well-timed, valuable guide to the new
methods and tools associated with big data that can be used to
address critical research questions in the social science field.
The breadth of the material is impressive, providing a
comprehensive summary of the methods and tools as well as practical
guidance for their use. A key feature of the guide is the use of a
case study to illustrate how big data techniques can be used to
address a research question from beginning to end of the project,
including providing examples of computer code targeted to specific
steps in the project. Any researcher will find this unique guide to
be useful, and it is essential reading for any social science
practitioner that wants to use the best available data to conduct
influential research in the near future."
—Paul Decker, President and CEO, Mathematica Policy Research
"The
typical statistics pedagogy has changed. In the past, textbooks
assumed that data was hard to obtain, but neatly organized in a
single file. Today, data is very easy to obtain from a number of
data sources, often very messy, and analysts are now responsible
for organizing it in addition to deriving useful insights. Foster,
Ghani, Jarmin, Kreuter, and Lane have assembled a book that gives a
pointed overview of tools to facilitate the entire digital lifespan
of data in this era of analytics. Big Data and Social Science gives
an evenhanded look at the myriad of ways to obtain data--whether
scraping the web, web APIs, or databases--to conducting statistical
analysis to doing analysis when your data cannot fit on a single
computer. Meanwhile, they provide sound, diligent advice on
pitfalls that still, and will always, exist. A book like this is
useful for social scientists, experienced statisticians,
econometricians, and computer programmers who want to see the tools
available to them. It will also be a helpful text for a budding
data scientist who wants a fairly technical preview of the
landscape."
—Tom Schenk Jr., Chief Data Officer, City of Chicago
"In Big Data
and Social Science, the authors have deftly crafted one of the very
best "how-to" books on big data that researchers, enterprise
analysts, and government practitioners will find equally valuable.
From Nodes, to Edges, to Arcs, the book takes the reader along a
near-perfect path to understanding the fundamental elements of
constructing a practical and realistic model for Big Data Analysis
that any organization can execute by simply following the path
outlined in this book. Elegant in its simplicity, Big Data and
Social Science is one of those books that every research group and
data-analysis team will want to have on their reference shelf."
—Tom Herzog, Former Deputy Commissioner, NY State Department of
Corrections and Community Supervision
"The book includes a large volume of condensed information in which
the concepts are very well sketched around the right questions
along with examples explained in detail. [...] Through the
approached topics, the way of structuring and presenting
information, the wealth of resources provided “Big Data and Social
Science: A Practical Guide to Methods and Tools” offers to social
sciences students and researches a high-quality theoretical and
practical content."
-Anca Vitcu, International Society for Clinical Biostatistics, 72, 2021