Introduction
Why this book?
Defining big data and its value
Social science, inference, and big data
Social science, data quality, and big data
New tools for new data
The book’s "use case"
The structure of the book
Resources
Capture and Curation
Working with Web Data and APIs
Introduction
Scraping information from the web
New data in the research enterprise
A functional view
Programming against an API
Using the ORCID API via a wrapper
Quality, scope, and management
Integrating data from multiple sources
Working with the graph of relationships
Bringing it together: Tracking pathways to impact
Summary
Resources
Acknowledgements and copyright
Record Linkage
Motivation
Introduction to record linkage
Preprocessing data
Classification
Record linkage and data protection
Summary
Resources
Databases
Introduction
DBMS: When and why
Relational DBMSs
Linking DBMSs and other tools
NoSQL databases
Spatial databases
Which database to use?
Summary
Resources
Programming with Big Data
Introduction
The MapReduce programming model
Apache Hadoop MapReduce
Apache Spark
Summary
Resources
Modeling and Analysis
Machine Learning
Introduction
What is machine learning?
The machine learning process
Problem formulation: Mapping a problem to machine learning methods
Methods
Evaluation
Practical tips
How can social scientists benefit from machine learning?
Advanced topics
Summary
Resources
Text Analysis
Understanding what people write
How to analyze text
Approaches and applications
Evaluation
Text analysis tools
Summary
Resources
Networks: The Basics
Introduction
Network data
Network measures
Comparing collaboration networks
Summary
Resources
Inference and Ethics
Information Visualization
Introduction
Developing effective visualizations
A data-by-tasks taxonomy
Challenges
Summary
Resources
Errors and Inference
Introduction
The total error paradigm
Illustrations of errors in big data
Errors in big data analytics
Some methods for mitigating, detecting, and compensating for errors
Summary
Resources
Privacy and Confidentiality
Introduction
Why is access at all important?
Providing access
The new challenges
Legal and ethical framework
Summary
Resources
Workbooks
Introduction
Environment
Workbook details
Resources
Bibliography
Ian Foster is a professor of computer science at the University of
Chicago as well as a senior scientist and distinguished fellow at
Argonne National Laboratory. His research addresses innovative
applications of distributed, parallel, and data-intensive computing
technologies to scientific problems in such domains as climate
change and biomedicine. Methods and software developed under his
leadership underpin many large national and international
cyberinfrastructures. He is a fellow of the American Association
for the Advancement of Science, the Association for Computing
Machinery, and the British Computer Society. He received a PhD in
computer science from Imperial College London.
Rayid Ghani is the director of the Center for Data Science and
Public Policy, research director at the Computation Institute, and
senior fellow at the Harris School of Public Policy at the
University of Chicago. His research focuses on using machine
learning and data science for high-impact social good and public
policy problems in areas such as education, healthcare, energy,
transportation, economic development, and public safety.
Ron S. Jarmin is the assistant director for research and
methodology at the U.S. Census Bureau, where he oversees a broad
research program in statistics, survey methodology, and economics
to improve economic and social measurement within the U.S. federal
statistical system. He is the author of many papers in the areas of
industrial organization, business dynamics, entrepreneurship,
technology and firm performance, urban economics, data access, and
statistical disclosure avoidance. He earned a PhD in economics from
the University of Oregon.
Frauke Kreuter is a professor at both the University of Maryland
and the University of Mannheim. She is also head of the Statistical
Methods Group at the Institute for Employment Research in Germany.
Among her over 100 publications are several textbooks in survey
statistics and data analysis. She established the International
Program in Survey and Data Science and is a fellow of the American
Statistical Association. She received a PhD from the University of
Konstanz.
Julia Lane is a professor at the NYU Wagner Graduate School of
Public Service and the NYU Center for Urban Science and Progress.
She is also an NYU Provostial Fellow for Innovation Analytics. She
co-founded the UMETRICS and STAR METRICS programs at the National
Science Foundation, established a data enclave at NORC/University
of Chicago, and co-founded the Longitudinal Employer-Household
Dynamics Program at the U.S. Census Bureau and the Linked Employer
Employee Database at Statistics New Zealand. She is the
author/editor of 10 books and the author of over 70 articles in
leading journals, including Nature and Science. She is an elected
fellow of the American Association for the Advancement of Science
and a fellow of the American Statistical Association. She received
a PhD in economics from the University of Missouri.
"This book builds a nice bridge connecting social science and big
data methodology. Big data such as social media and electronic
health records, empowered by the advances in information
technology, are an emerging phenomenon in recent years and present
unprecedented opportunities for social science research. This book
was written by pioneering scientists in applying big data methods
to address social science problems. As shown by numerous examples
in the book, social science could benefit significantly by
embracing the new mode of big data and taking advantage of the
technical progress in analysing such data. If you work in social
science and would like to explore the power of big data, this book
is clearly for you. Indeed, if you do not have previous experience
in dealing with big data, you should read this book first, before
implementing a big-data project.
As indicated by the title, this book acts as a practical guide and
targets readers with minimum big data experience, hence it is very
hands-on. … It covers all necessary steps to finish a big data
project: collecting raw data, cleaning and preprocessing data,
applying various modelling tools to analyze the data, evaluating
results, protecting privacy, and addressing ethical problems. … All
the important topics concerning big data are covered, making this
book a good reference that you should always keep on your
desk."
— Guoqiang Yu, Virginia Tech, in Journal of the American
Statistical Association, July 2017
"…In summary, although there is
a growing number of books related to social science and big data,
this volume contains several non-trivial aspects which make it
worth to have in the library, possibly along with other similar
textbooks as a good complement to them."
—Stefano M. Iacus, University of Milan, in Journal of Statistical
Software, June 2017
"This is a well-written book and showcases a
good number of examples and applications to demonstrate how the
methods are actually used in real-life situations using real
datasets. Further, topics at hand are motivated by social science
data. … The chapters are nicely structured, well presented and
motivated by data examples. The main strength of the book is that
it still offers a good number of applications that are based on
real datasets emerging from social science perspectives. The book
will be useful to students, practitioners, and data analysts in the
respective fields. The editors did a very good job introducing the
book, its aims and goals, intended audience, clarifying underneath
concepts and phrases, a must read before moving to other
chapters."
—S. Ejaz Ahmed, in Technometrics, April 2017
"Economists and
Social Scientists have a lot to learn from Machine Learning, and
Engineers have a lot to learn from Econometricians and
Statisticians. This two way sharing is long overdue and it is time
to start the conversation. This book is a tour-de-force for anyone
interested in participating in such a discussion."
—Roberto Rigobon, Society of Sloan Fellows Professor of Applied
Economics, MIT
"This ambitious sweep through data science
techniques provides an invaluable introduction to the toolbox of
big data methodologies, as applied to social science data. It
provides tremendous value not only to beginners in the field, but
also to experienced data scientists wishing to round out their
knowledge of this broad and dynamic field."
—Kenneth Benoit, Department of Methodology, London School of
Economics and Political Science
"Most social scientists would agree
that ‘big data’ – the term we use to encapsulate the huge amount of
electronic information we generate in our everyday lives – provide
the potential for path-breaking research not just into our
economic, social, and political lives but also the physical
environment we create and inhabit. However, few have the knowledge,
or critically, the tools that equip them to realize this potential.
This book provides a bridge between computer science,
statistics, and the social sciences, demonstrating this new
field of ‘data science’ via practical applications. The book is
remarkable in many ways. It originates from classes taught by
leading practitioners in this area to federal agency research
staff, drawing in particular upon the example of a hugely
successful project that linked federal research spending to
outcomes in terms of patents, job creation, and the subsequent
career development of researchers. By making these workbooks
accessible, the book takes the novice on a step-by-step journey
through complex areas such as database dynamics, data linkage, text
analysis, networks and data visualization. The book is a treasure
trove of information. It leads the field in the important task of
bringing together computer science, statistics, and social
science. I strongly recommend that all social scientists with an
interest in ‘big data’ immerse themselves in this book."
—Peter Elias CBE, University of Warwick
"The explosive growth in
big data and in new technologies to analyze these data is
transforming the practice of research in a variety of fields.
Foster, et al. provides a well-timed, valuable guide to the new
methods and tools associated with big data that can be used to
address critical research questions in the social science field.
The breadth of the material is impressive, providing a
comprehensive summary of the methods and tools as well as practical
guidance for their use. A key feature of the guide is the use of a
case study to illustrate how big data techniques can be used to
address a research question from beginning to end of the project,
including providing examples of computer code targeted to specific
steps in the project. Any researcher will find this unique guide to
be useful, and it is essential reading for any social science
practitioner that wants to use the best available data to conduct
influential research in the near future."
—Paul Decker, President and CEO, Mathematica Policy Research
"The
typical statistics pedagogy has changed. In the past, textbooks
assumed that data was hard to obtain, but neatly organized in a
single file. Today, data is very easy to obtain from a number of
data sources, often very messy, and analysts are now responsible
for organizing it in addition to deriving useful insights. Foster,
Ghani, Jarmin, Kreuter, and Lane have assembled a book that gives a
pointed overview of tools to facilitate the entire digital lifespan
of data in this era of analytics. Big Data and Social Science gives
an evenhanded look at the myriad of ways to obtain data--whether
scraping the web, web APIs, or databases--to conducting statistical
analysis to doing analysis when your data cannot fit on a single
computer. Meanwhile, they provide sound, diligent advice on
pitfalls that still, and will always, exist. A book like this is
useful for social scientists, experienced statisticians,
econometricians, and computer programmers who want to see the tools
available to them. It will also be a helpful text for a budding
data scientist who wants a fairly technical preview of the
landscape."
—Tom Schenk Jr., Chief Data Officer, City of Chicago
"In Big
Data and Social Science, the authors have deftly crafted one of the
very best "how-to" books on big data that researchers, enterprise
analysts, and government practitioners will find equally valuable.
From Nodes, to Edges, to Arcs, the book takes the reader along a
near-perfect path to understanding the fundamental elements of
constructing a practical and realistic model for Big Data Analysis
that any organization can execute by simply following the path
outlined in this book. Elegant in its simplicity, Big Data and
Social Science is one of those books that every research group and
data-analysis team will want to have on their reference shelf."
—Tom Herzog, Former Deputy Commissioner, NY State Department
of Corrections and Community Supervision
"This book offers a
radically different programme of statistical training for those
social scientists looking to engage with "big data". Individual
chapters cover techniques and analytical approaches, each one
introducing software you may have heard about but not used (e.g.
Python, SQL, Hadoop, Tableau). The sheer scale of the task is
ideally designed for teams of people with skills across computing,
data and statistics, as well as hardware support of an
institutional nature. Thus, the text covers a full course of big
data skills with substantive examples drawn from social data
available online. Exercises are supported by Jupyter workbooks
which, using Anaconda, follow through description in the text, each
one offering additional and complementary programming and analysis
skills: a prodigious challenge. Examples of analysis both
comprehensible and tractable for the exposition remain
unsatisfyingly simple, but the references are extensive and
impressive. ...The initial gushing enthusiasm of the author is soon
tempered by practical and critical consideration of tools, but
overall the book promises more than is possible to deliver. The
result is a spare text, heavy on technical skills, which moves
efficiently through the subject without quite giving confidence for
application to real substantive questions."
-Thomas King, Biometrics June 2018
"...It is clear that the editors
have put a lot of thought into the structure and organization of
this book …the book demonstrates how to collect information about
grants and about people funded on grants from web sources of the
funding agencies and universities, how to link information coming
from different sources, how to store and organize such data in ways
that allow for quick summarization and data exploration, as well
as for convenient extraction of data for further research. The parts
of the book with cautionary tales and advice regarding limitations
of data science as an approach to carry out social science research
should be required reading for all those involved in data
science...Overall, the book could be useful to researchers and data
analysts who would like to understand overarching ideas of data
science and big data analysis. Social scientists interested in the
topic will gain knowledge of basic steps required for working with
big data and will deepen their understanding of the tools and the
associated language used by the data science community. The book
could also be used by instructors of graduate and undergraduate
courses that touch on big data."
-Elena A. Erosheva, University of Washington