Preface
About the Authors

Introduction: Data Science, Many Skills
What Is Data Science?
The Steps in Doing Data Science
The Skills Needed to Do Data Science

Chapter 1 • About Data
Storing Data-Using Bits and Bytes
Combining Bytes Into Larger Structures
Creating a Data Set in R

Chapter 2 • Identifying Data Problems
Talking to Subject Matter Experts
Looking for the Exception
Exploring Risk and Uncertainty

Chapter 3 • Getting Started With R
Installing R
Using R
Creating and Using Vectors

Chapter 4 • Follow the Data
Understand Existing Data Sources
Exploring Data Models

Chapter 5 • Rows and Columns
Creating Dataframes
Exploring Dataframes
Accessing Columns in a Dataframe

Chapter 6 • Data Munging
Reading a CSV Text File
Removing Rows and Columns
Renaming Rows and Columns
Cleaning Up the Elements
Sorting Dataframes

Chapter 7 • Onward With RStudio®
Using an Integrated Development Environment
Installing RStudio
Creating R Scripts

Chapter 8 • What's My Function?
Why Create and Use Functions?
Creating Functions in R
Testing Functions
Installing a Package to Access a Function

Chapter 9 • Beer, Farms, and Peas and the Use of Statistics
Historical Perspective
Sampling a Population
Understanding Descriptive Statistics
Using Descriptive Statistics
Using Histograms to Understand a Distribution
Normal Distributions

Chapter 10 • Sample in a Jar
Sampling in R
Repeating Our Sampling
Law of Large Numbers and the Central Limit Theorem
Comparing Two Samples

Chapter 11 • Storage Wars
Importing Data Using RStudio
Accessing Excel Data
Accessing a Database
Comparing SQL and R for Accessing a Data Set
Accessing JSON Data

Chapter 12 • Pictures Versus Numbers
A Visualization Overview
Basic Plots in R
Using ggplot2
More Advanced ggplot2 Visualizations

Chapter 13 • Map Mashup
Creating Map Visualizations With ggplot2
Showing Points on a Map
A Map Visualization Example

Chapter 14 • Word Perfect
Reading in Text Files
Using the Text Mining Package
Creating Word Clouds

Chapter 15 • Happy Words?
Sentiment Analysis
Other Uses of Text Mining

Chapter 16 • Lining Up Our Models
What Is a Model?
Linear Modeling
An Example-Car Maintenance

Chapter 17 • Hi Ho, Hi Ho-Data Mining We Go
Data Mining Overview
Association Rules Data
Association Rules Mining
Exploring How the Association Rules Algorithm Works

Chapter 18 • What's Your Vector, Victor?
Supervised and Unsupervised Learning
Supervised Learning via Support Vector Machines
Support Vector Machines in R

Chapter 19 • Shiny® Web Apps
Creating Web Applications in R
Deploying the Application

Chapter 20 • Big Data? Big Deal!
What Is Big Data?
The Tools for Big Data

Index
Jeffrey S. Saltz is currently an Associate Professor at Syracuse University, in the School of Information Studies. His research and teaching focus on helping organizations leverage information technology and data for competitive advantage. Specifically, Jeff's research focuses on the socio-technical aspects of data science projects, such as how to coordinate and manage data science teams. To stay connected to the "real world," Jeff consults with clients ranging from professional football teams to Fortune 500 organizations. Before becoming a professor, Jeff spent more than 20 years in industry, focused on leveraging emerging technologies and data analytics to deliver innovative business solutions. In his last corporate role, at JPMorgan Chase, he reported to the firm's Chief Information Officer and drove technology innovation across the organization. Jeff also held several other key technology management positions at the company, including CTO and Chief Information Architect. He also served as chief technology officer and principal investor at Goldman Sachs, where he invested in and helped incubate technology start-ups. He started his career as a programmer, project leader, and consulting engineer with Digital Equipment Corp. Jeff holds a B.S. degree in computer science from Cornell University, an M.B.A. from The Wharton School at the University of Pennsylvania, and a Ph.D. in Information Systems from the New Jersey Institute of Technology.

Jeffrey M. Stanton, Ph.D. (University of Connecticut, 1997) is Associate Provost of Academic Affairs and Professor of Information Studies at Syracuse University. Dr. Stanton's research focuses on organizational behavior and technology. He is the author of Information Nation: Educating the Next Generation of Information Professionals (2010), with Dr. Indira Guzman and Dr. Kathryn Stam.
Stanton has also published many scholarly articles in peer-reviewed behavioral science journals, such as the Journal of Applied Psychology, Personnel Psychology, and Human Performance. His articles also appear in the Journal of Computational Science Education, Computers and Security, Communications of the ACM, Computers in Human Behavior, the International Journal of Human-Computer Interaction, Information Technology and People, the Journal of Information Systems Education, the Journal of Digital Information, Surveillance and Society, and Behaviour & Information Technology. He has also published numerous book chapters on data science, privacy, research methods, and program evaluation. Dr. Stanton's methodological expertise is in psychometrics, with published works on the measurement of job satisfaction and job stress. Dr. Stanton's research has been supported through 18 grants and supplements, including the National Science Foundation's CAREER award.