Analyzing Baseball Data with R 2nd ed. (Chapman & Hall/CRC The R Series) paper 360 p. 18
By Marchi, Max; Albert, Jim; Baumer, Benjamin S.
Table of Contents
Introduction; The Lahman Database: Season-by-Season Data; Bonds, Aaron, Ruth, and Rodriguez home run trajectories; Obtaining the database; The Master table; The Batting table; The Pitching table; The Fielding table; The Teams table; Baseball questions; Retrosheet Game-by-Game Data; The McGwire and Sosa home run race; Retrosheet Game logs; Obtaining the game logs from Retrosheet; Game log example; Baseball questions; Retrosheet Play-by-Play Data; Event files; Event example; Baseball questions; Pitch-by-Pitch Data; MLBAM Gameday and PITCHf/x; PITCHf/x Example; Baseball questions; Player Movement and Off-the-Bat Data; PLAYER Statcast; Baseball Savant data; Baseball questions; Summary; Further Reading; Exercises
Introduction to R; Introduction; Installing R and RStudio; The Tidyverse; dplyr; The pipe; ggplot; Other packages; Data Frames; Career of Warren Spahn; Introduction; Manipulations with data frames; Merging and selecting from data frames; Vectors; Defining and computing with vectors; Vector functions; Vector index and logical variables; Objects and Containers in R; Character data and data frames; Factors; Lists; Collection of R Commands; R scripts; R functions; Reading and Writing Data in R; Importing data from a file; Saving datasets; Packages; Splitting, Applying and Combining Data; Iterating using map(); Another example; Getting Help; Further Reading; Exercises
Graphics; Introduction; Character Variable; A bar graph; Add axes labels and a title; Other graphs of a character variable; Saving Graphs; Numeric Variable: One-Dimensional Scatterplot and Histogram; Two Numeric Variables; Scatterplot; Building a graph, step-by-step; A Numeric Variable and a Factor Variable; Parallel stripcharts; Parallel boxplots; Comparing Ruth, Aaron, Bonds and A-Rod; Getting the data; Creating the player data frames; Constructing the graph; The Home Run Race; Getting the data; Extracting the variables; Constructing the graph; Further Reading; Exercises
The Relation Between Runs and Wins; Introduction; The Teams Table in the Lahman Database; Linear Regression; The Pythagorean Formula for Winning Percentage; The Exponent in the Pythagorean model; Good and Bad Predictions by the Pythagorean model; How Many Runs for a Win?; Further Reading; Exercises
Value of Plays Using Run Expectancy; The Run Expectancy Matrix; Runs Scored in the Remainder of the Innings; Creating the Matrix; Measuring Success of a Batting Play; Jose Altuve; Opportunity and Success for all Hitters; Position in the Batting Lineup; Run Values of Different Base Hits; Value of a home run; Value of a single; Value of Base Stealing; Further Reading; Exercises
Balls and Strikes Effects; Introduction; Hitter’s Counts and Pitcher’s Counts; An example for a single pitcher; Pitch sequences from Retrosheet; Functions for string manipulation; Finding plate appearances going through a given count; Expected run value by count; The importance of the previous count; Behavior by Count; Swinging tendencies by count; Propensity to swing by location; Effect of the ball/strike count; Pitch selection by count; Umpires' behavior by count; Further Reading; Exercises
Catcher Framing; Introduction; Acquiring Pitch-Level Data; Where is the Strike Zone?; Modeling Called Strike Percentage; Visualizing the estimates; Visualizing the estimated surface; Controlling for handedness; Modeling Catcher Framing; Further Reading; Exercises
Career Trajectories; Introduction; Mickey Mantle’s Batting Trajectory; Comparing Trajectories; Some preliminary work; Computing career statistics; Computing similarity scores; Defining age, OBP, SLG, and OPS variables; Fitting and plotting trajectories; General Patterns of Peak Ages; Computing all affected trajectories; Patterns of peak age over time; Peak age and career at-bats; Trajectories and Fielding Position; Further Reading; Exercises
Simulation; Introduction; Simulating a Half Inning; Markov chains; Review of work in run expectancy; Computing the transition probabilities; Simulating the Markov chain; Beyond run expectancy; Transition probabilities for individual teams; Simulating a Baseball Season; The Bradley-Terry model; Making up a schedule; Simulating talents and computing win probabilities; Simulating the regular season; Simulating the post-season; Function to simulate one season; Simulating many seasons; Further Reading; Exercises
Exploring Streaky Performances; Introduction; The Great Streak; Finding game hitting streaks; Moving batting averages; Streaks in Individual at-Bats; Streaks of hits and outs; Moving batting averages; Finding hitting slumps for all players; Were Ichiro Suzuki and Mike Trout unusually streaky?; Local Patterns of Statcast Launch Velocity; Further Reading; Exercises
Using a Database to Compute Park Factors; Introduction; Installing MySQL and Creating a Database; Connecting R to MySQL; Connecting using RMySQL; Connecting R to other SQL backends; Filling a MySQL Game Log Database; From Retrosheet to R; From R to MySQL; Querying Data From R; Introduction; Coors Field and run scoring; Building Your Own Baseball Database; Lahman's database; Retrosheet database; PITCHf/x database; Statcast database; Calculating Basic Park Factors; Loading the data into R; Home run park factor; Assumptions of the proposed approach; Applying park factors; Further Reading; Exercises
Batted Ball Data from Statcast; Introduction; Spray Charts; Acquiring a year's worth of Statcast data; Hitters' spray tendencies and infield defense; Launch Angles and Exit Velocities; Scatterplot of launch angle vs exit velocity; Modeling Home Run Probabilities; Generalized additive model; Smooth predictions; Using this model to estimate home run production; Are Launch Angles Skills?; Distribution of launch angle; Is launch angle a skill?; Further Reading; Exercises
Appendix A Retrosheet Files Reference; Appendix B Accessing and Using MLBAM Gameday and PITCHf/x Data; Appendix C Accessing and Using Statcast Data from Baseball-Savant