Chapter 4 Missing values
Missing values can complicate data exploration, therefore our aim is to find and remove them. After replacing the appropriate NA’s with 0, we use an R command that shows how many NA values are in each column of both the batting and pitching data.
## playerID name hits AVG SLG
## 0 0 0 0 0
## OBA G AB R H
## 0 0 0 0 0
## RBI 2B 3B HR SAC
## 0 0 0 0 0
## SB CS SO BB HBP
## 0 0 0 0 0
## TB GameAB GameH GameTB GameG
## 0 0 0 0 0
## GameR GameRBI Game2B Game3B GameHR
## 0 0 0 0 0
## GameSB H_should_have TB_should_have R_should_have RBI_should_have
## 0 0 0 0 0
## 2B_should_have 3B_should_have HR_should_have SB_should_have H_deviation
## 0 0 0 0 0
## TB_deviation R_deviation RBI_deviation 2B_deviation 3B_deviation
## 0 0 0 0 0
## HR_deviation SB_deviation
## 0 0
## playerID name throws ERA
## 0 0 0 0
## W L SV AVG_against
## 0 0 0 0
## G GS CG IP
## 0 0 0 0
## ER H SO BB
## 0 0 0 0
## HBP WP BK GameG
## 0 0 0 0
## GameIP GameER GameHits GameSO
## 0 0 0 0
## GameW GameL GameSV GameBB
## 0 0 0 0
## ER_should_have Hits_should_have SO_should_have W_should_have
## 0 0 0 0
## L_should_have SV_should_have BB_should_have ER_deviation
## 0 0 0 0
## Hits_deviation SO_deviation W_deviation L_deviation
## 0 0 0 0
## SV_deviation BB_deviation
## 0 0
As expected, we see that there are no NA values in either of the data sets, hence there is no missing data. Plotting the two data sets asserts this.