
Last week a Fan Post explored whether Spring Training results are highly correlated with regular season results. With 8 losses in a row, it felt like the fanbase was being pulled in two directions: some believed 2020 would be disastrous for the Dbacks, while others found zero reason for concern in these early results. The data across MLB showed an extremely weak correlation between Spring Training W-L records and regular season W-L records (vindicating the latter group of Snake fans), but it also showed that a Spring Training record gives a predictable range for where the regular season W-L record will land. Thus, if the Dbacks had continued to lose in Spring Training and ended with a record at or below 0.400, the data suggested they would struggle to break 0.500 in the regular season and would have a very small chance at a post-season appearance. Luckily, our Dbacks have gone on a streak of ties and wins lately as the starters take a more active role in Spring Training!
The next assessment to make is whether there is a correlation between batting metrics in Spring Training and the regular season. Pitching and fielding metrics will be left aside: the former are suspected to be skewed as pitchers get back into form, and the latter are skewed as position players move around the field more frequently. But batting is “static” – you get to the plate, you dig in, you try to hit the ball… there isn’t much else to it (in very general terms). This analysis was more difficult than the last one because the data set is much, much larger than team W-L records and Spring Training “stats” are spotty prior to the 2018 season.
Some general details about the data collection:
- Only 2019 and 2018 are analyzed
- To be considered, a player must have >36 plate appearances in Spring Training and have >150 plate appearances in the regular season
- Only SO%, SLG%, and OPS are examined; again, this comes down to data availability more than anything
Why >36 plate appearances or >150 plate appearances, you ask? We need each player’s sample to be large enough that we are not knowingly introducing outliers into the data. Does this mean Chris Owings in 2019 will be left out of this analysis? Yes. (He hasn’t been a Dback for years, you need to let him go, people.)
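For anyone who wants to replicate the cut, here is a minimal sketch of that filter in Python. The `PlayerSeason` record, its field names, and the sample rows are made up for illustration; only the two thresholds come from the criteria above.

```python
from dataclasses import dataclass

# Thresholds from the criteria above; both are strict ("more than").
MIN_SPRING_PA = 36
MIN_SEASON_PA = 150


@dataclass
class PlayerSeason:
    # Hypothetical record layout, not taken from any real data feed.
    name: str
    spring_pa: int   # Spring Training plate appearances
    season_pa: int   # regular season plate appearances


def qualifies(p: PlayerSeason) -> bool:
    """Return True when the player clears both plate-appearance minimums."""
    return p.spring_pa > MIN_SPRING_PA and p.season_pa > MIN_SEASON_PA


# Made-up rows for illustration only.
sample = [
    PlayerSeason("Player A", 50, 600),
    PlayerSeason("Player B", 36, 400),  # exactly 36 spring PA: excluded
    PlayerSeason("Player C", 45, 120),  # too few regular season PA
]
kept = [p.name for p in sample if qualifies(p)]
print(kept)
```

Note that a player sitting exactly on a threshold is dropped, since the criteria say "more than" rather than "at least".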
Let’s go to the data! Like last time, we’ll be using a linear regression between Spring Training and regular season results. An R2 (R-squared) value will be generated to assess the correlation between the two datasets: values closer to ‘1’ indicate that Spring Training numbers explain most of the variation in regular season numbers, while values closer to ‘0’ indicate that they explain almost none of it and no meaningful linear correlation is present.
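For the curious, here is a rough sketch of how an R2 value for a simple linear regression can be computed by hand in Python. This is not necessarily the tool used to produce the numbers in this post; the function name `r_squared` and the sample OPS values are hypothetical.

```python
def r_squared(x, y):
    """R2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined when either sample is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: the variation the fitted line fails to explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / syy


# Made-up OPS values for five imaginary players, for illustration only.
spring_ops = [0.581, 0.910, 0.750, 0.640, 0.820]
season_ops = [0.804, 0.760, 0.700, 0.790, 0.730]
print(round(r_squared(spring_ops, season_ops), 3))
```

In practice, `x` would hold each qualifying player's Spring Training value for a metric and `y` the same player's regular season value, with one R2 computed per metric per year.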
Strike-out percent (SO%):
2018 R2 = 0.241
2019 R2 = 0.011
Slugging percent (SLG%):
2018 R2 = 0.041
2019 R2 = 0.023
On-Base Plus Slugging (OPS):
2018 R2 = 0.027
2019 R2 = 0.009
Whoa! So much data to parse through! What’s the overall message here though?
- SO%: While 2018 did show a stronger degree of correlation between Spring Training and the regular season, the correlation disappears in the 2019 data. That suggests the relationship is highly volatile year-to-year, depending on who batters are facing and their approach at the plate (for whatever it is worth, a larger number of batters strike out more often in the regular season than in Spring Training). Being able to go back to 2017, 2016, etc. and look at this volatility would be good, but the data does not appear to be available at a mass scale.
- SLG%: Neither 2018 nor 2019 shows any meaningful degree of correlation. Spring Training SLG% is therefore not a useful predictor of regular season SLG%.
- OPS: This metric has nearly zero correlation between Spring Training and the regular season. If someone asks you to place a bet that David Peralta’s OPS will be consistent between spring and summer, don’t take that bet! (By the way, he posted 0.581 in 2019 ST and 0.804 in 2019 RS, which is an incredibly large difference.)
An interesting detail that catches the eye is that R2 values decreased rather substantially in each category from 2018 to 2019. I wonder if this reflects an overall change in how players, teams, or coaches are treating Spring Training today compared to prior years (I’ll again bring up “Sabermetrics”, which has altered how every individual in MLB is treating the game) or whether it is coincidental because we’re only able to look at two years of data.
The error between datasets at the player level also looks to be tremendously higher than what was observed for team W-L records. In this case, I wouldn’t place any floor or ceiling on a player’s SO%, SLG%, or OPS based on this analysis. One interesting question is whether the correlation would improve or the error would shrink if we were to look at Spring Training relative to April and/or May player data. Unfortunately, this data is too difficult to obtain and analyze for this particular assessment.
So if you are worried about player metrics in Spring Training, have no fear: this analysis gives little reason to expect those results to carry over to their numbers for the season. So sit back and enjoy watching Ketel Marte take a High A pitcher deep!