Wednesday, July 27, 2016

Using a Simulation to Test a Baseball Question


I often listen to sports radio in the car and recently the talk of the town has been about the Houston Astros, who started the season abysmally but have turned things around lately. At the time of this writing the Astros have the best record in baseball for the past two months. Well, one of the reasons for their resurgence, supposedly, is the move of Jose Altuve from batting leadoff in the lineup to batting third. Altuve currently leads the MLB in batting average and in on-base percentage and is by far the Astros’ best offensive player. The rest of the offense is subpar, with only a couple of other above-average batters.

The rationale for moving Altuve to third is to get him into the “heart” of the lineup: he can drive in runs if the first two batters get on base, and he can depend on Carlos Correa, who bats fourth and is another above-average hitter, to drive him in should he get on base himself. Baseball lineups consist of nine players, and teams stack the top of their lineups with their best hitters simply because those players come to the plate most often. For example, someone who bats first gets on average 1/9 more plate appearances per game than someone who bats second and 5/9 more plate appearances per game than someone who bats sixth (under the reasonable assumption that the final batter of each game is equally likely to be any of the nine lineup spots). The argument for keeping Altuve first, therefore, is to maximize his plate appearances. The downside is that our best hitter has to bat immediately after our worst hitters. In other words, during the middle innings of the game, Altuve often goes up to the plate with one or two outs already and no one on base. We’d rather have him bat with the bases loaded and zero outs.
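
The 1/9 and 5/9 figures can be checked with a few lines of arithmetic. This is only a sketch of the reasoning, and the function name `expected_extra_pa` is my own: every lineup spot gets the same number of complete trips through the order, and spot k gets one additional plate appearance whenever the final batter of the game occupies spot k or later.

```python
from fractions import Fraction

def expected_extra_pa(slot_a, slot_b, lineup_size=9):
    """Expected extra plate appearances per game for slot_a over slot_b.

    Slots are 1-indexed. Assumes the final batter of the game is equally
    likely to be any lineup spot (the assumption stated in the text).
    """
    def p_extra(slot):
        # Probability that the final batter bats in this slot or later,
        # which is when this slot gets one more plate appearance.
        return Fraction(lineup_size - slot + 1, lineup_size)

    return p_extra(slot_a) - p_extra(slot_b)

print(expected_extra_pa(1, 2))  # 1/9
print(expected_extra_pa(1, 6))  # 5/9
```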

On paper, each approach has its pros and cons, and it’s not obvious which leads to the most runs. Does having the best player bat immediately after other above-average players increase scoring? I decided to design a simple simulation to find out. My model simulated the scoring for a single team over a standard 9-inning game. I made a number of assumptions to keep things simple:

Players either record a single or an out for each plate appearance
The batting order stays the same for the duration of the game
Players on base only advance one base for each hit
The game ends after 9 innings or 27 outs
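
For readers who want to try this themselves, here is a minimal Python sketch of a simulation that follows the four assumptions above. It is not the original code; the function names `simulate_game` and `average_runs`, the fixed seed, and the use of Python's `random` module are all my own choices for illustration. Because every hit is a single and runners advance exactly one base, the runners always fill the bases in order from first, so a simple count of runners is enough to track the base state.

```python
import random

def simulate_game(obps, rng, innings=9):
    """Runs scored by one team in one game under the simplified rules.

    obps: on-base percentage for each lineup spot, in batting order.
    rng:  a random.Random instance (passed in so results are reproducible).
    """
    runs = 0
    batter = 0  # index of the next batter; carries over between innings
    for _ in range(innings):
        outs = 0
        runners = 0  # 0-3 runners, always occupying first, then second, then third
        while outs < 3:
            if rng.random() < obps[batter]:
                # Single: everyone moves up one base.
                if runners == 3:
                    runs += 1  # runner on third scores; bases stay loaded
                else:
                    runners += 1
            else:
                outs += 1
            batter = (batter + 1) % len(obps)
    return runs

def average_runs(obps, n_games=5000, seed=0):
    """Average runs per game over n_games simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(obps, rng) for _ in range(n_games)) / n_games

# Example: a lineup where every batter reaches base 35% of the time.
print(average_runs([0.350] * 9))
```

Under these rules a team scores only once it has four or more singles in an inning, which is why the run totals come out far lower than in real baseball.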

For my first design (A), I assigned on-base percentages of 0.500, 0.400, 0.400, 0.400, 0.300, 0.300, 0.300, 0.300, 0.300 for the nine batters, with the 0.300 batters always batting after the 0.400 batters. For the second design (B), I used 0.500, 0.400, 0.400, 0.400, 0.200, 0.200, 0.200, 0.200, with the 0.200 batters succeeding the 0.400 batters. I then moved the 0.500 batter to either the first, third, fourth, or ninth positions in the batting order and simulated 5,000 games for each scenario. The average runs per game are depicted in the table below.
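
The lineups for each scenario can be written out explicitly. The sketch below is my reading of the setup rather than the original code: I assume the 0.500 batter is inserted at the chosen spot while the other eight batters keep their relative order, and the helper name `build_lineup` is made up for illustration. Note that the list for design B above contains only eight values; the sketch assumes a fifth 0.200 batter, mirroring design A.

```python
def build_lineup(best, others, slot):
    """Place the best batter at a 1-indexed lineup slot.

    Assumption: the remaining batters keep their relative order.
    """
    lineup = list(others)
    lineup.insert(slot - 1, best)
    return lineup

# The eight batters other than the 0.500 hitter, in order.
DESIGN_A_OTHERS = [0.400] * 3 + [0.300] * 5
DESIGN_B_OTHERS = [0.400] * 3 + [0.200] * 5  # assumption: five 0.200 batters

for slot in (1, 3, 4, 9):
    print(slot, build_lineup(0.500, DESIGN_A_OTHERS, slot))
```

Each of these nine-element lists can then be fed to a game simulator 5,000 times to produce an average like the ones reported here.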



Design   First    Third    Fourth   Ninth
A        2.3668   2.3472   2.309    2.268
B        1.172    1.1868   1.2162   1.1358

Batting leadoff generated slightly more runs than batting third in design A but slightly fewer in design B. However, over the course of a 162-game regular season, the differences amount to only about 3 runs and 2 runs, respectively. A gap that small is relatively insignificant and would likely not change the outcome of more than one game in a season. Interestingly, batting fourth yielded the most runs in design B but not in design A, which gives credence to the idea that a great batter is wasted if placed directly behind exceedingly poor hitters. However, the differential compared to batting first still amounts to only 7 runs for an entire season, which might affect the outcome of only one or two games. Finally, we can isolate the effect of plate appearances by directly comparing batting first to batting last, since the cyclic order of the batters is the same in both cases. In design A, batting first yielded 16 more runs over the course of a season, whereas it netted only 6 extra runs in design B.
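
The season-long figures are just the per-game differences multiplied by 162. As a quick check, the sketch below redoes that arithmetic from the reported averages; the dictionary layout and the helper name `season_diff` are my own.

```python
GAMES_PER_SEASON = 162

# Average runs per game from the simulation results reported above.
avg_runs = {
    "A": {"First": 2.3668, "Third": 2.3472, "Fourth": 2.309, "Ninth": 2.268},
    "B": {"First": 1.172, "Third": 1.1868, "Fourth": 1.2162, "Ninth": 1.1358},
}

def season_diff(design, slot_x, slot_y):
    """Extra runs per 162-game season from batting in slot_x instead of slot_y."""
    return (avg_runs[design][slot_x] - avg_runs[design][slot_y]) * GAMES_PER_SEASON

print(round(season_diff("A", "First", "Third"), 1))   # 3.2
print(round(season_diff("B", "Third", "First"), 1))   # 2.4
print(round(season_diff("B", "Fourth", "First"), 1))  # 7.2
print(round(season_diff("A", "First", "Ninth"), 1))   # 16.0
print(round(season_diff("B", "First", "Ninth"), 1))   # 5.9
```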

Generally, the data indicates that there is not a significant advantage to be gained by moving the best batter away from the leadoff position into a different spot in the lineup. However, my model was very simplified and did not take into account factors such as walks, extra-base hits, base running, multiple-out plays, sacrifice bunts and flies, and variations in hitting, pitching, or fielding depending on the situation. A more sophisticated model would take into account walk rate, slugging percentage, base running, and performance with runners in scoring position for each player, which could make the results more robust. Meanwhile, variations in pitching and fielding resulting from different in-game or out-of-game situations would likely average out over the course of a season.

Traditionally, teams prefer to put their best offensive players, especially those who hit for power, in positions three through five, and rarely will they have them lead off. Part of the reason could be psychological: teams prefer that their lineups escalate in threat level, rather than featuring the best player first, in order to put pressure on opposing pitchers. Or this approach could be used to provide a slight morale boost to the batting team and its fans. Whatever the reason, the simulated results seem to be relatively insensitive to batting order, so there are likely some intangibles at play.
