Using a Simulation to Test a Baseball Question
I often listen to sports radio in the car, and recently the talk of the town has been the Houston Astros, who started the season abysmally but have turned things around lately. At the time of this writing, the Astros have the best record in baseball over the past two months. One of the reasons for their resurgence, supposedly, is the move of Jose Altuve from batting leadoff to batting third. Altuve currently leads MLB in batting average and on-base percentage and is by far the Astros’ best offensive player. The rest of the offense is subpar, with only a couple of other above-average batters.
The rationale for moving Altuve to third is to get him
into the “heart” of the lineup: he can drive in runs if the first two batters
get on base, and he can count on Carlos Correa, who bats fourth and is another
above-average hitter, to drive him in when he reaches base himself. Baseball
lineups consist of nine players and teams stack the top of their lineups with
their best hitters, simply because those players see the plate most often. For
example, someone who bats first gets on average 1/9 more plate appearances per
game than someone who bats second and 5/9 more plate appearances per game than
someone who bats sixth (under the reasonable assumption that the final batter
of each game varies uniformly). The argument for keeping Altuve first, therefore,
is that it maximizes his plate appearances. The downside is that
our best hitter then bats immediately after our worst hitters. In other
words, during the middle innings of the game, Altuve often comes to the plate
with one or two outs already and no one on base. We’d rather have him bat with the
bases loaded and zero outs.
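The 1/9 and 5/9 figures above can be checked by brute force. The sketch below assumes, as the text does, that the final batter of the game is equally likely to be any of the nine lineup slots; the function name `expected_extra_pa` is made up for illustration.

```python
from fractions import Fraction

LINEUP_SIZE = 9

def expected_extra_pa(slot_a, slot_b):
    """Expected extra plate appearances per game for slot_a over slot_b (1-indexed).

    Assumes the game's final batter is uniformly distributed over the lineup.
    On the last, partial trip through the order, a slot bats only if it comes
    at or before the final batter.
    """
    total = Fraction(0)
    for final_slot in range(1, LINEUP_SIZE + 1):
        pa_a = 1 if slot_a <= final_slot else 0
        pa_b = 1 if slot_b <= final_slot else 0
        total += Fraction(pa_a - pa_b, LINEUP_SIZE)
    return total

print(expected_extra_pa(1, 2))  # 1/9
print(expected_extra_pa(1, 6))  # 5/9
```

Complete trips through the order give every slot the same number of plate appearances, so only the last, partial trip contributes to the difference.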
On paper, each approach has its pros and cons, and it’s
not obvious which leads to the most runs. Does having the best player bat immediately
after other above-average players increase scoring? I decided to design a
simple simulation to find out. My model simulated the scoring for a single team
for a standard 9-inning game. I made a number of assumptions to keep things
simple:
Each plate appearance ends in either a single or an out
The batting order stays the same for the duration of the game
Runners on base advance exactly one base on each hit
The game ends after 9 innings (27 outs)
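For readers who want to experiment, here is a minimal sketch of a simulation built on the four assumptions above. It is not the original code: the names `simulate_game` and `average_runs` are hypothetical, and the example lineup is the first design described just below. Because every hit is a single and runners move up one base at a time, the bases always fill in order, so counting the runners is enough to track the inning.

```python
import random

INNINGS = 9
OUTS_PER_INNING = 3

def simulate_game(obps, rng):
    """Return the runs scored in one game by a lineup of on-base percentages.

    Assumes every value in obps is below 1.0; otherwise an inning never ends.
    """
    runs = 0
    batter = 0  # index into the lineup; carries over from inning to inning
    for _ in range(INNINGS):
        outs = 0
        runners = 0  # runners on base (0 to 3); bases are cleared each inning
        while outs < OUTS_PER_INNING:
            if rng.random() < obps[batter]:
                if runners == 3:
                    runs += 1      # bases loaded: the runner on third scores
                else:
                    runners += 1   # batter reaches first, everyone moves up one
            else:
                outs += 1
            batter = (batter + 1) % len(obps)
    return runs

def average_runs(obps, games=5000, seed=0):
    """Average runs per game over many simulated games (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_game(obps, rng) for _ in range(games)) / games

# Example: the 0.500 hitter leading off in the first design.
leadoff_lineup = [0.500, 0.400, 0.400, 0.400, 0.300, 0.300, 0.300, 0.300, 0.300]
print(average_runs(leadoff_lineup))
```

The exact averages depend on the random seed and on details of the original implementation that the post does not spell out, so the output should not be expected to reproduce the published numbers digit for digit.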
For my first design (A), I assigned on-base percentages of
0.500, 0.400, 0.400, 0.400, 0.300, 0.300, 0.300, 0.300, 0.300 to the nine
batters, with the 0.300 batters always batting after the 0.400 batters. For the
second design (B), I used 0.500, 0.400, 0.400, 0.400, 0.200, 0.200, 0.200,
0.200, 0.200, with the 0.200 batters following the 0.400 batters. I then moved the 0.500
batter to the first, third, fourth, or ninth position in the batting
order and simulated 5,000 games for each scenario. The average runs per game
are shown in the table below.
Design | First  | Third  | Fourth | Ninth
A      | 2.3668 | 2.3472 | 2.309  | 2.268
B      | 1.172  | 1.1868 | 1.2162 | 1.1358
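The four scenarios in each design differ only in where the 0.500 hitter sits. One way to build them, assuming the other eight batters keep their relative order (0.400 hitters first, then the weaker ones), is sketched below; `build_lineup` is a hypothetical helper, not something taken from the original model.

```python
def build_lineup(best_obp, other_obps, slot):
    """Insert the best hitter at a 1-indexed slot, keeping the others in order."""
    lineup = list(other_obps)
    lineup.insert(slot - 1, best_obp)
    return lineup

# The eight remaining hitters in design A, in batting order (assumed arrangement).
DESIGN_A_OTHERS = [0.400, 0.400, 0.400, 0.300, 0.300, 0.300, 0.300, 0.300]

for slot in (1, 3, 4, 9):
    print(slot, build_lineup(0.500, DESIGN_A_OTHERS, slot))
```

Under this arrangement the first-slot and ninth-slot lineups are rotations of one another, so the same hitters bat before and after the 0.500 hitter in both.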
Batting leadoff generated slightly more runs than batting
third in design A but slightly fewer in design B. However, over the course of a
162-game regular season, the differences amount to only about 3 runs and 2
runs, respectively. A gap that small would
likely not change the outcome of more than one game in a season. Interestingly,
batting fourth yielded the most runs in design B but not in design A, which
lends credence to the idea that a great batter is wasted if placed directly
behind exceedingly poor hitters. However, the advantage over batting
first still amounts to only about 7 runs for an entire season, which might affect the
outcome of only one or two games. Finally, we can isolate the effect of
plate appearances by directly comparing batting first to batting last, since
the batters follow one another in the same cyclic order in both cases. In design A, batting first
yielded 16 more runs over the course of a season, whereas it netted only 6
extra runs in design B.
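The season-long figures quoted above are just the per-game gaps from the table multiplied by 162 games. A short script makes the arithmetic easy to re-check; the dictionary layout and the name `season_diff` are illustrative choices.

```python
GAMES_PER_SEASON = 162

# Average runs per game, copied from the table.
RUNS_PER_GAME = {
    "A": {"first": 2.3668, "third": 2.3472, "fourth": 2.309, "ninth": 2.268},
    "B": {"first": 1.172, "third": 1.1868, "fourth": 1.2162, "ninth": 1.1358},
}

def season_diff(design, slot_x, slot_y):
    """Extra runs over a full season from batting in slot_x instead of slot_y."""
    per_game = RUNS_PER_GAME[design][slot_x] - RUNS_PER_GAME[design][slot_y]
    return per_game * GAMES_PER_SEASON

print(f"{season_diff('A', 'first', 'third'):.1f}")   # 3.2
print(f"{season_diff('B', 'third', 'first'):.1f}")   # 2.4
print(f"{season_diff('B', 'fourth', 'first'):.1f}")  # 7.2
print(f"{season_diff('A', 'first', 'ninth'):.1f}")   # 16.0
print(f"{season_diff('B', 'first', 'ninth'):.1f}")   # 5.9
```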
Generally, the data indicates that
there is not a significant advantage to be gained by moving the best batter
away from the leadoff position into a different spot in the lineup. However, my
model was highly simplified and did not take into account factors such as walks, extra-base
hits, base running, multiple-out plays, sacrifices, and variations in
hitting, pitching, or fielding depending on the situation. A more sophisticated
model would take into account each player’s walk rate, slugging percentage, base running, and
performance with runners in scoring position, which could make
the results more robust. Meanwhile, variations in pitching and fielding resulting
from different in-game or out-of-game situations would likely average out over
the course of a season.
Traditionally, teams prefer to put
their best offensive players, especially those who hit for power, in positions
three through five, and rarely will they have them lead off. Part of the reason
could be psychological: teams prefer that their lineups escalate in threat
level, rather than featuring the best player first, in order to put pressure on opposing
pitchers. Or this approach could be used to provide a slight morale boost to the batting
team and its fans. Whatever the reason, the simulated results seem relatively insensitive
to batting order, so there are likely some intangibles at play.