This work was completed under the guidance of my adviser Daniel Eck and the University of Illinois Urbana-Champaign Statistics Department for my master's thesis in applied mathematics.
The goal is to use the distributions generated by SEAM to find fielder alignments, specific to each batter-pitcher matchup, that decrease predicted BABIP (batting average on balls in play) via gradient descent techniques. A detailed description of the mathematics and motivation behind this problem can be found in the thesis.
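To make the idea concrete, here is a minimal sketch of gradient descent on fielder positions. It is not the objective or the implementation from the thesis. It assumes a toy discrete distribution of batted-ball locations, a smooth elliptical "coverage" function for each fielder, and hypothetical names (`coverage`, `predicted_babip`, `optimize`) and parameter values chosen only for illustration.

```python
import math

def coverage(x, y, fielder, a, b):
    """Assumed soft elliptical coverage in (0, 1]: 1 at the fielder, decaying with distance."""
    dx, dy = x - fielder[0], y - fielder[1]
    return math.exp(-0.5 * ((dx / a) ** 2 + (dy / b) ** 2))

def predicted_babip(balls, fielders, a, b):
    """Probability-weighted chance that no fielder covers the ball."""
    total = 0.0
    for x, y, p in balls:
        miss = 1.0
        for f in fielders:
            miss *= 1.0 - coverage(x, y, f, a, b)
        total += p * miss
    return total

def gradient(balls, fielders, a, b):
    """Analytic gradient of predicted_babip with respect to each fielder's (x, y)."""
    grads = [[0.0, 0.0] for _ in fielders]
    for x, y, p in balls:
        cov = [coverage(x, y, f, a, b) for f in fielders]
        for k, f in enumerate(fielders):
            # Product of miss probabilities over all the other fielders.
            others = 1.0
            for j, c in enumerate(cov):
                if j != k:
                    others *= 1.0 - c
            # d(miss)/d(fx_k) = -others * c_k * (x - fx_k) / a^2, likewise for y.
            grads[k][0] -= p * others * cov[k] * (x - f[0]) / a ** 2
            grads[k][1] -= p * others * cov[k] * (y - f[1]) / b ** 2
    return grads

def optimize(balls, fielders, a=1.5, b=1.0, lr=0.5, steps=200):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    fielders = [list(f) for f in fielders]
    for _ in range(steps):
        for f, g in zip(fielders, gradient(balls, fielders, a, b)):
            f[0] -= lr * g[0]
            f[1] -= lr * g[1]
    return fielders

# Made-up distribution: (x, y, probability) with probabilities summing to 1.
balls = [(-2.0, 3.0, 0.4), (-1.5, 3.5, 0.2), (2.0, 3.0, 0.3), (2.5, 2.5, 0.1)]
start = [(-1.0, 2.0), (1.0, 2.0)]

before = predicted_babip(balls, start, 1.5, 1.0)
after = predicted_babip(balls, optimize(balls, start), 1.5, 1.0)
print(f"predicted BABIP before: {before:.3f}, after: {after:.3f}")
```

The smooth coverage function is what makes the objective differentiable in this sketch; a hard in-or-out ellipse would have zero gradient almost everywhere.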
I will give a short description of each folder's contents.
generated fields contains the standard in-play coordinates for all distributions. The coordinates for Guaranteed Rate Field were used for all testing because it is the most standard-shaped field in MLB.
seam contains helper files taken from the SEAM repository, which were mainly used to generate the SEAM distributions for testing.
statcast contains scraped Statcast data.
validation contains 2022 Statcast data for out-of-sample validation, the table of values generated from that validation, and similarity scores between each tested batter and pitcher.
Generates the GAMs (generalized additive models), trained on Statcast data, that model for each position whether the fielder can field a ball in play. These models are used to justify the elliptical fielder shape in the optimization's implementation.
Filters the SEAM distribution to fair territory for each stadium. The output is a flattened data table containing the in-play coordinates.
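As a rough illustration of this kind of filtering, the sketch below keeps only grid points that lie between the foul lines and inside the outfield fence. It assumes home plate at the origin, foul lines at 45 degrees on either side of the y-axis, and a hypothetical `toy_fence` function with invented dimensions; the real stadium boundaries and the repository's data layout are not reproduced here.

```python
import math

def toy_fence(angle):
    """Hypothetical fence distance in feet: 400 to center, 330 down the lines."""
    return 400.0 - (400.0 - 330.0) * abs(angle) / (math.pi / 4)

def in_play(x, y, fence_distance):
    """True if (x, y) is between the foul lines and no farther than the fence."""
    if y < abs(x):
        return False  # outside the 45-degree foul lines
    angle = math.atan2(x, y)  # angle measured from straightaway center field
    return math.hypot(x, y) <= fence_distance(angle)

def filter_to_fair_territory(grid, fence_distance):
    """Return a flat list of (x, y, density) rows restricted to fair territory."""
    return [(x, y, d) for x, y, d in grid if in_play(x, y, fence_distance)]

# Made-up grid rows: (x, y, density).
grid = [
    (0.0, 200.0, 0.5),     # fair, inside the fence
    (-300.0, 100.0, 0.2),  # foul
    (0.0, 450.0, 0.1),     # beyond the fence
    (100.0, 100.0, 0.2),   # on the foul line, counted as fair
]
fair = filter_to_fair_territory(grid, toy_fence)
print(fair)
```

Passing the fence as a function keeps the filter itself stadium-agnostic, so each stadium only needs its own boundary description.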
Images of validation results, fielder shape justification, and the "Visualizations" section.
placement-validation contains the main maximization method used to generate the optimal alignments and was used to produce the validation data. Its final section contains a distribution analysis of the generated BABIP reductions.
optimal-fielder-alignments contains essentially the same methods as placement-validation but was used to generate plots.
getting-pitcher-and-batter-pools finds the similarities between each tested batter and pitcher.
similarity-vs-babip-reduction computes the correlation between predicted BABIP reduction and similarity score. Most of the code in this file was omitted because it was trivial.
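For readers who want to see what the correlation between predicted BABIP reduction and similarity score might look like in code, here is a minimal sketch. It assumes a Pearson correlation, which the description above does not specify, and the numbers are invented purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented values: one similarity score and one predicted BABIP reduction per matchup.
similarity = [0.62, 0.71, 0.55, 0.80, 0.67]
babip_reduction = [0.012, 0.018, 0.009, 0.015, 0.020]

r = pearson(similarity, babip_reduction)
print(f"correlation: {r:.3f}")
```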