Matthew's Portfolio

Matthew Wolf

About Me

I'm Matthew Wolf, an aspiring data analyst with a solid foundation in business analytics from George Washington University. I have hands-on experience in data management and analysis, including real-world projects such as analyzing baseball pitch-level data, where I honed my skills in data collection, data cleaning, and visualization. I am proficient in Python, R, and Microsoft Excel, and I enjoy uncovering insights in dynamic, fast-paced environments. I'm eager to contribute to innovative projects, collaborate with cross-functional teams, and apply my analytical skills to drive impactful decisions in the pop culture and entertainment industry.

Python:

  • Pandas
  • Bokeh
  • Matplotlib
  • Seaborn
  • NumPy
  • Scikit-learn

R:

  • dplyr
  • ggplot2

Relevant Courses:

  • Programming for Analytics UG
  • Decision Models
  • Management Information Systems Technology
  • Business Analytics II
  • Semester-Long Project: New York Mets - Pitch Level Data Analysis

    This project is a comprehensive analysis of pitch-level data for the New York Mets during the 2023 season. The data comes from official MLB Statcast records, which provide metrics such as release speed and spin rate for each pitch thrown by Mets pitchers, along with launch angle and exit velocity for batted balls.

    The primary goal of this project is to identify key performance metrics that correlate with pitching success, such as inducing ground balls, limiting opponent batting average, and maximizing strikeout rates. Using clustering and other machine learning techniques, I extracted actionable insights for optimizing the Mets' pitching strategy in future seasons.

    Future goals include enhancing the model's predictive capabilities and compatibility with other advanced techniques like Stochastic Gradient Descent, to further refine player evaluations and strategic decisions.

    Intro Graph

    Leading the way for the Mets in terms of sheer pitch volume was rookie Kodai Senga. Towards the left of the graph, primarily starting pitchers can be found, as they tend to go deeper into games compared to relief pitchers. On the right side of the graph, position players such as Luis Guillorme and Danny Mendick can be found, with pitch counts that do not even register on the graph.

    pitches used team graph

    Four-seamers were the most frequently used fastball, sliders were the most common breaking ball, and changeups were the most used off-speed pitch.

    Launch Angle vs launch speed

    The scatter plot shows average launch angle versus average launch speed for each pitcher. The two circled outliers are Luis Guillorme and Danny Mendick. Most pitchers cluster between 10 and 25 degrees and between 85 and 90 MPH. The trend suggests that higher launch angles correlate with higher launch speeds, though data from more pitchers is needed to confirm it. Lower launch speeds mean weaker contact, which helps the defense prevent runs. Because pitchers who allow lower launch angles also tend to allow lower launch speeds, investing in them could be advantageous.
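
    The per-pitcher averages behind a scatter plot like this one can be sketched with the standard library. The sample rows, the pitcher names, and the helper `pitcher_averages` are illustrative assumptions, not the project's actual data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical batted-ball rows: (pitcher, launch angle in degrees, launch speed in MPH).
# These values are made up for illustration; they are not real Statcast data.
batted_balls = [
    ("Pitcher A", 12.0, 88.0),
    ("Pitcher A", 20.0, 90.0),
    ("Pitcher B", 8.0, 85.0),
    ("Pitcher B", 14.0, 87.0),
    ("Pitcher C", 30.0, 92.0),
]


def pitcher_averages(rows):
    """Return {pitcher: (avg_launch_angle, avg_launch_speed)}."""
    grouped = defaultdict(list)
    for pitcher, angle, speed in rows:
        grouped[pitcher].append((angle, speed))
    return {
        pitcher: (mean(a for a, _ in vals), mean(s for _, s in vals))
        for pitcher, vals in grouped.items()
    }


averages = pitcher_averages(batted_balls)
for pitcher, (angle, speed) in averages.items():
    print(f"{pitcher}: {angle:.1f} degrees, {speed:.1f} MPH")
```

    Each pitcher then contributes a single point (average angle, average speed) to the scatter plot, which is why pitchers with very few batted balls can show up as outliers.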

    Exploratory Data Analysis

    Early findings from the exploratory data analysis revealed intriguing patterns and insights. For instance, clusters 0 and 3 consistently exhibited the lowest launch angles, indicating a tendency to induce ground balls—a favorable outcome for pitchers. Visualizations such as scatter plots of launch angle versus exit velocity highlighted correlations between these metrics, providing initial insights into effective pitching strategies.

    2023 NYM Arsenal

    Among the starting pitchers, identifiable by their notably taller bars on the graph, Kodai Senga stands out as the only one who threw forkballs at a noteworthy frequency. He also led the team in total pitch count for the year. Senga used a diverse repertoire of seven distinct pitch types in 2023, and this varied approach likely contributes to his effectiveness on the mound. The data suggests that encouraging other starters to use their secondary, tertiary, and even quaternary pitches more often could yield similar benefits. Such a strategy could also lighten the workload for relievers, giving the whole pitching staff a tactical advantage.

    BBrates

    Balls put into play are categorized by launch angle: ground balls (GB) are below 10 degrees, line drives (LD) range from 10 to 25 degrees, fly balls (FB) span 25 to 50 degrees, and popups (PO) exceed 50 degrees. For most pitchers, ground balls account for the majority of balls in play, while popups typically make up the smallest share. Fly ball and line drive rates vary considerably among pitchers. Lower launch angles correlate with reduced exit velocities, so they produce more ground balls and weaker contact. Pitchers with high ground ball rates tend to minimize hard-hit balls. However, effective team defense remains crucial for getting the full benefit of a ground ball pitching strategy.
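
    The launch-angle thresholds above translate directly into a small classifier. This is a minimal sketch: the function names and sample angles are hypothetical, and assigning a ball hit at exactly 25 degrees to the fly ball bucket is an assumption, since the stated ranges share that boundary.

```python
from collections import Counter


def classify_batted_ball(launch_angle):
    """Map a launch angle in degrees to GB, LD, FB, or PO."""
    if launch_angle < 10:
        return "GB"  # ground ball: below 10 degrees
    if launch_angle < 25:
        return "LD"  # line drive: 10 up to 25 degrees
    if launch_angle <= 50:
        return "FB"  # fly ball: 25 to 50 degrees (exactly 25 placed here by assumption)
    return "PO"  # popup: above 50 degrees


def batted_ball_rates(launch_angles):
    """Return the share of each batted-ball type for one pitcher."""
    counts = Counter(classify_batted_ball(a) for a in launch_angles)
    total = sum(counts.values())
    return {kind: counts[kind] / total for kind in ("GB", "LD", "FB", "PO")}


# Made-up launch angles for a single hypothetical pitcher.
sample_angles = [-5.0, 3.0, 8.0, 12.0, 18.0, 30.0, 45.0, 62.0]
rates = batted_ball_rates(sample_angles)
print(rates)
```

    Running `batted_ball_rates` once per pitcher gives the four rates needed for a stacked bar chart of batted-ball types.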

    Handedness

    Among the 37 pitchers analyzed, 28 were right-handed and 9 were left-handed. Increasing the number of left-handed starters and relievers would enhance roster flexibility for strategic matchups. Many starting pitchers displayed similar pitch counts against left-handed and right-handed hitters, exemplified by Tylor Megill's nearly even distribution of 1085 pitches to righties and 1150 to lefties. In contrast, relief pitchers often exhibited specialization, evident in visual disparities on their respective bar graphs. For instance, Adam Ottavino, a right-handed pitcher, noticeably favored pitching against right-handed hitters, leveraging his inherent advantage in those matchups.

    New York Mets Model

    I first fit an unsupervised learning model to the 37 New York Mets pitchers only, using a set of variables (listed below) that I judged to be significant.

    variable/description

    After performing PCA, I kept the first two components, which together explained more than 99.5% of the variance.
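
    For readers unfamiliar with how an explained-variance figure is obtained, the following is a minimal NumPy sketch. The data is synthetic and the column meanings are assumptions made for illustration; it is not the project's variable set or code, which may well have used a library PCA implementation instead.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return the share of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Singular values of the centered matrix give the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2 / (len(X) - 1)
    return variances / variances.sum()


# Synthetic stand-in for a pitcher-by-variable matrix (37 rows, 4 columns).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(2300, 200, size=37),  # large-scale column, e.g. a spin rate
    rng.normal(89, 2, size=37),      # small-scale column, e.g. a release speed
    rng.normal(15, 5, size=37),      # e.g. a launch angle
    rng.normal(0, 0.1, size=37),     # e.g. a run-expectancy change
])

ratios = explained_variance_ratio(X)
print(ratios.cumsum())  # cumulative share explained by the first 1, 2, ... components
```

    In this synthetic example the first component captures almost all of the variance simply because one column has a much larger scale than the others. The source does not say whether the project's variables were standardized before PCA; if they were not, a similar scale effect could contribute to a very high explained-variance figure.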

    An elbow plot of the sum of squared errors (SSE) indicated that five clusters gave the most descriptive segmentation.
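
    The numbers behind an elbow plot are the SSE values obtained for each candidate cluster count. The sketch below uses a hand-written Lloyd's k-means on synthetic two-dimensional points as a stand-in; the function `kmeans_sse`, the seeds, and the data are assumptions for illustration. In practice a library implementation, such as scikit-learn's `KMeans` with its `inertia_` attribute, would normally supply these values.

```python
import numpy as np


def kmeans_sse(X, k, n_iter=100, seed=0):
    """Run a basic Lloyd's k-means and return the final SSE (inertia)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()


# Synthetic points drawn around three centers (not real pitcher data).
rng = np.random.default_rng(42)
blobs = [rng.normal(loc=center, scale=1.0, size=(20, 2))
         for center in [(0, 0), (10, 0), (0, 10)]]
X = np.vstack(blobs)

sse_by_k = {k: kmeans_sse(X, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(float(sse), 1))
```

    Plotting SSE against k and looking for the point where the curve flattens gives the elbow. Because this simple version uses a single random start, a given k can land in a poor local optimum, which is one reason library implementations run several initializations.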

    One potential limitation is that the chosen variables may be too few, or less significant than others that were left out.

    graph

    Of the 37 pitchers analyzed, 32.4% were in cluster 0, 24.3% in cluster 1, 5.4% in cluster 2, 27.0% in cluster 3, and 10.8% in cluster 4.
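
    As a quick arithmetic check, these percentages correspond to whole-number cluster sizes. The counts below are back-calculated from the stated percentages as an inference for illustration; they are not figures reported by the project.

```python
# Cluster sizes back-calculated from the stated percentages of 37 pitchers
# (an inference for illustration, not figures reported by the project).
cluster_counts = {0: 12, 1: 9, 2: 2, 3: 10, 4: 4}

total = sum(cluster_counts.values())
shares = {c: round(100 * n / total, 1) for c, n in cluster_counts.items()}

print(total)   # 37
print(shares)  # {0: 32.4, 1: 24.3, 2: 5.4, 3: 27.0, 4: 10.8}
```

    With only 37 pitchers, a single pitcher moves a cluster's share by about 2.7 percentage points, so small differences between cluster shares should be read with care.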

    spinrate/speed graph

    There was more variation in the clusters' average spin rates (2039-2638) than in their average release speeds (87.8-89.7), ignoring the cluster of position players.

    This suggests that release speed distinguishes pitchers less than spin rate does, so the team should monitor pitchers' spin rates more closely than their release speeds when evaluating them.
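
    Spin rate and release speed are measured in different units (presumably rpm and MPH), so their raw ranges are not directly comparable. One simple way to put them on the same footing is to express each range relative to its midpoint. The helper `relative_range` is a hypothetical name, and this is only one of several reasonable normalizations.

```python
def relative_range(low, high):
    """Range expressed as a fraction of the midpoint, so the units cancel out."""
    midpoint = (low + high) / 2
    return (high - low) / midpoint


# Ranges of the cluster averages quoted in the text.
spin = relative_range(2039, 2638)
speed = relative_range(87.8, 89.7)

print(f"spin rate: {spin:.1%}")       # spin rate: 25.6%
print(f"release speed: {speed:.1%}")  # release speed: 2.1%
```

    On this measure the cluster averages differ roughly twelve times more in spin rate than in release speed, which is consistent with the reading above.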

    MLB Model

    I applied the same unsupervised learning approach to the entire population of MLB pitchers, 837 in total, using the same set of variables as before. After PCA, I kept the first two principal components, which together explained over 99.8% of the variance. An SSE elbow plot indicated that six clusters gave the most descriptive segmentation, one more than in the NYM model. As before, the results may be limited by the choice of variables: the set may be too small, or may include less significant variables in place of more impactful ones.

    pic1 description

    Of the 863 pitchers analyzed, cluster 0 was the largest group at 34.6%, followed by cluster 5 at 29%. Clusters 3 and 1 accounted for 14.4% and 14.3% respectively, while clusters 2 and 4 were the smallest subsets at 5.1% and 2.1%.

    pic2 description

    Clusters 0 and 3 exhibited the lowest average launch angles at 11.5 and 11.9 degrees respectively. Cluster 5 also showed relatively consistent launch statistics among its players. In contrast, clusters 2 and 4 displayed greater variability in their launch angles. Given that lower launch angles tend to result in groundouts, which are advantageous for pitchers, prioritizing free-agent signees from clusters 0 and 3 is recommended. It is also prudent to avoid pitchers grouped with position players, as they may not perform as effectively in pitching roles. This strategy aims to optimize team performance by leveraging pitchers with consistent and effective pitching metrics.

    pic3 description

    As in the NYM model, clusters 4 and 2 stand out for their notably lower release speeds and spin rates; they consist primarily of position players rather than pitchers.

    Spin rates vary more significantly across clusters than release speeds.

    Notably, cluster 1 exhibits the highest spin rates and shows the least change in run expectancy, suggesting more consistent performance.

    Thus, prioritizing the evaluation of pitchers based on spin rates could offer valuable insights into their effectiveness and reliability on the field, aiding in strategic pitching decisions.

    Conclusion

    When targeting players in free agency or trade, prioritize those in clusters 0 and 3. These pitchers have the lowest launch angles, which lead to more ground balls, a crucial factor for effective pitching. Consider trading surplus pitchers from cluster 1 for promising prospects: cluster 1 pitchers have the highest spin rates and the lowest change in run expectancy, which makes them attractive to other teams. To improve team-wide performance, increase the use of forkballs across the staff, a pitch that is underutilized in MLB. Lastly, when evaluating pitchers, weigh spin rate more heavily than metrics like release speed, since spin rate tends to differentiate pitchers better.