Cutting-edge MLB pitch-predicting software utilizing the latest Statcast data. Open-source and free to use. Brought to you by baseball-analytica.com.
Read our technical writeup: Predicting MLB Pitch Sequences with xLSTM
- Two prediction algorithms: Similarity-based (nearest neighbor) and xLSTM sequence model
- Multiple interfaces: Python API, REST API server, and CLI
- Rich predictions: Pitch type probabilities, speed/location distributions, outcome analysis
- Batted ball predictions: Outcome probabilities from exit velocity and launch angle with context-aware filtering
- Disk-backed caching: Parquet cache with incremental Statcast updates
- Statcast powered: Uses MLB's comprehensive pitch tracking data via pybaseball
```bash
uv pip install pitchpredict
```

Or with pip:

```bash
pip install pitchpredict
```

Requires Python 3.12 or higher. We recommend uv for faster, more reliable package management.

To install from source:

```bash
git clone https://github.com/baseball-analytica/pitchpredict.git
cd pitchpredict
uv sync
```

Quick start with the Python API:

```python
import asyncio

from pitchpredict import PitchPredict


async def main():
    client = PitchPredict()

    # Resolve MLBAM IDs (cached) for pitcher/batter
    pitcher_id = await client.get_player_id_from_name("Clayton Kershaw")
    batter_id = await client.get_player_id_from_name("Aaron Judge")

    # Predict pitcher's next pitch
    result = await client.predict_pitcher(
        pitcher_id=pitcher_id,
        batter_id=batter_id,
        count_balls=0,
        count_strikes=0,
        score_bat=0,
        score_fld=0,
        game_date="2024-06-15",
        algorithm="similarity",
    )
    print(result.basic_pitch_data["pitch_type_probs"])
    # Example output: {'FF': 0.45, 'SL': 0.30, 'CU': 0.15, 'CH': 0.10}


asyncio.run(main())
```

Pitcher and batter IDs are MLBAM IDs; use PitchPredict.get_player_id_from_name (or the REST /players/lookup endpoint) to resolve names.
Pitcher predictions return a PredictPitcherResponse model; use attribute access or model_dump() for a dict.
Caching is enabled by default and stores data in .pitchpredict_cache. Delete the folder to refresh cached data.
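Deleting the folder by hand is enough; if you would rather do it from a script, here is a minimal standard-library sketch. `clear_cache` is a hypothetical helper, not part of the PitchPredict API, and it assumes the default .pitchpredict_cache folder sits in the current working directory.

```python
import shutil
from pathlib import Path

# Default cache folder named in this README; assumed to be relative to the working directory.
DEFAULT_CACHE_DIR = Path(".pitchpredict_cache")


def clear_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Hypothetical helper: delete the on-disk cache so the next run refetches Statcast data.

    Returns True if a cache folder was removed, False if none existed.
    """
    if not cache_dir.exists():
        return False
    shutil.rmtree(cache_dir)  # removes the folder and every cached file inside it
    return True
```

Calling `clear_cache()` before creating a `PitchPredict` client means the first request repopulates the cache from scratch.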
For xLSTM predictions, you must pass prev_pitches (empty list allowed for cold-start):
```python
# Inside an async function such as main() above
result = await client.predict_pitcher(
    pitcher_id=pitcher_id,
    batter_id=batter_id,
    prev_pitches=[],  # required for xLSTM; an empty list is a cold start
    game_date="2024-06-15",
    algorithm="xlstm",
)
```

xLSTM loads its weights lazily: they download automatically on first use. Alternatively, set PITCHPREDICT_XLSTM_PATH to a local checkpoint directory containing model.safetensors and config.json.
When providing history, each pitch in prev_pitches must include a pa_id (plate-appearance id).
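When using a local checkpoint, a typo in the path only surfaces once the weights are loaded. The sketch below checks that the directory contains model.safetensors and config.json before exporting PITCHPREDICT_XLSTM_PATH. `use_local_xlstm_checkpoint` is a hypothetical helper, and it assumes PitchPredict reads the variable when it lazily loads the weights, so it should run before the first xLSTM prediction in the process.

```python
import os
from pathlib import Path

# Files this README says a local xLSTM checkpoint directory must contain.
REQUIRED_FILES = ("model.safetensors", "config.json")


def use_local_xlstm_checkpoint(checkpoint_dir: str) -> Path:
    """Hypothetical helper: validate a checkpoint dir, then export PITCHPREDICT_XLSTM_PATH."""
    path = Path(checkpoint_dir).expanduser().resolve()
    missing = [name for name in REQUIRED_FILES if not (path / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{path} is missing: {', '.join(missing)}")
    # Assumption: the variable is read when the weights are first loaded,
    # so it must be set before the first xLSTM prediction in this process.
    os.environ["PITCHPREDICT_XLSTM_PATH"] = str(path)
    return path
```

Setting the variable in the shell before launching Python works just as well; the helper only adds the file check.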
Run predictions and look up players directly from the command line (no server required):
```bash
# Look up player IDs
pitchpredict player lookup "Aaron Judge"

# Predict next pitch (names or MLBAM IDs)
pitchpredict predict pitcher "Zack Wheeler" "Juan Soto" --balls 1 --strikes 2

# Predict batter outcome given a pitch
pitchpredict predict batter "Aaron Judge" "Gerrit Cole" FF 96.5 0.15 2.85

# Predict batted-ball outcome (use --format json for machine-readable output)
pitchpredict predict batted-ball 102.3 24 --format json
```

Use --verbose for detailed tables, and pitchpredict cache status to inspect the local cache.
Start the server:
```bash
pitchpredict serve
```

To make a prediction, first resolve player names to MLBAM IDs:

```bash
curl "http://localhost:8056/players/lookup?name=Clayton%20Kershaw&fuzzy=true"
curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"
```

Use the returned key_mlbam values in the prediction request:
```bash
curl -X POST http://localhost:8056/predict/pitcher \
  -H "Content-Type: application/json" \
  -d '{
    "pitcher_id": 477132,
    "batter_id": 592450,
    "count_balls": 0,
    "count_strikes": 0,
    "score_bat": 0,
    "score_fld": 0,
    "game_date": "2024-06-15",
    "algorithm": "similarity"
  }'
```

pitcher_id and batter_id are MLBAM IDs; use /players/lookup to resolve names.
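The same lookup-then-predict flow can be scripted without curl. Below is a minimal sketch using only the Python standard library; it assumes the server runs at the default http://localhost:8056 used above, and the helper names are hypothetical. The functions only build `urllib.request.Request` objects, so nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8056"  # default address used in the examples above


def lookup_request(name: str, fuzzy: bool = True) -> Request:
    """Build GET /players/lookup; quote_via=quote encodes spaces as %20, like the curl examples."""
    query = urlencode({"name": name, "fuzzy": str(fuzzy).lower()}, quote_via=quote)
    return Request(f"{BASE_URL}/players/lookup?{query}")


def predict_pitcher_request(pitcher_id: int, batter_id: int, **context) -> Request:
    """Build POST /predict/pitcher with a JSON body; extra fields (counts, score, ...) pass through."""
    body = {"pitcher_id": pitcher_id, "batter_id": batter_id, **context}
    return Request(
        f"{BASE_URL}/predict/pitcher",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `urllib.request.urlopen(predict_pitcher_request(477132, 592450, count_balls=0, count_strikes=0, score_bat=0, score_fld=0, game_date="2024-06-15", algorithm="similarity"))` sends the same body as the curl command above, and `json.load` on the response parses the server's JSON reply.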
Predict batted ball outcomes:
```bash
curl -X POST http://localhost:8056/predict/batted-ball \
  -H "Content-Type: application/json" \
  -d '{
    "launch_speed": 95.0,
    "launch_angle": 18.0,
    "algorithm": "similarity"
  }'
```

Look up player IDs:
```bash
curl "http://localhost:8056/players/lookup?name=Aaron%20Judge&fuzzy=true"
```

Look up player metadata by MLBAM ID:
```bash
curl http://localhost:8056/players/592450
```

Full documentation is available in the docs/ folder:
- Getting Started - Quick start guide
- Installation - Detailed installation instructions
- Python API Reference - PitchPredict class documentation
- REST API Reference - Server endpoints
- CLI Reference - Command-line interface
- Algorithms - Similarity and xLSTM algorithms
- Caching - Cache behavior and storage layout
PitchPredict offers two algorithms (details in Algorithms):
The similarity algorithm finds the historical pitches most similar to the current game context using weighted nearest-neighbor analysis:
- Fetch all pitches thrown by the pitcher from Statcast (2015-01-01 through the requested game_date).
- Compute similarity scores across contextual features (batter ID, counts, bases, score, inning, date, fielders, rest days, strike zone) using softmaxed weights from SimilarityWeights.
- Sample the top sample_pctg (default 0.05, i.e. 5%) of pitches by similarity.
- Aggregate statistics and sample concrete pitches to produce predictions.
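The steps above amount to weighted nearest-neighbor retrieval. The sketch below shows the core idea on toy data: raw feature weights pass through a softmax, each historical pitch gets a weighted similarity score, and the top sample_pctg fraction is aggregated into pitch-type probabilities. The feature names, the per-feature similarity (exact match for categorical features, exponential decay for numeric ones), and all function names are illustrative assumptions, not PitchPredict's actual SimilarityWeights fields or scoring code.

```python
import math
from collections import Counter

# Hypothetical: features compared by equality rather than numeric distance.
CATEGORICAL = {"batter_id"}


def softmax(raw: dict[str, float]) -> dict[str, float]:
    """Turn raw feature weights into positive weights that sum to 1."""
    peak = max(raw.values())  # subtract the max for numerical stability
    exps = {name: math.exp(w - peak) for name, w in raw.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def feature_similarity(name: str, a: float, b: float) -> float:
    """1.0 for an exact match; categorical mismatches score 0, numeric ones decay with distance."""
    if name in CATEGORICAL:
        return 1.0 if a == b else 0.0
    return math.exp(-abs(a - b))


def predict_pitch_types(context: dict, history: list[dict], raw_weights: dict[str, float],
                        sample_pctg: float = 0.05) -> dict[str, float]:
    """Pitch-type probabilities from the top sample_pctg most similar historical pitches."""
    weights = softmax(raw_weights)
    scored = [
        (sum(w * feature_similarity(f, context[f], pitch[f]) for f, w in weights.items()), pitch)
        for pitch in history
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    k = max(1, round(len(scored) * sample_pctg))  # always keep at least one neighbor
    top = [pitch for _, pitch in scored[:k]]
    counts = Counter(pitch["pitch_type"] for pitch in top)
    return {ptype: n / len(top) for ptype, n in counts.items()}
```

Because the softmaxed weights sum to 1, a pitch that matches the context on every feature scores exactly 1.0. PitchPredict also aggregates speed and location and samples concrete pitches; this sketch covers only the pitch-type probabilities.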
Batted ball predictions use continuous similarity scoring on exit velocity and launch angle, plus optional spray angle, bases state, outs, and date recency, then sample the top similar events for outcome probabilities and expected stats.
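A similar retrieval works for batted balls. The sketch below scores historical events by scaled distance in (exit velocity, launch angle) space, converts that distance to a similarity, and reports outcome frequencies among the top k neighbors. The Gaussian kernel, the 5 mph and 5 degree scales, the fixed k, and all names are illustrative assumptions; the optional spray angle, bases, outs, and recency terms described above are omitted.

```python
import math
from collections import Counter
from typing import NamedTuple


class BattedBall(NamedTuple):
    launch_speed: float  # exit velocity, mph
    launch_angle: float  # degrees
    outcome: str         # e.g. "single", "field_out", "home_run"


def contact_similarity(speed_a: float, angle_a: float, speed_b: float, angle_b: float,
                       speed_scale: float = 5.0, angle_scale: float = 5.0) -> float:
    """Gaussian kernel on scaled distance: 1.0 for identical contact, falling toward 0."""
    d2 = ((speed_a - speed_b) / speed_scale) ** 2 + ((angle_a - angle_b) / angle_scale) ** 2
    return math.exp(-0.5 * d2)


def outcome_probs(launch_speed: float, launch_angle: float,
                  history: list[BattedBall], k: int = 50) -> dict[str, float]:
    """Outcome frequencies among the k most similar historical batted balls."""
    ranked = sorted(
        history,
        key=lambda e: contact_similarity(launch_speed, launch_angle, e.launch_speed, e.launch_angle),
        reverse=True,
    )
    top = ranked[:k]
    if not top:
        return {}
    counts = Counter(e.outcome for e in top)
    return {outcome: n / len(top) for outcome, n in counts.items()}
```

Scaling each axis before combining them keeps one mph and one degree from being treated as equivalent; the scales control how quickly similarity falls off along each axis.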
The xLSTM algorithm uses a sequence model trained on pitch sequences with a vocabulary of roughly 260 tokens encoding pitch type, speed, spin, location, and result. The model consumes contextual features (player IDs, count, bases, score, inning, and more) to predict the token sequence for the next pitch, which is decoded back into pitch attributes and outcomes.
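To make the idea of a pitch vocabulary concrete, the sketch below discretizes one pitch into tokens by binning each attribute and giving each attribute its own contiguous block of ids, so every (attribute, bin) pair maps to a unique integer. The bin edges, the pitch-type and result lists, and the resulting 28-token vocabulary are hypothetical and do not reproduce PitchPredict's actual ~260-token scheme.

```python
from bisect import bisect_right

# Hypothetical discretization; the real bin edges and vocabulary differ.
PITCH_TYPES = ["FF", "SI", "FC", "SL", "CU", "CH"]
RESULTS = ["ball", "called_strike", "swinging_strike", "foul", "in_play"]
BIN_EDGES = {
    "speed": [80.0, 85.0, 90.0, 95.0],  # mph -> 5 bins
    "spin": [1800.0, 2200.0, 2600.0],   # rpm -> 4 bins
    "plate_x": [-0.83, 0.0, 0.83],      # ft from center of plate -> 4 bins
    "plate_z": [1.5, 2.5, 3.5],         # ft above ground -> 4 bins
}

# Every field owns a contiguous block of token ids, so (field, bin) pairs never collide.
FIELD_SIZES = [("pitch_type", len(PITCH_TYPES))]
FIELD_SIZES += [(name, len(edges) + 1) for name, edges in BIN_EDGES.items()]
FIELD_SIZES += [("result", len(RESULTS))]
OFFSETS: dict[str, int] = {}
_next_id = 0
for _name, _size in FIELD_SIZES:
    OFFSETS[_name] = _next_id
    _next_id += _size
VOCAB_SIZE = _next_id


def encode_pitch(pitch: dict) -> list[int]:
    """Map one pitch to one token per field, in FIELD_SIZES order."""
    tokens = [OFFSETS["pitch_type"] + PITCH_TYPES.index(pitch["pitch_type"])]
    for name, edges in BIN_EDGES.items():
        tokens.append(OFFSETS[name] + bisect_right(edges, pitch[name]))
    tokens.append(OFFSETS["result"] + RESULTS.index(pitch["result"]))
    return tokens


def decode_token(token: int) -> tuple[str, int]:
    """Invert a token id into (field name, bin or category index)."""
    for name, size in FIELD_SIZES:
        if OFFSETS[name] <= token < OFFSETS[name] + size:
            return name, token - OFFSETS[name]
    raise ValueError(f"token {token} outside vocabulary of size {VOCAB_SIZE}")
```

Decoding recovers only bin indexes; turning a bin back into a concrete speed or location (for instance, a bin midpoint or a value sampled within the bin) is a further design choice this sketch leaves open.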
PitchPredict would not be possible without pybaseball, the open-source and MIT-licensed baseball data scraping library. The baseball data itself largely comes from Statcast, but Baseball-Reference and FanGraphs are sources as well.
MIT License - see LICENSE for details.