2025-06-18 · 8 min read
How I Fine-Tuned GPT-2 for Cricket Commentary
400k commentary rows, TensorFlow tricks, and plenty of slog-sweeps. This is how I trained a model to generate match-style commentary.
Why this project mattered
Sports commentary is emotional, fast, and contextual. Generic language models sound flat when they describe cricket moments.
I wanted a model that could generate short, punchy lines that feel like live commentary while still being grounded in what happened in the match.
Dataset and preprocessing
I collected a commentary dataset of about 400k rows and normalized inconsistent text patterns such as over notation, team names, punctuation, and special symbols.
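Here is a minimal sketch of that kind of normalization pass. The alias table, the canonical `Over 17.3` marker, and the `normalize_commentary` name are illustrative assumptions, not the exact rules applied to this dataset.

```python
import re
import unicodedata

# Hypothetical alias table: the real team-name mappings are not listed in the post.
TEAM_ALIASES = {"IND": "India", "AUS": "Australia", "ENG": "England"}

# Curly quotes and dashes are untouched by NFKC, so map them to ASCII explicitly.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})

# Matches over markers written as "17.3 ov", "17.3 over" or "17.3 overs".
OVER_RE = re.compile(r"\b(\d+)\.(\d)\s*(?:ov|overs?)\b", re.IGNORECASE)
ALIAS_RE = re.compile(r"\b(" + "|".join(TEAM_ALIASES) + r")\b")


def normalize_commentary(text: str) -> str:
    # Fold compatibility characters (e.g. the one-character ellipsis) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(PUNCT_MAP)
    # Rewrite every over marker into one canonical "Over N.B" form.
    text = OVER_RE.sub(r"Over \1.\2", text)
    # Expand uppercase team codes into full names.
    text = ALIAS_RE.sub(lambda m: TEAM_ALIASES[m.group(1)], text)
    # Collapse repeated "!" or "?" and runs of whitespace.
    text = re.sub(r"([!?])\1+", r"\1", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For example, `normalize_commentary("17.3 ov: IND  need 12 off 18!!!")` returns `"Over 17.3: India need 12 off 18!"`.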
I removed noisy rows, duplicate snippets, and low-value boilerplate so the model would learn commentary style instead of artifacts of the raw data.
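A minimal sketch of a filtering and deduplication step like this one follows. The boilerplate patterns, the `MIN_WORDS` threshold, and the function names are hypothetical; the post does not describe its actual filters.

```python
import re

# Hypothetical boilerplate patterns: non-commentary rows to drop.
BOILERPLATE = [
    re.compile(r"^\s*(?:end of over|drinks break|strategic time-?out)\b", re.IGNORECASE),
    re.compile(r"^\s*\d+\.\d\s*$"),  # a bare over marker with no commentary text
]
MIN_WORDS = 3  # hypothetical cutoff for rows too short to carry style


def dedup_key(text: str) -> str:
    # Case- and punctuation-insensitive key, so near-identical snippets collide.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def clean_rows(rows):
    seen = set()
    kept = []
    for text in rows:
        if any(p.search(text) for p in BOILERPLATE):
            continue  # boilerplate row
        if len(text.split()) < MIN_WORDS:
            continue  # too short to be useful commentary
        key = dedup_key(text)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        kept.append(text)
    return kept
```

Keying on lowercased word tokens means `"Four! Driven through the covers."` and `"four!! driven through the covers"` count as the same snippet, and only the first is kept.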
- Built a cleaned dataset focused on meaningful ball-by-ball context
- Standardized abbreviations and match metadata
- Created train and validation splits to avoid leakage
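The post does not say how the splits were built. One common way to avoid leakage is to split by match, so lines describing the same game never appear in both sets. A minimal sketch, assuming each row carries a hypothetical `match_id` field:

```python
import hashlib


def split_for(match_id: str, val_fraction: float = 0.1) -> str:
    # Hash the match id so every row from that match gets the same, reproducible split.
    digest = hashlib.sha256(match_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_rows(rows, val_fraction: float = 0.1):
    train, val = [], []
    for row in rows:
        target = val if split_for(row["match_id"], val_fraction) == "val" else train
        target.append(row)
    return train, val
```

Hashing instead of random shuffling keeps the assignment stable when new rows are appended, but the validation share only approximates `val_fraction` when there are few matches.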
Modeling approach
I fine-tuned GPT-2 for generation quality and compared several prompt templates to stabilize the tone.
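The templates themselves aren't shown, so here is a sketch of one plausible shape. The `[CONTEXT]`/`[COMMENTARY]` labels and the `BallContext` fields are hypothetical. The idea is that training text and inference prompts share an identical prefix, and each training example ends with GPT-2's `<|endoftext|>` token so the model learns where a line stops.

```python
from dataclasses import asdict, dataclass


@dataclass
class BallContext:
    over: str     # e.g. "17.3"
    bowler: str
    batter: str
    outcome: str  # e.g. "FOUR", "WICKET", "dot"
    score: str    # e.g. "IND 142/3"


# Hypothetical template; these labels are ordinary text to GPT-2's tokenizer.
TEMPLATE = (
    "[CONTEXT] Over {over} | {bowler} to {batter} | {outcome} | {score}\n"
    "[COMMENTARY]"
)
EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def build_prompt(ctx: BallContext) -> str:
    # Inference prompt: the model continues after "[COMMENTARY]".
    return TEMPLATE.format(**asdict(ctx))


def build_training_example(ctx: BallContext, commentary: str) -> str:
    # Same prefix as the prompt, then the target line and the end-of-text token.
    return build_prompt(ctx) + " " + commentary.strip() + EOS
```

Keeping the prefix byte-for-byte identical between training and generation is what lets a template change show up clearly when comparing styles.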
I also tuned sequence length and decoding settings, because commentary quality drops quickly once outputs become repetitive.
- Prompt format had a major impact on output style
- Sampling settings traded creativity against factual consistency
- Short output targets gave the best production-like results
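To make the creativity trade-off concrete, here is a stdlib-only sketch of temperature scaling plus nucleus (top-p) sampling over one step's logits. It is an illustration, not the decoder used here. In Hugging Face Transformers the same knobs are the `temperature` and `top_p` arguments to `generate()`, alongside `no_repeat_ngram_size` and `repetition_penalty` for repetition control.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=random):
    """Pick a token index using temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample within the nucleus, proportionally to the original probabilities.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Lower `temperature` or `top_p` shrinks the candidate set toward the most likely continuation, which tends to stay closer to the match context; higher values admit rarer, more colorful words at the cost of drifting.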
Training and evaluation
Training quality improved once I balanced learning rate, context window, and batch size against the available GPU budget.
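The post doesn't report the actual settings, so this sketch only shows the arithmetic that links those knobs: gradient accumulation to reach a target batch size within a memory limit, and a linear warmup-then-decay learning-rate schedule. The function names and the choice of schedule are assumptions; the one hard fact is that GPT-2's position embeddings cover 1024 tokens.

```python
GPT2_MAX_CONTEXT = 1024  # GPT-2's position embeddings cover 1024 tokens

# Function names and schedule shape below are hypothetical illustrations.


def effective_batch(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def tokens_per_update(per_device_batch: int, grad_accum_steps: int,
                      context_len: int, num_devices: int = 1) -> int:
    """Tokens per update, assuming every sequence is padded or packed to context_len."""
    if context_len > GPT2_MAX_CONTEXT:
        raise ValueError(f"GPT-2 supports at most {GPT2_MAX_CONTEXT} positions")
    return effective_batch(per_device_batch, grad_accum_steps, num_devices) * context_len


def lr_at(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)
```

Halving the per-device batch while doubling `grad_accum_steps` keeps the effective batch unchanged and lowers peak memory, which is usually the first lever when a longer context window no longer fits on the GPU.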
I evaluated with both offline metrics and manual review, since sports language needs human judgment for excitement and readability.
- Tracked loss curves across checkpoints
- Compared outputs on the same held-out match contexts
- Selected the checkpoint with the best readability and style match
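The specific offline metrics aren't named, so the following is a sketch of two common ones plus a selection rule. Perplexity is derived from validation loss, assuming the loss is mean per-token cross-entropy in nats. Distinct-n measures repetition in generated lines. The selection rule, the tolerance, and the `human_score` field (an average of manual ratings) are all hypothetical.

```python
import math


def perplexity(mean_token_nll: float) -> float:
    # exp of the mean per-token negative log-likelihood (natural log).
    return math.exp(mean_token_nll)


def distinct_n(texts, n=2):
    """Fraction of n-grams across generated lines that are unique (higher = less repetitive)."""
    total, unique = 0, set()
    for t in texts:
        tokens = t.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def pick_checkpoint(results, loss_tolerance=0.05):
    """results: dicts with hypothetical keys 'name', 'val_loss', 'human_score'."""
    best_loss = min(r["val_loss"] for r in results)
    # Shortlist checkpoints whose loss is close to the best, then let human ratings decide.
    shortlist = [r for r in results if r["val_loss"] <= best_loss + loss_tolerance]
    return max(shortlist, key=lambda r: r["human_score"])["name"]
```

The lowest-loss checkpoint is not automatically the most readable one, so this rule uses loss only as a filter and leaves the final call to human judgment.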
Key outcomes
The final model produced sharper, more idiomatic commentary lines than prompting a general-purpose baseline model.
The biggest lesson was that dataset quality and prompt structure mattered more than trying complex model changes too early.