2025-06-18 · 8 min read

How I Fine-Tuned GPT-2 for Cricket Commentary

400k commentary rows, TensorFlow tricks, and plenty of slog-sweeps. This is how I trained a model to generate match-style commentary.

NLP · Fine-tuning · Cricket · T5 · GPT-2

Why this project mattered

Sports commentary is emotional, fast, and contextual. Generic language models sound flat when they describe cricket moments.

I wanted a model that could generate short, punchy lines that feel like live commentary while still being grounded in what happened in the match.

I collected a large commentary dataset and normalized inconsistent text patterns such as over formats, team names, punctuation, and special symbols.
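A minimal sketch of what that normalization step could look like, using only the standard library. The alias table, the over pattern, and the function name `normalize_row` are assumptions for illustration; the real dataset's spellings and formats are not listed in this post.

```python
import re
import unicodedata

# Hypothetical alias table: the actual team spellings in the dataset are not known.
TEAM_ALIASES = {
    "ind": "India", "india": "India",
    "aus": "Australia", "australia": "Australia",
}
# Longest aliases first so "india" wins over "ind".
TEAM_RE = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, TEAM_ALIASES), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Assumed over format at the start of a row: "12.3", "12 : 3", "12 . 3".
OVER_RE = re.compile(r"^\s*(\d{1,2})\s*[.:]\s*(\d)\b")

# Curly quotes and long dashes mapped to plain ASCII.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})


def normalize_row(text: str) -> str:
    """Normalize one commentary row to a consistent surface form."""
    text = unicodedata.normalize("NFKC", text)       # fold compatibility characters
    text = text.translate(PUNCT_MAP)                 # plain quotes and dashes
    text = OVER_RE.sub(r"\1.\2", text)               # over marker becomes "12.3"
    text = TEAM_RE.sub(lambda m: TEAM_ALIASES[m.group(0).lower()], text)
    text = re.sub(r"([!?.,])\1+", r"\1", text)       # "!!!" becomes "!"
    return re.sub(r"\s+", " ", text).strip()         # collapse whitespace


print(normalize_row("  12 : 3   IND vs AUS \u2014  FOUR!!!  "))
```

Anchoring the over pattern to the start of the row is a deliberate choice here: it avoids rewriting other numbers, such as run rates, that happen to contain a dot or colon.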

I removed noisy rows, duplicated snippets, and low-value boilerplate so the model would focus on useful commentary style instead of artifacts.
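The filtering described above can be sketched roughly as follows. The boilerplate phrases, the `min_words` threshold, and the function names are hypothetical; they stand in for whatever rules fit the actual data.

```python
import re

# Hypothetical boilerplate fragments; the real list depends on the data source.
BOILERPLATE = ("end of over", "drinks break")


def dedup_key(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for comparison."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def clean_rows(rows, min_words=4):
    """Drop very short rows, boilerplate rows, and near-verbatim duplicates."""
    seen = set()
    kept = []
    for row in rows:
        key = dedup_key(row)
        if len(key.split()) < min_words:
            continue  # too short to carry commentary style
        if any(phrase in key for phrase in BOILERPLATE):
            continue  # scoreboard-style filler
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        kept.append(row)  # keep the original text, not the key
    return kept


rows = [
    "Short and wide, cut away for FOUR!",
    "short and wide, cut away for four",
    "no run",
    "End of over 12 summary",
]
print(clean_rows(rows))
```

Comparing on a normalized key while keeping the original row means casing and punctuation, which are part of the commentary style, survive the cleanup.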

I fine-tuned GPT-2 for generation and compared several prompt templates to see which one kept the tone most stable.
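To make the idea of a prompt template concrete, here is a small sketch. The two templates, the field names (`over`, `bowler`, `batter`, `event`), and the helper `build_example` are assumptions for illustration, not the exact templates used in this project. The only fixed detail is `<|endoftext|>`, which is GPT-2's end-of-text token.

```python
# Hypothetical templates: one with explicit tags, one in plain prose.
TEMPLATES = {
    "tagged": "<over> {over} <bowler> {bowler} <batter> {batter} <event> {event} <commentary>",
    "plain": "Over {over}: {bowler} to {batter}, {event}. Commentary:",
}


def build_example(template_name, ball, commentary=None, eos="<|endoftext|>"):
    """Format one ball as a prompt, or as a full training example.

    With `commentary` set, the target text and an end-of-text token are
    appended so the model learns where a line stops.
    """
    prompt = TEMPLATES[template_name].format(**ball)
    if commentary is None:
        return prompt  # inference-time prompt
    return f"{prompt} {commentary}{eos}"


ball = {"over": "12.3", "bowler": "Khan", "batter": "Smith", "event": "FOUR"}
print(build_example("plain", ball))
print(build_example("tagged", ball, "Short and wide, slapped through point!"))
```

Using the same function for training and inference keeps the prompt format identical in both places, which is one plausible reason a fixed template helps stabilize tone.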

I also tested sequence length and decoding settings because cricket commentary quality drops quickly when outputs become repetitive.
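One common way to curb repetition at decoding time is to block any token that would repeat an n-gram already generated. The sketch below shows that idea in plain Python; it mirrors the concept behind options such as `no_repeat_ngram_size` in Hugging Face Transformers, but it is an illustration of the technique, not the setting this project necessarily used.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already in `tokens`.

    A decoder can set the probability of these tokens to zero before
    sampling the next token, so no n-gram appears twice in one output.
    """
    if n < 1 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens are the prefix the next token would extend.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


generated = "what a shot what a".split()
print(banned_next_tokens(generated, 3))
```

In this example the output so far ends in "what a", and "what a shot" has already appeared, so "shot" is blocked as the next token.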

Training quality improved once I balanced learning rate, context window, and batch settings against the GPU budget I had available.
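The trade-off between batch size and sequence length is mostly arithmetic, so a rough planning helper can be useful. All numbers below are hypothetical placeholders except the 400k row count mentioned at the top of this post; the function name `plan_batches` is also made up for this sketch.

```python
import math


def plan_batches(target_batch, per_device_batch, seq_len, num_rows, epochs=1):
    """Estimate gradient accumulation and step counts for a fixed GPU budget."""
    # Accumulate gradients over several small batches to reach the target size.
    accum = math.ceil(target_batch / per_device_batch)
    effective = accum * per_device_batch
    steps_per_epoch = math.ceil(num_rows / effective)
    return {
        "grad_accum_steps": accum,
        "effective_batch": effective,
        "tokens_per_update": effective * seq_len,
        "optimizer_steps": steps_per_epoch * epochs,
    }


# Assumed settings: 8 sequences fit on the GPU at 128 tokens each.
plan = plan_batches(target_batch=64, per_device_batch=8, seq_len=128,
                    num_rows=400_000, epochs=3)
print(plan)
```

Shortening the context window lets more sequences fit per device, which lowers the accumulation steps needed for the same effective batch. Commentary lines are short, so a small window is a reasonable starting point.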

I evaluated with both offline metrics and manual review, since sports language needs human judgment for excitement and readability.
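As one example of an offline metric that pairs well with manual review, distinct-n measures how varied a batch of generated lines is. This is a sketch of a commonly used diversity measure, offered as an assumption; the post does not state which offline metrics were actually computed.

```python
def distinct_n(lines, n=2):
    """Share of unique n-grams across a set of generated lines.

    Values near 1.0 mean varied output; low values point to repetition.
    """
    total = 0
    unique = set()
    for line in lines:
        tokens = line.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


samples = ["what a shot", "what a catch"]
print(distinct_n(samples, n=2))
```

A score like this can rank decoding settings cheaply, but it says nothing about whether a line is exciting or accurate, which is where human review comes in.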

The final model produced sharper, more idiomatic commentary lines than the baseline of simply prompting a general-purpose model.

The biggest lesson was that dataset quality and prompt structure mattered more than trying complex model changes too early.