The batting order matters
Why how you set up AI is more important than which AI you use
It was the bottom of the seventh, and I already knew it was over.
South Korea versus the Dominican Republic in the WBC quarterfinals. The gap wasn’t just on the scoreboard, it was in everything: the at-bats, the pitch selection, the defensive positioning. As someone who’s spent years looking at this game through data as a SABR analytics specialist, I didn’t need the final out to tell me what the numbers already had.
But losing a game I care about does one useful thing. It makes me ask questions. Which Korean hitters had been struggling before this tournament? Were there signs in the KBO numbers? What patterns were hiding in the data that nobody surfaced in time?
A few days ago, I was catching up with a close friend who’s a rising FIFA-licensed agent. We were talking about how sports organizations handle player data, and he said something offhand that I haven’t been able to shake.
“I’ve seen the dozens of note cards my broadcaster friends prepare before a game. Has to be hours of work.”
He wasn’t complaining. He was just observing. But what he was really describing is a challenge I’ve seen inside nearly every organization I’ve worked with, and a long list of startups across Asia, the Middle East, and North America. The pattern is identical whether you’re calling a baseball game or running a quarterly business review.
And what started as a personal baseball project accidentally became my analogy for explaining this to organizations.
So here’s what I actually built, because the experience is the lesson.
Approach one: just ask ChatGPT. ChatGPT and most LLMs do have web access these days, but current KBO statistics are not exactly their strong suit. The result was a confident-sounding answer built on thin air. This is what LLMs do when they lack grounding in specific, real data: they estimate, extrapolate, and occasionally just make things up while maintaining excellent posture.
Previously, I had also built a Baseball Data Analyst custom GPT: specialized context, a voice interface, and baseball analytics framing, wrapped in a conversational experience. Genuinely useful for general reasoning, historical context, and analytical frameworks. But for specific private statistics? A well-configured GPT without grounded data is a confident intern who’s never actually seen your files.
Approach two: fine-tune a model on the data. What if I just trained the model on KBO player statistics directly?
I built a fine-tuning pipeline in Google Colab, because most of us don’t have enterprise GPUs sitting around. The process is more accessible than it sounds. Take a base model like Llama 3.1 8B from Hugging Face, apply LoRA (a lightweight technique that adjusts only a small fraction of model parameters, keeping compute costs reasonable), feed it your dataset, and train it on a free T4 GPU in Colab.
The part most tutorials gloss over is the data preparation. Your training data needs to be structured as input/output pairs in JSONL format, essentially teaching the model: “when asked this, respond like this.” One example from my KBO dataset looked like:
```json
{"input": "What was Moon Bo-gyeong's batting average during the 2024 season?", "output": "Moon Bo-gyeong's batting average was .301 in the 2024 KBO season."}
```

Get that format wrong, and the training runs fine but the model learns nothing useful. Once the dataset is clean, the actual training call is straightforward:
```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    dataset_text_field='text',
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir='kbo_checkpoints',
    ),
)
trainer.train()
```

After training, you export the model and run it locally through Ollama.
The fine-tuned model learned patterns. It could speak with real familiarity about KBO teams and players. But here’s the problem with specific statistical data: fine-tuned models are confidently wrong in ways that will age badly. The model learned the language of the data, not the facts. When it didn’t know a number, it filled the gap with something plausible-sounding. In baseball, that’s embarrassing. In a business context, a wrong figure in a financial report, a patient record, or a production defect log becomes a liability.
Approach three: build a RAG. This is where I landed, and where most real-world data problems actually belong.
RAG, or Retrieval-Augmented Generation, doesn’t ask the model to remember your data. It builds a retrieval layer that searches your actual data first, then hands the relevant results to the model as context. The model’s job becomes interpretation and presentation, not memorization. The model is the analyst. The retrieval system is the filing cabinet.
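In miniature, the pattern looks like this. A minimal sketch, with illustrative documents and naive term-overlap scoring standing in for a real index (production systems use embeddings or BM25):

```python
import re

def tokens(text):
    """Lowercase a string and split it into simple word/number tokens."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))

def retrieve(query, docs, k=2):
    """Return the k documents sharing the most tokens with the query."""
    return sorted(docs, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Hand only the retrieved documents to the model as grounded context."""
    context = "\n".join(retrieve(query, docs))
    return ("Answer only from the data below. If the answer is not there, say so.\n"
            f"Data:\n{context}\n\nQuestion: {query}")

# Placeholder documents, not the real KBO dataset.
docs = [
    "Moon Bo-gyeong 2024: AVG .301, HR 14, RBI 70",
    "Noh Si-hwan 2024: AVG .272, HR 24, RBI 89",
    "Hanwha Eagles 2024 team ERA: 4.98",
]
prompt = build_prompt("What was Moon Bo-gyeong's batting average in 2024?", docs)
```

The model never has to remember anything; every number it sees arrives in the prompt, freshly pulled from the filing cabinet.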
My KBO RAG chatbot runs entirely on my local machine. It indexes 5,289 player-seasons across all 10 KBO teams from 2016 to 2025, plus 1,571 player bios. When you ask about the Hanwha Eagles’ 2024 roster, it retrieves actual JSON files from my dataset, formats the stats, and passes them to the LLM with a clear instruction: answer only from this data. No cloud API. No data leaving my machine. No guessing. The accuracy difference is not subtle.
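Because the dataset is structured, retrieval can be a plain filter rather than a search. A simplified sketch of that grounding step, with hypothetical field names and illustrative numbers in place of the real JSON files:

```python
# Illustrative records; the real dataset holds 5,289 player-seasons as JSON.
player_seasons = [
    {"name": "Noh Si-hwan", "team": "Hanwha Eagles", "year": 2024, "avg": 0.272, "hr": 24},
    {"name": "Chae Eun-seong", "team": "Hanwha Eagles", "year": 2024, "avg": 0.271, "hr": 20},
    {"name": "Kim Do-yeong", "team": "KIA Tigers", "year": 2024, "avg": 0.347, "hr": 38},
]

def roster_context(team, year, records):
    """Filter to one team-season and format the stats as grounded context."""
    rows = [r for r in records if r["team"] == team and r["year"] == year]
    lines = [f"{r['name']}: AVG {r['avg']:.3f}, HR {r['hr']}" for r in rows]
    return "Answer only from this data:\n" + "\n".join(lines)

context = roster_context("Hanwha Eagles", 2024, player_seasons)
```

Everything outside the requested team and year stays out of the prompt, which is exactly what keeps the model from reaching for a plausible-sounding guess.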
This wasn’t my first time building a RAG for a real use case.
In 2022, when I co-founded a zero-knowledge social media platform for verified college students, we faced the same core problem. Students wanted recommendations on what to study to target specific careers. We had course catalogs from multiple US universities. The question was how to make that information actually useful in context.
We built a RAG pipeline referencing real course offerings from each university. When a student expressed interest in product management at tech companies, the system didn’t hallucinate a curriculum. It retrieved real courses from their specific school and surfaced relevant options. Grounded, personalized, accurate.
The lesson I took from that: the magic is never in the model alone. It’s in the retrieval layer, the data strategy, the system design that keeps the AI working with truth instead of probability.
Let me translate the baseball project into language that’s harder to ignore.
If your team is using a general-purpose LLM to draft copy or brainstorm campaign angles, that’s approach one. Useful. Saves time on first drafts. But it has no knowledge of your brand guidelines, last quarter’s performance, or what’s actually happening in your specific markets.
Now consider a large enterprise, say, Samsung. Decades of internal product data, engineering specs, global customer feedback, supply chain metrics. A fine-tuned model trained on that data would speak the language of the company fluently: valuable for internal knowledge management, onboarding, contextual documentation generation. But ask it for a specific defect rate from Q3’s production line, and if that number is off by half a percent, the consequences ripple.
This is also why many large enterprises aren’t rushing to push their data into cloud-hosted models. Interest in on-premises AI environments has grown significantly: running models locally, keeping proprietary datasets off third-party infrastructure entirely. Security around LLM and agentic deployments is no longer a footnote. It’s a board-level conversation. And for good reason: the more capable these systems get, the higher the cost of a breach or a leak.
For use cases requiring precision on real internal data, a RAG (or a hybrid architecture where the LLM retrieves from verified internal sources before synthesizing) is the right answer. The data stays grounded. The model stays in its lane.
Three approaches. Three different purposes. None of them inherently wrong. All of them wrong when applied to the wrong problem.
Most organizations are still evaluating AI by typing a question into a chatbot and judging the response. That’s like evaluating a pitcher by watching one warmup throw. It’s not a strategy.
Prompt engineering matters. But its real leverage shows up when a well-designed prompt becomes a component inside an agentic workflow where an AI agent retrieves data, applies reasoning, takes action, and feeds results into a downstream process. That’s when it stops being a conversation trick and starts being infrastructure.
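As a toy sketch of that shift, here is a prompt serving as one component in a retrieve-reason-act loop. The `llm()` stub stands in for a real model call (for instance, a local Ollama request), and the player data is a placeholder:

```python
def llm(prompt):
    # Stub: a real call would send the prompt to a model and parse its reply.
    return "FLAG" if "below .250" in prompt else "OK"

def retrieve(player, stats):
    # Step 1: ground the workflow in real data, not model memory.
    return stats[player]

def run_agent(player, stats):
    record = retrieve(player, stats)
    status = "below .250" if record["avg"] < 0.250 else "at or above .250"
    # Step 2: the engineered prompt is one component, fed with retrieved facts.
    prompt = f"{player} is batting {record['avg']:.3f}, {status}. Reply FLAG or OK."
    decision = llm(prompt)
    # Step 3: the decision feeds a downstream process (a report, an alert, a next agent).
    return {"player": player, "decision": decision}

result = run_agent("Player A", {"Player A": {"avg": 0.241}})
```

The prompt still matters, but it lives inside plumbing: data flows in before it, and an action flows out after it.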
What this requires is end-to-end understanding. Not just “we use AI.” Which kind? Grounded in what data? Embedded in which workflow? With what guardrails? Deployed where, and why?
The organizations that will lead the next phase aren’t the ones who adopted AI the earliest. They’re the ones who understood the setup.
Here’s the part that I think is worth sitting with.
The broadcaster spending hours preparing note cards, the analyst building spreadsheets from scratch, the marketing director assembling a quarterly report by hand: their work doesn’t disappear. But the hours of manual preparation, the risk of human error, the bottleneck of individual memory? That part is already being replaced. Not by AI that guesses. By AI that retrieves, grounds, and acts on real data.
The generalist who understands how to connect data, models, workflows, and business context: that person is becoming increasingly valuable. Not the person who has heard of the concept, or who knows someone who knows. The work itself is becoming the proof.
Baseball season is almost here. And I’ll be watching with a RAG chatbot on my laptop, a fine-tuned model I trained in Colab, and a custom GPT built for baseball analysis. Three tools, each with a purpose.
The batting order matters. Know what you’re putting up to the plate, and why.

