2025-08-04 · 7 min read
How I built my AI clone that talks like me
A local-first AI clone built with Ollama and ChromaDB for real-time conversations shaped by my writing, projects, and personal style.
Goal and constraints
I wanted a chatbot that felt personal, not generic. It had to answer with my tone, my project context, and my preferred problem-solving style.
I also wanted local control for speed and privacy, so I used a local model runtime instead of relying fully on cloud APIs.
Knowledge pipeline
I converted my notes, blog drafts, and project documentation into chunks and embedded them into a vector database.
The retrieval layer was designed to fetch the most relevant memories before generation so responses stay grounded in real source material.
- Chunked long documents into retrieval-friendly units
- Stored embeddings in ChromaDB with metadata filters
- Used similarity search before response generation
Local model setup
Ollama handled local inference while the application orchestrated retrieval, prompt assembly, and response formatting.
I tuned prompts to preserve personality cues without making responses overly verbose or robotic.
- System prompts enforced style consistency
- Context windows were optimized for speed on local hardware
- Fallback behavior handled low-confidence retrieval
Real-time behavior
To keep interactions fluid, I prioritized low latency over unnecessary prompt complexity.
Streaming output and concise context selection made the chatbot feel responsive enough for daily use.
What worked best
Retrieval quality was the biggest factor in making the clone feel authentic.
The final system proved that personal assistants can be useful and practical without a heavy backend or expensive serving stack.