2025-08-04 · 7 min read

How I built my AI clone that talks like me

A local-first AI clone built with Ollama and ChromaDB for real-time conversations shaped by my writing, projects, and personal style.

GenAIChatbotsOllamaChromaDBVector Embeddings

Goal and constraints

I wanted a chatbot that felt personal, not generic. It had to answer with my tone, my project context, and my preferred problem-solving style.

I also wanted local control for speed and privacy, so I used a local model runtime instead of relying fully on cloud APIs.

Knowledge pipeline

I converted my notes, blog drafts, and project documentation into chunks and embedded them into a vector database.

The retrieval layer was designed to fetch the most relevant memories before generation so responses stay grounded in real source material.

  • Chunked long documents into retrieval-friendly units
  • Stored embeddings in ChromaDB with metadata filters
  • Used similarity search before response generation

Local model setup

Ollama handled local inference while the application orchestrated retrieval, prompt assembly, and response formatting.

I tuned prompts to preserve personality cues without making responses overly verbose or robotic.

  • System prompts enforced style consistency
  • Context windows were optimized for speed on local hardware
  • Fallback behavior handled low-confidence retrieval

Real-time behavior

To keep interactions fluid, I prioritized low latency over unnecessary prompt complexity.

Streaming output and concise context selection made the chatbot feel responsive enough for daily use.

What worked best

Retrieval quality was the biggest factor in making the clone feel authentic.

The final system proved that personal assistants can be useful and practical without a heavy backend or expensive serving stack.