Ep 8 - Kimi2, Is RAG Still a Thing? And the Coming SaaS Bloodbath

Tags: RAG Architecture, SaaS Disruption, Kimi2 Model, AI Business Impact, Kimi2, RAG Systems, The Build Podcast, ai, rag, kimi2, saas, llms, ai-architecture, business-strategy

Key Takeaways

Business

  • Real estate assistant apps leveraging vector technology show promising niche applications.
  • The SaaS industry is entering a highly competitive phase, described as a "bloodbath," that will demand strategic differentiation.
  • Real-Time Vector Re-Ranking can offer a competitive edge in fast-paced digital products.

Technical

  • Vector stores are valuable but not strictly mandatory for all AI applications.
  • Local LLMs serve niche use cases rather than broad deployment.
  • Integration of specialized hardware like Groq Accelerators can significantly enhance vector processing performance.
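The re-ranking idea from the takeaways above can be sketched in a few lines. This is a minimal illustration, not any specific product's implementation: it assumes candidates are already retrieved as (id, vector) pairs and simply re-orders them by cosine similarity to the query vector. All names here are illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two dense vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(query_vec, candidates):
    """Re-rank candidate (doc_id, vector) pairs by similarity to the query."""
    scored = [(doc_id, cosine_sim(query_vec, vec)) for doc_id, vec in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# toy example: "a" points roughly the same direction as the query
query = np.array([1.0, 0.0])
docs = [("a", np.array([0.9, 0.1])), ("b", np.array([0.0, 1.0]))]
print(rerank(query, docs)[0][0])  # → a
```

In a real-time setting the same scoring step would run on each query over a small candidate set returned by an approximate-nearest-neighbor index, which is where specialized hardware for batched vector math pays off.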

Personal

  • Adapting to emerging AI tools requires a flexible mindset about technology choices, such as vector storage.
  • Staying informed about niche AI advancements can uncover unique business opportunities.
  • Recognizing both the hype and realistic applications of local LLMs helps in managing personal and professional expectations.

In this episode of The Build, Cameron Rohn and Tom Spencer unpack system design and market strategy around AI agents and SaaS disruption as they analyze Kimi2, RAG's relevance, and the coming "SaaS bloodbath."

They begin by mapping the technical architecture: the Kimi open-weight model, VO3 character vectors, the Superlinked vector computer, Groq accelerator integration, and the Neo4j graph database as components of agent memory systems. They highlight Mixture-of-Experts design choices, real-time vector re-ranking, and MCP tools for orchestration, with a focus on latency, cost, and developer ergonomics.

The conversation then shifts to developer tooling and building-in-public strategies: how LangSmith, Vercel, and Supabase fit into CI/CD and hosting for AI products, with LangSmith providing observability and MCP tools managing multi-model deployments. They contrast agent-first workflows with traditional RAG and probe whether RAG remains a dominant pattern.

On entrepreneurship and monetization, they explore prototyping a vector-backed real estate assistant, offering a managed Kimi inference service, and the competitive dynamics that emerge as many startups iterate publicly. Practical trade-offs are emphasized throughout: indexing versus graph memory, batching on Groq accelerators, and deciding what to open-source.

The episode ends with a forward-looking call to action: developers and founders should prioritize composable architectures and public iteration to survive and thrive in the next wave of AI products.
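The traditional RAG pattern debated in the episode reduces to two steps: retrieve the most relevant documents for a query, then compose them into a prompt for the model. The sketch below shows that skeleton under stated assumptions: the embedding is a toy character histogram standing in for a real embedding model, the "vector store" is an in-memory list, and all function names are illustrative rather than from any library discussed on the show.

```python
import numpy as np

def embed(text):
    # stand-in embedding: normalized letter histogram
    # (a production system would call a trained embedding model)
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query, corpus, k=2):
    """Return the k corpus documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: -float(np.dot(q, embed(doc))))[:k]

def build_prompt(query, corpus, k=2):
    """Classic RAG step: stuff retrieved context ahead of the question."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Kimi is an open-weight model.",
    "Groq hardware accelerates inference.",
    "Supabase hosts Postgres.",
]
print(build_prompt("Which model is open weight?", corpus))
```

An agent-first workflow, by contrast, would let the model decide when and what to retrieve (e.g., via MCP tool calls) instead of always prepending a fixed top-k context, which is the crux of the "is RAG still a thing?" debate.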