RustyRAG v0.2 hits sub-200ms latency, uses Groq & Jina AI for open-source RAG
A developer launched RustyRAG v0.2 as a 'Show HN' post: an open-source RAG API built in Rust that reports sub-200ms latency on localhost and sub-600ms from Azure North Central US to Brazil. The update switches LLM inference to Cerebras/Groq, replaces Cohere with Jina AI's local v5-text-nano-retrieval embeddings, and adds optional contextual retrieval, in which an LLM generates a prefix for each chunk. The reported numbers suggest that CPU-only, open-source RAG can reach production-grade latency that previously required GPUs or proprietary services.
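To make the contextual-retrieval idea concrete, here is a minimal Python sketch of LLM-generated chunk prefixes. It is not RustyRAG's code: the function names (`split_into_chunks`, `contextualize`, `stub_describe`), the paragraph-based chunking, and the choice to prepend the prefix to the chunk text before embedding are all assumptions made for illustration, and the LLM call is replaced by a local stub so the example runs offline.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedy paragraph-based chunking (hypothetical, not RustyRAG's chunker)."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def contextualize(
    document: str,
    chunks: List[str],
    describe: Callable[[str, str], str],
) -> List[str]:
    """Prepend a generated context prefix to each chunk.

    `describe(document, chunk)` stands in for an LLM call that returns a short
    sentence situating the chunk within the whole document.
    """
    return [f"{describe(document, chunk).strip()}\n\n{chunk}" for chunk in chunks]


def stub_describe(document: str, chunk: str) -> str:
    """Offline stand-in for the LLM: uses the document's first line as context."""
    title = document.strip().splitlines()[0]
    return f"From the document titled '{title}'."


# Example input (made up for this sketch).
doc = (
    "Release notes\n\n"
    "Embeddings now run locally.\n\n"
    "Inference moved to a hosted provider."
)
chunks = split_into_chunks(doc, max_chars=30)
contextualized = contextualize(doc, chunks, stub_describe)
```

In a real pipeline, `stub_describe` would be swapped for a call to the hosted LLM, and the strings in `contextualized` (rather than the raw chunks) would be passed to the embedding model. The trade-off is one extra LLM call per chunk at indexing time, which may be why the feature is optional.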