Evaluate First, Then Code: A Practical Guide to Evaluation, Reranking, and Caching

Lightning

Details

Many retrieval-augmented generation (RAG) and code search pipelines rely on ad-hoc spot checks, which break down at deployment scale.

This talk introduces an "evaluation-first" development workflow, applied to a production-grade code search engine built on Python, PostgreSQL (pgvector), and an OpenAI-based reranker. Putting an automated evaluation suite in place before any optimization cut average query latency from 20 minutes to 30 seconds, a 40x speedup, and improved relevance by roughly 30%. The talk will cover:

  • Building task-specific evaluation datasets and metrics (see the metrics sketch after this list)
  • Hybrid (lexical + approximate nearest neighbor) retrieval (fusion sketch below)
  • Cross-encoder reranking to improve accuracy (reranking sketch below)
  • Semantic caching strategies that keep the index fresh without sacrificing query speed (caching sketch below)
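
To make the first bullet concrete, here is a minimal sketch of two common task-specific retrieval metrics, recall@k and MRR, averaged over a labeled query set. All names here (`evaluate`, `search_fn`, `eval_set`) are illustrative and are not taken from the talk's reference implementation.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0


def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result; 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(search_fn, eval_set: dict[str, set[str]], k: int = 10) -> dict:
    """Average both metrics over every (query, relevant-ids) pair."""
    recalls, mrrs = [], []
    for query, relevant in eval_set.items():
        ranked = search_fn(query)  # the retrieval system under test
        recalls.append(recall_at_k(ranked, relevant, k))
        mrrs.append(mrr(ranked, relevant))
    n = len(eval_set)
    return {"recall@k": sum(recalls) / n, "mrr": sum(mrrs) / n}
```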
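For the hybrid retrieval bullet, one common way to fuse a lexical ranking (e.g., PostgreSQL full-text search) with an ANN ranking (e.g., ordering by pgvector's `<=>` distance operator) is reciprocal rank fusion (RRF). RRF is an assumption here; the talk's implementation may combine the two signals differently.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked id lists; k=60 is the conventional RRF constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = ["a.py", "b.py", "c.py"]   # e.g., from to_tsquery / ts_rank
semantic = ["c.py", "a.py", "d.py"]  # e.g., from ORDER BY embedding <=> $1
print(rrf_fuse([lexical, semantic])[:3])
```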
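For the reranking bullet, the talk's engine uses an OpenAI-based reranker; the sketch below substitutes the open-source sentence-transformers CrossEncoder so it stays self-contained. The checkpoint name is one widely used public model, not necessarily what the talk uses.

```python
from sentence_transformers import CrossEncoder

# Load once; scoring each (query, candidate) pair jointly is what makes
# cross-encoders more accurate (and slower) than bi-encoder retrieval.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, candidates: list[str], top_n: int = 10) -> list[str]:
    """Re-score retrieval candidates with the cross-encoder and keep the best."""
    scores = model.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_n]]
```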
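And for the caching bullet, a minimal sketch of the core idea behind a semantic cache: reuse a cached answer when a new query's embedding is close enough to a previously answered one. The `SemanticCache` class, the `embed_fn` interface, and the 0.95 threshold are all assumptions for illustration.

```python
import numpy as np


class SemanticCache:
    """Approximate cache keyed on query embeddings (illustrative sketch)."""

    def __init__(self, embed_fn, threshold: float = 0.95):
        self.embed = embed_fn          # query -> unit-norm np.ndarray
        self.threshold = threshold     # minimum cosine similarity for a hit
        self.keys: list[np.ndarray] = []
        self.values: list[object] = []

    def get(self, query: str):
        """Return the result of the nearest past query, if similar enough."""
        if not self.keys:
            return None
        q = self.embed(query)
        sims = np.stack(self.keys) @ q  # cosine similarity on unit vectors
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query: str, result) -> None:
        self.keys.append(self.embed(query))
        self.values.append(result)
```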

The talk will include benchmark results, live demos, and an MIT-licensed reference implementation that attendees can clone and extend.