Answer Quality Benchmarks

Generates 200 synthetic Q&A pairs, runs each question through the RAG pipeline, then uses an LLM-as-judge to score every answer for groundedness and completeness.

Click "Run Answer Benchmarks" to evaluate answer quality.
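
The benchmark flow described above (generate pairs, answer them, judge the answers) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: `QAPair`, `Scores`, `run_answer_benchmark`, `fake_rag`, and `fake_judge` are hypothetical names, the 0.0 to 1.0 score scale is an assumption, and the RAG pipeline and LLM judge are replaced by deterministic stand-ins so the example runs offline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class QAPair:
    question: str
    reference_answer: str


@dataclass
class Scores:
    groundedness: float  # assumed scale: 0.0 (unsupported) to 1.0 (fully supported)
    completeness: float  # assumed scale: 0.0 (misses reference) to 1.0 (covers it)


def run_answer_benchmark(
    pairs: List[QAPair],
    answer_fn: Callable[[str], Tuple[str, List[str]]],
    judge_fn: Callable[[QAPair, str, List[str]], Scores],
) -> Dict[str, float]:
    """Answer every question, judge each answer, and average the scores."""
    if not pairs:
        raise ValueError("at least one Q&A pair is required")
    results = []
    for pair in pairs:
        # answer_fn stands in for the RAG pipeline: it returns the answer
        # together with the retrieved context passages.
        answer, context = answer_fn(pair.question)
        # judge_fn stands in for the LLM-as-judge call.
        results.append(judge_fn(pair, answer, context))
    return {
        "n": len(results),
        "groundedness": mean(s.groundedness for s in results),
        "completeness": mean(s.completeness for s in results),
    }


# Deterministic stand-ins (hypothetical) so the sketch runs without a model.
def fake_rag(question: str) -> Tuple[str, List[str]]:
    return f"answer to {question}", [f"passage about {question}"]


def fake_judge(pair: QAPair, answer: str, context: List[str]) -> Scores:
    grounded = 1.0 if context else 0.0
    complete = 1.0 if pair.reference_answer in answer else 0.0
    return Scores(groundedness=grounded, completeness=complete)


# 200 synthetic pairs, matching the count stated in the description.
pairs = [QAPair(question=f"q{i}", reference_answer=f"q{i}") for i in range(200)]
summary = run_answer_benchmark(pairs, fake_rag, fake_judge)
print(summary)
```

Passing the pipeline and the judge in as callables keeps the scoring loop independent of any particular model or retrieval backend, which makes it easy to swap the stand-ins for real calls.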