EvaRAG: Evaluating Advanced RAG Techniques With Indexing and Distance Metrics

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Retrieval Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) with external knowledge. Yet, the performance of RAG pipelines is susceptible to design choices across retrieval, similarity metrics, indexing, and reranking. Despite growing adoption, little systematic work has explored the trade-offs between retrieval quality, semantic accuracy, computational efficiency, and cost in RAG systems. This study addresses this gap by conducting a comprehensive evaluation of RAG configurations across multiple dimensions. We propose a benchmarking framework that systematically varies retrievers (Fusion, HyDe, Hierarchical, SCaNN), indexing methods (HNSW, IVF, Flat), similarity metrics (Cosine, Inner Product, L2), and rerankers (BGE, minilm) over datasets of three scales (small, medium, and large). Performance is assessed through coverage, recall, MRR, and nDCG, while semantic quality is measured using correctness, faithfulness, and relevance. Efficiency is quantified via latency, throughput, and computational cost. Our experiments reveal that HNSW–IP–Fusion–minilm achieves the strongest semantic performance, with Coverage Retrieval of 0.942, Correctness of 0.909, and Faithfulness of 0.970, making it ideal for accuracy-critical tasks. Conversely, IVF–L2–Hierarchical demonstrates the lowest latency (1.736 ns) and cost, making it suitable for real-time deployments. Reranker analysis shows modest but consistent gains for minilm over BGE, while HyDe excels in precision at the expense of efficiency. Notably, no single configuration dominates; optimal designs depend on the application's needs, whether it is maximizing semantic accuracy, minimizing latency, or striking a balance between the two. By demonstrating concrete trade-offs, this work provides a practical foundation for scaling RAG pipelines across diverse domains, including information retrieval, enterprise search, and knowledge-intensive reasoning.
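To make the compared similarity metrics concrete, the following is a minimal flat-index (brute-force) retrieval sketch over the three metrics the abstract names (Cosine, Inner Product, L2). This is an illustrative numpy example, not the authors' EvaRAG code; the function name and interface are assumptions for demonstration only.

```python
import numpy as np

def retrieve(query, corpus, metric="cosine", k=3):
    """Top-k flat (exhaustive) retrieval under one of three distance metrics.

    query:  (d,) embedding vector
    corpus: (n, d) matrix of document embeddings
    Returns indices of the k best-scoring documents (best first).
    Note: this is an illustrative sketch, not the paper's implementation.
    """
    if metric == "cosine":
        # Normalize both sides so the dot product equals cosine similarity.
        q = query / np.linalg.norm(query)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        scores = c @ q                                      # higher is better
    elif metric == "ip":
        scores = corpus @ query                             # raw inner product
    elif metric == "l2":
        # Negate Euclidean distance so that "higher score" still means "closer".
        scores = -np.linalg.norm(corpus - query, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(scores)[::-1][:k]
```

Note that inner product and L2 only agree with cosine when embeddings are unit-normalized; with unnormalized vectors, inner product favors long vectors while L2 favors near ones, which is one reason the metric choice interacts with the indexing method in the paper's benchmarks.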

Keywords

Data retrieval, Large language model, Natural language processing, Question answering systems, RAG

Source

IEEE Access

Volume

13

Citation

Elkiran, H., & Rasheed, J. (2025). EvaRAG: Evaluating Advanced RAG Techniques With Indexing and Distance Metrics. IEEE Access, 13, 215724-215747.
