EvaRAG: Evaluating Advanced RAG Techniques With Indexing and Distance Metrics

dc.authorscopusid: 59149323900
dc.authorscopusid: 57791962400
dc.contributor.author: Elkıran, Harun
dc.contributor.author: Rasheed, Jawad
dc.date.accessioned: 2026-04-08T12:48:46Z
dc.date.issued: 2025
dc.department: Lisansüstü Eğitim Enstitüsü
dc.department: Mühendislik ve Doğa Bilimleri Fakültesi
dc.description.abstract: Retrieval Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) with external knowledge. Yet, the performance of RAG pipelines is susceptible to design choices across retrieval, similarity metrics, indexing, and reranking. Despite growing adoption, little systematic work has explored the trade-offs between retrieval quality, semantic accuracy, computational efficiency, and cost in RAG systems. This study addresses this gap by conducting a comprehensive evaluation of RAG configurations across multiple dimensions. We propose a benchmarking framework that systematically varies retrievers (Fusion, HyDe, Hierarchical, SCaNN), indexing methods (HNSW, IVF, Flat), similarity metrics (Cosine, Inner Product, L2), and rerankers (BGE, minilm) over datasets of three scales (small, medium, and large). Performance is assessed through coverage, recall, MRR, and nDCG, while semantic quality is measured using correctness, faithfulness, and relevance. Efficiency is quantified via latency, throughput, and computational cost. Our experiments reveal that HNSW–IP–Fusion–minilm achieves the strongest semantic performance, with Coverage Retrieval of 0.942, Correctness of 0.909, and Faithfulness of 0.970, making it ideal for accuracy-critical tasks. Conversely, IVF–L2–Hierarchical demonstrates the lowest latency (1.736 ns) and cost, making it suitable for real-time deployments. Reranker analysis shows modest but consistent gains for minilm over BGE, while HyDe excels in precision at the expense of efficiency. Notably, no single configuration dominates; optimal designs depend on the application’s needs, whether it is maximizing semantic accuracy, minimizing latency, or striking a balance between the two. By demonstrating concrete trade-offs, this work provides a practical foundation for scaling RAG pipelines across diverse domains, including information retrieval, enterprise search, and knowledge-intensive reasoning.
dc.identifier.citation: Elkiran, H., & Rasheed, J. (2025). EvaRAG: Evaluating Advanced RAG Techniques With Indexing and Distance Metrics. IEEE Access, 13, 215724-215747.
dc.identifier.doi: 10.1109/ACCESS.2025.3646665
dc.identifier.endpage: 215747
dc.identifier.issn: 2169-3536
dc.identifier.orcid: 0000-0002-5834-6210
dc.identifier.orcid: 0000-0003-3761-1641
dc.identifier.scopus: 2-s2.0-105025908643
dc.identifier.startpage: 215724
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3646665
dc.identifier.uri: https://hdl.handle.net/20.500.12436/9326
dc.identifier.volume: 13
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: IEEE Access
dc.relation.publicationcategory: Article - International Refereed Journal - Student
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Academic Staff
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Data retrieval
dc.subject: Large language model
dc.subject: Natural language processing
dc.subject: Question answering systems
dc.subject: RAG
dc.title: EvaRAG: Evaluating Advanced RAG Techniques With Indexing and Distance Metrics
dc.type: Article
dspace.entity.type: Publication
relation.isAuthorOfPublication: f9b9b46c-d923-42d3-b413-dd851c2e913a
relation.isAuthorOfPublication.latestForDiscovery: f9b9b46c-d923-42d3-b413-dd851c2e913a

Files

Original bundle

Name: EvaRAG_Evaluating_Advanced_RAG_Techniques_With_Indexing_and_Distance_Metrics.pdf
Size: 5.4 MB
Format: Adobe Portable Document Format
Description: Article file

License bundle

Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission