An Empirical Evaluation of Retrieval, Reranking, and Similarity for a Q&A-Based Retrieval Augmented Generation System

dc.authorscopusid59149323900
dc.authorscopusid57791962400
dc.contributor.authorElkıran, Harun
dc.contributor.authorRasheed, Jawad
dc.date.accessioned2026-03-18T11:08:40Z
dc.date.issued2026
dc.departmentLisansüstü Eğitim Enstitüsü
dc.departmentMühendislik ve Doğa Bilimleri Fakültesi
dc.description.abstractRetrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for improving Large Language Models (LLMs) by incorporating external knowledge retrieval. RAG primarily aims to address the hallucination problem in LLMs that rely on extensive knowledge bases. A RAG system depends critically on design choices, including indexing strategies, retrieval methods, similarity metrics, and reranking models; the choice of configuration largely determines how effective a RAG system is. Although RAG systems have received considerable attention, there is very limited work on understanding the relative contributions of these components, and their statistical significance remains insufficiently understood. In this study, we conduct a comprehensive empirical evaluation of a modular RAG pipeline by systematically varying index structures, retrievers, rerankers, and similarity metrics. We evaluate performance using standard retrieval metrics such as Recall, Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, and Coverage; generation-oriented quality metrics such as Correctness, Faithfulness, and Relevance; latency; and cost. Statistical robustness is ensured through ANOVA, effect size estimation, and multivariate regression analysis. Based on our results, the retriever and similarity metric choices dominate system performance, yielding statistically significant improvements with p-values below 10⁻⁹ for retriever effects on R@1 and Coverage. At the same time, index selection exhibits a negligible impact across most metrics. Reranking primarily affects reranked metrics and downstream correctness, with MiniLM consistently outperforming BGE.
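The abstract evaluates retrieval with Recall@k, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). As a minimal sketch of how these standard metrics are computed for a single query (assuming binary relevance; the paper's exact evaluation harness is not shown here, and the example document IDs are hypothetical):

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant documents that appear in the top-k results.
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids)

def mrr(ranked_ids, relevant_ids):
    # Reciprocal rank of the first relevant document (0 if none is retrieved).
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    # Binary-relevance NDCG: DCG of the actual ranking over the ideal DCG.
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked_ids[:k], start=1)
              if d in relevant_ids)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical ranking for one query: relevant docs at ranks 2 and 3.
ranked = ["d3", "d1", "d7"]
relevant = {"d1", "d7"}
print(recall_at_k(ranked, relevant, 1))           # 0.0
print(mrr(ranked, relevant))                      # 0.5
print(round(ndcg_at_k(ranked, relevant, 3), 3))   # 0.693
```

In practice these per-query scores are averaged over the full query set, which is the form in which the paper reports R@1, MRR, and NDCG.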
dc.identifier.citationElkiran, H., & Rasheed, J. (2026). An Empirical Evaluation of Retrieval, Reranking, and Similarity for a Q&A-Based Retrieval Augmented Generation System. IEEE Access, 14, 26053-26066.
dc.identifier.doi10.1109/ACCESS.2026.3664852
dc.identifier.endpage26066
dc.identifier.issn2169-3536
dc.identifier.orcid0000-0002-5834-6210
dc.identifier.orcid0000-0003-3761-1641
dc.identifier.scopus2-s2.0-105030591844
dc.identifier.startpage26053
dc.identifier.urihttps://doi.org/10.1109/ACCESS.2026.3664852
dc.identifier.urihttps://hdl.handle.net/20.500.12436/9274
dc.identifier.volume14
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.relation.ispartofIEEE Access
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Student
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.subjectRetrieval-augmented generation
dc.subjectInformation retrieval
dc.subjectRetrieval
dc.subjectReranking
dc.subjectSimilarity/distance metrics
dc.titleAn Empirical Evaluation of Retrieval, Reranking, and Similarity for a Q&A-Based Retrieval Augmented Generation System
dc.typeArticle
dspace.entity.typePublication
relation.isAuthorOfPublicationf9b9b46c-d923-42d3-b413-dd851c2e913a
relation.isAuthorOfPublication.latestForDiscoveryf9b9b46c-d923-42d3-b413-dd851c2e913a

Files

Original bundle

Name:
An_Empirical_Evaluation_of_Retrieval_Reranking_and_Similarity_for_a_QampA-Based_Retrieval_Augmented_Generation_System.pdf
Size:
2.2 MB
Format:
Adobe Portable Document Format
Description:
Article file

License bundle

Name:
license.txt
Size:
1.17 KB
Description:
Item-specific license agreed upon to submission