An Empirical Evaluation of Retrieval, Reranking, and Similarity for a Q&A-Based Retrieval Augmented Generation System
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for improving Large Language Models (LLMs) by incorporating external knowledge retrieval. RAG primarily aims to address the hallucination problem in LLMs by grounding generation in extensive external knowledge bases. The effectiveness of a RAG system depends critically on its configuration, including indexing strategies, retrieval methods, similarity metrics, and reranking models. Although RAG systems have received considerable attention, there is very limited work on the relative contributions of these components, and their statistical significance remains insufficiently understood. In this study, we conduct a comprehensive empirical evaluation of a modular RAG pipeline by systematically varying index structures, retrievers, rerankers, and similarity metrics. We evaluate performance using standard retrieval metrics (Recall, Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, and Coverage), generation-oriented quality metrics (Correctness, Faithfulness, and Relevance), latency, and cost. Statistical robustness is ensured through ANOVA, effect-size estimation, and multivariate regression analysis. Our results show that the choice of retriever and similarity metric dominates system performance, yielding statistically significant improvements (p < 10⁻⁹ for retriever effects on R@1 and Coverage), whereas index selection has a negligible impact across most metrics. Reranking primarily affects the reranked metrics and downstream correctness, with MiniLM consistently outperforming BGE.
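
To make the evaluation protocol concrete, the sketch below shows how the per-query retrieval metrics named in the abstract (R@k, Mean Reciprocal Rank, nDCG@k) can be computed for a Q&A setting with a single gold passage per question, and how a one-way ANOVA can test whether the retriever factor has a significant effect. This is a minimal illustration, not the study's actual evaluation code: the configuration names ("dense", "sparse"), document IDs, and scores are hypothetical, while the metric formulas and the scipy.stats.f_oneway test are standard.

```python
# Minimal sketch of the evaluation protocol with hypothetical toy data.
import math
from scipy.stats import f_oneway

def recall_at_k(ranked_ids, gold_id, k):
    # 1 if the gold passage appears in the top-k results, else 0
    # (binary relevance, one gold passage per question).
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids, gold_id):
    # 1/rank of the first relevant result, 0 if absent;
    # averaging this over all queries gives MRR.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == gold_id:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, gold_id, k):
    # With a single relevant document, the ideal DCG is 1 (gold at rank 1),
    # so nDCG@k reduces to 1/log2(rank + 1) when the gold is in the top k.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == gold_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Hypothetical per-query rankings from two retriever configurations.
gold = ["d3", "d7", "d1"]
runs = {
    "dense":  [["d3", "d2", "d9"], ["d4", "d7", "d5"], ["d1", "d8", "d2"]],
    "sparse": [["d2", "d3", "d9"], ["d7", "d4", "d5"], ["d8", "d2", "d1"]],
}

per_query_mrr = {
    name: [reciprocal_rank(ranking, g) for ranking, g in zip(rankings, gold)]
    for name, rankings in runs.items()
}
for name, scores in per_query_mrr.items():
    print(f"{name}: MRR = {sum(scores) / len(scores):.3f}")

# One-way ANOVA over per-query scores tests whether the retriever
# factor has a statistically significant effect on the metric.
f_stat, p_value = f_oneway(per_query_mrr["dense"], per_query_mrr["sparse"])
print(f"ANOVA: F = {f_stat:.3f}, p = {p_value:.3f}")
```

In the full study this comparison would run over every combination of index, retriever, reranker, and similarity metric, with effect sizes and multivariate regression complementing the ANOVA; the sketch only demonstrates the per-factor significance test on one metric.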









