A Hybrid Vision Transformer with Intra-Attention Architecture for Enhanced Medical Image Retrieval

Sucharitha, G.; Rasheed, Jawad; Potluri, Sirisha

doi:10.1109/avss65446.2025.11149925

dc.authorwosid	DXR-9356-2022
dc.authorwosid	AAY-5207-2020
dc.authorwosid	ADZ-9019-2022
dc.contributor.author	Sucharitha, G.
dc.contributor.author	Rasheed, Jawad
dc.contributor.author	Potluri, Sirisha
dc.contributor.department-temp
dc.date.accessioned	2026-06-17T11:24:23Z
dc.date.issued	2025
dc.department	Faculty of Engineering and Natural Sciences
dc.description	IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS) / IEEE -- ISBN:979-8-3315-1481-5, 979-8-3315-1480-8 -- 2025.
dc.description.abstract	The rapid growth of medical imaging techniques and the expansion of medical image repositories have created a strong need for accurate techniques that retrieve relevant images efficiently. This work proposes a Hybrid Vision Transformer (ViT) architecture with an intra-attention mechanism for enhanced image retrieval. The approach integrates the Convolutional Block Attention Module (CBAM) directly with the multi-head self-attention (MSA) of the ViT, enabling more adaptive and fine-grained feature refinement. Unlike traditional fusion-based methods, the model dynamically reweights feature representations by applying spatial and channel-wise attention at multiple transformer stages. With spatial attention applied at each MSA stage, the ViT learns to focus on medically significant image regions, while channel attention lets it prioritize the most informative features and suppress irrelevant information. Experimental results show that the proposed method outperforms standalone ViT features and other existing methods in terms of efficiency, precision, and recall. These findings suggest that embedding CBAM within the ViT’s self-attention layers can enhance retrieval accuracy while maintaining interpretability, making it a promising solution for medical image analysis.
dc.identifier.citation	Sucharitha, G., Rasheed, J., & Potluri, S. (2025). A Hybrid Vision Transformer with Intra-Attention Architecture for Enhanced Medical Image Retrieval. 1–6. https://doi.org/10.1109/avss65446.2025.11149925
dc.identifier.doi	10.1109/avss65446.2025.11149925
dc.identifier.endpage	6
dc.identifier.isbn	979-8-3315-1481-5
dc.identifier.isbn	979-8-3315-1480-8
dc.identifier.issn	2643-6205
dc.identifier.orcid	0000-0003-3761-1641
dc.identifier.startpage	1
dc.identifier.uri	https://doi.org/10.1109/avss65446.2025.11149925
dc.identifier.uri	https://hdl.handle.net/20.500.12436/9623
dc.identifier.wos	WOS:001588601200066
dc.indekslendigikaynak	Web of Science
dc.language.iso	en
dc.publisher	IEEE
dc.relation.ispartof	IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	A Hybrid Vision Transformer with Intra-Attention Architecture for Enhanced Medical Image Retrieval
dc.type	Conference Object
dspace.entity.type	Publication
relation.isAuthorOfPublication	f9b9b46c-d923-42d3-b413-dd851c2e913a
relation.isAuthorOfPublication.latestForDiscovery	f9b9b46c-d923-42d3-b413-dd851c2e913a
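
The abstract describes applying CBAM-style channel and spatial attention to the features produced by the ViT's self-attention stages. The record does not include the paper's implementation, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: patch tokens (without a class token) lie on a square grid, channel gates come from a shared two-layer MLP over average- and max-pooled tokens, and spatial gates come from a single k x k filter over channel-pooled maps. The function names, weight shapes, and random weights are all hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(tokens, w1, w2):
    """tokens: (N, D). Returns one gate per channel, shape (D,)."""
    avg = tokens.mean(axis=0)                       # average-pool over tokens
    mx = tokens.max(axis=0)                         # max-pool over tokens
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2    # shared two-layer MLP (ReLU)
    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(tokens, grid, kernel):
    """tokens: (N, D) with N == grid * grid. Returns one gate per token, shape (N,)."""
    avg = tokens.mean(axis=1).reshape(grid, grid)   # average-pool over channels
    mx = tokens.max(axis=1).reshape(grid, grid)     # max-pool over channels
    stacked = np.stack([avg, mx])                   # (2, grid, grid)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((grid, grid))
    for i in range(grid):                           # plain sliding-window filter
        for j in range(grid):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out).reshape(-1)

def cbam_on_tokens(tokens, grid, w1, w2, kernel):
    """Reweight tokens by channel gates, then by spatial gates (CBAM order)."""
    tokens = tokens * channel_attention(tokens, w1, w2)[None, :]
    tokens = tokens * spatial_attention(tokens, grid, kernel)[:, None]
    return tokens

# Toy usage: random values stand in for the output of one self-attention stage.
rng = np.random.default_rng(0)
grid, dim, hidden = 4, 8, 2                         # illustrative sizes only
tokens = rng.normal(size=(grid * grid, dim))
w1 = rng.normal(size=(dim, hidden)) * 0.1
w2 = rng.normal(size=(hidden, dim)) * 0.1
kernel = rng.normal(size=(2, 7, 7)) * 0.1
refined = cbam_on_tokens(tokens, grid, w1, w2, kernel)
print(refined.shape)
```

Because both gates pass through a sigmoid, each token feature is scaled by a factor between 0 and 1, which is the sense in which informative channels and regions are emphasized relative to the rest. In a trained model the weights would be learned jointly with the transformer rather than sampled at random.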

Files

Original bundle

Name: Sucharitha-2025-A-hybrid-vision-transformer-with-in.pdf
Size: 674.29 KB
Format: Adobe Portable Document Format
Description: Proceedings file

Collections

Computer Engineering Department Collection
WoS Indexed Publications Collection