NRBO-AGP: A Novel Feature Selection Approach for Accurate Protein Solubility Prediction

dc.authorwosid: ABG-6273-2020
dc.authorwosid: IZQ-5218-2023
dc.authorwosid: L-8995-2017
dc.contributor.author: Elmi, Zahra
dc.contributor.author: Elmi, Soheila
dc.contributor.author: Danishvar, Sebelan
dc.contributor.department-temp:
dc.date.accessioned: 2026-05-08T09:40:32Z
dc.date.issued: 2025
dc.department: Faculty of Engineering and Natural Sciences
dc.description.abstract: Protein solubility determines how well a protein dissolves in an aqueous solution, and this property is a critical factor in the functional analysis of proteins and in biotechnological applications. Accurately estimating solubility can provide significant advantages in areas such as protein engineering and drug discovery. This study proposes a new feature selection method, Newton-Raphson-based Optimization and Adaptive Gradient Perturbation (NRBO-AGP), for predicting protein solubility. The method combines the accuracy and speed of the Newton-Raphson method with the capacity of population-based optimization techniques to balance exploration and exploitation. Using 3144 protein sequences from the eSOL database, descriptor features were obtained for each protein, resulting in a dataset with 3104 features. The performance of NRBO-AGP was compared with eight different metaheuristic algorithms and evaluated using five regression models: MLP, AdaBoost, Gradient Boosting Trees, Random Forest, and Support Vector Regressor (SVR). Mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) metrics were used for performance evaluation. The results show that NRBO-AGP outperforms the other metaheuristic algorithms in all regression models. The best results were achieved with Gradient Boosting and Random Forest, reaching MAE: 0.0001 ± 0.0000, RMSE: 0.0008 ± 0.0000, and R²: 0.9908 ± 0.0005, and MAE: 0.0002 ± 0.0000, RMSE: 0.0025 ± 0.0000, and R²: 0.9908 ± 0.0005, respectively. These findings show that NRBO-AGP is an effective feature selection tool for predicting protein solubility. Multiple statistical analyses based on Friedman and Nemenyi tests show that the NRBO-AGP method exhibits statistically significant superior performance (p < .05) compared to the other metaheuristic algorithms in the MAE and RMSE metrics and also achieves the highest performance in the R² score.
dc.identifier.citation: Elmi, Z., Elmi, S., & Danishvar, S. (2025). NRBO-AGP: A Novel Feature Selection Approach for Accurate Protein Solubility Prediction, 129194.
dc.identifier.doi: 10.1016/j.eswa.2025.129194
dc.identifier.endpage: 27
dc.identifier.issn: 0957-4174
dc.identifier.issn: 1873-6793
dc.identifier.orcid: 0000-0003-1487-8570
dc.identifier.startpage: 1
dc.identifier.uri: https://doi.org/10.1016/j.eswa.2025.129194
dc.identifier.uri: https://hdl.handle.net/20.500.12436/9504
dc.identifier.volume: 296
dc.identifier.wos: 001546967000003
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.language.iso: en
dc.publisher: Elsevier Ltd.
dc.relation.ispartof: Expert Systems With Applications
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Drug discovery
dc.subject: Protein solubility prediction
dc.subject: Metaheuristic approach
dc.subject: Feature selection
dc.title: NRBO-AGP: A Novel Feature Selection Approach for Accurate Protein Solubility Prediction
dc.type: Article
dspace.entity.type: Publication
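
The abstract in this record reports results as MAE, RMSE, and R². As a reading aid, here is a minimal sketch of how these three metrics are conventionally computed for a regression model. It is not the authors' code; the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: square root of the mean squared residual.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all targets are identical (SS_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


# Hypothetical solubility values, for illustration only (not from eSOL).
y_true = [0.10, 0.40, 0.70, 1.00]
y_pred = [0.15, 0.35, 0.75, 0.95]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
```

Lower MAE and RMSE indicate smaller prediction errors, while an R² closer to 1 indicates that the model explains more of the variance in the target.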

Files

Original bundle

Name: 1-s2.0-S0957417425028106-main.pdf
Size: 8.2 MB
Format: Adobe Portable Document Format
Description: Article file

License bundle

Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission