
Stacking ensemble analysis of key genes in pulmonary fibrosis

Published on Jun. 12, 2026

Authors: ZHANG Xu 1, ZHANG Shiping 2, HAO Zhihang 1, LI Junyan 2, TAN Nana 2, NI Shifeng 3, WANG Le 2, HU Jingbo 4, WANG Huan 1

Affiliations:
1. School of Computer, Baoji University of Arts and Sciences, Baoji 721000, Shaanxi Province, China
2. College of Chemistry and Material Engineering, Baoji University of Arts and Sciences, Baoji 721013, Shaanxi Province, China
3. College of Life Sciences, Northwest University, Xi'an 710069, China
4. School of Electronic and Electrical Engineering, Baoji University of Arts and Sciences, Baoji 721000, Shaanxi Province, China

Keywords: Pulmonary fibrosis; Stacking ensemble; Logistic regression; Support vector machine; Feature importance

DOI: 10.12173/j.issn.2097-4922.202603067

Reference: ZHANG Xu, ZHANG Shiping, HAO Zhihang, LI Junyan, TAN Nana, NI Shifeng, WANG Le, HU Jingbo, WANG Huan. Stacking ensemble analysis of key genes in pulmonary fibrosis[J]. Yaoxue QianYan Zazhi, 2026, 30(5): 748-758. DOI: 10.12173/j.issn.2097-4922.202603067. [Article in Chinese]

Abstract

Objective  To construct a stacking-based ensemble learning model that improves the stability of feature screening for high-dimensional, small-sample pulmonary fibrosis (PF) gene expression data and identifies candidate key genes associated with the disease.

Methods  The PF transcriptome datasets GSE70866 and GSE48149 were obtained from the Gene Expression Omnibus (GEO) database. GSE70866 served as the training set for model construction and feature selection, and GSE48149 served as an independent validation set for assessing the model's generalization ability. After data preprocessing and differential expression analysis, candidate key genes were screened. A stacking ensemble framework composed of multiple base learners was then constructed: meta-features were generated through K-fold cross-validation, and logistic regression (LR) and a support vector machine (SVM) were used as meta-learners for the final classification. Model performance was evaluated using the F1-score and the area under the curve (AUC), and candidate genes were identified from the feature importance ranking.
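The meta-feature step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base learners, the value of K, or the preprocessing, so the toy `CentroidScorer` base learner, the helper names, the choice of K = 5, and the synthetic data below are all assumptions made for illustration. Only an LR meta-learner is shown; the SVM meta-learner mentioned in the Methods is omitted to keep the example dependency-free.

```python
import math
import random


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


class CentroidScorer:
    """Hypothetical toy base learner: nearest-centroid score on a feature subset."""

    def __init__(self, features):
        self.features = features

    def _centroid(self, rows):
        return [sum(r[j] for r in rows) / len(rows) for j in self.features]

    def fit(self, X, y):
        self.c0 = self._centroid([x for x, t in zip(X, y) if t == 0])
        self.c1 = self._centroid([x for x, t in zip(X, y) if t == 1])
        return self

    def score(self, x):
        d0 = sum((x[j] - c) ** 2 for j, c in zip(self.features, self.c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(self.features, self.c1))
        return d0 - d1  # positive when x is closer to the class-1 centroid


def out_of_fold_meta_features(X, y, make_learners, k=5, seed=0):
    """Score each sample with base learners fitted on the other k-1 folds."""
    n_learners = len(make_learners())
    meta = [[0.0] * n_learners for _ in X]
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, learner in enumerate(make_learners()):
            learner.fit(X_tr, y_tr)
            for i in held_out:
                meta[i][m] = learner.score(X[i])
    return meta


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def fit_logistic(Z, y, lr=0.01, epochs=300):
    """Logistic-regression meta-learner trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return w, b


def make_learners():
    """Fresh, unfitted base learners for each fold (feature subsets are arbitrary)."""
    return [CentroidScorer([0, 1]), CentroidScorer([2, 3])]


# Synthetic stand-in for expression data (assumption): 4 "genes", 20 samples per class,
# with genes 0 and 2 shifted upward in class 1.
rng = random.Random(1)
X = [[rng.gauss(3.0 * label if j in (0, 2) else 0.0, 1.0) for j in range(4)]
     for label in (0, 1) for _ in range(20)]
y = [label for label in (0, 1) for _ in range(20)]

meta = out_of_fold_meta_features(X, y, make_learners, k=5)
w, b = fit_logistic(meta, y)
pred = [int(sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) >= 0.5) for z in meta]
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
```

The point of the out-of-fold construction is that every row of `meta` is produced by base learners that never saw that sample during fitting, which limits label leakage into the meta-learner when the sample count is small.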

Results  The ensemble learning model showed strong discriminative performance on the training dataset GSE70866, achieving an F1-score of 0.9559 and an AUC of 0.9482 and outperforming the individual models overall. On the independent validation dataset GSE48149, the confusion matrix further indicated good generalization capability. Integrated feature importance analysis across multiple models identified candidate key genes including thioredoxin-like protein 4B (TXNL4B), C-C motif chemokine ligand 18 (CCL18), and ubiquitin-conjugating enzyme E2 Z (UBE2Z).
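For readers unfamiliar with the two reported metrics, the following sketch shows how an F1-score and an AUC can be computed from labels and model scores. It assumes that AUC here means the area under the receiver operating characteristic curve and that F1 is computed for the positive class at a 0.5 threshold; the abstract states neither, and the toy labels and scores are invented for illustration rather than taken from the study.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not study data).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

print(round(f1_score(y_true, y_pred), 4))  # 0.8
print(round(roc_auc(y_true, scores), 4))   # 0.8333
```

The F1-score depends on the chosen decision threshold, whereas the AUC depends only on how the scores rank positives against negatives, which is why the two are often reported together.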

Conclusion  The stacking-based ensemble learning framework can improve model stability and prediction accuracy under high-dimensional, small-sample conditions, providing an effective method for screening candidate genes in transcriptome data and offering data-level support for exploring PF-related molecular processes.

