Original Article: Bioinformatics Technology | Updated: 2024-02-19
Establishment of diagnostic model of glioblastoma multiforme based on random forest algorithm and artificial neural network

Guangxi Medical Journal, 2023, Vol. 45, No. 22, pp. 2717-2724

Author information: Li Qingyang, master's degree candidate; research focus: cerebrovascular disease.

Funding: Changde City Technology Research and Development and Technological Innovation Guidance Project (Chang Ke Han [2022] No. 51)

DOI:10.11675/j.issn.0253-4304.2023.22.10


Objective: To construct a diagnostic model for glioblastoma multiforme (GBM) based on the random forest algorithm and an artificial neural network (ANN).

Methods: The GSE4290 and GSE50161 datasets were downloaded from the GEO database as the training set, and the GSE66354, GSE11650, and GSE15824 datasets were downloaded as the validation set. Differentially expressed genes (DEGs) between GBM brain tissue samples and normal brain tissue samples in the training set were screened using R 4.2.1. The Metascape online tool was used for cluster analysis of the DEGs, and R 4.2.1 was used for Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs. Key genes were selected from the DEGs with the random forest algorithm, and an ANN model for diagnosing GBM was then built from these key genes. Receiver operating characteristic (ROC) curves were drawn, and the ANN model was validated internally on the training set and externally on the validation set.

Results: A total of 461 DEGs were identified. Cluster analysis showed that the DEGs were mainly enriched in trans-synaptic signaling, the neuronal system, and regulation of chemical synaptic transmission, among others. GO functional enrichment analysis showed that the biological processes mainly involved included regulation of trans-synaptic signaling and neurotransmitter transport; the enriched cellular components mainly included glutamatergic synapses and ion channel complexes; and the molecular functions mainly involved included γ-aminobutyric acid (GABA)-A receptor activity and GABA-gated chloride ion channel activity. KEGG pathway enrichment analysis showed that the DEGs were mainly enriched in signaling pathways related to regulation of chemical synaptic transmission, regulation of trans-synaptic signaling, and neurotransmitter transport. Six key genes were selected: KIAA0101, DnaJ (Hsp40) homolog subfamily C member 6 (DNAJC6), coagulation factor II thrombin receptor (F2R), leucine rich repeat and immunoglobulin domain containing 2 (LINGO2), minichromosome maintenance complex component 2 (MCM2), and corticotropin-releasing hormone (CRH). The area under the curve (AUC) of the ANN model built on these six key genes for diagnosing GBM ranged from 0.952 to 1.000.

Conclusion: Six key DEGs are associated with GBM, namely KIAA0101, DNAJC6, F2R, LINGO2, MCM2, and CRH, and the ANN model built on these genes shows high diagnostic performance for GBM.
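
The abstract reports the model's performance as an area under the ROC curve but does not say how that value was computed, and the authors worked in R. As a minimal sketch in Python, and not the authors' implementation, the AUC can be read as the fraction of (GBM, normal) sample pairs in which the GBM sample receives the higher model score, with ties counted as one half. The function name `roc_auc`, the 1/0 label coding, and the toy scores below are assumptions made for illustration; none of them come from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a GBM sample, 0 for a normal sample (assumed coding).
    scores: model output, where a higher value means "more likely GBM".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    # Count the positive/negative pairs that are ranked correctly;
    # a tied pair contributes one half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, invented for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

Under this reading, an AUC of 1.000 means every GBM sample in that dataset scored higher than every normal sample, while 0.5 corresponds to scores that carry no ranking information. The pairwise loop is quadratic in the number of samples, which is fine for a sketch but would normally be replaced by a rank-based computation on large datasets.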
