Design of an Intelligent System for Improving Classification of Cancer Diseases
Abstract
The methodologies that depend on gene expression profile have been able to detect cancer since its inception. The previous works have spent great efforts to reach the best results. Some researchers have achieved excellent results in the classification process of cancer based on the gene expression profile using different gene selection approaches and different classifiers
Early detection of cancer increases the probability of recovery. This thesis presents an intelligent decision support system (IDSS) for early diagnosis of cancer-based on the microarray of gene expression profiles. The problem of this dataset is the little number of examples (not exceed hundreds) comparing to a large number of genes (in thousands). So, it became necessary to find out a method for reducing the features (genes) that are not relevant to the investigated disease to avoid overfitting. The proposed methodology used information gain (IG) for selecting the most important features from the input patterns. Then, the selected features (genes) are reduced by applying the Gray Wolf Optimization algorithm (GWO). Finally, the methodology exercises support vector machine (SVM) for cancer type classification. The proposed methodology was applied to three data sets (breast, colon, and CNS) and was evaluated by the classification accuracy performance measurement, which is most important in the diagnosis of diseases. The best results were gotten when integrating IG with GWO and SVM rating accuracy improved to 96.67% and the number of features was reduced to 32 feature of the CNS dataset.
This thesis investigates several classification algorithms and their suitability to the biological domain. For applications that suffer from high dimensionality, different feature selection methods are considered for illustration and analysis. Moreover, an effective system is proposed. In addition, Experiments were conducted on three benchmark gene expression datasets. The proposed system is assessed and compared with related work performance.
Presentation
Download from Slideshare
Thesis
Download from Google Drive