Efficient approaches applied to hierarchical clustering in analysing the ALL / AML
Published 20 March 2013
Cancer research is one of the major research areas in the medical field. Accurate prediction of different tumour types has great value in providing better treatment and minimising toxicity for patients. Until now, cancer classification has been based primarily on the morphological appearance and clinical characteristics of the tumour (Golub et al., 1999). This approach is ambiguous, and these conventional classification methods are reported to have several limitations in their diagnostic ability (Azuaje, 2000). It has been suggested that tailoring therapies to tumour types differentiated by pathogenetic patterns may maximise treatment efficacy for patients (Alizadeh, 2000; Golub et al., 1999; Veer et al., 2002; Pomeroy, 2002; Zajchowski et al., 2001; Sorlie et al., 2001; Dubitzky, 2002; Veer & Jone, 2002; DeRisi et al., 1996). Also, the existing tumour classes have been found to be heterogeneous and to comprise diseases that are molecularly distinct and follow divergent clinical courses.
In the classification of cancer types in particular (Golub et al., 1999; Alizadeh et al., 2000; Bittner et al., 2000; Nielsen et al., 2002; Tibshirani et al., 2002; Parmigiani et al., 2002), conventional diagnostic procedures involve morphological, clinical, and molecular studies of the tissue, which are both highly subjective in their analysis and a source of inconvenience and discomfort to the patient. Microarray experiments offer an alternative (or additional), objective means of cell classification through some predetermined function of the gene expression levels for a new tissue sample of an unknown type. Whilst potentially very powerful, the statistical robustness of these methods is still hampered by the "large p, small n" problem: a microarray slide can typically hold tens of thousands of gene fragments whose responses act as the predictor variables (p), whilst the number of patient tissue samples (n) available in such studies is much smaller (for the above examples, 38 in Golub et al., 96 in Alizadeh et al., 38 in Bittner et al., 41 in Nielsen et al., 63 in Tibshirani et al., and 80 in Parmigiani et al.). The existing classification methods were not designed to handle this kind of data efficiently and effectively.
In order to gain a better insight into the problem of cancer classification, systematic approaches based on global gene expression analysis have been proposed. The present study used the microarray gene expression data of human acute leukemia, with the aim of distinguishing between ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia), a typical cancer classification problem that is not well solved despite many years of effort. This research focused on the classification (prediction) part of this problem (Zhang & Ke, 2000). The two standard leukemia datasets used for training and testing were obtained from the ALL/AML datasets, and the performance of the proposed technique in clustering the ground-truth cancer classes, namely acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), is demonstrated. This high-dimensional training dataset is subjected to a multi-stage clustering technique, which performs clustering at diverse levels. The findings indicated that the proposed technique runs faster than existing clustering techniques. In addition, the proposed method helps to reduce the data size, which further improves the running time. The experimental results on real datasets demonstrated that the proposed technique is more robust and efficient than traditional hierarchical clustering.
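The idea of clustering at more than one level, with an early stage that shrinks the data before hierarchical clustering is applied, can be sketched as follows. This is only an illustrative reading, not the study's actual algorithm, whose stages are not specified here. The choice of a leader-style first stage, the names `leader_stage`, `average_linkage` and `multi_stage_cluster`, the `radius` parameter, and the toy samples are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def leader_stage(samples, radius):
    """Stage 1 (assumed): put each sample with the first leader within `radius`."""
    groups = []  # lists of sample indices; the first index in each list is the leader
    for idx, sample in enumerate(samples):
        for group in groups:
            if euclidean(sample, samples[group[0]]) <= radius:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no leader close enough: start a new group
    return groups

def average_linkage(points, n_clusters):
    """Stage 2: merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def multi_stage_cluster(samples, radius, n_clusters):
    """Cluster group centroids instead of raw samples, then map labels back."""
    groups = leader_stage(samples, radius)
    reps = [centroid([samples[i] for i in g]) for g in groups]
    labels = [None] * len(samples)
    for label, rep_ids in enumerate(average_linkage(reps, n_clusters)):
        for rep_id in rep_ids:
            for sample_idx in groups[rep_id]:
                labels[sample_idx] = label
    return labels

# Toy two-dimensional samples, made up for illustration only.
samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [2.0, 2.1],
           [8.0, 8.0], [8.1, 7.9], [7.0, 7.2]]
labels = multi_stage_cluster(samples, radius=0.5, n_clusters=2)
```

In this toy run the first stage reduces seven samples to four representatives, so the costly pairwise-distance step of the hierarchical stage works on a smaller input. Each representative is weighted equally in the second stage, which is a simplification.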
Performing gene selection helps to reduce the data size and thus the running time. More importantly, gene selection removes a large number of irrelevant genes, which improves the classification accuracy [GWB+00], and it therefore plays an important role in cancer classification. The expression levels of genes are known to contain the keys to fundamental problems relating to the prevention and cure of diseases, biological evolution mechanisms and drug discovery. Unlike data from model organisms and cell lines, which have a uniform genetic background and where experiments are conducted under controlled conditions, disease samples are typically much more heterogeneous. Differences in the genetic background of the subjects, in disease stage, progression and severity, and the presence of disease subtypes all contribute to the overall heterogeneity. Discovering the genes or features most relevant to the disease in question and identifying disease subtypes from such heterogeneous data remain open problems. Because of the large variability in gene mutations and gene expression in this population, patients do not all respond to therapy in the same way, which poses a serious challenge to physicians.
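One common way to rank genes for removal is a signal-to-noise score per gene, the difference of the class means divided by the sum of the class standard deviations, as used by Golub et al. (1999). The sketch below is a minimal illustration of that kind of filter and is not necessarily the selection method used in this study. The function names, the `top_k` parameter, and the toy expression values are assumptions made for the example.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Signal-to-noise score: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        return 0.0  # constant in both classes: treated as uninformative here
    return (mean(values_a) - mean(values_b)) / spread

def select_genes(expression, labels, top_k):
    """Return the top_k gene names ranked by absolute signal-to-noise score.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of "ALL" / "AML" labels aligned with the samples
    """
    scores = {}
    for gene, values in expression.items():
        all_vals = [v for v, lab in zip(values, labels) if lab == "ALL"]
        aml_vals = [v for v, lab in zip(values, labels) if lab == "AML"]
        scores[gene] = signal_to_noise(all_vals, aml_vals)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top_k]

# Toy values, made up for illustration (not taken from the leukemia dataset).
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "gene_a": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],  # higher in ALL
    "gene_b": [5.0, 1.0, 9.0, 5.0, 9.0, 1.0],    # same mean in both classes
    "gene_c": [2.0, 3.0, 4.0, 8.0, 9.0, 10.0],   # higher in AML
}
selected = select_genes(expression, labels, top_k=2)
```

Here `gene_b` has the same mean in both classes and is dropped, while the two genes that separate the classes are kept. Ranking by absolute value keeps genes that are informative in either direction.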
The present study classified the AML and ALL types of cancer. AML is considered an aggressive cancer, and patients are often at high risk of recurrence following therapy, particularly if they are unable to undergo high doses of therapy.
Chromosomal variables of AML cells, as well as levels of cancer cells in the blood, prior hematologic disorders and levels of specific enzymes, may help to further stratify patients as being at high, standard or low risk of recurrence, and treatment may be altered according to these stratifications. However, it is estimated that only approximately 50% of patients can be accurately classified into the appropriate risk stratification, and hence further refinement of the classifications is needed. The present multi-stage clustering approach helps in determining disease characteristics and ultimately provides a platform for creating individual treatment regimens. Through this method, one can identify different gene profiles that aid in predicting the risk of recurrence and/or the potential response to specific therapies. The clustering approach also helps in predicting a survival probability of more than 57% (Valk et al., 2004). Thus, gene expression helps to classify patients according to type of cancer, ultimately leading to a more individualised treatment approach that improves the chances of optimal long-term survival.
In conclusion, the proposed novel multi-stage clustering technique enhanced cancer classification, prognosis and the prediction of treatment responses. Compared with previous clustering techniques, this method improved the interpretability and visualisation of the clustering results and increased robustness by reducing noise and outliers and by handling arbitrary shapes and structures.
Limitations of the study
To date, although the number of genes in gene expression data is huge, the number of available data samples is very small. This has hindered the development of effective algorithms for cancer classification. With so few datasets available, each of them small, the scalability of the algorithms cannot be tested. Nor can the effectiveness of different algorithms be properly compared, since they can be compared only on a very few datasets, which may not be representative of the kind of expression data that will be available in the future. Currently, gene expression data come from different laboratories, which means that the data may not be standardised across those laboratories. As more laboratories acquire this technology, the amount of large-scale gene expression data and profiles will grow rapidly, leading to a gene expression data explosion. This might introduce two issues. First, data from different laboratories need to be combined to create a larger dataset, and non-standardised data will introduce noise and error that reduce classification accuracy. Second, datasets from different laboratories may contain different sets of genes. This means that either the combined data will contain missing values, or methods need to be developed to combine the data efficiently by selecting only the gene expression data common to all laboratories (Lu and Han, 2006).
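The second issue, combining datasets that share only some genes, can be illustrated with a small sketch. It keeps only the genes present in every laboratory's dataset and z-scores each gene within each laboratory before concatenating. The per-laboratory z-scoring is an assumption chosen for simplicity, not a method described above, and it does not remove every kind of between-laboratory variation. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def standardise(values):
    """Z-score one gene's values within a single laboratory."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant gene: no variation to scale
    return [(v - mu) / sigma for v in values]

def combine_labs(lab_datasets):
    """Merge datasets on their shared genes after per-laboratory standardisation.

    lab_datasets: list of dicts, each mapping gene name -> list of values
    Returns a dict mapping each shared gene -> concatenated standardised values.
    """
    common = set(lab_datasets[0])
    for data in lab_datasets[1:]:
        common &= set(data)  # keep only genes measured by every laboratory
    combined = {}
    for gene in sorted(common):
        combined[gene] = []
        for data in lab_datasets:
            combined[gene].extend(standardise(data[gene]))
    return combined

# Toy datasets, made up for illustration: the two labs use different scales
# and only share gene_a.
lab_one = {"gene_a": [10.0, 12.0, 14.0], "gene_b": [1.0, 2.0, 3.0]}
lab_two = {"gene_a": [100.0, 140.0], "gene_c": [5.0, 6.0]}
combined = combine_labs([lab_one, lab_two])
```

Selecting the intersection avoids missing values at the cost of discarding genes that only some laboratories measured, which is the trade-off described above.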
References
Alizadeh, A., Eisen, M. B., Davis, R. E., et al. (2000) 'Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling', Nature, 403(6769), pp. 503–511.
Azuaje, A. (2000) 'Interpretation of genome expression patterns: computational challenges and opportunities', IEEE Engineering in Medicine and Biology.
Bittner, M., Meltzer, P., Chen, Y., et al. (2000) 'Molecular classification of cutaneous malignant melanoma by gene expression profiling', Nature, 406(6795), pp. 536–540.
DeRisi, J., Penland, L., Brown, P. O., et al. (1996) 'Use of a cDNA microarray to analyse gene expression patterns in human cancer', Nature Genetics, 14(4), pp. 457–460.
Dubitzky, W., Granzow, M. and Berrar, D. (2002) 'Comparing Symbolic and Subsymbolic Machine Learning Approaches to Classification of Cancer and Gene Identification', Kluwer Academic.
Golub, T. R., Slonim, D. K., Tamayo, P., et al. (1999) 'Molecular classification of cancer: class discovery and class prediction by gene expression monitoring', Science, 286(5439), pp. 531–537.
Lu, Y. and Han, J. (2006) Cancer Classification Using Gene Expression Data, University of Illinois, Urbana-Champaign, Urbana, U.S.A.
Parmigiani, G., Garrett, E. S., Anbazhagan, R. and Gabrielson, E. (2002) 'A statistical framework for expression-based molecular classification in cancer', J Roy Statist Soc Ser B, 64(4), pp. 717–736.
Pomeroy, S., Tamayo, P., Gassenbeek, M., et al. (2002) 'Prediction of central nervous system embryonal tumour outcome based on gene expression', Nature, pp. 436–442.
Sorlie, T., et al. (2001) 'Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications', Proc Natl Acad Sci USA, pp. 10869–10874.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002) 'Diagnosis of multiple cancer types by shrunken centroids of gene expression', Proc Natl Acad Sci USA, 99(10), pp. 6567–6572.
Veer, L. and Jone, D. (2002) 'The microarray way to tailored cancer treatment', Nature Medicine, pp. 13–14.
Veer, L., Da, H., Bijver, M., et al. (2002) 'Gene expression profiling predicts clinical outcome of breast cancer', Nature, pp. 530–536.
Zajchowski, D., et al. (2001) 'Identification of gene expression profiles that predict the aggressive behavior of breast cancer cells', Cancer Research, pp. 5168–5178.
Zhang, X. and Ke, H. (2000) 'ALL/AML Cancer Classification by Gene Expression Data Using SVM and CSVM Approach', Genome Informatics, 11, pp. 237–239.