Learning from Data: Concepts, Theory, and Methods
Cherkassky V., Mulier F.
Wiley-IEEE Press, 2007. Type: Book

--------------------------------------------------------------------------------

Learning (I.2.6...); Deduction and Theorem Proving (I.2.3...); Knowledge Representation Formalisms and Methods (I.2.4...); General (I.5.0...)

Design, Theory, Algorithms

--------------------------------------------------------------------------------

There is a popular saying: “We are drowning in data, but starving for knowledge.” Terabytes and petabytes of data are common in industrial and scientific datasets, and there is a definite need for efficient algorithms for analyzing such data. New algorithms are continuously in demand, mainly due to the variety of the data involved: text, multimedia, and data that is heterogeneous, distributed, streaming, high-dimensional, spatio-temporal, and so on. Often, traditional learning methods cannot be applied to these datasets, either because of their enormous size or because the data itself is ill conditioned. Evidently, designing an effective learning method poses several challenges. Even though most learning methods are data driven, it is sometimes also important for algorithm designers to interact with application experts to identify high-level features that could help in obtaining meaningful knowledge.

A learning algorithm estimates an unknown mapping (dependency) between a system’s inputs and outputs from the available data. In this book, Cherkassky and Mulier present some of the important principles and issues in the field of learning dependencies from data. The book’s 11 chapters are organized into three parts.

Part I (chapters 1 through 4) discusses the main theoretical concepts. A short introduction is provided in chapter 1, followed by the inductive learning and regularization frameworks in chapters 2 and 3, respectively. Chapter 2 helps the reader formulate a learning problem according to the problem environment. The authors discuss the motivation and theory behind the inductive learning principle of regularization, with a focus on the curse of dimensionality, choosing a model of optimal complexity, and so on. Chapter 4 presents statistical learning theory, which allows the reader to understand various generalized learning methods, and how such methods can be used to develop neural networks and support vector machines (SVMs) for pattern recognition problems.

Part II (chapters 5 through 8) introduces different constructive learning methods for regression, classification, and density approximation problems. Various nonlinear optimization strategies are presented in chapter 5, with a focus on stochastic approximation, iterative methods, and greedy optimization. Different methods for data and dimensionality reduction are presented in chapter 6, including vector quantization and clustering, dimensionality reduction using statistical methods, and neural network methods. The chapter concludes with an introduction to methods for multivariate data analysis, including principal component analysis and independent component analysis. The authors present different regression methods in chapter 7: linear estimators, adaptive dictionary methods, adaptive kernel methods, and combinations of several methods.
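As an aside, to make the flavor of chapter 7’s adaptive kernel methods concrete, here is a minimal Python/NumPy sketch of a Nadaraya-Watson kernel regression estimator. This is my own illustration, not code from the book; the synthetic dataset, bandwidth, and function name are assumptions made for the example.

```python
import numpy as np

# Hypothetical one-dimensional training data: noisy samples of an
# unknown input-output mapping (here, a sine wave plus noise).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(50)

def kernel_regression(x_query, x_train, y_train, bandwidth=0.1):
    """Nadaraya-Watson estimator: predict each query point as a
    locally weighted average of the training outputs, with Gaussian
    weights centered on the query point."""
    # Pairwise Gaussian weights between query and training points.
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

x_query = np.linspace(0.0, 1.0, 200)
y_hat = kernel_regression(x_query, x_train, y_train)
```

The bandwidth here plays exactly the role of the model-complexity parameter discussed in chapters 2 and 3: too small and the estimate overfits the noise; too large and it oversmooths the underlying mapping.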
Classification problems are addressed in chapter 8, with a nice introduction to statistical decision theory and Fisher’s linear discriminant analysis. The authors then discuss different classification methods, such as regression-based, tree-based, and nearest-neighbor methods. They illustrate the importance of combining different methods, and also introduce bagging and boosting as ways to improve the generalization of learning methods (a small sketch of bagging appears at the end of this review).

Part III (chapters 9 through 11) addresses constructive learning approaches, with a focus on the Vapnik-Chervonenkis (VC) theoretical framework for predictive learning. The main focus of chapter 9 is on SVMs for several inductive learning problems involving regression and classification. In chapter 10, the authors present the concepts of noninductive inference and alternative learning formulations, with some nice illustrations. Short concluding remarks are provided in chapter 11.

The authors have succeeded in summarizing some of the recent trends and future challenges in different learning methods, including enabling technologies and some interesting practical applications. The chapters are well organized, and most of the content is explained well without relying on many additional references. An interesting aspect I noticed in some chapters is the use of multiple approaches to the same problem, showing that no single best method exists for all data mining problems.

In the introduction to Part I, the authors could have mentioned something about nonstatistical learning approaches, such as genetic programming, learning classifier systems, and so on. A standalone chapter on nonstatistical learning would have been ideal. Even though there are plenty of illustrations, more real-world applications would have been welcome.

This book does not present much novel research, and might be ideal for a beginner in the data mining field. Its best feature is its simple presentation style, which does not use much mathematics. The contents and references could, however, serve as a first point of reference for many advanced data mining topics. I recommend this book for engineers, scientists, and practitioners who would like a state-of-the-art overview of some of the statistical learning methods widely practiced in the data mining community. Finally, I would like to congratulate Cherkassky and Mulier for taking up this interesting challenge, and for putting together all the relevant work in the area of statistical data mining.
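To close, here is the small bagging sketch referenced above, illustrating the idea from chapter 8. Again, this is my own illustration under assumed data and parameters, not the authors’ code: the same high-variance learner (a degree-7 polynomial) is fit on bootstrap resamples, and the predictions are averaged.

```python
import numpy as np

# Hypothetical noisy regression data.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, size=60))
y = np.cos(3 * x) + 0.2 * rng.standard_normal(60)

def bagged_predict(x_query, x, y, n_models=25, degree=7):
    """Bagging: fit the same unstable learner on bootstrap resamples
    of the training set and average the resulting predictions."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap sample
        coeffs = np.polyfit(x[idx], y[idx], deg=degree)
        preds.append(np.polyval(coeffs, x_query))
    return np.mean(preds, axis=0)  # averaging reduces variance

x_query = np.linspace(-1.0, 1.0, 200)
y_hat = bagged_predict(x_query, x, y)
```

Averaging over bootstrap resamples reduces the variance of an unstable estimator without changing its bias much, which is why bagging tends to improve generalization for high-variance learners such as deep trees or high-degree polynomials.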