Data Mining: The Textbook

Data Mining: The Textbook, Springer, May 2015
Charu C. Aggarwal.

Comprehensive textbook on data mining: Table of Contents
PDF Download Link (Free for computers connected to subscribing institutions only)
Buy hard-cover or PDF (PDF has embedded links for navigation on e-readers)
Buy low-cost paperback edition (Instructions for computers connected to subscribing institutions only)

The emergence of data science as a discipline requires the development of a book that goes beyond the traditional focus of books on fundamental data mining problems. More emphasis needs to be placed on the advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. This comprehensive data mining book explores the different aspects of data mining, starting from the fundamentals, and subsequently explores the complex data types and their applications. Therefore, this book may be used for both introductory and advanced data mining courses. The chapters of this book fall into one of three categories:
The fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems.
Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data.
Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor.
The book carefully balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners. Numerous illustrations, examples, and exercises are included with an emphasis on semantically interpretable examples.

Cost-effective methods for obtaining electronic and hardcopy versions

The book is available in both hardcopy (hardcover) and electronic versions. The hardcover is available at all the usual channels (e.g, Amazon, Barnes and Noble etc.), in Kindle format, and also directly from Springer in hardcopy and pdf format. The good thing about Springer is that electronic versions are often widely accessible at no cost to subscribing institutions, which is particularly convenient for students. My understanding is that a very large fraction of universities in North America, Europe, Australia, and New Zealand are subscribers, and a rapidly increasing number of universities in Asia are also subscribing. The electronic version is available at the following Springerlink pointer . For subscribing institutions click from a computer directly connected to your institution network to download the book for free. Springer uses the domain name of your computer to regulate access. To be eligible, your institution must subscribe to "e-book package english (Computer Science)" or "e-book package english (full collection)". If your institution is eligible, you will see a (free) `Download Book' button. Otherwise you will see a (paid) `Get Access' button. Sometimes you may be able to download it from your library e-collection, even when it is not Web-accessible from your institution. For those who prefer desk copies rather than electronic books, there are some very cost-effective methods to obtain a paperback MyCopy edition for $25 or less (subscribing institutions only). If you have ever published an article (even journal) with Springer, you are also entitled to an additional 40% author discount for any Springer book (including the $25 paperback edition) using the approach described here .
In general, for electronic versions, I highly recommend buying the PDF directly from springer over Amazon's Kindle edition. The PDF has embedded links that allows navigation over an e-reader, and will take about 18 MB on your device. Aside from this, one PDF allows you use over any device or computer. Since the PDFs are fully produced by Springer (rather than Amazon kindle, where Amazon plays a role in conversion), the look and feel is fully controlled by author and publisher. This makes the PDF versions of better quality than an Amazon Kindle.

About the Author

Charu Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. from IIT Kanpur in 1993 and his Ph.D. from Massachusetts Institute of Technology in 1996. He has worked extensively in the field of data mining, with particular interests in data streams, privacy, uncertain data and social network analysis. He has published 14 (3 authored and 11 edited) books, over 250 papers in refereed venues, and has applied for or been granted over 80 patents. His h-index is 70. Because of the commercial value of the above-mentioned patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on bio-terrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, and a recipient of an IBM Research Division Award (2008) for his scientific contributions to data stream research. He has received two best paper awards and an EDBT Test-of-Time Award (2014). He has served as the general or program co-chair of the IEEE Big Data Conference (2014), the ICDM Conference (2015), the ACM CIKM Conference (2015), and the KDD Conference (2016). He also co-chaired the data mining track at the WWW Conference 2009. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 2004 to 2008. He is an associate editor of the ACM Transactions on Knowledge Discovery and Data Mining Journal , an action editor of the Data Mining and Knowledge Discovery Journal , an associate editor of the IEEE Transactions on Big Data, and an associate editor of the Knowledge and Information Systems Journal. He is editor-in-chief of the ACM SIGKDD Explorations. He is a fellow of the SIAM (2015), ACM (2013) and the IEEE (2010) for "contributions to knowledge discovery and data mining techniques."

Solution Manual for Book

The solution manual for the book is available here from Springer. There is a link for the solution manual on this page. If you are an instructor, then you can obtain a copy. Please do not ask me directly for a copy of the solution manual. It can only be distributed by Springer.

Resources for book

The resources for this book will grow over time. Currently, I have not found time to prepare slides for teaching and will add them over time. I will also make the powerpoint figures of the book available soon. Meanwhile, I have added links to various sites on the internet where software is available for related material. In case you use the book and prepare slides, please try to share them on the internet. I can add a link from my site like the links below (with your name acknowledged of course).

Chapter 1: An Introduction to Data Mining

Data Sets from UCI Machine Learning Repository

Open Source Data Mining Software (WEKA Workbench)

Scikit-learn Tools (Python)

Apache Mahout Machine Learning Library

Spider Machine Learning Library (MATLAB)

KD-Nuggets: Resources in Data Mining

IBM SPSS Software Suite

Data Mining: The Textbook

Data Mining: The Textbook, Springer, May 2015 Charu C. Aggarwal.

Cost-effective methods for obtaining electronic and hardcopy versions

About the Author

Solution Manual for Book

The solution manual for the book is available here from Springer. There is a link for the solution manual on this page. If you are an instructor, then you can obtain a copy. Please do not ask me directly for a copy of the solution manual. It can only be distributed by Springer.

Resources for book

Chapter 2: Data Preparation

Chapter 3: Similarity and Distances

Chapters 4 and 5: Association Pattern Mining

Chapters 6 and 7: Data Clustering

Chapters 8 and 9: Outlier Analysis

Chapters 10 and 11: Data Classification

Chapter 12: Data Streams

Chapter 13: Mining Text Data

Chapter 14: Mining Time-Series Data

Chapter 15: Mining Discrete Sequences

Chapter 16: Mining Spatial Data

Chapter 17: Mining Graph Data

Chapter 18: Mining Web Data

Chapter 19: Social Network Analysis

Chapter 20: Privacy-Preserving Data Mining

Data Mining: The Textbook, Springer, May 2015
Charu C. Aggarwal.