Machine Learning for Text, Springer, March 2018
Charu C. Aggarwal.
Comprehensive textbook on text mining:
Table of Contents
PDF Download Link (Free for computers connected to subscribing institutions only)
Buy hard-cover or PDF (for general public)
Buy low-cost paperback edition using link on right of destination page (A link for buying the softcover at the discounted price of $25 shows on the destination page for computers connected to subscribing institutions only)
This book covers machine learning techniques for
text using both bag-of-words and sequence-centric methods. The scope of
coverage is vast, and it includes traditional information retrieval methods
as well as recent methods
from neural networks and deep learning. The
chapters of this book can be organized into three categories:
Classical machine learning methods: These chapters discuss the
classical machine learning methods such as matrix factorization, topic modeling, dimensionality reduction,
clustering, classification, linear models, and evaluation. All these techniques treat text as a bag of words.
Contextual learning methods that combine
different types of text and also combine text with heterogeneous data types are covered.
Classical information retrieval and search engines:
Although this book is focused on text mining, the importance of retrieval and ranking methods
in mining applications is quite significant. Therefore, the book covers the key aspects of
information retrieval, such as data structures, Web ranking, crawling, and search engine design.
Importance is given to different types of information retrieval scoring models and learning-to-rank techniques.
Sequence-centric, deep learning, and linguistic methods for mining: While the
bag-of-words representation can be useful for traditional applications like classification and clustering,
more advanced applications like machine translation, image captioning, opinion mining, information extraction, and
text segmentation require one to treat text as a sequence. These chapters discuss sequence-centric
mining methods such as deep learning techniques, word2vec, recurrent neural networks, LSTMs, maximum entropy Markov models, and Conditional
Random Fields. Custom methods for applications like text summarization, opinion mining, and event detection are also discussed.
The book can be used as a textbook, and it contains numerous examples and exercises. However, it is also
designed to be useful to researchers and industrial practitioners. It therefore contains extensive
bibliographic references for researchers, and the bibliographic section also contains
software references for practitioners.
Cost-effective methods for obtaining electronic and hardcopy versions
The book is available in both hardcopy (hardcover) and electronic versions.
The book is available through all the usual channels (e.g., Amazon,
Barnes and Noble, etc.), in Kindle format, and also directly from
Springer in hardcopy and PDF format. The good thing about Springer is
that electronic versions are often widely accessible at no cost to
subscribing institutions, which is particularly convenient for students.
My understanding is that a very large fraction of universities in
North America, Europe, Australia, and New Zealand are subscribers, and a
rapidly increasing number of universities in Asia are also subscribers.
The electronic version is available at the following
SpringerLink pointer. For subscribing institutions, click from a computer directly connected to your institution's network to download the book for free.
Springer uses the domain name of your computer to regulate access.
To be eligible, your institution must subscribe to "e-book package
english (Computer Science)" or "e-book package english (full
collection)". If your institution is eligible, you will see a (free)
'Download Book' button. Otherwise you will see a (paid) 'Get Access'
button. Sometimes you may be able to download it from your library
e-collection, even when it is not Web-accessible from your institution.
For those who prefer desk copies rather than electronic books, there
are some very cost-effective methods to obtain a paperback
MyCopy edition for $25 or less (subscribing institutions only). If you
have ever published an article (even journal) with Springer, you are
also entitled to an additional 40% author discount for any Springer book
(including the $25 paperback edition) using the approach described here.
About the Author
Charu Aggarwal is a Distinguished Research Staff Member (DRSM) at the
IBM T. J. Watson Research
Center in Yorktown Heights, New York. He received his B.Tech. from IIT
Kanpur in 1993 and his Ph.D. from
Massachusetts Institute of Technology in 1996. He has worked
extensively in the field of data mining, with particular interests in
data streams, privacy, uncertain data and social network analysis.
He has published 17 (6 authored and 11 edited) books, over 350 papers in
refereed venues, and has applied for or been granted over 80 patents.
His h-index is 91.
Because of the commercial value of the above-mentioned patents,
he has received several invention achievement awards and has thrice been
designated a Master Inventor at IBM.
He is a recipient of an IBM Corporate
Award (2003) for his work on bio-terrorist threat detection in data
streams, the IBM Outstanding Innovation Award (2008)
for his scientific contributions to privacy technology, and two IBM Outstanding Technical Achievement
Awards for his work on streaming systems and high-dimensional data analysis.
He has received two best paper awards and an EDBT
Test-of-Time Award (2014). He has received the IEEE ICDM Research Contributions
Award (2015), which is one of the two highest awards for research in the field of data mining.
He has served as the general or program co-chair of the IEEE Big Data
Conference (2014), the ICDM Conference (2015), the ACM CIKM Conference
(2015), and the KDD Conference (2016). He also co-chaired the data
mining track at the WWW Conference 2009. He served as an associate
editor of the IEEE Transactions on Knowledge and Data Engineering from 2004 to 2008. He is an editor-in-chief of the ACM Transactions on Knowledge Discovery and Data Mining Journal, an action editor of the Data Mining and Knowledge Discovery Journal,
an associate editor of the IEEE Transactions on Big Data, and an
associate editor of the Knowledge and Information Systems Journal. He is
editor-in-chief of the ACM SIGKDD Explorations.
He is a fellow of SIAM (2015), the ACM (2013), and the IEEE (2010) for
"contributions to knowledge discovery and data mining techniques."
Solution Manual for Book
The solution manual for the book is available here from Springer. There is a link for the solution manual on this page. If you are an instructor, then you can obtain
a copy. Please do not ask me
directly for a copy of the solution manual. It can only be distributed by Springer after verifying that you
are an instructor.