NEURAL NETWORKS AND DEEP LEARNING: A TEXTBOOK (Second Edition)


Neural Networks and Deep Learning (Second Edition), Springer, July 2023

Charu C. Aggarwal.

Table of Contents

Free download for subscribing institutions only

Buy the hardcover or e-version from Springer or Amazon (for the general public): the PDF from Springer is qualitatively preferable to the Kindle version

Buy the low-cost paperback edition (the MyCopy link on the right appears for subscribing institutions only)

Lecture on backpropagation, based on the book's presentation in Chapter 2 (it takes a somewhat different approach to explaining the algorithm than you would normally see in textbooks):

This is a comprehensive textbook on neural networks and deep learning. The book discusses the theory and algorithms of deep learning. This theoretical grounding is essential for understanding the design of neural architectures in different applications, and for answering questions such as: Why do neural networks work? When do they work better than off-the-shelf machine learning models? When is depth useful? Why is training neural networks so hard? What are the pitfalls? Even though the book is not implementation-oriented, it is rich in its discussion of applications, covering areas such as recommender systems, machine translation, image captioning, image classification, reinforcement learning-based gaming, and text analytics. The second edition is a significant update over the first edition, with new material on graph neural networks, attention mechanisms, adversarial learning, transformers, and large language models. All chapters have been revised significantly, and detailed chapters on backpropagation and graph neural networks have been added. The following aspects are covered:

The basics of neural networks: Chapters 1, 2, and 3 discuss the basics of neural network design and the fundamentals of training them. The backpropagation algorithm is described in Chapter 2, which also shows how various machine learning models can be simulated with neural networks. Examples include least-squares regression, SVMs, logistic regression, Widrow-Hoff learning, singular value decomposition, and recommender systems. Recent models like word2vec are also explored, together with their connections to traditional matrix factorization. Exploring the interface between machine learning and neural networks is important because it provides a deeper understanding of how neural networks generalize known machine learning methods, and of the cases in which neural networks have advantages over traditional machine learning; a small sketch of this correspondence follows below.
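To make this connection concrete, here is a minimal sketch (not taken from the book; it assumes only NumPy, and the data, learning rate, and iteration count are illustrative choices) of a "network" consisting of a single linear unit trained by gradient descent on a squared loss. Its weights converge to the closed-form least-squares solution, which is precisely the kind of equivalence that Chapter 2 develops:

    import numpy as np

    # Toy data: targets generated by a known linear model plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=200)

    # A "network" with a single linear output unit and no hidden layer.
    w = np.zeros(3)
    lr = 0.1
    for _ in range(500):
        y_hat = X @ w                      # forward pass
        grad = X.T @ (y_hat - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad                     # gradient-descent update

    # Closed-form least-squares solution for comparison.
    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.round(w, 3))     # close to [ 2.  -1.   0.5]
    print(np.round(w_ls, 3))  # essentially the same weights

Swapping the squared loss for a hinge or logistic loss in the same loop turns this single-unit network into a linear SVM or logistic regression, respectively, which is the pattern of simulation the chapter follows.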

Challenges in training neural networks: Although Chapters 1 and 2 provide an overview of the training methods for neural networks, a more detailed understanding of the training challenges is provided in Chapters 4 and 5. In particular, issues related to network depth and overfitting are discussed. Chapter 6 presents a classical architecture, referred to as the radial-basis function (RBF) network. Even though this architecture is no longer used frequently, it is important because it represents a direct generalization of the kernel support-vector machine (a minimal sketch of this connection follows below). Restricted Boltzmann machines are covered in Chapter 7.
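As an illustration of that connection, here is a minimal sketch of an RBF network (not the book's code; it assumes NumPy, and the centers, bandwidth, and ridge penalty are illustrative choices): a fixed hidden layer of Gaussian units followed by a linear output layer fit by ridge regression. With one hidden unit centered at each training point, this construction reduces to prediction with a Gaussian kernel, which is the sense in which the architecture generalizes kernel methods:

    import numpy as np

    # Toy 1-D regression problem: noisy samples of sin(x).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

    # Hidden layer: Gaussian RBF units with fixed, evenly spaced centers.
    centers = np.linspace(-3, 3, 20).reshape(-1, 1)
    gamma = 1.0  # bandwidth (illustrative choice)

    def rbf_features(X):
        # Squared distance from every input point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    # Output layer: linear weights fit by ridge regression on RBF features.
    Phi = rbf_features(X)  # shape (100, 20)
    lam = 1e-3
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

    X_test = np.array([[-2.0], [0.0], [2.0]])
    print(rbf_features(X_test) @ w)  # approximately sin(-2), sin(0), sin(2)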

Advanced architectures and applications: Much of the success in neural network design is a result of specialized architectures for various domains and applications. Examples of such specialized architectures include recurrent neural networks and convolutional neural networks. Since specialized architectures are key to understanding neural network performance in various domains, most of the book is devoted to this setting. Several advanced topics like deep reinforcement learning, graph neural networks, transformers, large language models, neural Turing machines, and generative adversarial networks are discussed. The second edition places significant focus on modern topics like attention, adversarial learning, graph neural networks, transformers, and large language models.

Some of the "forgotten" architectures like RBF networks and Kohonen self-organizing maps are included because of their potential in many applications. The book is written for graduate students, researchers, and practitioners. It requires knowledge of probability and linear algebra; furthermore, basic knowledge of machine learning is helpful. Numerous exercises are available, along with a solution manual to aid in classroom teaching. Where possible, an application-centric view is highlighted in order to give the reader a feel for the technology.


About the Author

Charu Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his B.Tech. at IIT Kanpur in 1993 and his Ph.D. at the Massachusetts Institute of Technology in 1996. He has worked extensively in the field of data mining, with particular interests in data streams, privacy, uncertain data, and social network analysis. He has published 17 books (6 authored and 11 edited), over 350 papers in refereed venues, and has applied for or been granted over 80 patents. His h-index is 80. Because of the commercial value of the above-mentioned patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on bio-terrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, and a recipient of two IBM Outstanding Technical Achievement Awards for his work on streaming systems and high-dimensional data analysis. He has received two best paper awards and an EDBT Test-of-Time Award (2014). He has received the IEEE ICDM Research Contributions Award (2015), one of the two highest awards for research in the field of data mining. He has served as the general or program co-chair of the IEEE Big Data Conference (2014), the ICDM Conference (2015), the ACM CIKM Conference (2015), and the KDD Conference (2016). He also co-chaired the data mining track at the WWW Conference 2009. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 2004 to 2008. He is an associate editor of the ACM Transactions on Knowledge Discovery and Data Mining Journal, an action editor of the Data Mining and Knowledge Discovery Journal, an associate editor of the IEEE Transactions on Big Data, and an associate editor of the Knowledge and Information Systems Journal. He is an editor-in-chief of the ACM Transactions on Knowledge Discovery from Data, and is also an editor-in-chief of ACM SIGKDD Explorations. He is a fellow of SIAM (2015), the ACM (2013), and the IEEE (2010) for "contributions to knowledge discovery and data mining techniques."


Solution Manual for the Book

The solution manual for the book is available from Springer via the link on this page. If you are an instructor, you can obtain a copy there. Please do not ask me directly for a copy of the solution manual; it can only be distributed by Springer.


Resources for the Book

I have LaTeX slides that can be used for teaching, as well as video lectures for the book. The entire slide deck is available as a PDF here. The source for the LaTeX slides is available here, together with the PowerPoint figures. You are free to use the figures in your own slides, but you must credit the book. Use of the figures in another publication requires permission from Springer through the Copyright Clearance Center. These slides are currently, in large part, only for the first edition; I will update them over time.

I also have a few lectures on YouTube, based on the slides below. Here is an example:


Chapter 1: An Introduction to Neural Networks

Slides PDF

Latex source of slides and figures

Chapter 2: Machine Learning with Shallow Neural Networks

Slides PDF

Latex source of slides and figures

Chapter 3: Training Deep Neural Networks

Slides PDF

Latex source of slides and figures

Chapter 4: Teaching Deep Learners to Generalize

Slides PDF

Latex source of slides and figures

Chapter 5: Radial Basis Function Networks

Slides PDF

Latex source of slides and figures

Chapter 6: Restricted Boltzmann Machines

Slides PDF

Latex source of slides and figures

Chapter 7: Recurrent Neural Networks

Slides PDF

Latex source of slides and figures

Chapter 8: Convolutional Neural Networks

Slides PDF

Latex source of slides and figures

Chapter 9: Deep Reinforcement Learning

Slides PDF

Latex source of slides and figures

Chapter 10: Advanced Topics in Deep Learning

Slides PDF

Latex source of slides and figures