Charu Aggarwal's Selected Publications


NEW: PRIVACY-PRESERVING DATA MINING BOOK:

Privacy-Preserving Data Mining: Models and Algorithms (Springer) Ed. Charu Aggarwal, Philip S. Yu -- Comprehensive survey-driven book on Privacy-Preserving Data Mining Research with chapters contributed by prominent researchers in the field.

Table of Contents and introductory survey chapters

Book Cover



NEW: UNCERTAIN DATA BOOK (To appear in early 2009):

Managing and Mining Uncertain Data (Springer) Ed. Charu Aggarwal -- Comprehensive survey-driven book on Uncertain Data with chapters contributed by prominent researchers in the field.

Table of Contents and introductory survey chapters


DATA STREAM BOOK:

Data Streams: Models and Algorithms (Springer) Ed. Charu Aggarwal -- Comprehensive survey-driven book on Data Stream Research with chapters contributed by prominent researchers in the field. The book contains survey chapters on different topics in data streams.

Table of Contents

Sample Chapter

Available at Springer, Amazon.com, and Barnes and Noble.


My podcast on data streams from IBM Research

Data Stream Mining

C. C. Aggarwal, P. S. Yu. LOCUST: An Online Analytical Processing Framework for High Dimensional Classification of Data Streams. Proceedings of the ICDE Conference, 2008 (Extending Lazy Learning to High Dimensional Stream Classification).

C. C. Aggarwal. On Biased Reservoir Sampling in the Presence of Stream Evolution. Proceedings of the VLDB Conference, 2006. PDF of Presentation Slides. (Reservoir Sampling for Evolving Data Streams).

C. C. Aggarwal, P. S. Yu. A Survey of Synopsis Construction Algorithms in Data Streams. Data Streams: Models and Algorithms ed. C. Aggarwal, Springer. (Survey on Synopsis Construction Methods - Reservoir Sampling, wavelets, histograms, sketches)

C. C. Aggarwal, P. S. Yu. On String Classification in Data Streams. Proceedings of the ACM KDD Conference, 2007. (String Classification in Data Streams)

C. C. Aggarwal. A Framework for Classification and Segmentation of Massive Audio Data Streams. Proceedings of the ACM KDD Conference, 2007. (Micro-clustering for speaker recognition)

C. C. Aggarwal, P. Yu. A Framework for Clustering Massive Text and Categorical Data Streams. Proceedings of the SIAM Conference on Data Mining, 2006 (Text and Categorical Clustering of High Dimensional Data Streams).

C. C. Aggarwal. On Futuristic Query Processing in Data Streams. Proceedings of the EDBT Conference, 2006 (Query Processing of future stream behavior).

C. C. Aggarwal. On Abnormality Detection in Spuriously Populated Data Streams. Proceedings of the SIAM Conference on Data Mining, 2005. (Detecting Abnormal Events in Noisy Data Streams)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for High Dimensional Projected Clustering of Data Streams. Proceedings of the VLDB Conference, 2004 (Projected Clustering of High Dimensional Data Streams).

C. C. Aggarwal, J. Han, J. Wang, P. Yu. On Demand Classification of Data Streams. Proceedings of the ACM KDD Conference, 2004 (Classifying a point from an evolving data stream with the most optimized model when you receive it.)

C. C. Aggarwal, J. Han, J. Wang, P. Yu. A Framework for Clustering Evolving Data Streams. Proceedings of the VLDB Conference, 2003 (An OLAP Framework for Clustering Data Streams.)

C. C. Aggarwal. A Framework for Diagnosing Changes in Evolving Data Streams. Proceedings of the ACM SIGMOD Conference, 2003. (Change Detection in Data Streams with diagnosis and visualization capability).

C. C. Aggarwal. An Intuitive Framework for Understanding Changes in Evolving Data Streams. Proceedings of the ICDE Conference, 2002 (Detecting Change in Data Streams (summary))

C. C. Aggarwal, P. S. Yu. Online Analysis of Community Evolution in Data Streams. Proceedings of the SIAM Conference on Data Mining, 2005. (Community Detection and Evolution in Data Streams)

Privacy Preserving Data Mining

C. C. Aggarwal. On Unifying Privacy and Uncertain Data Models. ICDE Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Privacy-Preservation of Text and Sparse Binary Data with Sketches. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal, P. S. Yu. On Anonymization of Strings. SIAM Conference on Data Mining, 2007.

C. C. Aggarwal. On Randomization, Public Information, and the Curse of Dimensionality. ICDE Conference, 2007.

C. C. Aggarwal. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005. Slides (PDF). (The line between quasi-identifiers and sensitive attributes is often unclear because of partial knowledge. An analysis in high dimensionality when a large fraction of attributes is included in the anonymization process: the curse is ubiquitous!)

D. Agrawal, C. C. Aggarwal. On the design and quantification of Privacy Preserving Data Mining. ACM PODS Conference, 2001. (Perturbation Approach to Privacy Preserving Data Mining in a General Environment).

C. C. Aggarwal, S. Parthasarathy. Mining Massively Incomplete Data Sets by Conceptual Reconstruction. ACM KDD Conference, 2001. (Privacy Preserving Data Mining, when many values are hidden or incomplete.)

C. C. Aggarwal, J. Pei, B. Zhang. On Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006. (Privacy Preserving Data Mining with a Data Mining Proficient Adversary)

C. C. Aggarwal, P. S. Yu. On Variable Constraints in Privacy Preserving Data Mining. Proceedings of the SIAM Conference on Data Mining, 2005. (Privacy Preserving Data Mining with Personalized Levels of Anonymity)

C. C. Aggarwal, P. S. Yu. A Condensation Based Approach to Privacy Preserving Data Mining. Proceedings of the EDBT Conference, 2004. (Condensation Approach to Privacy in a Trusted Server Environment.)

Uncertain Data Mining

C. C. Aggarwal, P. S. Yu. Outlier Detection with Uncertain Data. SIAM Data Mining Conference, 2008.

C. C. Aggarwal, P. S. Yu. On Indexing High Dimensional Data with Uncertainty. (11-page version) SIAM Data Mining Conference, 2008. (2-page poster version appears in ICDE Conference, 2008).

C. C. Aggarwal, P. S. Yu. A Framework for Clustering Uncertain Data Streams. ICDE Conference, 2008.

C. C. Aggarwal. On Density Based Transforms for Uncertain Data Mining. ICDE Conference, 2007.

High Dimensional Data Mining

C. C. Aggarwal. Towards Local Supervised Dimensionality Reduction of High Dimensional Data. Proceedings of the SIAM Conference on Data Mining, 2006 (Explores Local Supervised Dimensionality Reduction).

C. C. Aggarwal. Towards Systematic Design of Distance Functions for Data Mining Applications. Proceedings of the ACM KDD Conference, 2003. (Framework for tailoring distance functions to the summary characteristics of high dimensional data sets and user preferences.)

C. C. Aggarwal. Hierarchical Subspace Sampling: A Unified Framework for High Dimensional Data Reduction, Selectivity Estimation and Nearest Neighbor Search. Proceedings of the ACM SIGMOD Conference, 2002 (Subspace Sampling for Projected Clustering and Local Dimensionality Reduction)

C. C. Aggarwal, A. Hinneburg, D. A. Keim. On the Surprising Behavior of Distance Metrics in High Dimensional Space. International Conference on Database Theory (ICDT), 2001. (Manhattan Metric is better than the Euclidean Metric. Fractional Metrics are even better.)

A. Hinneburg, C. C. Aggarwal, D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the VLDB Conference, 2000. (Discusses the roots of high dimensional sparsity.)

C. C. Aggarwal. A Human Computer Interactive Method for Projected Clustering. IEEE Transactions on Knowledge and Data Engineering. 16(4), pp 448--460, 2004. (Extended Version: IPCLUS: An Interactive Projected Clustering Algorithm).

C. C. Aggarwal, P. Yu. Finding Generalized Projected Clusters in High Dimensional Spaces. ACM SIGMOD Conference, 2000. (Finds non-axis parallel projected clusters. Also known as local dimensionality reduction.)

C. C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, J. Park. Fast Algorithms for Projected Clustering. ACM SIGMOD Conference, 1999. (Finds axis parallel projected clusters.)

C. C. Aggarwal, P. Yu. Outlier Detection for High Dimensional Data. ACM SIGMOD Conference, 2001. (Methods for projected outlier search.)

C. C. Aggarwal. Re-designing distance functions and distance based applications for high dimensional data. ACM SIGMOD Record, March 2001. (A summary paper on high dimensional data mining.)

C. C. Aggarwal. On the Effects of Dimensionality Reduction on High Dimensional Similarity Search. ACM PODS Conference, 2001. (Dimensionality Reduction Effects on Similarity Search.)

C. C. Aggarwal. On Point Sampling versus Space Sampling for Dimensionality Reduction. SIAM Conference on Data Mining, 2007.

Visually Interactive Data Mining

C. C. Aggarwal. Towards Exploratory Test Instance Specific Algorithms for High Dimensional Classification. Proceedings of the ACM KDD Conference, 2005 (Test Instance specific visual exploration and classification to find diagnostic classification causality of individual test instances.)

C. C. Aggarwal. Towards Meaningful Nearest Neighbor Search by Human-Computer Interaction. Proceedings of the ICDE Conference, 2002 (Visual nearest neighbor search by projections.)

C. C. Aggarwal. Towards Effective and Interpretable Data Mining by Visual Interaction. Proceedings of the ACM KDD Explorations, January 2002 (Summary paper on visual data mining.)

C. C. Aggarwal. A Human-Computer Cooperative System for Effective High Dimensional Clustering. Proceedings of the KDD Conference, 2001. (Visual methods for projected clustering: IPCLUS, an interactive projected clustering algorithm.)

Web Crawling and Resource Discovery

C. C. Aggarwal. Collaborative Crawling: Mining User Experiences for Topical Resource Discovery. Proceedings of the KDD Conference, 2002. (Focussed Crawling by learning user access behavior.)

C. C. Aggarwal, F. Al-Garawi, P. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. WWW Conference, 2001 (Focussed Crawling by learning linkage patterns.)

C. C. Aggarwal. On Learning Strategies for Topic-Specific Web Crawling. Next Generation Data Mining Applications. Edited by: Zurada and Kantardzic, Published by IEEE. ISBN 0-471-65605-4. (Book Chapter on Focussed Crawling by learning and other adaptive strategies.)

Data Mining for Electronic Commerce

C. C. Aggarwal, J. L. Wolf, K. L. Wu, P. Yu. Horting Hatches an Egg: A New Graph Theoretic Approach for Collaborative Filtering. ACM KDD Conference, 1999. (A Collaborative Filtering Paper.)

C. C. Aggarwal, P. S. Yu. A System for Automated Personalization of Web Portals. Proceedings of the VLDB Conference, 2002. (Methods for Targeted advertising and Portal Personalization.)

Long Pattern Discovery and Association Rule Mining

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. A Tree Projection Algorithm For Generation of Frequent Itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2001. (Tree Projection Algorithm)

R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad. Depth First Generation of Long Patterns. KDD Conference, 2000. (Depth First Version of Tree Projection)

C. C. Aggarwal. Towards Long Pattern Generation in Dense Databases. ACM SIGKDD Explorations, Volume 3, Issue 1, 2001. (Summary Paper on the Topic.)

C. C. Aggarwal, P. Yu. Online Generation of Association Rules. ICDE Conference, 1998. (OLAP framework for association rule mining.)

C. C. Aggarwal, P. Yu. A New Framework for Itemset Generation. ACM PODS Conference, 1998. (Mining interesting itemsets).

C. C. Aggarwal, C. Procopiuc, P. Yu. Finding Localized Associations in Market Basket Data. IEEE TKDE Journal, March 2002 (Magnifying the association rule discovery process by segmenting the data in a way which is friendly to association discovery.)

Indexing Non-conventional Data Domains

C. C. Aggarwal, P. Yu. The IGrid Index: Reversing the Dimensionality Curse for Similarity Indexing in High Dimensional Space. ACM KDD Conference, 2000. (Indexing high dimensional data by designing index-friendly distance functions.)

C. C. Aggarwal, P. Yu. On Effective Conceptual Indexing and Similarity Search in Text Data. IEEE International Conference on Data Mining (ICDM Conference), 2001. (New representation of text which provides effective and efficient similarity search).

C. C. Aggarwal, J. Wolf, P. Yu. A New Method for Similarity Indexing of Market Basket Data. ACM SIGMOD Conference, 1999. (Indexing categorical data or Indexing Market Basket Data).

C. C. Aggarwal, D. Agrawal. On Nearest Neighbor Indexing of Nonlinear Trajectories. Proceedings of the ACM PODS Conference, 2003. (First method for indexing mobile objects which are moving in a nonlinear trajectory).

Mining Text, XML, Strings and Other Specialized Domains

C. C. Aggarwal. Representation is Everything: Towards Efficient and Adaptable Similarity Measures for Biological Data. Proceedings of the SIAM Conference on Data Mining, 2006 (Explores Alternatives to Alignment Based Similarity Measures).

C. C. Aggarwal, S. Gates, P. Yu. On the Merits of Using Supervised Clustering for Building Categorization Systems. ACM KDD Conference, 1999. (Partial Supervision for Text Classification).

M. J. Zaki, C. C. Aggarwal. XRules: An Effective Structural Classifier for XML Data. Proceedings of the KDD Conference, 2003. (Structural Classification of XML documents).

C. C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, M. J. Zaki. XProj: A Framework for Structural Projected Clustering of XML Documents. Proceedings of the ACM KDD Conference, 2007 (Structural Clustering of XML Documents)

C. C. Aggarwal. On Effective Classification of Strings with Wavelets. Proceedings of the KDD Conference, 2002. (String Classification by Wavelet Decomposition).

