Scalable Scientific Data Management and Mining
Abstract
Managing, storing, efficiently accessing, and analyzing hundreds of gigabytes to hundreds of terabytes of data, which are likely to be generated or used in various phases of large-scale scientific experiments and simulations, is a challenging task. Current data management and analysis techniques do not measure up to the challenges posed by such requirements in terms of performance, scalability, ease of use, and interfaces. Tera-scale computing requires newer models and approaches for storing, retrieving, managing, sharing, visualizing, organizing, and analyzing data at such a massive scale.
In this talk, we will describe the research and development work being performed at Northwestern University to address the above problems. In particular, we will describe the architecture and implementation of a metadata management system that allows the user to store, analyze, and use access patterns, relationships among data sets, data analysis, and I/O optimizations for scientific applications. We will describe automatic I/O optimization techniques that can be incorporated into applications in a seamless fashion. We will also briefly present data mining techniques, including on-line parallel data mining, for processing results from scientific simulations.
Bio of the Speaker
Alok Choudhary has been a professor in the Electrical and Computer Engineering Department and the Kellogg School of Management at Northwestern University since September. From 1989 to 1996 he was a faculty member in the ECE department at Syracuse University. He received his Ph.D. in Electrical and Computer Engineering from the University of Illinois, Urbana-Champaign, in 1989 and an M.S. from the University of Massachusetts, Amherst, in 1986.
He received the National Science Foundation's Young Investigator Award in 1993, an IEEE Engineering Foundation award and an IBM Faculty Development award. His main research interests are in high-performance computing and communication systems and their applications in many domains including information processing (e.g., data mining) and scientific computing (e.g., scientific discoveries). In particular, his interests lie in the design and evaluation of architectures and software systems (from system software such as runtime systems to compilers), high-performance servers, high-performance databases and input-output. He has published more than 250 papers in various journals and conferences in the above areas. He has also written a book and several book chapters on the above topics.