1st International Conference on Digital Information Management (ICDIM)

December 06-08, 2006
Christ College, Bangalore, India

Semantic Web Technologies for Collaborative Knowledge Acquisition
(Tutorial to be conducted during ICDIM 2006)

Vasant Honavar and Doina Caragea
Artificial Intelligence Research Laboratory
Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Iowa State University
Ames, Iowa 50011, USA
honavar@cs.iastate.edu , dcaragea@cs.iastate.edu
www.cild.iastate.edu

TUTORIAL DESCRIPTION

The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. This has created unprecedented opportunities for data-driven knowledge acquisition and decision-making in a number of emerging, increasingly data-rich application domains such as bioinformatics, environmental informatics, medical informatics, enterprise informatics, and security informatics. However, the massive size, semantic heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles to acquiring useful knowledge from the available data.

Against this background, there is an urgent need for software systems for collaborative knowledge acquisition from autonomous, semantically heterogeneous, distributed information sources.

This tutorial will:
(a) Introduce some of the specific challenges in the design of software systems for collaborative knowledge acquisition from autonomous, semantically heterogeneous, distributed information sources.

(b) Present a general framework, based on sufficient statistics, for learning from such sources.

(c) Describe how this framework can be used to transform standard learning algorithms into algorithms for knowledge acquisition from distributed data, and show that the resulting algorithms offer rigorous performance guarantees relative to their centralized, single-agent, or batch counterparts, which assume centralized access to the entire data set.

(d) Introduce ontology-extended data sources (OEDS) to facilitate collaborative analysis of semantically heterogeneous information sources. OEDS make explicit the structure (schema) and semantics (content) of the data sources, as well as the query answering capabilities of these sources.

(e) Introduce a framework for specifying semantic correspondences that reconcile the semantic differences between a user view and the individual information sources in some important special cases (e.g., partial order ontologies) that are commonly encountered in practice.

(f) Describe how the sufficient-statistics-based framework for learning from distributed data can be extended to yield theoretically well-founded algorithms for learning from semantically heterogeneous, autonomous information sources.

(g) Point out some statistical problems that arise when learning from data in this setting, e.g., problems caused by differences in the levels of abstraction used by autonomous information sources to describe the objects of interest.

(h) Conclude with some open problems and promising avenues for further research.
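To make items (b), (c), (e), and (f) more concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the instructors' implementation. It assumes a naive Bayes classifier, chosen because its sufficient statistics are simple class and attribute-value counts. It also assumes a hypothetical isa hierarchy `ISA` standing in for a partial-order user ontology, and a hypothetical user level of abstraction `USER_LEVEL`. The helper names `lift`, `sufficient_statistics`, `merge`, and `predict` are likewise illustrative. Each source maps its own values into the user view and reports only counts. Adding those counts yields exactly the statistics a centralized learner would compute from the pooled data, so the distributed and centralized classifiers coincide.

```python
from collections import Counter

# HYPOTHETICAL user-view ontology for one attribute: an isa hierarchy
# (a partial order) mapping each child value to its parent value.
ISA = {
    "sedan": "car",
    "coupe": "car",
    "pickup": "truck",
    "car": "vehicle",
    "truck": "vehicle",
}

# HYPOTHETICAL level of abstraction at which the user wants to learn.
USER_LEVEL = {"car", "truck"}


def lift(value, level=USER_LEVEL):
    """Map a source value up the hierarchy until it reaches the user level.

    Values coarser than the user level (e.g. "vehicle") or absent from the
    hierarchy raise KeyError, because they cannot be mapped downward.
    """
    while value not in level:
        value = ISA[value]
    return value


def sufficient_statistics(rows):
    """Counts computed locally at one source; rows are (vehicle, color, label)."""
    stats = Counter()
    for vehicle, color, label in rows:
        stats[("class", label)] += 1
        stats[("vehicle", lift(vehicle), label)] += 1
        stats[("color", color, label)] += 1
    return stats


def merge(*per_source):
    """Add up per-source counts; only statistics, not raw rows, are shared."""
    total = Counter()
    for stats in per_source:
        total.update(stats)
    return total


def predict(stats, vehicle, color):
    """Naive Bayes with Laplace smoothing, computed from the counts alone."""
    labels = [key[1] for key in stats if key[0] == "class"]
    n = sum(stats[("class", y)] for y in labels)
    n_vehicle_values = len(USER_LEVEL)
    n_color_values = len({key[1] for key in stats if key[0] == "color"})
    v = lift(vehicle)

    def score(y):
        ny = stats[("class", y)]
        p = ny / n
        p *= (stats[("vehicle", v, y)] + 1) / (ny + n_vehicle_values)
        p *= (stats[("color", color, y)] + 1) / (ny + n_color_values)
        return p

    return max(labels, key=score)


# Two hypothetical autonomous sources describing vehicles at different
# levels of detail (source_b uses finer-grained values than the user view).
source_a = [("car", "red", "yes"), ("truck", "blue", "no"), ("car", "blue", "yes")]
source_b = [("sedan", "red", "yes"), ("pickup", "red", "no"), ("coupe", "blue", "no")]

distributed = merge(sufficient_statistics(source_a), sufficient_statistics(source_b))
centralized = sufficient_statistics(source_a + source_b)
assert distributed == centralized  # same statistics, hence the same classifier

print(predict(distributed, "sedan", "red"))    # yes
print(predict(distributed, "pickup", "blue"))  # no
```

Values coarser than the user level (here "vehicle") cannot be mapped down the hierarchy, so `lift` simply rejects them; deciding how to treat such partially specified data is the kind of statistical problem raised in item (g).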

TARGET AUDIENCE

The tutorial should be accessible to beginners, but should also be of interest to advanced graduate students, researchers, and practitioners who are unfamiliar with the specific topics to be covered.

TUTORIAL DURATION

The tutorial material will be organized into three modules of approximately one hour each, with short breaks between modules (a half-day tutorial).

INSTRUCTOR BIOGRAPHIES

Dr. Vasant Honavar received his Ph.D. in Computer Science from the University of Wisconsin, Madison, in 1990. He is currently a full professor of Computer Science at Iowa State University (ISU). Honavar directs the Center for Computational Intelligence, Learning, and Discovery (www.cild.iastate.edu), which he founded in 2005, and the Artificial Intelligence Research Laboratory at ISU, which he founded in 1990. Honavar's research and teaching interests include Artificial Intelligence, Machine Learning, Bioinformatics, Computational Molecular Biology, Collaborative Information Systems, Semantic Web, Environmental Informatics, Security Informatics, Social Informatics, Neural Computation, Systems Biology, Data Mining, Knowledge Discovery, and Visualization. Honavar has published over 150 research articles in refereed journals, conferences, and books, and has co-edited six books. Honavar is a co-chair of the 2006 AAAI Fall Symposium on Semantic Web for Collaborative Knowledge Acquisition.

Dr. Doina Caragea received her Ph.D. in Computer Science, specializing in artificial intelligence, from Iowa State University in 2004, where she worked with Professor Vasant Honavar. She is currently a postdoctoral research associate in the Iowa State University Center for Computational Intelligence, Learning, and Discovery. Her research interests include Artificial Intelligence, Machine Learning, Data Mining and Knowledge Discovery, Statistical Query Answering, Visual Data Mining, Ontologies, Information Integration, Semantic Web, Computational Biology and Bioinformatics, and Collaborative Information Systems. Dr. Caragea has published more than 12 refereed conference papers and journal articles on these topics. Caragea is a co-organizer of the 2006 AAAI Fall Symposium on Semantic Web for Collaborative Knowledge Acquisition.