ESWC 2008 Tutorial Materials
Transitioning Legacy Applications to Ontologies: A Hands-on Tutorial
Tutorial summary
Semantic-based software engineering has recently attracted considerable attention from the Semantic Web community, as demonstrated by the annual SWESE workshop. Two particularly interesting questions are how to transition legacy software applications to ontologies and how to support software development teams with intelligent, semantic-based assistive tools. This tutorial will enable students and interested researchers to familiarise themselves with current approaches in this area and will offer practical, hands-on experience with a number of relevant tools covering tasks such as ontology learning from software artefacts, transitioning databases to ontologies, and semantic-based access to software artefacts for developer teams. The tutorial will cover some key open-source tools, enabling participants to easily put their newly learned skills into practice. It will also present evaluation results on real-world software applications and lessons learnt from applying semantic web tools to software engineering, and will discuss outstanding open issues and research challenges.
Tutorial content and relevant materials
The full set of tutorial slides is now available.
ESWC tutorial videos are now available here.
The core of the tutorial covers semantic tools for transitioning software applications to ontologies, including a number of concrete, evaluated experiments with their results and lessons learnt, centred around the following topics:
1. Transitioning web services and applications towards ontologies:
Presenter: Terry Payne, Southampton
Description: In this part of the tutorial, we will be outlining the need for a TAO methodology that supports the transitioning of legacy systems to Semantic Web Services, and illustrating how this methodology provides the framework for the rest of the TAO approach. We introduce the need for a methodology, and highlight some of the challenges in modeling the ontologies for both service annotation and support-document annotation, and then introduce the use of both formal models and procedural models. As the use of a Semantic Web Service Framework is crucial in the transitioning process, we also present two contrasting views of the most significant efforts to date: OWL-S and WSMO.
Materials: Tutorial slides, Video
2. Tools for learning domain ontologies from software artefacts (source code, web service definitions, SVN logs, discussion forums, manuals, etc.)
Presenters: Miha Grcar, Marko Grobelnik, JSI
Description: We will present the methodology for facilitating the acquisition of domain ontologies from software artifacts as envisioned in TAO WP2 (see http://www.tao-project.eu). We will discuss data sources that contain the required knowledge, as well as data mining techniques and tools for aiding the domain expert in building the ontology.
The task of ontology learning from pieces of software (such as Web services or software libraries) is essentially that of discovering concepts and relations in the source code, accompanying documentation, and external sources (such as the Web). We first discuss the potentially relevant data sources that typically accompany a set of reusable software components. We make a distinction between structured data sources (code snippets, source code, reference manuals…) and unstructured data sources (Web pages, forums, newsgroups…), joining the two kinds in a common framework by resorting to link analysis and text mining, respectively.
In the course of TAO, we are developing a software tool called OntoSight. OntoSight provides an insight into the data at hand and creates – through visualization and interaction with the user – a semantic space suitable for the task at hand. It can be pipelined with OntoGen (see http://ontogen.ijs.si) to support a semi-automatic ontology construction process. The development of OntoSight is still in progress; we will nevertheless present the first prototype.
Materials: Demo, Tutorial slides, Video
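To make the idea of discovering concepts in source code more concrete, here is a minimal Python sketch that splits identifiers into words and ranks them by frequency as candidate concept terms. It is only an illustration under simple assumptions, not the method used by OntoSight or OntoGen; the function names, the stopword set, and the sample code are all hypothetical.

```python
import re
from collections import Counter

# Language keywords to ignore; this small set is an assumption for the demo.
STOPWORDS = {"class", "def", "self", "pass"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronyms stay together; camelCase boundaries start a new word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def candidate_concepts(source, min_count=2):
    """Return (term, count) pairs for words that recur in identifiers."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    counts = Counter(
        word
        for identifier in identifiers
        for word in split_identifier(identifier)
        if word not in STOPWORDS and len(word) > 2
    )
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hypothetical code fragment standing in for a legacy code base.
SAMPLE = """
class DocumentParser:
    def parse_document(self, documentPath): pass
    def buildDocumentIndex(self, parser): pass
"""

if __name__ == "__main__":
    for term, count in candidate_concepts(SAMPLE):
        print(term, count)
```

A domain expert would still need to review such a ranked list, which is why the process described above is semi-automatic.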
3. Semantic annotation of software artefacts (code, manuals, forum postings, wikis, bug reports, etc.)
Presenter: Kalina Bontcheva, Sheffield
Description: Semantic annotation is a specific metadata generation task aiming to enable new information access methods. It enriches the text with semantic information, linked to a given ontology, thus enabling semantic-based search over the annotated content. In the case of legacy software applications, important parts are the software code and documentation. While there has been a significant body of research on semantic annotation of textual content (in the context of knowledge management applications), only limited attention has been paid to processing legacy software artefacts, and in general, to the problem of semantic-based software engineering. This is one of the key areas addressed here.
Materials: Demo, Tutorial slides, Video
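As a rough illustration of what linking text to an ontology can look like, the following Python sketch performs a simple dictionary (gazetteer) lookup and attaches a concept URI to each matched span. It is a toy under stated assumptions, not the annotation pipeline presented in the tutorial; the URIs, the gazetteer entries, and the sample sentence are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    text: str     # the matched surface string
    concept: str  # URI of the ontology concept it is linked to


# Hypothetical mapping from surface terms to ontology concept URIs.
GAZETTEER = {
    "parser": "http://example.org/software#Parser",
    "exception": "http://example.org/software#Exception",
    "null pointer exception": "http://example.org/software#NullPointerException",
    "bug report": "http://example.org/software#BugReport",
}


def annotate(text, gazetteer):
    """Link term mentions in text to concepts, preferring longer terms."""
    annotations = []
    taken = []  # spans already claimed by a longer term
    for term in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end in taken
            )
            if overlaps:
                continue
            taken.append((match.start(), match.end()))
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), gazetteer[term])
            )
    return sorted(annotations, key=lambda a: a.start)


TEXT = "The Parser throws a null pointer exception; see the bug report."

if __name__ == "__main__":
    for annotation in annotate(TEXT, GAZETTEER):
        print(annotation)
```

Once mentions carry concept URIs, a search for a concept can retrieve every artefact that mentions it, regardless of the exact wording used.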
4. Heterogeneous Knowledge repositories for storing legacy content: requirements, scalability, and applicability
Presenters: Zlatina Marinova, Atanas Kiryakov, Ontotext Lab, Sirma
Description: Storing heterogeneous knowledge in a unified way in a semantic repository will enable enterprises to address tasks such as intelligent enterprise search and business intelligence. A major requirement is that such a store is highly scalable and efficient. In this presentation we will discuss the challenges in implementing a heterogeneous store and will present our prototype implementation developed in TAO. We will also provide evaluation results and discussions.
Materials: Software, Tutorial slides, Video
5. Tools for transitioning databases to ontologies, industrial use case and evaluation in aircraft maintenance:
Presenter: Farid Cerbah, Dassault Aviation
Description: This part of the tutorial will introduce the issue of transitioning databases to ontologies from both theoretical and practical perspectives. Existing approaches and tools will be surveyed, and practical results from representative use cases will be presented and discussed.
Materials: Demo, Tutorial slides, Video
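One simple and commonly used starting point for moving a relational schema towards an ontology is to map each table to a class, each plain column to a datatype property, and each foreign key to an object property. The Python sketch below emits such a mapping as Turtle. It is an illustrative assumption and not necessarily the approach of the tools surveyed in this part; the schema, the `ex:` namespace, and the naming scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    sql_type: str
    references: Optional[str] = None  # name of the referenced table, if any


@dataclass
class Table:
    name: str
    columns: List[Column]


# Assumed mapping from a few SQL types to XML Schema datatypes.
XSD_TYPES = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}


def schema_to_turtle(tables):
    """Emit Turtle: table -> class, column -> property, foreign key -> link."""
    lines = [
        "@prefix ex: <http://example.org/schema#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for table in tables:
        lines.append(f"ex:{table.name} a owl:Class .")
        for column in table.columns:
            prop = f"ex:{table.name}_{column.name}"
            if column.references:
                # A foreign key relates two classes.
                kind, target = "owl:ObjectProperty", f"ex:{column.references}"
            else:
                # Unknown SQL types fall back to xsd:string.
                kind = "owl:DatatypeProperty"
                target = XSD_TYPES.get(column.sql_type.upper(), "xsd:string")
            lines.append(
                f"{prop} a {kind} ; rdfs:domain ex:{table.name} ; rdfs:range {target} ."
            )
    return "\n".join(lines)


# Hypothetical two-table schema.
SCHEMA = [
    Table("Aircraft", [Column("id", "INTEGER"), Column("model", "VARCHAR")]),
    Table("MaintenanceTask", [
        Column("id", "INTEGER"),
        Column("aircraft_id", "INTEGER", references="Aircraft"),
        Column("due", "DATE"),
    ]),
]

if __name__ == "__main__":
    print(schema_to_turtle(SCHEMA))
```

A direct mapping of this kind mirrors the database structure rather than the domain, so in practice its output is usually a draft to be refined by a domain expert.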
6. Supporting software developers with collaborative, user-friendly semantic search and intelligent access to software artefacts
Presenters: Danica Damljanovic, Valentin Tablan, Sheffield
Description: We will present Question-based Interface for Ontologies (QuestIO), a tool for querying ontologies using unconstrained language-based queries.
Many natural language interfaces (NLIs) to structured data have already been developed. However, those with reasonable performance tend to require expensive customisation for each new domain or ontology. Additionally, they often require strict adherence to a pre-defined syntax which, in turn, means that users still have to undergo training. QuestIO can be easily embedded in any system or used with any ontology or knowledge base without prior customisation. Additionally, it requires no user training. With QuestIO, casual users (e.g. domain experts) can query ontologies without needing to learn about ontology languages or query languages such as SeRQL or SPARQL. In this tutorial, we will give details of the design and implementation of such a system, which is based on GATE processing resources. In the context of the TAO project, we will demonstrate how we use this tool in the GATE case study, i.e. for querying the GATE domain ontology, which contains data about GATE software, documentation, developers and the like.
Materials: Demo, Tutorial slides, Video
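To give a feel for what a natural language interface spares the user from writing, the Python sketch below turns a plain question into a SPARQL query by matching words against a small lexicon of ontology labels. This is a deliberately naive illustration and does not reflect how QuestIO is implemented; the lexicon, the `ex:` namespace, and the function name are hypothetical.

```python
import re

# Hypothetical lexicon linking words to ontology classes and properties.
CLASS_LABELS = {
    "developer": "ex:Developer",
    "developers": "ex:Developer",
    "plugin": "ex:Plugin",
    "plugins": "ex:Plugin",
}
PROPERTY_LABELS = {
    "maintain": "ex:maintains",
    "maintains": "ex:maintains",
}


def question_to_sparql(question):
    """Build a SPARQL query from the ontology terms found in a question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    classes = [CLASS_LABELS[t] for t in tokens if t in CLASS_LABELS]
    properties = [PROPERTY_LABELS[t] for t in tokens if t in PROPERTY_LABELS]
    if not classes:
        raise ValueError("no ontology class recognised in the question")

    # The first class mentioned is treated as the thing being asked for.
    patterns = [f"?x a {classes[0]} ."]
    if len(classes) > 1 and properties:
        # A second class plus a property adds a relation between the two.
        patterns.append(f"?y a {classes[1]} .")
        patterns.append(f"?x {properties[0]} ?y .")

    body = "\n".join("  " + p for p in patterns)
    return (
        "PREFIX ex: <http://example.org/software#>\n"
        f"SELECT DISTINCT ?x WHERE {{\n{body}\n}}"
    )


if __name__ == "__main__":
    print(question_to_sparql("Which developers maintain plugins?"))
```

Even this toy shows why customisation is costly for many interfaces: the lexicon has to be rebuilt for every new ontology unless it can be derived from the ontology itself.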
7. Representing software models and database schemas in ontologies
Presenter: Jeff Pan, Aberdeen
Description: This section briefly introduces some design patterns and best practices for representing features of object-oriented models and database schemas in ontologies.
Materials: Tutorial slides, Video
Motivation
Enterprise application integration is becoming increasingly important, as companies tend to have many legacy software applications. These are typically large software systems that are vital to the organisation but resist modification and evolution to meet new and constantly changing business requirements. Ontologies and semantics can play a useful role here through developments such as automatic ontology bootstrapping from legacy software artefacts and semantic web services.
At the same time, distributed organisations and open-source software are pushing software engineering increasingly towards distributed collaborative software development, where ontologies and semantics can also play an important role in assisting developers in a number of ways, e.g., code reuse, software design, documentation.
While semantic technology has the potential to significantly benefit both software engineering and software integration practices, its adoption tends to be expensive in terms of both the experience required and the money invested. Recent developments have therefore aimed at lowering these costs by making the transition to ontologies fast and effective and by building a reusable transitioning process. The aim is to minimize consulting time during migration and integration, thereby minimizing costs, reducing integration overheads, and limiting risk.
This tutorial will provide an in-depth overview of these problems and present state-of-the-art semantic technology for transitioning legacy applications and software engineering towards ontologies and semantics. The hands-on exercises will center around open-source tools and exemplify their use on a representative set of software artefacts: relational databases, software code, user manuals, and discussion forum postings.
In addition to providing the audience with a thorough research background on these topics, the tutorial will conclude by presenting evaluation results and lessons learnt from applying semantic web tools to software engineering, which also give rise to a number of open questions and research challenges to be addressed.
Objectives of the tutorial
Provide researchers with practical and up-to-date knowledge of semantic web tools used for transitioning software applications to ontologies.
Stimulate discussions with researchers on what other semantic tools need to be developed and outstanding challenges.
Enable semantic web researchers to easily test their future research ideas on software engineering problems.
Benefits for the attendees
The main benefit of this tutorial will come from its practical orientation. In other words, it will not only introduce the attendees to semantic-based software engineering, but also provide practical guidance and examples of how to apply Semantic Web tools in this area. In addition, the chosen topics cover state-of-the-art research, thus providing attendees with the very latest developments in the area. The tools to be covered are all open source, which ensures that the tutorial provides skills which are easy to apply and do not require special software or licenses.
Target audience for the tutorial, including prerequisite knowledge
The target audience is researchers in the area of the Semantic Web. Some knowledge of ontologies and related notions will be beneficial. No previous knowledge of ontology learning or semantic annotation is required. The tutorial will be of particular interest to researchers who wish to test their tools and ideas in this challenging problem area.
A brief résumé of the presenters
The tutorial will be a collaborative effort between a number of researchers from several institutions:
University of Sheffield
Jožef Stefan Institute
Ontotext Lab, Sirma
Dassault Aviation
Southampton University
University of Aberdeen
Most of the results of this collaboration arose in the context of the EU-funded TAO project.
The main contact for the tutorial will be Dr Kalina Bontcheva (kalina@dcs.shef.ac.uk) from the University of Sheffield.
Dr Bontcheva has given a number of successful tutorials on semantic annotation and language processing, including two at ESWS'04 and ESWC'05, one at RANLP'03, and a week-long course for students in Barcelona in September 2007. Some relevant publications appear below; for a complete list, see here:
K. Bontcheva, M. Sabou. Learning Ontologies from Software Artifacts: Exploring and Combining Multiple Sources. In Proceedings of Workshop on Semantic Web Enabled Software Engineering (SWESE), November 2006.
K. Bontcheva, H. Cunningham, A. Kiryakov, V. Tablan. Semantic Annotation and Human Language Technology. In Semantic Web Technologies: Trends and Research in Ontology-based Systems. J. Davies, R. Studer, P. Warren (eds). John Wiley, 2006.
K. Bontcheva, J. Davies, A. Duke, T. Glover, N. Kings, I. Thurlow. Semantic Information Access. In Semantic Web Technologies: Trends and Research in Ontology-based Systems. J. Davies, R. Studer, P. Warren (eds). John Wiley, 2006.
H. Cunningham, K. Bontcheva. Knowledge Management and Human Language: Crossing the Chasm. Journal of Knowledge Management, 9(5), 2005.
The co-presenters from the other institutions and their details appear below:
Dr Payne (Southampton) is a Lecturer within the Intelligence, Agents, Multimedia Group at the University of Southampton. He holds a BSc in Computer Systems Engineering from the University of Kent at Canterbury, UK, and an MSc and PhD in Artificial Intelligence from the University of Aberdeen, Scotland. He is currently engaged in research on Semantic Web Services, Semantics for Service Discovery, Agent-Based service coordination and interoperation, and Semantic Web Portals. To date he has published over 50 papers and articles on this work, and in 2001 was the winner of the Semantic Web challenge at SWWS01.
The JSI team, consisting of Dr. Marko Grobelnik and Miha Grcar, is also very experienced in giving tutorials on machine learning and data mining, having done so at previous ESWC and ISWC conferences. Marko Grobelnik is an expert in the analysis of large amounts of complex data with the purpose of extracting useful knowledge. In particular, his areas of expertise comprise Data Mining, Text Mining, Information Extraction, Link Analysis, and Data Visualization, as well as more integrative areas such as Semantic Web, Knowledge Management and Artificial Intelligence. Miha Grcar is a researcher with a focus on machine learning and data mining. In the context of software engineering and legacy applications, he is "mining" legacy-application data in order first to discover knowledge that needs to be transitioned into the ontology, and then to facilitate the process of transitioning by providing semi-automatic means to the domain expert.
Dr Farid Cerbah is working at Dassault Aviation where he is conducting research projects in domains related to data mining and semantic technologies, with applications focused on exploitation of heterogeneous technical repositories. His research interests include NLP corpus-based methods and software infrastructures for terminology and ontology acquisition, Information Retrieval and categorisation methods for accessing and structuring specialised repositories, and ontology learning from structured data sources, such as relational databases.
Dr. Jeff Pan received his PhD degree in computer science from The University of Manchester, advised by Prof. Ian Horrocks on the topic of Description Logics reasoning support for the Semantic Web. He has been a Lecturer of Computing Science at the University of Aberdeen, UK, since 2005. His current research focuses primarily on the design of logics and ontology languages, automated reasoning, ontology reuse and usability, as well as the applications (such as in the Semantic Web, multimedia and software engineering) of all the above. Over the years, Dr. Pan has contributed to several high-profile projects, such as Advanced Knowledge Technologies (AKT), Wonder Web, Knowledge Web, and MOST - Marrying Ontology and Software Technology.
Resources
ESWC 2008 website
Presentation on Videolectures: http://videolectures.net/tao08_grcar_lwsdo/
Year 1 review presentation: OntoGen demo (recorded slides with voice-over)
Official OntoGen site: http://ontogen.ijs.si

