TAO Solutions for Knowledge Management
TAO project and Knowledge Management Solutions
The TAO project enables Knowledge Management providers to formalize their tacit knowledge of transitioning methodologies and to participate in the creation of tools that will help them in their day-to-day work on client projects.
The TAO components developed during the project can be used for most steps of the transitioning process, both to implement the transition and to enrich the final ontology-based solution with services.
At all steps of this transitioning process, the TAO Suite is used as an integrated environment from which the other tools can be used.
1. Ontology creation (from relational database model to ontology model)
TAO tools can partially ease the initial creation of the ontology in two ways:
- By automatically transitioning database schemas to a first version of the ontology: the RDBToOnto tool (developed in WP7 by Dassault) can automatically generate a draft of the ontology from database schemas. Since these schemas already contain some information about the domain, they are a good starting point. RDBToOnto can generate a simple ontology structure, with a list of classes and their datatype properties. Example: a legacy tourism application with a database describing hotels and campsites along with their most common attributes.
This can save the knowledge engineer precious time, allowing him or her to concentrate on value-added tasks such as defining ontology constraints or rules, or discussing modeling alternatives with domain experts.
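To make the idea of a schema-derived draft ontology concrete, here is a minimal sketch in Python. It is not RDBToOnto's algorithm or API: the `SQL_TO_XSD` table, the `DraftClass` type, the `ex` prefix, and the hotel/campsite schema are all assumptions made for illustration. It only shows the general shape of the output described above, one class per table and one datatype property per column.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from SQL column types to XSD datatypes (an assumption).
SQL_TO_XSD = {
    "INTEGER": "xsd:integer",
    "VARCHAR": "xsd:string",
    "DECIMAL": "xsd:decimal",
    "DATE": "xsd:date",
}


@dataclass
class DraftClass:
    name: str
    # property name -> XSD datatype
    datatype_properties: dict = field(default_factory=dict)


def draft_ontology(schema):
    """Turn {table: {column: sql_type}} into one draft class per table."""
    classes = []
    for table, columns in schema.items():
        cls = DraftClass(name=table.capitalize())
        for column, sql_type in columns.items():
            # Prefix with the table name so that same-named columns in
            # different tables do not collapse into a single property.
            prop = f"{table}_{column}"
            # Unknown SQL types fall back to plain strings.
            cls.datatype_properties[prop] = SQL_TO_XSD.get(sql_type.upper(), "xsd:string")
        classes.append(cls)
    return classes


def to_turtle(classes, prefix="ex"):
    """Serialize the draft classes as Turtle-like statements for human review."""
    lines = []
    for cls in classes:
        lines.append(f"{prefix}:{cls.name} a owl:Class .")
        for prop, xsd_type in cls.datatype_properties.items():
            lines.append(
                f"{prefix}:{prop} a owl:DatatypeProperty ; "
                f"rdfs:domain {prefix}:{cls.name} ; rdfs:range {xsd_type} ."
            )
    return "\n".join(lines)


# Illustrative legacy tourism schema (hypothetical).
schema = {
    "hotel": {"name": "VARCHAR", "stars": "INTEGER"},
    "campsite": {"name": "VARCHAR", "pitches": "INTEGER"},
}
draft = draft_ontology(schema)
print(to_turtle(draft))
```

A draft of this kind is only a starting point: constraints, rules, and relations between classes still have to be added by the knowledge engineer.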
2. Ontology population (from reference metadata, application data and content to business objects and descriptions)
Data migration from monolithic applications to semantic ones is always a challenge. Here, different tools can be used:
- GATE and the content augmentation tools (developed in WP3 by USFD and Mondeca) help analyze textual and multimedia content to detect entities and relations, thus uncovering “hidden” or “informal” knowledge. Example: analyzing a hotel description to find its relationships with other tourist items such as nearby attractions or monuments.
- RDBToOnto populates ontologies automatically from structured data contained in a database. Example: converting the content of a hotel database to a list of RDF instances.
- Knowledge repositories such as OWLIM and the Heterogeneous Knowledge Store (developed in WP4 by OntoText) provide large-scale, efficient storage for ontology instances, enabling the fast queries that make this entity repository usable. Example: storing all the hotel and campsite instances along with their relations.
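The database-to-instances step can be sketched as follows. This is an illustrative Python fragment, not the RDBToOnto implementation: the `BASE` namespace, the function names, and the sample rows are assumptions, and every value is emitted as a plain string literal to keep the sketch short.

```python
BASE = "http://example.org/tourism/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def rows_to_ntriples(class_name, key_column, rows):
    """Emit one typed instance per row and one literal triple per non-empty column."""
    triples = []
    for row in rows:
        subject = f"<{BASE}{class_name.lower()}/{row[key_column]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{class_name}> .")
        for column, value in row.items():
            # The key is already part of the subject URI; NULLs produce no triple.
            if column == key_column or value is None:
                continue
            triples.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return triples


# Illustrative rows, as they might come out of a legacy hotel table.
hotel_rows = [
    {"id": 1, "name": "Hotel du Lac", "stars": 3},
    {"id": 2, "name": 'Le "Grand" Hotel', "stars": None},
]
triples = rows_to_ntriples("Hotel", "id", hotel_rows)
for triple in triples:
    print(triple)
```

The resulting lines can be loaded into any RDF repository that accepts N-Triples; typed literals and links between instances are left out of this sketch.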
Once created and populated, the ontology model and instances are exchanged through web services, either with the outside world or within the application. Here, semantically annotated web services will provide easier and faster integration between different semantically enabled applications, or between different components of the same migrated application. Example: simplifying the communication between an end-user-focused e-commerce website and a hotel reservation back-office application.
The mapping between the inputs and outputs of multiple components will be described in terms of the ontology model, making synchronization between them easier and implementable without coding.
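One way to picture a mapping described in terms of the ontology model is the sketch below. It assumes that each service parameter is annotated with an ontology concept, and pairs parameters that share a concept. The `ServiceDescription` type, the service names, and the `ex:` concepts are hypothetical, and this is not the TAO Semantic Web Services machinery itself.

```python
from dataclasses import dataclass


@dataclass
class ServiceDescription:
    name: str
    inputs: dict   # parameter name -> ontology concept
    outputs: dict  # parameter name -> ontology concept


def map_outputs_to_inputs(producer, consumer):
    """Pair each consumer input with the producer output annotated by the same concept."""
    concept_to_output = {concept: param for param, concept in producer.outputs.items()}
    mapping, unmatched = {}, []
    for param, concept in consumer.inputs.items():
        if concept in concept_to_output:
            mapping[param] = concept_to_output[concept]
        else:
            # No producer output carries this concept; a human has to decide.
            unmatched.append(param)
    return mapping, unmatched


# Hypothetical e-commerce front end and reservation back office.
search = ServiceDescription(
    name="HotelSearch",
    inputs={"query": "ex:SearchQuery"},
    outputs={"hotelRef": "ex:Hotel", "arrival": "ex:CheckInDate"},
)
booking = ServiceDescription(
    name="Reservation",
    inputs={"hotel_id": "ex:Hotel", "check_in": "ex:CheckInDate", "guest": "ex:Customer"},
    outputs={"confirmation": "ex:Booking"},
)
mapping, unmatched = map_outputs_to_inputs(search, booking)
print(mapping)
print(unmatched)
```

Because the pairing is driven by the annotations alone, renaming a parameter in either service does not require changing the mapping logic.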
4. Annotated content (from ad-hoc enrichment solution to ontology-powered linguistic components)
Tools such as the TAO content augmentation tools can use the instances and the ontology model to annotate the documents of the transitioned system; the documents are annotated in a standardized way, using URI identifiers and RDF metadata. Example: semantically annotating all web pages related to the geographic area with which the tourism application is concerned.
The Heterogeneous Knowledge Store shall provide a repository for these annotated documents, so that they can be searched and browsed based on their metadata. Example: searching for all the web pages that are semantically annotated by a given instance in the knowledge base.
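The annotate-then-search pattern described above can be illustrated with a deliberately naive Python sketch. Simple label matching stands in for the linguistic processing a tool such as GATE would perform, and an in-memory dictionary stands in for the knowledge store; the instance URIs, labels, and page URIs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical knowledge base: instance URI -> labels that may appear in text.
INSTANCES = {
    "http://example.org/tourism/place/Annecy": ["Annecy"],
    "http://example.org/tourism/hotel/1": ["Hotel du Lac"],
}


def annotate(doc_uri, text):
    """Return (document URI, instance URI) pairs for every instance label found."""
    lowered = text.lower()
    return [
        (doc_uri, instance)
        for instance, labels in INSTANCES.items()
        if any(label.lower() in lowered for label in labels)
    ]


def build_index(documents):
    """Index documents by the instances that annotate them."""
    index = defaultdict(set)
    for doc_uri, text in documents.items():
        for doc, instance in annotate(doc_uri, text):
            index[instance].add(doc)
    return index


# Illustrative web pages of the transitioned system.
documents = {
    "http://example.org/pages/1": "Stay at the Hotel du Lac in Annecy.",
    "http://example.org/pages/2": "Annecy hosts a film festival every year.",
}
index = build_index(documents)

# Search: all pages annotated by a given instance of the knowledge base.
print(sorted(index["http://example.org/tourism/place/Annecy"]))
```

In a real deployment the (document, instance) pairs would be stored as RDF metadata rather than kept in memory.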
TAO Relevance for Publishing, Tourism and Knowledge-based Applications
TAO partners have developed knowledge management solutions for various industries in which the software components and methodology results emanating from the TAO project can be applied. The main markets where the TAO results will be used are Publishing / Media, e-Tourism, and Knowledge Management (particularly for industrial and research laboratory applications).
1. Publishing and Media
New solutions for Publishing and Media industries need to satisfy several requirements:
- Provide a unified terminology management system to aggregate heterogeneous local terminologies into a single ontology based repository;
- Serve terminologies to semantic content annotation tools;
- Serve terminologies to search engine tools;
- Provide a semantic knowledge representation repository to help publishers to offer higher value search services to their end users.
One example is the CA Manager component (developed in WP5 by Mondeca) from the TAO Suite, which will be used to provide clients with a new generation of content annotation module:
- A UIMA-based application facilitating integration with text mining and categorization tools such as GATE, LUXID from TEMIS, and other client-implemented text mining components;
- The capacity of CA Manager to connect to external applications for the annotation process and annotation enrichment: reasoning engine, external web services for geographical annotation (e.g. GeoNames);
- Full integration of GATE to provide clients with an open source and reliable solution as an alternative to commercial solutions.
The evangelization effort for this market will begin with existing clients such as Wolters Kluwer, Lexis Nexis, AFP, and Thomson, using vectors such as product demonstrations, meetings, prototype development, and international exhibitions such as Online / IMS (UK), the main international events for publishers and their IT providers.
2. Tourism
For the Tourism market, the TAO Suite results can be integrated within existing solutions to satisfy the following requirements:
- an SOA environment to integrate external services into e-tourism applications;
- a need for more automatic transitioning from legacy applications to semantic repository SOA environments.
One example is Mondeca integrating the results of TAO work on Semantic Web Services into the ITM software platform to enable easier and faster integration of high value-added services into an e-tourism solution. Tourism is the most advanced market for the use of web services such as geographical services (Google Maps, Mappy, Maporama...), translation services (Systran), and reservation services.
Transitioning
TAO work on semantic annotation (CA Manager) and transitioning provides valuable tools for facilitating the transitioning of e-tourism projects, since the majority of clients have a specific legacy application to be transitioned. Tourism information systems manage a wide diversity of objects (infrastructure, events, catering, cultural heritage, people, companies, places...) along with a rich description of each object. For the tourism market, Mondeca directly integrates TAO Suite components and results into its product offering. Only business results will be promoted, as tourism clients are not focused on the technology itself but mostly on short- and mid-term business results: more interoperability via web services, and easier transitioning.
3. Knowledge base applications for industry and research
TAO partners have developed semantic technologies for Knowledge Management applications for industrial and research clients: PSA (Peugeot), INRA (French Institute for Agronomical Research), and FPNR (French Natural Park Federation). For example, Mondeca also developed a Suite of Semantic Widgets enabling the efficient creation of Semantic Portals for easier access to content and knowledge.
Building rich domain ontology model and terminological resources
The TAO Suite can be used to help create both the ontology model and the terminological resources starting from existing unstructured content describing the organisation’s knowledge, since the majority of the existing knowledge will have been captured in documents with little or no semantic organization. TAO Suite components can help to get an overall view of the domain by extracting terms and concepts from the legacy content. This material will then be used to formalize the ontology model and build the terminological resources used for semantic content annotation.
Content annotation
The TAO Suite CA Manager will be used in a human-assisted process to annotate or enrich legacy content based on the domain ontology in order to populate the knowledge base application. A similar process will be implemented to assist users when publishing new content in the knowledge base, helping them in the content annotation process by suggesting terms.
Which applications are more likely to be transitioned to an ontology-based solution
Some legacy applications are more eligible than others to be migrated to ontology-based software:
- Applications with a complex information structure
- Applications describing enterprise resources
- Applications managing referentials
- Applications with constant evolutions of the data model
- Applications organizing and linking content and knowledge
- Applications which federate and organize heterogeneous resources
Examples of legacy applications that are candidates for transitioning to ontologies:
- Management of corporate referentials, terminologies, and data dictionaries shared by multiple applications
- Management of heterogeneous content and knowledge resources (CRM, information system architecture, business intelligence, organization of projects and resources, inventory of skills and knowledge, databases for call centers and client services…)
- Management of complex and rich catalogues of heterogeneous products or services (product catalogues, training catalogues, e-learning catalogues, library catalogues…)
The common characteristic of those applications is that they first describe, organize and give access to resource descriptions and multimedia content.
General guidelines or “best practices for efficient transition”
The following points are what TAO partners consider to be “best practices” in a migration process from a legacy system to an ontology-based application:
1. Design ontologies as engineering systems
Ontologies are knowledge representation artifacts built to be integrated in, and generally control, an information system, and have to be considered as the result of an engineering activity. As such, they must follow the general guidelines of any engineering activity, namely specification of need and requirements, iterative evaluation of the relevancy of a solution against requirements, integration with other components and general system architecture, etc. In short, quoting Tom Gruber, “It Is What It Does”.
Typical requirements on what ontologies can "do" include:
- Migrate data from legacy sources
- Control integrity of data against a set of constraints
- Support sophisticated user queries
- Federate data as description of business objects
- Federate applications through common definitions of business objects
2. Build ontologies as part of a transition process
The engineering tasks leading to the building and integration of ontologies can most of the time be considered as part of a transition process from a legacy system (document corpus, data, data schemas, vocabulary...) to a target system which will be ontology driven. The ontology is the backbone of the target system, and part of it has to be extracted, inferred, or otherwise migrated from the legacy system(s). In general, no explicit ontology is defined in the legacy system, but some implicit or latent ontology is present in terms of data structures, and will be identified by a careful audit of the legacy to migrate: existing databases (schema if available, content, available export formats), document corpus to index, classify, or mine, terminology, controlled vocabulary, entity lists.
Another part of the ontology will be defined from the target system requirements, but there is of course no way to extract this part from the legacy.
3. Know the target system technical constraints
The ontology will be implemented in a software environment whose technical characteristics have to be known before the ontology is developed. The technical architecture and meta-model used in the target system (e.g., Mondeca ITM meta-model) will put specific technical constraints on the ontology ‘species’ and allowed constructs. If reasoning facilities are expected in the target system, the ontology constructs will have to be limited to those supported by the inference tools.
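A simple way to enforce such a limit during development is to check the draft ontology against the list of constructs the target environment accepts. The sketch below is an assumption-laden illustration: the `SUPPORTED_CONSTRUCTS` set does not describe any particular inference tool or the ITM meta-model, and axioms are represented as plain (subject, construct, object) tuples.

```python
# Hypothetical profile: constructs the target inference tool is assumed to support.
SUPPORTED_CONSTRUCTS = {
    "rdfs:subClassOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:inverseOf",
}


def unsupported_axioms(axioms, supported=SUPPORTED_CONSTRUCTS):
    """Return the axioms whose construct falls outside the supported profile."""
    return [axiom for axiom in axioms if axiom[1] not in supported]


# Illustrative draft ontology as (subject, construct, object) tuples.
draft_axioms = [
    ("ex:Hotel", "rdfs:subClassOf", "ex:Accommodation"),
    ("ex:locatedIn", "rdfs:range", "ex:Place"),
    ("ex:Hotel", "owl:disjointWith", "ex:Campsite"),
]
rejected = unsupported_axioms(draft_axioms)
for subject, construct, obj in rejected:
    print(f"{construct} is not supported by the target system: {subject} -> {obj}")
```

Running such a check at each iteration keeps modeling choices aligned with what the target system can actually exploit.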
4. Specify the target system functional requirements
The added value of the target system over the legacy system is generally defined as a set of extra functional requirements. Those requirements have to be specified clearly, and the ability of the ontology to meet them has to be evaluated from both a qualitative and a quantitative (performance) viewpoint. In particular, the type and number of queries likely to be performed against knowledge bases, and the user interfaces expected in read and publication mode, are to be assessed whenever a modeling choice is open. In any case, the “good” modeling decision should not be the one which “best represents” the “domain reality”, but the one which meets the functional requirements with the best performance.
5. Put the business objects at the core of the ontology
The ontology backbone taxonomy is built around core business object types. Those types are generally the ones known by all system users, from business experts to end users. They can be identified by several methods, such as looking for the objects most frequently queried, those which appear as primary keys in databases, as main taxonomy categories, as terms in controlled vocabularies, etc. Those objects will define the “core classes”, along with their main attributes. The core classes are not necessarily the “upper classes” of the ontology, but the ones most likely to be instantiated and most often queried in the target system. Those core classes and attributes are generally few, since they represent the business core. Whatever the way to extract them, their very definition is almost always the opportunity for domain experts and system users to go through a “conceptual audit” of their core business objects, going through clarification of semantics and business logic, and disambiguation of terminology. In this task the role of the knowledge engineer is critical. He or she must push towards and facilitate the conceptual audit, but remain agnostic about its results. The knowledge engineer is not the domain expert. Stabilization of this core ontology, and its assessment against data migration and functional requirements, should be the objective of a prototype system.
As necessary, core classes will be further extended by more generic (abstract) classes, generally in order to federate attributes. The generic classes defined this way are technical, and do not necessarily appear in the end user experience.
6. Manage both business terminology and logic, but don’t confuse them
The business logic which is formalized in the ontology is considered as distinct from, but not independent of, the terminology used to represent the concepts. The distinction is not always easy to grasp for knowledge experts, for whom the logic is strongly embedded in terminology. One task of the knowledge engineer, particularly at the beginning of the process, is to help make this distinction clear: to show how the same business logic can be represented by different users using different terms or, the other way round, to discover ambiguous terms hiding several distinct pieces of business logic.
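Keeping terms and concepts in separate structures makes both situations easy to detect. The following Python sketch assumes a list of (term, concept) pairs collected during the conceptual audit; the terms and `ex:` concepts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical audit result: which term was used for which concept.
TERM_USAGE = [
    ("hotel", "ex:Hotel"),
    ("inn", "ex:Hotel"),
    ("pitch", "ex:CampsitePitch"),
    ("pitch", "ex:SalesPresentation"),
]


def synonyms_and_ambiguities(term_usage):
    """Separate concepts named by several terms from terms naming several concepts."""
    terms_by_concept = defaultdict(set)
    concepts_by_term = defaultdict(set)
    for term, concept in term_usage:
        terms_by_concept[concept].add(term)
        concepts_by_term[term].add(concept)
    # Same business logic, different terms.
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    # Same term, several distinct pieces of business logic.
    ambiguous = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, ambiguous


synonyms, ambiguous = synonyms_and_ambiguities(TERM_USAGE)
print("Synonyms:", synonyms)
print("Ambiguous terms:", ambiguous)
```

Both outputs are inputs for discussion with the domain experts, not decisions: the knowledge engineer surfaces the cases and the experts resolve them.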
7. Consider data throughout the transition process
Iteratively, and as often as possible, the capacity of the ontology to represent the data has to be assessed on samples of legacy data. This task has to be conducted on real data, using the workflow that will be used in production. It has to take into account the data sources and their original format, their extraction and transformation into the most suitable ad hoc format (tabulated text, CSV, XML, RDF…), their import into the target system with integrity checking, and a test suite of queries against the resulting knowledge base.
Data migration is often a bottleneck in the process. Legacy data are rarely what their administrators think they are, or would like them to be. Assessing the integrity of data samples against the ontology and finding inconsistencies can lead either to refining the ontology or correcting bugs in it, or to helping legacy administrators clean or improve their data in order to make them fit for migration. In most projects, both adjustments are needed.
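The integrity check on data samples can be sketched as follows. The constraint format, the `Hotel` class definition, and the sample records are assumptions made for illustration; a real project would derive the constraints from the ontology and run the check inside the production import workflow.

```python
# Hypothetical constraints derived from the ontology: property -> expected type, required flag.
CONSTRAINTS = {
    "Hotel": {
        "name": {"type": str, "required": True},
        "stars": {"type": int, "required": False},
    }
}


def check_record(class_name, record, constraints=CONSTRAINTS):
    """Return a list of human-readable problems found in one legacy record."""
    problems = []
    spec = constraints[class_name]
    for prop, rule in spec.items():
        value = record.get(prop)
        if value is None:
            if rule["required"]:
                problems.append(f"missing required property '{prop}'")
        elif not isinstance(value, rule["type"]):
            problems.append(
                f"'{prop}' should be {rule['type'].__name__}, got {type(value).__name__}"
            )
    # Properties unknown to the ontology may reveal a gap in the model, not only bad data.
    for prop in record:
        if prop not in spec:
            problems.append(f"property '{prop}' is not defined for {class_name}")
    return problems


# Illustrative sample of legacy data.
sample = [
    {"name": "Hotel du Lac", "stars": 3},
    {"stars": "three", "pool": True},
]
report = [check_record("Hotel", record) for record in sample]
for record, problems in zip(sample, report):
    print(record, "->", problems or "OK")
```

Each reported problem points to one of the two adjustments mentioned above: either the ontology needs a correction (for example, a missing property), or the legacy data needs cleaning.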
Related Publications
Amardeilh, F. "Semantic Annotation & Ontology Population". In Semantic Web Engineering in the Knowledge Society, Cardoso Jorge and Lytras Miltiadis D. (Eds), Idea Group Reference, 2008 (to appear).
Amardeilh, F. and Vatant, B. "Functional Wheels and Conceptual Brakes: Will your ontology take-off?". At TIA07, Sophia Antipolis, 08-09 October 2007.
Cerbah, F. and Vatant, B. "Building Highly Structured Semantic Repositories through Reuse and Formalisation of Business Standards". 1st European Semantic Technology Conference, Vienna, May 31 – June 1 2007.
Related Deliverables
D1.2 - SWS bootstrapping methodology
D2.2 - Software - Ontology Learning Software
D3.4 - User tools
D4.2 - Heterogeneous knowledge store
D5.2 - Architecture and integration requirements and specification
D6.2 - Case study 1: Domain ontology and semantic augmentation of legacy content
D7.2 - Case study 2: Domain ontology and semantic augmentation of legacy content