Abstract

The Semantic Web is supposed to be an extension of the World Wide Web where the meaning of data is available to and processable by computers. The Semantic Web is supposed to be built up of a tower of languages, one level building on and extending the previous. However, there are several impediments to the realization of this vision. First, meaning in the current Semantic Web languages (RDF and RDFS) is not based on meaning in the World Wide Web, so there is a discontinuity between the Semantic Web and the World Wide Web, preventing information from the World Wide Web from being used in the Semantic Web. Second, the current view of RDF and RDFS makes the construction of more-powerful layers on top of them difficult at best and maybe even impossible. I will discuss these problems, propose solutions for parts of them, and lay out some potential solutions for other parts.

This talk was given as part of a SIKS (Dutch Research School for Information and Knowledge Systems) / Ontoweb (European IST Thematic Network) Master Class on "Logical Foundation of the Semantic Web" in Amsterdam, Netherlands, April 2002.

References

This talk is mostly based on two papers:

Peter F. Patel-Schneider and Jerome Simeon. "Building the Semantic Web on XML". International Semantic Web Conference, June 2002.
Peter F. Patel-Schneider and Dieter Fensel. "Layering the Semantic Web: Problems and Directions". International Semantic Web Conference, June 2002.

Semantic Web Vision

Bring structure to web pages
Permit software agents to carry out sophisticated tasks for users
Extension of the current web

(Tim Berners-Lee, James Hendler, and Ora Lassila. "The Semantic Web". Scientific American, May 2001.)

Semantic Web Tower

Semantic Web Tower (from Tim Berners-Lee)

Requirements for Semantic Web Languages

Form: The language(s) used in the semantic web need well-defined syntax.

Otherwise software agents cannot determine what constructs they are using.

Meaning: The language(s) used in the semantic web need well-defined semantics.

Otherwise software agents cannot determine what the constructs they are using mean.

Elements of the Semantic Web Tower

XML - eXtensible Markup Language, including namespaces
XML Schema
RDF - Resource Description Framework
RDF Schema
DAML+OIL, an ontology language
other elements do not yet exist

XML

XML in 10 points

"XML is for structuring data"
XML (now) has a relatively firm form and meaning (XML 1.0 Recommendation)
XML has extra pieces, like XML Schema
"XML is the basis for RDF and the Semantic Web"

XML Schema

"XML Schemas express shared vocabularies"
XML Schemas "provide a means for defining the structure, content, and semantics of XML documents"
XML Schema provides
- a set of data types, and methods for defining new data types
- methods for defining structures, consisting of data types
- methods for associating structures and data types with XML constructs
XML Schemas are written in XML, but the meaning of XML Schemas is not related to the meaning of XML.

RDF

RDF is "a foundation for processing metadata"
RDF is the language for the semantic web
RDF had a syntax and a data model
RDF is now getting a better syntax and model theory
RDF can be written in XML, but the meaning of RDF is not related to the meaning of XML

RDF Schema

RDF Schema provides mechanisms for defining RDF vocabularies
RDF Schema extends the type system of RDF to
- define classes and properties and inclusion relationships between them
- define domains and ranges for properties
RDF Schema is written in RDF, and the meaning of RDF Schema is an extension of the meaning of RDF

DAML+OIL

DAML+OIL provides a richer set of constructs for defining classes and properties and their relationships
DAML+OIL allows classes
- to have complete definitions
- to be enumerations
- to locally restrict properties in various ways
DAML+OIL provides properties that are
- transitive
- functional
- inverse functional
DAML+OIL uses RDF syntax, but does not use the RDF model theory

The Current Vision of the Semantic Web Tower

XML is the base syntactic language for the Semantic Web.
- All Semantic Web languages can be written in XML dialects.
- The meaning of Semantic Web languages is not necessarily related to their XML meaning.
RDF is the base language for the Semantic Web.
- All Semantic Web languages use RDF for their syntax.
- All Semantic Web languages build on RDF for their semantics.

Rationale for the Current Vision

XML is supposed to be the mechanism for defining all Web languages.
- XML Schema, XML ..., RDF, RDF Schema, OIL, ....
- Different languages are in different documents.
- Base data is contained in XML documents.
- XML systems can parse all Web documents, but will not understand them.
RDF is supposed to be the mechanism for defining all Semantic Web languages.
- RDF Schema, ....
- Different languages are in the same document.
- All information is contained in RDF documents.
- RDF systems should be able to parse and (partially) understand all Semantic Web documents.

Problems with the Semantic Web Vision

Disconnects at the Foundation
- The XML meaning is not used, so data written in XML cannot be used in the Semantic Web.
- XML Schema is not used in the Semantic Web languages.
An Inadequate Basis
- RDF is inadequate for providing either syntax or semantics for the entire Semantic Web.

A Disconnect at the Foundation

The Semantic Web needs a source of data.
Where will this come from?
- HTML?
- XML and XML Schema?
- RDF?

A Disconnect at the Foundation (po.xml extracts)

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
    </shipTo>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
          </item>
	...
    </items>
</purchaseOrder>

A Disconnect at the Foundation (po.xsd extracts)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 ...
</xsd:schema>

RDF is not built on XML

RDF cannot use most XML (and XML Schema) data
XML data
- is totally ordered
- is tree-like
- has no distinction between objects and relationships
- cannot be given identity
- can be regulated and typed using XML Schemas
RDF information
- is unordered
- is in the form of directed graphs
- has a distinction between objects and relationships
- can be given identity
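
The ordering contrast above can be seen in a small sketch. It assumes Python's standard xml.etree.ElementTree for the XML side and models RDF information as a plain set of (subject, predicate, object) tuples. The ex: identifiers are made up for illustration and do not come from the purchase order example.

```python
import xml.etree.ElementTree as ET

# Two XML fragments that differ only in the order of their child elements.
doc_a = ET.fromstring(
    "<shipTo><name>Alice Smith</name><city>Mill Valley</city></shipTo>")
doc_b = ET.fromstring(
    "<shipTo><city>Mill Valley</city><name>Alice Smith</name></shipTo>")


def child_tags(element):
    """Return the names of the child elements in document order."""
    return [child.tag for child in element]


# XML data is ordered, so the two documents can be told apart.
xml_same = child_tags(doc_a) == child_tags(doc_b)

# RDF-style information as an unordered set of triples.
# ex:order1, ex:name and ex:city are hypothetical identifiers.
graph_a = {("ex:order1", "ex:name", "Alice Smith"),
           ("ex:order1", "ex:city", "Mill Valley")}
graph_b = {("ex:order1", "ex:city", "Mill Valley"),
           ("ex:order1", "ex:name", "Alice Smith")}

# Sets ignore the order in which the triples were written down.
rdf_same = graph_a == graph_b

print("XML documents equal:", xml_same)
print("RDF graphs equal:", rdf_same)
```

The XML comparison distinguishes the two fragments, while the two triple sets compare equal.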

A Digression into Model-Theoretic Semantics

Model-theoretic semantics is an excellent way of providing meaning.

can be tailored for lots of languages
generalizes data models
based on
- interpretations - what the world could be like
- models of information - which interpretations are compatible with the information
inference is entailment - are all models of the premises also models of the consequence?
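
The idea of entailment as "every model of the premises is a model of the consequence" can be sketched on a toy propositional language. This is an illustration only, not the RDF model theory: the symbols student and employee and the helper names are assumptions of the sketch, and interpretations are simply truth assignments.

```python
from itertools import product


def interpretations(symbols):
    """Yield every assignment of truth values to the given symbols."""
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))


def models(sentences, symbols):
    """Return the interpretations that satisfy every sentence."""
    return [i for i in interpretations(symbols)
            if all(sentence(i) for sentence in sentences)]


def entails(premises, conclusion, symbols):
    """Entailment: every model of the premises satisfies the conclusion."""
    return all(conclusion(i) for i in models(premises, symbols))


symbols = ["student", "employee"]
premises = [lambda i: i["student"], lambda i: i["employee"]]


def both(i):
    """The conclusion: student and employee both hold."""
    return i["student"] and i["employee"]


print(entails(premises, both, symbols))
print(entails([premises[0]], both, symbols))
```

With both premises there is a single model and it satisfies the conclusion. With only the first premise there is a model in which employee is false, so the conclusion is not entailed.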

Differences between Data Models and Interpretations

Data models have a 1-1 correspondence with pieces of the syntax.
Interpretations can have extra information.
There is one data model for a collection of information.
There are many interpretations for a collection of information.
Data models are directly designed for implementation.
Interpretations are abstract notions.

RDF Model-Theoretic Semantics (abstracted)

An RDF interpretation consists of:

R, a set of resources
IEXT, a mapping from resources to sets of pairs of resources or pairs of resources and strings
CEXT, a mapping from resources to sets of resources or strings

j in CEXT(o) iff <j,o> in IEXT(S(rdf:type))

S, a mapping from URIs into resources

IEXT(p) is the set of pairs that defines the extension of a property p.

An RDF graph is a labelled graph with nodes labelled with either URIs or strings and edges labelled with URIs.

RDF Model-Theoretic Semantics (abstracted)

An RDF interpretation is a model of an RDF graph if there is a mapping N from the nodes of the graph to resources with

for each node, n, with a URI label, u, N(n) = S(u)
for each node, n, with a string label, s, N(n) = s
for each triple J P O in the graph, <N(J),N(O)> is in IEXT(S(P)).
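
The model conditions above can be checked mechanically for small, finite interpretations. The following is a minimal sketch under stated assumptions: nodes are encoded as ("uri", u), ("str", s) or ("blank", name) tuples, edge labels are looked up through S, and unlabelled nodes are tried against every resource by brute force. The resource names (page, ora, and so on) are hypothetical.

```python
from itertools import product


def is_model(R, S, IEXT, triples):
    """Check whether the interpretation (R, S, IEXT) is a model of the graph.

    Each triple is (subject node, predicate URI, object node).
    """
    blanks = sorted({n[1] for s, _, o in triples
                     for n in (s, o) if n[0] == "blank"})
    # Try every assignment of resources to the unlabelled (blank) nodes.
    for choice in product(sorted(R), repeat=len(blanks)):
        assignment = dict(zip(blanks, choice))

        def N(node):
            kind, value = node
            if kind == "uri":
                return S[value]        # N(n) = S(u) for URI-labelled nodes
            if kind == "str":
                return value           # N(n) = s for string-labelled nodes
            return assignment[value]   # free choice for unlabelled nodes

        if all((N(s), N(o)) in IEXT.get(S[p], set()) for s, p, o in triples):
            return True
    return False


# A hand-built interpretation; every resource name here is made up.
R = {"page", "ora", "creator", "name", "email"}
S = {"http://www.w3.org/Home/Lassila": "page",
     "Creator": "creator", "Name": "name", "Email": "email"}
IEXT = {"creator": {("page", "ora")},
        "name": {("ora", "Ora Lassila")},
        "email": {("ora", "lassila@w3.org")}}

graph = [(("uri", "http://www.w3.org/Home/Lassila"), "Creator", ("blank", "b1")),
         (("blank", "b1"), "Name", ("str", "Ora Lassila")),
         (("blank", "b1"), "Email", ("str", "lassila@w3.org"))]

print(is_model(R, S, IEXT, graph))
```

The search over unlabelled nodes is exponential in their number, which is acceptable for a sketch but not for real graphs.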

Example RDF Interpretation

<rdf:RDF>
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <Creator>
      <rdf:Description>
        <Name>
          Ora Lassila
        </Name>
        <Email>
          lassila@w3.org
        </Email>
      </rdf:Description>
    </Creator>
  </rdf:Description>
</rdf:RDF>

Example RDF Interpretation

Ovals are resources.
Rectangles are strings.
Oval labels are identifiers.
Edge labels are identifiers.
No order.

Another Example RDF Interpretation

XML Model-Theoretic Semantics (abstracted)

An XML interpretation is a node-labelled tree

node labels are either URIs or strings
there is a total order on the outgoing edges of each node
nodes with URI labels are either element nodes or attribute nodes

XML Model-Theoretic Semantics (abstracted)

An XML interpretation is a model of an XML document if there is

a mapping from the elements of the document to element nodes

the root element maps to the root of the tree
the name of the element is the name of the node

a mapping from the attributes of the document to attribute nodes

the name of the attribute is the name of the node

a mapping from attribute values to nodes labelled with strings

the node is a child of the mapping of the attribute
the value of the attribute is the string

a mapping from text to nodes labelled with strings

the text is the string

child elements and attributes are mapped to child nodes

child nodes later in document order are greater in the tree order
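
The mapping from a document to an ordered, node-labelled tree can be sketched with Python's standard xml.etree.ElementTree. This is a simplification under stated assumptions: node labels are plain names rather than URIs (no namespaces), whitespace-only text is dropped, and attribute nodes are placed before element children, which is a choice of the sketch. List position stands in for the total order on outgoing edges.

```python
import xml.etree.ElementTree as ET


def to_tree(element):
    """Turn an element into a (label, kind, ordered children) tuple."""
    children = []
    # Each attribute becomes an attribute node with one string-labelled child.
    for name, value in element.attrib.items():
        children.append((name, "attribute", [(value, "string", [])]))
    # Text directly inside the element becomes a string-labelled node.
    if element.text and element.text.strip():
        children.append((element.text.strip(), "string", []))
    # Child elements are visited in document order.
    for child in element:
        children.append(to_tree(child))
        if child.tail and child.tail.strip():
            children.append((child.tail.strip(), "string", []))
    return (element.tag, "element", children)


doc = ET.fromstring(
    '<item partNum="872-AA"><productName>Lawnmower</productName>'
    '<quantity>1</quantity></item>')
tree = to_tree(doc)

label, kind, children = tree
print(label, kind)
for child in children:
    print(" ", child)
```

Children that come later in the document appear later in each list, which mirrors the requirement that document order agrees with the tree order.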

Example XML Interpretation

Ovals are URI-labelled nodes.
Rectangles are string-labelled nodes.
Order is present.

A New Foundation for the Semantic Web

Harmonize the XML data model and the RDF-ish information model.
- allow partial order
- allow non-tree
- do not distinguish between objects and relationships
- allow identifiers
Incorporate XML Schema into the Semantic Web
- as a way of providing structure and typing for XML documents.
  - Restrict the format of information in XML documents
- as a way of defining classes or types.
  - Independent of any XML document

Integrated Model-Theoretic Semantics (abstracted)

An interpretation is a six-tuple,

R, a set of resources
E, a set of relationships
EXT, a mapping from relationships to pairs of resources or pairs of resources and strings
CEXT, a mapping from resources to sets of resources
O, provides a strict partial order on relationships
S, a mapping from URIs to resources

XML (and RDF) documents are processed into document graphs that are like XML document graphs with the addition of RDF identifiers.
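
The requirement that O be a strict partial order can be checked directly on a finite relation. The following is a minimal sketch, assuming relationships are named by strings such as e1 and e2 (hypothetical names) and O is given as a set of (earlier, later) pairs.

```python
def is_strict_partial_order(pairs):
    """True if the relation is irreflexive and transitive (hence asymmetric)."""
    relation = set(pairs)
    irreflexive = all(a != b for a, b in relation)
    transitive = all((a, d) in relation
                     for a, b in relation
                     for c, d in relation if b == c)
    return irreflexive and transitive


def comparable(pairs, x, y):
    """True if x and y are ordered one way or the other."""
    relation = set(pairs)
    return (x, y) in relation or (y, x) in relation


# e1, e2, e3 come from children of one node, in document order.
# e4 comes from elsewhere and is left unordered with respect to them.
order = {("e1", "e2"), ("e2", "e3"), ("e1", "e3")}

print(is_strict_partial_order(order))
print(comparable(order, "e1", "e3"))
print(comparable(order, "e1", "e4"))
```

Leaving e4 incomparable is what makes the order partial: XML-derived relationships can be ordered while RDF-derived ones stay unordered.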

Integrated Model-Theoretic Semantics (abstracted)

An interpretation is a model of a document graph if there is a mapping N from the nodes of the graph to resources with

for each element or attribute node, n, with identifier, r, N(n) = S(r)
for each element or attribute node, n, with name label, u, N(n) in CEXT(S(u))
for each text node, n, with label, s, N(n) = s
for each edge <n,m> in the graph, there is an e in E with EXT(e) = <N(n),N(m)>
if c and d are two children of n, then the relationships from above are in the same O order as c and d are in document order

Example Interpretation

Ovals are Resources
- Outside labels are identifiers.
- Inside labels show class membership.
Rectangles are strings
Order is present.

A New Foundation for the Semantic Web

An Inadequate Basis

RDF syntax is triples, with little or no other organization.
- subject, predicate, object
- No possibility of variables, etc.
All RDF triples are asserted facts.
- No possibility of disjunctions, etc.

An Inadequate Basis (daml+oil-ex.daml extracts)

<rdf:RDF ... >

<daml:Ontology rdf:about="">
  <daml:versionInfo>$Id: ....>
  <daml:imports rdf:resource=".../daml+oil"/>
</daml:Ontology>

<daml:Class rdf:ID="Senior">
  <daml:intersectionOf rdf:parseType="daml:collection">
    <daml:Class rdf:about="#Person"/>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#age"/>
      <daml:hasClass rdf:resource=".../daml+oil-ex-dt#over59"/>
    </daml:Restriction>
  </daml:intersectionOf>
</daml:Class>

An Inadequate Basis (daml+oil-ex.daml extracts)

<daml:Class rdf:ID="Height">
  <daml:oneOf rdf:parseType="daml:collection">
    <Height rdf:ID="short"/>
    <Height rdf:ID="medium"/>
    <Height rdf:ID="tall"/>
  </daml:oneOf>
</daml:Class>
<daml:Class rdf:ID="TallThing">
  <daml:sameClassAs>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasHeight"/>
      <daml:hasValue rdf:resource="#tall"/>
    </daml:Restriction>
  </daml:sameClassAs>
</daml:Class>
</rdf:RDF>

Syntax Problems

Triples make syntax unnatural.
- Restrictions have to be split up.
- Collections have to be split up.
  - DAML+OIL defines a special syntax extension for collections.
Triples allow for deviant syntax.
- Restrictions with too many pieces.
- Collections with missing or multiple components.
Triples allow additions at any time.
- Can't forbid new triples attached to old syntax.

Semantic Problems

Everything is triples; all triples have meaning.
All aspects of syntax contribute to meaning.
- e.g., ordering in collections
- the intersection of Student and Employee is different from the intersection of Employee and Student
All syntax refers to something.
- e.g., descriptions live in the domain
- to infer membership in a description, the description has to exist

Semantic Problems - A Theory of Classes

A Desirable Inference:

Premises: John is an instance of Student.
John is an instance of Employee.
Conclusion: John is an instance of the intersection of Student and Employee.

In an RDF extension, this requires that the intersection automatically exists whenever Student and Employee exist.
Many more of these examples can be devised.
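
Why the intersection has to exist can be seen on a small, hand-built interpretation. The sketch below assumes an interpretation given by a set of resources R and a class extension mapping CEXT defined only for classes. The resource names (john, mary, sue, StudentEmployee) are hypothetical.

```python
def intersection_classes(R, CEXT, a, b):
    """Return the resources whose class extension is CEXT(a) & CEXT(b)."""
    target = CEXT.get(a, set()) & CEXT.get(b, set())
    return {r for r in R if r in CEXT and CEXT[r] == target}


# An interpretation satisfying both premises about john.
R = {"john", "mary", "sue", "Student", "Employee"}
CEXT = {"Student": {"john", "mary"},
        "Employee": {"john", "sue"}}

# No resource plays the role of the intersection, so there is nothing
# for john to be an instance of and the conclusion fails here.
print(intersection_classes(R, CEXT, "Student", "Employee"))

# The same interpretation extended with a resource for the intersection.
R_richer = R | {"StudentEmployee"}
CEXT_richer = dict(CEXT, StudentEmployee={"john"})
print(intersection_classes(R_richer, CEXT_richer, "Student", "Employee"))
```

The first interpretation is a model of the premises but not of the conclusion, so the inference only goes through if the semantics forces every such intersection to be present in R.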

Semantic Problems - A Theory of Classes

An Unfortunate Inference:

Premise: (none)
Conclusion 1: rdf:type is an instance of the restriction whose instances do not have an rdf:type link to the restriction itself.
Conclusion 2: rdf:type is not an instance of the restriction whose instances do not have an rdf:type link to the restriction itself.

A Better Basis

Treat RDF (or XML) as a language for providing facts.
Use different syntaxes for the other Semantic Web languages.
- e.g., (part of) OIL for Ontologies
Similar to the situation with XML and XML Schema.

A Possible Ontology Language

An Ontology Language for the Semantic Web

A Possible Ontology Language - Example

<fowl:Ontology ...>

<DefinedClass ID="Woman">
   <superClasses>
     <class ID="Person" />
     <class ID="Female" />
   </superClasses>
</DefinedClass>

<DefinedClass ID="MarriedPerson">
   <superClasses>
     <class ID="Person" />
   </superClasses>
   <slot property="hasSpouse" required="true" singlevalued="true" />
</DefinedClass>

</fowl:Ontology>

DAML+OIL Version

<rdf:RDF ... >

<daml:Class rdf:ID="Woman">
  <daml:sameClassAs>
    <daml:intersectionOf rdf:parseType="daml:collection">
      <daml:Class rdf:about="#Person"/>
      <daml:Class rdf:about="#Female"/>
    </daml:intersectionOf>
  </daml:sameClassAs>
</daml:Class>

...

</rdf:RDF>

DAML+OIL Version

<daml:Class rdf:ID="MarriedPerson">
  <daml:sameClassAs>
    <daml:intersectionOf rdf:parseType="daml:collection">
      <daml:Class rdf:about="#Person"/>
      <daml:Restriction>
	<daml:onProperty daml:minCardinality="1">
	  <rdf:Property rdf:about="#hasSpouse"/>
	</daml:onProperty>
      </daml:Restriction>
      <daml:Restriction>
	<daml:onProperty daml:maxCardinality="1">
	  <rdf:Property rdf:about="#hasSpouse"/>
	</daml:onProperty>
      </daml:Restriction>
    </daml:intersectionOf>
  </daml:sameClassAs>
</daml:Class>

A New Semantic Web Vision

Not from me!
Building more of the Semantic Web may expose more problems.
- trust, certainty, ...
However, the more flexible yet integrated the better!
- want information to flow between the levels
- a single syntax and a single semantics is just too restrictive

Building the Semantic Web Tower
