Building the Semantic Web on XML

by Peter F. Patel-Schneider and Jerome Simeon

Abstract

The semantic discontinuity between World Wide Web languages (e.g., XML, XML Schema, and XPath) and Semantic Web languages (e.g., RDF, RDFS, and DAML+OIL) forms a serious barrier to the stated goals of the Semantic Web. This discontinuity results from a difference in modeling foundations between XML and logics. We propose to eliminate it by creating a common semantic foundation for both the World Wide Web and the Semantic Web, taking ideas from both. The common foundation requires essentially no change to XML and only minor changes to RDF, yet it allows the Semantic Web to get closer to its goal of describing the semantics of the World Wide Web. Other Semantic Web languages (including RDFS and DAML+OIL) are considerably changed by this common foundation.

Semantic Web Vision

Bring structure to web pages
Permit software agents to carry out sophisticated tasks for users
Extension of the current web

(Tim Berners-Lee, James Hendler, and Ora Lassila. "The Semantic Web". Scientific American, May 2001.)

Requirements for Semantic Web Languages

Form: The languages used in the semantic web need well-defined syntax.

Otherwise software agents cannot determine what constructs they are using.

Meaning: The languages used in the semantic web need well-defined semantics.

Otherwise software agents cannot determine what the constructs they are using mean.

Semantic Web Tower

Semantic Web Tower (from Tim Berners-Lee)

Elements of the Semantic Web Tower

XML - eXtensible Markup Language, including namespaces
- "XML is the basis for RDF and the Semantic Web" (XML in 10 points)
XML Schema
- XML Schemas "provide a means for defining the structure, content, and semantics of XML documents"
RDF - Resource Description Framework
- RDF is "a foundation for processing metadata"
RDF Schema
- RDF Schema provides mechanisms for defining RDF vocabularies
DAML+OIL, an ontology language
- DAML+OIL uses RDF syntax, but does not use the RDF semantics
Other elements do not yet exist
- a new ontology language (OWL) is under development

The Current Vision of the Semantic Web Tower

XML is the base syntactic language for the Semantic Web.
- All Semantic Web languages can be written in XML dialects.
- The meaning of Semantic Web languages is not necessarily related to their XML meaning.
RDF is the base language for the Semantic Web.
- All Semantic Web languages use RDF for their syntax.
- All Semantic Web languages build on RDF for their semantics.

Rationale for the Current Vision

XML is supposed to be the mechanism for defining all Web languages.
- XML Schema, XML ..., RDF, RDF Schema, OIL, ...
- Different languages are in different documents.
- Base data is contained in XML documents.
- XML systems can parse all Web documents, but will not understand them.
RDF is supposed to be the mechanism for defining all Semantic Web languages.
- RDF Schema, ...
- Different languages are in the same document.
- All information is contained in RDF documents.
- RDF systems should be able to parse and (partially) understand all Semantic Web documents.

Problems with the Semantic Web Vision

Disconnects at the Foundation
- The XML meaning is not used, so data written in XML cannot be used in the Semantic Web.
- XML Schema is not used in the Semantic Web languages.
An Inadequate Basis
- RDF is inadequate for providing either syntax or semantics for the entire Semantic Web.

A Disconnect at the Foundation

The Semantic Web needs a source of data.
Where will this come from?
- HTML?
- XML and XML Schema?
- RDF?

A Disconnect at the Foundation (po.xml extracts)

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
    </shipTo>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
        </item>
        ...
    </items>
</purchaseOrder>

A Disconnect at the Foundation (po.xsd extracts)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 ...
</xsd:schema>

RDF is not built on XML

RDF cannot use most XML (and XML Schema) data
XML data
- is totally ordered
- is tree-like
- has no distinction between objects and relationships
- cannot be given identity
- can be regulated and typed using XML Schemas
RDF information
- is unordered
- is in the form of directed graphs
- has a distinction between objects and relationships
- can be given identity
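As a small illustration of the ordering contrast (our own sketch, not from the talk), the Python below uses the standard library's xml.etree.ElementTree. Two XML fragments that differ only in child order compare as different, while the same two facts written as a set of (subject, predicate, object) triples compare as equal. The triple encoding and the ex: identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two XML fragments with the same children in a different order.
a = ET.fromstring("<shipTo><name>Alice Smith</name><city>Mill Valley</city></shipTo>")
b = ET.fromstring("<shipTo><city>Mill Valley</city><name>Alice Smith</name></shipTo>")

def child_sequence(elem):
    """Ordered list of (tag, text) pairs for the element's children."""
    return [(child.tag, child.text) for child in elem]

# XML: document order is part of the data, so the two sequences differ.
xml_same = child_sequence(a) == child_sequence(b)

# RDF-style: a graph is a set of triples, so order is irrelevant.
# The ex: identifiers are hypothetical.
g1 = {("ex:addr1", "ex:name", "Alice Smith"), ("ex:addr1", "ex:city", "Mill Valley")}
g2 = {("ex:addr1", "ex:city", "Mill Valley"), ("ex:addr1", "ex:name", "Alice Smith")}
rdf_same = g1 == g2

print(xml_same, rdf_same)  # False True
```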

Providing Meaning for Semantic Web Languages

Model-theoretic semantics is an excellent way of providing meaning.

can be tailored to many languages
generalizes data models
based on
- interpretations - what the world could be like
- models of information - which interpretations are compatible with the information
inference is entailment - are all models of the premises also models of the conclusion?
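To pin down the definition of entailment, here is a minimal sketch in propositional logic rather than RDF, chosen only because its interpretations are easy to enumerate: an interpretation is a truth assignment, a model of a set of formulas is an assignment that makes all of them true, and the premises entail the conclusion when every model of the premises is also a model of the conclusion. Representing formulas as Python functions over an assignment is our assumption.

```python
from itertools import product

def interpretations(atoms):
    """All truth assignments over the given atoms (every 'possible world')."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def is_model(interp, formulas):
    """An interpretation is a model of a set of formulas if it satisfies all of them."""
    return all(f(interp) for f in formulas)

def entails(atoms, premises, conclusion):
    """Premises entail the conclusion iff every model of the premises is a model of it."""
    return all(conclusion(i) for i in interpretations(atoms) if is_model(i, premises))

atoms = ["p", "q"]
premises = [lambda i: i["p"], lambda i: (not i["p"]) or i["q"]]  # p, and p -> q
print(entails(atoms, premises, lambda i: i["q"]))              # True
print(entails(atoms, [lambda i: i["q"]], lambda i: i["p"]))    # False
```

The first call checks that p and p -> q entail q; the second shows that q alone does not entail p, because the world where q is true and p is false is a model of the premise but not of the conclusion.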

RDF Model-Theoretic Semantics (heavily abstracted)

An RDF interpretation is a node- and edge-labelled graph

labels are identifiers (not types)
node labels are either URIs or strings
- nodes with strings as labels have no outgoing edges
edge labels are URIs
there is no order in the graph
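The abstracted interpretation can be written down as a small data structure. In this sketch (our encoding, with hypothetical ex: URIs), a node label is ("uri", value), ("str", value), or None for an unlabelled node; edges are (source, edge URI, target) triples kept in a Python set, so no order is recorded; and well_formed checks that string-labelled nodes have no outgoing edges.

```python
from dataclasses import dataclass, field

@dataclass
class RDFInterpretation:
    # node id -> ("uri", value), ("str", value), or None for an unlabelled node
    labels: dict = field(default_factory=dict)
    # set of (source node, edge URI, target node); a set carries no order
    edges: set = field(default_factory=set)

    def well_formed(self):
        for src, _, tgt in self.edges:
            if src not in self.labels or tgt not in self.labels:
                return False
            label = self.labels[src]
            if label is not None and label[0] == "str":
                return False  # string-labelled nodes have no outgoing edges
        return True

interp = RDFInterpretation(
    labels={1: ("uri", "ex:order1"), 2: ("str", "1999-10-20")},
    edges={(1, "ex:orderDate", 2)},
)
print(interp.well_formed())  # True
interp.edges.add((2, "ex:bogus", 1))
print(interp.well_formed())  # False
```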

RDF Model-Theoretic Semantics (heavily abstracted)

An RDF interpretation is a model of an RDF document if there is

a mapping from the description elements in the document to URI-labelled or unlabelled nodes in the interpretation
- if the document element has an ID, then the node has that ID as a label
- if the document element has a name, then the node is linked via an rdf:type labelled edge to a node with that name as label
- if the document element is a string, then the node has that string as a label
a mapping from the property elements in the document to edges in the graph that relate the nodes for the two description elements and that have the name (URI) of the property element as label
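A brute-force sketch of this model check, for tiny inputs only. Node labels are encoded as ("uri", value), ("str", value), or None; a document is described by its description elements (each with an optional ID and type name, or a literal string) plus its property elements; and the edge label for a property element is assumed to be the element's name as a URI. The code searches for some mapping from document elements to nodes that satisfies every condition. The document encoding, the ex: names, and the helpers satisfies and is_model are hypothetical.

```python
from itertools import product

def satisfies(labels, edges, doc, m):
    """Check the conditions for one candidate mapping m (document element -> node)."""
    for d, info in doc["descriptions"].items():
        node = m[d]
        if "string" in info:
            # a string element maps to a node labelled with that string
            if labels[node] != ("str", info["string"]):
                return False
            continue
        # description elements map to URI-labelled or unlabelled nodes
        if labels[node] is not None and labels[node][0] != "uri":
            return False
        if info.get("id") is not None and labels[node] != ("uri", info["id"]):
            return False
        if info.get("type") is not None:
            # need an rdf:type edge to a node labelled with the type name
            if not any((node, "rdf:type", t) in edges and labels[t] == ("uri", info["type"])
                       for t in labels):
                return False
    # each property element needs an edge labelled with its name between the mapped nodes
    return all((m[s], p, m[o]) in edges for s, p, o in doc["properties"])

def is_model(labels, edges, doc):
    """A model exists if SOME mapping works; try them all (tiny inputs only)."""
    elems = list(doc["descriptions"])
    for choice in product(list(labels), repeat=len(elems)):
        if satisfies(labels, edges, doc, dict(zip(elems, choice))):
            return True
    return False

labels = {1: ("uri", "ex:order1"), 2: ("uri", "ex:PurchaseOrder"), 3: ("str", "1999-10-20")}
edges = {(1, "rdf:type", 2), (1, "ex:orderDate", 3)}
doc = {
    "descriptions": {
        "po": {"id": "ex:order1", "type": "ex:PurchaseOrder"},
        "date": {"string": "1999-10-20"},
    },
    "properties": [("po", "ex:orderDate", "date")],
}
print(is_model(labels, edges, doc))  # True
```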

XML Model-Theoretic Semantics (abstracted)

An XML interpretation is a node-labelled tree

node labels indicate typing information (not identification)
node labels are either QNames or strings
- nodes with QName labels are either element nodes or attribute nodes
- nodes with strings as labels have no outgoing edges
there is a total order on the outgoing edges of each node

XML Model-Theoretic Semantics (abstracted)

An XML interpretation is a model of an XML document if there is

a mapping from the elements of the document to element nodes
- the root element maps to the root of the tree
- the name of the element is the label of the node
a mapping from the attributes of the document to attribute nodes
- the name of the attribute is the label of the node
a mapping from attribute values to nodes labelled with strings
- the node is a child of the node for the attribute
- the value of the attribute is the string
a mapping from text to nodes labelled with strings
- the text is the string
child elements and attributes are mapped to child nodes
- child nodes later in document order are greater in the tree order
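The mapping above amounts to building an ordered, node-labelled tree from the document. This sketch uses the standard library's xml.etree.ElementTree as the parser and Python lists for the tree order. Simplifying assumptions (ours): namespaces are not expanded, whitespace-only text is dropped, and attribute nodes are placed before element children in the order ElementTree reports them, even though XML itself does not order attributes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str   # "element", "attribute", or "text"
    label: str  # element/attribute name, or the string value
    children: list = field(default_factory=list)  # list order = tree order

def to_tree(elem):
    """Map an ElementTree element to an ordered, node-labelled tree."""
    node = Node("element", elem.tag)
    for name, value in elem.attrib.items():
        # attribute node with its value as a single string-labelled child
        node.children.append(Node("attribute", name, [Node("text", value)]))
    if elem.text and elem.text.strip():
        node.children.append(Node("text", elem.text.strip()))
    for child in elem:
        node.children.append(to_tree(child))
        # text following a child element belongs to this (parent) element
        if child.tail and child.tail.strip():
            node.children.append(Node("text", child.tail.strip()))
    return node

doc = ET.fromstring('<item partNum="872-AA"><productName>Lawnmower</productName>'
                    '<quantity>1</quantity></item>')
tree = to_tree(doc)
print([c.label for c in tree.children])   # ['partNum', 'productName', 'quantity']
print(tree.children[0].children[0].label) # 872-AA
```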

Example RDF Interpretation

Ovals are resources.
Rectangles are strings.
Oval labels are identifiers.
Edge labels are identifiers.
No order.

Another Example RDF Interpretation

Ovals are resources.
Rectangles are strings.
Oval labels are identifiers.
Edge labels are identifiers.
No order.

Example XML Interpretation

Ovals are URI-labelled nodes.
Rectangles are string-labelled nodes.
Order is present.

A New Foundation for the Semantic Web

Harmonize the XML data model and the RDF-style information model, and build a model theory for the result
- have a partial order in interpretations
- allow non-tree graphs in interpretations
- put labels only on nodes
- do not distinguish between objects and relationships
- allow RDF-style identifiers
Incorporate XML Schema into the Semantic Web
- as a way of providing structure and typing for XML documents, restricting the format of information in XML documents
- as a way of defining classes or types, independent of any XML document

Integrated Model-Theoretic Semantics (abstracted)

An interpretation is a six-tuple <R, E, EXT, CEXT, O, S>:

R, a set of resources
E, a set of relationships
EXT, a mapping from relationships to pairs of resources or to pairs of a resource and a string
CEXT, a mapping from resources to sets of resources
O, a strict partial order on relationships
S, a mapping from URIs to resources
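One possible concrete rendering of the six-tuple, purely for illustration: resources are small integers, relationships are arbitrary hashable names, and O is a set of (e1, e2) pairs meaning e1 precedes e2. The well_formed method checks the constraints stated above, including that O is a strict partial order (irreflexive and transitive) on E; it also assumes EXT is defined on every relationship. The class and example names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    R: set      # resources (small integers in this sketch)
    E: set      # relationships
    EXT: dict   # relationship -> (resource, resource) or (resource, string)
    CEXT: dict  # resource -> set of resources (its class extension)
    O: set      # pairs (e1, e2) meaning relationship e1 precedes e2
    S: dict     # URI -> resource

    def well_formed(self):
        ext_ok = set(self.EXT) == self.E and all(
            s in self.R and (o in self.R or isinstance(o, str))
            for s, o in self.EXT.values()
        )
        cext_ok = all(c in self.R and members <= self.R for c, members in self.CEXT.items())
        s_ok = all(r in self.R for r in self.S.values())
        # O must be a strict partial order on E: irreflexive and transitive
        on_e = all(a in self.E and b in self.E for a, b in self.O)
        irreflexive = all(a != b for a, b in self.O)
        transitive = all((a, d) in self.O for a, b in self.O for c, d in self.O if b == c)
        return ext_ok and cext_ok and s_ok and on_e and irreflexive and transitive

interp = Interpretation(
    R={1, 2, 3},
    E={"e1", "e2"},
    EXT={"e1": (1, 2), "e2": (1, "Alice Smith")},
    CEXT={3: {1}},
    O={("e1", "e2")},
    S={"ex:order1": 1, "ex:PurchaseOrder": 3},
)
print(interp.well_formed())  # True
```

Putting both ("e1", "e2") and ("e2", "e1") into O makes well_formed return False, since transitivity would then require ("e1", "e1"), which irreflexivity forbids.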

XML (and RDF) documents are processed into document graphs that are like XML document graphs with the addition of RDF identifiers.

Integrated Model-Theoretic Semantics (abstracted)

An interpretation is a model of a document graph if there is a mapping N from the nodes of the graph to resources (or, for text nodes, strings) such that

for each element or attribute node n with identifier r, N(n) = S(r)
for each element or attribute node n with name label u, N(n) is in CEXT(S(u))
for each text node n with label s, N(n) = s
for each edge <n,m> in the graph, there is an e in E with EXT(e) = <N(n),N(m)>
if c and d are two children of n and c precedes d in document order, then the relationship for <n,c> precedes the relationship for <n,d> in O
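The conditions above can be checked by brute force on tiny examples. In this sketch the encodings are ours and hypothetical: the interpretation is a dict keyed by R, E, EXT, CEXT, O, S (resources as small integers), and the document graph lists nodes with a kind, a name already expanded to a URI-like string such as po:shipTo, an optional identifier, and a label for text nodes, plus an ordered child list per node. The search tries every assignment of resources to element and attribute nodes, maps text nodes to their strings, and then looks for witness relationships whose O order matches document order.

```python
from itertools import product

def node_ok(info, value, CEXT, S):
    """Identifier and class-membership conditions for one element/attribute node."""
    if info.get("id") is not None and value != S.get(info["id"]):
        return False                                           # N(n) = S(r)
    cls = S.get(info["name"])
    return cls is not None and value in CEXT.get(cls, set())   # N(n) in CEXT(S(u))

def order_ok(children, W, O):
    """Children earlier in document order need O-earlier witness relationships."""
    for p, kids in children.items():
        for i in range(len(kids)):
            for j in range(i + 1, len(kids)):
                if (W[(p, kids[i])], W[(p, kids[j])]) not in O:
                    return False
    return True

def is_model(interp, G):
    """Brute-force search for a mapping N and witness relationships (tiny inputs only)."""
    E, EXT, CEXT, O, S = (interp[k] for k in ("E", "EXT", "CEXT", "O", "S"))
    structural = [n for n, info in G["nodes"].items() if info["kind"] != "text"]
    edges = [(p, c) for p, kids in G["children"].items() for c in kids]
    for choice in product(sorted(interp["R"]), repeat=len(structural)):
        N = dict(zip(structural, choice))
        if not all(node_ok(G["nodes"][n], N[n], CEXT, S) for n in structural):
            continue
        for n, info in G["nodes"].items():
            if info["kind"] == "text":
                N[n] = info["label"]                           # N(n) = s
        # each edge <n,m> needs some e in E with EXT(e) = <N(n), N(m)>
        cands = [[e for e in E if EXT[e] == (N[p], N[c])] for p, c in edges]
        for witness in product(*cands):
            if order_ok(G["children"], dict(zip(edges, witness)), O):
                return True
    return False

G = {
    "nodes": {
        "a": {"kind": "element", "name": "po:shipTo"},
        "b": {"kind": "element", "name": "po:name"},
        "t1": {"kind": "text", "label": "Alice Smith"},
        "c": {"kind": "element", "name": "po:city"},
        "t2": {"kind": "text", "label": "Mill Valley"},
    },
    "children": {"a": ["b", "c"], "b": ["t1"], "c": ["t2"]},
}
interp = {
    "R": {1, 2, 3, 10, 11, 12},
    "E": {"e1", "e2", "e3", "e4"},
    "EXT": {"e1": (1, 2), "e2": (1, 3), "e3": (2, "Alice Smith"), "e4": (3, "Mill Valley")},
    "CEXT": {10: {1}, 11: {2}, 12: {3}},
    "O": {("e1", "e2")},
    "S": {"po:shipTo": 10, "po:name": 11, "po:city": 12},
}
print(is_model(interp, G))  # True
```

Changing O to {("e2", "e1")} makes the check fail, because the name child precedes the city child in document order.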

Example Interpretation

Ovals are Resources
- Outside labels are identifiers.
- Inside labels show class membership.
Rectangles are strings
Order is present.

A New Foundation for the Semantic Web

Status

Just a start for this idea so far
Lots of work still needed
- Integrate XML Schema
- Extend to ontology language
This is a change to the model theory for logic!

A New Semantic Web Vision

Not from me!
Building more of the Semantic Web may expose more problems.
- trust, certainty, ...
However, the more flexible yet integrated the better!
- want information to flow between the levels
- a single syntax and a single semantics are just too restrictive