Abstract
The Semantic Web is supposed to be an extension of the World Wide Web where
the meaning of data is available to and processable by computers.
The Semantic Web is supposed to be built up of a tower of languages, one
level building on and extending the previous.
However, there are several impediments to the realization of this vision.
First, meaning in the current Semantic Web languages (RDF and RDFS) is not
based on meaning in the World Wide Web, so there is a discontinuity between
the Semantic Web and the World Wide Web, preventing information from the
World Wide Web from
being used in the Semantic Web.
Second, the current view of RDF and RDFS
makes the construction of more-powerful layers on top of them difficult at
best and maybe even impossible.
I will discuss these problems, propose solutions for parts of
them, and lay out some potential solutions for other parts.
This talk was given as part of a
SIKS (Dutch Research School for Information and Knowledge Systems) /
Ontoweb (European IST Thematic Network) Master Class on
"Logical Foundation of the Semantic Web" in
Amsterdam, Netherlands, April 2002.
References
This talk is mostly based on two papers:
- Peter F. Patel-Schneider and Jerome Simeon.
"Building the Semantic Web on XML."
International Semantic Web Conference, June 2002.
- Peter F. Patel-Schneider and Dieter Fensel.
"Layering the Semantic Web: Problems and Directions".
International Semantic Web Conference, June 2002.
Semantic Web Vision
- Bring structure to web pages
- Permit software agents to carry out sophisticated tasks for users
- Extension of the current web
(Tim Berners-Lee, James Hendler, and Ora Lassila.
"The Semantic Web". Scientific American, May 2001.)
Semantic Web Tower
[Figure: Semantic Web Tower (from Tim Berners-Lee)]
Requirements for Semantic Web Languages
Form:
The language(s) used in the semantic web need well-defined syntax.
- Otherwise software agents cannot determine what constructs
they are using.
Meaning:
The language(s) used in the semantic web need well-defined semantics.
- Otherwise software agents cannot determine what the
constructs they are using mean.
Elements of the Semantic Web Tower
- "XML Schemas express shared vocabularies"
- XML Schemas "provide a means for defining the structure, content, and
semantics of XML documents"
- XML Schema provides
- a set of data types, and methods for defining new data
types
- methods for defining structures, consisting of data types
- methods for associating structures and data types with XML
constructs
- XML Schemas are written in XML,
but the meaning of XML Schemas is not related to the meaning
of XML.
- RDF is "a foundation for processing metadata"
- RDF is the language for the semantic web
- RDF had a syntax and a
data model
- RDF is now getting
a better syntax
and model theory
- RDF can be written in XML,
but the meaning of RDF is not related to the meaning of XML
- RDF Schema provides mechanisms for defining RDF vocabularies
- RDF Schema extends the type system of RDF to
- define classes and properties and inclusion relationships between
them
- define domains and ranges for properties
- RDF Schema is written in RDF,
and the meaning of RDF Schema is an extension of the meaning of RDF
- DAML+OIL provides a richer set of constructs for defining classes and
properties and their relationships
- DAML+OIL allows classes
- to have complete definitions
- to be enumerations
- to locally restrict properties in various ways
- DAML+OIL provides properties that are
- transitive
- functional
- inverse functional
- DAML+OIL uses RDF syntax, but
does not use the RDF model theory
The Current Vision of the Semantic Web Tower
- XML is the base syntactic language for the Semantic Web.
- All Semantic Web languages can be written in XML dialects.
- The meaning of Semantic Web languages is not necessarily
related to their XML meaning.
- RDF is the base language for the Semantic Web.
- All Semantic Web languages use RDF for their syntax.
- All Semantic Web languages build on RDF for their semantics.
Rationale for the Current Vision
- XML is supposed to be the mechanism for defining all Web languages.
- XML Schema, XML ..., RDF, RDF Schema, OIL, ....
- Different languages are in different documents.
- Base data is contained in XML documents.
- XML systems can parse all Web documents, but will not
understand them.
- RDF is supposed to be the mechanism for defining all Semantic Web
languages.
- RDF Schema, ....
- Different languages are in the same document.
- All information is contained in RDF documents.
- RDF systems should be able to parse and (partially) understand
all Semantic Web documents.
Problems with the Semantic Web Vision
- Disconnects at the Foundation
- The XML meaning is not used, so data written in XML cannot be
used in the Semantic Web.
- XML Schema is not used in the Semantic Web languages.
- An Inadequate Basis
- RDF is inadequate for providing either syntax or semantics for
the entire Semantic Web.
A Disconnect at the Foundation
- The Semantic Web needs a source of data.
- Where will this come from?
- HTML?
- XML and XML Schema?
- RDF?
A Disconnect at the Foundation (po.xml extracts)
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
</shipTo>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
</item>
...
</items>
</purchaseOrder>
A Disconnect at the Foundation (po.xsd extracts)
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
...
</xsd:schema>
RDF is not built on XML
- RDF cannot use most XML (and XML Schema) data
- XML data
- is totally ordered
- is tree-like
- has no distinction between objects and relationships
- cannot be given identity
- can be regulated and typed using XML Schemas
- RDF information
- is unordered
- is in the form of directed graphs
- has a distinction between objects and relationships
- can be given identity
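The contrast between ordered XML data and unordered RDF information can be illustrated with a small sketch. This is only an illustration: the two item fragments, the identifier "order1", and the encoding of triples as Python tuples are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML fragments that differ only in the order of children.
doc_a = ET.fromstring("<items><item>Lawnmower</item><item>Baby Monitor</item></items>")
doc_b = ET.fromstring("<items><item>Baby Monitor</item><item>Lawnmower</item></items>")

def child_texts(element):
    """Return the text of the child elements in document order."""
    return [child.text for child in element]

# As XML data the two documents differ, because order is part of the data.
xml_differs = child_texts(doc_a) != child_texts(doc_b)

# The same information as RDF-style triples (subject, predicate, object).
# A graph is a set of triples, so statement order carries no meaning.
graph_a = {("order1", "item", "Lawnmower"), ("order1", "item", "Baby Monitor")}
graph_b = {("order1", "item", "Baby Monitor"), ("order1", "item", "Lawnmower")}
rdf_same = graph_a == graph_b

print(xml_differs, rdf_same)
```

Moving XML data into RDF in this naive way loses the order, which is one reason the two cannot simply be stacked.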
A Digression into Model-Theoretic Semantics
Model-theoretic semantics is an excellent way of providing meaning.
- can be tailored for lots of languages
- generalizes data models
- based on
- interpretations - what the world could be like
- models of information - what interpretations are compatible
with the information
- inference is entailment - are all models of the premise also models of
the consequence?
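Entailment as described here can be sketched by brute force over a tiny universe. This is a deliberately simplified illustration, not the RDF model theory: it assumes a single property, a fixed finite set of resources, and facts that are just pairs in that property's extension. All function names are made up for the example.

```python
from itertools import chain, combinations

def all_interpretations(resources):
    """Every possible extension of one property over the given resources."""
    pairs = [(a, b) for a in resources for b in resources]
    subsets = chain.from_iterable(
        combinations(pairs, k) for k in range(len(pairs) + 1)
    )
    return [set(s) for s in subsets]

def models(facts, interpretations):
    """The interpretations that are compatible with the facts."""
    return [i for i in interpretations if facts <= i]

def entails(premise, consequence, interpretations):
    """True if all models of the premise are also models of the consequence."""
    return all(consequence <= i for i in models(premise, interpretations))

universe = all_interpretations(["a", "b"])

print(entails({("a", "b"), ("b", "a")}, {("a", "b")}, universe))
print(entails({("a", "b")}, {("b", "a")}, universe))
```

The first check succeeds because every interpretation containing both pairs contains the first one; the second fails because some interpretation contains ("a", "b") but not ("b", "a").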
Differences between Data Models and Interpretations
- Data models have a 1-1 correspondence with pieces of the syntax.
Interpretations can have extra information.
- There is one data model for a collection of information.
There are many interpretations for a collection of information.
- Data models are directly designed for implementation.
Interpretations are abstract notions.
An RDF interpretation is a triple:
- R, a set of resources
- IEXT, a mapping from resources to sets of pairs of resources or
pairs of resources and strings
- CEXT, a derived mapping from resources to sets of resources or strings
- j in CEXT(o) iff
<j,o> in IEXT(S(rdf:type))
- S, a mapping from URIs into resources
IEXT(p) is the set of pairs that forms the extension of the property p.
An RDF graph is a labelled graph with nodes labelled with either URIs or
strings and edges labelled with URIs.
An RDF interpretation is a model of an RDF graph
if there is a mapping N from the nodes of the graph to resources with
- for each node, n, with a URI label, u,
N(n) = S(u)
- for each node, n, with a string label, s,
N(n) = s
- for each triple J P O in the graph,
<N(J),N(O)> is in IEXT(N(P)).
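The model condition can be sketched as a small checker. This is a minimal illustration under stated assumptions: the interpretation is finite, edge labels are resolved through S, unlabelled nodes (such as an anonymous rdf:Description) may map to any resource, and the dictionary encoding and the name is_model are invented for this example.

```python
from itertools import product

def is_model(R, IEXT, S, nodes, triples):
    """Check whether the interpretation (R, IEXT, S) is a model of a graph.

    nodes   maps a node id to ("uri", u), ("string", s) or ("blank", None)
    triples is a list of (subject_node, predicate_uri, object_node)
    """
    fixed, blanks = {}, []
    for n, (kind, label) in nodes.items():
        if kind == "uri":
            if label not in S:
                return False
            fixed[n] = S[label]   # N(n) = S(u)
        elif kind == "string":
            fixed[n] = label      # N(n) = s
        else:
            blanks.append(n)      # unlabelled node: try every resource

    # Search for a mapping N that satisfies every triple.
    for choice in product(list(R), repeat=len(blanks)):
        N = {**fixed, **dict(zip(blanks, choice))}
        if all(p in S and (N[j], N[o]) in IEXT.get(S[p], set())
               for j, p, o in triples):
            return True
    return False

# A hypothetical interpretation and a two-triple graph.
R = {"page", "ora", "creator", "name"}
S = {"http://www.w3.org/Home/Lassila": "page", "Creator": "creator", "Name": "name"}
IEXT = {"creator": {("page", "ora")}, "name": {("ora", "Ora Lassila")}}
nodes = {
    "n1": ("uri", "http://www.w3.org/Home/Lassila"),
    "n2": ("blank", None),
    "n3": ("string", "Ora Lassila"),
}
triples = [("n1", "Creator", "n2"), ("n2", "Name", "n3")]

print(is_model(R, IEXT, S, nodes, triples))
```

Removing the pair from the extension of "name" leaves no mapping N that satisfies the second triple, so the interpretation is then no longer a model.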
Example RDF Interpretation
<rdf:RDF>
<rdf:Description about="http://www.w3.org/Home/Lassila">
<Creator>
<rdf:Description>
<Name>
Ora Lassila
</Name>
<Email>
lassila@w3.org
</Email>
</rdf:Description>
</Creator>
</rdf:Description>
</rdf:RDF>
Example RDF Interpretation
- Ovals are resources.
- Rectangles are strings.
- Oval labels are identifiers.
- Edge labels are identifiers.
- No order.
Another Example RDF Interpretation
XML Model-Theoretic Semantics (abstracted)
An XML interpretation is a node-labelled tree
- node labels are either URIs or strings
- there is a total order on the outgoing edges of each node
- nodes with URI labels are either element nodes or attribute nodes
XML Model-Theoretic Semantics (abstracted)
An XML interpretation is a model of an XML document if there is
- a mapping from the elements of the document to element nodes
- the root element maps to the root of the tree
- the name of the element is the name of the node
- a mapping from the attributes of the document to attribute nodes
- the name of the attribute is the name of the node
- a mapping from attribute values to nodes labelled with strings
- the node is a child of the mapping of the attribute
- the value of the attribute is the string
- a mapping from text to nodes labelled with strings
- the text is the string
- child elements and attributes are mapped to child nodes
- child nodes later in document order are greater in the tree
order
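The mapping from a document to an ordered, node-labelled tree can be sketched as follows. This is a rough illustration only: the tuple layout (kind, label, children) and the name to_tree are assumptions, namespaces are ignored, and text that follows a child element (mixed content) is not handled.

```python
import xml.etree.ElementTree as ET

def to_tree(element):
    """Turn an element into (kind, label, ordered list of children)."""
    children = []
    # Each attribute becomes an attribute node with one string-labelled child.
    for name, value in element.attrib.items():
        children.append(("attribute", name, [("string", value, [])]))
    # Text directly inside the element becomes a string-labelled node.
    if element.text and element.text.strip():
        children.append(("string", element.text.strip(), []))
    # Child elements keep document order; list position stands for tree order.
    for child in element:
        children.append(to_tree(child))
    return ("element", element.tag, children)

doc = ET.fromstring(
    '<shipTo country="US"><name>Alice Smith</name><city>Mill Valley</city></shipTo>'
)
tree = to_tree(doc)
print(tree)
```

Python lists are totally ordered, so the list of children directly represents the total order on the outgoing edges of each node.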
Example XML Interpretation
- Ovals are URI-labelled nodes.
- Rectangles are string-labelled nodes.
- Order is present.
A New Foundation for the Semantic Web
- Harmonize the XML data model and the RDF-ish information model.
- allow partial order
- allow non-tree
- do not distinguish between objects and relationships
- allow identifiers
- Incorporate XML Schema into the Semantic Web
- as a way of providing structure and typing for XML documents.
- Restrict the format of information in XML documents
- as a way of defining classes or types.
- Independent of any XML document
Integrated Model-Theoretic Semantics (abstracted)
An interpretation is a six-tuple,
- R, a set of resources
- E, a set of relationships
- EXT, a mapping from relationships to pairs of resources or
pairs of resources and strings
- CEXT, a mapping from resources to sets of resources
- O, provides a strict partial order on relationships
- S, a mapping from URIs to resources
XML (and RDF) documents are processed into document graphs that are like
XML document graphs with the addition of RDF identifiers.
Integrated Model-Theoretic Semantics (abstracted)
An interpretation is a model of a document graph
if there is a mapping N from the nodes of the graph to resources with
- for each element or attribute node, n, with identifier, r,
N(n) = S(r)
- for each element or attribute node, n, with name label, u,
N(n) in CEXT(S(u))
- for each text node, n, with label, s,
N(n) = s
- for each edge <n,m> in the graph, there is
e in E with
EXT(e) = <N(n),N(m)>
- if c and d are two children of
n then the relationships from above are in the same
O order as c and d are in
document order
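The ordering part of this definition can be sketched in isolation. The sketch assumes that O is given as a set of pairs (e1, e2) meaning "e1 comes before e2" and that the relationships chosen for the child edges of one node are listed in document order; both encodings and both function names are invented for this example.

```python
def is_strict_partial_order(O):
    """O is a set of pairs (e1, e2) meaning e1 comes before e2."""
    irreflexive = all(a != b for a, b in O)
    transitive = all((a, d) in O for a, b in O for c, d in O if b == c)
    return irreflexive and transitive

def respects_document_order(child_edges, O):
    """child_edges lists, in document order, the relationship used for each
    child edge of one node. Earlier edges must precede later ones in O."""
    return all(
        (child_edges[i], child_edges[j]) in O
        for i in range(len(child_edges))
        for j in range(i + 1, len(child_edges))
    )

# Hypothetical relationships e1, e2, e3 for three children of one node.
O = {("e1", "e2"), ("e2", "e3"), ("e1", "e3")}

print(is_strict_partial_order(O))
print(respects_document_order(["e1", "e2", "e3"], O))
print(respects_document_order(["e2", "e1"], O))
```

An empty O also passes the first check, which is why a partial order, rather than a total one, leaves room for information that carries no order at all.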
Example Interpretation
- Ovals are Resources
- Outside labels are identifiers.
- Inside labels show class membership.
- Rectangles are strings
- Order is present.
A New Foundation for the Semantic Web
An Inadequate Basis
- RDF syntax is triples, with little or no other organization.
- subject, predicate, object
- No possibility of variables, etc.
- All RDF triples are asserted facts.
- No possibility of disjunctions, etc.
<rdf:RDF ... >
<daml:Ontology rdf:about="">
<daml:versionInfo>$Id: ....>
<daml:imports rdf:resource=".../daml+oil"/>
</daml:Ontology>
<daml:Class rdf:ID="Senior">
<daml:intersectionOf rdf:parseType="daml:collection">
<daml:Class rdf:about="#Person"/>
<daml:Restriction>
<daml:onProperty rdf:resource="#age"/>
<daml:hasClass rdf:resource=".../daml+oil-ex-dt#over59"/>
</daml:Restriction>
</daml:intersectionOf>
</daml:Class>
<daml:Class rdf:ID="Height">
<daml:oneOf rdf:parseType="daml:collection">
<Height rdf:ID="short"/>
<Height rdf:ID="medium"/>
<Height rdf:ID="tall"/>
</daml:oneOf>
</daml:Class>
<daml:Class rdf:ID="TallThing">
<daml:sameClassAs>
<daml:Restriction>
<daml:onProperty rdf:resource="#hasHeight"/>
<daml:hasValue rdf:resource="#tall"/>
</daml:Restriction>
</daml:sameClassAs>
</daml:Class>
</rdf:RDF>
Syntax Problems
- Triples make syntax unnatural.
- Restrictions have to be split up.
- Collections have to be split up.
- DAML+OIL defines a special syntax extension for
collections.
- Triples allow for deviant syntax.
- Restrictions with too many pieces.
- Collections with missing or multiple components.
- Triples allow additions at any time.
- Can't forbid new triples attached to old syntax.
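The collection problems can be made concrete with a small sketch. It assumes the first/rest list vocabulary of DAML+OIL (daml:first, daml:rest, daml:nil); the node identifiers such as "_:l1" and the two helper functions are made up for this example.

```python
FIRST, REST, NIL = "daml:first", "daml:rest", "daml:nil"

def encode_collection(items):
    """Split a collection into first/rest triples, one list cell per item."""
    cells = [f"_:l{i}" for i in range(1, len(items) + 1)] + [NIL]
    triples = []
    for cell, nxt, item in zip(cells, cells[1:], items):
        triples.append((cell, FIRST, item))
        triples.append((cell, REST, nxt))
    return cells[0], triples

def is_well_formed(head, triples):
    """True if every cell reachable from head has exactly one first and one rest."""
    seen = set()
    cell = head
    while cell != NIL:
        if cell in seen:
            return False  # the rest links loop back on themselves
        seen.add(cell)
        firsts = [o for s, p, o in triples if s == cell and p == FIRST]
        rests = [o for s, p, o in triples if s == cell and p == REST]
        if len(firsts) != 1 or len(rests) != 1:
            return False
        cell = rests[0]
    return True

head, good = encode_collection(["#short", "#medium", "#tall"])
# A collection with a missing component.
missing = [t for t in good if t != ("_:l2", FIRST, "#medium")]
# A later triple attached to old syntax gives a cell two rests.
extra = good + [("_:l1", REST, "_:l3")]

print(is_well_formed(head, good), is_well_formed(head, missing), is_well_formed(head, extra))
```

Nothing in the triple syntax itself rules out the last two graphs, so any such check has to be imposed from outside.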
Semantic Problems
- Everything is triples; all triples have meaning.
- All aspects of syntax contribute to meaning.
- e.g., ordering in collections
- the intersection of Student and Employee is different
from the intersection of Employee and Student
- All syntax refers to something.
- e.g., descriptions live in the domain
- to infer membership in a description, the description
has to exist
Semantic Problems - A Theory of Classes
A Desirable Inference:
- Premises:
- John is an instance of Student.
John is an instance of Employee.
- Conclusion:
- John is an instance of the intersection of Student and Employee.
- In an RDF extension, this requires that the intersection automatically
exist whenever Student and Employee exist.
- Many more of these examples can be devised.
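The dependence on the intersection existing can be shown with a short sketch. It is an illustration under assumptions: facts are tuples, the node identifier "_:c1" is invented, and derive_types is a hypothetical helper rather than part of any RDF tool.

```python
def derive_types(facts, intersections):
    """Add (x, rdf:type, c) for each intersection class c that exists as a
    resource and whose operand classes all have x as an instance."""
    derived = set(facts)
    individuals = {s for s, p, o in facts if p == "rdf:type"}
    for c, operands in intersections.items():
        for x in individuals:
            if all((x, "rdf:type", op) in facts for op in operands):
                derived.add((x, "rdf:type", c))
    return derived

facts = {("John", "rdf:type", "Student"), ("John", "rdf:type", "Employee")}

# No resource stands for the intersection: there is nothing to point at.
without_class = derive_types(facts, {})

# The intersection exists as a resource: the conclusion can be stated.
with_class = derive_types(facts, {"_:c1": ["Student", "Employee"]})

print(without_class == facts)
print(("John", "rdf:type", "_:c1") in with_class)
```

When class extensions are treated as plain sets instead, the intersection needs no resource of its own, and membership follows directly from the two premises.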
Semantic Problems - A Theory of Classes
An Unfortunate Inference:
- Premise:
- (none)
- Conclusion 1:
- rdf:type is an instance of the restriction whose instances do not
have an rdf:type link to the restriction itself.
- Conclusion 2:
- rdf:type is not an instance of the restriction whose instances
do not have an rdf:type link to the restriction itself.
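Why both conclusions follow can be checked mechanically. The sketch reads the restriction R as "x is an instance of R if and only if x has no rdf:type link to R"; that biconditional reading is an assumption about the intended definition, and the function name is made up.

```python
def consistent(x_in_r):
    """Test one candidate truth value for 'x is an instance of R'.

    Having an rdf:type link to R is the same thing as being an instance of R,
    so the defining condition compares the value with its own negation.
    """
    has_type_link_to_r = x_in_r
    return x_in_r == (not has_type_link_to_r)

# Try both possibilities for any x, for example rdf:type itself.
satisfying = [value for value in (True, False) if consistent(value)]
print(satisfying)
```

Neither truth value satisfies the condition, so if the restriction is forced to exist in every interpretation, there are no interpretations left. This is a version of Russell's paradox.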
A Better Basis
- Treat RDF (or XML) as a language for providing facts.
- Use different syntaxes for the other Semantic Web languages.
- e.g., (part of) OIL for Ontologies
- Similar to the situation with XML and XML Schema.
A Possible Ontology Language - Example
<fowl:Ontology ...>
<DefinedClass ID="Woman">
<superClasses>
<class ID="Person" />
<class ID="Female" />
</superClasses>
</DefinedClass>
<DefinedClass ID="MarriedPerson">
<superClasses>
<class ID="Person" />
</superClasses>
<slot property="hasSpouse" required="true" singlevalued="true" />
</DefinedClass>
</fowl:Ontology>
DAML+OIL Version
<rdf:RDF ... >
<daml:Class rdf:ID="Woman">
<daml:sameClassAs>
<daml:intersectionOf rdf:parseType="daml:collection">
<daml:Class rdf:about="#Person"/>
<daml:Class rdf:about="#Female"/>
</daml:intersectionOf>
</daml:sameClassAs>
</daml:Class>
...
</rdf:RDF>
DAML+OIL Version
<daml:Class rdf:ID="MarriedPerson">
<daml:sameClassAs>
<daml:intersectionOf rdf:parseType="daml:collection">
<daml:Class rdf:about="#Person"/>
<daml:Restriction>
<daml:onProperty daml:minCardinality="1">
<rdf:Property rdf:about="#hasSpouse"/>
</daml:onProperty>
</daml:Restriction>
<daml:Restriction>
<daml:onProperty daml:maxCardinality="1">
<rdf:Property rdf:about="#hasSpouse"/>
</daml:onProperty>
</daml:Restriction>
</daml:intersectionOf>
</daml:sameClassAs>
</daml:Class>
A New Semantic Web Vision
- Not from me!
- Building more of the Semantic Web may expose more problems.
- However, the more flexible yet integrated the better!
- want information to flow between the levels
- a single syntax and a single semantics is just too restrictive