<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
HaXml: Haskell and XML
</title>
</head>
<body bgcolor='#ffffff'>
<center>
<h1>HaXml</h1>
<table><tr><td width=200 align=center>
<a href="#what">What is HaXml?</a><br>
<a href="#how">How do I use it?</a><br>
<a href="#download">Downloads</a><br>
</td><td width=200 align=center>
<a href="#news">Recent news</a><br>
<a href="#who">Contacts</a><br>
<a href="#related">Related Work</a><br>
</td></tr></table>
</center>
<hr>
<p>
<font color="red">
<b>Warning!</b> The development versions (1.14 upwards) significantly
change the API of some modules! They may be incomplete, inconsistent,
and liable to change before the next release! Do not expect code
written against an earlier API to be compatible! DtdToHaskell has only
recently been fixed to work with the new APIs! <b>Warning!</b>
</font>
<a href="migrate.html">Notes for migrating code from the 1.13 version of
HaXml to the development version.</a>
<hr>
<center><h3><a name="what">What is HaXml?</a></h3></center>
<p>
<b>HaXml</b> is a collection of utilities for parsing, filtering,
transforming, and generating
<a href="http://www.w3.org/TR/REC-xml">XML</a> documents using
<a href="http://www.haskell.org">Haskell</a>. Its basic facilities
include:
<ul>
<li> a parser for XML,
<li> a separate error-correcting parser for HTML,
<li> a SAX-like stream parser for XML events,
<li> an XML validator,
<li> pretty-printers for XML and HTML.
</ul>
<p>
For processing XML documents, the following components are also provided:
<ul>
<li><em>Combinators</em> is a combinator library for generic XML document
processing, including transformation, editing, and generation.
<li><em>XmlContent</em> is a replacement class for Haskell's Show/Read
classes: it allows you to read and write ordinary Haskell data as XML
documents (and vice versa). The <em>DrIFT</em> tool (available from
<a href="http://repetae.net/~john/computer/haskell/DrIFT/">
<tt>http://repetae.net/~john/computer/haskell/DrIFT/</tt></a>)
can automatically derive this class for you.
<li><em>DtdToHaskell</em> is a tool for translating any valid XML DTD
into equivalent Haskell types, together with <em>XmlContent</em> instances.
<li>Finally, <em>Xtract</em> is a <em>grep</em>-like tool for XML documents,
loosely based on the XPath and XQL query languages. It can be used
either from the command-line, or within your own code as part of the
library.
</ul>
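<p>
To give a flavour of the filter idea behind the <em>Combinators</em>
library, here is a minimal, self-contained sketch. It uses a
hypothetical, much-simplified <tt>Content</tt> type rather than HaXml's
own (defined in <tt>Text.XML.HaXml.Types</tt>, with more constructors and
position information), and re-implements only a handful of combinators.
As in HaXml, a filter maps one piece of content to a list of results,
and filters compose; the <tt>/&gt;</tt> combinator plays the role of an
XPath step such as <tt>book/title</tt>.
</p>
<pre>
-- Hypothetical simplified document tree (not HaXml's own Content type).
data Content = CElem String [Content] | CText String
  deriving (Eq, Show)

-- A filter maps one piece of content to zero or more results;
-- an empty list means "no match".
type CFilter = Content -> [Content]

-- Keep the content only if it is an element with the given name.
tag :: String -> CFilter
tag t c@(CElem n _) | n == t = [c]
tag _ _                      = []

-- Keep the content only if it is character data.
txt :: CFilter
txt c@(CText _) = [c]
txt _           = []

-- All immediate children of an element.
children :: CFilter
children (CElem _ cs) = cs
children _            = []

-- Composition: apply g, then f to every result of g.
o :: CFilter -> CFilter -> CFilter
f `o` g = concatMap f . g

-- Path selection, like XPath's a/b: the children of f's results, filtered by g.
(/>) :: CFilter -> CFilter -> CFilter
f /> g = g `o` children `o` f

-- Example document: &lt;book>&lt;title>HaXml&lt;/title>&lt;author>Malcolm&lt;/author>&lt;/book>
book :: Content
book = CElem "book" [ CElem "title" [CText "HaXml"]
                    , CElem "author" [CText "Malcolm"] ]

main :: IO ()
main = print ((tag "book" /> tag "title" /> txt) book)
-- prints [CText "HaXml"]
</pre>
<p>
In HaXml itself, <tt>Text.XML.HaXml.Combinators</tt> provides combinators
in this style and many more (for instance recursive search and choice);
check the API documentation for their exact names and types.
</p>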
<hr>
<center><h3><a name="how">How do I use it?</a></h3></center>
<p>
<a href="HaXml/index.html">Detailed documentation of the HaXml APIs</a>
is generated automatically by Haddock directly from the source code.
<a href="http://haskell.org/HaXml/index.html">Documentation for the
previous stable version, HaXml-1.13.2</a>, is also available.
<p>
An introduction to HaXml for people who know more about XML than
about Haskell can be found at
<a href="http://www-106.ibm.com/developerworks/xml/library/x-matters14.html">
IBM DeveloperWorks</a>.
Please note that the DeveloperWorks article was based on an older
version of HaXml. If you try to use the examples given there, you
will need a couple of minor but important edits, given as a
<a href="developerworks.patch">diff patch here</a>.
<p>
Koen Roelandt has written a more recent tutorial about using HaXml
to clean up some ugly HTML pages.
<a href="http://www.krowland.net/tutorials/haxml_tutorial.html">
http://www.krowland.net/tutorials/haxml_tutorial.html</a>
<p>
A paper (12 pages, double-column A4) describing and comparing the generic
Combinators with the typed representation (DtdToHaskell/XmlContent) is
available in the following formats:
<ul>
<li> <a href="icfp99.dvi">icfp99.dvi</a> (LaTeX dvi format)
<li> <a href="icfp99.ps.gz">icfp99.ps.gz</a> (PostScript format - gzipped)
<li> <a href="icfp99.html">icfp99.html</a> (HTML format)
</ul>
<p>
Some additional info about using the various facilities is here:
<ul>
<li> <a href="Combinators.html">Combinators</a> (HTML format)
<li> <a href="XmlContent.html">XmlContent class</a> (HTML format)
<li> <a href="DtdToHaskell.html">DtdToHaskell tool</a> (HTML format)
<li> <a href="Xtract.html">Xtract tool</a> (HTML format)
</ul>
<p>
<b>Known problems:</b>
<ul>
<li> To use <em>-package HaXml</em> interactively with GHCi, you need
at least ghci-5.02.3.
<li> The function toDTD generates Parameter Entity Declarations in the internal
subset of the DTD, which don't conform to the strict well-formedness
conditions of XML. We think the constraint in question is spurious,
and any reasonable XML tool ought to deal adequately with full PEs.
Nevertheless, many standard XML processors reject these auto-generated
DTDs. The solution is easy - just write the DTD into a separate file!
<li> DtdToHaskell generates the Haskell String type for DTD attributes
that are of Tokenized or Notation Types in XML. This may not be
entirely accurate.
</ul>
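<p>
If you need the individual tokens of such an attribute (for example one
declared as <tt>NMTOKENS</tt> or <tt>IDREFS</tt>), you can split the
<tt>String</tt> value yourself. The helper below is a hypothetical sketch,
not part of HaXml; it splits on the four XML whitespace characters
(space, tab, carriage return, line feed).
</p>
<pre>
-- Hypothetical helper, not part of HaXml.
-- XML's whitespace characters (the S production): space, tab, CR, LF.
isXmlSpace :: Char -> Bool
isXmlSpace c = c `elem` " \t\r\n"

-- Split a tokenized attribute value (e.g. NMTOKENS, IDREFS) into tokens.
tokens :: String -> [String]
tokens s = case dropWhile isXmlSpace s of
  ""  -> []
  s'  -> let (tok, rest) = break isXmlSpace s' in tok : tokens rest

main :: IO ()
main = print (tokens " intro\tsummary\n")
-- prints ["intro","summary"]
</pre>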
<hr>
<center><h3><a name="download">Downloads</a></h3></center>
<p>
<b>Development versions:</b><br>
HaXml-1.19.1, release date 2007.11.01<br>
By HTTP:
<a href="http://www.cs.york.ac.uk/fp/HaXml-devel/HaXml-1.19.1.tar.gz">.tar.gz</a>,
<a href="http://www.cs.york.ac.uk/fp/HaXml-devel/HaXml-1.19.1.zip">.zip</a>.
<br>
By FTP:
<a href="ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/">
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/</a>
<p>
<b>Ongoing development:</b>
The development version of HaXml is also available through
<br><tt><a href="http://darcs.net/">darcs</a> get
http://www.cs.york.ac.uk/fp/darcs/HaXml</tt>
<p>
<b>Older versions:</b><br>
Stable version: for 1.13.2 see
<a href="http://haskell.org/HaXml/">
http://haskell.org/HaXml/</a>
<br>
By FTP:
<a href="ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/">
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/</a>
<br>
FreeBSD port:
<a href="http://freshports.org/textproc/hs-haxml/">
http://freshports.org/textproc/hs-haxml/</a>
<hr>
<center><h3><a name="install">Installation</a></h3></center>
<p>
To install HaXml, you must have a Haskell compiler: <em>ghc-6.2</em>
or later, and/or <em>nhc98-1.16/hmake-3.06</em> or later, and/or
<em>Hugs98 (Sept 2003)</em> or later. You must also first download and
install the <a href="http://www.cs.york.ac.uk/fp/polyparse">polyparse</a>
package as a pre-requisite.
<p>
Then, for more recent compilers,
use the standard Cabal method of installation:
<pre>
runhaskell Setup.hs configure [--prefix=...] [--buildwith=...]
runhaskell Setup.hs build
runhaskell Setup.hs install
</pre>
For older compilers, use:
<pre>
./configure [--prefix=...] [--buildwith=...]
make
make install
</pre>
to configure, build, and install HaXml as a package for your
compiler(s). You need write permission on the library installation
directories of your compiler(s). Afterwards, to gain access to
the HaXml libraries, you only need to add the option <tt>-package
HaXml</tt> to your compiler command line (no option is required for Hugs).
Various stand-alone tools (DtdToHaskell, Xtract, Validate, MkOneOf) are
also built and copied to the final installation location specified by the
<tt>--prefix=...</tt> option to <tt>configure</tt>.
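<p>
As a quick check that the package is installed, a small program along the
lines of the supplied <tt>Canonicalise</tt> demo can parse XML from standard
input and pretty-print it. This sketch assumes the development API, where
<tt>xmlParse</tt> (from <tt>Text.XML.HaXml.Parse</tt>) takes a name used in
error messages plus the document text, and <tt>document</tt> (from
<tt>Text.XML.HaXml.Pretty</tt>) produces a <tt>Doc</tt> of the standard
HughesPJ pretty-printing library. Compile it with something like
<tt>ghc --make -package HaXml Main.hs</tt>.
</p>
<pre>
-- Main.hs: read an XML document on stdin, parse it, pretty-print it to stdout.
-- Assumes the HaXml development API (see the Haddock documentation).
import Text.XML.HaXml.Parse (xmlParse)
import Text.XML.HaXml.Pretty (document)
import Text.PrettyPrint.HughesPJ (render)

main :: IO ()
main = do
  input &lt;- getContents
  -- "stdin" only labels the source in any parse-error message.
  putStrLn (render (document (xmlParse "stdin" input)))
</pre>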
<p>
To build/install on a Windows system without the Cygwin shell and
utilities, you can avoid the configure/make steps by simply using the
minimal <em>Build.bat</em> script. Edit it first to set the location
of your compiler, etc.
<hr>
<center><h3><a name="news">Recent news</a></h3></center>
<p>
Version 1.19.1 fixes a build error in 1.19.
Version 1.19 improved the lazy XML parsing, and fixed some space leaks
in the XtractLazy tool.
<p>
Version 1.18 pulled out the parser combinator libraries as a separate
package (called polyparse), which must now be downloaded and installed
before installing HaXml.
<p>
Version 1.17 essentially just fixes compatibility with ghc-6.6.
However, it also includes a lazier pretty-printer to use in conjunction
with the lazy parser, to avoid running out of memory on large datasets.
<p>
Version 1.16 adds laziness to the parser combinator libraries, such that
they can start to return partial results before a whole entity has been
parsed. Partial is also used in the sense that the returned value can
contain bottom - an error which gets thrown as an exception when you try
to explore the inner regions of the value. In terms of XML, it means you
get an element back as soon as its start-tag has been consumed, but if
there are parse errors later on, BOOM. However, if there are no errors,
it does mean that your processing will be (a) faster and (b) less memory
hungry. Another cool thing is that, even in the presence of errors, you
still might get enough output to satisfy your processing task before the
error is noticed.
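<p>
The following self-contained sketch is plain Haskell, not the internals of
HaXml's lazy parser; it only illustrates the general mechanism. A
hypothetical "parser" produces results one at a time, and the error,
a bottom value, is only thrown when a consumer forces the part of the
result where the error lies.
</p>
<pre>
import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical parsed item; stands in for an element of the document.
newtype Item = Item String deriving Show

-- Produce items lazily; the word "BAD" plays the part of a parse error,
-- which only becomes an exception when that point of the list is demanded.
lazyItems :: [String] -> [Item]
lazyItems [] = []
lazyItems (w : ws)
  | w == "BAD" = error ("parse error at " ++ show w)
  | otherwise  = Item w : lazyItems ws

main :: IO ()
main = do
  let items = lazyItems ["a", "b", "BAD", "c"]
  -- Consuming only a prefix never reaches the error.
  print (take 2 items)
  -- Forcing the whole list does: BOOM.
  r &lt;- try (evaluate (length items))
  case r of
    Left (ErrorCall msg) -> putStrLn ("caught: " ++ msg)
    Right n              -> print n
</pre>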
<p>
Use <tt>Text.XML.HaXml.ParseLazy</tt> and
<tt>Text.XML.HaXml.Html.ParseLazy</tt> to try it out. There are also
lazy versions of the supplied demo programs: <tt>CanonicaliseLazy</tt>
and <tt>XtractLazy</tt>.
<p>
Version 1.15 is essentially 1.14 with some bugfixes, and some new
functionality, especially in the parser combinator libraries. DrIFT now
supports deriving the XmlContent class, and DtdToHaskell now also
derives the XmlContent class, in addition to determining a collection of
Haskell datatypes equivalent to a given DTD.
<p>
Error messages from parsing are much improved in 1.15 - they should
locate any error far more specifically and accurately. Let me know
about examples which do not report correctly.
<p>
Prior to 1.14, there were two separate classes, Xml2Haskell and
Haskell2Xml. They are now combined into the single class XmlContent.
Make sure you get a recent version of DrIFT if you want to derive this
class from Haskell datatypes. (In 1.14, the included version of DtdToHaskell
had not yet been updated for deriving the class the other way, from an XML
DTD; version 1.15 added this.)
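<p>
The point of having a single class is that both directions of the
conversion, Haskell value to XML and XML back to a Haskell value, live
in one instance. The sketch below illustrates that idea with a
hypothetical toy class (<tt>XmlLike</tt>) and tree type (<tt>Node</tt>);
HaXml's real <tt>XmlContent</tt> class uses its own method names and
parser types, so consult the API documentation for those.
</p>
<pre>
-- Hypothetical toy XML tree and conversion class (not HaXml's real API).
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

class XmlLike a where
  toXml   :: a -> Node
  fromXml :: Node -> Maybe a

-- A hand-written type roughly matching a DTD such as
--   &lt;!ELEMENT person (name, email*)>  (name and email hold #PCDATA)
data Person = Person String [String]
  deriving (Eq, Show)

instance XmlLike Person where
  toXml (Person n es) =
    Elem "person" (Elem "name" [Text n] : [Elem "email" [Text e] | e &lt;- es])
  fromXml (Elem "person" (Elem "name" [Text n] : rest)) =
    Person n &lt;$> mapM email rest
    where
      email (Elem "email" [Text e]) = Just e
      email _                       = Nothing
  fromXml _ = Nothing

-- Naive rendering to XML text (no escaping of special characters).
render :: Node -> String
render (Text s)    = s
render (Elem t cs) = "&lt;" ++ t ++ ">" ++ concatMap render cs ++ "&lt;/" ++ t ++ ">"

main :: IO ()
main = do
  let p = Person "Ann" ["ann@example.org"]
  putStrLn (render (toXml p))
  -- Round trip: reading back what we wrote gives the original value.
  print (fromXml (toXml p) == Just p)
</pre>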
<p>
Version 1.14 also contains a new SAX-like stream parser.
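<p>
A stream parser delivers a flat sequence of events (start tag, end tag,
text, and so on) instead of a tree, so some tasks can be done in a single
pass without holding the whole document in memory. The sketch below uses
a hypothetical event type (<tt>Event</tt>; HaXml's own stream parser has
its own, richer one) and computes the number of elements and the maximum
nesting depth with strict accumulators.
</p>
<pre>
{-# LANGUAGE BangPatterns #-}

-- Hypothetical SAX-style events (not HaXml's own event type).
data Event = Open String | Close String | Chars String
  deriving (Eq, Show)

-- One pass over the events: (element count, maximum nesting depth).
-- Bang patterns keep the accumulators evaluated, so no thunks build up.
stats :: [Event] -> (Int, Int)
stats = go 0 0 0
  where
    go :: Int -> Int -> Int -> [Event] -> (Int, Int)
    go _  !n !mx []             = (n, mx)
    go !d !n !mx (Open _ : es)  = go (d + 1) (n + 1) (max mx (d + 1)) es
    go !d !n !mx (Close _ : es) = go (d - 1) n mx es
    go !d !n !mx (Chars _ : es) = go d n mx es

main :: IO ()
main = print (stats [ Open "a", Open "b", Chars "x", Close "b"
                    , Open "b", Close "b", Close "a" ])
-- prints (3,2)
</pre>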
<p>
A while back, Graham Klyne extended the 1.12 version of HaXml
significantly, in particular to ensure that the parser passes a large
XML acceptance test suite, and to deal more correctly with Unicode,
namespaces, and parameter entity expansion. His modifications will
eventually be merged back into the main CVS tree, but in the meantime,
you can get his version here:
<a href="http://www.ninebynine.org/Software/HaskellUtils/">
<tt>http://www.ninebynine.org/Software/HaskellUtils/</tt></a>
<p>
The previous stable version (1.13) had the following features and fixes:<br>
<ul>
<li> Bugfixes to the document validator: no more infinite loops.
<li> Bugfixes to lexing mixed text and references between quote chars.
<li> Updated to work with ghc-6.4's new package mechanism.
</ul>
<br>
<a href="changelog.html">Complete Changelog</a><br>
<hr>
<center><h3><a name="who">Contacts</a></h3></center>
<p>
We are interested in hearing your feedback on these XML facilities -
suggestions for improvements, comments, criticisms, bug reports. Please mail
<ul>
<li> <a href="mailto:Malcolm.Wallace@cs.york.ac.uk">
Malcolm.Wallace@cs.york.ac.uk</a> (implementation &amp; design)
</ul>
<p>
Development of these XML libraries was originally funded by Canon
Research Europe Ltd. Subsequent maintenance and development have
been partially supported by the EPSRC, and the University of York.
<p><b>Licence:</b> The library is Free and Open Source Software,
i.e., the bits we wrote are copyright to us, but freely licensed
for your use, modification, and re-distribution, provided you don't
restrict anyone else's use of it. The HaXml library is distributed
under the GNU Lesser General Public Licence (LGPL) - see file
<a href="LICENCE-LGPL">LICENCE-LGPL</a> for more details. We allow one
special exception to the LGPL - see <a href="COPYRIGHT">COPYRIGHT</a>.
The HaXml tools are distributed under the GNU General Public Licence
(GPL) - see <a href="LICENCE-GPL">LICENCE-GPL</a>. (If you don't
like any of these licensing conditions, please contact us to discuss
your requirements.)
<hr>
<p>
<center><h3><a name="related">Related work</a></h3></center>
<ul>
<li>Joe English has written a more space-efficient parser for XML
in Haskell, called hxml. What is more, it can be used as a simple
drop-in replacement for the HaXml parser!
<a href="http://www.flightlab.com/~joe/hxml/">Available here</a>.
<li>Uwe Schmidt designed another
<a href="http://www.fh-wedel.de/~si/HXmlToolbox/">Haskell XML Toolbox</a>
based on the ideas of HaXml and hxml. It is well-maintained, and has
recently been updated to use arrow-based combinators rather than filters
as in HaXml.
<li>To use HaXml and HXT together, Henning Thielemann has put together
<a href="http://darcs.haskell.org/wraxml/README">WraXML</a>,
a wrapper using an alternative tree data structure, together with
conversions to/from HaXml and HXT.
<li>Some comparisons between functional language approaches to processing
XML can be found in
<a href="http://www.xml.com/pub/a/2001/02/14/functional.html">
Bijan Parsia's article on xml.com</a>
<li>Christian Lindig has written an XML parser in O'Caml:
<a href="http://www.cs.tu-bs.de/softech/people/lindig/software/tony.html">
here</a>.
<li>Andreas Neumann of the University of Trier has written a
validating XML parser in Standard ML:
<a href="http://www.informatik.uni-trier.de/~neumann/Fxp">here</a>.
<li>Erik Meijer and Mark Shields have a design for a functional programming
language that treats XML documents as basic data types:
<a href="http://www.cse.ogi.edu/~mbs/pub/xmlambda">XMLambda</a>.
<li>Benjamin Pierce and Haruo Hosoya have a different but similar design in
<a href="http://xduce.sourceforge.net/">XDuce</a>, which is
also implemented.
<li>Taking XDuce's approach further is the very cool
<a href="http://www.cduce.org/">CDuce</a> by V&eacute;ronique Benzaken,
Giuseppe Castagna, and Alain Frisch. The CDuce language does
fully statically-typed transformation of XML documents, thus
guaranteeing that the output conforms to its declared type, and what
is more, it is also faster than the untyped XSLT!
<li>The <a href="http://www.xcerpt.org/">Xcerpt project</a> uses HaXml
to create another rule-based query and transformation language for XML,
inspired by logic programming, and based on positional selection rather
than navigational selection.
<li>Ulf Wiger describes an Erlang toolkit for XML:
<a href="http://www.erlang.se/euc/00/">XMerL</a>
<li>The Java world has adopted the ideas from <em>DtdToHaskell</em> into
the Java Architecture for XML Binding
(<a href="http://java.sun.com/xml/jaxb/">JAXB</a>). JAXB translates
an XML Schema Definition into a set of Java classes, and provides
the runtime machinery (like <em>XmlContent</em>) for reading and
writing objects of those classes to/from XML files.
<li>There is a comprehensive reading list for XML and web programming in
functional languages <a href="http://readscheme.org/xml-web/">here</a>.
</ul>
<hr>
</body>
</html>