NOTES ON "i"
By Howard Trickey (howard@research.bell-labs.com)
"i" is an unfinished Web browser for Plan 9.
This note describes the current state (as of June 2000)
and what I think needs to be done, and gives some
description of how it works.
HISTORY
"i" started life as charon, the web browser I wrote for Inferno.
I converted the Limbo source to C using a program based on the
Limbo compiler (it walked the parse tree, emitting plausible C as
it went), but there was much manual work needed afterwards
because I wanted it to look like good C code as I would have
written it from scratch, rather than something that used a
Limbo emulation library all over the place.
TODO
There are three categories of things that need to be
done: (1) fix memory allocation (memory is never freed right now);
(2) fix layout and protocol bugs; (3) add functionality.
I advise doing them in that order. Perhaps, rather than
fixing the protocol bugs, a new protocol should be designed
with the needs of acme integration in mind.
MEMORY ALLOCATION
The biggest problem in converting to C involves memory
management. The obvious but tedious way is to write
freeing code, put in reference counts where necessary,
and be sure to free exactly when and where necessary.
I decided to try something different. The state inside "i"
is in two categories: long-term state, and per-page state
(where "page" means what you get if you load some URL;
maybe an HTML document, maybe a picture, maybe a frame
containing several HTML documents). Most of the state is
of the per-page kind, so I decided on this strategy:
Most memory allocation is done using an "emalloc"
that just takes hunks of memory out of a per-process
pool, with the intention that one will not free these
pieces individually, but rather, will free the whole
pool in bulk at various infrequent points (e.g., when
one goes to a new URL).
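A minimal sketch of the pool idea, not the actual "i" code:
memory is carved out of large hunks and nothing is freed
individually; the whole pool is released in one call. The
names Pool, Hunk, poolalloc, poolfreeall and the 64K hunk size
are hypothetical, and malloc stands in for whatever the real
emalloc draws on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { HUNKSIZE = 64*1024 };	/* assumed default hunk size */

/* One large block; allocations are carved off its front. */
typedef struct Hunk Hunk;
struct Hunk {
	Hunk *next;
	size_t used;		/* bytes handed out so far */
	size_t cap;		/* bytes available in data */
	max_align_t data[];	/* payload, suitably aligned */
};

typedef struct Pool Pool;
struct Pool {
	Hunk *hunks;		/* most recent hunk first */
};

/* Round n up so every allocation stays aligned. */
static size_t
roundup(size_t n)
{
	size_t a = sizeof(max_align_t);

	return (n + a - 1) / a * a;
}

/* Take n bytes from the pool; there is no matching per-piece free. */
void*
poolalloc(Pool *p, size_t n)
{
	Hunk *h = p->hunks;
	size_t cap;
	void *v;

	n = roundup(n ? n : 1);
	if(h == NULL || h->cap - h->used < n){
		cap = n > HUNKSIZE ? n : HUNKSIZE;
		h = malloc(sizeof(Hunk) + cap);
		if(h == NULL)
			abort();
		h->next = p->hunks;
		h->used = 0;
		h->cap = cap;
		p->hunks = h;
	}
	v = (char*)h->data + h->used;
	h->used += n;
	return v;
}

/* Release everything at once, e.g. when going to a new URL. */
void
poolfreeall(Pool *p)
{
	Hunk *h, *next;

	for(h = p->hunks; h != NULL; h = next){
		next = h->next;
		free(h);
	}
	p->hunks = NULL;
}
```

A per-page pool would start as Pool p = { NULL }, be filled
with poolalloc calls while the page loads, and be dropped with
a single poolfreeall(&p) when the page is replaced.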
The main problem with this strategy is figuring out
how to deal with the fact that sometimes one wants
to allocate long-term state from within a process that
is mostly allocating short-term state; or, one wants
to convert short-term state into long-term state for
various reasons (e.g., caching). My design for handling
this was to have a special-purpose "garbage collector"
(really a "live stuff collector") that knows about the
structure of all my datatypes and the roots of live
data, and can walk the live data and copy it from the
short-term pool to the long-term pool.
Unfortunately, I did not finish all the work needed for this
design. The current code does the allocation from pools,
but not the short-term to long-term copy, so I just never
free anything (this must be fixed!). The file gc.c contains
code for doing the walking, based on hand-built data
structure tables, but I'm not sure the tables are completely
up to date. And I didn't do the work to record the roots
of live data that need to be transferred.
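The table-driven walk can be sketched as below. This is an
illustration under stated assumptions, not the contents of
gc.c: every object is assumed to start with a header naming
its type, each type has a hand-built table of pointer-field
offsets, and malloc stands in for the long-term pool. The
names Typedesc, Hdr, Node, nodetype and livecopy are all
hypothetical. A forwarding pointer in the header keeps shared
and cyclic structures from being copied twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { MAXPTR = 4 };

/* Hand-built description of one datatype. */
typedef struct Typedesc Typedesc;
struct Typedesc {
	size_t size;		/* size of the whole object */
	int nptr;		/* number of pointer fields */
	size_t ptroff[MAXPTR];	/* offsets of those fields */
};

/* Assumed common header at the start of every object. */
typedef struct Hdr Hdr;
struct Hdr {
	const Typedesc *type;
	void *forward;		/* long-term copy, once made */
};

/* Example datatype with two pointer fields. */
typedef struct Node Node;
struct Node {
	Hdr h;
	int val;
	Node *next;
	Node *other;
};

const Typedesc nodetype = {
	sizeof(Node), 2, { offsetof(Node, next), offsetof(Node, other) }
};

/*
 * Copy obj and everything reachable from it, returning the copy.
 * Recursion depth grows with the length of pointer chains.
 */
void*
livecopy(void *obj)
{
	Hdr *h = obj;
	char *dst;
	void *child;
	int i;

	if(obj == NULL)
		return NULL;
	if(h->forward != NULL)
		return h->forward;	/* already copied */
	dst = malloc(h->type->size);	/* stand-in for long-term pool */
	if(dst == NULL)
		abort();
	memcpy(dst, obj, h->type->size);
	h->forward = dst;		/* set before recursing: handles cycles */
	for(i = 0; i < h->type->nptr; i++){
		memcpy(&child, dst + h->type->ptroff[i], sizeof child);
		child = livecopy(child);
		memcpy(dst + h->type->ptroff[i], &child, sizeof child);
	}
	return dst;
}
```

Calling livecopy on each recorded root before freeing the
short-term pool would leave the long-term copies intact; the
roots themselves are the part the note says is still missing.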
I would like to see my design carried through, because I
think it would have advantages in speed and robustness
(it is harder to make memory-related programming errors)
once properly in place. But it may turn out that I am wrong,
and that a more traditional keep-track-of-and-free
approach is needed.
BUG FIXES NEEDED
1. Layout bugs. You don't have to use "i" very much
before you run across layout bugs. There seem to
be a number related to table layout. Table layout is
a big pain, for several reasons: (a) the sizing algorithm
is complicated; (b) it gets more complicated when
you want to be efficient and avoid the need to redo
size calculations over & over for subtables; (c) the
extant browsers behave unexpectedly and differently,
especially in the face of table specifications that are
impossible to fulfil (e.g., because the sum of specified
column widths and padding of some subpiece is bigger
than the specified size of some enclosing column).
As distasteful as it is, one has to emulate the behavior
of, say, IE5, whether it is "correct" or not, or else many
pages will look nonsensical.
2. Internal protocol bugs. There is an internal protocol
(expected sequence of calls, callbacks) among the pieces
of "i". My design was intended to be such that there
would be no need for "killing" any processes or threads,
but rather, an orderly system of messages over channels
combined with flag setting, so that each thread and
process would know when to exit and/or when any
threads and processes it controlled exited. There is
one exception to that: if a "network fetching" process
is stuck in a system call waiting for some network
response, it may have to be "killed" (sent a note) to
bump it out of that system call, but after that it is
supposed to exit cleanly. In charon, I used a system
of precipitous kills to stop the processing of a page if
the user hit "stop" or some other URL link. This was
never very satisfactory, and seemed even less so in
the C/Plan 9 world, because of the difficulties of
stopping deadlocks due to channel communication
where one half goes away at random times. So "i" went
to the "orderly die" system alluded to above. Unfortunately,
it is very delicate to get exactly right, and if it is wrong,
it leads to deadlock. There are occasions where
that happens with "i" right now, showing that there
are bugs in my design with respect to this protocol.
They will probably be hard to track down and fix.
Probably it would be best to attempt to document
how and why the design is supposed to work. I made
an attempt to do that with a Promela model of the
communication, in /usr/howard/i/i.promela, but
I don't remember how up-to-date it is. One thing
to bear in mind is that sometimes the reason the
design is supposed to work has to do with what
are threads and what are processes --- things that
are in the same process, but different threads, need
less interlocking of access to global variables, and
there are other assumptions that can be made about
which messages are possible at which times.
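The "orderly die" idea from item 2 can be illustrated in
miniature: a worker is never killed; it is sent a stop
message, finishes what it has, reports back, and returns.
This sketch is not the protocol used inside "i". It uses
POSIX threads and a small hand-made queue in place of Plan 9
channels, and the names Chan, Worker, MSG_WORK, MSG_STOP,
workerproc and runworker are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

enum { NBUF = 8, MSG_WORK = 1, MSG_STOP = 2 };

/* A small bounded message queue standing in for a channel. */
typedef struct Chan Chan;
struct Chan {
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	pthread_cond_t nonfull;
	int buf[NBUF];
	int head;
	int count;
};

void
chaninit(Chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->nonempty, NULL);
	pthread_cond_init(&c->nonfull, NULL);
	c->head = 0;
	c->count = 0;
}

/* Blocks while the queue is full. */
void
chansend(Chan *c, int msg)
{
	pthread_mutex_lock(&c->lock);
	while(c->count == NBUF)
		pthread_cond_wait(&c->nonfull, &c->lock);
	c->buf[(c->head + c->count) % NBUF] = msg;
	c->count++;
	pthread_cond_signal(&c->nonempty);
	pthread_mutex_unlock(&c->lock);
}

/* Blocks while the queue is empty. */
int
chanrecv(Chan *c)
{
	int msg;

	pthread_mutex_lock(&c->lock);
	while(c->count == 0)
		pthread_cond_wait(&c->nonempty, &c->lock);
	msg = c->buf[c->head];
	c->head = (c->head + 1) % NBUF;
	c->count--;
	pthread_cond_signal(&c->nonfull);
	pthread_mutex_unlock(&c->lock);
	return msg;
}

typedef struct Worker Worker;
struct Worker {
	Chan in;	/* controller to worker */
	Chan done;	/* worker to controller: exit report */
};

/* Worker: handles messages until told to stop, then reports and returns. */
void*
workerproc(void *arg)
{
	Worker *w = arg;
	int nhandled = 0;

	while(chanrecv(&w->in) != MSG_STOP)
		nhandled++;
	chansend(&w->done, nhandled);
	return NULL;
}

/* Controller: sends nwork messages, asks for exit, waits for the report. */
int
runworker(int nwork)
{
	Worker w;
	pthread_t tid;
	int i, nhandled;

	chaninit(&w.in);
	chaninit(&w.done);
	if(pthread_create(&tid, NULL, workerproc, &w) != 0)
		return -1;
	for(i = 0; i < nwork; i++)
		chansend(&w.in, MSG_WORK);
	chansend(&w.in, MSG_STOP);
	nhandled = chanrecv(&w.done);
	pthread_join(tid, NULL);
	return nhandled;
}
```

Because the controller waits for the exit report before
tearing anything down, neither side is left blocked on a
queue whose other end has gone away. The deadlocks described
above arise when several such exchanges interleave, which this
two-party sketch does not attempt to model.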
FUNCTIONALITY NEEDED
The following things are still needed to make "i" a usable
web browser, in my estimated order of importance.
1. Animated gifs. The work is done to accumulate the
frames, but not to display them in sequence (the
framework for doing so is there, however, as adopted
from charon).
2. Javascript (& Document Object Model). This is an
enormous amount of work, and should probably be done
last, even though it is, unfortunately, quite important.
Sean Dorward (sean@research.bell-labs.com) wrote
a Javascript interpreter in Limbo, and I was using that
in charon, but that was only the tip of the iceberg. Getting
a match between charon's internal structure and the
document & browser models assumed by extant web
pages proved daunting. When I looked at it, many web
pages checked to see whether the browser was Internet
Explorer (version 3 or 4) or Netscape Navigator (version 3
or 4), and carried up to four different versions of code to handle
each possibility --- and if you were something else,
forget it, they just ignored you. So I tried to emulate
Netscape 3, none too successfully. One of the problems
was the assumption that there could be many top-level
windows, which charon didn't have because it was aimed
at webphones, among other things. If the "i" engine gets
integrated into acme, perhaps there will be a cleaner fit.
And maybe the browser landscape is cleaner now: just
aiming to emulate IE5 should be sufficient (if that is
possible!). Unfortunately, the later version browsers
require renderers that are more complicated than the
current charon/"i" one: in particular, there is a need
for absolute and/or relative positioning of elements
(but then, none of the commercial browsers do this
very well and they all disagree with each other, so maybe
this feature is not often used).
The files jscript.c and script.h in this directory are the raw
output of my Limbo-to-C converter on the charon-side part
of charon's Javascript implementation. They are probably useless.
The majority of the javascript code, the interpreter, has
not even been run through the converter. I could help
with this, if desired.
3. Support for "basic authorization" (the code is there,
except for the ability to make popup dialogs, which is
partially there but needs some support in gui.c).
4. Implement "Save As".
5. SSL support (for https: URLs). My charon browser
had an implementation of the client side of SSL/2,
which was sufficient for this purpose. I ported much
of the code to C, but didn't finish. Paul Glick
(pg@research.bell-labs.com) finished it up, so it
should just be a matter of integrating that code in.
6. HTTP version 1.1, with persistent connections.
There was an attempt in the code to get ready to do this,
but it wasn't completed. I don't know how important
this is. I have run across one site that refused to
do HTTP version 1.0 any more.
7. CSS (Cascading Style Sheet) support. Would be nice.
Goes along with the Javascript/Document Object Model
thing.
8. XML/XSL support. I would like to see this too, but this
is very far out there. It does seem to be the coming thing.
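For item 3, the wire format is simple once the popup dialog
has supplied a user name and password: the HTTP Basic scheme
sends "user:password" base64-encoded in an Authorization
header. The sketch below shows only that encoding step; it is
not the code already in "i", and the names enc64 and
basicauthhdr are hypothetical. It assumes the user name
contains no colon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char b64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode n bytes of src; dst needs 4*((n+2)/3)+1 bytes. */
void
enc64(char *dst, const unsigned char *src, size_t n)
{
	unsigned long v;
	size_t i;

	for(i = 0; i + 2 < n; i += 3){
		v = ((unsigned long)src[i] << 16)
			| ((unsigned long)src[i+1] << 8) | src[i+2];
		*dst++ = b64[(v >> 18) & 63];
		*dst++ = b64[(v >> 12) & 63];
		*dst++ = b64[(v >> 6) & 63];
		*dst++ = b64[v & 63];
	}
	if(i < n){
		/* one or two bytes left over: pad with '=' */
		v = (unsigned long)src[i] << 16;
		if(i + 1 < n)
			v |= (unsigned long)src[i+1] << 8;
		*dst++ = b64[(v >> 18) & 63];
		*dst++ = b64[(v >> 12) & 63];
		*dst++ = (i + 1 < n) ? b64[(v >> 6) & 63] : '=';
		*dst++ = '=';
	}
	*dst = '\0';
}

/*
 * Build an "Authorization: Basic ..." header line (without CRLF).
 * The result is malloc'd; the caller frees it.
 */
char*
basicauthhdr(const char *user, const char *pass)
{
	static const char prefix[] = "Authorization: Basic ";
	size_t ulen = strlen(user);
	size_t plen = strlen(pass);
	size_t credlen = ulen + 1 + plen;
	char *cred, *hdr;

	cred = malloc(credlen + 1);
	hdr = malloc(sizeof prefix + 4 * ((credlen + 2) / 3));
	if(cred == NULL || hdr == NULL)
		abort();
	memcpy(cred, user, ulen);
	cred[ulen] = ':';
	memcpy(cred + ulen + 1, pass, plen + 1);
	strcpy(hdr, prefix);
	enc64(hdr + strlen(prefix), (const unsigned char*)cred, credlen);
	free(cred);
	return hdr;
}
```

The header would be added to the retried request after the
server answers 401 with a Basic challenge. Since base64 is
only an encoding, the password is effectively sent in the
clear unless the connection itself is protected.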