Multi-Threading and S
A thread is a stream of control within an application
that executes a sequence of instructions.
Traditional applications have a single
thread. A multi-threaded application consists of
two or more threads that execute concurrently.
On a single-processor machine, the threads are scheduled
by a library or by the kernel to run in different time
slices. On multi-processor machines, two or more threads
can execute simultaneously, as scheduled by the
kernel.
The benefit of threads is clear on a multi-processor machine.
Potentially, a task can be divided into subtasks, each of
which is executed on its own processor, reducing the time taken to
complete the overall task.
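
The idea of dividing a task into subtasks can be sketched as follows. This is only an illustration in Python, not the S interface described on this page; the function names `partial_sum` and `parallel_sum_of_squares` and the worker count are assumptions made for the example. Note that in CPython the global interpreter lock prevents CPU-bound threads from running truly in parallel, so the sketch shows the structure of the decomposition rather than a measured speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # One subtask: sum of squares over a slice of the data.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    # Split the input into roughly equal chunks, one per worker.
    size = max(1, (len(values) + n_workers - 1) // n_workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Each chunk is handed to a thread; the partial results are combined.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

Calling `parallel_sum_of_squares(list(range(10)))` combines the per-chunk results into the same total a single loop would produce.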
Even on a single-processor machine, threads offer significant benefits.
These include improved performance: when one task must wait for
a service (e.g. disk access) to become available, another task
can proceed. Also, since threads are scheduled by a third party
(the library or the kernel), development of independent tasks is
greatly simplified (e.g. multi-source event loops).
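
A minimal sketch of one task proceeding while another waits, again in Python rather than S. The `time.sleep` call is a stand-in for a slow service such as disk access, and the names `slow_service`, `other_task` and `run_both` are hypothetical.

```python
import threading
import time

events = []  # appends from different threads are safe in CPython


def slow_service():
    # Stand-in for waiting on a service such as disk access.
    time.sleep(0.2)
    events.append("service done")


def other_task():
    # Independent work that does not need the service.
    events.append("other task done")


def run_both():
    events.clear()
    waiter = threading.Thread(target=slow_service)
    worker = threading.Thread(target=other_task)
    waiter.start()
    worker.start()
    waiter.join()
    worker.join()
    return list(events)
```

Although the waiting thread is started first, the other task finishes while it is still blocked, so the scheduler rather than the programmer interleaves the two.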
There is no ``free lunch'' and the advantages of threads
are offset by the need for synchronization between threads.
In contrast to a traditional application with a single thread,
a multi-threaded application can execute multiple instructions
simultaneously, which introduces the need to ensure
that certain sub-tasks are serialized, or at least do not run
at the same time.
A thread implementation must provide synchronization mechanisms
to ensure data validity.
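
As an illustration of such a mechanism, the sketch below uses a mutual-exclusion lock to serialize updates to shared data. It is written in Python and is not the S-level API; the `Counter` class and `run` function are assumptions made for the example.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=10000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, every increment is applied and the final value equals `n_threads * n_increments`; without it, concurrent read-modify-write sequences could overwrite one another.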
We have added thread support to the S
language.
S is a persistent, functional, object-oriented language
primarily used for data analysis.
The work consisted of adding an S level interface to the Pthreads
routines and modifying the S internal C level code to make it
thread-safe and re-entrant.
The design of the S task/thread mechanism constituted a significant
part of the work and is being used to develop
a distributed version of S.
The work was completed on a single-processor Intel-based
machine running Linux (2.0.14).
We used Chris Provenzano's Pthreads user-level library
to provide the thread support.
We are now testing this on multiple-processor machines:
two dual Pentium II machines and a 4-processor Pentium Pro
machine, all running a Linux SMP kernel & 2.0.30,
and a dual-processor Sun 450 running Solaris.
In the near future, we will also employ an SGI Origin 2000.
The Solaris box allows us access to both Pthreads and the UNIX threads
interfaces.
This work, including
examples, S-level documentation and the technical description of
the underlying implementation,
is described in my
Ph.D. dissertation.
HTML versions of the documentation are available here.
This API is not guaranteed to be supported exactly
as-is in the future, although it is unlikely to undergo
significant change. Sed caveat emptor.
Parallel and Distributed Computing
There are obvious similarities between parallel computing
using multiple processors (and a shared memory system)
and distributed computing across multiple machines
(with separate memory). We are now turning
our attention to distributed computing
in its general form and are currently using CORBA.
It turns out that the architecture for user-level threads in S is
similar to the CORBA architecture. Similarly, the user-level
API for threads will be mirrored for distributed computing, allowing
users to develop code for either parallel or distributed systems
without caring which is used (except when tuning performance).
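
The goal of writing code once against a common interface can be illustrated with a loose analogy in Python. This is not the S or CORBA API: the `column_means` and `mean` functions are hypothetical, and the sketch relies on Python's `concurrent.futures.Executor` interface, which is shared by thread-based and process-based (separate memory) executors.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def mean(values):
    return sum(values) / len(values)


def column_means(executor: Executor, columns):
    # The caller decides where the work runs; this code only
    # depends on the common Executor interface.
    return list(executor.map(mean, columns))
```

Here a `ThreadPoolExecutor` would play the role of the shared-memory back end. Under the assumption that the task function and its data can be shipped between processes, an executor with separate memory could be passed instead without changing `column_means`.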
Duncan Temple Lang <duncan@research.bell-labs.com>
Last modified: Fri May 22 08:58:58 EDT 1998