
Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


.FP lucidasans
.TL
A Portable History
.AU
Erik Quanstrom
quanstro@coraid.com
.AB
History from the dump filesystem has traditionally been locked into
the fileserver on which it was created.  I describe the method used
for transferring our history from the old fileserver, Plato, to the
new fileserver, Kibbiee, without making any modifications to Plato.
.AE
.SH
Introduction
.PP
Earlier this year the dump filesystem on our main fileserver, Plato,
surpassed 75% usage.  Since Plato is running a 32-bit fileserver on
four-year-old hardware, we decided to build a new fileserver instead
of adding new disks to an old machine.  Usually, starting with a new
on-disk layout means that history cannot be transferred to the new
machine and that the old fileserver must be down while a snapshot of
the current filesystem is taken.  Even if the same on-disk format is
used, both fileservers must be offline during the transfer, and there
is a risk of damaging the original filesystem through operator error.  I
outline the method used to copy history from the 32-bit Plato to the
64-bit Kibbiee while Plato continued to serve files.
.SH
Method
.PP
Both fileservers were placed (silently) into allow mode.  Kibbiee's
date was initially set to midnight on June 11, 2004, the first day of
Plato's dump.  For each historical day of the dump,
.I updatedb (1)
was used to generate a database and a log file listing the changes to
the filesystem since the previous day.  For the first day of the
dump, the preceding day was set to the first day of the dump and the
database was empty, so every file present on that day was added.  The
.I C (1)
command was used to set the fileserver's clock to midnight
Standard Time on the day of the dump.  Changes to
.CW adm/users
were applied first, so that newly created files would have the correct
owners; the remaining changes were applied next.  Both steps used
.I cphist (1).
Finally, a dump was forced on Kibbiee.  This process was repeated
for each day of the dump.  In all, 1024 dumps were processed, each
taking approximately ten minutes.
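.PP
Each iteration needs two dates: the dump directory to read on Plato
and the clock value to set on Kibbiee.  At roughly ten minutes each,
the 1024 dumps amount to about 170 hours of processing, a little over
a week.  The following portable C sketch computes both dates for day
.I i
of the dump.  It is an illustration under stated assumptions, not the
tooling that was used: it assumes dump directories are named
.CW YYYY/MMDD ,
as under
.CW /n/dump ,
with one dump per day starting at 2004/0611, and the function names
are hypothetical.  The date conversions follow Howard Hinnant's
days_from_civil and civil_from_days algorithms.

```c
#include <stdio.h>
#include <string.h>

/* Days since 1970-01-01 for a Gregorian date (Hinnant's days_from_civil). */
long
days_from_civil(long y, int m, int d)
{
	long era, yoe, doy, doe;

	y -= m <= 2;	/* treat Jan and Feb as months 13 and 14 of the prior year */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;
	doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + doe - 719468;
}

/* Inverse of days_from_civil. */
void
civil_from_days(long z, long *y, int *m, int *d)
{
	long era, doe, yoe, doy, mp;

	z += 719468;
	era = (z >= 0 ? z : z - 146096) / 146097;
	doe = z - era * 146097;
	yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	mp = (5 * doy + 2) / 153;
	*d = (int)(doy - (153 * mp + 2) / 5 + 1);
	*m = (int)(mp < 10 ? mp + 3 : mp - 9);
	*y = yoe + era * 400 + (*m <= 2);
}

/* Seconds since the epoch of local midnight in standard time;
 * stdoff is the standard offset in seconds east of UTC. */
long
midnight_std(long y, int m, int d, long stdoff)
{
	return days_from_civil(y, m, d) * 86400L - stdoff;
}

/* Hypothetical: dump directory name (YYYY/MMDD) for day i, counting
 * from 2004/0611 and assuming one dump per day. */
int
dumppath(long i, char *buf, size_t n)
{
	long y;
	int m, d;

	civil_from_days(days_from_civil(2004, 6, 11) + i, &y, &m, &d);
	return snprintf(buf, n, "%04ld/%02d%02d", y, m, d);
}
```

A driver loop would run
.I i
from 0 to 1023, hand each directory to updatedb, set the clock from
midnight_std with the site's standard offset (for example, -18000 for
Eastern Standard Time), apply the logs with cphist, and force a dump.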
.SH
Fileserver Changes
.PP
No changes were made to Plato.  Kibbiee's kernel required changes to
PCI enumeration and interrupt handling.  The time functions were
updated to support Daylight Saving Time in any timezone using the
data files from the Plan 9 C library.  Support was added for reading
.CW nvram
from a Plan 9
.CW 9fat
partition by consulting the partition table that
.I 9load (8)
leaves in memory.  This makes it easy to boot a CPU kernel on the
fileserver for maintenance operations such as loading a new kernel,
which cannot easily be done from another machine because the
fileserver boots from a flash disk.
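.PP
The Daylight Saving Time support can be illustrated with a short
sketch in portable C.  It assumes the zone data has already been
parsed into a standard offset, a daylight offset, and a
zero-terminated list of DST start and end times in seconds since the
epoch, roughly the shape of the Plan 9 timezone files.  The structure
and function names are hypothetical; this is not the fileserver's code.

```c
/*
 * Hypothetical parsed time zone.  Offsets are seconds east of UTC;
 * dlpairs holds [start, end) DST intervals in seconds since the
 * epoch, terminated by a zero start time.
 */
struct zone {
	long stdoff;
	long dstoff;
	const long *dlpairs;
};

/* Return the UTC offset in effect at time t. */
long
zoneoffset(const struct zone *z, long t)
{
	const long *p;

	for (p = z->dlpairs; p[0] != 0; p += 2)
		if (t >= p[0] && t < p[1])
			return z->dstoff;	/* t falls inside a DST interval */
	return z->stdoff;
}

/* Convert a UTC time to local wall-clock seconds. */
long
localtime_secs(const struct zone *z, long t)
{
	return t + zoneoffset(z, t);
}
```

A linear scan is adequate here because a zone file lists only a few
transitions per year.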
.PP
In addition, new drivers were written or ported for
the Myricom Lanai z8e 10 gigabit ethernet adapter,
the Intel i82563 PCIe-based gigabit ethernet adapter and
the Marvell 88SX[56]xx hotpluggable SATA controller.
.SH
Device Copy
.PP
Since the Marvell controller supports hotpluggable SATA drives, we are
able to copy the WORM onto drives that are then physically moved to an
offsite backup fileserver.  The process for copying history outlined
here could be used for this.  However, it seems error-prone, and
using allow mode at predictable times presents a security problem.  To
allow for online backups, the
.CW devcopy
command was added.  The following example copies blocks 0 through
183140625 from
.CW [m2m3]
to
.CW [m7m8] :
.IP
.P1
    devcopy start [m2m3] [m7m8] 0 183140625
.P2
.LP
The final two arguments, the first and last block numbers, are
optional.  The command is executed in the background by a single,
dedicated process, and console control returns immediately.  Only one
device copy can be active at a time.  With no arguments,
.CW devcopy
prints its progress.  The arguments
.CW pause
and
.CW resume
act on the most recently started copy.
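.PP
The control flow described above, a single process working through a
block range with pause, resume, and a progress query, can be modeled
in portable C.  This sketch copies between in-memory arrays rather
than fileserver devices; the block size, the treatment of the last
block as inclusive, and all names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { BLKSIZE = 512 };	/* hypothetical block size */

/* Hypothetical state of the one active copy. */
struct devcopy {
	const unsigned char *src;
	unsigned char *dst;
	long first, last;	/* inclusive block range, e.g. 0 and 183140625 */
	long next;		/* next block to copy */
	int paused;
};

void
dcstart(struct devcopy *c, const unsigned char *src, unsigned char *dst,
	long first, long last)
{
	c->src = src;
	c->dst = dst;
	c->first = first;
	c->last = last;
	c->next = first;
	c->paused = 0;
}

void
dcpause(struct devcopy *c)
{
	c->paused = 1;
}

void
dcresume(struct devcopy *c)
{
	c->paused = 0;
}

/* Copy at most n blocks; the dedicated process would call this in a loop.
 * A paused or finished copy does nothing. */
long
dcstep(struct devcopy *c, long n)
{
	long done;

	for (done = 0; done < n && !c->paused && c->next <= c->last; done++) {
		memcpy(c->dst + c->next * BLKSIZE, c->src + c->next * BLKSIZE, BLKSIZE);
		c->next++;
	}
	return done;
}

/* Percentage of the range copied so far. */
int
dcprogress(const struct devcopy *c)
{
	return (int)(100LL * (c->next - c->first) / (c->last - c->first + 1));
}
```

Keeping all state in one structure reflects the restriction that only
one copy may be active, and it lets pause and resume act on the most
recent copy without naming it.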
.PP
The operator is expected either to arrange that no dump is taken
while a device is being copied, or to run an offline process that
erases the inconsistent data at the end of the dump filesystem.
.SH
QID relationships
.PP
For history to work correctly on the new server,
we need to ensure that files are appended to, or deleted and
recreated, exactly as they were on the original.  Inspecting the mode
bits alone is insufficient.  For example, an mbox has the append bit
set, yet each time it is edited the old file is deleted and
a completely new file is written.
.PP
To transfer history correctly, we require that the following
properties hold.  Whenever
two files on Plato have the same
.CW qid.vers ,
they must also have the same
.CW qid.vers
on Kibbiee, although the absolute value on either server is unimportant.
We also require that whenever two files have the same
.CW qid.path
on Plato, they have the same
.CW qid.path
on Kibbiee.
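.PP
These two properties can be stated as a check over files observed on
both servers.  The sketch below is an illustration, not part of
cphist: each hypothetical qpair record holds one file's qid on Plato
and on Kibbiee at the same dump, and the version property is read as
applying to the same file (equal Plato qid.path), which is an
interpretation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one file's qid on Plato and on Kibbiee. */
struct qpair {
	uint64_t ppath, kpath;	/* qid.path on Plato, on Kibbiee */
	uint32_t pvers, kvers;	/* qid.vers on Plato, on Kibbiee */
};

/*
 * Return 1 if equal Plato qid.path values always correspond to equal
 * Kibbiee qid.path values, and equal Plato versions of the same file
 * to equal Kibbiee versions; return 0 otherwise.
 */
int
qidsconsistent(const struct qpair *q, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++) {
			if (q[i].ppath != q[j].ppath)
				continue;	/* different files: no constraint */
			if (q[i].kpath != q[j].kpath)
				return 0;
			if (q[i].pvers == q[j].pvers && q[i].kvers != q[j].kvers)
				return 0;
		}
	return 1;
}
```

The pairwise loop is quadratic, which is fine for spot checks; a full
audit would sort by Plato qid.path first.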
.PP
The new program
.I cphist (1)
does just this.  When a file is listed as changed or created,
its
.CW qid.path
and
.CW qid.vers
are carefully inspected to decide whether the file should be deleted and
recopied or whether only the modified blocks need to be copied.  The
last-modified user is also set.  This is sufficient for all cases except
a rename, in which the old file is always deleted and a new file is
created.  Renames caused Kibbiee's fake WORM to use about 20% more
blocks than Plato's.
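.PP
The per-file decision can be sketched as follows.  This is not
cphist's source: the names are hypothetical, and treating an unchanged
qid.vers as a metadata-only change is an assumption about how the
version is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions for reproducing one changed or created file. */
enum action {
	ACreate,	/* name is new: copy the whole file */
	ARecreate,	/* name now refers to a different file: delete, then copy */
	ABlocks,	/* same file, new version: copy only modified blocks */
	AMeta,		/* same file, same version: update metadata only */
};

/* Plato qid last copied for a given name. */
struct lastqid {
	uint64_t path;
	uint32_t vers;
};

/*
 * prev is the Plato qid previously recorded for this name, or NULL if
 * the name has not been seen; path and vers are its current Plato qid.
 */
enum action
decide(const struct lastqid *prev, uint64_t path, uint32_t vers)
{
	if (prev == NULL)
		return ACreate;
	if (prev->path != path)
		return ARecreate;
	if (prev->vers != vers)
		return ABlocks;
	return AMeta;
}
```

Under this scheme a renamed file appears under a name with no recorded
qid, so it is copied in full, which matches the rename behavior
described above.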
.SH
Issues
.PP
Because of a spam problem in 2006, two directories on Plato had over
2 million entries.  Scanning each of these directories took
approximately 45 minutes, which increased the time needed to transfer
one day's dump from 10 minutes to 100 minutes (10 minutes plus 45 for
each directory).  The problem was overcome by binding an empty
directory over the two large directories before running
.I updatedb (1).
.PP
Two errors were encountered where a non-empty directory had been
deleted and replaced by a file.  They occurred because
.I updatedb (1)
does not emit deletions first.  These errors were handled in an
.I ad-hoc
fashion.
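.PP
One way to avoid this class of error, offered only as a sketch and not
as the fix that was applied, is to reorder each day's log so that
deletions come first, deepest paths first, while all other entries
keep their original order.  The simplified entry structure, with a
verb character of d for a deletion, is an assumption and not the
actual updatedb log format.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified log entry; verb 'd' marks a deletion. */
struct entry {
	char verb;
	const char *path;
	size_t seq;	/* original position, used to keep the sort stable */
};

/* Number of '/' separators in a path. */
static int
depth(const char *p)
{
	int n;

	for (n = 0; (p = strchr(p, '/')) != NULL; p++)
		n++;
	return n;
}

static int
cmpentry(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int da = a->verb == 'd', db = b->verb == 'd';

	if (da != db)
		return db - da;	/* deletions sort before everything else */
	if (da && depth(a->path) != depth(b->path))
		return depth(b->path) - depth(a->path);	/* children before parents */
	return (a->seq > b->seq) - (a->seq < b->seq);	/* otherwise keep log order */
}

/* Reorder a day's entries so that deletions are applied first. */
void
deletesfirst(struct entry *e, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		e[i].seq = i;
	qsort(e, n, sizeof e[0], cmpentry);
}
```

The seq field is needed because qsort is not stable.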
.PP
After Kibbiee became the active filesystem, it was discovered that
a missing line in the fileserver kernel's 9P code had prevented the
.CW muid
from being set.  As a result,
.I history (1)
lists incorrect users.  Since
.CW adm/users
changed while history was being loaded, the user name to which the
.CW uid
of the loading channel mapped also changed partway through the load.
