Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


.TH MKTAGS 1
.SH NAME
mktags, looktags, tagfiles, rdtrie, qhash, tagfs \- file indexing and searching tools
.SH SYNOPSIS
[
.B DB=
.I dbpath
]
.B looktags
[
.B -n
]
.I tag ...
.PP
.B mktags
[
.B -d
]
.I dbpath
.I file ...
.PP
.B tagfiles
[
.B -d
]
.I triepath
.I file ...
.PP
.B rdtrie
.I triepath
[
.I tag ...
]
.PP
.B qhash
[
.B -dv
]
.I hashpath
[
.I qid ...
]
.PP
.B qhash
[
.B -dv
]
.B -a
.I hashpath
[
.I qid
.I path ...
]
.PP
.B qhash
[
.B -dv
]
.B -c
.I hashpath
.I file ...
.PP
.B tagfs
[
.B -abcD
]
[
.B -s
.I srv
]
[
.B -m
.I mnt
]
.I triepath
.SH DESCRIPTION
These tools index files by content and perform word searches
on the resulting databases.
The first two programs,
.B mktags
and
.BR looktags ,
are Rc scripts that provide the primary user interface.
The other programs do the actual indexing and searching.
.PP
.B Mktags
creates a database named
.I dbpath
that maps tags (words) to file names. Only the given
.I files
are indexed (directories are indexed recursively). Every word in a file's
path name and, for most files, every word contained in the file is a valid
search tag for the file.
A database is made of two files: a trie and a hash table. The
name of the trie has the suffix
.B .trie.db
and the name of the hash table has the suffix
.BR .hash.db .
The path of the database files without these suffixes is the name
of the database.
.PP
By convention, there is a system-wide database at
.B /lib/sys
(that is,
.B /lib/sys.trie.db
and
.BR /lib/sys.hash.db )
and a per-user database at
.B $home/lib/$user
(that is,
.B $home/lib/$user.trie.db
and
.BR $home/lib/$user.hash.db ).
.PP
.B Looktags
searches the system and the user databases for files that match
the query specified by its arguments. By default, only file names
are printed. Flag
.B -n
instructs
.B looktags
to run
.IR grep (1)
to print some of the matching lines.
.PP
A query is a sequence of tag lists separated by the "\fB:\fP" character,
with each tag and each "\fB:\fP" given as a distinct argument. A file matches
the query if it is associated with (contains) all the tags in at least one
of the lists. For example,
.EX
	looktags a b c : d e
.EE
would search for files either matching all of
.BR a ,
.BR b ,
and
.BR c
or
matching all of
.B d
and
.BR e .
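.PP
The matching rule can be illustrated with a short C sketch.
It is not the code used by
.B looktags
or
.BR rdtrie :
the functions \fIhas_tag\fP and \fImatches\fP are hypothetical,
a file's tags are modeled as a plain string array instead of a trie lookup,
and treating an empty list as matching nothing is an assumption.
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a trie lookup: a file's tags are
   modeled as a NULL-terminated array of strings. */
static int
has_tag(char **filetags, const char *tag)
{
	for(; *filetags != NULL; filetags++)
		if(strcmp(*filetags, tag) == 0)
			return 1;
	return 0;
}

/* argv holds the query: tags, with ":" separating the lists.
   The file matches if it has every tag of at least one list,
   that is, an OR of ANDs.  An empty list matches nothing
   (an assumption; the manual does not say). */
static int
matches(char **filetags, int argc, char **argv)
{
	int i, all, n;

	all = 1;	/* every tag of the current list found so far */
	n = 0;	/* number of tags in the current list */
	for(i = 0; i < argc; i++){
		if(strcmp(argv[i], ":") == 0){
			if(all && n > 0)
				return 1;
			all = 1;
			n = 0;
			continue;
		}
		n++;
		if(!has_tag(filetags, argv[i]))
			all = 0;
	}
	return all && n > 0;
}
```
.EE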
.PP
.B Looktags
can be instructed to use a different database by
defining the
.B DB
environment variable to contain a list of names for the
databases to be used (without any file name suffixes).
.PP
To speed up searches, the trie part of the database can be
kept in memory using
.BR tagfs .
When using a database named
.BI / a / b / dbname ,
.B looktags
first looks for a file named
.BI /srv/ dbname .tagfs
(a server holding an in-memory copy of the trie part of the database)
and uses it if available. Otherwise,
.B looktags
looks for the host identified by
.B $search
in the
.IR ndb (6)
database. If that host is found,
.B looktags
imports its
.B /srv
and looks for
.BI /srv/ dbname .tagfs
there. This lets several machines on a network share one in-memory database.
Only as a last resort does
.B looktags
read the database itself to execute the query.
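.PP
The name of the srv file tried first can be derived from the database name,
as in the C sketch below: the last path element plus
.B .tagfs
under
.BR /srv .
This only illustrates the naming rule stated above;
\fIsrvname\fP is a hypothetical helper, since
.B looktags
itself is an Rc script.
.EX
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the srv file name looktags tries first for a database
   named dbpath, e.g. /a/b/dbname -> /srv/dbname.tagfs.
   Returns 0 on success, -1 if buf (of size len) is too small. */
static int
srvname(char *buf, size_t len, const char *dbpath)
{
	const char *base;
	int n;

	base = strrchr(dbpath, '/');
	base = base != NULL ? base + 1 : dbpath;	/* last path element */
	n = snprintf(buf, len, "/srv/%s.tagfs", base);
	if(n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```
.EE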
.PP
.B Tagfiles
tags every
.I file
given as an argument (recursing into directories), using the trie stored in
the file
.IR triepath .
Here,
.I triepath
must include the
.B .trie.db
suffix, if any.
.B Mktags
relies on this program.
.PP
For each file indexed,
.B tagfiles
uses every word in its path name as a tag to search for the file.
It also looks at the file name suffix and uses
.IR file (1)
to determine the type of the file and pick an indexing method.
For text files,
.B tagfiles
reads the entire file and associates each word it contains
with the file as a search tag. For other types of file,
.B tagfiles
tries to execute external programs to extract the list of tags
for each file. If the appropriate external program does not exist,
.B tagfiles
still tries to index the file as text when appropriate.
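.PP
Splitting a path name into tag words might look like the following C sketch.
This manual does not define what a word is, so the rule used here
(a maximal run of letters and digits) is an assumption, and
\fIpathwords\fP and \fIWORDLEN\fP are hypothetical names.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { WORDLEN = 64 };

/* Split path into words, assuming a word is a maximal run of
   letters and digits.  At most max words are stored, each cut
   to WORDLEN-1 bytes; returns the number of words stored. */
static int
pathwords(const char *path, char words[][WORDLEN], int max)
{
	int n, len;

	n = 0;
	while(*path != 0 && n < max){
		/* skip separators such as / . - _ */
		while(*path != 0 && !isalnum((unsigned char)*path))
			path++;
		len = 0;
		while(isalnum((unsigned char)*path)){
			if(len < WORDLEN - 1)
				words[n][len++] = *path;
			path++;
		}
		if(len > 0)
			words[n++][len] = 0;
	}
	return n;
}
```
.EE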
.PP
The following programs may be executed by
.B tagfiles
to obtain tags for files. They are expected to write tags
for the file given as an argument, one per line:
.TF taglimbo
.TP
.B tagc
to tag C source.
.TP
.B taglimbo
to tag Limbo source.
.TP
.B taghtml
to tag HTML files.
.TP
.B tagman
to tag manual pages.
.TP
.B tagrc
to tag Rc scripts.
.TP
.B tagtroff
to tag roff source.
.TP
.B tagdoc
to tag Microsoft Office documents, including rich text format.
.TP
.B tagpdf
to tag Adobe PDF files.
.TP
.B tageps
to tag Adobe EPS files.
.TP
.B tagps
to tag PostScript files.
.PP
.B Rdtrie
can be used to inspect and query the trie in a database. The trie keeps
all the known tags and maintains, for each tag, the list of Qids of the
files associated with it.
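.PP
As an illustration of the data structure only, the following C sketch keeps
tags in an in-memory trie with a list of Qid.path values per tag.
It does not reproduce the
.B .trie.db
file format, which is not described here; the type \fITrie\fP and the
functions \fItrienew\fP, \fItrieadd\fP, and \fItriefind\fP are hypothetical.
.EX
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per tag prefix; a node that ends a tag holds a
   growable array of the Qid.path values carrying that tag. */
typedef struct Trie Trie;
struct Trie {
	Trie	*kid[256];	/* one child per byte value */
	uint64_t	*qids;	/* qids of files carrying this tag */
	int	nqid;
	int	cap;
};

static Trie*
trienew(void)
{
	return calloc(1, sizeof(Trie));
}

/* Associate qid with tag; returns 0, or -1 if out of memory. */
static int
trieadd(Trie *t, const char *tag, uint64_t qid)
{
	const unsigned char *p;
	uint64_t *q;
	int i, ncap;

	for(p = (const unsigned char*)tag; *p != 0; p++){
		if(t->kid[*p] == NULL && (t->kid[*p] = trienew()) == NULL)
			return -1;
		t = t->kid[*p];
	}
	for(i = 0; i < t->nqid; i++)
		if(t->qids[i] == qid)
			return 0;	/* already associated */
	if(t->nqid == t->cap){
		ncap = t->cap ? 2*t->cap : 4;
		q = realloc(t->qids, ncap * sizeof t->qids[0]);
		if(q == NULL)
			return -1;
		t->qids = q;
		t->cap = ncap;
	}
	t->qids[t->nqid++] = qid;
	return 0;
}

/* Return the node for tag, or NULL if no tag has this prefix;
   a node with nqid == 0 is only a prefix of longer tags. */
static Trie*
triefind(Trie *t, const char *tag)
{
	const unsigned char *p;

	for(p = (const unsigned char*)tag; *p != 0 && t != NULL; p++)
		t = t->kid[*p];
	return t;
}
```
.EE
A 256-way child array keeps lookups simple at the cost of memory;
a real implementation would likely use a more compact node layout.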
.PP
Without any
.I tag
argument on the command line,
.B rdtrie
reads and prints the entire trie file,
.IR triepath .
Otherwise,
.B rdtrie
reads
.I triepath
and then
interprets the remaining arguments as a query. The Qids matching
the query are printed on standard output. See above for the syntax
of queries.
.B Looktags
relies on this program to execute its query.
.PP
.B Qhash
maintains the file name hash table of a database. This data structure is used to
translate Qids into file names. In what follows, Qid actually means the
.B Qid.path
field of the file's Qid, written in base 16.
Also, the argument
.I hashpath
is mandatory and must be the path of the hash file in the database,
including the
.B .hash.db
suffix.
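.PP
The translation from Qids to file names can be sketched in C as a chained
hash table keyed by Qid.path, with Qids parsed from base 16 as
.B qhash
expects. The table size \fINHASH\fP and the names \fIparseqid\fP,
\fIhashadd\fP, and \fIhashlook\fP are hypothetical, the on-disk
.B .hash.db
layout is not modeled, and replacing the path of an existing Qid is
an assumption about what an update should do.
.EX
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { NHASH = 1024 };	/* hypothetical table size */

typedef struct Entry Entry;
struct Entry {
	uint64_t	qid;
	char	*path;
	Entry	*next;	/* next entry in the same bucket */
};

static Entry *tab[NHASH];

/* Parse a Qid given in base 16; returns -1 if s is not hex. */
static int
parseqid(const char *s, uint64_t *qid)
{
	char *end;

	if(*s == 0)
		return -1;
	*qid = strtoull(s, &end, 16);
	return *end == 0 ? 0 : -1;
}

/* Add qid with path, replacing any previous path for qid.
   Returns 0, or -1 if out of memory. */
static int
hashadd(uint64_t qid, const char *path)
{
	Entry *e;
	char *p;
	size_t n;

	n = strlen(path) + 1;
	if((p = malloc(n)) == NULL)
		return -1;
	memcpy(p, path, n);
	for(e = tab[qid % NHASH]; e != NULL; e = e->next)
		if(e->qid == qid){
			free(e->path);
			e->path = p;
			return 0;
		}
	if((e = malloc(sizeof *e)) == NULL){
		free(p);
		return -1;
	}
	e->qid = qid;
	e->path = p;
	e->next = tab[qid % NHASH];
	tab[qid % NHASH] = e;
	return 0;
}

/* Return the path recorded for qid, or NULL if there is none. */
static const char*
hashlook(uint64_t qid)
{
	Entry *e;

	for(e = tab[qid % NHASH]; e != NULL; e = e->next)
		if(e->qid == qid)
			return e->path;
	return NULL;
}
```
.EE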
.PP
The first invocation syntax (without the
.B -a
or
.B -c
flags) retrieves the path names for the
.I qids
given on the command line. This is used by
.B looktags
to retrieve the paths of matching files.
.PP
With flag
.BR -a ,
.B qhash
adds the remaining arguments, taken as pairs of a
.I qid
and a
.IR path ,
to the
.I hashpath
file.
.PP
With flag
.BR -c ,
.B qhash
retrieves the Qids and absolute path names of the
.IR file (s)
given as arguments (recursing into directories) and adds
them to the hash file. This is used by
.B mktags
to create or update the hash file of a database.
.PP
In all of the programs above, the flags
.B -d
and
.B -v
(where available)
enable debug messages that help track problems while using the programs.
.PP
The program
.B tagfs
can be used to update a trie, and
is an alternative to
.B rdtrie
for searching that keeps the entire trie in memory. It is a file
system that by default serves the pipe at
.BI /srv/ triename .tagfs
(where
.I triename
is the base name of
.I triepath
without suffixes) and mounts itself at
.BR /mnt/tags .
Flags
.B -s
and
.B -m
instruct
.B tagfs
to serve
.I srv
or to mount itself at
.I mnt
instead.
.PP
The single directory served contains a
.B ctl
file that can be read to gather statistics about the trie and written to
modify it. Writing the string
.B sync
writes the in-memory database back to its file. A write of the form
.BI "tag " "qidpath tag ..."
associates each
.I tag
with
.I qidpath
in the trie (but does not update the on-disk database).
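.PP
A parser for the two ctl messages above might look like this C sketch.
It assumes fields are separated by white space, so the newline added by
.B echo
is ignored; \fItokenize\fP, \fIparsectl\fP, and the constants
\fICsync\fP, \fICtag\fP, and \fICbad\fP are hypothetical and are not taken
from the
.B tagfs
source.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { Cbad = -1, Csync, Ctag };

/* Split buf in place into fields separated by white space;
   stores at most max fields in f (extra text is ignored)
   and returns the number stored. */
static int
tokenize(char *buf, char **f, int max)
{
	int n;

	n = 0;
	for(;;){
		while(isspace((unsigned char)*buf))
			*buf++ = 0;	/* terminate the previous field */
		if(*buf == 0 || n == max)
			return n;
		f[n++] = buf;
		while(*buf != 0 && !isspace((unsigned char)*buf))
			buf++;
	}
}

/* Classify a ctl write: "sync", or "tag qidpath tag ...".
   For a tag message, f[1] is the qid and f[2] .. f[*nf-1]
   are the tags to add. */
static int
parsectl(char *buf, char **f, int max, int *nf)
{
	int n;

	n = tokenize(buf, f, max);
	*nf = n;
	if(n == 1 && strcmp(f[0], "sync") == 0)
		return Csync;
	if(n >= 3 && strcmp(f[0], "tag") == 0)
		return Ctag;
	return Cbad;
}
```
.EE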
.PP
A query is made by creating a file in that directory, writing the query into it
(taking care to separate tags and
.B :
characters with white space), and then reading the list of qids that match
the query from the same file. The query file is removed as soon
as it is closed after having been read.
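.PP
A client composing such a query might first make sure each
.B :
stands apart from the neighboring tags, as in the C sketch below.
The helper \fIspacequery\fP is hypothetical, and it assumes that extra
blanks in a query are harmless because fields are split on white space.
.EX
```c
#include <assert.h>
#include <string.h>

/* Copy query q into out, surrounding every ":" with blanks;
   for example "a b:c" becomes "a b : c".
   Returns 0, or -1 if out (of size len) is too small. */
static int
spacequery(const char *q, char *out, size_t len)
{
	size_t n;

	if(len == 0)
		return -1;
	n = 0;
	for(; *q != 0; q++){
		if(*q == ':'){
			if(n + 3 >= len)	/* keep room for the final NUL */
				return -1;
			out[n++] = ' ';
			out[n++] = ':';
			out[n++] = ' ';
		}else{
			if(n + 1 >= len)
				return -1;
			out[n++] = *q;
		}
	}
	out[n] = 0;
	return 0;
}
```
.EE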
.SH EXAMPLES
Create the per-user and the system databases:
.EX
	; mktags $home/lib/$user  $home /mail/box/$user/msgs
	; mktags /lib/sys /cfg  /rc /sys
.EE
.PP
Look for files mentioning both list and append, or both queue and append, then
repeat the query using an alternate database kept at
.B /lib/other.trie.db
and
.BR /lib/other.hash.db :
.EX
	; looktags list append : queue append
	; DB=/lib/other looktags list append : queue append
.EE
.PP
Add (or update!) tags for files under
.B /usr/prof
to the personal database:
.EX
	; tagfiles $home/lib/$user.trie.db /usr/prof
	; qhash -c $home/lib/$user.hash.db /usr/prof
.EE
.PP
Place the system database in memory so that
.B looktags
can be faster, add the tag
.B yoyoba
to the file with qid
.BR 8345f ,
and write the change back to disk:
.EX
	; tagfs /lib/sys.trie.db
	; echo tag 8345f yoyoba >/mnt/tags/ctl
	; echo sync >/mnt/tags/ctl
.EE
.PP
Make the system database at
.B whale.lsub.org
available to other hosts. First,
edit
.B /lib/ndb/local
so that the network entry contains
.BR search=whale.lsub.org .
Then, on whale, run:
.EX
	whale% tagfs /lib/sys.trie.db
	whale% chmod a+rw /srv/sys.tagfs
.EE
Now from other hosts,
.B looktags
may use Whale's in-memory database.
.SH FILES
.TP
.B /sys/src/cmd/tags/updatetags
Example script to update the user database each minute.
.TP
.B /lib/sys.{trie,hash}.db
System-wide database files.
.TP
.B $home/lib/$user.{trie,hash}.db
Per user database files.
.SH SOURCE
.B /sys/src/cmd/tags
.SH "SEE ALSO"
.IR file (1),
.IR grep (1),
.IR ndb (6)
.SH BUGS
There is no clear way to remove tags from a file.
The database is expected to be updated daily (at night)
to reflect changes during the day, and
.B tagfs
has to be restarted to see the effects.


