This tutorial is for users who are starting to use Namazu 2.0.
This tutorial is written for
in order to reduce the workload when using Namazu. Please refer to the manual to learn about all the features of Namazu. The installation guide is given in the INSTALL file.
The history of Namazu development from 1.3.0.x through 2.0 is as follows.
Namazu consists of three major components: mknmz, namazu, and namazu.cgi.
You need the following software to build Namazu 2.0.
|Name||Description||Status||Current Version||Required Version||File name||Development and Distribution||Sources(Example)||Others|
|Perl||Perl Language||Required||5.8.8||>= 5.004||perl5.005_03.tar.gz||Larry Wall GNU CPAN||CPAN|
|make||maintain groups of programs||3.81||make-3.81.tar.gz||FSF||GNU||Required when Namazu cannot be built with the make bundled with the system.|
|gettext||translate message||Required only for multi-language messages.||0.14.6||>= 0.13.1||gettext-0.14.6.tar.gz||FSF||GNU||Indispensable on Solaris.|
|nkf||Network Kanji Filter||for Japanese processing only||2.0.7||>= 1.71||nkf207.tar.gz||
|nkf_utf8||avoid using version 1.90, 1.92, 2.0.0 - 2.0.3 (See notes)|
|NKF||nkf Perl Module||for Japanese processing only. ++||2.0.7||>= 1.71|
|KAKASI||Japanese/Romaji Conversion||for Japanese processing only. **||2.3.4||>= 2.x||kakasi-2.3.4.tar.gz||KAKASI Project||namazu.org|
|Text::Kakasi||KAKASI Perl Module||for Japanese processing only. ++||2.04||>= 1.05||Text-Kakasi-2.04.tar.gz||NOKUBI Takatsugu
|ChaSen||(ChaSen) -- Japanese Morphology Analyzer||for Japanese processing only. **||2.3.3||>= 2.0x||chasen-2.3.3.tar.gz||Nara Institute of Science and Technology||Distribution Policy||For libchasen.a in ChaSen 2.02 or earlier, refer below.|
|Text::ChaSen||ChaSen Perl Module||for Japanese processing only. ++||1.03||<=||Text-ChaSen-1.03.tar.gz||NOKUBI Takatsugu||Text::ChaSen|
|MeCab||Yet Another Japanese Morphology Analyzer||for Japanese processing only. **||0.93||>= 0.6||mecab-0.93.tar.gz||Taku Kudo||MeCab||Supported from Namazu 2.0.15 (MeCab 0.90 and later are supported from Namazu 2.0.16).|
|mecab-perl||MeCab Perl Module||for Japanese processing only. ++||0.93||>= 0.76||mecab-perl-0.93.tar.gz||Taku Kudo||MeCab||Supported from Namazu 2.0.15 (MeCab 0.90 and later are supported from Namazu 2.0.16).|
|File::MMagic||File Type||Included||1.27||>= 1.20||File-MMagic-1.27.tar.gz||NOKUBI Takatsugu||CPAN dist||This is packaged in Namazu distribution.|
perl Makefile.PL; make; make install. We recommend installing the Perl modules unless you have particular difficulty doing so.
(Notes listed below are for Japanese processing only.)
|If you have everything ...||For segmentation, KAKASI is used by default. ChaSen can be selected with the -c option, and MeCab with the -b option.|
|If you have one or more ...||When ./configure is executed, Namazu selects which one to use. (KAKASI can be selected with the -k option, ChaSen with the -c option, and MeCab with the -b option.)|
With ChaSen 2.02 or earlier, make install does not install /usr/local/lib/libchasen.a automatically. So, to build the Perl ChaSen module, you will need to do the following manually:
cp libchasen.a /usr/local/lib
ranlib /usr/local/lib/libchasen.a # depending on your system
Since 2.0.6, the handling of environment variables has changed. In addition, a new command-line option was added to mknmz.
To use Namazu 2.0 in a Japanese environment, you may need to set environment variables for language selection.
With 2.0.5 (or earlier), the same environment variables were used to switch both message translation and internal text processing.
With 2.0.6, we changed this as follows.
A typical way to process Japanese is to set the following values, depending on your system environment.
The actual command to set the values shown above depends on your shell:
|C shell||Bourne shell etc|
With the above example, the value ja is set for LANG,
and all processing will be done for Japanese.
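As a minimal sketch for Bourne-type shells, setting LANG to ja as described above could look like the following. The csh/tcsh form is given only as a comment, and whether plain ja is accepted or a longer locale name is needed depends on your system.

```shell
# Bourne-type shells (sh, bash, ksh): set LANG for the rest of the session.
# csh/tcsh users would write instead:  setenv LANG ja
LANG=ja
export LANG

# Show the value that child processes (such as mknmz) will inherit.
echo "LANG is now: $LANG"
```

Because the variable is exported, any command started afterwards from the same shell sees the same value.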
Some systems may require
instead of just
If the variables are not set properly when mknmz is executed, the resulting index files will be malformed. For example, NMZ.w is supposed to contain one (Japanese) word per line; if you browse it and instead find long, unsegmented sentences on each line, the language settings were not applied. In that case, namazu or namazu.cgi will not show correct results.
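One rough way to spot the symptom described above is to look at the longest line in the word file. The sketch below runs on a small stand-in file created just for illustration (it is not a real NMZ.w); to inspect a real index you would point awk at the NMZ.w file instead. No threshold is implied here, and awk may count bytes or characters depending on the locale.

```shell
# Create a stand-in word list; the last line imitates an unsegmented sentence.
sample=$(mktemp)
printf '%s\n' 'namazu' 'index' 'this whole sentence was never split into words' > "$sample"

# Report the length of the longest line in the file.
longest=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$sample")
echo "longest line: $longest"

# Remove the stand-in file.
rm -f "$sample"
```

A word file whose longest line is far longer than any plausible word suggests that segmentation did not take place.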
Since 2.0.6, the
--indexing-lang=LANG option has
been added to the mknmz command.
You can specify the language processing type with this option
(a command-line option overrides the environment variables).
If you wish to test
cd namazu-2.0.x ( ... where you have unpacked *.tar.gz)
env pkgdatadir=`pwd` scripts/mknmz (for csh/tcsh)
pkgdatadir=. scripts/mknmz (for sh/bash)
These will refer to the adjacent
pl, filter, template, etc., not the existing files under
(To know more about this, see $PKGDATADIR variable in mknmz etc.)
You may try the following examples first to see the configuration, display the help, and generate an index for ~/Mail, respectively.
./mknmz -C
./mknmz --help
./mknmz -O /tmp ~/Mail
If you just type mknmz
with no argument, a short usage will be displayed. If you
give --help as an argument, a long usage will
be displayed. The
-C option will display the
current configuration. It is useful to remember these three.
|None||Short Usage||Cannot add any argument|
|--help||Long Usage||Ignores other arguments|
|-C||Configurations||Other arguments will have meanings.|
First, create an index.
(If you wish to run mknmz before
make install, please see the section on running
mknmz before make install.)
The format has changed slightly from versions 126.96.36.199. URI replacement is handled by specifying the --replace option. URI replacement can also be done when namazu/namazu.cgi runs. In that case, run mknmz without the --replace option, and set up .namazurc so that the replacement is performed during namazu/namazu.cgi execution.
Run mknmz as follows.
mknmz [options] target directory
The above example creates the index in the current directory.
Use the -O option to specify the output directory.
mkdir /tmp/index
mknmz -O /tmp/index \
  --replace='s#/foo/bar/doc/#http://foo.bar.jp/software/#' \
  /foo/bar/doc
mknmz will output the following messages while creating the index. If you wish to display messages in Japanese, please refer to Japanese Environment.
14 files are found to be indexed.
1/14 - /foo/bar/acrobat3.pdf [application/pdf]
2/14 - /foo/bar/excel97.xls [application/excel]
3/14 - /foo/bar/html.html [text/html]
4/14 - /foo/bar/mail-multipart.txt [message/rfc822]
5/14 - /foo/bar/mail.txt [message/rfc822]
6/14 - /foo/bar/man.1 [text/x-roff]
7/14 - /foo/bar/msg00000.html [text/html; x-type=mhonarc]
8/14 - /foo/bar/plain.txt [text/plain]
9/14 - /foo/bar/plain.txt.Z [text/plain]
10/14 - /foo/bar/plain.txt.bz2 [text/plain]
11/14 - /foo/bar/plain.txt.gz [text/plain]
12/14 - /foo/bar/rfc0000.txt [text/plain; x-type=rfc]
13/14 - /foo/bar/tex.tex [application/x-tex]
14/14 - /foo/bar/word97.doc [application/msword]
Writing index files...
[Base]
Date: Thu Mar 16 22:14:01 2000
Added Documents: 14
Size (bytes): 58,701
Total Documents: 14
Added Keywords: 95
Total Keywords: 95
Wakati: module_kakasi -ieuc -oeuc -w
Time (sec): 14
File/Sec: 1.00
System: linux
Perl: 5.00503
Namazu: 2.0.X
The --replace option means "documents under
/foo/bar/doc/ will appear as
http://foo.bar.jp/software/, so please perform a replacement like s#aaa#bbb#, as written in Perl."
(In this example, aaa corresponds to /foo/bar/doc/ and bbb corresponds to http://foo.bar.jp/software/.)
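To see what the substitution expression does to a single path, the same s#aaa#bbb# pattern can be tried with sed. This is only an illustration of the expression itself, not of mknmz; the file name manual/index.html is a made-up example.

```shell
# A hypothetical document path under the indexed directory.
doc_path='/foo/bar/doc/manual/index.html'

# Apply the same substitution that was passed to --replace.
uri=$(printf '%s\n' "$doc_path" | sed 's#/foo/bar/doc/#http://foo.bar.jp/software/#')

echo "$uri"
```

Using # as the delimiter avoids having to escape the slashes that appear in both the path and the URI.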
Namazu was originally developed for processing HTML documents, but it can now deal with various document formats. You will find useful scripts in /usr/local/share/namazu/filter, and a detailed explanation is given in Document filters in the Namazu manual.
% mknmz ~/Mail/foobar
For the mknmz command-line arguments, you can get usage information from mknmz --help. With the -C option, you get the current configuration.
Loaded rcfile: /home/foobar/.mknmzrc
System: linux
Namazu: 2.0.X
Perl: 5.00503
File-MMagic: 1.27
NKF: module_nkf
KAKASI: module_kakasi -ieuc -oeuc -w
ChaSen: module_chasen -i e -j -F "%m "
MeCab: module_mecab -Owakati -b 8192
Wakati: module_kakasi -ieuc -oeuc -w
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types: (42)
Unsupported media types: (2) marked with minus (-) probably missing application in your $path.
  application/excel: excel.pl
  application/gnumeric: gnumeric.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
  application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
  application/msword: msword.pl
  application/pdf: pdf.pl
  application/postscript: postscript.pl
  application/powerpoint: powerpoint.pl
  application/rtf: rtf.pl
  application/vnd.kde.kivio: koffice.pl
  application/vnd.kde.kpresenter: koffice.pl
  application/vnd.kde.kspread: koffice.pl
  application/vnd.kde.kword: koffice.pl
  application/vnd.oasis.opendocument.graphics: ooo.pl
  application/vnd.oasis.opendocument.presentation: ooo.pl
  application/vnd.oasis.opendocument.spreadsheet: ooo.pl
  application/vnd.oasis.opendocument.text: ooo.pl
  application/vnd.sun.xml.calc: ooo.pl
  application/vnd.sun.xml.draw: ooo.pl
  application/vnd.sun.xml.impress: ooo.pl
  application/vnd.sun.xml.writer: ooo.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
  application/x-js-taro: taro7_10.pl
  application/x-rpm: rpm.pl
  application/x-tex: tex.pl
  application/x-zip: zip.pl
  audio/mpeg: mp3.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/html; x-type=pipermail: pipermail.pl
  text/plain
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl
|short name||long name||description|
|-F||--target-list=FILE||read the list of target files for index creation from FILE|
|-t||--media-type=MTYPE||specify the document format of the target files|
|(none)||--allow=PATTERN||specify a regular expression for file names to index|
|(none)||--deny=PATTERN||specify a regular expression for file names to exclude|
|(none)||--exclude=PATTERN||specify a regular expression for path names to exclude|
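Before handing patterns to mknmz, it can help to preview which files an allow/deny pair would keep. The sketch below does this with find and grep on a throwaway directory and writes the result to a list file of the kind that -F reads. Everything in it (directory, file names, patterns) is made up for illustration, grep -E only approximates Perl regular expressions, and the exact way mknmz combines the two options is an assumption here.

```shell
# Build a throwaway directory with three empty, hypothetical files.
workdir=$(mktemp -d)
mkdir -p "$workdir/doc"
: > "$workdir/doc/a.html"
: > "$workdir/doc/b.txt"
: > "$workdir/doc/c.html.bak"

# Hypothetical patterns: keep names containing ".html", drop names ending in ".bak".
allow='\.html'
deny='\.bak$'

# List every file, keep the allowed ones, then remove the denied ones.
find "$workdir/doc" -type f | sort > "$workdir/all.txt"
grep -E "$allow" "$workdir/all.txt" | grep -Ev "$deny" > "$workdir/targets.txt"

# Show just the file names that survived both filters.
selected=$(while read -r f; do basename "$f"; done < "$workdir/targets.txt")
echo "$selected"

# Where Namazu is installed, the list could then be passed on:
#   mknmz -F "$workdir/targets.txt"

# Remove the throwaway directory.
rm -rf "$workdir"
```

Building the list yourself and passing it with -F keeps the selection logic visible and easy to check.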
To search documents, do
% namazu query index
If you omit index, namazu will use
/usr/local/var/namazu/index as the target.
Settings for the
namazu command are made in
An example of namazurc can be found in
/usr/local/etc/namazu/namazurc-sample in Namazu
To use the CGI on the web, you need to configure several things. For Apache (Configuration):
|ScriptAlias||/cgi-bin/ /usr/local/apache/cgi-bin/||directory alias to /cgi-bin/ in URI|
|AddHandler||cgi-script .cgi||execute cgi for files ending with ".cgi"|
|DirectoryIndex||index.html||file name to display when specifying directory in URI|
.htaccess can be used for the configurations other than the ones
marked (Web administrator). (Note that these
settings may be forbidden by the Apache configuration.)
What is written here is not a guarantee. It just introduces the advanced usage that the developers have in mind.
        (Preparation)  (Search display)
            mknmz           namazu
        ^            |   ^          |
        |            v   |          v
Original Document    Index    Search Result
Namazu prepares an index of words prior to the search request, and upon request, Namazu searches the documents based on the prepared index. This "prepared index" is called the index. In Namazu, the NMZ.* files are the index.
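The two-step idea of preparing an index first and searching it afterwards can be sketched with a toy word list. This is not Namazu's index format or algorithm; the files, the toy.index name, and the lookup function are all invented for illustration.

```shell
# Three tiny, hypothetical documents.
workdir=$(mktemp -d)
printf 'namazu is a search engine\n' > "$workdir/a.txt"
printf 'mknmz builds the index\n' > "$workdir/b.txt"
printf 'namazu reads the index\n' > "$workdir/c.txt"

# Preparation (the role mknmz plays): one "word<TAB>file" line per word.
for f in "$workdir"/*.txt; do
    tr -s ' ' '\n' < "$f" | awk -v file="$(basename "$f")" 'NF { print $0 "\t" file }'
done | sort -u > "$workdir/toy.index"

# Search (the role namazu plays): consult only the prepared index.
lookup() {
    awk -F '\t' -v w="$1" '$1 == w { print $2 }' "$workdir/toy.index"
}

# Find the documents that contain the word "index".
hits=$(lookup index | paste -sd ' ' -)
echo "index: $hits"

# Remove the throwaway directory.
rm -rf "$workdir"
```

The search step never opens the original documents, which is why searching stays fast once the index has been prepared.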
Index, Replace, Logging, Lang, Template
For further details, see the manual.
perl -MText::Kakasi -e ''
perl -MText::ChaSen -e ''
perl -MMeCab -e ''
perl -MNKF -e ''
If nothing is displayed, the Perl modules are available. If you then run ./configure in namazu, these Perl modules will be used.
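The same four checks can be run in one loop that reports each module by name. This is a small convenience sketch; it relies on perl exiting with a non-zero status when a module cannot be loaded, and it reports a module as missing if perl itself is not in the PATH.

```shell
# Try to load each module and collect one status line per module.
report=""
for module in Text::Kakasi Text::ChaSen MeCab NKF; do
    if perl -M"$module" -e '' >/dev/null 2>&1; then
        status="available"
    else
        status="missing"
    fi
    report="${report}${module}: ${status}
"
done

# Print the collected report.
printf '%s' "$report"
```

Modules reported as missing can be installed before running ./configure so that they are picked up.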