The MeCab full-text parser plugin, introduced in MySQL 5.7.6, is a word-based parser for Japanese that tokenizes documents into meaningful words. For example, MeCab tokenizes データベース管理 into データベース and 管理. The MeCab full-text parser plugin can be installed as an alternative to the built-in ngram full-text parser, which is a character-based parser that also supports Japanese.
The parsing properties of the built-in full-text parser,
described in Section 12.9, “Full-Text Search Functions”, also apply to
the InnoDB MeCab parser plugin, except for
differences documented in this section or otherwise noted. All
InnoDB full-text related variables are also
applicable to the MeCab parser plugin.
The MeCab parser plugin requires mecab and
mecab-ipadic. Both are packaged with MySQL
binaries and are found in
MYSQL_HOME/lib/mecab.
If you do not want to use the mecab and
mecab-ipadic packages distributed with MySQL
binaries, you can install mecab and
mecab-ipadic using a native package
management utility (on Fedora, Debian, and Ubuntu), or you can
build mecab and
mecab-ipadic from source. For information
about installing mecab and
mecab-ipadic using a native package
management utility, see
Installing MeCab From a
Binary Distribution (Optional). If you want to build
mecab and mecab-ipadic
from source, see
Building MeCab From
Source (Optional).
To install and configure the MeCab parser plugin, perform the following steps:
In the MySQL configuration file, set the
mecab_rc_file configuration
option to the location of the mecabrc
configuration file, which is the configuration file for
MeCab. If you are using the MeCab package distributed with
MySQL, the mecabrc file is located in
MYSQL_HOME/lib/mecab/etc/.
[mysqld]
loose-mecab-rc-file=MYSQL_HOME/lib/mecab/etc/mecabrc
The loose prefix is an
option modifier. The
mecab_rc_file option is not
recognized by MySQL until the MeCab parser plugin is
installed. The loose prefix allows you to
restart MySQL without encountering an error.
If you are using your own MeCab installation, or you have
built MeCab from source, the location of the
mecabrc configuration file will differ.
The MySQL configuration file is my.cnf
by default. For information about the MySQL configuration
file and its location, see Section 4.2.6, “Using Option Files”.
The mecab-ipadic package distributed
with MySQL binaries includes three dictionaries
(ipadic_euc-jp,
ipadic_sjis, and
ipadic_utf-8). Modify the
mecabrc configuration file to specify
the dictionary you want to use. The
mecabrc configuration file packaged
with MySQL contains an entry similar to the following:
dicdir = /path/to/mysql/lib/mecab/lib/mecab/dic/ipadic_euc-jp
Modify this entry for the dictionary you want to use. For
example, if you want to use the
ipadic_utf-8 dictionary, change the
entry as follows:
dicdir=MYSQL_HOME/lib/mecab/dic/ipadic_utf-8
If you are using your own MeCab installation or have built
MeCab from source, the default dicdir
entry in the mecabrc file will differ,
as will the dictionaries and their location.
After MeCab is installed, you can use the
mecab_charset status
variable to view the character set currently used with
MeCab.
The ipadic_euc-jp dictionary
supports the ujis (default) and
eucjpms character sets.
ipadic_sjis supports the
sjis (default) and
cp932 character sets.
ipadic_utf-8 supports the
utf8 (default) and
utf8mb4 character sets.
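As a minimal sketch of the check described above, the status variable can be queried directly. This assumes the MeCab parser plugin has already been installed, because the variable is not available before then.

```sql
-- Show the character set that MeCab is currently using.
-- The value depends on the dictionary selected in mecabrc.
SHOW STATUS LIKE 'mecab_charset';
```

If the reported character set does not match the one used by the indexed columns, the dicdir entry in mecabrc is the first thing to review.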
Restart MySQL.
Install the MeCab parser plugin:
The MeCab parser plugin is installed using
INSTALL PLUGIN syntax. The
plugin name is mecab, and the shared
library name is libpluginmecab.so. For
additional information about installing plugins, see
Section 5.1.8.1, “Installing and Uninstalling Plugins”.
INSTALL PLUGIN mecab SONAME 'libpluginmecab.so';
Once installed, the MeCab parser plugin loads at every normal MySQL restart.
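One way to confirm that the plugin is registered is to look it up in the INFORMATION_SCHEMA.PLUGINS table. The following is a sketch rather than a required step; SHOW PLUGINS provides the same information for all plugins.

```sql
-- List the mecab plugin entry, if present.
-- A successfully loaded plugin would normally report a PLUGIN_STATUS of ACTIVE.
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_TYPE, PLUGIN_LIBRARY
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME = 'mecab';
```

An empty result suggests that INSTALL PLUGIN did not complete or that the plugin failed to load at startup.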
Verify that the MeCab parser loaded correctly by creating a
test table with a FULLTEXT index that
uses the mecab parser.
CREATE TABLE t1
(
id INT AUTO_INCREMENT PRIMARY KEY,
doc CHAR(255),
FULLTEXT INDEX (doc) WITH PARSER mecab
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8;
The test table should be created without error if the mecab parser was installed successfully.
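To see how the parser actually tokenizes text, you can insert a sample row and inspect the full-text index cache. The sketch below assumes that the test table t1 was created in a database named test (a hypothetical name; substitute your own schema) and that the connection character set matches the table.

```sql
-- Match the connection character set to the table's character set.
SET NAMES utf8;

-- Insert sample text; this string is the example used in this section.
INSERT INTO t1 (doc) VALUES ('データベース管理');

-- Point the InnoDB full-text diagnostic tables at the test table.
-- 'test/t1' assumes the table lives in a database named test.
SET GLOBAL innodb_ft_aux_table = 'test/t1';

-- List the tokens that were indexed for the newly inserted row.
SELECT WORD, DOC_ID, POSITION
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY DOC_ID, POSITION;
```

Because InnoDB full-text variables such as innodb_ft_min_token_size also apply to the MeCab parser, short tokens may be absent from the result.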
Installing mecab and
mecab-ipadic from a binary distribution
using a native package management utility is only necessary if
you do not want to use the distributions packaged with the
MySQL binary. For example, on Fedora, you can use Yum to perform
the installation:
yum install mecab-devel
On Debian or Ubuntu, you can perform an APT installation:
apt-get install mecab
apt-get install mecab-ipadic
The mecab and
mecab-ipadic packages distributed with the
MySQL binary are recommended but if you want to build
mecab and mecab-ipadic
from source, basic installation steps are provided below. For
additional information, refer to the MeCab documentation.
Download the tar.gz packages for mecab
and mecab-ipadic from
https://code.google.com/p/mecab/downloads/list.
As of January, 2015, the latest available packages are:
mecab-0.996.tar.gz
mecab-ipadic-2.7.0-20070801.tar.gz
Install mecab:
tar zxfv mecab-0.996.tar.gz
cd mecab-0.996
./configure
make
make check
sudo make install
Install mecab-ipadic:
tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
cd mecab-ipadic-2.7.0-20070801
./configure
make
sudo make install
Compile MySQL using the
WITH_MECAB CMake option. Set
the WITH_MECAB option to
system if you have installed
mecab and
mecab-ipadic to the default location.
-DWITH_MECAB=system
If you defined a custom installation directory, set
WITH_MECAB to the custom
directory. For example:
-DWITH_MECAB=/path/to/mecab
To create a FULLTEXT index that uses the
MeCab parser, specify WITH PARSER mecab with
CREATE TABLE,
ALTER TABLE, or
CREATE INDEX, as shown in the
following examples:
CREATE TABLE t1 (
id INT AUTO_INCREMENT PRIMARY KEY,
doc CHAR(255),
FULLTEXT INDEX ft_index (doc) WITH PARSER mecab
) ENGINE=InnoDB;

CREATE TABLE t1 (
id INT AUTO_INCREMENT PRIMARY KEY,
doc CHAR(255)
) ENGINE=InnoDB;
ALTER TABLE t1 ADD FULLTEXT INDEX ft_index (doc) WITH PARSER mecab;

CREATE TABLE t1 (
id INT AUTO_INCREMENT PRIMARY KEY,
doc CHAR(255)
) ENGINE=InnoDB;
CREATE FULLTEXT INDEX ft_index ON t1 (doc) WITH PARSER mecab;

The MeCab parser uses spaces as separators in query strings. For example, the MeCab parser tokenizes 'データベース管理' as 'データベース' and '管理'.
By default, the MeCab parser uses the default
InnoDB stopword list, which contains a short
list of English stopwords. To view the default
InnoDB stopword list, query
INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD.
For a stopword list applicable to Japanese, you must create your
own. For information about creating stopword lists, see
Section 12.9.4, “Full-Text Stopwords”.
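A custom stopword list for InnoDB is an InnoDB table with a single VARCHAR column named value. The sketch below assumes a database named test and a table named my_stopwords (both hypothetical), and the inserted word is only a placeholder, not a recommended Japanese stopword list.

```sql
-- Stopword table: must be InnoDB with a single VARCHAR column named "value".
CREATE TABLE my_stopwords (
  value VARCHAR(30)
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8;

-- Placeholder entry for illustration only; choose stopwords that suit your data.
INSERT INTO my_stopwords (value) VALUES ('について');

-- Make the table the server-wide stopword list, given as database_name/table_name.
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
```

The stopword list is applied when a FULLTEXT index is built, so an existing index would need to be dropped and re-created after this setting is changed.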
For natural language mode search, the search term is converted to a union of tokens. For example, 'データベース管理' is converted to 'データベース 管理'.
For boolean mode search, the search term is converted to a search phrase. For example, 'データベース管理' is converted to '"データベース 管理"'.
Wildcard search terms are not tokenized. A search on 'データベース管理*' is performed on the prefix, 'データベース管理'.
Phrases are tokenized. For example, "データベース管理" is tokenized as "データベース 管理".
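The search behaviors described above can be tried with queries such as the following. This is a sketch that assumes the table t1 with the FULLTEXT-indexed doc column from the earlier examples; which rows match depends on the data and on how the configured dictionary tokenizes it.

```sql
-- Natural language mode: the term is searched as a union of its tokens.
SELECT * FROM t1
WHERE MATCH (doc) AGAINST ('データベース管理' IN NATURAL LANGUAGE MODE);

-- Boolean mode: the term is searched as a phrase made of its tokens.
SELECT * FROM t1
WHERE MATCH (doc) AGAINST ('データベース管理' IN BOOLEAN MODE);

-- Wildcard search: the term is not tokenized and is used as a prefix.
SELECT * FROM t1
WHERE MATCH (doc) AGAINST ('データベース管理*' IN BOOLEAN MODE);

-- Phrase search: the quoted phrase is tokenized before matching.
SELECT * FROM t1
WHERE MATCH (doc) AGAINST ('"データベース管理"' IN BOOLEAN MODE);
```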