Provided by: groonga-bin_6.0.1-1ubuntu1_amd64 

NAME
groonga - Groonga documentation
CHARACTERISTICS OF GROONGA
Groonga overview
Groonga is a fast and accurate full text search engine based on inverted index. One of the
characteristics of Groonga is that a newly registered document instantly appears in search results. Also,
Groonga allows updates without read locks. These characteristics result in superior performance on
real-time applications.
Groonga is also a column-oriented database management system (DBMS). Compared with well-known
row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate
queries. Due to this advantage, Groonga can cover the weaknesses of row-oriented systems.
The basic functions of Groonga are provided in a C library. Also, libraries for using Groonga in other
languages, such as Ruby, are provided by related projects. In addition, Groonga-based storage engines are
provided for MySQL and PostgreSQL. These libraries and storage engines allow any application to use
Groonga. See usage examples.
Full text search and Instant update
In widely used DBMSs, updates are processed immediately; for example, a newly registered record appears
in the result of the next query. In contrast, some full text search engines do not support instant
updates, because it is difficult to dynamically update inverted indexes, the underlying data structure.
Groonga also uses inverted indexes but supports instant updates. In addition, Groonga allows you to
search documents even when updating the document collection. Due to these superior characteristics,
Groonga is very flexible as a full text search engine. Also, Groonga always shows good performance
because it divides a large task, inverted index merging, into smaller tasks.
Column store and aggregate query
People can collect more than enough data in the Internet era. However, it is difficult to extract
informative knowledge from a large database, and such a task requires a many-sided analysis through trial
and error. For example, search refinement by date, time and location may reveal hidden patterns.
Aggregate queries are useful for this kind of task.
An aggregate query groups search results by specified column values and then counts the number of records
in each group. For example, an aggregate query in which a location column is specified counts the number
of records per location. Making a graph from the result of an aggregate query against a date column is an
easy way to visualize changes over time. Also, a combination of refinement by location and an aggregate
query against a date column allows visualization of changes over time in a specific location. Thus,
refinement and aggregation are important for data mining.
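The grouping and counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Groonga's query interface; the records, the column names and the `aggregate` helper are made up for this example.

```python
from collections import Counter

# Hypothetical records; the column names are assumptions for this sketch.
records = [
    {"date": "2016-03-01", "location": "Tokyo"},
    {"date": "2016-03-01", "location": "Osaka"},
    {"date": "2016-03-02", "location": "Tokyo"},
    {"date": "2016-03-02", "location": "Tokyo"},
]


def aggregate(rows, column):
    """Group rows by the value of `column` and count the records in each group."""
    return Counter(row[column] for row in rows)


# Count the number of records per location.
per_location = aggregate(records, "location")

# Refine by location first, then aggregate by date to see changes over time.
tokyo_only = [row for row in records if row["location"] == "Tokyo"]
tokyo_per_date = aggregate(tokyo_only, "date")

print(dict(per_location))
print(dict(tokyo_per_date))
```

Plotting `tokyo_per_date` against the date axis would give the kind of per-location time series mentioned above.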
A column-oriented architecture allows Groonga to efficiently process aggregate queries because a
column-oriented database, which stores records by column, allows an aggregate query to access only a
specified column. On the other hand, an aggregate query on a row-oriented database, which stores records
by row, has to access neighbor columns, even though those columns are not required.
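The difference between the two layouts can be pictured with plain Python containers. This is a simplified model under the assumption that a table fits in memory; it says nothing about how Groonga lays out its files, and the sample data is hypothetical.

```python
from collections import Counter

# Row-oriented layout: all values of one record are stored together.
rows = [
    ("2016-03-01", "Tokyo", "first entry"),
    ("2016-03-01", "Osaka", "second entry"),
    ("2016-03-02", "Tokyo", "third entry"),
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "date": [record[0] for record in rows],
    "location": [record[1] for record in rows],
    "body": [record[2] for record in rows],
}

# Aggregating on the row layout has to walk through whole records,
# passing over the date and body values that the query does not need.
row_counts = Counter(record[1] for record in rows)

# Aggregating on the column layout reads only the one list it needs.
column_counts = Counter(columns["location"])

print(row_counts == column_counts)
```

Both layouts give the same answer; the column layout simply touches less data to get it.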
Inverted index and tokenizer
An inverted index is a traditional data structure used for large-scale full text search. A search engine
based on inverted index extracts index terms from a document when it is added. Then in retrieval, a query
is divided into index terms to find documents containing those index terms. In this way, index terms play
an important role in full text search and thus the way of extracting index terms is a key to a better
search engine.
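A toy version of this structure may make the two phases easier to see. The sketch below is not Groonga's implementation: the `TinyIndex` class is hypothetical, it lives entirely in memory, and it uses whitespace splitting as a stand-in for a real tokenizer.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each index term to the IDs of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index terms are extracted when the document is added, so the
        # document is searchable by the very next query.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # The query is divided into index terms, and only documents
        # containing every term are returned.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Groonga is a full text search engine")
print(index.search("search engine"))
index.add(2, "A column store can back a search engine")
print(index.search("search engine"))
```

The second search already includes document 2, because the postings were updated at registration time rather than in a later batch step.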
A tokenizer is a module to extract index terms. A Japanese full text search engine commonly uses a
word-based tokenizer (hereafter referred to as a word tokenizer) and/or a character-based n-gram
tokenizer (hereafter referred to as an n-gram tokenizer). A word tokenizer-based search engine is
superior in time, space and precision, which is the fraction of relevant documents in a search result. On
the other hand, an n-gram tokenizer-based search engine is superior in recall, which is the fraction of
the documents in the perfect search result that are actually retrieved. The best choice depends on the
application in practice.
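To make the two measures concrete, here is a small Python sketch. The document IDs and the two result sets are invented for illustration; they are not measurements of any real tokenizer.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


relevant = {1, 2, 3, 4}             # the "perfect" search result
word_result = {1, 2}                # hypothetical word tokenizer result
ngram_result = {1, 2, 3, 4, 5, 6}   # hypothetical n-gram tokenizer result

print(precision(word_result, relevant), recall(word_result, relevant))
print(precision(ngram_result, relevant), recall(ngram_result, relevant))
```

In this made-up case the word-based result contains no noise but misses half of the relevant documents, while the n-gram result finds all of them at the cost of two irrelevant hits.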
Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses spaces as word
delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by default. In addition, yet
another built-in word tokenizer is available if MeCab, a part-of-speech and morphological analyzer, is
embedded. Note that a tokenizer is pluggable and you can develop your own tokenizer, such as a tokenizer
based on another part-of-speech tagger or a named-entity recognizer.
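The two basic strategies can be sketched as follows. These functions only illustrate the general techniques; they are not Groonga's built-in tokenizers, which handle details such as character classes that this sketch ignores. The function names are assumptions.

```python
def whitespace_tokenize(text):
    """Word-style tokenizer that uses whitespace as the word delimiter."""
    return text.split()


def ngram_tokenize(text, n=2):
    """Character-based n-gram tokenizer: every run of n consecutive characters."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(whitespace_tokenize("full text search"))
print(ngram_tokenize("groonga", 2))
print(ngram_tokenize("東京都", 2))
```

The last line shows the trade-off discussed above: the bigrams of 東京都 (Tokyo Metropolis) include 京都 (Kyoto), so a bigram index finds the text for more queries, including some the author of the text did not intend.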
Sharable storage and read lock-free
Multi-core processors are mainstream today and the number of cores per processor is increasing. In order
to exploit multiple cores, executing multiple queries in parallel or dividing a query into sub-queries
for parallel processing is becoming more important.
A database of Groonga can be shared with multiple threads/processes. Also, multiple threads/processes can
execute read queries in parallel even when another thread/process is executing an update query because
Groonga uses read lock-free data structures. This feature is suited to a real-time application that needs
to update a database while executing read queries. In addition, Groonga allows you to build flexible
systems. For example, a database can receive read queries through the built-in HTTP server of Groonga
while accepting update queries through MySQL.
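One common way to let readers proceed without taking a lock is to publish immutable snapshots. The Python sketch below shows that general technique only; it is not a description of Groonga's data structures. The `SnapshotStore` class is hypothetical, and the sketch assumes CPython, where rebinding an attribute is atomic.

```python
import threading


class SnapshotStore:
    """Readers never lock; writers replace an immutable snapshot."""

    def __init__(self):
        self._snapshot = ()                    # immutable tuple, replaced wholesale
        self._write_lock = threading.Lock()    # serializes writers only

    def add(self, record):
        with self._write_lock:
            # Build a new tuple and publish it with a single assignment.
            self._snapshot = self._snapshot + (record,)

    def read(self):
        # No lock: a reader picks up whichever snapshot is current.
        return self._snapshot


store = SnapshotStore()
seen = []


def writer():
    for i in range(1000):
        store.add(i)


def reader():
    for _ in range(1000):
        snapshot = store.read()
        # Each snapshot must be a consistent prefix 0, 1, ..., n-1.
        seen.append(snapshot == tuple(range(len(snapshot))))


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(all(seen), len(store.read()))
```

The reader may see an older snapshot than the latest write, but it never sees a half-applied update and never waits for the writer.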
Geo-location (latitude and longitude) search
Location services are getting more convenient because of mobile devices with GPS. For example, if you are
going to have lunch or dinner at a nearby restaurant, a local search service for restaurants may be very
useful, and for such services, fast geo-location search is becoming more important.
Groonga provides inverted index-based fast geo-location search, which supports a query to find points in
a rectangle or circle. Groonga gives high priority to points near the center of an area. Also, Groonga
supports distance measurement and you can sort points by distance from any point.
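A circle search with nearest-first ordering can be pictured with the sketch below. It uses the haversine formula and a linear scan, which is an assumption made for brevity: Groonga uses an index, and its distance calculation may differ. The place names and coordinates are invented.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)


def distance_m(a, b):
    """Great-circle distance in meters between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def search_circle(points, center, radius_m):
    """Return (name, distance) pairs inside the circle, nearest first."""
    hits = []
    for name, position in points.items():
        d = distance_m(center, position)
        if d <= radius_m:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical restaurants given as (latitude, longitude).
restaurants = {
    "diner": (35.6900, 139.7700),
    "cafe": (35.6815, 139.7680),
    "faraway": (35.0116, 135.7681),
}
center = (35.6812, 139.7671)

nearby = search_circle(restaurants, center, 2000)
print([name for name, _ in nearby])
```

Points outside the 2000 meter radius are dropped, and the remaining ones are ordered so that the point closest to the center comes first.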
Groonga library
The basic functions of Groonga are provided in a C library and any application can use Groonga as a full
text search engine or a column-oriented database. Also, libraries for languages other than C/C++, such as
Ruby, are provided in related projects. See related projects for details.
Groonga server
Groonga provides a built-in server command which supports HTTP, the memcached binary protocol and the
Groonga Query Transfer Protocol (/spec/gqtp). Also, a Groonga server supports query caching, which
significantly reduces response time for repeated read queries. Using this command, Groonga is available
even on a server that does not allow you to install new libraries.
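The effect of query caching on repeated read queries can be illustrated with a small least-recently-used cache. This is a generic sketch, not the Groonga server's cache: the `QueryCache` class, its size limit, and the rule that an update clears the cache are assumptions made for the example.

```python
from collections import OrderedDict


class QueryCache:
    """Small LRU cache keyed by the query string."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, query, run_query):
        if query in self.entries:
            # Repeated query: answer from the cache without running it again.
            self.entries.move_to_end(query)
            self.hits += 1
            return self.entries[query]
        self.misses += 1
        result = run_query(query)
        self.entries[query] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return result

    def invalidate(self):
        # Assumed policy for this sketch: drop everything after an update.
        self.entries.clear()


def slow_search(query):
    # Stand-in for an expensive read query.
    return query.upper()


cache = QueryCache(max_entries=2)
cache.fetch("select a", slow_search)
cache.fetch("select a", slow_search)
cache.fetch("select b", slow_search)
print(cache.hits, cache.misses)
```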
Mroonga storage engine
Groonga works not only as an independent column-oriented DBMS but also as a storage engine for well-known
DBMSs. For example, Mroonga is a MySQL pluggable storage engine using Groonga. By using Mroonga, you can
use Groonga for column-oriented storage and full text search. A combination of a built-in storage engine,
MyISAM or InnoDB, and a Groonga-based full text search engine is also available. All the combinations
have good and bad points and the best one depends on the application. See related projects for details.
INSTALL
This section describes how to install Groonga on each environment. There are packages for major
platforms. It's recommended that you use a package instead of building Groonga by yourself. But don't
worry. There is a document about building Groonga from source.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
Windows
This section describes how to install Groonga on Windows. You can install Groonga by extracting a zip
package or running an installer.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
Installer
For 32-bit environment, download x86 executable binary from packages.groonga.org:
• http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.exe
Then run it.
For 64-bit environment, download x64 executable binary from packages.groonga.org:
• http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.exe
Then run it.
Use command prompt in start menu to run /reference/executables/groonga.
zip
For 32-bit environment, download x86 zip archive from packages.groonga.org:
• http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.zip
Then extract it.
For 64-bit environment, download x64 zip archive from packages.groonga.org:
• http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.zip
Then extract it.
You can find /reference/executables/groonga in bin folder.
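The four download links above follow one naming pattern, which can be written down as a small helper. The pattern is inferred only from the URLs listed in this section; the function name is hypothetical, and the sketch does not check that a package actually exists for a given version.

```python
BASE_URL = "http://packages.groonga.org/windows/groonga"


def windows_package_url(version, arch, kind):
    """Build a package URL following the pattern of the links listed above."""
    if arch not in ("x86", "x64"):
        raise ValueError("arch must be 'x86' (32-bit) or 'x64' (64-bit)")
    if kind not in ("exe", "zip"):
        raise ValueError("kind must be 'exe' (installer) or 'zip' (archive)")
    return "{}/groonga-{}-{}.{}".format(BASE_URL, version, arch, kind)


print(windows_package_url("6.0.1", "x64", "zip"))
```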
Build from source
First, you need to install required tools for building Groonga on Windows. Here are required tools:
• Microsoft Visual Studio Express 2013 for Windows Desktop
• CMake
Download zipped source from packages.groonga.org:
• http://packages.groonga.org/source/groonga/groonga-6.0.1.zip
Then extract it.
Move to the Groonga source folder:
> cd c:\Users\%USERNAME%\Downloads\groonga-6.0.1
Configure with cmake. The following command line is for the 64-bit version. To build the 32-bit version,
use the -G "Visual Studio 12 2013" parameter instead:
groonga-6.0.1> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga
Build:
groonga-6.0.1> cmake --build . --config Release
Install:
groonga-6.0.1> cmake --build . --config Release --target Install
After the above steps, /reference/executables/groonga is found at c:\Groonga\bin\groonga.exe.
Mac OS X
This section describes how to install Groonga on Mac OS X. You can install Groonga by MacPorts or
Homebrew.
MacPorts
Install:
% sudo port install groonga
Homebrew
Install:
% brew install groonga
If you want to use MeCab as a tokenizer, specify --with-mecab option:
% brew install groonga --with-mecab
Then install and configure MeCab dictionary.
Install:
% brew install mecab-ipadic
Configure:
% sed -i '' -e 's,dicrc.*=.*,dicrc = /usr/local/lib/mecab/dic/ipadic,g' /usr/local/etc/mecabrc
Build from source
Install Xcode.
Download source:
% curl -O http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure (see source-configure about configure options):
% ./configure
Build:
% make -j$(/usr/sbin/sysctl -n hw.ncpu)
Install:
% sudo make install
Debian GNU/Linux
This section describes how to install Groonga related deb packages on Debian GNU/Linux. You can install
them by apt.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
wheezy
Add the Groonga apt repository.
/etc/apt/sources.list.d/groonga.list:
deb http://packages.groonga.org/debian/ wheezy main
deb-src http://packages.groonga.org/debian/ wheezy main
Install:
% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo apt-get install -y -V groonga-tokenizer-mecab
If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem package.
Install groonga-token-filter-stem package:
% sudo apt-get install -y -V groonga-token-filter-stem
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
Install groonga-munin-plugins package:
% sudo apt-get install -y -V groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo apt-get install -y -V groonga-normalizer-mysql
jessie
New in version 5.0.3.
Add the Groonga apt repository.
/etc/apt/sources.list.d/groonga.list:
deb http://packages.groonga.org/debian/ jessie main
deb-src http://packages.groonga.org/debian/ jessie main
Install:
% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo apt-get install -y -V groonga-tokenizer-mecab
If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem package.
Install groonga-token-filter-stem package:
% sudo apt-get install -y -V groonga-token-filter-stem
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
Install groonga-munin-plugins package:
% sudo apt-get install -y -V groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo apt-get install -y -V groonga-normalizer-mysql
Build from source
Install required packages to build Groonga:
% sudo apt-get install -y -V wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev
Download source:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure (see source-configure about configure options):
% ./configure
Build:
% make -j$(grep '^processor' /proc/cpuinfo | wc -l)
Install:
% sudo make install
Ubuntu
This section describes how to install Groonga related deb packages on Ubuntu. You can install them by
apt.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
PPA (Personal Package Archive)
The Groonga APT repository for Ubuntu uses PPA (Personal Package Archive) on Launchpad. You can install
Groonga by APT from the PPA.
Here are supported Ubuntu versions:
• 12.04 LTS Precise Pangolin
• 14.04 LTS Trusty Tahr
• 15.04 Vivid Vervet
• 15.10 Wily Werewolf
Enable the universe repository to install Groonga:
% sudo apt-get -y install software-properties-common
% sudo add-apt-repository -y universe
Add the ppa:groonga/ppa PPA to your system:
% sudo add-apt-repository -y ppa:groonga/ppa
% sudo apt-get update
Install:
% sudo apt-get -y install groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo apt-get -y install groonga-tokenizer-mecab
If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem package.
Install groonga-token-filter-stem package:
% sudo apt-get -y install groonga-token-filter-stem
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
Install groonga-munin-plugins package:
% sudo apt-get -y install groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo apt-get -y install groonga-normalizer-mysql
Build from source
Install required packages to build Groonga:
% sudo apt-get -V -y install wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev
Download source:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure (see source-configure about configure options):
% ./configure
Build:
% make -j$(grep '^processor' /proc/cpuinfo | wc -l)
Install:
% sudo make install
CentOS
This section describes how to install Groonga related RPM packages on CentOS. You can install them by
yum.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
CentOS 5
Install:
% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo yum install -y groonga-tokenizer-mecab
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
NOTE:
Groonga-munin-plugins package requires munin-node package that isn't included in the official CentOS
repository. You need to enable Repoforge (RPMforge) repository or EPEL repository to install it by
yum.
Enable Repoforge (RPMforge) repository on i386 environment:
% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.i386.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.i386.rpm
Enable Repoforge (RPMforge) repository on x86_64 environment:
% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm
Enable EPEL repository on any environment:
% wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
% sudo rpm -ivh epel-release-5-4.noarch.rpm
Install groonga-munin-plugins package:
% sudo yum install -y groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo yum install -y groonga-normalizer-mysql
CentOS 6
Install:
% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo yum install -y groonga-tokenizer-mecab
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
NOTE:
Groonga-munin-plugins package requires munin-node package that isn't included in the official CentOS
repository. You need to enable EPEL repository to install it by yum.
Enable EPEL repository on any environment:
% sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
Install groonga-munin-plugins package:
% sudo yum install -y groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo yum install -y groonga-normalizer-mysql
CentOS 7
Install:
% sudo yum install -y http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum install -y groonga
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo yum install -y groonga-tokenizer-mecab
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
NOTE:
Groonga-munin-plugins package requires munin-node package that isn't included in the official CentOS
repository. You need to enable EPEL repository to install it by yum.
Enable EPEL repository:
% sudo yum install -y epel-release
Install groonga-munin-plugins package:
% sudo yum install -y groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo yum install -y groonga-normalizer-mysql
Build from source
Install required packages to build Groonga:
% sudo yum install -y wget tar gcc-c++ make mecab-devel
Download source:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure (see source-configure about configure options):
% ./configure
Build:
% make -j$(grep '^processor' /proc/cpuinfo | wc -l)
Install:
% sudo make install
Fedora
This section describes how to install Groonga related RPM packages on Fedora. You can install them by
yum.
NOTE:
Since the Groonga 3.0.2 release, Groonga related RPM packages are in the official Fedora yum repository
(Fedora 18), so you can use them instead of the Groonga yum repository. There are some exceptions
that still need the Groonga yum repository, because the MeCab dictionaries (mecab-ipadic or
mecab-jumandic) are provided by the Groonga yum repository.
We distribute both 32-bit and 64-bit packages, but we strongly recommend a 64-bit package for servers. You
should use a 32-bit package only for tests or development. You will encounter an out of memory error
with a 32-bit package even if you process only medium-sized data.
Fedora 21
Install:
% sudo yum install -y groonga
Note that additional packages such as the mecab-ipadic and mecab-jumandic packages require the
groonga-release package, which provides the Groonga yum repository, to be installed beforehand:
% sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
% sudo yum update
NOTE:
The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a
server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (GQTP protocol based server package)
See /server section about details.
If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.
Install groonga-tokenizer-mecab package:
% sudo yum install -y groonga-tokenizer-mecab
Then install MeCab dictionary. (mecab-ipadic or mecab-jumandic)
Install IPA dictionary:
% sudo yum install -y mecab-ipadic
Or install Juman dictionary:
% sudo yum install -y mecab-jumandic
There is a package that provides Munin plugins. If you want to monitor Groonga status by Munin, install
groonga-munin-plugins package.
Install groonga-munin-plugins package:
% sudo yum install -y groonga-munin-plugins
There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you want to use
that one, install groonga-normalizer-mysql package.
Install groonga-normalizer-mysql package:
% sudo yum install -y groonga-normalizer-mysql
Build from source
Install required packages to build Groonga:
% sudo yum install -y wget tar gcc-c++ make mecab-devel libedit-devel
Download source:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure (see source-configure about configure options):
% ./configure
Build:
% make -j$(grep '^processor' /proc/cpuinfo | wc -l)
Install:
% sudo make install
Oracle Solaris
This section describes how to install Groonga from source on Oracle Solaris.
Oracle Solaris 11
Install required packages to build Groonga:
% sudo pkg install gnu-tar gcc-45 system/header
Download source:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% gtar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
Configure with CFLAGS="-m64" CXXFLAGS="-m64" variables. They are needed for building 64-bit version. To
build 32-bit version, just remove those variables. (see source-configure about configure options):
% ./configure CFLAGS="-m64" CXXFLAGS="-m64"
Build:
% make
Install:
% sudo make install
Others
This section describes how to install Groonga from source on UNIX like environment.
To get more detail about installing Groonga from source on the specific environment, find the document
for the specific environment from /install.
Dependencies
Groonga doesn't require any special libraries but requires some tools for build.
Tools
Here are required tools:
• wget, curl or Web browser for downloading source archive
• tar and gzip for extracting source archive
• shell (many shells such as dash, bash and zsh will work)
• C compiler and C++ compiler (gcc and g++ are supported but other compilers may work)
• make (GNU make is supported but other make like BSD make will work)
You must get them ready.
You can use CMake instead of shell, but this document doesn't describe building with CMake.
Here are optional tools:
• pkg-config for detecting libraries
• sudo for installing built Groonga
You must get them ready if you want to use optional libraries.
Libraries
All libraries are optional. Here are optional libraries:
• MeCab for tokenizing full-text search target document by morphological analysis
• KyTea for tokenizing full-text search target document by morphological analysis
• ZeroMQ for /reference/suggest
• libevent for /reference/suggest
• MessagePack for supporting MessagePack output and /reference/suggest
• libedit for command line editing in /reference/executables/groonga
• zlib for compressing column value
• LZ4 for compressing column value
If you want to use those all or some libraries, you need to install them before installing Groonga.
Build from source
Groonga uses the GNU build system, so the following are the simplest build steps:
% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
% ./configure
% make
% sudo make install
After the above steps, /reference/executables/groonga is found in /usr/local/bin/groonga.
The default build will work well but you can customize Groonga at configure step.
The following describes details about each step.
configure
First, you need to run configure. Here are important configure options:
--prefix=PATH
Specifies the install base directory. Groonga related files are installed under ${PATH}/ directory.
The default is /usr/local. In this case, /reference/executables/groonga is installed into
/usr/local/bin/groonga.
Here is an example that installs Groonga into ~/local for a single user instead of system-wide use:
% ./configure --prefix=$HOME/local
--localstatedir=PATH
Specifies the base directory for modifiable files such as log files, PID files and database files. For
example, the log file is placed at ${PATH}/log/groonga.log.
The default is /usr/local/var.
Here is an example that system wide /var is used for modifiable files:
% ./configure --localstatedir=/var
--with-log-path=PATH
Specifies the default log file path. You can override the default log path with the
/reference/executables/groonga command's --log-path command line option, so this is not a critical
build option. It's just for convenience.
The default is /usr/local/var/log/groonga.log. The /usr/local/var part is changed by --localstatedir
option.
Here is an example that log file is placed into shared NFS directory /nfs/log/groonga.log:
% ./configure --with-log-path=/nfs/log/groonga.log
--with-default-encoding=ENCODING
Specifies the default encoding. Available encodings are euc_jp, sjis, utf8, latin1, koi8r and none.
The default is utf8.
Here is an example that Shift_JIS is used as the default encoding:
% ./configure --with-default-encoding=sjis
--with-match-escalation-threshold=NUMBER
Specifies the default match escalation threshold. See select-match-escalation-threshold about match
escalation threshold. -1 means that the match operation never escalates.
The default is 0.
Here is an example that match escalation isn't used by default:
% ./configure --with-match-escalation-threshold=-1
--with-zlib
Enables column value compression by zlib.
The default is disabled.
Here is an example that enables column value compression by zlib:
% ./configure --with-zlib
--with-lz4
Enables column value compression by LZ4.
The default is disabled.
Here is an example that enables column value compression by LZ4:
% ./configure --with-lz4
--with-message-pack=MESSAGE_PACK_INSTALL_PREFIX
Specifies where MessagePack is installed. If MessagePack isn't installed with --prefix=/usr, you need to
specify this option with path that you use for building MessagePack.
If you installed MessagePack with --prefix=$HOME/local option, you should specify
--with-message-pack=$HOME/local to Groonga's configure.
The default is /usr.
Here is an example that uses MessagePack built with --prefix=$HOME/local option:
% ./configure --with-message-pack=$HOME/local
--with-munin-plugins
Installs Munin plugins for Groonga. They are installed into ${PREFIX}/share/groonga/munin/plugins/.
Those plugins are not installed by default.
Here is an example that installs Munin plugins for Groonga:
% ./configure --with-munin-plugins
--with-package-platform=PLATFORM
Installs platform specific system management files such as init script. Available platforms are redhat
and fedora. redhat is for Red Hat and Red Hat clone distributions such as CentOS. fedora is for Fedora.
Those system management files are not installed by default.
Here is an example that installs CentOS specific system management files:
% ./configure --with-package-platform=redhat
--help
Shows all configure options.
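How --prefix, --localstatedir and --with-log-path determine the installed paths can be summarized in a short Python sketch. It only restates the defaults described above; the `default_paths` helper is hypothetical, and the rule that --localstatedir defaults to ${PREFIX}/var is the standard Autoconf behavior rather than something checked against Groonga's configure script.

```python
import posixpath


def default_paths(prefix="/usr/local", localstatedir=None, log_path=None):
    """Return the paths that result from the given configure options."""
    if localstatedir is None:
        # Assumption: standard Autoconf default of PREFIX/var.
        localstatedir = posixpath.join(prefix, "var")
    if log_path is None:
        # The log file goes under the modifiable-file base directory.
        log_path = posixpath.join(localstatedir, "log", "groonga.log")
    return {
        "groonga": posixpath.join(prefix, "bin", "groonga"),
        "localstatedir": localstatedir,
        "log_path": log_path,
    }


print(default_paths())
print(default_paths(localstatedir="/var"))
print(default_paths(log_path="/nfs/log/groonga.log"))
```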
make
Once configure has succeeded, you can build Groonga with make:
% make
If you have a multi-core CPU, you can build faster by using the -j option. If you have a 4-core CPU,
the -j4 option is a good choice:
% make -j4
If make fails with errors, please report them to us: /contribution/report
make install
Now you can install the built Groonga:
% sudo make install
If you have write permission for ${PREFIX}, for example when you specified --prefix=$HOME/local, you
don't need sudo. In this case, run make install directly:
% make install
COMMUNITY
There are some places for sharing Groonga information. We welcome you to join our community.
Mailing List
There are mailing lists for discussion about Groonga.
For English speakers
groonga-talk@lists.sourceforge.net
For Japanese speakers
groonga-dev@lists.osdn.me
Chat room
There are chat rooms for discussion about Groonga.
For English speakers
groonga/en chat room on Gitter
For Japanese speakers
groonga/ja chat room on Gitter
Twitter
@groonga tweets Groonga related information.
Please follow the account to get the latest Groonga related information!
Facebook
Groonga page on Facebook shares Groonga related information.
Please like the page to get the latest Groonga related information!
TUTORIAL
Basic operations
A Groonga package provides a C library (libgroonga) and a command line tool (groonga). This tutorial
explains how to use the command line tool, with which you can create/operate databases, start a server,
establish a connection with a server, etc.
Create a database
The first step to using Groonga is to create a new database. The following shows how to do it.
Form:
groonga -n DB_PATH
The -n option tells Groonga to create a new database and DB_PATH specifies the path of the new database.
Actually, a database consists of a series of files and DB_PATH specifies the file which will be the
entrance to the new database. DB_PATH also specifies the path prefix for other files. Note that database
creation fails if DB_PATH points to an existing file, with an error such as db open failed (DB_PATH):
syscall error 'DB_PATH' (File exists). The next section explains how to operate an existing database.
This command creates a new database and then enters into interactive mode in which Groonga prompts you to
enter commands for operating that database. You can terminate this mode with Ctrl-d.
Execution example:
% groonga -n /tmp/groonga-databases/introduction.db
After this database creation, you can find a series of files in /tmp/groonga-databases.
Operate a database
The following shows how to operate an existing database.
Form:
groonga DB_PATH [COMMAND]
DB_PATH specifies the path of a target database. This command fails if the specified database does not
exist.
If COMMAND is specified, Groonga executes COMMAND and returns the result. Otherwise, Groonga starts in
interactive mode that reads commands from the standard input and executes them one by one. This tutorial
focuses on the interactive mode.
Let's see the status of a Groonga process by using a /reference/commands/status command.
Execution example:
% groonga /tmp/groonga-databases/introduction.db
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 206,
# "command_version": 1,
# "starttime": 1439995916,
# "default_command_version": 1
# }
# ]
As shown in the above example, a command returns a JSON array. The first element contains an error code,
execution time, etc. The second element is the result of an operation.
NOTE:
You can format JSON output with additional tools such as grnwrap, Grnline and jq.
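The response layout described above can also be unpacked programmatically. Here is a minimal Python sketch, assuming the first element holds the return code, the start time and the elapsed time in that order, as the example output suggests. The function name split_response and the shortened response text are made up for illustration.

```python
import json

# Response copied from the status example above, with the "# " markers
# removed and the body shortened to three fields.
response_text = """
[
  [0, 1337566253.89858, 0.000355720520019531],
  {"uptime": 0, "n_queries": 0, "version": "5.0.6-128-g8029ddb"}
]
"""

def split_response(text):
    """Split a response into (return_code, start_time, elapsed, body).

    Hypothetical helper: the body is None when the response has no
    second element.
    """
    parsed = json.loads(text)
    header = parsed[0]
    body = parsed[1] if len(parsed) > 1 else None
    return_code, start_time, elapsed = header[:3]
    return return_code, start_time, elapsed, body

return_code, start_time, elapsed, body = split_response(response_text)
if return_code == 0:
    print("version:", body["version"])
else:
    print("command failed with code", return_code)
```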
Command format
Commands for operating a database accept arguments as follows:
Form_1: COMMAND VALUE_1 VALUE_2 ..
Form_2: COMMAND --NAME_1 VALUE_1 --NAME_2 VALUE_2 ..
In the first form, arguments must be passed in order. Arguments of this kind are called positional
arguments because the position of each argument determines its meaning.
In the second form, you specify each parameter name with its value, so the arguments can be given in any
order. Arguments of this kind are known as named parameters or keyword arguments.
If you want to specify a value which contains white-spaces or special characters, such as quotes and
parentheses, please enclose the value with single-quotes or double-quotes.
For details, see also the paragraph of "command" in /reference/executables/groonga.
Basic commands
/reference/commands/status
shows status of a Groonga process.
/reference/commands/table_list
shows a list of tables in a database.
/reference/commands/column_list
shows a list of columns in a table.
/reference/commands/table_create
adds a table to a database.
/reference/commands/column_create
adds a column to a table.
/reference/commands/select
searches records from a table and shows the result.
/reference/commands/load
inserts records to a table.
Create a table
A /reference/commands/table_create command creates a new table.
In most cases, a table has a primary key which must be specified with its data type and index type.
There are various data types such as integers, strings, etc. See also /reference/types for more details.
The index type determines the search performance and the availability of prefix searches. The details
will be described later.
Let's create a table. The following example creates a table with a primary key. The name parameter
specifies the name of the table. The flags parameter specifies the index type for the primary key. The
key_type parameter specifies the data type of the primary key.
Execution example:
table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
The second element of the result indicates that the operation succeeded.
View a table
A /reference/commands/select command can enumerate records in a table.
Execution example:
select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
When only a table name is specified with a table parameter, a /reference/commands/select command returns
the first (at most) 10 records in the table. [0] in the result shows the number of records in the table.
The next array is a list of columns. ["_id","UInt32"] is a column of UInt32, named _id.
["_key","ShortText"] is a column of ShortText, named _key.
The above two columns, _id and _key, are the necessary columns. The _id column stores IDs that are
automatically allocated by Groonga. The _key column is associated with the primary key. You are not
allowed to rename these columns.
Create a column
A /reference/commands/column_create command creates a new column.
Let's add a column. The following example adds a column to the Site table. The table parameter specifies
the target table. The name parameter specifies the name of the column. The type parameter specifies the
data type of the column.
Execution example:
column_create --table Site --name title --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
Load records
A /reference/commands/load command loads JSON-formatted records into a table.
The following example loads nine records into the Site table.
Execution example:
load --table Site
[
{"_key":"http://example.org/","title":"This is test record 1!"},
{"_key":"http://example.net/","title":"test record 2."},
{"_key":"http://example.com/","title":"test test record three."},
{"_key":"http://example.net/afr","title":"test record four."},
{"_key":"http://example.org/aba","title":"test test test record five."},
{"_key":"http://example.com/rab","title":"test test test test record six."},
{"_key":"http://example.net/atv","title":"test test test record seven."},
{"_key":"http://example.org/gat","title":"test test record eight."},
{"_key":"http://example.com/vdw","title":"test test record nine."},
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]
The second element of the result indicates how many records were successfully loaded. In this case, all
the records are successfully loaded.
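When the records come from a program, the JSON part of a load command can be generated instead of typed. Here is a small Python sketch; build_load_command is a hypothetical helper, not part of Groonga.

```python
import json

def build_load_command(table, records):
    """Return the text of a load command followed by its JSON records."""
    body = ",\n".join(json.dumps(record) for record in records)
    return f"load --table {table}\n[\n{body}\n]"

records = [
    {"_key": "http://example.org/", "title": "This is test record 1!"},
    {"_key": "http://example.net/", "title": "test record 2."},
]
command = build_load_command("Site", records)
print(command)
```

Because interactive mode reads commands from the standard input, the generated text could presumably be piped to groonga DB_PATH.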
Let's make sure that these records are correctly stored.
Execution example:
select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]
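The result layout of select (the number of hits, then the column list, then one array per record) maps naturally onto a list of dictionaries. The following Python sketch converts an excerpt of the output above. The function name is hypothetical, and the code assumes a successful response with a single result set.

```python
import json

# Excerpt of the select output above with only the first two records.
response_text = """
[[0, 1337566253.89858, 0.000355720520019531],
 [[[9],
   [["_id", "UInt32"], ["_key", "ShortText"], ["title", "ShortText"]],
   [1, "http://example.org/", "This is test record 1!"],
   [2, "http://example.net/", "test record 2."]]]]
"""

def select_to_dicts(text):
    """Return (n_hits, records) where each record is a dict keyed by column name."""
    _header, body = json.loads(text)
    result_set = body[0]
    n_hits = result_set[0][0]
    column_names = [name for name, _type in result_set[1]]
    records = [dict(zip(column_names, row)) for row in result_set[2:]]
    return n_hits, records

n_hits, records = select_to_dicts(response_text)
print(n_hits)
for record in records:
    print(record["_id"], record["_key"], record["title"])
```

Note that the number of hits (9) can be larger than the number of returned records, because select returns at most 10 records by default and this excerpt keeps only two.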
Get a record
A /reference/commands/select command can search records in a table.
If a search condition is specified with a query parameter, a /reference/commands/select command searches
records matching the search condition and returns the matched records.
Let's get a record having a specified record ID. The following example gets the first record in the Site
table. More precisely, the query parameter specifies a record whose _id column stores 1.
Execution example:
select --table Site --query _id:1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]
Next, let's get a record having a specified key. The following example gets the record whose primary key
is "http://example.org/". More precisely, the query parameter specifies a record whose _key column stores
"http://example.org/".
Execution example:
select --table Site --query '_key:"http://example.org/"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]
Create a lexicon table for full text search
Let's move on to full text search.
Groonga uses an inverted index to provide fast full text search. So, the first step is to create a
lexicon table which stores an inverted index, also known as postings lists. The primary key of this table
is associated with a vocabulary made up of index terms and each record stores postings lists for one
index term.
The following shows a command which creates a lexicon table named Terms. The data type of its primary key
is ShortText.
Execution example:
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
The /reference/commands/table_create command takes many parameters but you don't need to understand all
of them. Please skip the next paragraph if you are not interested in how it works.
The TABLE_PAT_KEY flag stores index terms in a patricia trie. The default_tokenizer parameter
specifies the method for tokenizing text. This example uses TokenBigram, a method generally called N-gram.
The normalizer parameter specifies how index terms are normalized.
Create an index column for full text search
The second step is to create an index column, which allows you to search records from its associated
column. That is to say this step specifies which column needs an index.
Let's create an index column. The following example creates an index column for a column in the Site
table.
Execution example:
column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]
The table parameter specifies the index table and the name parameter specifies the index column. The type
parameter specifies the target table and the source parameter specifies the target column. The
COLUMN_INDEX flag specifies to create an index column and the WITH_POSITION flag specifies to create a
full inverted index, which contains the positions of each index term. This combination,
COLUMN_INDEX|WITH_POSITION, is recommended for the general purpose.
NOTE:
You can create a lexicon table and index columns before/during/after loading records. If a target
column already has records, Groonga creates an inverted index in a static manner. In contrast, if you
load records into an already indexed column, Groonga updates the inverted index in a dynamic manner.
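To make the idea of a full inverted index concrete, here is a toy model in Python. It is only a sketch: the word-splitting tokenizer and the lower-casing are simple stand-ins for TokenBigram and NormalizerAuto, and the nested dictionaries say nothing about how Groonga actually stores an index.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Stand-in tokenizer: lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def add_document(index, record_id, text):
    """Add one record to the index (term -> record ID -> positions)."""
    for position, term in enumerate(tokenize(text)):
        index[term][record_id].append(position)

def build_index(documents):
    """Build an index from records that already exist."""
    index = defaultdict(lambda: defaultdict(list))
    for record_id, text in documents.items():
        add_document(index, record_id, text)
    return index

titles = {
    1: "This is test record 1!",
    2: "test record 2.",
    3: "test test record three.",
}
index = build_index(titles)
print(sorted(index["this"]))   # IDs of records that contain "this"
print(dict(index["test"]))     # positions of "test" in each record

# A record loaded later only touches the postings of its own terms.
add_document(index, 4, "test record four.")
print(sorted(index["four"]))
```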
Full text search
Now you can run a full text search with a /reference/commands/select command.
A query for full text search is specified with a query parameter. The following example searches records
whose "title" column contains "this". The '@' specifies a full text search. Note that a lower case
query matches upper case and capitalized terms in a record if NormalizerAuto was specified when creating
the lexicon table.
Execution example:
select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]
In this example, the first record matches the query because its title contains "This", which is the
capitalized form of the query.
A /reference/commands/select command accepts an optional parameter, named match_columns, that specifies
the default target columns. This parameter is used if target columns are not specified in a query. [1]
The combination of "--match_columns title" and "--query this" brings you the same result that "--query
title:@this" does.
Execution example:
select --table Site --match_columns title --query this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]
Specify output columns
An output_columns parameter of a /reference/commands/select command specifies columns to appear in the
search result. If you want to specify more than one column, please separate column names by commas
(',').
Execution example:
select --table Site --output_columns _key,title,_score --query title:@test
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# 1
# ],
# [
# "http://example.net/",
# "test record 2.",
# 1
# ],
# [
# "http://example.com/",
# "test test record three.",
# 2
# ],
# [
# "http://example.net/afr",
# "test record four.",
# 1
# ],
# [
# "http://example.org/aba",
# "test test test record five.",
# 3
# ],
# [
# "http://example.com/rab",
# "test test test test record six.",
# 4
# ],
# [
# "http://example.net/atv",
# "test test test record seven.",
# 3
# ],
# [
# "http://example.org/gat",
# "test test record eight.",
# 2
# ],
# [
# "http://example.com/vdw",
# "test test record nine.",
# 2
# ]
# ]
# ]
# ]
This example specifies three output columns including the _score column, which stores the relevance score
of each record.
Specify output ranges
A /reference/commands/select command returns a part of its search result if offset and/or limit
parameters are specified. These parameters are useful to paginate a search result, a widely-used
interface which shows a search result on a page by page basis.
An offset parameter specifies the starting point and a limit parameter specifies the maximum number of
records to be returned. If you need the first record in a search result, the offset parameter must be 0
or omitted.
Execution example:
select --table Site --offset 0 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ]
# ]
# ]
# ]
select --table Site --offset 3 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ]
# ]
# ]
# ]
select --table Site --offset 7 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]
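For pagination, the offset and limit values are usually derived from a page number. The following Python sketch shows the arithmetic and mimics the behavior on a plain list of record IDs. Both helper names are hypothetical.

```python
def page_to_range(page, per_page):
    """Return (offset, limit) for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return (page - 1) * per_page, per_page

def paginate(records, offset=0, limit=10):
    """Return at most `limit` records starting at `offset`."""
    return records[offset:offset + limit]

ids = list(range(1, 10))  # the nine record IDs in the Site table
for page in (1, 2, 3):
    offset, limit = page_to_range(page, 3)
    print(f"select --table Site --offset {offset} --limit {limit}",
          paginate(ids, offset, limit))
```

An offset close to the end of the result simply yields fewer records, as in the last example above where --offset 7 --limit 3 returns only the records 8 and 9.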
Sort a search result
A /reference/commands/select command sorts its result when used with a sortby parameter.
A sortby parameter specifies a column as a sorting criterion. A search result is arranged in ascending
order of the column values. If you want to sort a search result in reverse order, please add a leading
hyphen ('-') to the column name in the parameter.
The following example shows records in the Site table in reverse order.
Execution example:
select --table Site --sortby -_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]
The next example uses the _score column as the sorting criterion for ranking the search result. The result
is sorted in relevance order.
Execution example:
select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 4,
# 1,
# "test record four."
# ],
# [
# 2,
# 1,
# "test record 2."
# ]
# ]
# ]
# ]
If you want to specify more than one column, please separate column names by commas (','). In such a
case, a search result is sorted in order of the values in the first column, and then records having the
same values in the first column are sorted in order of the second column values.
Execution example:
select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score,_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 2,
# 1,
# "test record 2."
# ],
# [
# 4,
# 1,
# "test record four."
# ]
# ]
# ]
# ]
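The effect of --sortby -_score,_id can be reproduced with an ordinary two-key sort. Here is a Python sketch that uses the (_id, _score) pairs from the output above. It only illustrates the ordering rule, not how Groonga sorts internally.

```python
# (_id, _score) pairs taken from the search result for title:@test
hits = [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3), (6, 4), (7, 3), (8, 2), (9, 2)]

# Descending _score first, then ascending _id for records with equal scores.
ordered = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
print([record_id for record_id, _score in ordered])
```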
footnote
[1] Currently, a match_columns parameter is available only if an inverted index for full text search
exists. A match_columns parameter for a regular column is not supported.
Remote access
You can use Groonga as a server which allows remote access. Groonga supports the original protocol
(GQTP), the memcached binary protocol and HTTP.
Hypertext transfer protocol (HTTP)
How to run an HTTP server
Groonga supports the hypertext transfer protocol (HTTP). The following form shows how to run Groonga as
an HTTP server daemon.
Form:
groonga [-p PORT_NUMBER] -d --protocol http DB_PATH
The --protocol option and its argument specify the protocol of the server. "http" specifies to use HTTP.
If the -p option is not specified, Groonga uses the default port number 10041.
The following command runs an HTTP server that listens on the port number 80.
Execution example:
% sudo groonga -p 80 -d --protocol http /tmp/groonga-databases/introduction.db
%
NOTE:
You must have root privileges to listen on port 80 (a well-known port). Ports numbered 1024 or higher
have no such limitation.
How to send a command to an HTTP server
You can send a command to an HTTP server by sending a GET request to /d/COMMAND_NAME. Command parameters
can be passed as parameters of the GET request. The format is "?NAME_1=VALUE_1&NAME_2=VALUE_2&...".
The following example shows how to send commands to an HTTP server.
Execution example:
http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/status
Executed command:
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 185,
# "command_version": 1,
# "starttime": 1439995935,
# "default_command_version": 1
# }
# ]
http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/select?table=Site&query=title:@this
Executed command:
select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "japan",
# ".org",
# "http://example.net/",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# "128452975x503157902",
# "This is test record 1!"
# ]
# ]
# ]
# ]
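Request URLs of this form can be built with a few lines of Python. In this sketch the host name localhost and the helper name command_url are assumptions. urlencode percent-encodes characters such as ':' and '@' in parameter values, which is the standard encoding for query strings.

```python
from urllib.parse import urlencode

def command_url(host, command, port=10041, **parameters):
    """Build http://HOST:PORT/d/COMMAND?NAME_1=VALUE_1&NAME_2=VALUE_2..."""
    url = f"http://{host}:{port}/d/{command}"
    if parameters:
        url += "?" + urlencode(parameters)
    return url

print(command_url("localhost", "status"))
print(command_url("localhost", "select", table="Site", query="title:@this"))

# With a server running, the response could be fetched like this:
#   from urllib.request import urlopen
#   with urlopen(command_url("localhost", "status")) as response:
#       print(response.read().decode("utf-8"))
```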
Administration tool (HTTP)
An HTTP server of Groonga provides a browser based administration tool that makes database management
easy. After starting an HTTP server, you can use the administration tool by accessing
http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/. Note that JavaScript must be enabled for the tool to work
properly.
Security issues
Groonga servers don't support user authentication. Everyone can view and modify databases hosted by
Groonga servers. We recommend restricting the IP addresses that can access Groonga servers. You can
use iptables or similar for this purpose.
Various data types
Groonga is a full text search engine but also serves as a column-oriented data store. Groonga supports
various data types, such as numeric types, string types, date and time type, longitude and latitude
types, etc. This tutorial shows a list of data types and explains how to use them.
Overview
The basic data types of Groonga are roughly divided into 5 groups --- boolean type, numeric types, string
types, date/time type and longitude/latitude types. The numeric types are further divided according to
whether integer or floating point number, signed or unsigned and the number of bits allocated to each
integer. The string types are further divided according to the maximum length. The longitude/latitude
types are further divided according to the geographic coordinate system. For more details, see
/reference/types.
In addition, Groonga supports reference types and vector types. Reference types are designed for
accessing other tables. Vector types are designed for storing a variable number of values in one element.
First, let's create a table for this tutorial.
Execution example:
table_create --name ToyBox --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
Boolean type
The boolean type is used to store true or false. To create a boolean type column, specify Bool for the
type parameter of /reference/commands/column_create command. The default value of the boolean type is
false.
The following example creates a boolean type column and adds three records. Note that the third record
has the default value because no value is specified.
Execution example:
column_create --table ToyBox --name is_animal --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","is_animal":true}
{"_key":"Flower","is_animal":false}
{"_key":"Block"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,is_animal
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "is_animal",
# "Bool"
# ]
# ],
# [
# "Monkey",
# true
# ],
# [
# "Flower",
# false
# ],
# [
# "Block",
# false
# ]
# ]
# ]
# ]
Numeric types
The numeric types are divided into integer types and a floating point number type. The integer types are
further divided into the signed integer types and unsigned integer types. In addition, you can choose the
number of bits allocated to each integer. For more details, see /reference/types. The default value of
the numeric types is 0.
The following example creates an Int8 column and a Float column, and then updates existing records. The
/reference/commands/load command updates the weight column as expected. On the other hand, the price
column values are different from the specified values because 15.9 is not an integer and 200 is too
large. 15.9 is converted to 15 by removing the fractional part. 200 causes an overflow and the result
becomes -56. Note that the result of an overflow/underflow is undefined.
Execution example:
column_create --table ToyBox --name price --type Int8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table ToyBox --name weight --type Float
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","price":15.9}
{"_key":"Flower","price":200,"weight":0.13}
{"_key":"Block","weight":25.7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,price,weight
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "price",
# "Int8"
# ],
# [
# "weight",
# "Float"
# ]
# ],
# [
# "Monkey",
# 15,
# 0.0
# ],
# [
# "Flower",
# -56,
# 0.13
# ],
# [
# "Block",
# 0,
# 25.7
# ]
# ]
# ]
# ]
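The two conversions seen in the price column can be imitated in Python. This is only a sketch of the observed values: it assumes two's complement wrap-around, which happens to give -56 for 200, while the result of an overflow in Groonga is undefined as noted above. The function name is hypothetical.

```python
import ctypes

def to_int8_wrapping(value):
    """Drop the fractional part, then wrap the integer into the Int8 range."""
    truncated = int(value)                  # 15.9 -> 15 (rounds toward zero)
    return ctypes.c_int8(truncated).value   # keeps only the low 8 bits

print(to_int8_wrapping(15.9))
print(to_int8_wrapping(200))   # 200 - 256
```

To store values such as 200 safely, choose a wider integer type from /reference/types.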
String types
The string types are divided according to the maximum length. For more details, see /reference/types. The
default value is the zero-length string.
The following example creates a ShortText column and updates existing records. The third record ("Block"
key record) has the default value (zero-length string) because it's not updated.
Execution example:
column_create --table ToyBox --name name --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","name":"Grease"}
{"_key":"Flower","name":"Rose"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "Monkey",
# "Grease"
# ],
# [
# "Flower",
# "Rose"
# ],
# [
# "Block",
# ""
# ]
# ]
# ]
# ]
Date and time type
The date and time type of Groonga is Time. Actually, a Time column stores a date and time as the number
of microseconds since the Epoch, 1970-01-01 00:00:00. A Time value can represent a date and time before
the Epoch because the actual data type is a signed integer. Note that /reference/commands/load and
/reference/commands/select commands use a decimal number to represent a date and time in seconds. The
default value is 0.0, which means the Epoch.
NOTE:
Groonga internally holds the time elapsed since the Epoch as a pair of integers. The first integer
represents the seconds and the second integer represents the microseconds. So, Groonga shows the value
as a floating point number: the integral part is the seconds and the fractional part is the
microseconds.
The following example creates a Time column and updates existing records. The first record ("Monkey" key
record) has the default value (0.0) because it's not updated.
Execution example:
column_create --table ToyBox --name time --type Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Flower","time":1234567890.1234569999}
{"_key":"Block","time":-1234567890}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,time
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "time",
# "Time"
# ]
# ],
# [
# "Monkey",
# 0.0
# ],
# [
# "Flower",
# 1234567890.12346
# ],
# [
# "Block",
# -1234567890.0
# ]
# ]
# ]
# ]
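The relationship between the stored microseconds and a calendar date can be sketched in Python as follows. The sketch assumes that the Epoch is 1970-01-01 00:00:00 UTC; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumption: the Epoch is in UTC

def split_microseconds(total_microseconds):
    """Split a microsecond count into (seconds, microseconds).

    divmod rounds toward negative infinity, so the microseconds part
    is never negative, even for times before the Epoch.
    """
    return divmod(total_microseconds, 1_000_000)

def to_datetime(total_microseconds):
    """Convert microseconds since the Epoch to a timezone-aware datetime."""
    return EPOCH + timedelta(microseconds=total_microseconds)

print(split_microseconds(1234567890_123457))
print(to_datetime(1234567890_000_000))
print(to_datetime(-1234567890 * 1_000_000))   # a date before the Epoch
```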
Longitude and latitude types
The longitude and latitude types are divided according to the geographic coordinate system. For more
details, see /reference/types. To represent a longitude and latitude, Groonga uses a string formatted as
follows:
• "latitude x longitude" in milliseconds (e.g.: "128452975x503157902")
• "latitude x longitude" in degrees (e.g.: "35.6813819x139.7660839")
A number without a decimal point represents a longitude or latitude in milliseconds, and a number with a
decimal point represents one in degrees. Note that a combination of a number with a decimal point and a
number without a decimal point (e.g. 35.1x139) must not be used. A comma (',') is also available as a
delimiter. The default value
is "0x0".
The following example creates a WGS84GeoPoint column and updates existing records. The second record
("Flower" key record) has the default value ("0x0") because it's not updated.
Execution example:
column_create --table ToyBox --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","location":"128452975x503157902"}
{"_key":"Block","location":"35.6813819x139.7660839"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "Monkey",
# "128452975x503157902"
# ],
# [
# "Flower",
# "0x0"
# ],
# [
# "Block",
# "128452975x503157902"
# ]
# ]
# ]
# ]
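In the output above, the "Block" record was loaded in degrees but is shown in milliseconds. One degree is 3,600,000 milliseconds (60 minutes x 60 seconds x 1000), so the conversion can be sketched in Python as follows. Rounding to the nearest integer is an assumption; it reproduces the values shown above. The function name is hypothetical.

```python
MILLISECONDS_PER_DEGREE = 60 * 60 * 1000

def to_milliseconds(point):
    """Convert a point string to the millisecond form, using 'x' as the delimiter."""
    first, second = point.replace(",", "x").split("x")
    if ("." in first) != ("." in second):
        raise ValueError("do not mix degrees and milliseconds: " + point)
    if "." not in first:
        return f"{int(first)}x{int(second)}"   # already in milliseconds
    return "x".join(
        str(round(float(value) * MILLISECONDS_PER_DEGREE))
        for value in (first, second)
    )

print(to_milliseconds("35.6813819x139.7660839"))
print(to_milliseconds("128452975,503157902"))
```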
Reference types
Groonga supports a reference column, which stores references to records in its associated table. In
practice, a reference column stores the IDs of the referred records in the associated table and enables
access to those records.
You can specify a column of the associated table in the output_columns parameter of a
/reference/commands/select command. The format is Src.Dest, where Src is the name of the reference column
and Dest is the name of the target column. If only the reference column is specified, it is handled as
Src._key. Note that if a reference does not point to a valid record, the /reference/commands/select
command outputs the default value of the target column.
The following example adds a reference column to the Site table that was created in
tutorial-introduction-create-table. The new column, named link, is designed for storing links among
records in the Site table.
Execution example:
column_create --table Site --name link --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","link":"http://example.net/"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,link._key,link.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "link._key",
# "ShortText"
# ],
# [
# "link.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# "http://example.net/",
# "test record 2."
# ]
# ]
# ]
# ]
The type parameter of the /reference/commands/column_create command specifies the table to be associated
with the reference column. In this example, the reference column is associated with its own table, Site.
Then, the /reference/commands/load command registers a link from "http://example.org/" to
"http://example.net/". Note that a reference column requires the primary key, not the ID, of the record
to be referred to. After that, the link is confirmed by the /reference/commands/select command. In this
case, the primary key and the title of the referred record are output because link._key and link.title
are specified in the output_columns parameter.
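The behaviour of a reference column can be modelled with a small Python sketch. It is a simplified stand-in, not Groonga's storage layout: the site dictionary, the helper names load_link and resolve, and the use of ID 0 for "no reference" are assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for two records of the Site table.
# The link column stores a record ID; 0 means "no reference" in this sketch.
site = {
    1: {"_key": "http://example.org/", "title": "This is test record 1!", "link": 0},
    2: {"_key": "http://example.net/", "title": "test record 2.", "link": 0},
}
key_to_id = {record["_key"]: record_id for record_id, record in site.items()}


def load_link(src_key, dest_key):
    """Accept primary keys, as load does, but store the referred record's ID."""
    site[key_to_id[src_key]]["link"] = key_to_id[dest_key]


def resolve(record_id, path):
    """Evaluate an output column such as "title", "link" or "link.title"."""
    if path == "link":
        path = "link._key"  # a bare reference column is handled as Src._key
    record = site[record_id]
    if "." not in path:
        return record[path]
    src, dest = path.split(".", 1)
    target = site.get(record[src])
    # An invalid reference yields the default value of the target column
    # ("" for the text columns used here).
    return target[dest] if target is not None else ""


load_link("http://example.org/", "http://example.net/")
print(resolve(1, "link._key"), "|", resolve(1, "link.title"))
```

The second record has no link in this sketch, so resolve(2, "link.title") returns the empty default instead of failing.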
Vector types
Groonga supports a vector column, in which each record can store a variable number of values. To create
a vector column, specify the COLUMN_VECTOR flag in the flags parameter of a
/reference/commands/column_create command. A vector column is useful to represent a many-to-many
relationship.
The previous example used a regular column, so each record could have at most one link. That is
insufficient because a site usually has more than one link. To solve this problem, the
following example uses a vector column.
Execution example:
column_create --table Site --name links --flags COLUMN_VECTOR --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","links":["http://example.net/","http://example.org/","http://example.com/"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,links._key,links.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "links._key",
# "ShortText"
# ],
# [
# "links.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# [
# "test record 2.",
# "This is test record 1!",
# "test test record three."
# ]
# ]
# ]
# ]
# ]
The only difference in the first step is the flags parameter, which specifies that a vector column is created.
The type parameter of the /reference/commands/column_create command is the same as in the previous
example. Then, the /reference/commands/load command registers three links from "http://example.org/" to
"http://example.net/", "http://example.org/" and "http://example.com/". After that, the links are
confirmed by the /reference/commands/select command. In this case, the primary keys and the titles are
output as arrays because links._key and links.title are specified to the output_columns parameter.
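A vector reference column can be pictured as an ordered list of references per record. The following Python sketch is a simplified model under that assumption. The dictionaries titles and links and the function output_links are hypothetical names, and the data is copied from the example above.

```python
# Hypothetical key -> title mapping for three records of the Site table.
titles = {
    "http://example.org/": "This is test record 1!",
    "http://example.net/": "test record 2.",
    "http://example.com/": "test test record three.",
}

# A vector reference column holds an ordered list of references per record.
links = {
    "http://example.org/": [
        "http://example.net/",
        "http://example.org/",
        "http://example.com/",
    ],
}


def output_links(key):
    """Return the values of links._key and links.title for one record."""
    referred = links.get(key, [])
    return list(referred), [titles[k] for k in referred]


keys, link_titles = output_links("http://example.org/")
print(keys)
print(link_titles)
```

Both outputs are lists of the same length and in the same order, which mirrors the arrays in the select output.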
Various search conditions
Groonga can narrow down search results with conditions written in a JavaScript-like syntax and can sort
them by a calculated value. Groonga can also narrow down and sort search results by using location
information (latitude and longitude).
Narrow down & Full-text search by using syntax like JavaScript
The filter parameter of the select command accepts a search condition. Unlike the query parameter, the
filter parameter requires the condition to be written in a JavaScript-like syntax.
Execution example:
select --table Site --filter "_id <= 1" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ]
# ]
# ]
# ]
Let's look at the above query in detail. Here is the condition that is specified in the filter parameter:
_id <= 1
In this case, the query returns the records whose _id value is less than or equal to 1.
Moreover, you can use && for AND search and || for OR search.
Execution example:
select --table Site --filter "_id >= 4 && _id <= 6" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr"
# ],
# [
# 5,
# "http://example.org/aba"
# ],
# [
# 6,
# "http://example.com/rab"
# ]
# ]
# ]
# ]
select --table Site --filter "_id <= 2 || _id >= 7" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ],
# [
# 2,
# "http://example.net/"
# ],
# [
# 7,
# "http://example.net/atv"
# ],
# [
# 8,
# "http://example.org/gat"
# ],
# [
# 9,
# "http://example.com/vdw"
# ]
# ]
# ]
# ]
If you specify the query parameter and the filter parameter at the same time, you get only the records
that meet both conditions.
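The filter conditions above behave like ordinary boolean predicates over each record. The following Python sketch evaluates the same conditions over the record IDs 1 to 9. It is only an analogy: select_ids and query_hits are hypothetical, and the query result is made up for illustration.

```python
# Record IDs 1..9, as in the Site table used by this tutorial.
record_ids = range(1, 10)


def select_ids(predicate):
    """Return the IDs for which the filter predicate holds."""
    return [record_id for record_id in record_ids if predicate(record_id)]


# --filter "_id >= 4 && _id <= 6"
and_result = select_ids(lambda _id: _id >= 4 and _id <= 6)

# --filter "_id <= 2 || _id >= 7"
or_result = select_ids(lambda _id: _id <= 2 or _id >= 7)

# A query combined with a filter keeps only the records matching both.
query_hits = {1, 2, 3}  # hypothetical hits of some --query
both = [record_id for record_id in or_result if record_id in query_hits]

print(and_result, or_result, both)
```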
Sort by using scorer
The select command accepts a scorer parameter, which is used to process each record of the full-text
search results. Like the filter parameter, this parameter accepts an expression written in a
JavaScript-like syntax.
Execution example:
select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 6,
# "http://example.com/rab",
# 424238335
# ],
# [
# 9,
# "http://example.com/vdw",
# 596516649
# ],
# [
# 7,
# "http://example.net/atv",
# 719885386
# ],
# [
# 2,
# "http://example.net/",
# 846930886
# ],
# [
# 8,
# "http://example.org/gat",
# 1649760492
# ],
# [
# 3,
# "http://example.com/",
# 1681692777
# ],
# [
# 4,
# "http://example.net/afr",
# 1714636915
# ],
# [
# 1,
# "http://example.org/",
# 1804289383
# ],
# [
# 5,
# "http://example.org/aba",
# 1957747793
# ]
# ]
# ]
# ]
select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# 783368690
# ],
# [
# 2,
# "http://example.net/",
# 1025202362
# ],
# [
# 5,
# "http://example.org/aba",
# 1102520059
# ],
# [
# 1,
# "http://example.org/",
# 1189641421
# ],
# [
# 3,
# "http://example.com/",
# 1350490027
# ],
# [
# 8,
# "http://example.org/gat",
# 1365180540
# ],
# [
# 9,
# "http://example.com/vdw",
# 1540383426
# ],
# [
# 7,
# "http://example.net/atv",
# 1967513926
# ],
# [
# 6,
# "http://example.com/rab",
# 2044897763
# ]
# ]
# ]
# ]
'_score' is one of the pseudo columns. The score of full-text search is assigned to it. See
/reference/columns/pseudo for details about the '_score' column.
In the above query, the condition of the scorer parameter is:
_score = rand()
In this case, the score of full-text search is overwritten by the return value of the rand() function.
The condition of the sortby parameter is:
_score
This means that the search results are sorted by _score in ascending order.
As a result, the order of the search results is randomized.
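The combination of scorer and sortby above amounts to "assign every record a random number, then sort by it". The following Python sketch shows the idea. The function shuffle_by_score is hypothetical, and the value range of the random numbers is an assumption, not a statement about Groonga's rand() function.

```python
import random


def shuffle_by_score(keys, rng=random):
    """Assign each record a random score, then sort ascending by that score."""
    # Stands in for: --scorer "_score = rand()"
    scored = [(rng.randrange(2**31), key) for key in keys]
    # Stands in for: --sortby _score (ascending order)
    scored.sort(key=lambda pair: pair[0])
    return [(key, score) for score, key in scored]


for key, score in shuffle_by_score(["http://example.org/", "http://example.net/", "http://example.com/"]):
    print(key, score)
```

Every call produces a new permutation of the same records, which is why the two select results above list the same nine records in different orders.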
Narrow down & sort by using location information
Groonga can store location information (latitude and longitude) and can use it not only to narrow down
but also to sort search results.
Groonga supports two column types for storing location information: TokyoGeoPoint and WGS84GeoPoint.
TokyoGeoPoint is used for the Japanese geodetic system. WGS84GeoPoint is used for the world geodetic
system.
Specify latitude and longitude as follows:
• "[latitude in milliseconds]x[longitude in milliseconds]" (e.g.: "128452975x503157902")
• "[latitude in milliseconds],[longitude in milliseconds]" (e.g.: "128452975,503157902")
• "[latitude in degrees]x[longitude in degrees]" (e.g.: "35.6813819x139.7660839")
• "[latitude in degrees],[longitude in degrees]" (e.g.: "35.6813819,139.7660839")
Let's store the locations of two stations in Japan in the world geodetic system: Tokyo station and
Shinjyuku station. The latitude of Tokyo station is 35 degrees 40 minutes 52.975 seconds, and its
longitude is 139 degrees 45 minutes 57.902 seconds. The latitude of Shinjyuku station is 35 degrees 41
minutes 27.316 seconds, and its longitude is 139 degrees 42 minutes 0.929 seconds. Thus, the locations in
milliseconds are "128452975x503157902" and "128487316x502920929" respectively. The locations in degrees
are "35.6813819x139.7660839" and "35.6909211x139.7002581" respectively.
Let's register location information in milliseconds.
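As a cross-check before loading, the millisecond strings can be derived from the degrees, minutes and seconds given above. The following Python sketch does the arithmetic. The helper names dms_to_ms and ms_to_degrees are hypothetical.

```python
def dms_to_ms(degrees, minutes, seconds):
    """Convert degrees, minutes and fractional seconds to milliseconds of arc."""
    # 1 degree = 3,600,000 ms, 1 minute = 60,000 ms, 1 second = 1,000 ms.
    return degrees * 3600000 + minutes * 60000 + round(seconds * 1000)


def ms_to_degrees(milliseconds):
    """Convert milliseconds of arc to decimal degrees."""
    return milliseconds / 3600000


tokyo = (dms_to_ms(35, 40, 52.975), dms_to_ms(139, 45, 57.902))
shinjyuku = (dms_to_ms(35, 41, 27.316), dms_to_ms(139, 42, 0.929))

print("%dx%d" % tokyo)      # 128452975x503157902
print("%dx%d" % shinjyuku)  # 128487316x502920929
print("%.7fx%.7f" % tuple(ms_to_degrees(v) for v in tokyo))  # 35.6813819x139.7660839
```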
Execution example:
column_create --table Site --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","location":"128452975x503157902"},
{"_key":"http://example.net/","location":"128487316x502920929"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ],
# [
# "http://example.net/",
# "128487316x502920929"
# ]
# ]
# ]
# ]
Then assign the geo distance calculated by the /reference/functions/geo_distance function to _score in
the scorer parameter.
Let's show the geo distance from Akihabara station in Japan. In the world geodetic system, the latitude
of Akihabara station is 35 degrees 41 minutes 55.259 seconds, and its longitude is 139 degrees 46 minutes
27.188 seconds. Specify "128515259x503187188" as the second argument of the geo_distance function.
Execution example:
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]
As you can see, the geo distance between Tokyo station and Akihabara station is 2054 meters, and the geo
distance between Akihabara station and Shinjyuku station is 6720 meters.
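To get a feel for where such numbers come from, the distances can be approximated in Python. This is a sketch under stated assumptions, not a description of how geo_distance is implemented: it uses an equirectangular (flat rectangle) approximation, and the sphere radius of 6357303 meters is an assumed value chosen because it lands close to the figures above.

```python
import math

# Assumed sphere radius in meters; not taken from this document.
ASSUMED_RADIUS_M = 6357303


def to_radians(milliseconds):
    """Convert milliseconds of arc to radians."""
    return math.radians(milliseconds / 3600000)


def approx_distance_m(point_a, point_b, radius=ASSUMED_RADIUS_M):
    """Equirectangular distance between two (latitude_ms, longitude_ms) points."""
    lat_a, lng_a = (to_radians(v) for v in point_a)
    lat_b, lng_b = (to_radians(v) for v in point_b)
    # Shrink the east-west difference by the cosine of the mean latitude.
    x = (lng_b - lng_a) * math.cos((lat_a + lat_b) / 2)
    y = lat_b - lat_a
    return math.hypot(x, y) * radius


akihabara = (128515259, 503187188)
tokyo = (128452975, 503157902)
shinjyuku = (128487316, 502920929)

print(approx_distance_m(tokyo, akihabara))
print(approx_distance_m(shinjyuku, akihabara))
```

Under these assumptions both results fall within a few meters of the 2054 and 6720 meters reported by the select command.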
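The return value of the geo_distance function can also be used for sorting by specifying the _score
pseudo column in the sortby parameter. The following example specifies -_score, so the results are sorted
in descending order of distance.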
Execution example:
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ]
# ]
# ]
# ]
Groonga can also narrow down results to the records located within a specified distance in meters from
a certain point.
In such a case, use the /reference/functions/geo_in_circle function in the filter parameter.
For example, search for the records that exist within 5000 meters of Akihabara station.
Execution example:
select --table Site --output_columns _key,location --filter 'geo_in_circle(location, "128515259x503187188", 5000)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]
There is also the /reference/functions/geo_in_rectangle function, which searches for the points within a
specified rectangular region.
Drilldown
You learned how to search and how to sort search results in the previous sections. Now you can search as
you like, but how do you count the records that have a specific value in a column?
A naive solution is to execute a query for every value of the column and take the number of hits each
time. It is simple, but it is not practical for a large number of records.
If you are familiar with SQL, you may wonder, "Is there a feature in Groonga that is similar to GROUP BY
in SQL?".
Of course, Groonga provides such a feature. It is called drilldown.
Drilldown enables you to get the number of records for each value of a column at once.
To illustrate this feature, imagine classifying sites by domain and grouping them by the country that
each site belongs to.
Here are concrete examples of how to use this feature.
In this example, we add two columns to the Site table. The domain column is used for the TLD (top level
domain). The country column is used for the country name. The types of these columns are the SiteDomain
table, which uses the domain name as its primary key, and the SiteCountry table, which uses the country
name as its primary key.
Execution example:
table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]
Here is an example of drilldown with the domain column. Three kinds of values are used in the domain
column: ".org", ".net" and ".com".
Execution example:
select --table Site --limit 0 --drilldown domain
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ]
# ]
# ]
Here is a summary of the above query.
Drilldown by domain column
┌──────────┬─────────────────────────────┬─────────────────────────────────┐
│ Group by │ Number of records in group  │ Records in the group            │
├──────────┼─────────────────────────────┼─────────────────────────────────┤
│ .org     │ 3                           │ • http://example.org/           │
│          │                             │ • http://example.org/aba        │
│          │                             │ • http://example.org/gat        │
├──────────┼─────────────────────────────┼─────────────────────────────────┤
│ .net     │ 3                           │ • http://example.net/           │
│          │                             │ • http://example.net/afr        │
│          │                             │ • http://example.net/atv        │
├──────────┼─────────────────────────────┼─────────────────────────────────┤
│ .com     │ 3                           │ • http://example.com/           │
│          │                             │ • http://example.com/rab        │
│          │                             │ • http://example.com/vdw        │
└──────────┴─────────────────────────────┴─────────────────────────────────┘
The result of drilldown is returned as the value of the _nsubrecs column. In this case, the Site table
is grouped by the ".org", ".net" and ".com" domains. _nsubrecs shows that each of the three domains has
three records.
If you execute drilldown on a column whose type is a table, you get the column values stored in the
referenced table. The _nsubrecs pseudo column is added to the table that is used for drilldown. This
pseudo column stores the number of records in each group.
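Conceptually, drilldown counts the records per distinct column value, much like GROUP BY with COUNT in SQL. The following Python sketch reproduces the counts for the data loaded above. The rows list and the drilldown function are hypothetical stand-ins, and reporting groups in first-seen order is a property of this sketch only.

```python
from collections import Counter

# (key, domain, country) rows copied from the load command above.
rows = [
    ("http://example.org/", ".org", "japan"),
    ("http://example.net/", ".net", "brazil"),
    ("http://example.com/", ".com", "japan"),
    ("http://example.net/afr", ".net", "usa"),
    ("http://example.org/aba", ".org", "korea"),
    ("http://example.com/rab", ".com", "china"),
    ("http://example.net/atv", ".net", "china"),
    ("http://example.org/gat", ".org", "usa"),
    ("http://example.com/vdw", ".com", "japan"),
]


def drilldown(rows, column_index):
    """Count records per distinct column value (the role of _nsubrecs)."""
    counts = Counter(row[column_index] for row in rows)
    return list(counts.items())  # Counter keeps first-seen order


print(drilldown(rows, 1))  # [('.org', 3), ('.net', 3), ('.com', 3)]
print(drilldown(rows, 2))
```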
Then, investigate the referenced table in detail. As the Site table uses the SiteDomain table as the
type of the domain column, you can use --drilldown_output_columns to see the details of the referenced
records.
Execution example:
select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 1,
# ".org",
# 3
# ],
# [
# 2,
# ".net",
# 3
# ],
# [
# 3,
# ".com",
# 3
# ]
# ]
# ]
# ]
Now you can see the details of each grouped domain. Next, drill down by the country column for the
records whose domain is ".org" (the SiteDomain record whose _id is 1).
Execution example:
select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 1
# ]
# ]
# ]
# ]
Drilldown with multiple columns
The drilldown feature supports multiple columns. Specify comma-separated column names in the drilldown
parameter. You get the drilldown result for each column at once.
Execution example:
select --table Site --limit 0 --drilldown domain,country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 3
# ],
# [
# "brazil",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "korea",
# 1
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
Sorting drilldown results
Use --drilldown_sortby if you want to sort the result of drilldown. For example, specify _nsubrecs to
sort in ascending order of the number of records.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "brazil",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ],
# [
# "japan",
# 3
# ]
# ]
# ]
# ]
Limiting drilldown results
The number of drilldown results is limited to 10 by default. Use the drilldown_limit and drilldown_offset
parameters to customize the drilldown results.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
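The effect of drilldown_sortby, drilldown_offset and drilldown_limit can be pictured as "sort the groups, then take a slice". The following Python sketch applies that idea to the country groups from this tutorial. The function page_groups is hypothetical, and the order of groups with equal counts comes from Python's stable sort, which is an assumption of the sketch rather than a documented Groonga rule.

```python
def page_groups(groups, limit=10, offset=0):
    """Sort (key, count) pairs by count ascending, then return one page."""
    ordered = sorted(groups, key=lambda pair: pair[1])  # stable sort
    return ordered[offset:offset + limit]


# Country groups in the order the unsorted drilldown reported them.
country_groups = [("japan", 3), ("brazil", 1), ("usa", 2), ("korea", 1), ("china", 2)]

print(page_groups(country_groups))
print(page_groups(country_groups, limit=2, offset=2))  # [('usa', 2), ('china', 2)]
```

Skipping two groups and keeping the next two leaves "usa" and "china", which matches the select output above.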
Note that drilldown on a column that stores strings is slower than drilldown on columns that store other
types. If you need to drill down on string values, create a table whose primary key type is a string type,
then create a column that refers to that table.
Tag search and reverse resolution of reference relationships
As you know, Groonga can store an array of references to another table in a column. In fact, you can do
tag search by using such array data.
Tag search is very fast because Groonga uses an inverted index as the data structure.
Tag search
Let's consider creating a search engine for a web site that shares movies. Each movie may be associated
with multiple keywords that represent the content of the movie.
Let's create tables for movie information, then search the movies.
First, create the Video table, which stores movie information. The Video table has two columns. The title
column stores the title of the movie. The tags column stores multiple tags as references to the Tag table.
Next, create the Tag table, which stores tag information. The Tag table has one column. The tag string is
stored as the primary key, and the index_tags column stores an index for the tags column of the Video
table.
Execution example:
table_create --name Video --flags TABLE_HASH_KEY --key_type UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name Tag --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name tags --flags COLUMN_VECTOR --type Tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Tag --name index_tags --flags COLUMN_INDEX --type Video --source tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Video
[
{"_key":1,"title":"Soccer 2010","tags":["Sports","Soccer"]},
{"_key":2,"title":"Zenigata Kinjirou","tags":["Variety","Money"]},
{"_key":3,"title":"groonga Demo","tags":["IT","Server","groonga"]},
{"_key":4,"title":"Moero!! Ultra Baseball","tags":["Sports","Baseball"]},
{"_key":5,"title":"Hex Gone!","tags":["Variety","Quiz"]},
{"_key":6,"title":"Pikonyan 1","tags":["Animation","Pikonyan"]},
{"_key":7,"title":"Draw 8 Month","tags":["Animation","Raccoon"]},
{"_key":8,"title":"K.O.","tags":["Animation","Music"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 8]
After creating the index column, you can do full-text search very fast. The index column is also
automatically updated when the stored data is updated.
List the movies that have specific keywords.
Execution example:
select --table Video --query tags:@Variety --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 2,
# "Zenigata Kinjirou"
# ],
# [
# 5,
# "Hex Gone!"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Sports --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "Soccer 2010"
# ],
# [
# 4,
# "Moero!! Ultra Baseball"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Animation --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# "Pikonyan 1"
# ],
# [
# 7,
# "Draw 8 Month"
# ],
# [
# 8,
# "K.O."
# ]
# ]
# ]
# ]
You can search by tags such as "Variety", "Sports" and "Animation".
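The role of the index_tags column can be illustrated with a tiny inverted index in Python: instead of scanning every video for a tag, the index maps each tag directly to the videos that carry it. This is a conceptual sketch only. The names videos, build_tag_index and tag_index are hypothetical, and the data is copied from the load command above.

```python
from collections import defaultdict

# Video key -> (title, tags), copied from the load command above.
videos = {
    1: ("Soccer 2010", ["Sports", "Soccer"]),
    2: ("Zenigata Kinjirou", ["Variety", "Money"]),
    3: ("groonga Demo", ["IT", "Server", "groonga"]),
    4: ("Moero!! Ultra Baseball", ["Sports", "Baseball"]),
    5: ("Hex Gone!", ["Variety", "Quiz"]),
    6: ("Pikonyan 1", ["Animation", "Pikonyan"]),
    7: ("Draw 8 Month", ["Animation", "Raccoon"]),
    8: ("K.O.", ["Animation", "Music"]),
}


def build_tag_index(videos):
    """Map each tag to the list of video keys that carry it (inverted index)."""
    index = defaultdict(list)
    for key in sorted(videos):
        for tag in videos[key][1]:
            index[tag].append(key)
    return index


tag_index = build_tag_index(videos)
for key in tag_index["Variety"]:
    print(key, videos[key][0])
```

A lookup such as tag_index["Animation"] touches only the matching entries, which is the reason an inverted index keeps tag search fast as the table grows.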
Reverse resolution of reference relationships
Groonga supports indexes for reverse resolution among tables. Tag search is one concrete example.
For example, you can search friendships by reverse resolution in a social networking site.
The following example shows how to create the User table, which stores user information, with the
username column, which stores the user name, the friends column, which stores the list of the user's
friends as an array, and the index_friends column as an index column.
Execution example:
table_create --name User --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name username --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name friends --flags COLUMN_VECTOR --type User
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name index_friends --flags COLUMN_INDEX --type User --source friends
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table User
[
{"_key":"ken","username":"健作","friends":["taro","jiro","tomo","moritapo"]},
{"_key":"moritapo","username":"森田","friends":["ken","tomo"]},
{"_key":"taro","username":"ぐるんが太郎","friends":["jiro","tomo"]},
{"_key":"jiro","username":"ぐるんが次郎","friends":["taro","tomo"]},
{"_key":"tomo","username":"トモちゃん","friends":["ken","hana"]},
{"_key":"hana","username":"花子","friends":["ken","taro","jiro","moritapo","tomo"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
Let's list the users whose friend lists contain a specified user.
Execution example:
select --table User --query friends:@tomo --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "jiro",
# "ぐるんが次郎"
# ],
# [
# "moritapo",
# "森田"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]
select --table User --query friends:@jiro --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]
Then use drilldown to count how many times each user is listed as a friend.
Execution example:
select --table User --limit 0 --drilldown friends
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "friends",
# "User"
# ],
# [
# "index_friends",
# "UInt32"
# ],
# [
# "username",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 6
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "taro",
# 3
# ],
# [
# "jiro",
# 3
# ],
# [
# "tomo",
# 5
# ],
# [
# "moritapo",
# 2
# ],
# [
# "ken",
# 3
# ],
# [
# "hana",
# 1
# ]
# ]
# ]
# ]
As you can see, the results follow the reverse resolution of the reference relationship.
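Reverse resolution answers the question "who refers to this record?" rather than "whom does this record refer to?". The following Python sketch recomputes the results above from the loaded friend lists. It uses a plain scan for clarity, whereas the index_friends column exists precisely to avoid such a scan. The function names listed_by and times_listed are hypothetical.

```python
# User key -> friend list, copied from the load command above.
friends = {
    "ken": ["taro", "jiro", "tomo", "moritapo"],
    "moritapo": ["ken", "tomo"],
    "taro": ["jiro", "tomo"],
    "jiro": ["taro", "tomo"],
    "tomo": ["ken", "hana"],
    "hana": ["ken", "taro", "jiro", "moritapo", "tomo"],
}


def listed_by(user):
    """Return the users whose friend list contains the given user."""
    return [owner for owner, listed in friends.items() if user in listed]


def times_listed():
    """Count, for every user, how many friend lists mention that user."""
    return {user: len(listed_by(user)) for user in friends}


print(listed_by("jiro"))  # ['ken', 'taro', 'hana']
print(times_listed())
```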
Geo location search with index
Groonga supports adding indexes to a column that stores geo location information. Groonga is very
fast at searching an enormous number of records because it uses such indexes.
Execution example:
table_create --name GeoSite --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoSite --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoIndex --name index_point --type GeoSite --flags COLUMN_INDEX --source location
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table GeoSite
[
{"_key":"http://example.org/","location":"128452975x503157902"},
{"_key":"http://example.net/","location":"128487316x502920929"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 5000)' --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]
These indexes are also used when sorting the records in a geo location search.
Execution example:
select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 50000)' --output_columns _key,location,_score --sortby '-geo_distance(location, "128515259x503187188")' --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]
match_columns parameter
Full-text search against multiple columns
Groonga supports full-text search against multiple columns. Let's consider a blog site. Usually, a blog
site has a table that contains a title column and a content column. How do you search for the blog
entries that contain specified keywords in the title or the content?
In such a case, there are two ways to create indexes. One way is to create a column index for each
column. The other way is to create one column index for multiple columns. Either way, Groonga supports
a similar full-text search syntax.
Creating column index against each column
Here is an example that creates a column index for each column.
First, create the Blog1 table, then add the title column, which stores the title string, and the message
column, which stores the content of the blog entry.
Then create the IndexBlog1 table for the column indexes, and add the index_title column for the title
column and the index_message column for the message column.
Execution example:
table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog1
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
The match_columns option of the select command accepts multiple columns as search targets. Specify the
query string with the query option. Then you can run a full-text search on the title and content of blog
entries.
Let's try to search blog entries.
Execution example:
select --table Blog1 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
Creating one column index against multiple columns
Groonga also supports one column index against multiple columns.
The only difference from the previous example is that only one column index exists: a single common
column index covers both the title and message columns.
Even though the same column index is used, Groonga can search the title column only, the message
column only, or either the title or message column.
Execution example:
table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog2
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
Let's run the same queries as in the previous section. You get the same matching records.
Execution example:
select --table Blog2 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
NOTE:
You may wonder which is the better solution for indexing. It depends on the case.
• Indexes for each column - The update performance tends to be better than with a multiple column index
because there is enough buffer for updating. On the other hand, disk usage is less efficient.
• Index for multiple columns - It saves disk space because the columns share a common buffer. On the
other hand, the update performance is not as good.
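The idea of one index shared by several columns can be pictured with a toy inverted index. The following
Python sketch is not Groonga's implementation: build_index and search are hypothetical helpers, and
whitespace splitting stands in for TokenBigram. It keeps one posting set per word and records a section
number for each source column, which is the kind of information WITH_SECTION adds to index_blog.

```python
from collections import defaultdict

# Sample records copied from the Blog2 example above.
RECORDS = {
    "grn1": {"title": "Groonga test", "message": "Groonga message"},
    "grn2": {"title": "baseball result", "message": "rakutan eggs 4 - 4 Groonga moritars"},
    "grn3": {"title": "Groonga message", "message": "none"},
}
SOURCES = ["title", "message"]  # section 1 = title, section 2 = message


def build_index(records, sources):
    """Map each lowercased word to a set of (record key, section) postings."""
    index = defaultdict(set)
    for key, record in records.items():
        for section, column in enumerate(sources, start=1):
            for word in record[column].lower().split():
                index[word].add((key, section))
    return index


def search(index, sources, columns, word):
    """Return the keys whose posting for `word` lies in one of `columns`."""
    wanted = {sources.index(column) + 1 for column in columns}
    postings = index.get(word.lower(), ())
    return sorted({key for key, section in postings if section in wanted})


index = build_index(RECORDS, SOURCES)
print(search(index, SOURCES, ["title", "message"], "message"))  # ['grn1', 'grn3']
print(search(index, SOURCES, ["title"], "message"))             # ['grn3']
```

Because every posting carries its section, a single shared index can still answer a title-only query.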
Full text search with specific index name
TODO
Nested index search among related table by column index
If there are relationships among multiple tables with column indexes, you can search across multiple
tables by specifying the reference column name.
Here is a concrete example.
There are tables that store blog articles and comments on articles. The table that stores articles has
columns for the article content and the comment. The comment column refers to the Comments table. The
table that stores comments has a column for the comment content and a column index on the articles table.
If you want to search for the articles that have a specified keyword in their comment, you would normally
run a full-text search on the comments table, then search for the article records that refer to the
matched comments. With a nested index search, you can do both at once by specifying the reference column.
Here is the sample schema.
Execution example:
table_create Comments TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles comment COLUMN_SCALAR Comments
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon articles_content COLUMN_INDEX|WITH_POSITION Articles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comments_content COLUMN_INDEX|WITH_POSITION Comments content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments article COLUMN_INDEX Articles comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
Here is the sample data.
Execution example:
load --table Comments
[
{"_key": 1, "content": "I'm using Groonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga!"},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!"},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
You can write a query that searches for the comments that contain the specified keyword, then fetches
the articles that refer to them.
Query for searching the records described above:
select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"
For the --match_columns argument, concatenate the comment column of the Articles table and the content
column of the Comments table with a period ( . ).
First, this query executes a full-text search on the content column of the Comments table, then fetches
the records of the Articles table that refer to the matched records of the Comments table. (Because of
this, if you omit the command that creates the index column article in the Comments table, you can't get
the intended search results.)
Execution example:
select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 1,
# 1,
# 1,
# "Groonga is fast!"
# ]
# ]
# ]
# ]
Now you can search for articles that have specific keywords in their comment.
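The two steps of this nested search can be sketched in Python. This is only a conceptual model under
simplifying assumptions: the helper names are hypothetical, word matching replaces Groonga's tokenizer,
and a plain dictionary plays the role of the Comments.article index column.

```python
# Sample data copied from the Comments and Articles tables above.
COMMENTS = {
    1: "I'm using Groonga too!",
    2: "I'm using Groonga and Mroonga!",
    3: "I'm using Mroonga too!",
}
ARTICLES = {  # article _id -> (content, referenced comment key or None)
    1: ("Groonga is fast!", 1),
    2: ("Groonga is useful!", None),
    3: ("Mroonga is fast!", 3),
}


def words(text):
    """Lowercase the text and split it into words, dropping '!'."""
    return text.lower().replace("!", "").split()


def build_reverse_index(articles):
    """Map each comment key to the article ids that refer to it."""
    reverse = {}
    for article_id, (_content, comment_key) in articles.items():
        if comment_key is not None:
            reverse.setdefault(comment_key, []).append(article_id)
    return reverse


def search_articles_by_comment(keyword):
    """Step 1: match comments. Step 2: follow the reverse index to articles."""
    reverse = build_reverse_index(ARTICLES)
    matched = [key for key, text in COMMENTS.items() if keyword.lower() in words(text)]
    return sorted(article for key in matched for article in reverse.get(key, []))


print(search_articles_by_comment("groonga"))  # [1]
```

Comment 2 also matches the keyword, but no article refers to it, so only article 1 is returned. Without
the reverse index, step 2 has nothing to follow, which mirrors the remark about the article index column.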
Nested index search is not limited to a relationship between two tables.
Here is a sample schema similar to the previous one. The difference is an added table that represents
replies, so the relationship extends to three tables.
Execution example:
table_create Replies2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Comments2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 comment COLUMN_SCALAR Replies2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles2 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 comment COLUMN_SCALAR Comments2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon2 TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 articles_content COLUMN_INDEX|WITH_POSITION Articles2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 comments_content COLUMN_INDEX|WITH_POSITION Comments2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 replies_content COLUMN_INDEX|WITH_POSITION Replies2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 article COLUMN_INDEX Articles2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 reply_to COLUMN_INDEX Comments2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
Here is the sample data.
Execution example:
load --table Replies2
[
{"_key": 1, "content": "I'm using Rroonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga and Rroonga!"},
{"_key": 3, "content": "I'm using Nroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Comments2
[
{"_key": 1, "content": "I'm using Groonga too!", "comment": 1},
{"_key": 2, "content": "I'm using Groonga and Mroonga!", "comment": 2},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles2
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!", "comment": 2},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
Query for searching the records described above:
select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"
The first query searches for mroonga in the Comments2 table. The second one searches for mroonga in the
Replies2 table and reaches the articles through the Comments2 table by using the reference column indexes.
Execution example:
select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ],
# [
# 3,
# 1,
# 3,
# "Mroonga is fast!"
# ]
# ]
# ]
# ]
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ]
# ]
# ]
# ]
As a result, the first query matches two articles because the Comments2 table has two records that
contain mroonga as a keyword.
On the other hand, the second one matches only one article because the Replies2 table has only one record
that contains mroonga as a keyword, and only one record in the Comments2 table refers to that reply.
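The three-table case is the same walk repeated once more. The sketch below is a hypothetical Python
model of the comment.comment.content query, with dictionaries standing in for the reference columns of
the sample data; it is not how Groonga resolves the query internally.

```python
# Sample data copied from the Replies2, Comments2 and Articles2 tables above.
REPLIES = {
    1: "I'm using Rroonga too!",
    2: "I'm using Groonga and Mroonga and Rroonga!",
    3: "I'm using Nroonga too!",
}
COMMENT_TO_REPLY = {1: 1, 2: 2}          # Comments2.comment (comment 3 has no reply)
ARTICLE_TO_COMMENT = {1: 1, 2: 2, 3: 3}  # Articles2.comment


def referrers(forward, targets):
    """Return the ids whose reference points at any id in `targets`."""
    return {source for source, target in forward.items() if target in targets}


def search_articles_by_reply(keyword):
    """Match replies, then walk back to comments, then to articles."""
    replies = {key for key, text in REPLIES.items()
               if keyword.lower() in text.lower().replace("!", "").split()}
    comments = referrers(COMMENT_TO_REPLY, replies)
    return sorted(referrers(ARTICLE_TO_COMMENT, comments))


print(search_articles_by_reply("mroonga"))  # [2]
```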
Indexes with Weight
TODO
Prefix search with patricia trie
Groonga supports creating a table backed by a patricia trie. With such a table, you can do prefix
search.
In addition, you can do suffix search against the primary key by specifying an additional option.
Prefix search by primary key
A table created by the table_create command with TABLE_PAT_KEY in the flags option supports prefix search
by primary key.
Execution example:
table_create --name PatPrefix --flags TABLE_PAT_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatPrefix
[
{"_key":"James"},
{"_key":"Jason"},
{"_key":"Jennifer"},
{"_key":"Jeff"},
{"_key":"John"},
{"_key":"Joseph"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
select --table PatPrefix --query _key:^Je
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 3,
# "Jennifer"
# ],
# [
# 4,
# "Jeff"
# ]
# ]
# ]
# ]
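Prefix search is cheap when keys are kept in order, because all keys that share a prefix sit next to each
other. The Python sketch below illustrates that idea with a sorted list and binary search; this is a
stand-in chosen for brevity, not a patricia trie, and prefix_search is a hypothetical helper.

```python
from bisect import bisect_left

# Keys copied from the PatPrefix example above, kept in sorted order.
KEYS = sorted(["James", "Jason", "Jennifer", "Jeff", "John", "Joseph"])


def prefix_search(sorted_keys, prefix):
    """Return all keys starting with `prefix`, located by one binary search."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


print(prefix_search(KEYS, "Je"))  # ['Jeff', 'Jennifer']
```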
Suffix search by primary key
A table created by the table_create command with TABLE_PAT_KEY and KEY_WITH_SIS in the flags option
supports prefix search and suffix search by primary key.
If you set the KEY_WITH_SIS flag, records for suffix search are also added automatically when you add
data. So if you simply search, the automatically added records are hit in addition to the original
records. To search only the original records, you need a workaround.
For example, to distinguish the original records from the automatically added records, add an original
column indicating that a record is an original one, and add "original is true" to the search condition.
Note that you need to use the --filter option here because the --query option cannot specify a Bool type
value intuitively.
Execution example:
table_create --name PatSuffix --flags TABLE_PAT_KEY|KEY_WITH_SIS --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table PatSuffix --name original --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatSuffix
[
{"_key":"ひろゆき","original":true},
{"_key":"まろゆき","original":true},
{"_key":"ひろあき","original":true},
{"_key":"ゆきひろ","original":true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select --table PatSuffix --query _key:$ゆき
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 3,
# "ゆき",
# false
# ],
# [
# 2,
# "ろゆき",
# false
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]
select --table PatSuffix --filter '_key @$ "ゆき" && original == true'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]
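The effect of KEY_WITH_SIS seen in the output above can be modelled in a few lines of Python. This sketch
assumes that every proper suffix of a loaded key is registered as an extra record, which is consistent
with the keys returned by the first query; load_with_suffixes and suffix_search are hypothetical helpers,
and the endswith scan is a simplification of the real lookup.

```python
def load_with_suffixes(keys):
    """Register each key plus all of its proper suffixes.

    Returns {key: original}, where original is True only for loaded keys.
    """
    table = {}
    for key in keys:
        table[key] = True
        for start in range(1, len(key)):
            table.setdefault(key[start:], False)  # keep True if already loaded
    return table


def suffix_search(table, suffix, originals_only=False):
    """Return the sorted keys ending with `suffix`, optionally only loaded ones."""
    return sorted(key for key, original in table.items()
                  if key.endswith(suffix) and (original or not originals_only))


table = load_with_suffixes(["ひろゆき", "まろゆき", "ひろあき", "ゆきひろ"])
print(suffix_search(table, "ゆき"))                       # 4 keys, 2 of them added automatically
print(suffix_search(table, "ゆき", originals_only=True))  # only the 2 loaded keys
```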
Additional information about lexicon for full text search
Groonga stores the lexicon for full text search as a table. Thus, Groonga can hold additional information
for each lexicon entry. For example, Groonga can hold the frequency of a word, a stop word flag, the
importance of a word and so on.
TODO: Write document.
Let's create micro-blog
Let's create a micro-blog with full text search by Groonga. A micro-blog is a broadcast medium in the
form of a blog. It is mainly used to post short messages, as on Twitter.
Create a table
Let's create the tables.
table_create --name Users --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Comments --flags TABLE_HASH_KEY --key_type ShortText
table_create --name HashTags --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Bigram --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint
column_create --table Users --name name --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name follower --flags COLUMN_VECTOR --type Users
column_create --table Users --name favorites --flags COLUMN_VECTOR --type Comments
column_create --table Users --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Users --name location_str --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name description --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name followee --flags COLUMN_INDEX --type Users --source follower
column_create --table Comments --name comment --flags COLUMN_SCALAR --type ShortText
column_create --table Comments --name last_modified --flags COLUMN_SCALAR --type Time
column_create --table Comments --name replied_to --flags COLUMN_SCALAR --type Comments
column_create --table Comments --name replied_users --flags COLUMN_VECTOR --type Users
column_create --table Comments --name hash_tags --flags COLUMN_VECTOR --type HashTags
column_create --table Comments --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Comments --name posted_by --flags COLUMN_SCALAR --type Users
column_create --table Comments --name favorited_by --flags COLUMN_INDEX --type Users --source favorites
column_create --table HashTags --name hash_index --flags COLUMN_INDEX --type Comments --source hash_tags
column_create --table Bigram --name users_index --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Users --source name,location_str,description
column_create --table Bigram --name comment_index --flags COLUMN_INDEX|WITH_POSITION --type Comments --source comment
column_create --table GeoIndex --name users_location --type Users --flags COLUMN_INDEX --source location
column_create --table GeoIndex --name comments_location --type Comments --flags COLUMN_INDEX --source location
Users table
This is the table that stores user information. It stores the user's name, profile, list of followers
and so on.
_key User ID
name User name
follower
List of users that this user follows
favorites
List of favorite comments
location
Current location of user (geolocation)
location_str
Current location of user (string)
description
User profile
followee
Index for the follower column in the Users table. With this index, you can search for the users who
follow a given person.
Comments table
This is the table that stores comments and their metadata. It stores the content of the comment, the
posted date, the comment it replies to, and so on.
_key Comment ID
comment
Content of comment
last_modified
Posted date
replied_to
Comment that this comment replies to
replied_users
List of users that this comment replies to
hash_tags
List of hash tags about comment
location
Posted place (for geolocation)
posted_by
Person who wrote the comment
favorited_by
Index for the favorites column in the Users table. With this index, you can search for the people who
marked a comment as a favorite.
HashTags table
This is the table which stores hash tags for comments.
_key Hash tag
hash_index
Index for Comments.hash_tags. With this index, you can search for the comments with specified hash
tags.
Bigram table
This is the table that stores indexes for full text search on user information and comments.
_key Word
users_index
Indexes of user information. This column contains indexes of user name (Users.name), current
location (Users.location_str), profile (Users.description).
comment_index
Indexes about content of comments (Comments.comment).
GeoIndex table
This is the table that stores indexes of the location columns to search geolocation data efficiently.
users_location
Indexes of location column for Users table
comments_location
Indexes of location column for Comments table
Loading data
Then, load example data.
load --table Users
[
{
"_key": "alice",
"name": "Alice",
"follower": ["bob"],
"favorites": [],
"location": "152489000x-255829000",
"location_str": "Boston, Massachusetts",
"description": "Groonga developer"
},
{
"_key": "bob",
"name": "Bob",
"follower": ["alice","charlie"],
"favorites": ["alice:1","charlie:1"],
"location": "146249000x-266228000",
"location_str": "Brooklyn, New York City",
"description": ""
},
{
"_key": "charlie",
"name": "Charlie",
"follower": ["alice","bob"],
"favorites": ["alice:1","bob:1"],
"location": "146607190x-267021260",
"location_str": "Newark, New Jersey",
"description": "Hmm,Hmm"
}
]
load --table Comments
[
{
"_key": "alice:1",
"comment": "I've created micro-blog!",
"last_modified": "2010/03/17 12:05:00",
"posted_by": "alice"
},
{
"_key": "bob:1",
"comment": "First post. test,test...",
"last_modified": "2010/03/17 12:00:00",
"posted_by": "bob"
},
{
"_key": "alice:2",
"comment": "@bob Welcome!!!",
"last_modified": "2010/03/17 12:05:00",
"replied_to": "bob:1",
"replied_users": ["bob"],
"posted_by": "alice"
},
{
"_key": "bob:2",
"comment": "@alice Thanks!",
"last_modified": "2010/03/17 13:00:00",
"replied_to": "alice:2",
"replied_users": ["alice"],
"posted_by": "bob"
},
{
"_key": "bob:3",
"comment": "I've just used 'Try-Groonga' now! #groonga",
"last_modified": "2010/03/17 14:00:00",
"hash_tags": ["groonga"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "bob:4",
"comment": "I'm come at city of New York for development camp! #groonga #travel",
"last_modified": "2010/03/17 14:05:00",
"hash_tags": ["groonga", "travel"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "charlie:1",
"comment": "@alice @bob I've tried to register!",
"last_modified": "2010/03/17 15:00:00",
"replied_users": ["alice", "bob"],
"location": "146607190x-267021260",
"posted_by": "charlie"
},
{
"_key": "charlie:2",
"comment": "I'm at the Museum of Modern Art in NY now!",
"last_modified": "2010/03/17 15:05:00",
"location": "146741340x-266319590",
"posted_by": "charlie"
}
]
The follower and favorites columns in the Users table and the replied_users column in the Comments table
are vector columns, so specify their values as arrays.
The location column in the Users table and the location column in the Comments table use the GeoPoint
type. This type accepts a value in the form "[latitude]x[longitude]".
The last_modified column in the Comments table uses the Time type.
There are two ways to specify the value. First, you can specify the epoch time (seconds since January 1,
1970 00:00:00) directly. In this case, you can specify microseconds as the fractional part. The value,
including the fractional part, is converted to a microsecond-based time when the data is loaded. Second,
you can specify the timestamp as a string in the following format: "(YEAR)/(MONTH)/(DAY)
(HOUR):(MINUTE):(SECOND)". In this case, the string is cast to the proper microsecond-based value when
the data is loaded.
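When preparing data for load from another program, both value formats are easy to produce. The Python
sketch below is illustrative only and rests on two assumptions: the sample GeoPoint values are in
milliseconds of a degree (152489000 / 3600000 is about 42.36, which fits Boston), and timestamp strings
are read in a UTC+9 zone, chosen because it reproduces the 1268795100.0 value that the posted-time
example in this tutorial shows for "2010/03/17 12:05:00". All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MSEC_PER_DEGREE = 3_600_000  # 60 minutes * 60 seconds * 1000 milliseconds
ASSUMED_ZONE = timezone(timedelta(hours=9))  # assumption: strings are read as UTC+9


def to_geo_point(latitude_deg, longitude_deg):
    """Format degrees as a "[latitude]x[longitude]" string in milliseconds."""
    return f"{round(latitude_deg * MSEC_PER_DEGREE)}x{round(longitude_deg * MSEC_PER_DEGREE)}"


def geo_point_to_degrees(text):
    """Parse a millisecond GeoPoint string into (latitude, longitude) degrees."""
    latitude, longitude = text.split("x")
    return int(latitude) / MSEC_PER_DEGREE, int(longitude) / MSEC_PER_DEGREE


def epoch_to_microseconds(seconds):
    """Convert epoch seconds (fractional part allowed) to whole microseconds."""
    return round(seconds * 1_000_000)


def string_to_microseconds(text, zone=ASSUMED_ZONE):
    """Convert a YEAR/MONTH/DAY HOUR:MINUTE:SECOND string to microseconds."""
    moment = datetime.strptime(text, "%Y/%m/%d %H:%M:%S").replace(tzinfo=zone)
    return int(moment.timestamp()) * 1_000_000


print(geo_point_to_degrees("152489000x-255829000"))   # Alice's location in degrees
print(string_to_microseconds("2010/03/17 12:05:00"))  # 1268795100000000 under the UTC+9 assumption
```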
Search
Let's search micro-blog.
Search users by keyword
In this section, we search the micro-blog against multiple columns by keyword. See match_columns to
search multiple columns at once.
Let's search for users by user name, location, and description.
Execution example:
select --table Users --match_columns name,location_str,description --query "New York" --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]
With "New York" as the search keyword, "Bob", who lives in "New York", is listed in the search result.
Search users by geolocation data (GeoPoint)
In this section, we search users by a column that uses the GeoPoint type. See search about GeoPoint
columns.
The following example searches for users who live within 20 km of the specified location.
Execution example:
select --table Users --filter 'geo_in_circle(location,"146710080x-266315480",20000)' --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "charlie",
# "Charlie"
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]
It shows that "Bob" and "Charlie" live within 20 km of Grand Central Terminal.
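The circle test can be checked independently with a great-circle distance. The Python sketch below uses
the haversine formula with a mean Earth radius, which is a substitute chosen for illustration and may
differ slightly from the approximation Groonga applies in geo_in_circle. It assumes the sample GeoPoint
values are in milliseconds of a degree; all names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000   # mean Earth radius in meters (assumption of this sketch)
MSEC_PER_DEGREE = 3_600_000  # assumed unit of the sample GeoPoint values

USERS = {  # locations copied from the Users sample data
    "alice": "152489000x-255829000",
    "bob": "146249000x-266228000",
    "charlie": "146607190x-267021260",
}
CENTER = "146710080x-266315480"  # center used in the query above


def parse(point):
    """Convert "[latitude]x[longitude]" in milliseconds to radians."""
    latitude, longitude = point.split("x")
    return radians(int(latitude) / MSEC_PER_DEGREE), radians(int(longitude) / MSEC_PER_DEGREE)


def distance_m(a, b):
    """Great-circle distance in meters between two GeoPoint strings (haversine)."""
    lat1, lon1 = parse(a)
    lat2, lon2 = parse(b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def in_circle(point, center, radius_m):
    """Return True when `point` lies within `radius_m` meters of `center`."""
    return distance_m(point, center) <= radius_m


nearby = sorted(user for user, location in USERS.items() if in_circle(location, CENTER, 20000))
print(nearby)  # ['bob', 'charlie']
```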
Search users who follows specific user
In this section, we do reverse resolution of reference relationships, which is described at index.
The following example shows reverse resolution on the follower column of the Users table.
Execution example:
select --table Users --query follower:@bob --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "alice",
# "Alice"
# ],
# [
# "charlie",
# "Charlie"
# ]
# ]
# ]
# ]
It shows that "Alice" and "Charlie" follow "Bob".
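Reverse resolution amounts to inverting the reference column. The following Python sketch builds such an
inverted mapping from the sample follower values; it is a conceptual model of what the followee index
column provides, with hypothetical names, not Groonga's storage format.

```python
from collections import defaultdict

# Users.follower values copied from the sample data.
FOLLOWER = {
    "alice": ["bob"],
    "bob": ["alice", "charlie"],
    "charlie": ["alice", "bob"],
}


def build_followee_index(follower):
    """Map each referenced user to the users whose follower column lists them."""
    index = defaultdict(list)
    for user, listed in follower.items():
        for other in listed:
            index[other].append(user)
    return index


followee = build_followee_index(FOLLOWER)
print(followee["bob"])  # ['alice', 'charlie']
```

Looking up one key in the inverted mapping replaces a scan over every user's follower list.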
Search comments by using the value of GeoPoint type
In this section, we search for comments that were written within a specific area.
We also use drilldown, which is described at drilldown. The following example shows how to drill down
against search results. As a result, we get counts grouped by hash tag and by user, respectively.
Execution example:
select --table Comments --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Charlie",
# "I'm at the Museum of Modern Art in NY now!"
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ],
# [
# "Charlie",
# "@alice @bob I've tried to register!"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "charlie",
# 2
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]
The above query searches for comments that were posted within 20 km of Central Park in New York City.
As the specified range is 20 km, all comments with a location are collected. The search results contain
2 #groonga hash tags and 1 #travel hash tag, and bob and charlie each posted 2 comments.
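A drilldown is a grouped count over the matched records. The Python sketch below reproduces the two
counts from the result above with collections.Counter; the MATCHED list is transcribed from the sample
data, and drilldown is a hypothetical helper rather than a Groonga API.

```python
from collections import Counter

MATCHED = [  # (posted_by, hash_tags) of the four comments returned above
    ("charlie", []),
    ("bob", ["groonga"]),
    ("bob", ["groonga", "travel"]),
    ("charlie", []),
]


def drilldown(records):
    """Count matched records per hash tag and per posting user."""
    by_tag = Counter(tag for _user, tags in records for tag in tags)
    by_user = Counter(user for user, _tags in records)
    return by_tag, by_user


by_tag, by_user = drilldown(MATCHED)
print(dict(by_tag))   # {'groonga': 2, 'travel': 1}
print(dict(by_user))  # {'charlie': 2, 'bob': 2}
```

A vector column such as hash_tags contributes one count per element, so a single comment can appear in
several groups.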
Search comments by keyword
In this section, we search for comments that contain a specific keyword. In addition, let's calculate the
value of _score, which is described at search.
Execution example:
select --table Comments --query comment:@Now --output_columns comment,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "comment",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga",
# 1
# ],
# [
# "I'm at the Museum of Modern Art in NY now!",
# 1
# ]
# ]
# ]
# ]
With 'Now' as the keyword, the above query returns 2 comments. The value of _score is the number of
occurrences of 'Now' in each comment.
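The scoring rule described here, one point per occurrence of the keyword, can be sketched as a
case-insensitive word count. The Python below is a simplified illustration with a hypothetical score
helper; it splits on non-alphanumeric characters instead of using Groonga's tokenizer and normalizer.

```python
import re

COMMENTS = [  # three of the sample comments
    "I've created micro-blog!",
    "I've just used 'Try-Groonga' now! #groonga",
    "I'm at the Museum of Modern Art in NY now!",
]


def score(text, keyword):
    """Count case-insensitive whole-word occurrences of `keyword` in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(keyword.lower())


hits = [(text, score(text, "Now")) for text in COMMENTS if score(text, "Now") > 0]
for text, value in hits:
    print(value, text)
```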
Search comments by keyword and geolocation
In this section, we search for comments by a specific keyword and geolocation. By using both the --query
and --filter options, the following query returns records that match both conditions.
Execution example:
select --table Comments --query comment:@New --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 1
# ]
# ]
# ]
# ]
It returns 1 comment that meets both conditions. It also returns the result of the drilldown. There is
1 comment, which was posted by Bob.
Search comments by hash tags
In this section, we search for comments that contain specific hash tags. Let's use reverse resolution of
reference relationships.
Execution example:
select --table Comments --query hash_tags:@groonga --output_columns posted_by.name,comment --drilldown posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]
The above query returns 2 comments that contain the #groonga hash tag. It also returns the result of the
drilldown grouped by the person who posted them. It shows that both comments were posted by Bob.
Search comments by user id
In this section, we search comments which are posted by specific user.
Execution example:
select --table Comments --query posted_by:bob --output_columns comment --drilldown hash_tags
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "First post. test,test..."
# ],
# [
# "@alice Thanks!"
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ]
# ]
# ]
The above query returns the 4 comments posted by Bob. It also returns the drilldown result by hash tag:
2 comments contain #groonga and 1 comment contains #travel.
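The response layout shown in the execution example (a header, then the matched records, then one block
per drilldown key) can be unpacked with a few lines of code. The following is a minimal Python sketch,
assuming the JSON layout printed above; the helper name parse_drilldown is hypothetical and is not part
of Groonga.

```python
import json

def parse_drilldown(block):
    """Turn one drilldown block into a {key: count} dict.

    A block looks like [[n_hits], [[column, type], ...], row, row, ...],
    as in the execution example above.
    """
    column_names = [name for name, _type in block[1]]
    key_index = column_names.index("_key")
    count_index = column_names.index("_nsubrecs")
    return {row[key_index]: row[count_index] for row in block[2:]}

# Body of the response above, with the comment records omitted for brevity.
response = json.loads("""
[[0, 1337566253.89858, 0.000355720520019531],
 [[[4], [["comment", "ShortText"]]],
  [[2], [["_key", "ShortText"], ["_nsubrecs", "Int32"]],
   ["groonga", 2], ["travel", 1]]]]
""")

header, body = response
records, *drilldowns = body          # first block: records, rest: drilldowns
hash_tag_counts = parse_drilldown(drilldowns[0])
print(hash_tag_counts)
```

Looking up the column positions by name, rather than hard-coding them, keeps the sketch working if the
drilldown output columns are reordered.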
Search user's favorite comments
In this section, we search for a user's favorite comments.
Execution example:
select --table Users --query _key:bob --output_columns favorites.posted_by,favorites.comment
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "favorites.posted_by",
# "Users"
# ],
# [
# "favorites.comment",
# "ShortText"
# ]
# ],
# [
# [
# "alice",
# "charlie"
# ],
# [
# "I've created micro-blog!",
# "@alice @bob I've tried to register!"
# ]
# ]
# ]
# ]
# ]
The above query returns Bob's favorite comments.
Search comments by posted time
In this section, we search for comments by posted time. See the Time type in data.
Let's search for comments whose posted time is at or before the specified time.
Execution example:
select Comments --filter 'last_modified<=1268802000' --output_columns posted_by.name,comment,last_modified --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ],
# [
# "last_modified",
# "Time"
# ]
# ],
# [
# "Alice",
# "I've created micro-blog!",
# 1268795100.0
# ],
# [
# "Bob",
# "First post. test,test...",
# 1268794800.0
# ],
# [
# "Alice",
# "@bob Welcome!!!",
# 1268795100.0
# ],
# [
# "Bob",
# "@alice Thanks!",
# 1268798400.0
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga",
# 1268802000.0
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "alice",
# 2
# ],
# [
# "bob",
# 3
# ]
# ]
# ]
# ]
The above query returns the 5 comments posted at or before 2010/03/17 14:00:00 (1268802000 in Unix time,
shown here in JST). It also returns the drilldown results by hash tag and by poster: 1 comment is tagged
groonga, and there are 2 comments by Alice and 3 comments by Bob.
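The threshold 1268802000 in the filter is a Unix time. The conversion can be checked with a short Python
sketch. It assumes the tutorial's wall-clock time is in JST (UTC+9), which is the offset under which
2010/03/17 14:00:00 corresponds to that number; the helper name to_unix_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the times quoted in the tutorial are Japan Standard Time.
JST = timezone(timedelta(hours=9))

def to_unix_time(year, month, day, hour, minute=0, second=0, tz=JST):
    """Return the Unix time (in seconds) of a wall-clock time in tz."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    return int(moment.timestamp())

threshold = to_unix_time(2010, 3, 17, 14)
print(threshold)
print(datetime.fromtimestamp(1268802000, tz=JST).isoformat())
```

The resulting integer can be used directly in a filter such as 'last_modified<=1268802000'.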
Query expansion
Groonga accepts the query_expander parameter for the /reference/commands/select command. It enables you
to expand your query string.
For example, if a user searches for "theatre" instead of "theater", query expansion can return the search
results of "theatre OR theater". This reduces the number of relevant documents that a search misses, and
it is usually what the user really wants.
Preparation
To use query expansion, you need a table that stores documents and a synonym table that stores each query
string together with its replacement string. In the synonym table, the primary key holds the original
string and a ShortText column holds the replacement string.
Let's create a document table and a synonym table.
Execution example:
table_create Doc TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Doc body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Term TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Term Doc_body COLUMN_INDEX|WITH_POSITION Doc body
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Synonym TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Synonym body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Doc
[
{"_key": "001", "body": "Play all night in this theater."},
{"_key": "002", "body": "theatre is British spelling."},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Synonym
[
{"_key": "theater", "body": "(theater OR theatre)"},
{"_key": "theatre", "body": "(theater OR theatre)"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
In this case, no matching documents are missed, because the synonym table accepts both "theatre" and
"theater" as query strings.
Search
Now let's use the prepared synonym table. First, use the select command without the query_expander parameter.
Execution example:
select Doc --match_columns body --query "theater"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
Each of the above queries returns only the record that contains exactly the query string.
Next, use the query_expander parameter with the body column of the Synonym table.
Execution example:
select Doc --match_columns body --query "theater" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
In both cases, the query string is replaced with "(theater OR theatre)", so synonyms are taken into
account in the full text search.
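The substitution step can be pictured with a small Python sketch. This is a simplified illustration and
not Groonga's implementation: it only replaces whole whitespace-separated terms and ignores query syntax
and normalization. The function name expand_query is hypothetical; the dictionary mirrors the Synonym
table loaded above.

```python
# Same contents as the Synonym table loaded above (_key -> body).
SYNONYMS = {
    "theater": "(theater OR theatre)",
    "theatre": "(theater OR theatre)",
}

def expand_query(query, synonyms):
    """Replace every whitespace-separated term that is a synonym key."""
    terms = query.split()
    return " ".join(synonyms.get(term, term) for term in terms)

print(expand_query("theatre", SYNONYMS))
print(expand_query("night theater", SYNONYMS))
```

Because both spellings map to the same replacement string, either query ends up matching both documents.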
SERVER
Server packages
The groonga package is the minimum set needed for the full text search engine. If you want to use
groonga as a server, you can install additional preconfigured packages.
There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (/spec/gqtp protocol based server package)
There is a reason why groonga supports not only GQTP but also HTTP. /spec/gqtp (Groonga Query Transfer
Protocol) is designed to reduce overhead and improve performance, but fewer client libraries support GQTP
than HTTP. Because HTTP is a mature protocol, you can take advantage of existing tools, and there are
many client libraries (see related projects for details). If you use the groonga-httpd package, you also
benefit from nginx functionality.
We recommend starting with groonga-httpd, because it provides full-featured server functionality. If you
have performance issues caused by protocol overhead, consider using groonga-server-gqtp.
NOTE:
In previous versions, there was a groonga-server-http package (a simple HTTP protocol based
server package). It is now obsolete; please use the groonga-httpd package instead. The
groonga-server-http package has become a transitional package for groonga-httpd.
groonga-httpd
groonga-httpd is an nginx and HTTP protocol based server package.
Preconfigured setting:
┌────────────────────┬───────────────────────────────────────┐
│ Item               │ Default value                         │
├────────────────────┼───────────────────────────────────────┤
│ Port number        │ 10041                                 │
├────────────────────┼───────────────────────────────────────┤
│ Access log path    │ /var/log/groonga/httpd/access.log     │
├────────────────────┼───────────────────────────────────────┤
│ Error log path     │ /var/log/groonga/http-query.log       │
├────────────────────┼───────────────────────────────────────┤
│ Database           │ /var/lib/groonga/db/*                 │
├────────────────────┼───────────────────────────────────────┤
│ Configuration file │ /etc/groonga/httpd/groonga-httpd.conf │
└────────────────────┴───────────────────────────────────────┘
Start HTTP server
Starting groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd start
Starting groonga HTTP server (Fedora):
% sudo systemctl start groonga-httpd
Stop HTTP server
Stopping groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd stop
Stopping groonga HTTP server (Fedora):
% sudo systemctl stop groonga-httpd
Restart HTTP server
Restarting groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd restart
Restarting groonga HTTP server (Fedora):
% sudo systemctl restart groonga-httpd
groonga-server-gqtp
groonga-server-gqtp is a /spec/gqtp protocol based server package.
┌─────────────┬───────────────────────────────────┐
│ Item        │ Default value                     │
├─────────────┼───────────────────────────────────┤
│ Port number │ 10043                             │
├─────────────┼───────────────────────────────────┤
│ process-log │ /var/log/groonga/groonga-gqtp.log │
├─────────────┼───────────────────────────────────┤
│ query-log   │ /var/log/groonga/gqtp-query.log   │
├─────────────┼───────────────────────────────────┤
│ Database    │ /var/lib/groonga/db/*             │
└─────────────┴───────────────────────────────────┘
Configuration file for server setting (Debian/Ubuntu):
/etc/default/groonga/groonga-server-gqtp
Configuration file for server setting (CentOS):
/etc/sysconfig/groonga-server-gqtp
Start GQTP server
Starting groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp start
Starting groonga GQTP server (Fedora):
% sudo systemctl start groonga-server-gqtp
Stop GQTP server
Stopping groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp stop
Stopping groonga GQTP server (Fedora):
% sudo systemctl stop groonga-server-gqtp
Restart GQTP server
Restarting groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp restart
Restarting groonga GQTP server (Fedora):
% sudo systemctl restart groonga-server-gqtp
groonga-server-http
groonga-server-http is a simple HTTP protocol based server package.
NOTE:
The groonga-server-http package has been a transitional package since Groonga 4.0.8. Please use
groonga-httpd instead.
Preconfigured setting:
┌─────────────┬───────────────────────────────────┐
│ Item        │ Default value                     │
├─────────────┼───────────────────────────────────┤
│ Port number │ 10041                             │
├─────────────┼───────────────────────────────────┤
│ process-log │ /var/log/groonga/groonga-http.log │
├─────────────┼───────────────────────────────────┤
│ query-log   │ /var/log/groonga/http-query.log   │
├─────────────┼───────────────────────────────────┤
│ Database    │ /var/lib/groonga/db/*             │
└─────────────┴───────────────────────────────────┘
CLIENT
Groonga supports its original protocol (/spec/gqtp), the memcached binary protocol and HTTP.
Because HTTP and the memcached binary protocol are mature protocols, you can use existing client libraries.
There are also client libraries that provide a convenient API for connecting to a Groonga server from
several programming languages. See Client libraries for details.
REFERENCE MANUAL
Executables
This section describes the executable files provided by the groonga package.
grndb
Summary
NOTE:
This executable command is an experimental feature.
New in version 4.0.9.
grndb manages a Groonga database.
Here are features:
• Checks whether database is broken or not.
• Recovers broken database automatically if the database is recoverable.
Syntax
grndb requires command and database path:
grndb COMMAND [OPTIONS] DATABASE_PATH
Here are available commands:
• check - Checks whether database is broken or not.
• recover - Recovers database.
Usage
Here is an example to check the database at /var/lib/groonga/db/db:
% grndb check /var/lib/groonga/db/db
Here is an example to recover the database at /var/lib/groonga/db/db:
% grndb recover /var/lib/groonga/db/db
Commands
This section describes available commands.
check
It checks an existing Groonga database. If the database is broken, grndb reports the reasons and exits
with a non-0 exit status.
NOTE:
You must not use this command on an opened database. If the database is opened, this command may report
a wrong result.
check has some options.
--target
New in version 5.1.2.
It specifies a check target object.
If your database is large and you know which object is unreliable, this option will help you. check needs
more time for a large database. You can reduce the check time by narrowing the check target with --target.
The check target is checked recursively, because objects related to an unreliable object may also be
unreliable.
If the check target is a table, all columns of the table are also checked recursively.
If the check target is a table and its key type is another table, that other table is also checked
recursively.
If the check target is a column and its value type is a table, the table is also checked recursively.
If the check target is an index column, the table specified as its value type and all of its sources are
also checked recursively.
Here is an example that checks only Entries table and its columns:
% grndb check --target Entries /var/lib/groonga/db/db
Here is an example that checks only Entries.name column:
% grndb check --target Entries.name /var/lib/groonga/db/db
recover
It recovers an existing broken Groonga database.
If the database is not broken, grndb does nothing and exits with 0 exit status.
If the database is broken and only one or more index columns are broken, grndb recovers these index
columns and exits with 0 exit status. It may take a long time for large indexed data.
If the database is broken and tables or data columns are broken, grndb reports the reasons and exits
with a non-0 exit status. You can find out whether the database is recoverable with the check command.
NOTE:
You must not use this command on an opened database. If the database is opened, this command may break
the database.
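The exit statuses described above make it possible to script the two commands together. The following
Python sketch is one possible wrapper, under the assumptions that grndb is on PATH and that no other
process has the database open. The function name check_and_recover and its return values are
hypothetical; the run parameter exists only so that the command runner can be swapped out.

```python
import subprocess

def check_and_recover(db_path, run=subprocess.run):
    """Run `grndb check`, and `grndb recover` only if the check fails.

    Returns "ok", "recovered" or "unrecoverable". The database must not
    be opened by any other process while this runs.
    """
    # check exits with a non-0 status when the database is broken.
    if run(["grndb", "check", db_path]).returncode == 0:
        return "ok"
    # recover exits with 0 when only index columns were broken and
    # they have been rebuilt; otherwise it exits with a non-0 status.
    if run(["grndb", "recover", db_path]).returncode == 0:
        return "recovered"
    return "unrecoverable"

# Example call (requires grndb and a database that is not in use):
# print(check_and_recover("/var/lib/groonga/db/db"))
```

Running check first avoids starting a potentially long index rebuild on a database that is healthy.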
grnslap
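Name
grnslap - a tool to check the performance of the communication layer of a groonga process
Synopsis
grnslap [options] [dest]
Description
grnslap is a tool that checks performance by sending concurrent requests to a groonga process.
It can send requests over both GQTP, Groonga's original protocol, and HTTP. You can also specify the
request concurrency.
Queries can be given on standard input. By feeding queries that resemble the query patterns of your
production environment, you can run the check under conditions close to production.
Currently, grnslap is not installed by make install.
Options
-P Specifies the request protocol.
http
Sends requests over HTTP. If you give the target HTTP paths (including GET parameters) on standard
input, separated by LF, grnslap accesses those paths in order.
gqtp
Sends requests over GQTP. If you give GQTP requests on standard input, separated by LF, grnslap sends
those requests in order.
-m Specifies the request concurrency. The default value is 10.
Arguments
dest Specifies the host name and port number to connect to (the default value is 'localhost:10041'). If
the port number is omitted, 10041 is used.
Sample
Send requests to http://localhost:10041/d/status with a concurrency of 100.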
> yes /d/status | head -n 100 | grnslap -P http -m 100 localhost:10041
2009-11-12 19:34:09.998696|begin: max_concurrency=100 max_tp=10000
2009-11-12 19:34:10.011208|end : n=100 min=46 max=382 avg=0 qps=7992.966190 etime=0.012511
groonga executable file
Summary
groonga executable file provides the following features:
• Fulltext search server
• Fulltext search shell
• Client for Groonga fulltext search server
Groonga can be used as a library. If you want to use Groonga as a library, you need to write a program in
C, C++ and so on. Using the library is useful for embedding the fulltext search feature in your
application, but it is not easy.
You can use the groonga executable file to get the fulltext search feature without writing a program.
If you want to try Groonga, the fulltext search shell is useful. You don't need any server or client.
You just need one terminal. You can try Groonga like the following:
% groonga -n db
> status
[[0,1429687763.70845,0.000115633010864258],{"alloc_count":195,...}]
> quit
%
If you want to create an application that has a fulltext search feature, the fulltext search server is
useful. You can use Groonga as a server like an RDBMS (Relational DataBase Management System). The
client-server model is a popular architecture.
Normally, the client for Groonga fulltext search server isn't used.
Syntax
groonga executable file has the following four modes:
• Standalone mode
• Server mode
• Daemon mode
• Client mode
There are common options in these modes. These common options are described in a later section.
Standalone mode
In standalone mode, groonga executable file runs one or more Groonga /reference/command against a local
Groonga database.
Here is the syntax to run shell that executes Groonga command against temporary database:
groonga [options]
Here is the syntax to create a new database and run shell that executes Groonga command against the new
database:
groonga [options] -n DB_PATH
Here is the syntax to run shell that executes Groonga command against existing database:
groonga [options] DB_PATH
Here is the syntax to run Groonga command against existing database and exit:
groonga [options] DB_PATH COMMAND [command arguments]
Server mode
In server mode, groonga executable file runs as a server. The server accepts connections from other
processes at local machine or remote machine and executes received Groonga /reference/command against a
local Groonga database.
You can choose one protocol from /server/http and /server/gqtp. Normally, HTTP is suitable, but GQTP is
the default protocol. This section describes only HTTP protocol usage.
In server mode, groonga executable file runs in the foreground. If you want to run Groonga server in the
background, see Daemon mode.
Here is the syntax to run Groonga server with temporary database:
groonga [options] --protocol http -s
Here is the syntax to create a new database and run Groonga server with the new database:
groonga [options] --protocol http -s -n DB_PATH
Here is the syntax to run Groonga server with existing database:
groonga [options] --protocol http -s DB_PATH
Daemon mode
In daemon mode, groonga executable file runs as a daemon. Daemon is similar to server but it runs in the
background. See Server mode about server.
Here is the syntax to run Groonga daemon with temporary database:
groonga [options] --protocol http -d
Here is the syntax to create a new database and run Groonga daemon with the new database:
groonga [options] --protocol http -d -n DB_PATH
Here is the syntax to run Groonga daemon with existing database:
groonga [options] --protocol http -d DB_PATH
The --pid-path option is useful in daemon mode.
Client mode
In client mode, groonga executable file runs as a client for a GQTP protocol Groonga server. Its usage
is similar to Standalone mode. You can run a shell or execute one command. You need to specify the server
address instead of a local database.
Note that you can't use groonga executable file as a client for an HTTP protocol Groonga server.
Here is the syntax to run shell that executes Groonga command against Groonga server that is running at
192.168.0.1:10043:
groonga [options] -c --host 192.168.0.1 --port 10043
Here is the syntax to run Groonga command against Groonga server that is running at 192.168.0.1:10043 and
exit:
groonga [options] -c --host 192.168.0.1 --port 10043 COMMAND [command arguments]
Options
-n Creates new database.
-c Executes groonga command in client mode.
-s Executes groonga command in server mode. Use "Ctrl+C" to stop the groonga process.
-d Executes groonga command in daemon mode. In contrast to server mode, groonga command forks in
daemon mode. For example, to stop local daemon process, use "curl
http://127.0.0.1:10041/d/shutdown".
-e, --encoding <encoding>
Specifies encoding which is used for Groonga database. This option is effective when you create
new Groonga database. This parameter specifies one of the following values: none, euc, utf8,
sjis, latin or koi8r.
-l, --log-level <log level>
Specifies the log level as an integer value between 0 and 8. The meaning of each value is:
┌───────────┬─────────────┐
│ log level │ description │
├───────────┼─────────────┤
│ 0         │ Nothing     │
├───────────┼─────────────┤
│ 1         │ Emergency   │
├───────────┼─────────────┤
│ 2         │ Alert       │
├───────────┼─────────────┤
│ 3         │ Critical    │
├───────────┼─────────────┤
│ 4         │ Error       │
├───────────┼─────────────┤
│ 5         │ Warning     │
├───────────┼─────────────┤
│ 6         │ Notice      │
├───────────┼─────────────┤
│ 7         │ Info        │
├───────────┼─────────────┤
│ 8         │ Debug       │
└───────────┴─────────────┘
-a, --address <ip/hostname>
Deprecated since version 1.2.2: Use --bind-address instead.
--bind-address <ip/hostname>
New in version 1.2.2.
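Specifies the address to listen on when running in server mode or daemon mode. (The default is the host
name returned by hostname.)
-p, --port <port number>
The TCP port number used in client, server or daemon mode.
(The default is 10043 in client mode. In server or daemon mode, the default is 10041 for HTTP and 10043
for GQTP.)
-i, --server-id <ip/hostname>
Specifies the address used as the server ID when running in server mode or daemon mode. (The default is
the host name returned by `hostname`.)
-h, --help
Shows the help message.
--document-root <path>
Specifies the directory that stores static pages when groonga is used as an HTTP server.
By default, the files for general-purpose database administration pages are installed under
/usr/share/groonga/admin_html. If you start groonga with this directory as the value of the document-root
option, you can use the Web-based database administration tool by opening http://hostname:port/index.html
in a Web browser.
--protocol <protocol>
Specifies either http or gqtp. (The default is gqtp.)
--log-path <path>
Specifies the path of the file to write logs to. (The default is /var/log/groonga/groonga.log.)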
--log-rotate-threshold-size <threshold>
New in version 5.0.3.
Specifies the threshold for log rotation. The log file is rotated when its size is larger than or
equal to the threshold (default: 0; disabled).
--query-log-path <path>
Specifies the path of the file to write the query log to. (By default, no query log is written.)
--query-log-rotate-threshold-size <threshold>
New in version 5.0.3.
Specifies the threshold for query log rotation. The query log file is rotated when its size is
larger than or equal to the threshold (default: 0; disabled).
-t, --max-threads <max threads>
Specifies the maximum number of threads to use. (The default is the number of CPU cores of the machine.)
--pid-path <path>
Specifies the path to save the PID to. (By default, the PID is not saved.)
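--config-path <path>
Specifies the path of the configuration file. The configuration file has the following format:
# Text after '#' is a comment.
; Text after ';' is also a comment.
# Specify an option with 'key = value'.
pid-path = /var/run/groonga.pid
# Whitespace around '=' is ignored. The line below means the same as the line above.
pid-path=/var/run/groonga.pid
# A 'key' can be the same as the name of a '--XXX' style option.
# For example, the key that corresponds to '--pid-path' is 'pid-path'.
# However, an option whose key is 'config-path' is ignored.
--cache-limit <limit>
Specifies the maximum number of caches. (The default is 100.)
--default-match-escalation-threshold <threshold>
Specifies the threshold for escalating the search behavior. (The default is 0.)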
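The configuration file format described for --config-path above is simple enough to read with a few lines
of code, for example to inspect a file before handing it to groonga. The following Python sketch is an
illustration only and not Groonga's own parser: it assumes comments occupy whole lines, and the function
name parse_groonga_config and the sample values are hypothetical.

```python
def parse_groonga_config(text):
    """Parse 'key = value' lines; lines starting with '#' or ';' are comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a 'key = value' line; skipped in this sketch
        key, value = key.strip(), value.strip()  # whitespace around '=' is ignored
        if key == "config-path":
            continue  # the format description says this key is ignored
        options[key] = value
    return options

sample = """\
# comment
; also a comment
pid-path = /var/run/groonga.pid
log-path=/var/log/groonga/groonga.log
config-path = /etc/groonga/other.conf
"""
print(parse_groonga_config(sample))
```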
Command line parameters
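dest Specifies the path name of the database to use.
In client mode, specifies the host name and port number to connect to (the default value is
'localhost:10043'). If the port number is omitted, 10043 is used.
command [args]
In standalone and client modes, you can specify the command to execute and its arguments as command line
arguments. If no command is given on the command line, groonga reads command strings from standard input
line by line until it reaches EOF, and executes them in order.
Command
An instruction that operates on the database through the groonga command is called a command. Commands
are mainly written in C and become available when they are loaded into the groonga process.
Each command has a unique name and zero or more arguments.
Arguments can be specified in either of the following two forms:
Form 1: command_name value1 value2,..
Form 2: command_name --argument_name1 value1 --argument_name2 value2,..
When you run a command in form 1, you must specify the values in the defined order, and you cannot omit
the value of an argument in the middle. When you run a command in form 2, you must name each argument
explicitly, as in "--argument_name", but in exchange you can specify the arguments in any order and omit
arguments in the middle.
When you give command strings on standard input, separate the command name, argument names and values
with spaces. To specify a value that contains a space or any of the characters "'(), enclose the value in
single quotes (') or double quotes ("). In a string specified as a value, write a newline character as
'\n'. To use the quoting character itself inside the value, put a backslash ('\') before that character.
To specify a backslash character itself as a value, put a backslash before it.
You can continue a command on the next line by ending the line with a '\' character: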
table_create --name Terms \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--default_tokenizer TokenBigram
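The quoting rules above can be applied mechanically when command strings are generated by a program. The
following Python sketch builds a command in the named-argument form. It is a minimal illustration under
stated assumptions: it always uses double quotes, it also quotes any value that contains a backslash, and
the function names quote_value and build_command are hypothetical.

```python
def quote_value(value):
    """Quote a value for a command string given on standard input."""
    needs_quotes = any(ch.isspace() or ch in "\"'()" for ch in value)
    if not needs_quotes and "\\" not in value:
        return value
    escaped = (
        value.replace("\\", "\\\\")   # backslash -> backslash backslash
        .replace('"', '\\"')          # quoting character -> backslash + character
        .replace("\n", "\\n")         # newline -> the two characters \ and n
    )
    return '"' + escaped + '"'

def build_command(name, **arguments):
    """Build a command in the form: name --argument_name value ..."""
    parts = [name]
    for argument_name, value in arguments.items():
        parts.append("--" + argument_name)
        parts.append(quote_value(str(value)))
    return " ".join(parts)

print(build_command("select", table="Doc", match_columns="body",
                    query="theater OR theatre"))
```

Escaping backslashes first matters: doing it last would double the backslashes that were just added in
front of quotes and newlines.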
Builtin command
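The following commands are predefined as builtin commands.
status Shows the status of the groonga process.
table_list
Shows the list of tables defined in the DB.
column_list
Shows the list of columns defined in a table.
table_create
Adds a table to the DB.
column_create
Adds a column to a table.
table_remove
Removes a table defined in the DB.
column_remove
Removes a column defined in a table.
load Inserts records into a table.
select Searches for records in a table and shows them.
define_selector
Defines a new search command with customized search conditions.
quit Ends the session with the database.
shutdown
Stops the server (daemon) process.
log_level
Sets the log output level.
log_put
Writes a log entry.
clearlock
Releases locks.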
Usage
Create a new database:
% groonga -n /tmp/hoge.db quit
%
Define a table in an existing database:
% groonga /tmp/hoge.db table_create Table 0 ShortText
[[0]]
%
Start the server:
% groonga -d /tmp/hoge.db
%
Start as an HTTP server:
% groonga -d -p 80 --protocol http --document-root /usr/share/groonga/admin_html /tmp/hoge.db
%
Connect to the server and show the table list:
% groonga -c localhost table_list
[[0],[["id","name","path","flags","domain"],[256,"Table","/tmp/hoge.db.0000100",49152,14]]]
%
groonga-benchmark
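Name
groonga-benchmark - groonga test program
Synopsis
groonga-benchmark [options...] [script] [db]
Description
groonga-benchmark is a general-purpose benchmark tool for groonga.
It can check the behavior and measure the execution speed of groonga, both when groonga is used as a
standalone process and when it is used as a server program.
You can create data files for groonga-benchmark yourself or use existing ones. Existing data files are
downloaded from ftp.groonga.org as needed. Therefore, in any environment where groonga and
groonga-benchmark run and which is connected to the Internet, you can check the behavior of groonga
without knowing groonga commands.
Currently, it runs on Linux and Windows. It is not installed by make install.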
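Options
-i, --host <ip/hostname>
Specifies the groonga server to connect to by IP address or host name. Note that the connection fails if
no groonga server is running at the specified destination. If this option is not specified,
groonga-benchmark automatically starts a groonga server on localhost and connects to it.
-p, --port <port number>
Specifies the port number used by the automatically started groonga server or by the explicitly
specified groonga server. Note that the connection fails if the port used by the destination groonga
server differs from the port number specified with this option.
--dir Shows the script files available on ftp.groonga.org.
--ftp Communicates with ftp.groonga.org over FTP to synchronize script files and send log files.
--log-output-dir
By default, the log file is written to the current directory when groonga-benchmark finishes. With this
option, you can change the output destination to any directory.
--groonga <groonga_path>
Specifies the path of the groonga command. By default, the groonga command is searched for in PATH.
--protocol <gqtp|http>
Specifies gqtp or http as the protocol used by the groonga command.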
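Arguments
script A text file that describes how groonga-benchmark operates (hereafter called groonga-benchmark
instructions). Its extension is .scr.
db The groonga database used by groonga-benchmark. If the specified database does not exist,
groonga-benchmark creates it. The database specified with this argument is also used when a groonga
server is started automatically. Note that when you explicitly specify the groonga server to connect to,
the database used is the one that server is using.
Usage
First, type the following in a shell (in the command prompt on Windows):
groonga-benchmark test.scr <any DB name>
If groonga-benchmark works normally, a file named:
test-<user name>-<number>.log
should be created. If it is not created, see the "Troubleshooting" section of this document.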
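Script file
A script file is a text file that contains groonga-benchmark instructions.
You can write multiple groonga-benchmark instructions on one line by separating them with ";"
(semicolon). When one line contains multiple groonga-benchmark instructions, they are executed in
parallel.
Lines that start with "#" are treated as comments.
groonga-benchmark instructions
The following 11 groonga-benchmark instructions are currently supported.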
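do_local command_file [number_of_threads] [number_of_repetitions]
Runs the command file with groonga-benchmark alone. If the number of threads is specified, the same
command file is run concurrently in multiple threads. If the number of repetitions is specified, the
contents of the command file are run repeatedly. Both the number of threads and the number of
repetitions default to 1. To run the file several times in one thread, specify the thread count
explicitly, as in do_local command_file 1 [number_of_repetitions].
do_gqtp command_file [number_of_threads] [number_of_repetitions]
Runs the command file on a groonga server via GQTP. The number of threads and the number of repetitions
have the same meaning as for do_local.
do_http command_file [number_of_threads] [number_of_repetitions]
Runs the command file on a groonga server via HTTP. The number of threads and the number of repetitions
have the same meaning as for do_local.
rep_local command_file [number_of_threads] [number_of_repetitions]
Runs the command file with groonga-benchmark alone and reports in more detail.
rep_gqtp command_file [number_of_threads] [number_of_repetitions]
Runs the command file on a groonga server via GQTP and reports in more detail.
The number of threads and the number of repetitions have the same meaning as for do_local.
rep_http command_file [number_of_threads] [number_of_repetitions]
Runs the command file on a groonga server via HTTP and reports in more detail.
The number of threads and the number of repetitions have the same meaning as for do_local.
out_local command_file output_file_name
Runs the command file with groonga-benchmark alone and writes the results of all commands to the "output
file". The result is used by the test_local and test_gqtp instructions. Note that the "output file" of
this instruction is different from the log that is created automatically when groonga-benchmark runs.
Except that comments can be used with groonga-benchmark, this is the same as running:
groonga < command_file > output_file
out_gqtp command_file output_file_name
Runs the command file on a groonga server via GQTP. Otherwise it is equivalent to the out_local
instruction.
out_http command_file output_file_name
Runs the command file on a groonga server via HTTP. Otherwise it is equivalent to the out_local
instruction.
test_local command_file input_file_name
Runs the command file with groonga-benchmark alone and compares the result of each command with the input
file. If there are differences other than in non-essential elements such as processing time, the
differences are written to a file named <input file>.diff.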
Command file
A command file is a text file that contains one groonga builtin command per line. There is no restriction
on its extension. For groonga builtin commands, see /reference/command.
Sample
Here is a sample script file:
# sample script
rep_local test.ddl
do_local test.load;
do_gqtp test.select 10 10; do_local test.status 10
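The meaning of the above is as follows.
Line 1 A comment line.
Line 2 Runs the command file test.ddl with groonga alone and reports in detail.
Line 3 Runs the command file test.load with groonga alone. (The trailing ";" semicolon is required when
writing multiple groonga-benchmark instructions, but it does no harm to add it when running a single
groonga-benchmark instruction as in this example.)
Line 4 Runs the command file test.select on a groonga server in 10 threads concurrently. Each thread
repeats the contents of test.select 10 times. At the same time, runs the command file test.status with
groonga alone in 10 threads.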
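Special instructions
Special instructions can be embedded in the comment lines of a script file. The following two special
instructions are currently supported.
#SET_HOST <ip/hostname>
Equivalent to the -i, --host option. If the IP address or host name specified with the command line
option differs from the one specified with SET_HOST, or if the command line option is not specified,
SET_HOST takes precedence. When SET_HOST is used, the server is not started automatically, just as when
the command line option is specified.
#SET_PORT <port number>
Equivalent to the -p, --port option. If the port number specified with the command line option differs
from the one specified with SET_PORT, or if the command line option is not specified, SET_PORT takes
precedence.
Special instructions can be written anywhere in a script file. If the same special instruction is
written more than once in a file, the "last" one takes effect.
For example, even if you specify the port on the command line as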
$ ./groonga-benchmark --port 20010 test.scr testdb
if the contents of test.scr are
#SET_PORT 10900
rep_local test.ddl
do_local test.load;
rep_gqtp test.select 10 10; rep_local test.status 10
#SET_PORT 10400
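then the automatically started groonga server uses port number 10400.
groonga-benchmark execution results
When groonga-benchmark finishes normally, a log file named
<script name without extension>-<user name>-<start time>.log is created in the current directory. The
log file is sent to ftp.groonga.org automatically. The log file is JSON-format text like the following.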
[{"script": "test.scr",
"user": "homepage",
"date": "2010-04-14 22:47:04",
"CPU": "Intel(R) Pentium(R) 4 CPU 2.80GHz",
"BIT": 32,
"CORE": 1,
"RAM": "975MBytes",
"HDD": "257662232KBytes",
"OS": "Linux 2.4.20-24.7-i686",
"HOST": "localhost",
"PORT": "10041",
"VERSION": "0.1.8-100-ga54c5f8"
},
{"jobs": "rep_local test.ddl",
"detail": [
[0, "table_create res_table --key_type ShortText", 1490, 3086, [0,1271252824.25846,0.00144
7]],
[0, "column_create res_table res_column --type Text", 3137, 5956, [0,1271252824.2601,0.002
741]],
[0, "column_create res_table user_column --type Text", 6020, 8935, [0,1271252824.26298,0.0
02841]],
[0, "column_create res_table mail_column --type Text", 8990, 11925, [0,1271252824.26595,0.
002861]],
[0, "column_create res_table time_column --type Time", 12008, 13192, [0,1271252824.26897,0
.001147]],
[0, "status", 13214, 13277, [0,1271252824.27018,3.0e-05]],
[0, "table_create thread_table --key_type ShortText", 13289, 14541, [0,1271252824.27025,0.
001213]],
[0, "column_create thread_table thread_title_column --type ShortText", 14570, 17380, [0,12
71252824.27153,0.002741]],
[0, "status", 17435, 17480, [0,1271252824.2744,2.7e-05]],
[0, "table_create lexicon_table --flags 129 --key_type ShortText --default_tokenizer Token
Bigram", 17491, 18970, [0,1271252824.27446,0.001431]],
[0, "column_create lexicon_table inv_res_column 514 res_table res_column ", 18998, 33248,
[0,1271252824.27596,0.01418]],
[0, "column_create lexicon_table inv_thread_column 514 thread_table thread_title_column ",
33285, 48472, [0,1271252824.29025,0.015119]],
[0, "status", 48509, 48554, [0,1271252824.30547,2.7e-05]]],
"summary" :[{"job": "rep_local test.ddl", "latency": 48607, "self": 47719, "qps": 272.4281
73, "min": 45, "max": 15187, "queries": 13}]},
{"jobs": "do_local test.load; ",
"summary" :[{"job": "do_local test.load", "latency": 68693, "self": 19801, "qps": 1010.049
997, "min": 202, "max": 5453, "queries": 20}]},
{"jobs": "do_gqtp test.select 10 10; do_local test.status 10",
"summary" :[{"job": " do_local test.status 10", "latency": 805990, "self": 737014, "qps":
54.273053, "min": 24, "max": 218, "queries": 40},{"job": "do_gqtp test.select 10 10", "lat
ency": 831495, "self": 762519, "qps": 1967.164097, "min": 73, "max": 135631, "queries": 15
00}]},
{"total": 915408, "qps": 1718.359464, "queries": 1573}]
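The "qps" values in the sample log are consistent with dividing "queries" by the "self" time, if "self"
is read as microseconds. That unit is inferred from the numbers in the sample and is not stated in this
document, so treat the following Python sketch as a consistency check rather than a definition. The
function name qps is hypothetical.

```python
def qps(queries, elapsed_us):
    """Queries per second, assuming the elapsed time is in microseconds."""
    return queries / (elapsed_us / 1_000_000)

# (queries, "self" time, reported qps) copied from the sample log above.
summaries = {
    "rep_local test.ddl": (13, 47719, 272.428173),
    "do_local test.load": (20, 19801, 1010.049997),
    "do_gqtp test.select 10 10": (1500, 762519, 1967.164097),
}

for job, (queries, self_time, reported) in summaries.items():
    print(job, round(qps(queries, self_time), 6), reported)

# The final entry relates "queries" to "total" in the same way.
print(round(qps(1573, 915408), 6))
```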
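Limitations
• You can write multiple groonga-benchmark instructions on one line of a script file, but the total
number of threads is limited to a maximum of 64.
• The maximum length of a groonga command in a command file is 5000000 bytes.
Troubleshooting
If groonga-benchmark does not work normally, first check the following.
• Are you connected to the Internet? If the --ftp option is specified, groonga-benchmark communicates
with ftp.groonga.org every time it runs. If it cannot communicate with ftp.groonga.org,
groonga-benchmark does not work normally.
• Is a groonga server already running? Unless you explicitly specify a server with the -i, --host
option, groonga-benchmark automatically starts a groonga server on localhost. If a groonga server is
already running, groonga-benchmark may not work normally.
• Is the specified DB appropriate? groonga-benchmark does not check the contents of the DB specified as
an argument. If the specified DB does not exist, it is created automatically, but if it exists as a
file, groonga-benchmark continues regardless of its contents and the results may be abnormal.
If none of the above is the cause, the problem is in groonga-benchmark or groonga. Please report it.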
groonga-httpd
Summary
groonga-httpd is a program to communicate with a Groonga server using the HTTP protocol. It provides the
same functionality as groonga-server-http. While groonga-server-http has limited support for HTTP with a
minimal built-in HTTP server, groonga-httpd has full support for HTTP with an embedded nginx. All
standards compliance and features provided by nginx are also available in groonga-httpd.
groonga-httpd has a Web-based administration tool implemented with HTML and JavaScript. You can access
it at http://hostname:port/.
Synopsis
groonga-httpd [nginx options]
Usage
Set up
First, you'll need to edit the groonga-httpd configuration file to specify a database. Edit
/etc/groonga/httpd/groonga-httpd.conf to enable the groonga_database directive like this:
# Match this to the file owner of groonga database files if groonga-httpd is
# run as root.
#user groonga;
...
http {
...
# Don't change the location; currently only /d/ is supported.
location /d/ {
groonga on; # <= This means to turn on groonga-httpd.
# Specify an actual database and enable this.
groonga_database /var/lib/groonga/db/db;
}
...
}
Then, run groonga-httpd. Note that control returns to the console immediately because groonga-httpd runs
as a daemon process by default:
% groonga-httpd
Request queries
To check, request a simple query (/reference/commands/status).
Execution example:
% curl http://localhost:10041/d/status
[
[
0,
1337566253.89858,
0.000355720520019531
],
{
"uptime": 0,
"max_command_version": 2,
"n_queries": 0,
"cache_hit_rate": 0.0,
"version": "4.0.1",
"alloc_count": 161,
"command_version": 1,
"starttime": 1395806036,
"default_command_version": 1
}
]
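The response above is a JSON array whose first element is a header and whose second element is the
command's result. The following Python sketch splits such a response, assuming the header holds the return
code, the start time and the elapsed time in that order. parse_response is a hypothetical helper name, and
the sample text is an abridged copy of the output above.

```python
import json

# Abridged copy of the status response shown in the execution example.
raw = '[[0, 1337566253.89858, 0.000355720520019531], {"uptime": 0, "version": "4.0.1"}]'


def parse_response(text):
    """Split a Groonga JSON response into header fields and body."""
    data = json.loads(text)
    header = data[0]
    # The body may be absent (for example in an error response), so guard for it.
    body = data[1] if len(data) > 1 else None
    return {
        "return_code": header[0],
        "started_at": header[1],
        "elapsed": header[2],
        "body": body,
    }


result = parse_response(raw)
print(result["return_code"], result["body"]["version"])
```

A return code of 0 corresponds to success in this example.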
Loading data by POST
You can load data by sending JSON data with POST.
Here is an example curl command line that loads two users, alice and bob, into the Users table:
% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"
To load users from a JSON file, prepare a JSON file like this:
[
{"_key": "alice"},
{"_key": "bob"}
]
Then specify the JSON file on the curl command line:
% curl -X POST 'http://localhost:10041/d/load?table=Users' -H 'Content-Type: application/json' -d @users.json
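The same request can be prepared from a program. The following Python sketch only builds the POST request
with the standard library and does not send it. build_load_request is a hypothetical helper name, and the
host, port and table name are taken from the curl examples above.

```python
import json
import urllib.parse
import urllib.request


def build_load_request(base_url, table, records):
    """Build (but do not send) a POST request for the load command."""
    query = urllib.parse.urlencode({"table": table})
    return urllib.request.Request(
        f"{base_url}/d/load?{query}",
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_load_request(
    "http://localhost:10041", "Users", [{"_key": "alice"}, {"_key": "bob"}]
)
print(request.get_method(), request.full_url)
# To send it to a running groonga-httpd: urllib.request.urlopen(request).read()
```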
Browse the administration tool
You can also browse the Web-based administration tool at http://localhost:10041/.
Shut down
Finally, to terminate the running groonga-httpd daemon, run this:
% groonga-httpd -s stop
Configuration directives
This section describes only the important directives: the groonga-httpd specific directives and the
performance related directives.
The following directives can be used in the groonga-httpd configuration file. By default, it's located
at /etc/groonga/httpd/groonga-httpd.conf.
Groonga-httpd specific directives
The following directives aren't provided by nginx. They are provided by groonga-httpd to configure
groonga-httpd specific behavior.
groonga
Synopsis:
groonga on | off;
Default
groonga off;
Context
location
Specifies whether Groonga is enabled in the location block. The default is off. You need to specify on to
enable groonga.
Examples:
location /d/ {
groonga on; # Enables groonga under /d/... path
}
location /d/ {
groonga off; # Disables groonga under /d/... path
}
groonga_database
Synopsis:
groonga_database /path/to/groonga/database;
Default
groonga_database /usr/local/var/lib/groonga/db/db;
Context
http, server, location
Specifies the path to a Groonga database. This directive is required.
groonga_database_auto_create
Synopsis:
groonga_database_auto_create on | off;
Default
groonga_database_auto_create on;
Context
http, server, location
Specifies whether the Groonga database is created automatically. If the value is on and the Groonga
database specified by groonga_database doesn't exist, the database is created automatically. If the
database already exists, groonga-httpd does nothing.
If the parent directory doesn't exist, it is also created recursively.
The default value is on. Normally, the value doesn't need to be changed.
groonga_base_path
Synopsis:
groonga_base_path /d/;
Default
The same value as location name.
Context
location
Specifies the base path in the URI. Groonga uses the /d/command?parameter1=value1&... path form to run a
command. groonga-httpd uses this form too, but it also supports the
/other-prefix/command?parameter1=value1&... form. To support that form, groonga-httpd removes the base
path from the head of the request URI and prepends /d/ to the processed request URI. With this path
conversion, users can use a custom path prefix while Groonga can always use the
/d/command?parameter1=value1&... form.
Normally, this directive isn't needed. It is needed for per-command configuration.
Here is an example configuration to add authorization to the /reference/commands/shutdown command:
groonga_database /var/lib/groonga/db/db;
location /d/shutdown {
groonga on;
# groonga_base_path is needed.
# Because /d/shutdown is handled as the base path.
# Without this configuration, /d/shutdown/shutdown path is required
# to run shutdown command.
groonga_base_path /d/;
auth_basic "manager is required!";
auth_basic_user_file "/etc/managers.htpasswd";
}
location /d/ {
groonga on;
# groonga_base_path isn't needed.
# Because location name is the base path.
}
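The path conversion described for groonga_base_path can be modeled in a few lines. The following Python
sketch is only an illustration of the rule "remove the base path, then prepend /d/", not groonga-httpd's
actual implementation. to_groonga_path is a hypothetical name, and stripping a leftover leading slash is an
assumption made so that base paths with and without a trailing slash behave alike.

```python
def to_groonga_path(request_uri, base_path):
    """Strip base_path from the head of request_uri and prepend /d/."""
    if not request_uri.startswith(base_path):
        raise ValueError(f"{request_uri!r} does not start with {base_path!r}")
    rest = request_uri[len(base_path):].lstrip("/")
    return "/d/" + rest


# A custom prefix is mapped to the /d/ form that Groonga expects.
print(to_groonga_path("/other-prefix/select?table=Users", "/other-prefix/"))
# With groonga_base_path /d/; inside location /d/shutdown, the command name survives.
print(to_groonga_path("/d/shutdown", "/d/"))
# Without it, the location name /d/shutdown is the base path and no command is left.
print(to_groonga_path("/d/shutdown", "/d/shutdown"))
```

The last call shows why the example configuration sets groonga_base_path explicitly: with the location
name as the base path, /d/shutdown/shutdown would be needed to reach the shutdown command.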
groonga_log_path
Synopsis:
groonga_log_path path | off;
Default
/var/log/groonga/httpd/groonga.log
Context
http, server, location
Specifies the Groonga log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga.log. You can disable logging by specifying off.
Examples:
location /d/ {
groonga on;
# You can disable log for groonga.
groonga_log_path off;
}
groonga_log_level
Synopsis:
groonga_log_level none | emergency | alert | critical | error | warning | notice | info | debug | dump;
Default
notice
Context
http, server, location
Specifies Groonga log level in the http, server or location block. The default is notice. You can disable
logging by specifying none as log level.
Examples:
location /d/ {
groonga on;
# You can customize log level for groonga.
groonga_log_level notice;
}
groonga_query_log_path
Synopsis:
groonga_query_log_path path | off;
Default
/var/log/groonga/httpd/groonga-query.log
Context
http, server, location
Specifies Groonga's query log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga-query.log. You can disable logging by specifying off.
Examples:
location /d/ {
groonga on;
# You can disable query log for groonga.
groonga_query_log_path off;
}
The query log is useful in the following cases:
• Detecting slow queries.
• Debugging.
You can analyze your query log with the groonga-query-log package, which provides useful tools.
For example, one tool analyzes your query log and can detect slow queries in it. Another tool replays
the queries in your query log, so you can test a new Groonga before updating the production
environment.
Performance related directives
The following directives are related to the performance of groonga-httpd.
worker_processes
For optimum performance, set this to be equal to the number of CPUs or cores. In many cases, Groonga
queries may be CPU-intensive work, so to fully utilize multi-CPU/core systems, it's essential to set this
accordingly.
This isn't a groonga-httpd specific directive but an nginx one. For details, see
http://wiki.nginx.org/CoreModule#worker_processes.
By default, this is set to 1, which is nginx's default.
groonga_cache_limit
This directive customizes the cache limit of each worker process.
Synopsis:
groonga_cache_limit CACHE_LIMIT;
Default
100
Context
http, server, location
Specifies the limit of Groonga's query cache in the http, server or location block. The default value is
100. You can disable the query cache by explicitly specifying 0 for groonga_cache_limit.
Examples:
location /d/ {
groonga on;
# You can customize query cache limit for groonga.
groonga_cache_limit 100;
}
proxy_cache
In short, you can use nginx's reverse proxy and cache mechanism instead of Groonga's built-in query cache
feature.
Query cache
Groonga has a query cache feature for the /reference/commands/select command. The feature improves
performance in many cases.
The query cache feature works well on groonga-httpd unless you use the /reference/commands/cache_limit
command with 2 or more workers. Normally, the /reference/commands/cache_limit command isn't used, so
there is no problem in most cases.
Here is a description of the problem with using the /reference/commands/cache_limit command with 2 or
more workers.
Groonga's query cache is available only within the same process, which means that workers can't share the
cache. If you don't change the cache size, this isn't a big problem. If you want to change the cache size
with the /reference/commands/cache_limit command, there is a problem.
There is no portable way to change the cache size for all workers.
For example, there are 3 workers:
                                   +-- worker 1
client -- groonga-httpd (master) --+-- worker 2
                                   +-- worker 3
The client requests the /reference/commands/cache_limit command and worker 1 receives it:
                                   +-> worker 1 (changed!)
client -> groonga-httpd (master) --+-- worker 2
                                   +-- worker 3
The client requests the /reference/commands/cache_limit command again and worker 1 receives it again:
                                   +-> worker 1 (changed again!!!)
client -> groonga-httpd (master) --+-- worker 2
                                   +-- worker 3
In this case, worker 2 and worker 3 don't receive any requests, so they don't change their cache size.
You can't choose which worker receives a request, so you can't change the cache sizes of all workers with
the /reference/commands/cache_limit command.
Reverse proxy and cache
You can use nginx's reverse proxy and cache feature for query cache:
                                                            +-- worker 1
client -- groonga-httpd (master) -- reverse proxy + cache --+-- worker 2
                                                            +-- worker 3
You can use the same cache configuration for all workers but you can't change cache configuration
dynamically by HTTP.
Here is a sample configuration:
...
http {
proxy_cache_path /var/cache/groonga-httpd levels=1:2 keys_zone=groonga:10m;
proxy_cache_valid 10m;
...
# Reverse proxy and cache
server {
listen 10041;
...
# Only select command
location /d/select {
# Pass through groonga with cache
proxy_cache groonga;
proxy_pass http://localhost:20041;
}
location / {
# Pass through groonga
proxy_pass http://localhost:20041;
}
}
# groonga
server {
listen 20041;
location /d/ {
groonga on;
groonga_database /var/lib/groonga/db/db;
}
}
...
}
See the following nginx documentation for parameter details:
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass
Note that you need to remove cache files created by nginx by hand after you load new data to Groonga. For
the above sample configuration, run the following command to remove cache files:
% groonga DB_PATH < load.grn
% rm -rf /var/cache/groonga-httpd/*
If you use Groonga's query cache feature, you don't need to expire cache by hand. It is done
automatically.
Available nginx modules
All standard HTTP modules are available. HttpRewriteModule is disabled when you don't have PCRE (Perl
Compatible Regular Expressions). For the list of standard HTTP modules, see
http://wiki.nginx.org/Modules.
Groonga HTTP server
Name
Groonga HTTP server
Synopsis
groonga -d --protocol http DB_PATH
Summary
You can communicate with Groonga over HTTP if you specify http for the --protocol option. If you also
specify a static page path with --document-root, Groonga returns the file under that path that
corresponds to the URI of the HTTP request.
Groonga has a Web-based administration tool implemented with HTML and JavaScript. If you don't specify
--document-root, the path where the administration tool is installed is used, so you can use the
administration tool by accessing http://HOSTNAME:PORT/ in a Web browser.
Command
A Groonga server started with the http protocol accepts the same commands as Groonga started in the other
modes.
A command takes arguments, and each argument has a name. There are also two special arguments:
output_type and command_version.
In standalone mode or client mode, a command is specified in one of the following formats.
Format 1: COMMAND_NAME VALUE1 VALUE2,..
Format 2: COMMAND_NAME --PARAMETER_NAME1 VALUE1 --PARAMETER_NAME2 VALUE2,..
Format 1 and Format 2 can be mixed. The output type is specified by output_type in these formats.
In HTTP server mode, a command is specified in the following format:
Format: /d/COMMAND_NAME.OUTPUT_TYPE?ARGUMENT_NAME1=VALUE1&ARGUMENT_NAME2=VALUE2&...
Command names, argument names and values must be URL encoded.
Only the GET method can be used.
You can specify JSON, TSV or XML as the output type.
command_version is specified for command specification compatibility. See
/reference/command/command_version for details.
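The HTTP format above can be assembled with the Python standard library, which also takes care of the URL
encoding. This is a minimal sketch: command_url is a hypothetical helper name, and the host, port, table
name and query value in the usage line are illustrative assumptions.

```python
from urllib.parse import quote, urlencode


def command_url(host, port, command, output_type="json", **arguments):
    """Build /d/COMMAND_NAME.OUTPUT_TYPE?NAME=VALUE&... with URL encoding."""
    path = f"/d/{quote(command, safe='')}.{quote(output_type, safe='')}"
    query = urlencode(arguments)  # encodes argument names and values
    return f"http://{host}:{port}{path}" + (f"?{query}" if query else "")


print(command_url("localhost", 10041, "status"))
print(command_url("localhost", 10041, "select", table="Users", query="_key:alice"))
```

Here the colon in the query value is percent-encoded as %3A, as the URL encoding requirement asks.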
Return value
The execution result of the command is output in the format specified by the output type.
groonga-suggest-create-dataset
NAME
groonga-suggest-create-dataset - Defines schema for a suggestion dataset
SYNOPSIS
groonga-suggest-create-dataset [options] DATABASE DATASET
DESCRIPTION
groonga-suggest-create-dataset creates a dataset for /reference/suggest. A database can have many
datasets. This command just defines the schema for a suggestion dataset.
This command generates some tables and columns for /reference/suggest.
Here is the list of such tables. The '_DATASET' suffix in the following names is replaced with the dataset
name. For example, if you specify 'query' as the dataset name, the 'item_query', 'pair_query',
'sequence_query' and 'event_query' tables are generated.
• event_type
• bigram
• kana
• item_DATASET
• pair_DATASET
• sequence_DATASET
• event_DATASET
• configuration
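The naming rule for the generated tables can be written down as a short Python sketch. It only derives
the table names listed above from a dataset name and does not touch a database. suggest_table_names and
the two constant names are hypothetical.

```python
# Tables whose names do not depend on the dataset name.
SHARED_TABLES = ["event_type", "bigram", "kana", "configuration"]
# Prefixes of the tables that get the dataset name as a suffix.
PER_DATASET_PREFIXES = ["item", "pair", "sequence", "event"]


def suggest_table_names(dataset):
    """Return the table names generated for the given dataset name."""
    per_dataset = [f"{prefix}_{dataset}" for prefix in PER_DATASET_PREFIXES]
    return SHARED_TABLES + per_dataset


print(suggest_table_names("query"))
```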
OPTIONS
None.
EXIT STATUS
TODO
FILES
TODO
EXAMPLE
TODO
SEE ALSO
/reference/suggest groonga-suggest-httpd groonga-suggest-learner
groonga-suggest-httpd
Summary
groonga-suggest-httpd is a program that provides an interface which accepts HTTP requests and returns
suggestion data, then saves logs for learning. groonga-suggest-httpd behaves similarly in terms of
suggestion functionality, but the parameter names are different.
Synopsis
groonga-suggest-httpd [options] database_path
Usage
Set up
First, you need to set up a database for suggestion.
Execution example:
% groonga-suggest-create-dataset /tmp/groonga-databases/groonga-suggest-httpd query
Launch groonga-suggest-httpd
Execute groonga-suggest-httpd command:
Execution example:
% groonga-suggest-httpd /tmp/groonga-databases/groonga-suggest-httpd
After executing the above command, groonga-suggest-httpd accepts HTTP requests on port 8080.
If you just want to save requests into log files, use the -l option.
Here is an example that saves log files under the logs directory with the prefix log for each file:
% groonga-suggest-httpd -l logs/log /tmp/groonga-databases/groonga-suggest-httpd
Under logs directory, log files such as logYYYYmmddHHMMSS-00 are created.
Request to groonga-suggest-httpd
Here are sample requests that make the query dataset learn groonga:
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=92619&t=complete&q=g'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=93850&t=complete&q=gr'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94293&t=complete&q=gro'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94734&t=complete&q=groo'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95147&t=complete&q=grooon'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95553&t=complete&q=groonga'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95959&t=submit&q=groonga'
Options
-p, --port
Specify http server port number. The default value is 8080.
-t, --n-threads
Specify the number of threads. The default value is 8. The maximum value is 128, but using the number
of CPU cores is recommended for performance.
-s, --send-endpoint
Specify endpoint for sender.
-r, --receive-endpoint
Specify endpoint for receiver.
-l, --log-base-path
Specify path prefix of log.
--n-lines-per-log-file
Specify the number of lines in a log file. The default value is 1,000,000.
-d, --daemon
Specify this option to daemonize.
--disable-max-fd-check
Specify this option to disable checking max fd on start.
Command line parameters
There is one required parameter - database_path.
database_path
Specifies the path to a Groonga database. This database must be created by groonga-suggest-create-dataset
command because it executes required initialization for suggestion.
GET parameters
groonga-suggest-httpd accepts following GET parameters.
The required parameters depend on the type of query.
Required parameters
┌─────┬──────────────────────────────┬──────┐
│ Key │ Description                  │ Note │
├─────┼──────────────────────────────┼──────┤
│ q   │ UTF-8 encoded string which   │      │
│     │ user fills in form           │      │
├─────┼──────────────────────────────┼──────┤
│ t   │ The type of query. The value │      │
│     │ of type must be complete,    │      │
│     │ correct, suggest or submit.  │      │
│     │ It also accepts multiple     │      │
│     │ type of query which is       │      │
│     │ concatenated by |. Note that │      │
│     │ submit is invalid value when │      │
│     │ you specify multiple type of │      │
│     │ query.                       │      │
└─────┴──────────────────────────────┴──────┘
Required parameters for learning
┌─────┬──────────────────────────────┬──────────────────────────────┐
│ Key │ Description                  │ Note                         │
├─────┼──────────────────────────────┼──────────────────────────────┤
│ s   │ Elapsed time from 0:00       │ Note that you need to        │
│     │ January 1, 1970              │ specify the value of s in    │
│     │                              │ milliseconds                 │
├─────┼──────────────────────────────┼──────────────────────────────┤
│ i   │ Unique ID to distinguish     │ Use session ID or IP address │
│     │ users                        │ for example                  │
├─────┼──────────────────────────────┼──────────────────────────────┤
│ l   │ Specify the name of dataset  │ Note that dataset name must  │
│     │ for learning. It also        │ match the following regular  │
│     │ accepts multiple dataset     │ expression                   │
│     │ names concatenated by |      │ [A-Za-z ][A-Za-z0-9 ]{0,15}  │
└─────┴──────────────────────────────┴──────────────────────────────┘
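The required parameters can be combined into a request URL as in the following Python sketch. It builds
the URL only and does not contact a server. learning_url is a hypothetical helper name, the base URL is
the default port from the examples above, and the fixed now value in the usage line is illustrative.

```python
import time
from urllib.parse import urlencode


def learning_url(query, query_type, user_id, dataset, now=None,
                 base_url="http://localhost:8080/"):
    """Build a groonga-suggest-httpd request URL with the learning parameters."""
    now = time.time() if now is None else now
    params = {
        "i": user_id,                 # unique ID that distinguishes users
        "l": dataset,                 # dataset name used for learning
        "s": int(round(now * 1000)),  # elapsed time since the epoch in milliseconds
        "t": query_type,              # complete, correct, suggest or submit
        "q": query,                   # what the user typed
    }
    return base_url + "?" + urlencode(params)


print(learning_url("groonga", "submit", "127.0.0.1", "query", now=95.5))
```

Passing now explicitly keeps the output reproducible; in real use the current time would be taken.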
SPECIFICATION
GQTP
GQTP is the acronym for Groonga Query Transfer Protocol. GQTP is the original protocol for groonga.
Protocol
GQTP is a stateful client-server protocol. The following sequence is one processing unit:
• Client sends a request
• Server receives the request
• Server processes the request
• Server sends a response
• Client receives the response
You can do zero or more processing units in a session.
Both request and response consist of GQTP header and body. GQTP header is fixed size data. Body is
variable size data and its size is stored in GQTP header. The content of body isn't defined in GQTP.
GQTP header
GQTP header consists of the following unsigned integer values:
┌────────────┬───────┬───────────────────────┐
│ Name       │ Size  │ Description           │
├────────────┼───────┼───────────────────────┤
│ protocol   │ 1byte │ Protocol type.        │
├────────────┼───────┼───────────────────────┤
│ query_type │ 1byte │ Content type of body. │
├────────────┼───────┼───────────────────────┤
│ key_length │ 2byte │ Not used.             │
├────────────┼───────┼───────────────────────┤
│ level      │ 1byte │ Not used.             │
├────────────┼───────┼───────────────────────┤
│ flags      │ 1byte │ Flags.                │
├────────────┼───────┼───────────────────────┤
│ status     │ 2byte │ Return code.          │
├────────────┼───────┼───────────────────────┤
│ size       │ 4byte │ Body size.            │
├────────────┼───────┼───────────────────────┤
│ opaque     │ 4byte │ Not used.             │
├────────────┼───────┼───────────────────────┤
│ cas        │ 8byte │ Not used.             │
└────────────┴───────┴───────────────────────┘
All header values are encoded in network byte order.
The total size of the GQTP header is 24 bytes.
The following sections describe the available values of each header field.
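The header layout above maps directly onto a fixed-size binary structure. The following Python sketch
packs and unpacks it with the struct module, using network byte order and the field sizes from the
table. The names GQTP_HEADER, Header, pack_header and unpack_header are hypothetical, and the sketch
covers the header only, not sending it over a socket.

```python
import struct
from collections import namedtuple

# "!" selects network byte order without padding.
# B = 1 byte, H = 2 bytes, I = 4 bytes, Q = 8 bytes (all unsigned).
# Order: protocol, query_type, key_length, level, flags, status, size, opaque, cas
GQTP_HEADER = struct.Struct("!BBHBBHIIQ")
Header = namedtuple(
    "Header",
    "protocol query_type key_length level flags status size opaque cas",
)


def pack_header(body, flags, query_type=0):
    """Pack a request header for the given body; unused fields are zero."""
    return GQTP_HEADER.pack(0xC7, query_type, 0, 0, flags, 0, len(body), 0, 0)


def unpack_header(data):
    """Unpack the first 24 bytes of data into a Header tuple."""
    return Header._make(GQTP_HEADER.unpack(data[:GQTP_HEADER.size]))


body = b"status"
packed = pack_header(body, flags=0x02)  # 0x02 is the TAIL flag
print(GQTP_HEADER.size)
print(unpack_header(packed))
```

The packed size matches the 24 bytes stated above, which is a quick check that the field sizes add up.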
protocol
The value is always 0xc7 in both request and response GQTP header.
query_type
The value is one of the following values:
┌─────────┬───────┬───────────────────────┐
│ Name    │ Value │ Description           │
├─────────┼───────┼───────────────────────┤
│ NONE    │ 0     │ Free format.          │
├─────────┼───────┼───────────────────────┤
│ TSV     │ 1     │ Tab Separated Values. │
├─────────┼───────┼───────────────────────┤
│ JSON    │ 2     │ JSON.                 │
├─────────┼───────┼───────────────────────┤
│ XML     │ 3     │ XML.                  │
├─────────┼───────┼───────────────────────┤
│ MSGPACK │ 4     │ MessagePack.          │
└─────────┴───────┴───────────────────────┘
This value is not used in the request GQTP header.
It is used in the response GQTP header, where the body is formatted as the specified type.
flags
The value is bitwise OR of the following values:
┌───────┬───────┬─────────────────────────┐
│ Name  │ Value │ Description             │
├───────┼───────┼─────────────────────────┤
│ MORE  │ 0x01  │ There is more data.     │
├───────┼───────┼─────────────────────────┤
│ TAIL  │ 0x02  │ There is no more data.  │
├───────┼───────┼─────────────────────────┤
│ HEAD  │ 0x04  │ Not used.               │
├───────┼───────┼─────────────────────────┤
│ QUIET │ 0x08  │ Be quiet.               │
├───────┼───────┼─────────────────────────┤
│ QUIT  │ 0x10  │ Quit.                   │
└───────┴───────┴─────────────────────────┘
You must specify either the MORE flag or the TAIL flag.
If you use the MORE flag, you should also use the QUIET flag, because you don't need to show a response
for a partial request.
Use the QUIT flag to quit the session.
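Because the flags value is a bitwise OR, it can be decoded by testing each bit. The following Python
sketch names the bits of a flags byte and checks the rule that MORE or TAIL must be present. GqtpFlag,
flag_names and has_required_flag are hypothetical names; only the numeric values come from the table
above.

```python
from enum import IntFlag


class GqtpFlag(IntFlag):
    MORE = 0x01
    TAIL = 0x02
    HEAD = 0x04
    QUIET = 0x08
    QUIT = 0x10


def flag_names(value):
    """Return the names of the flags set in value."""
    return [flag.name for flag in GqtpFlag if value & flag]


def has_required_flag(value):
    """A request must carry MORE or TAIL."""
    return bool(value & (GqtpFlag.MORE | GqtpFlag.TAIL))


partial_request = GqtpFlag.MORE | GqtpFlag.QUIET
print(hex(partial_request), flag_names(partial_request))
print(has_required_flag(GqtpFlag.QUIT))
```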
status
Here are the available values. New statuses may be added in the future.
• 0: SUCCESS
• 1: END_OF_DATA
• 65535: UNKNOWN_ERROR
• 65534: OPERATION_NOT_PERMITTED
• 65533: NO_SUCH_FILE_OR_DIRECTORY
• 65532: NO_SUCH_PROCESS
• 65531: INTERRUPTED_FUNCTION_CALL
• 65530: INPUT_OUTPUT_ERROR
• 65529: NO_SUCH_DEVICE_OR_ADDRESS
• 65528: ARG_LIST_TOO_LONG
• 65527: EXEC_FORMAT_ERROR
• 65526: BAD_FILE_DESCRIPTOR
• 65525: NO_CHILD_PROCESSES
• 65524: RESOURCE_TEMPORARILY_UNAVAILABLE
• 65523: NOT_ENOUGH_SPACE
• 65522: PERMISSION_DENIED
• 65521: BAD_ADDRESS
• 65520: RESOURCE_BUSY
• 65519: FILE_EXISTS
• 65518: IMPROPER_LINK
• 65517: NO_SUCH_DEVICE
• 65516: NOT_A_DIRECTORY
• 65515: IS_A_DIRECTORY
• 65514: INVALID_ARGUMENT
• 65513: TOO_MANY_OPEN_FILES_IN_SYSTEM
• 65512: TOO_MANY_OPEN_FILES
• 65511: INAPPROPRIATE_I_O_CONTROL_OPERATION
• 65510: FILE_TOO_LARGE
• 65509: NO_SPACE_LEFT_ON_DEVICE
• 65508: INVALID_SEEK
• 65507: READ_ONLY_FILE_SYSTEM
• 65506: TOO_MANY_LINKS
• 65505: BROKEN_PIPE
• 65504: DOMAIN_ERROR
• 65503: RESULT_TOO_LARGE
• 65502: RESOURCE_DEADLOCK_AVOIDED
• 65501: NO_MEMORY_AVAILABLE
• 65500: FILENAME_TOO_LONG
• 65499: NO_LOCKS_AVAILABLE
• 65498: FUNCTION_NOT_IMPLEMENTED
• 65497: DIRECTORY_NOT_EMPTY
• 65496: ILLEGAL_BYTE_SEQUENCE
• 65495: SOCKET_NOT_INITIALIZED
• 65494: OPERATION_WOULD_BLOCK
• 65493: ADDRESS_IS_NOT_AVAILABLE
• 65492: NETWORK_IS_DOWN
• 65491: NO_BUFFER
• 65490: SOCKET_IS_ALREADY_CONNECTED
• 65489: SOCKET_IS_NOT_CONNECTED
• 65488: SOCKET_IS_ALREADY_SHUTDOWNED
• 65487: OPERATION_TIMEOUT
• 65486: CONNECTION_REFUSED
• 65485: RANGE_ERROR
• 65484: TOKENIZER_ERROR
• 65483: FILE_CORRUPT
• 65482: INVALID_FORMAT
• 65481: OBJECT_CORRUPT
• 65480: TOO_MANY_SYMBOLIC_LINKS
• 65479: NOT_SOCKET
• 65478: OPERATION_NOT_SUPPORTED
• 65477: ADDRESS_IS_IN_USE
• 65476: ZLIB_ERROR
• 65475: LZO_ERROR
• 65474: STACK_OVER_FLOW
• 65473: SYNTAX_ERROR
• 65472: RETRY_MAX
• 65471: INCOMPATIBLE_FILE_FORMAT
• 65470: UPDATE_NOT_ALLOWED
• 65469: TOO_SMALL_OFFSET
• 65468: TOO_LARGE_OFFSET
• 65467: TOO_SMALL_LIMIT
• 65466: CAS_ERROR
• 65465: UNSUPPORTED_COMMAND_VERSION
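The error statuses in this list count down from 65535, which is what small negative numbers look like
when a signed 16-bit value is read as unsigned. Assuming that interpretation (it is not stated in this
section), the following Python sketch converts a status to its signed form. status_to_signed is a
hypothetical helper name.

```python
def status_to_signed(status):
    """Reinterpret an unsigned 16-bit status as a signed 16-bit integer."""
    if not 0 <= status <= 0xFFFF:
        raise ValueError("status must fit in 16 bits")
    return status - 0x10000 if status >= 0x8000 else status


# Statuses taken from the list: SUCCESS, END_OF_DATA, UNKNOWN_ERROR,
# INVALID_ARGUMENT and UNSUPPORTED_COMMAND_VERSION.
for status in (0, 1, 65535, 65514, 65465):
    print(status, status_to_signed(status))
```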
size
The size of the body. The maximum body size is 4GiB because size is a 4-byte unsigned integer. If you
want to send data of 4GiB or larger, use the MORE flag.
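Splitting a large body into several requests can be sketched as follows. Every part except the last is
marked MORE together with QUIET, and the last part is marked TAIL, as described in the flags section.
split_body and MAX_BODY_SIZE are hypothetical names, and the small chunk size in the usage line is only
there to keep the example readable.

```python
MORE, TAIL, QUIET = 0x01, 0x02, 0x08
MAX_BODY_SIZE = 2**32 - 1  # largest value a 4-byte unsigned integer can hold


def split_body(body, chunk_size=MAX_BODY_SIZE):
    """Yield (flags, chunk) pairs; only the last pair carries TAIL."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not body:
        yield TAIL, b""
        return
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        is_last = start + chunk_size >= len(body)
        yield (TAIL if is_last else MORE | QUIET), chunk


parts = list(split_body(b"0123456789", chunk_size=4))
print(parts)
```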
Example
How to run a GQTP server
Groonga has a special protocol, named Groonga Query Transfer Protocol (GQTP), for remote access to a
database. The following form shows how to run Groonga as a GQTP server.
Form:
groonga [-p PORT_NUMBER] -s DB_PATH
The -s option specifies to run Groonga as a server. DB_PATH specifies the path of the existing database
to be hosted. The -p option and its argument, PORT_NUMBER, specify the port number of the server. The
default port number is 10043, which is used when you don't specify PORT_NUMBER.
The following command runs a server that listens on the default port number. The server accepts
operations to the specified database.
Execution example:
% groonga -s /tmp/groonga-databases/introduction.db
Ctrl-c
%
How to run a GQTP daemon
You can also run a GQTP server as a daemon by using the -d option, instead of the -s option.
Form:
groonga [-p PORT_NUMBER] -d DB_PATH
A Groonga daemon prints its process ID as follows. In this example, the process ID is 12345. Then, the
daemon opens a specified database and accepts operations to that database.
Execution example:
% groonga -d /tmp/groonga-databases/introduction.db
12345
%
How to run a GQTP client
You can run Groonga as a GQTP client as follows:
Form:
groonga [-p PORT_NUMBER] -c [HOST_NAME_OR_IP_ADDRESS]
This command establishes a connection with a GQTP server and then enters into interactive mode.
HOST_NAME_OR_IP_ADDRESS specifies the hostname or the IP address of the server. If not specified, Groonga
uses the default hostname "localhost". The -p option and its argument, PORT_NUMBER, specify the port
number of the server. If not specified, Groonga uses the default port number 10043.
Execution example:
% groonga -c
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "4.0.1",
# "alloc_count": 140,
# "command_version": 1,
# "starttime": 1395806078,
# "default_command_version": 1
# }
# ]
> ctrl-d
%
In interactive mode, Groonga reads commands from the standard input and executes them one by one.
How to terminate a GQTP server
You can terminate a GQTP server with a /reference/commands/shutdown command.
Execution example:
% groonga -c
> shutdown
%
See also
• /reference/executables/groonga
• /server/gqtp
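Search
This section describes how the /reference/commands/select command searches with the query parameter.
Search behavior
There are the following three search behaviors, and Groonga switches between them dynamically depending
on the search results.
1. Exact match search
2. Unsplit search
3. Partial match search
Before explaining how the behaviors are selected, each of them is described first.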
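Exact match search
A target document is tokenized (split) into multiple terms, and the index is managed in a lexicon whose
unit is each of those terms.
A search keyword is tokenized in the same way.
Exact match search is the process of finding documents that contain the same sequence of terms as the
sequence obtained by tokenizing the search keyword.
For example, an index that uses the TokenMecab tokenizer stores the string "東京都民" as the following
sequence of two terms:
東京 / 都民
When this index is searched with the keyword "東京都", the keyword is processed as the following sequence
of two terms:
東京 / 都
This sequence of terms doesn't match the sequence "東京 / 都民", so exact match search doesn't hit.
In contrast, an index that uses the TokenBigram tokenizer stores the string "東京都民" as the following
sequence of four terms:
東京 / 京都 / 都民 / 民
When this index is searched with the keyword "東京都", the keyword is processed as the following sequence
of two terms:
東京 / 京都
This sequence of terms is contained in the sequence "東京 / 京都 / 都民", so exact match search hits.
Note that the TokenBigram tokenizer doesn't generate bigrams for strings of alphabetic characters, digits
and symbols. It treats each such run as a single token. For example, the string "楽しいbilliard" is stored
as the following sequence of three terms:
楽し / しい / billiard
When it is searched with the keyword "bill", the keyword is processed as the following single term:
bill
This term isn't contained in the sequence "楽し / しい / billiard", so exact match search doesn't hit.
In contrast, the TokenBigramSplitSymbolAlpha tokenizer also generates bigrams for alphabetic strings, and
stores the string "楽しいbilliard" as the following sequence of eleven terms:
楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d
When it is searched with the keyword "bill", the keyword is processed as the following three terms:
bi / il / ll
This sequence of terms is contained in the sequence "楽し / しい / いb / bi / il / ll / li / ia / ar / rd /
d", so exact match search hits.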
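The exact match rule, that the keyword's term sequence must appear as a contiguous run in the document's
term sequence, can be tried out with a simplified model. The following Python sketch uses plain character
bigrams, which corresponds to the TokenBigramSplitSymbolAlpha-style splitting in the examples above. It is
an illustration under that assumption, not Groonga's tokenizer, and all function names are hypothetical.

```python
def index_tokens(text):
    """Character bigrams plus the trailing single character, as stored in the index."""
    return [text[i:i + 2] for i in range(len(text))]


def query_tokens(keyword):
    """Character bigrams only; a one-character keyword stays as it is."""
    if len(keyword) < 2:
        return [keyword]
    return [keyword[i:i + 2] for i in range(len(keyword) - 1)]


def contains_sequence(haystack, needle):
    """True if needle appears as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


print(index_tokens("東京都民"))
print(query_tokens("東京都"))
print(contains_sequence(index_tokens("東京都民"), query_tokens("東京都")))
print(contains_sequence(index_tokens("楽しいbilliard"), query_tokens("bill")))
# Term sequences from the TokenMecab example, written out by hand.
print(contains_sequence(["東京", "都民"], ["東京", "都"]))
```

The last line reproduces the TokenMecab case: the sequences differ in their second term, so the exact
match fails.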
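Unsplit search
Unsplit search is available only when the lexicon is built with a patricia trie.
Its behavior differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.
With an N-gram tokenizer, Groonga performs a prefix search with the keyword.
For example, a search with the keyword "bill" hits both "bill" and "billiard".
With the TokenMecab tokenizer, Groonga performs a prefix search with the keyword before it is split into
words.
For example, a search with the keyword "スープカレー" hits "スープカレーバー" (treated as one word), but
doesn't hit "スープカレー" (treated as the two words "スープ" and "カレー") or "スープカレーライス" (treated
as the two words "スープ" and "カレーライス").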
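Partial match search
Partial match search is available only when the lexicon is built with a patricia trie and the
KEY_WITH_SIS option is specified. Without the KEY_WITH_SIS option, it is equivalent to unsplit search.
Its behavior differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.
With a bigram tokenizer, Groonga performs prefix search, infix search and suffix search.
For example, a search with the keyword "ill" hits both "bill" and "billiard".
With the TokenMecab tokenizer, Groonga performs prefix search, infix search and suffix search with the
keyword after it is split into words.
For example, a search with the keyword "スープカレー" hits "スープカレー" (treated as the two words "スープ"
and "カレー"), "スープカレーライス" (treated as the two words "スープ" and "カレーライス"), and also
"スープカレーバー" (treated as one word).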
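How the search behaviors are selected
Groonga basically performs only exact match search. Only when the number of hits from exact match search
is less than or equal to a given threshold does it perform unsplit search, and if the number of hits is
still less than or equal to the threshold, it performs partial match search. (The default threshold is 0.)
However, when a search result set already exists, Groonga performs only exact match search even if the
number of hits is less than or equal to the threshold.
For example, for the following query, Groonga performs exact match search, unsplit search and partial
match search in order, as long as the number of hits of each search is less than or equal to the
threshold:
select Shops --match_columns description --query スープカレー
However, when a search result set already exists before the full text search runs, as in the following
query, Groonga performs only exact match search (this applies when point > 3 hits more records than the
threshold):
select Shops --filter '"point > 3 && description @ \"スープカレー\""'
Therefore, even if description contains "スープカレーライス", it isn't hit, because "スープカレーライス"
doesn't exactly match "スープカレー".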
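The selection rule can be modeled as a loop that moves to the next search behavior only while the number
of hits stays at or below the threshold. The following Python sketch is a simplified model of that rule,
not Groonga's implementation. escalating_search and the three stand-in search functions are hypothetical,
and existing_hits represents a result set that exists before the full text search runs.

```python
def escalating_search(keyword, searchers, threshold=0, existing_hits=0):
    """Run searchers in order (exact, unsplit, partial) until enough hits exist."""
    hits = []
    for search in searchers:
        hits = search(keyword)
        if existing_hits + len(hits) > threshold:
            break  # enough hits: do not escalate to the next behavior
    return hits


# Stand-in search functions for illustration only.
def exact(keyword):
    return []


def unsplit(keyword):
    return []


def partial(keyword):
    return ["スープカレーライス"]


searchers = [exact, unsplit, partial]
print(escalating_search("スープカレー", searchers))
print(escalating_search("スープカレー", searchers, existing_hits=5))
```

In the second call the earlier condition has already produced more hits than the threshold, so only the
exact match search runs and nothing is found.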
LIMITATIONS
Groonga has some limitations.
Limitations of table
A table has the following limitations.
• The maximum size of one key: 4KiB
• The maximum total size of keys: 4GiB or 1TiB (by specifying KEY_LARGE flag to table-create-flags)
• The maximum number of records: 268,435,455 (more than 268 million)
Keep in mind that these limitations may vary depending on conditions.
Limitations of indexing
A full-text index has the following limitations.
• The maximum number of distinct terms: 268,435,455 (more than 268 million)
• The maximum index size: 256GiB
Keep in mind that these limitations may vary depending on conditions.
Limitations of column
A column has the following limitation.
• The maximum stored data size of a column: 256GiB
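Troubleshooting
Full text search results differ for the same search keyword
Even with the same search keyword, full text search results may differ depending on the other conditions
specified in the query. This section explains the cause and how to deal with it.
Example
First, here is a concrete example in which the search results differ.
The DDL is as follows. The body column of the Blogs table is tokenized with the TokenMecab tokenizer and
then indexed: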
table_create Blogs TABLE_NO_KEY
column_create Blogs body COLUMN_SCALAR ShortText
column_create Blogs updated_at COLUMN_SCALAR Time
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenMecab --normalizer NormalizerAuto
column_create Terms blog_body COLUMN_INDEX|WITH_POSITION Blogs body
Only one record of test data is loaded:
load --table Blogs
[
["body", "updated_at"],
["東京都民に深刻なダメージを与えました。", "2010/9/21 10:18:34"],
]
First, search with full text search only. In this case, the record is hit:
> select Blogs --filter 'body @ "東京都"'
[[0,4102.268052438,0.000743783],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]
Next, search with a combination of a range condition and full text search (1285858800 is 2010/10/1
0:0:0 expressed in seconds). The record is hit in this case too:
> select Blogs --filter 'body @ "東京都" && updated_at < 1285858800'
[[0,4387.524084839,0.001525487],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]
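Finally, search with the order of the range condition and the full text search swapped. The individual
conditions are the same, but the record isn't hit in this case: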
> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,4400.292570838,0.000647716],[[[0],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]]]]]
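The numbers in the range condition can be checked with a short calculation. The following Python sketch
converts the two dates of this example to seconds since the epoch, assuming they are expressed in Japan
Standard Time (UTC+9). That time zone is an assumption, chosen because it reproduces the values shown in
the output above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the example's dates are in Japan Standard Time (UTC+9).
JST = timezone(timedelta(hours=9))

boundary = datetime(2010, 10, 1, 0, 0, 0, tzinfo=JST)
updated_at = datetime(2010, 9, 21, 10, 18, 34, tzinfo=JST)

print(int(boundary.timestamp()))
print(int(updated_at.timestamp()))
print(updated_at < boundary)
```

The record's updated_at is earlier than the boundary, so the range condition alone matches the record in
every query of this example.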
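The following explains why this happens.
Cause
This happens because full text search switches between multiple search behaviors. Only a brief
explanation is given here. See /spec/search for details.
There are the following three search behaviors.
1. Exact match search
2. Unsplit search
3. Partial match search
Groonga basically performs only exact match search. The example above searches
"東京都民に深刻なダメージを与えました。" with the query "東京都", but with the TokenMecab tokenizer this
query doesn't match.
The search target "東京都民に深刻なダメージを与えました。" is tokenized as follows:
東京 / 都民 / に / 深刻 / な / ダメージ / を / 与え / まし / た / 。
The query "東京都", however, is tokenized as follows:
東京 / 都
Therefore they don't match exactly.
Only when the number of hits from exact match search doesn't exceed a given threshold does Groonga
perform unsplit search, and if the number of hits still doesn't exceed the threshold, it performs partial
match search (the default threshold is 0). The data in this case is hit by partial match search, so the
record is hit when only the "東京都" query is specified.
However, when the threshold is already exceeded before the full text search, as in the following query
("updated_at < 1285858800" hits one record, which exceeds the threshold), Groonga doesn't perform partial
match search or the like even if exact match search hits nothing: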
select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
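As a result, changing the order of the conditions changes the search results. Two ways to avoid this
situation are introduced below. Each of them involves a trade-off, so consider carefully whether to
adopt it.
Solution 1: Change the tokenizer
The TokenMecab tokenizer tokenizes with a dictionary prepared in advance, so it can be regarded as a
tokenizer that favors precision over recall. On the other hand, N-gram tokenizers such as TokenBigram can
be regarded as tokenizers that favor recall. For example, with TokenMecab a search for "京都" never
exactly matches "東京都", but with TokenBigram it does. Likewise, with TokenMecab a search for "東京都"
doesn't exactly match "東京都民", but with TokenBigram it does.
Specifying an N-gram tokenizer in this way raises recall, but it lowers precision and makes search noise
more likely. To adjust this balance, specify a weight for each index used in match_columns of
/reference/commands/select.
Here again, the earlier example is used for a concrete illustration. First, add an index that uses
TokenBigram: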
table_create Bigram TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Bigram blog_body COLUMN_INDEX|WITH_POSITION Blogs body
Even in this state, the record that didn't match before is now hit:
> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,7163.448064902,0.000418127],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]
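However, an N-gram tokenizer produces more term hits than the TokenMecab tokenizer, so the hit scores from
the N-gram index are weighted more heavily. N-gram tokenizers often have lower precision than the
TokenMecab tokenizer, so as things stand search noise is more likely to appear near the top of the
results.
Therefore, specify weights so that the index created with the TokenMecab tokenizer counts more than the
index created with the TokenBigram tokenizer. This can be specified with the match_columns option: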
> select Blogs --match_columns 'Terms.blog_body * 10 || Bigram.blog_body * 3' --query '東京都' --output_columns '_score, body'
[[0,8167.364602632,0.000647003],[[[1],[["_score","Int32"],["body","ShortText"]],[13,"東京都民に深刻なダメージを与えました。"]]]]
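In this case the score is 13. The breakdown is 10 for the match on the Terms.blog_body index (which uses
the TokenMecab tokenizer) and 3 for the match on the Bigram.blog_body index (which uses the TokenBigram
tokenizer), for a total of 13. Giving the TokenMecab tokenizer a higher weight in this way raises recall
while keeping search noise from ranking near the top.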
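The arithmetic behind this score can be sketched as below. The sketch assumes that each matching index
contributes its weight exactly once; real scores also depend on how often terms occur, so treat this as a
simplified model. The function name is hypothetical.

```python
def weighted_score(matched_indexes, weights):
    """Sum the weight of every index that matched (one match per index assumed)."""
    return sum(weights[index] for index in matched_indexes)


# Weights taken from the match_columns example above.
weights = {"Terms.blog_body": 10, "Bigram.blog_body": 3}

# A record matched through both indexes, as in the example output.
print(weighted_score(["Terms.blog_body", "Bigram.blog_body"], weights))  # 13

# A record matched only through the bigram index ranks lower.
print(weighted_score(["Bigram.blog_body"], weights))  # 3
```

A record that only the bigram index finds, which is more likely to be noise, therefore sorts below
records that the TokenMecab index also finds.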
This example was in Japanese, so the TokenBigram tokenizer was sufficient, but for alphabetic text you
also need tokenizers such as TokenBigramSplitSymbolAlpha. For example, the TokenBigram tokenizer tokenizes
"楽しいbilliard" as
楽し / しい / billiard
so "bill" does not match exactly. The TokenBigramSplitSymbolAlpha tokenizer, on the other hand, tokenizes
it as
楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d
so "bill" also matches exactly.
Weighting still needs to be considered when the TokenBigramSplitSymbolAlpha tokenizer is used.
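The difference between the two tokenizations can be reproduced with a small sketch. The functions below
are simplified stand-ins written for illustration, not Groonga's tokenizers: they only distinguish ASCII
letters and digits from everything else, and the phrase check (query tokens must appear consecutively) is
an assumed model of exact match.

```python
from itertools import groupby


def is_ascii_alnum(ch):
    return ch.isascii() and ch.isalnum()


def grouped_bigrams(text):
    """Keep each run of ASCII letters/digits whole; bigram the other runs
    (simplified stand-in for the TokenBigram illustration)."""
    tokens = []
    for is_word, chars in groupby(text, key=is_ascii_alnum):
        run = "".join(chars)
        if is_word or len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def char_bigrams(text, keep_last=True):
    """Bigram every character; optionally append the final single character
    (simplified stand-in for the TokenBigramSplitSymbolAlpha illustration)."""
    grams = [text[i:i + 2] for i in range(len(text) - 1)]
    if keep_last and text:
        grams.append(text[-1])
    return grams


def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens occur as a consecutive run inside doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


text = "楽しいbilliard"
print(" / ".join(grouped_bigrams(text)))
print(" / ".join(char_bigrams(text)))
print(contains_phrase(grouped_bigrams(text), grouped_bigrams("bill")))
print(contains_phrase(char_bigrams(text), char_bigrams("bill", keep_last=False)))
```

With the grouped tokens, "bill" is compared against the single token "billiard" and fails; with
character bigrams, "bi / il / ll" appears consecutively in the text's tokens and succeeds.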
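The available bigram tokenizers are as follows.
• TokenBigram: Tokenizes by bigram. Consecutive symbols, alphabetic characters and digits are each
treated as one word.
• TokenBigramSplitSymbol:
Also tokenizes symbols by bigram. Consecutive alphabetic characters and digits are each treated as one
word.
• TokenBigramSplitSymbolAlpha:
Also tokenizes symbols and alphabetic characters by bigram. Consecutive digits are treated as one word.
• TokenBigramSplitSymbolAlphaDigit: Also tokenizes symbols, alphabetic characters and digits by bigram.
• TokenBigramIgnoreBlank:
Tokenizes by bigram. Consecutive symbols, alphabetic characters and digits are each treated as one word.
Blanks are ignored.
• TokenBigramIgnoreBlankSplitSymbol:
Also tokenizes symbols by bigram. Consecutive alphabetic characters and digits are each treated as one
word. Blanks are ignored.
• TokenBigramIgnoreBlankSplitSymbolAlpha:
Also tokenizes symbols and alphabetic characters by bigram. Consecutive digits are treated as one word.
Blanks are ignored.
• TokenBigramIgnoreBlankSplitSymbolAlphaDigit:
Also tokenizes symbols, alphabetic characters and digits by bigram. Blanks are ignored.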
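Solution 2: Raise the threshold
The threshold that determines whether unsplit search and partial match search are used can be changed
with the --with-match-escalation-threshold configure option. When it is specified as follows, unsplit
search and partial match search are performed whenever the number of hits is 100 or fewer, even if exact
match search produces hits: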
% ./configure --with-match-escalation-threshold=100
As with solution 1, note that search noise becomes more likely to appear near the top of the results. If
search noise increases, lower the specified value.
How to avoid mmap Cannot allocate memory error
Example
The following mmap error may appear in the log file:
2013-06-04 08:19:34.835218|A|4e86e700|mmap(4194304,551,432017408)=Cannot allocate memory <13036498944>
Note that <13036498944> is the total size of mmap (almost 12GB) in this case.
Solution
Check the following points.
• Is there enough free memory?
• Is the maximum number of mappings exceeded?
To check whether there is enough free memory, you can use the vmstat command.
To check whether the maximum number of mappings is exceeded, inspect the value of
vm.max_map_count.
If modifying the value of vm.max_map_count fixes this issue, that was the cause.
Because Groonga allocates memory in chunks of 256KB each, you can estimate the size of database you can
handle with the following formula:
(database size) = vm.max_map_count * (memory chunk size)
If you want to handle a Groonga database larger than 16GB, you must specify at least 65536 as the value
of vm.max_map_count:
database size (16GB) = vm.max_map_count (65536) * memory chunks (256KB)
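The same estimate can be worked through in a few lines. This sketch only restates the formula above; it
assumes that 1KB is 1024 bytes and 1GB is 1024^3 bytes, and it ignores mappings used for anything other
than database chunks. The function names are hypothetical.

```python
CHUNK_SIZE = 256 * 1024  # bytes per memory chunk, as stated above
GB = 1024 ** 3           # assumption: binary gigabytes


def max_database_size(max_map_count, chunk_size=CHUNK_SIZE):
    """Largest database size (in bytes) that fits in the given number of mappings."""
    return max_map_count * chunk_size


def required_max_map_count(database_size, chunk_size=CHUNK_SIZE):
    """Smallest vm.max_map_count that covers a database of the given size."""
    return -(-database_size // chunk_size)  # integer ceiling division


print(max_database_size(65536) // GB)    # 16
print(required_max_map_count(16 * GB))   # 65536
```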
You can modify vm.max_map_count temporarily with sudo sysctl -w vm.max_map_count=65536.
To keep the setting, save the configuration value to /etc/sysctl.conf or /etc/sysctl.d/*.conf.
See /reference/tuning for documentation about tuning related parameters.
DEVELOPMENT
This section describes developing with Groonga. You may develop an application that uses Groonga as
its database, a library that uses libgroonga, language bindings of libgroonga and so on.
Travis CI
This section describes using Groonga on Travis CI. Travis CI is a hosted continuous integration
service for the open source community.
You can use Travis CI for your open source software. This section describes only Groonga related
configuration. See Travis CI: Documentation for general information about Travis CI.
Configuration
Travis CI runs on 64-bit Ubuntu 12.04 LTS Server Edition. (See Travis CI: About Travis CI
Environment.) You can use the apt-line for Ubuntu 12.04 LTS provided by the Groonga project to install
Groonga on Travis CI.
You can customize the build lifecycle with .travis.yml. (See Travis CI: Configuring your Travis CI build
with .travis.yml.) You can use the before_install hook or the install hook. You should use before_install
if your software uses a language that is supported by Travis CI such as Ruby. You should use install
otherwise.
Add the following sudo and before_install configuration to .travis.yml:
sudo: required
before_install:
- curl --silent --location https://github.com/groonga/groonga/raw/master/data/travis/setup.sh | sh
The sudo: required configuration is needed because the setup script uses the sudo command.
If you need to use the install hook instead of before_install, just substitute before_install: with
install:.
With the above configuration, you can use Groonga for your build.
Examples
Here is open source software that uses Groonga on Travis CI:
• rroonga (Ruby bindings)
• rroonga on Travis CI
• .travis.yml for rroonga
• nroonga (node.js bindings)
• nroonga on Travis CI
• .travis.yml for nroonga
• logaling-command (A glossary management command line tool)
• logaling-command on Travis CI
• .travis.yml for logaling-command
HOW TO CONTRIBUTE TO GROONGA
We welcome your contributions to the Groonga project. There are many ways to contribute, such as using
Groonga, introducing it to others, etc. For example, if you find a bug when using Groonga, you are welcome
to report the bug. Coding and documentation are also welcome for Groonga and its related projects.
As a user:
If you are interested in groonga, please read this document and try it.
As a spokesman:
Please introduce groonga to your friends and colleagues.
As a developer: Bug report, development and documentation
This section describes the details.
How to report a bug
There are two ways to report a bug:
• Submit a bug to the issue tracker
• Report a bug to the mailing list
You can use either way. It makes no difference to us.
Submit a bug to the issue tracker
The Groonga project uses the GitHub issue tracker.
You can use English or Japanese to report a bug.
Report a bug to the mailing list
The Groonga project has /community for discussing Groonga. Please send a mail that describes the bug.
How to contribute in documentation topics
We use Sphinx as the documentation tool.
Introduction
This documentation describes how to write, generate and manage Groonga documentation.
Install dependencies
Groonga uses Sphinx as its documentation tool.
Here are command lines to install Sphinx.
Debian GNU/Linux, Ubuntu:
% sudo apt-get install -V -y python-sphinx
CentOS, Fedora:
% sudo yum install -y python-pip
% sudo pip install sphinx
OS X:
% brew install python
% brew install gettext
% export PATH=`brew --prefix gettext`/bin:$PATH
% pip install sphinx
If the version of Python on your platform is too old, you'll need to install a newer version of Python
2.7 by hand. For example, here are installation steps based on pyenv:
% pyenv install 2.7.11
% pyenv global 2.7.11
% pip install sphinx
Run configure with --enable-document
Groonga disables documentation generation by default. You need to enable it explicitly by adding
--enable-document option to configure:
% ./configure --enable-document
Now, your Groonga build is documentation ready.
Generate HTML
You can generate HTML by the following command:
% make -C doc html
You can find generated HTML documentation at doc/locale/en/html/.
Update
You can find sources of documentation at doc/source/. The sources should be written in English. See i18n
about how to translate documentation.
To update existing documentation, edit the corresponding file directly.
You need to update the file list after you add a new file, change a file path or delete an existing file.
You can update the file list with the following command:
% make -C doc update-files
The command updates doc/files.am.
I18N
We used to have documentation only in Japanese. We have started to support I18N documentation with the
gettext based Sphinx I18N feature. We'll use English as the base language and translate English into other
languages such as Japanese. We'll put all documentation into doc/source/ and process it with Sphinx.
But we still use Japanese in doc/source/ for now. We need to translate the Japanese documentation in
doc/source/ into English. We welcome your help with translating documentation.
Translation flow
After doc/source/*.txt are updated, we can start translation.
Here is a translation flow:
1. Install Sphinx, if it is not installed.
2. Clone Groonga repository.
3. Update .po files.
4. Edit .po files.
5. Generate HTML files.
6. Confirm HTML output.
7. Repeat 4.-6. until you get a good result.
8. Send your works to us!
Here are command lines to do the above flow. The following sections describe the details.
# Please fork https://github.com/groonga/groonga on GitHub
% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git
% cd groonga
% ./autogen.sh
% ./configure --enable-document
% cd doc/locale/${LANGUAGE}/LC_MESSAGES # ${LANGUAGE} is language code such as 'ja'.
% make update # *.po are updated
% editor *.po # translate *.po # you can use your favorite editor
% cd ..
% make html
% browser html/index.html # confirm translation
% git add LC_MESSAGES/*.po
% git commit
% git push
How to install Sphinx
See the introduction.
How to clone Groonga repository
First, please fork the Groonga repository on GitHub. Just access https://github.com/groonga/groonga and
press the Fork button. Now you can clone your Groonga repository:
% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git
Then you need to configure your cloned repository:
% cd groonga
% ./autogen.sh
% ./configure --enable-document
The above steps are needed only for the first setup.
If you have trouble with the above steps, you can use the source files available at
http://packages.groonga.org/source/groonga/ .
How to update .po files
You can update .po files by running make update on doc/locale/${LANGUAGE}/LC_MESSAGES. (Please substitute
${LANGUAGE} with your language code such as 'ja'.):
% cd doc/locale/ja/LC_MESSAGES
% make update
How to edit .po
There are some tools to edit .po files. .po files are just text, so you can use your favorite editor.
Here is a list of specialized editors for .po files.
Emacs's po-mode
It is bundled in gettext.
Poedit
It is a .po editor and works on many platforms.
gted
It is also a .po editor and is implemented as an Eclipse plugin.
How to generate HTML files
You can generate HTML files with updated .po files by running make html on doc/locale/${LANGUAGE}.
(Please substitute ${LANGUAGE} with your language code such as 'ja'.):
% cd doc/locale/ja/
% make html
You can also generate HTML files for all languages by running make html on doc/locale:
% cd doc/locale
% make html
NOTE:
.mo files are updated automatically by make html, so you don't need to care about .mo files.
How to confirm HTML output
HTML files are generated in doc/locale/${LANGUAGE}/html/. (Please substitute ${LANGUAGE} with your
language code such as 'ja'.) You can confirm the HTML output with your favorite browser:
% firefox doc/locale/ja/html/index.html
How to send your work
We accept your work as a pull request on GitHub, as a patch attached to an e-mail, or as the .po files
themselves.
How to send pull request
Here are command lines to send pull request:
% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git push
Now you can send a pull request on GitHub. Just access your repository page on GitHub and press the Pull
Request button.
SEE ALSO:
Help.GitHub - Sending pull requests.
How to send patch
Here are command lines to create patch:
% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git format-patch origin/master
You can find 000X-YYY.patch files in the current directory. Please send those files to us!
SEE ALSO:
/community describes our contact information.
How to send .po files
Please archive doc/locale/${LANGUAGE}/LC_MESSAGES/ (Please substitute ${LANGUAGE} with your language code
such as 'ja'.) and send it to us! We extract and merge them to the Groonga repository.
SEE ALSO:
/community describes our contact information.
How to add new language
Here are command lines to add new translation language:
% cd doc/locale
% make add LOCALE=${LANGUAGE} # specify your language code such as 'de'.
Please substitute ${LANGUAGE} with your language code such as 'ja'.
SEE ALSO:
Codes for the Representation of Names of Languages.
C API
We still have C API documentation in include/groonga.h, but we want to move it into
doc/source/c-api/*.txt. We welcome your help with moving the C API documentation.
We will use the C domain markup of Sphinx.
For Groonga developers
Repository
The Groonga repository is on GitHub. To check out Groonga, type the following command:
% git clone --recursive https://github.com/groonga/groonga.git
There is also a list of projects related to Groonga (grntest, fluent-plugin-groonga and so on).
How to build Groonga at the repository
This document describes how to build Groonga at the repository for each build system. You can choose GNU
Autotools or CMake if you develop Groonga on GNU/Linux or Unix (*BSD, Solaris, OS X and so on). You need
to use CMake if you develop on Windows.
How to build Groonga at the repository by GNU Autotools
This document describes how to build Groonga at the repository by GNU Autotools.
You can't choose this way if you develop Groonga on Windows. If you want to use Windows for developing
Groonga, see windows_cmake.
Install dependencies
TODO
• Autoconf
• Automake
• GNU Libtool
• Ruby
• Git
• Cutter
• ...
Checkout Groonga from the repository
Users use the released source archive, but developers must build Groonga from the repository because the
source code in the repository is the latest.
The Groonga repository is hosted on GitHub. Checkout the latest source code from the repository:
% git clone --recursive git@github.com:groonga/groonga.git
Create configure
You need to create configure. configure is included in source archive but not included in the repository.
configure is a build tool that detects your system and generates build configurations for your
environment.
Run autogen.sh to create configure:
% ./autogen.sh
Run configure
You can customize your build configuration by passing options to configure.
Here are recommended configure options for developers:
% ./configure --prefix=/tmp/local --enable-debug --enable-mruby --with-ruby
Here are descriptions of these options:
--prefix=/tmp/local
It specifies that you install your Groonga into temporary directory. You can do "clean install" by
removing /tmp/local directory. It'll be useful for debugging install.
--enable-debug
It enables debug options for C/C++ compiler. It's useful for debugging on debugger such as GDB and
LLDB.
--enable-mruby
It enables mruby support. The feature isn't enabled by default but developers should enable the
feature.
--with-ruby
It's needed for --enable-mruby and running functional tests.
Run make
Now, you can build Groonga.
Here is a recommended make command line for developers:
% make -j8 > /dev/null
-j8 decreases build time by enabling parallel build. If you have more than 8 CPU cores, you can increase
8 to decrease build time further.
With > /dev/null you see only warning and error messages. Developers shouldn't add new warnings or errors
in a new commit.
See also
• /contribution/development/test
How to build Groonga at the repository by CMake on GNU/Linux or Unix
This document describes how to build Groonga at the repository by CMake on GNU/Linux or Unix.
Unix is *BSD, Solaris, OS X and so on.
If you want to use Windows for developing Groonga, see windows_cmake.
You can't choose this way if you want to release Groonga. The Groonga release system supports only the
GNU Autotools build. See unix_autotools about the GNU Autotools build.
Install dependencies
TODO
• CMake
• Ruby
• Git
• Cutter
• ...
Checkout Groonga from the repository
Users use the released source archive, but developers must build Groonga from the repository because the
source code in the repository is the latest.
The Groonga repository is hosted on GitHub. Checkout the latest source code from the repository:
% git clone --recursive git@github.com:groonga/groonga.git
Run cmake
You need to create Makefile for your environment.
You can customize your build configuration by passing options to cmake.
Here are recommended cmake options for developers:
% cmake . -DCMAKE_INSTALL_PREFIX=/tmp/local -DGRN_WITH_DEBUG=on -DGRN_WITH_MRUBY=on
Here are descriptions of these options:
-DCMAKE_INSTALL_PREFIX=/tmp/local
It specifies that you install your Groonga into temporary directory. You can do "clean install" by
removing /tmp/local directory. It'll be useful for debugging install.
-DGRN_WITH_DEBUG=on
It enables debug options for C/C++ compiler. It's useful for debugging on debugger such as GDB and
LLDB.
-DGRN_WITH_MRUBY=on
It enables mruby support. The feature isn't enabled by default but developers should enable the
feature.
Run make
Now, you can build Groonga.
Here is a recommended make command line for developers:
% make -j8 > /dev/null
-j8 decreases build time by enabling parallel build. If you have more than 8 CPU cores, you can increase
8 to decrease build time further.
With > /dev/null you see only warning and error messages. Developers shouldn't add new warnings or errors
in a new commit.
See also
• /contribution/development/test
How to build Groonga at the repository by CMake on Windows
This document describes how to build Groonga at the repository by CMake on Windows.
If you want to use GNU/Linux or Unix for developing Groonga, see unix_cmake.
Unix is *BSD, Solaris, OS X and so on.
Install dependencies
• Microsoft Visual Studio Express 2013 for Windows Desktop
• CMake
• Ruby
• RubyInstaller for Windows is recommended.
• Git: There are some Git clients for Windows. For example:
• The official Git package
• TortoiseGit
• Git for Windows
• GitHub Desktop
Checkout Groonga from the repository
Users use the released source archive, but developers must build Groonga from the repository because the
source code in the repository is the latest.
The Groonga repository is hosted on GitHub. Checkout the latest source code from the repository:
> git clone --recursive git@github.com:groonga/groonga.git
Run cmake
You need to create Makefile for your environment.
You can customize your build configuration by passing options to cmake.
You must pass the -G option. Here are the available -G values:
• "Visual Studio 12 2013": For 32bit build.
• "Visual Studio 12 2013 Win64": For 64bit build.
Here are recommended cmake options for developers:
> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga -DGRN_WITH_MRUBY=on
Here are descriptions of these options:
-G "Visual Studio 12 2013 Win64"
-DCMAKE_INSTALL_PREFIX=C:\Groonga
It specifies that you install your Groonga into the C:\Groonga folder.
-DGRN_WITH_MRUBY=on
It enables mruby support. The feature isn't enabled by default but developers should enable the
feature.
Build Groonga
Now, you can build Groonga.
You can use Visual Studio or cmake --build.
Here is a command line to build Groonga by cmake --build:
> cmake --build . --config Debug
See also
• /contribution/development/test
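Groonga communication architecture
Architecture for GQTP
• com accepts connections from outside.
• com runs in a single thread.
• com creates edges.
• An edge corresponds one-to-one to a connection.
• An edge contains a ctx.
• A worker corresponds one-to-one to a thread.
• The number of workers has a fixed upper limit.
• A worker can be bound to one edge.
• Each edge has its own queue.
• com enqueues each msg into the queue of an edge.
If the edge is not bound to a worker, com also enqueues that edge into a queue named ctx_new.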
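The enqueue rule in the last item can be sketched as follows. The class and method names are hypothetical
and chosen only for illustration; they are not Groonga's internal data structures, and threads are left
out entirely.

```python
from collections import deque


class Edge:
    """One edge per connection, with its own message queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.worker = None  # worker currently bound to this edge, if any


class Com:
    """Single dispatcher that receives messages and hands them to edges."""

    def __init__(self):
        self.ctx_new = deque()  # edges that received a msg while unbound

    def dispatch(self, edge, msg):
        edge.queue.append(msg)
        if edge.worker is None:
            # No worker is bound, so queue the edge itself for pickup.
            self.ctx_new.append(edge)


com = Com()
bound, idle = Edge("bound"), Edge("idle")
bound.worker = "worker-1"

com.dispatch(bound, "msg-a")
com.dispatch(idle, "msg-b")
print([edge.name for edge in com.ctx_new])
```

Only the edge without a worker lands in ctx_new, so an idle worker can find edges that have pending
messages without scanning every edge.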
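Guidelines for developing smoothly in cooperation with users
This section summarizes practices that help development go smoothly in cooperation with the users of
Groonga. Writing them down also lets us share them with people who newly join development.
Twitter
We obtained the Twitter account Groonga so that more people use Groonga, and we use it daily to announce
releases and to support users.
Release announcements need no back-and-forth, but when several people provide support through the Groonga
account, the support becomes inconsistent unless they share a common understanding of how to support
users and why.
The points to keep in mind when providing support are summarized below, so that the reassurance of being
supported on Twitter leads to growth in Groonga users.
Review past tweets
Reason
People usually do not feel good when they receive a reply that ignores what they have already tweeted.
What to do
It is desirable to review past tweets and then suggest what to do:
Good example: With ○○, the cause is □□. Doing ×× will fix it.
Provide information from our side
Reason
A user in trouble sometimes tweets several times and provides information within the limited space.
If a solution can be found from those limited tweets, the user is spared extra effort.
Asking for this and that piece of information makes the user do that much more checking.
What to do
It is desirable to suggest one or two solutions when first reaching out. Try not to make the user feel
burdened:
Good example: With ○○, □□ may be the cause. Could you try ××?
Avoid redirecting Twitter conversations to other places (for example, Redmine) as far as possible
Reason
What matters on Twitter is being able to tweet casually, so asking for something that cannot be done
casually may make the user hesitant.
Suddenly asking for a bug report on Redmine may make the user back off:
Bad example: Could you report the steps to reproduce to the ML or Redmine?
If people cannot tweet casually about Groonga, developers cannot find those who are in trouble and users
stay in trouble, which is good for neither side.
What to do
Keep the whole exchange on Twitter.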
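How queries are realized
A Groonga database can store a large amount of data and retrieve the necessary parts of it quickly.
Groonga provides several means to express and execute the queries that ask a Groonga database for those
parts.
Interfaces for query execution
Groonga provides user programs with several layered interfaces, ranging from a simple library interface
with few functions to a complex command interface with many functions.
Interfaces for query execution are provided for each of these layers. They are described below, starting
from the lowest layer.
DB_API
DB_API provides a set of C API functions for operating on a Groonga database. It provides simple functions
that operate on the individual parts that make up a database. Complex queries can be executed by combining
DB_API functions. All of the query interfaces described below are implemented by combining DB_API
functions.
grn_expr
grn_expr is a data structure that expresses conditions for search and update operations on a Groonga
database. Multiple conditions can be combined recursively to express more complex conditions. Use the
grn_table_select() function to execute a query expressed by a grn_expr.
Groonga executable
A command interpreter for operating on a Groonga database. It interprets the commands passed to it and
returns the execution results. The actual processing of commands is written in C. Functions that users
define in C can be built into the Groonga executable as new commands. Each command receives several string
arguments and interprets and executes them as a query. Each command is free to decide whether to interpret
its arguments as a grn_expr or to interpret them in another form and operate on the database through
DB_API.
Queries that grn_expr can express
A grn_expr can express various operations such as assignment and function calls. Among these, a grn_expr
that expresses a search query is specifically called a conditional expression. The individual elements
that make up a conditional expression are called relational expressions. A conditional expression
consists of one or more relational expressions, or of conditional expressions combined with logical
operators.
There are three logical operators:
&& (logical AND)
|| (logical OR)
! (negation)
The following 11 relational expressions are provided. Functions defined by users can also be used as new
relational expressions.
equal(==)
not_equal(!=)
less(<)
greater(>)
less_equal(<=)
greater_equal(>=)
contain()
near()
similar()
prefix()
suffix()
grn_table_select()
The grn_table_select() function is used to execute a search query expressed by a grn_expr. Its arguments
are the table to search, the grn_expr that represents the query, the table that stores the search
results, and an operator that specifies how records matching the search are reflected in the search
results. The following four operators can be specified.
GRN_OP_OR
GRN_OP_AND
GRN_OP_BUT
GRN_OP_ADJUST
GRN_OP_OR adds the records in the searched table that match the query to the result table. The operators
other than GRN_OP_OR are meaningful only when the result table is not empty. GRN_OP_AND removes from the
result table the records that do not match the query. GRN_OP_BUT removes from the result table the records
that match the query. GRN_OP_ADJUST only updates the score of the records in the result table that match
the query.
grn_table_select() tries to execute the specified query as fast as possible by combining the tables,
indexes and other objects defined in the database.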
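The effect of the four operators on a result table can be modeled with plain dictionaries. This is a
conceptual sketch, not the C API: the function name is hypothetical, a result table is modeled as a
mapping from record ID to score, and the way scores combine (adding the new score to the existing one for
GRN_OP_OR and GRN_OP_ADJUST, leaving it unchanged for GRN_OP_AND) is an assumption.

```python
def apply_operator(result, matches, op):
    """Return a new result table after merging `matches` with operator `op`.

    result, matches: dict mapping record ID -> score.
    """
    if op == "GRN_OP_OR":
        merged = dict(result)
        for record_id, score in matches.items():
            merged[record_id] = merged.get(record_id, 0) + score
        return merged
    if op == "GRN_OP_AND":
        return {rid: score for rid, score in result.items() if rid in matches}
    if op == "GRN_OP_BUT":
        return {rid: score for rid, score in result.items() if rid not in matches}
    if op == "GRN_OP_ADJUST":
        return {rid: score + matches.get(rid, 0) for rid, score in result.items()}
    raise ValueError("unknown operator: " + op)


result = {1: 5, 2: 5}   # records already in the result table
matches = {2: 3, 3: 3}  # records matching the new query

for op in ("GRN_OP_OR", "GRN_OP_AND", "GRN_OP_BUT", "GRN_OP_ADJUST"):
    print(op, apply_operator(result, matches, op))
```

Note that only GRN_OP_OR can introduce record 3; the other three operators never add records, which is
why they are meaningful only on a non-empty result table.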
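Relational expressions
A relational expression expresses the condition that the data being searched for must satisfy as a
relationship between specified values. Every relational expression accepts as arguments a callback that is
evaluated when the relationship holds and an arg that is passed to the callback function. When no callback
is given and only arg is given as a number, arg is treated as a coefficient for the score. The main
relational expressions are described below.
equal(v1, v2, arg, callback)
Expresses that the value of v1 is equal to the value of v2.
not_equal(v1, v2, arg, callback)
Expresses that the value of v1 is not equal to the value of v2.
less(v1, v2, arg, callback)
Expresses that the value of v1 is less than the value of v2.
greater(v1, v2, arg, callback)
Expresses that the value of v1 is greater than the value of v2.
less_equal(v1, v2, arg, callback)
Expresses that the value of v1 is less than or equal to the value of v2.
greater_equal(v1, v2, arg, callback)
Expresses that the value of v1 is greater than or equal to the value of v2.
contain(v1, v2, mode, arg, callback)
Expresses that the value of v1 contains the value of v2. When the value of v1 is decomposed into elements,
one of the following can be specified as the mode that determines how the second argument matches each
element.
EXACT: When the value of v2 is decomposed into elements in the same way as the value of v1, each element
matches exactly (default)
UNSPLIT: The value of v2 is not decomposed into elements
PREFIX: Elements of the value of v1 prefix-match the value of v2
SUFFIX: Elements of the value of v1 suffix-match the value of v2
PARTIAL: Elements of the value of v1 partially match the value of v2
near(v1, v2, arg, callback)
Expresses that the elements of the value of v2 are contained close to each other in the value of v1.
(Pass an array of values as v2.)
similar(v1, v2, arg, callback)
Expresses that the value of v1 and the value of v2 are similar.
prefix(v1, v2, arg, callback)
Expresses that the value of v1 prefix-matches the value of v2.
suffix(v1, v2, arg, callback)
Expresses that the value of v1 suffix-matches the value of v2.
Query examples
Various search queries can be expressed with grn_expr.
Search example 1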
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
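Returns in result the records of table whose column value contains string. When record r1, whose column
value is 'needle in haystack', and record r2, whose column value is 'haystack', are registered in table,
specifying 'needle' as string hits only record r1.
Search example 2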
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column2, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
grn_table_select(ctx, table, query, result, GRN_OP_ADJUST);
grn_obj_close(ctx, query);
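Sets in result the records of table whose column1 value hits string in exact mode, with the obtained
score multiplied by score1. Next, for the records set in result whose column2 value hits string in exact
mode, the obtained score multiplied by score2 is added to the original score.
Search example 3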
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);
if (grn_table_size(ctx, result) < t1) {
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, partial, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
grn_table_select(ctx, table, query, result, GRN_OP_OR);
grn_obj_close(ctx, query);
}
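Sets in result the records of table whose column1 value hits string in exact mode, with the obtained
score multiplied by score1. If the number of search results is smaller than t1, the search is run again in
partial mode, and the records that hit are added to result with the obtained score multiplied by score2.
Search example 4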
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
Returns in result the records of table whose column value is contained in string.
When record r1, whose column value is 'needle', and record r2, whose column value is 'haystack', are
registered in table, specifying 'hay in haystack' as string hits only record r2.
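Release procedure
Prerequisites
The prerequisites of the release procedure are as follows.
• The build environment is Debian GNU/Linux (sid)
• Command line examples use zsh
The following working directories are used in the examples.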
• GROONGA_DIR=$HOME/work/groonga
• GROONGA_CLONE_DIR=$HOME/work/groonga/groonga.clean
• GROONGA_ORG_PATH=$HOME/work/groonga/groonga.org
• CUTTER_DIR=$HOME/work/cutter
• CUTTER_SOURCE_PATH=$HOME/work/cutter/cutter
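Preparing the build environment
The packages that should be installed in advance for Groonga release work are shown below.
The explanation assumes Debian GNU/Linux (sid) as the build environment, so adapt it as needed for other
environments: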
% sudo apt-get install -V debootstrap createrepo rpm mercurial python-docutils python-jinja2 ruby-full mingw-w64 g++-mingw-w64 mecab libmecab-dev nsis gnupg2 dh-autoreconf python-sphinx bison
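Vagrant is used to build Debian (.deb) and Red Hat (.rpm) packages. The version that apt-get installs is
old, so downloading and installing the latest version from the website is recommended.
If you do not have virtualization software for Vagrant (VirtualBox, VMware, etc.), install it as well.
VirtualBox can be installed with apt-get if you add the contrib section to sources.list: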
% cat /etc/apt/sources.list
deb http://ftp.jp.debian.org/debian/ sid main contrib
deb-src http://ftp.jp.debian.org/debian/ sid main contrib
% sudo apt-get update
% sudo apt-get install virtualbox
Also install the rake package for Ruby with the following command:
% sudo gem install rake
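Importing the secret key for package signing
RPM packages are signed during release work. This requires the package signing key.
The Groonga project encrypts the signing key with the public key of each release manager and registers it
under the packages/ directory of the repository.
A release manager decrypts the secret key registered in the repository and then imports it with the
following commands: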
% cd packages
% gpg --decrypt release-key-secret.asc.gpg.(release manager) > (decrypted key file)
% gpg --import (decrypted key file)
When the import finishes successfully, you can confirm the Groonga signing key with gpg --list-keys:
pub 1024R/F10399C0 2012-04-24
uid groonga Key (groonga Official Signing Key) <packages@groonga.org>
sub 1024R/BC009774 2012-04-24
The key cannot be used just by importing it. You need to trust and sign the imported key.
Run the following commands to sign it (the prompts in between are omitted):
% gpg --edit-key packages@groonga.org
gpg> trust
gpg> sign
gpg> save
gpg> quit
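This work is done when someone newly becomes a release manager or when the package signing key changes.
Creating the working directory for release
Groonga release work must be built in a release-only environment (compile flags).
You can work without separating the release directory from the development directory, but there is a
risk of releasing with the wrong compile flags.
The following explanation therefore assumes that the source code has been cloned into a release working
directory (groonga.clean) under $GROONGA_DIR.
To get the source code in a clean state for release, run the following command in $GROONGA_DIR: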
% git clone --recursive git@github.com:groonga/groonga.git groonga.clean
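This work is done for every release.
Summarizing the changes
Summarize the changes since the previous release in $GROONGA_CLONE_DIR/doc/source/news.txt.
This summary is also used for the release announcement.
To view the change history since the previous release, run the following command: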
% git log -p --reverse $(git tag | tail -1)..
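Search the log for ^commit and add changes using the following criteria as a guide.
Include
• Changes that affect users
• Changes that break compatibility
Do not include
• Internal changes (such as renaming variables and refactoring)
Getting the Groonga website
The source of the Groonga website is kept in a repository on GitHub, like Groonga itself.
During release work, the version on the top page can be replaced with a command described later (make
update-latest-release).
To get the source code of the Groonga website as $GROONGA_ORG_PATH, run the following command in
$GROONGA_DIR: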
% git clone git@github.com:groonga/groonga.org.git
This gets the groonga.org source into $GROONGA_ORG_PATH.
Getting the Cutter source code
Groonga release work uses scripts included in Cutter.
Get the Cutter source code with the following command in the $HOME/work/cutter directory prepared in
advance:
% git clone git@github.com:clear-code/cutter.git
This gets the Cutter source into the $CUTTER_SOURCE_PATH directory.
Generating the configure script
Right after the Groonga source code is cloned it does not contain the configure script, so it cannot be
built with the make command as is.
Run autogen.sh in $GROONGA_CLONE_DIR as follows:
% sh autogen.sh
This command generates the configure script.
Running the configure script
Run the configure script to generate the Makefile.
To build for release, run configure with the following options:
% ./configure \
--prefix=/tmp/local \
--with-launchpad-uploader-pgp-key=(key ID registered on Launchpad) \
--with-groonga-org-path=$HOME/work/groonga/groonga.org \
--enable-document \
--with-ruby \
--enable-mruby \
--with-cutter-source-path=$HOME/work/cutter/cutter
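The --with-groonga-org-path configure option specifies the location where the Groonga website repository
was cloned.
The --with-cutter-source-path configure option specifies the location where the Cutter source was cloned.
You can also specify paths relative to the location where the Groonga source code was cloned, as follows: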
% ./configure \
--prefix=/tmp/local \
--with-launchpad-uploader-pgp-key=(key ID registered on Launchpad) \
--with-groonga-org-path=../groonga.org \
--enable-document \
--with-ruby \
--enable-mruby \
--with-cutter-source-path=../../cutter/cutter
Confirm in advance that you can log in to packages.groonga.org with ssh as the packages user.
To check whether you can log in, run the following command:
% ssh packages@packages.groonga.org
Running make update-latest-release
For the make update-latest-release command, specify the date of the previous release in OLD_RELEASE_DATE
and the date of the next release in NEW_RELEASE_DATE.
For the 2.0.2 release, the following command was run:
% make update-latest-release OLD_RELEASE=2.0.1 OLD_RELEASE_DATE=2012-03-29 NEW_RELEASE_DATE=2012-04-29
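This updates the source of the top page of the cloned Groonga website (index.html, ja/index.html), the
version notation in the spec file of the RPM package, and so on.
Running make update-files
Run the following command to update locale messages, the list of changed files, and so on: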
% make update-files
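Running make update-files lists newly added files and the like in the various .am files.
These files are required for the release, so commit all of them.
Running make update-po
To synchronize the latest documentation with each language version, update the po files with the
following command: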
% make update-po
Running make update-po updates the various .po files under doc/locale/ja/LC_MESSAGES.
Translating the po files
Translate the .po files updated by the make update-po command.
To check the translation result as HTML, run the following commands:
% make -C doc/locale/ja html
% make -C doc/locale/en html
After checking, commit the translated po files.
Setting the release tag
To create the release tag, run the following command:
% make tag
NOTE:
Running configure after creating the tag reflects the version number in the generated documentation.
Creating the release archive file
To create the source archive file for release, run the following command in $GROONGA_CLONE_DIR:
% make dist
This creates $GROONGA_CLONE_DIR/groonga-(version).tar.gz.
NOTE:
If you run make dist before tagging, the version may remain old. In that case the version shown by
groonga --version is not updated, so take care. It is advisable to confirm that version and
version.sh in the tar.gz generated by make dist match the tag.
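The consistency check recommended in the note can be scripted. The following is only a minimal sketch and is not part of the Groonga build system: check_dist_version is a hypothetical helper, and it assumes that the archive stores the version file at groonga-(version)/version, which this document does not state.

```shell
#!/bin/sh
# Hypothetical helper: compare the "version" file inside a release archive
# with the version that is expected to have been tagged.
# Assumption: the archive contains groonga-<version>/version.
check_dist_version() {
  archive=$1
  expected=$2
  # -O writes the extracted member to stdout instead of to disk.
  actual=$(tar -xzOf "$archive" "groonga-$expected/version" 2>/dev/null | tr -d '[:space:]')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $archive contains version $actual"
  else
    echo "NG: expected $expected but found '$actual'" >&2
    return 1
  fi
}
```

A call such as check_dist_version groonga-2.0.2.tar.gz 2.0.2 returns a non-zero status when the archive was generated before the tag was created, so it can guard the later packaging steps.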
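Building packages
Now that the release archive file exists, the next step is packaging.
Packages are built for the following three families.
• Debian family (.deb)
• Red Hat family (.rpm)
• Windows (.exe, .zip)
The package build consists of several subtasks.
Downloading packages needed for the build
To download the packages needed to build the deb packages, run the following commands: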
% cd packages/apt
% make download
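This downloads the related .deb packages, source archives and so on for lucid and later into the
current directory and below.
To download the packages needed to build the rpm packages, run the following commands: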
% cd packages/yum
% make download
This downloads the RPM/SRPM packages of Groonga, MySQL and so on into the current directory and below.
To download the packages needed to build the Windows packages, run the following commands:
% cd packages/windows
% make download
This downloads the Groonga installers and zip archives into the current directory and below.
To download what is needed for the source package, run the following commands:
% cd packages/source
% make download
This downloads the previously released source archives (.tar.gz) into the packages/source/files
directory.
Building Debian-family packages
Move to the packages/apt subdirectory of Groonga and run the following commands:
% cd packages/apt
% make build PALALLEL=yes
Running make build PALALLEL=yes builds each combination of distribution release and architecture in
parallel.
The following are currently supported.
• Debian GNU/Linux
• wheezy i386/amd64
• jessie i386/amd64
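When the build finishes successfully, the .deb packages are generated under
$GROONGA_CLONE_DIR/packages/apt/repositories.
make build cannot always build everything in one run.
In that case, build individually, for example per distribution or per architecture, to isolate where
the problem occurs.
To sign the generated packages, run the following command: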
% make sign-packages
To reflect the files for the release in the repository, run the following command:
% make update-repository
To sign the repository with GnuPG, run the following command:
% make sign-repository
Building Red Hat-family packages
Move to the packages/yum subdirectory of Groonga and run the following commands:
% cd packages/yum
% make build PALALLEL=yes
Running make build PALALLEL=yes builds each combination of distribution release and architecture in
parallel.
The following are currently supported.
• centos-5 i386/x86_64
• centos-6 i386/x86_64
• centos-7 i386/x86_64
When the build finishes successfully, the RPM packages are generated under
$GROONGA_CLONE_DIR/packages/yum/repositories.
• repositories/yum/centos/5/i386/Packages
• repositories/yum/centos/5/x86_64/Packages
• repositories/yum/centos/6/i386/Packages
• repositories/yum/centos/6/x86_64/Packages
• repositories/yum/centos/7/i386/Packages
• repositories/yum/centos/7/x86_64/Packages
To sign the RPMs for the release, run the following command:
% make sign-packages
To reflect the files for the release in the repository, run the following command:
% make update-repository
Building Windows packages
Move to the packages/windows subdirectory and run the following commands:
% cd packages/windows
% make build
% make package
% make installer
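Running make release executes everything from build through upload in one go, but because it can fail
partway through, running the steps one by one is recommended.
make build performs cross compilation.
When it finishes successfully, the x64/x86 binaries are created under the dist-x64/dist-x86
directories.
When make package finishes successfully, the zip archives are created under the files directory.
When make installer finishes successfully, the Windows installers are created under the files
directory.
Verifying the packages
Verify the built packages before the release.
For the Debian family and the Red Hat family, before uploading to the production environment, point
apt or yum at a local repository and confirm that the packages can be upgraded correctly.
Here, Ruby is used to make the repository accessible through a web server, as follows:
% ruby -run -e httpd -- packages/yum/repositories (for yum)
% ruby -run -e httpd -- packages/apt/repositories (for apt)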
Preparing grntest
Running grntest requires the Groonga test data and the grntest source.
First, extract the Groonga source into any directory:
% tar zxvf groonga-(version).tar.gz
Next, extract the grntest source under the test/function directory of Groonga.
That is, place the grntest source under the name test/function/grntest:
% ls test/function/grntest/
README.md bin lib license test
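How to run grntest
With grntest you can explicitly specify which groonga command to use.
The per-package verification with grntest described below runs it as follows:
% GROONGA=(path to groonga) test/function/run-test.sh
At the end, grntest prints a summary of the results like the following: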
55 tests, 52 passes, 0 failures, 3 not checked tests.
94.55% passed.
Confirm that grntest reports no errors.
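When the verification is repeated for several packages, the summary line can be checked mechanically. The following is a minimal sketch: grntest_passed is a hypothetical helper, and it relies only on the summary format shown above ("N failures"); whether run-test.sh itself exits with a non-zero status on failure is not covered by this document.

```shell
#!/bin/sh
# Hypothetical helper: read grntest output on stdin and succeed only when
# the last summary line reports "0 failures".
grntest_passed() {
  # Extract N from "... N failures ..."; lines without a summary print nothing.
  failures=$(sed -n 's/^.* \([0-9][0-9]*\) failures.*$/\1/p' | tail -n 1)
  [ "$failures" = "0" ]
}
```

For example, after saving the output of run-test.sh to a log file, grntest_passed < grntest.log (the file name is arbitrary) succeeds only for a run without failures.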
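For the Debian family
The verification procedure for the Debian family is as follows.
• Install the previous version into a chroot environment
• Edit /etc/hosts in the chroot environment so that packages.groonga.org resolves to the host
• Start a web server on the host with its document root set to the one in the build environment (packages/apt/repositories)
• Run the upgrade procedure
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully
For the Red Hat family
The verification procedure for the Red Hat family is as follows.
• Install the previous version into a chroot environment
• Edit /etc/hosts in the chroot environment so that packages.groonga.org resolves to the host
• Start a web server on the host with its document root set to the one in the build environment (packages/yum/repositories)
• Run the upgrade procedure
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully
For Windows
• Perform a fresh installation and an overwrite installation
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully
Verify the zip archives in the same way by running grntest.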
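Writing the release announcement
At release time, send out a release announcement to make the Groonga release widely known.
The changes have already been summarized in news.txt; write the release announcement based on it.
Include the following in the release announcement.
• A link to the installation instructions
• An introduction to the topics of the release
• A link to the list of changes in the release
• The changes in the release (the content of news.txt)
In the introduction to the topics, present the points that appeal to people who are about to start
using Groonga and the information that users of existing versions need when upgrading.
If the release contains incompatible changes, it is also important to include guidance such as
workarounds.
For reference, links to past release announcements are shown below.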
• [Groonga-talk] [ANN] Groonga 2.0.2
• http://sourceforge.net/mailarchive/message.php?msg_id=29195195
• [groonga-dev,00794] [ANN] Groonga 2.0.2
• http://osdn.jp/projects/groonga/lists/archive/dev/2012-April/000794.html
Uploading the packages
Once verification is complete, upload the packages and archives for the Debian family, the Red Hat
family, Windows and the source code.
To upload the Debian-family packages, run the following commands:
% cd packages/apt
% make upload
To upload the Red Hat-family packages, run the following commands:
% cd packages/yum
% make upload
To upload the Windows packages, run the following commands:
% cd packages/windows
% make upload
To upload the source archive, run the following commands:
% cd packages/source
% make upload
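When the upload finishes successfully, the repository data, packages, archives and so on for the
release are reflected on packages.groonga.org.
Uploading the Ubuntu packages
To upload the Ubuntu packages, run the following commands: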
% cd packages/ubuntu
% make upload
The following are currently supported.
• precise i386/amd64
• trusty i386/amd64
• vivid i386/amd64
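When the upload finishes successfully, the build runs on launchpad.net and the build result is sent by
email. If the build succeeds, the packages for the release are reflected in the Groonga team's PPA on
launchpad.net. The published packages can be checked at the following URL.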
https://launchpad.net/~groonga/+archive/ubuntu/ppa
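Updating blogroonga (the blog)
Write the release notice that is published on blogroonga at http://groonga.org/blog/ (both the English
and the Japanese edition).
Basically, post the content of the release announcement as it is.
Add the following new files to the cloned website source.
• groonga.org/en/_post/(release date)-release.md
• groonga.org/ja/_post/(release date)-release.md
If you want to check your edits before pushing them, you need Jekyll, RedCloth (a Textile parser),
RDiscount (a Markdown parser) and a JavaScript interpreter (therubyracer, Node.js, etc.). To install
them, run the following command: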
% sudo gem install jekyll RedCloth rdiscount therubyracer
After installing jekyll, start a local web server with the following command:
% jekyll serve --watch
Then open http://localhost:4000 in a browser and check that the content has no problems.
NOTE:
To upload an article in an unpublished state, set published: to false in the .md file:
---
layout: post.en
title: Groonga 2.0.5 has been released
published: false
---
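Before the announcement goes out, it can be useful to confirm that no release entry is still marked as unpublished. The following is a minimal sketch: list_unpublished_posts is a hypothetical helper, and the directory passed to it is whatever directory holds the release .md files in your clone.

```shell
#!/bin/sh
# Hypothetical helper: print the .md files in a directory that still contain
# a "published: false" line. It searches the whole file, not only the front
# matter, which is sufficient for posts laid out like the example above.
list_unpublished_posts() {
  dir=$1
  grep -l '^published: false' "$dir"/*.md 2>/dev/null
}
```

The function prints nothing and returns a non-zero status when every post in the directory is published.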
Uploading the documentation
Upload the documentation once the documents under doc/source have been updated and translated.
To do so, first run the following command:
% make update-document
This copies the updated documents under doc/locale of the cloned groonga.org.
Once you have confirmed that the generated documents have no problems, commit and push them to reflect
them on groonga.org.
Updating Homebrew
Homebrew is a package manager for OS X.
Send a pull request to Homebrew so that Groonga can be installed easily.
https://github.com/mxcl/homebrew
The Groonga Formula has already been merged, so the task for each release is to update the content of
the Formula.
For Groonga 3.0.6, the Formula was updated and a pull request was sent as follows.
https://github.com/mxcl/homebrew/pull/21456/files
As the URL above shows, the update changes the url of the source archive and its sha1 checksum.
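The checksum for the Formula can be computed from the released source archive. The following is a minimal sketch: archive_sha1 is a hypothetical helper that uses sha1sum (GNU coreutils) when it is available and otherwise falls back to shasum, on the assumption that one of the two is installed.

```shell
#!/bin/sh
# Hypothetical helper: print only the SHA-1 checksum of a file.
archive_sha1() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum "$1" | cut -d ' ' -f 1
  else
    # shasum -a 1 selects SHA-1 and prints the same "checksum  file" layout.
    shasum -a 1 "$1" | cut -d ' ' -f 1
  fi
}
```

Running archive_sha1 groonga-(version).tar.gz prints the value to put in the sha1 field of the Formula.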
Release announcement
Send the release announcement you wrote to the mailing lists.
• groonga-dev groonga-dev@lists.osdn.me
• Groonga-talk groonga-talk@lists.sourceforge.net
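Announcing the release on Twitter
The release entry on blogroonga has a tweet button for "sharing the link with your followers" (placed
at the bottom of the page); use that button to announce the release.
When you go through this button, the release title (for example, "groonga 2.0.8 release") and the URL
of the blogroonga release entry are inserted into the tweet automatically.
Do this for both the English and the Japanese edition of blogroonga.
Logging in with the groonga account beforehand makes the announcement go smoothly.
This completes the release work.
After the release
Once the release announcement has been sent, development of the next version begins.
• Add a new version to the Groonga project
• Update the base_version of Groonga
Adding a new version to the Groonga project
Add a new version on the settings page of the Groonga project. (Example: release-2.0.6)
Updating the Groonga version
Run the following command in $GROONGA_CLONE_DIR: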
% make update-version NEW_VERSION=2.0.6
This updates $GROONGA_CLONE_DIR/base_version, so commit the change.
NOTE:
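base_version is used in the names of release files such as the tar.gz.
Build tips
Building in parallel
Specifying make build PALALLEL=yes runs the builds in parallel in the chroot environments.
Building only for a specific environment
For the Debian family, explicitly specifying the CODES and ARCHITECTURES options builds only a
specific release and architecture.
To build only i386 for squeeze, run the following command: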
% make build ARCHITECTURES=i386 CODES=squeeze
Besides the build command, ARCHITECTURES and CODES are also honored by subtasks such as
build-package-deb and build-repository-deb.
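When a combined build fails, the CODES and ARCHITECTURES options make it possible to go through the combinations one at a time. The following is a minimal dry-run sketch: print_deb_builds is a hypothetical helper that only prints the commands, and the code names and architectures passed to it are examples taken from the supported list in this document.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one "make build" command per
# code name / architecture pair so that each can be run and checked alone.
print_deb_builds() {
  codes=$1
  architectures=$2
  for code in $codes; do
    for arch in $architectures; do
      echo "make build ARCHITECTURES=$arch CODES=$code"
    done
  done
}

# Prints four commands, one per combination.
print_deb_builds "wheezy jessie" "i386 amd64"
```

Running the printed commands one by one in packages/apt shows which release and architecture the failure belongs to.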
For the Red Hat family, explicitly specifying the ARCHITECTURES and DISTRIBUTIONS options builds only
a specific release and architecture.
To build only i386 for fedora, run the following command:
% make build ARCHITECTURES=i386 DISTRIBUTIONS=fedora
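Besides the build command, ARCHITECTURES and DISTRIBUTIONS are also honored by subtasks such as
build-in-chroot and build-repository-rpm.
For centos, specifying CENTOS_VERSIONS builds only a specific version.
Finding the passphrase for package signing
The passphrase of the private key needed for signing packages is written on the first line of the
text obtained by decrypting the private key for release managers.
Generating the documentation with an explicit version
If you replace part of the documentation after a release and specify nothing, the version embedded in
the generated HTML may end up as "v3.0.1-xxxxx documentation", because part of the git commit hash is
used.
To avoid this, specify DOCUMENT_VERSION and DOCUMENT_VERSION_FULL explicitly, as follows: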
% make update-document DOCUMENT_VERSION=3.0.1 DOCUMENT_VERSION_FULL=3.0.1
How to test
TODO: Write in English.
TODO: Write about test/command/run-test.sh.
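Setting up the test environment
Installing Cutter
Groonga uses Cutter as its test framework.
For how to install Cutter, see the Cutter installation instructions for each platform.
Installing lcov
Measuring coverage information requires lcov 1.6 or later. On Debian and Ubuntu it can be installed as
follows: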
% sudo aptitude install -y lcov
Installing clang
Static analysis of the source code requires clang (scan-build). On Debian and Ubuntu it can be
installed as follows:
% sudo aptitude install -y clang
Installing libmemcached
Running the tests for the memcached binary protocol requires libmemcached. On Debian squeeze or later
and Ubuntu Karmic or later it can be installed as follows:
% sudo aptitude install -y libmemcached-dev
Running the tests
Run the following command in the top directory of Groonga:
make check
Coverage information
Run the following command in the top directory of Groonga:
make coverage
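This writes HTML files containing the coverage information under the coverage directory.
Coverage has three targets, Lines, Functions and Branches, which correspond to lines, functions and
branches respectively. Functions is the most important target. Aim to have every function tested.
Edit the parts that the tests do not cover with care. Increasing the parts that the tests cover is
also important.
Various ways of testing
The tests can also be run by executing ./run-test.sh in the test/unit directory. run-test.sh takes
several options. For details, run ./run-test.sh --help and read the help.
Running only a specific test function
You can run only a specific test function (called a test in Cutter).
Example: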
% ./run-test.sh -n test_text_otoj
Running only a specific test file
You can run only a specific test file (called a test case in Cutter).
Example:
% ./run-test.sh -t test_string
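Detecting invalid memory accesses and memory leaks
Setting the environment variable CUTTER_CHECK_LEAK to yes runs the tests while valgrind detects
invalid memory accesses and memory leaks.
This works with make check as well as with run-test.sh.
Example: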
% CUTTER_CHECK_LEAK=yes make check
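Running the tests in a debugger
Setting the environment variable CUTTER_DEBUG to yes starts gdb with an environment in which the tests
can be run. Executing run in gdb starts the tests.
This works with make check as well as with run-test.sh.
Example: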
% CUTTER_DEBUG=yes make check
Static analysis
scan-build can be used for static analysis of the source code. The HTML files with the analysis
results are written to a directory named scan_build:
% scan-build ./configure --prefix=/usr
% make clean
% scan-build -o ./scan_build make -j4
configure needs to be run only once.
AUTHOR
Groonga Project
COPYRIGHT
2009-2016, Brazil, Inc
6.0.1 March 28, 2016 GROONGA(1)