Provided by: groonga-bin_6.0.1-1ubuntu1_amd64
NAME
groonga - Groonga documentation
CHARACTERISTICS OF GROONGA
Groonga overview
Groonga is a fast and accurate full text search engine based on an inverted index. One of the characteristics of Groonga is that a newly registered document instantly appears in search results. Also, Groonga allows updates without read locks. These characteristics result in superior performance on real-time applications.
Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, Groonga can cover the weaknesses of row-oriented systems.
The basic functions of Groonga are provided in a C library. Also, libraries for using Groonga in other languages, such as Ruby, are provided by related projects. In addition, Groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and storage engines allow any application to use Groonga. See usage examples.
Full text search and instant update
In widely used DBMSs, updates are immediately processed; for example, a newly registered record appears in the result of the next query. In contrast, some full text search engines do not support instant updates, because it is difficult to dynamically update inverted indexes, the underlying data structure. Groonga also uses inverted indexes but supports instant updates. In addition, Groonga allows you to search documents even while the document collection is being updated. Due to these characteristics, Groonga is very flexible as a full text search engine. Also, Groonga always shows good performance because it divides a large task, inverted index merging, into smaller tasks.
Column store and aggregate query
People can collect more than enough data in the Internet era. However, it is difficult to extract informative knowledge from a large database, and such a task requires many-sided analysis through trial and error. For example, search refinement by date, time and location may reveal hidden patterns. Aggregate queries are useful for this kind of task. An aggregate query groups search results by specified column values and then counts the number of records in each group. For example, an aggregate query in which a location column is specified counts the number of records per location. Making a graph from the result of an aggregate query against a date column is an easy way to visualize changes over time. Also, a combination of refinement by location and an aggregate query against a date column allows visualization of changes over time in a specific location. Thus refinement and aggregation are important for data mining.
A column-oriented architecture allows Groonga to efficiently process aggregate queries, because a column-oriented database, which stores records by column, allows an aggregate query to access only the specified column. In contrast, an aggregate query on a row-oriented database, which stores records by row, has to access neighboring columns, even though those columns are not required.
Inverted index and tokenizer
An inverted index is a traditional data structure used for large-scale full text search. A search engine based on an inverted index extracts index terms from a document when it is added. Then, in retrieval, a query is divided into index terms to find documents containing those index terms. Index terms thus play an important role in full text search, and the way of extracting index terms is a key to a better search engine. A tokenizer is a module to extract index terms.
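A quick way to see this in practice is the built-in tokenize command, which shows the index terms a given tokenizer extracts from a piece of text. Here is a minimal sketch (the sample text is arbitrary, and the JSON token list that the command returns is omitted; the exact output varies by tokenizer and version):
tokenize TokenBigram "Fulltext search with Groonga" NormalizerAuto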
A Japanese full text search engine commonly uses a word-based tokenizer (hereafter referred to as a word tokenizer) and/or a character-based n-gram tokenizer (hereafter referred to as an n-gram tokenizer). A word tokenizer-based search engine is superior in time, space and precision, the fraction of retrieved documents that are relevant. On the other hand, an n-gram tokenizer-based search engine is superior in recall, the fraction of relevant documents that are actually retrieved. In practice, the best choice depends on the application. Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses spaces as word delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by default. In addition, yet another built-in word tokenizer is available if MeCab, a part-of-speech and morphological analyzer, is embedded. Note that tokenizers are pluggable, so you can develop your own tokenizer, such as one based on another part-of-speech tagger or a named-entity recognizer.
Sharable storage and read lock-free
Multi-core processors are mainstream today and the number of cores per processor is increasing. To exploit multiple cores, executing multiple queries in parallel or dividing a query into sub-queries for parallel processing is becoming more important. A Groonga database can be shared among multiple threads/processes. Also, multiple threads/processes can execute read queries in parallel even while another thread/process is executing an update query, because Groonga uses read lock-free data structures. This feature is suited to real-time applications that need to update a database while executing read queries. In addition, Groonga allows you to build flexible systems. For example, a database can receive read queries through the built-in HTTP server of Groonga while accepting update queries through MySQL.
Geo-location (latitude and longitude) search
Location services are getting more convenient because of mobile devices with GPS. For example, if you are going to have lunch or dinner at a nearby restaurant, a local search service for restaurants may be very useful, and for such services, fast geo-location search is becoming more important. Groonga provides inverted index-based fast geo-location search, which supports queries to find points in a rectangle or circle. Groonga gives high priority to points near the center of an area. Also, Groonga supports distance measurement, and you can sort points by distance from any point.
Groonga library
The basic functions of Groonga are provided in a C library, and any application can use Groonga as a full text search engine or a column-oriented database. Also, libraries for languages other than C/C++, such as Ruby, are provided by related projects. See related projects for details.
Groonga server
Groonga provides a built-in server command which supports HTTP, the memcached binary protocol and the Groonga Query Transfer Protocol (/spec/gqtp). A Groonga server also supports query caching, which significantly reduces response time for repeated read queries. Using this command, Groonga is available even on a server that does not allow you to install new libraries.
Mroonga storage engine
Groonga works not only as an independent column-oriented DBMS but also as a storage engine of well-known DBMSs. For example, Mroonga is a MySQL pluggable storage engine using Groonga. By using Mroonga, you can use Groonga for column-oriented storage and full text search.
A combination of a built-in storage engine, MyISAM or InnoDB, and a Groonga-based full text search engine is also available. Each combination has its own strengths and weaknesses, and the best choice depends on the application. See related projects for details.
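As an illustration of the geo-location search described above, the following sketch finds records within 5,000 meters of a point and sorts them by distance. The Shops table and its location column are hypothetical names used only for this sketch; geo_in_circle and geo_distance are built-in functions, and _score here holds the computed distance in meters:
select --table Shops --filter 'geo_in_circle(location, "128452975x503157902", 5000)' --scorer '_score = geo_distance(location, "128452975x503157902")' --sortby _score --output_columns _key,_score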
INSTALL
This section describes how to install Groonga on each environment. There are packages for major platforms. It's recommended that you use a package instead of building Groonga by yourself. But don't worry: there is also a document about building Groonga from source. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development. You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
Windows
This section describes how to install Groonga on Windows. You can install Groonga by extracting a zip package or running an installer. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development. You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
Installer
For a 32-bit environment, download the x86 executable binary from packages.groonga.org: • http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.exe Then run it. For a 64-bit environment, download the x64 executable binary from packages.groonga.org: • http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.exe Then run it. Use the Command Prompt in the Start menu to run /reference/executables/groonga.
zip
For a 32-bit environment, download the x86 zip archive from packages.groonga.org: • http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.zip Then extract it. For a 64-bit environment, download the x64 zip archive from packages.groonga.org: • http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.zip Then extract it. You can find /reference/executables/groonga in the bin folder.
Build from source
First, you need to install the required tools for building Groonga on Windows. Here are the required tools: • Microsoft Visual Studio Express 2013 for Windows Desktop • CMake Download the zipped source from packages.groonga.org: • http://packages.groonga.org/source/groonga/groonga-6.0.1.zip Then extract it. Move to the Groonga source folder: > cd c:\Users\%USERNAME%\Downloads\groonga-6.0.1 Configure with cmake. The following command line is for the 64-bit version. To build the 32-bit version, use the -G "Visual Studio 12 2013" parameter instead: groonga-6.0.1> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga Build: groonga-6.0.1> cmake --build . --config Release Install: groonga-6.0.1> cmake --build . --config Release --target Install After the above steps, /reference/executables/groonga is found at c:\Groonga\bin\groonga.exe.
Mac OS X
This section describes how to install Groonga on Mac OS X. You can install Groonga with MacPorts or Homebrew.
MacPorts
Install: % sudo port install groonga
Homebrew
Install: % brew install groonga If you want to use MeCab as a tokenizer, specify the --with-mecab option: % brew install groonga --with-mecab Then install and configure a MeCab dictionary. Install: % brew install mecab-ipadic Configure: % sed -i '' -e 's,dicrc.*=.*,dicrc = /usr/local/lib/mecab/dic/ipadic,g' /usr/local/etc/mecabrc
Build from source
Install Xcode. Download the source: % curl -O http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure (see source-configure about configure options): % ./configure Build: % make -j$(/usr/sbin/sysctl -n hw.ncpu) Install: % sudo make install
Debian GNU/Linux
This section describes how to install Groonga related deb packages on Debian GNU/Linux.
You can install them by apt. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development. You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
wheezy
Add the Groonga apt repository. /etc/apt/sources.list.d/groonga.list: deb http://packages.groonga.org/debian/ wheezy main deb-src http://packages.groonga.org/debian/ wheezy main Install: % sudo apt-get update % sudo apt-get install -y --allow-unauthenticated groonga-keyring % sudo apt-get update % sudo apt-get install -y -V groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo apt-get install -y -V groonga-tokenizer-mecab If you want to use TokenFilterStem as a token filter, install the groonga-token-filter-stem package: % sudo apt-get install -y -V groonga-token-filter-stem There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package: % sudo apt-get install -y -V groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package: % sudo apt-get install -y -V groonga-normalizer-mysql
jessie
New in version 5.0.3. Add the Groonga apt repository. /etc/apt/sources.list.d/groonga.list: deb http://packages.groonga.org/debian/ jessie main deb-src http://packages.groonga.org/debian/ jessie main Install: % sudo apt-get update % sudo apt-get install -y --allow-unauthenticated groonga-keyring % sudo apt-get update % sudo apt-get install -y -V groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo apt-get install -y -V groonga-tokenizer-mecab If you want to use TokenFilterStem as a token filter, install the groonga-token-filter-stem package: % sudo apt-get install -y -V groonga-token-filter-stem There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package: % sudo apt-get install -y -V groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package:
% sudo apt-get install -y -V groonga-normalizer-mysql
Build from source
Install the required packages to build Groonga: % sudo apt-get install -y -V wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev Download the source: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure (see source-configure about configure options): % ./configure Build: % make -j$(grep '^processor' /proc/cpuinfo | wc -l) Install: % sudo make install
Ubuntu
This section describes how to install Groonga related deb packages on Ubuntu. You can install them by apt. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development. You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
PPA (Personal Package Archive)
The Groonga APT repository for Ubuntu uses a PPA (Personal Package Archive) on Launchpad. You can install Groonga by APT from the PPA. Here are the supported Ubuntu versions: • 12.04 LTS Precise Pangolin • 14.04 LTS Trusty Tahr • 15.04 Vivid Vervet • 15.10 Wily Werewolf Enable the universe repository to install Groonga: % sudo apt-get -y install software-properties-common % sudo add-apt-repository -y universe Add the ppa:groonga/ppa PPA to your system: % sudo add-apt-repository -y ppa:groonga/ppa % sudo apt-get update Install: % sudo apt-get -y install groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo apt-get -y install groonga-tokenizer-mecab If you want to use TokenFilterStem as a token filter, install the groonga-token-filter-stem package: % sudo apt-get -y install groonga-token-filter-stem There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package: % sudo apt-get -y install groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package: % sudo apt-get -y install groonga-normalizer-mysql
Build from source
Install the required packages to build Groonga: % sudo apt-get -V -y install wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev Download the source: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure (see source-configure about configure options): % ./configure Build: % make -j$(grep '^processor' /proc/cpuinfo | wc -l) Install: % sudo make install
CentOS
This section describes how to install Groonga related RPM packages on CentOS. You can install them by yum. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development.
You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
CentOS 5
Install: % sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm % sudo yum makecache % sudo yum install -y groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo yum install -y groonga-tokenizer-mecab There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package. NOTE: The groonga-munin-plugins package requires the munin-node package, which isn't included in the official CentOS repository. You need to enable the Repoforge (RPMforge) repository or the EPEL repository to install it by yum. Enable the Repoforge (RPMforge) repository on an i386 environment: % wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.i386.rpm % sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.i386.rpm Enable the Repoforge (RPMforge) repository on an x86_64 environment: % wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm % sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm Enable the EPEL repository on any environment: % wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm % sudo rpm -ivh epel-release-5-4.noarch.rpm Install the groonga-munin-plugins package: % sudo yum install -y groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package: % sudo yum install -y groonga-normalizer-mysql
CentOS 6
Install: % sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm % sudo yum makecache % sudo yum install -y groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo yum install -y groonga-tokenizer-mecab There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package. NOTE: The groonga-munin-plugins package requires the munin-node package, which isn't included in the official CentOS repository. You need to enable the EPEL repository to install it by yum. Enable the EPEL repository on any environment: % sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm Install the groonga-munin-plugins package: % sudo yum install -y groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package:
% sudo yum install -y groonga-normalizer-mysql
CentOS 7
Install: % sudo yum install -y http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm % sudo yum install -y groonga NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo yum install -y groonga-tokenizer-mecab There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package. NOTE: The groonga-munin-plugins package requires the munin-node package, which isn't included in the official CentOS repository. You need to enable the EPEL repository to install it by yum. Enable the EPEL repository: % sudo yum install -y epel-release Install the groonga-munin-plugins package: % sudo yum install -y groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package: % sudo yum install -y groonga-normalizer-mysql
Build from source
Install the required packages to build Groonga: % sudo yum install -y wget tar gcc-c++ make mecab-devel Download the source: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure (see source-configure about configure options): % ./configure Build: % make -j$(grep '^processor' /proc/cpuinfo | wc -l) Install: % sudo make install
Fedora
This section describes how to install Groonga related RPM packages on Fedora. You can install them by yum. NOTE: Since the Groonga 3.0.2 release, Groonga related RPM packages are in the official Fedora yum repository (Fedora 18). So you can use them instead of the Groonga yum repository now. There are some exceptions that still need the Groonga yum repository, because the MeCab dictionaries (mecab-ipadic or mecab-jumandic) are provided by the Groonga yum repository. We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package for servers. You should use a 32-bit package only for tests or development. You will encounter an out of memory error with a 32-bit package even if you process only medium size data.
Fedora 21
Install: % sudo yum install -y groonga Note that additional packages such as the mecab-ipadic and mecab-jumandic packages require the groonga-release package, which provides the Groonga yum repository, to be installed beforehand: % sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm % sudo yum update NOTE: The groonga package is the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use. • groonga-httpd (nginx and HTTP protocol based server package) • groonga-server-gqtp (GQTP protocol based server package) See the /server section for details. If you want to use MeCab as a tokenizer, install the groonga-tokenizer-mecab package: % sudo yum install -y groonga-tokenizer-mecab Then install a MeCab dictionary
(mecab-ipadic or mecab-jumandic). Install the IPA dictionary: % sudo yum install -y mecab-ipadic Or install the Juman dictionary: % sudo yum install -y mecab-jumandic There is a package that provides Munin plugins. If you want to monitor Groonga status with Munin, install the groonga-munin-plugins package: % sudo yum install -y groonga-munin-plugins There is a package that provides a MySQL compatible normalizer as a Groonga plugin. If you want to use it, install the groonga-normalizer-mysql package: % sudo yum install -y groonga-normalizer-mysql
Build from source
Install the required packages to build Groonga: % sudo yum install -y wget tar gcc-c++ make mecab-devel libedit-devel Download the source: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure (see source-configure about configure options): % ./configure Build: % make -j$(grep '^processor' /proc/cpuinfo | wc -l) Install: % sudo make install
Oracle Solaris
This section describes how to install Groonga from source on Oracle Solaris.
Oracle Solaris 11
Install the required packages to build Groonga: % sudo pkg install gnu-tar gcc-45 system/header Download the source: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % gtar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 Configure with the CFLAGS="-m64" CXXFLAGS="-m64" variables. They are needed for building the 64-bit version; to build the 32-bit version, just remove those variables (see source-configure about configure options): % ./configure CFLAGS="-m64" CXXFLAGS="-m64" Build: % make Install: % sudo make install
Others
This section describes how to install Groonga from source on UNIX-like environments. To get more detail about installing Groonga from source in a specific environment, find the document for that environment in /install.
Dependencies
Groonga doesn't require any special libraries, but it does require some tools for the build.
Tools
Here are the required tools: • wget, curl or a Web browser for downloading the source archive • tar and gzip for extracting the source archive • shell (many shells such as dash, bash and zsh will work) • C compiler and C++ compiler (gcc and g++ are supported but other compilers may work) • make (GNU make is supported but other makes like BSD make will work) You must get them ready. You can use CMake instead of shell, but this document doesn't describe building with CMake. Here are the optional tools: • pkg-config for detecting libraries • sudo for installing the built Groonga You must get them ready if you want to use optional libraries.
Libraries
All libraries are optional. Here are the optional libraries: • MeCab for tokenizing full-text search target documents by morphological analysis • KyTea for tokenizing full-text search target documents by morphological analysis • ZeroMQ for /reference/suggest • libevent for /reference/suggest • MessagePack for supporting MessagePack output and /reference/suggest • libedit for command line editing in /reference/executables/groonga • zlib for compressing column values • LZ4 for compressing column values If you want to use all or some of those libraries, you need to install them before installing Groonga.
Build from source
Groonga uses the GNU build system.
The following are the simplest build steps: % wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz % tar xvzf groonga-6.0.1.tar.gz % cd groonga-6.0.1 % ./configure % make % sudo make install After the above steps, /reference/executables/groonga is found at /usr/local/bin/groonga. The default build will work well, but you can customize Groonga at the configure step. The following describes the details of each step.
configure
First, you need to run configure. Here are the important configure options:
--prefix=PATH Specifies the install base directory. Groonga related files are installed under the ${PATH}/ directory. The default is /usr/local. In this case, /reference/executables/groonga is installed into /usr/local/bin/groonga. Here is an example that installs Groonga into ~/local for per-user use instead of system-wide use: % ./configure --prefix=$HOME/local
--localstatedir=PATH Specifies the base directory to place modifiable files such as the log file, PID file and database files. For example, the log file is placed at ${PATH}/log/groonga.log. The default is /usr/local/var. Here is an example where the system-wide /var is used for modifiable files: % ./configure --localstatedir=/var
--with-log-path=PATH Specifies the default log file path. You can override the default log path with the /reference/executables/groonga command's --log-path command line option, so this option is not a critical build option; it's just for convenience. The default is /usr/local/var/log/groonga.log. The /usr/local/var part is changed by the --localstatedir option. Here is an example where the log file is placed into the shared NFS directory /nfs/log/groonga.log: % ./configure --with-log-path=/nfs/log/groonga.log
--with-default-encoding=ENCODING Specifies the default encoding. Available encodings are euc_jp, sjis, utf8, latin1, koi8r and none. The default is utf8. Here is an example where Shift_JIS is used as the default encoding: % ./configure --with-default-encoding=sjis
--with-match-escalation-threshold=NUMBER Specifies the default match escalation threshold. See select-match-escalation-threshold about the match escalation threshold. -1 means that match operations never escalate. The default is 0. Here is an example where match escalation isn't used by default: % ./configure --with-match-escalation-threshold=-1
--with-zlib Enables column value compression by zlib. The default is disabled. Here is an example that enables column value compression by zlib: % ./configure --with-zlib
--with-lz4 Enables column value compression by LZ4. The default is disabled. Here is an example that enables column value compression by LZ4: % ./configure --with-lz4
--with-message-pack=MESSAGE_PACK_INSTALL_PREFIX Specifies where MessagePack is installed. If MessagePack isn't installed with --prefix=/usr, you need to specify this option with the path that you used for building MessagePack. If you installed MessagePack with the --prefix=$HOME/local option, you should specify --with-message-pack=$HOME/local to Groonga's configure. The default is /usr. Here is an example that uses a MessagePack built with the --prefix=$HOME/local option: % ./configure --with-message-pack=$HOME/local
--with-munin-plugins Installs Munin plugins for Groonga. They are installed into ${PREFIX}/share/groonga/munin/plugins/. Those plugins are not installed by default. Here is an example that installs the Munin plugins for Groonga: % ./configure --with-munin-plugins
--with-package-platform=PLATFORM Installs platform specific system management files such as an init script. Available platforms are redhat and fedora.
redhat is for Red Hat and Red Hat clone distributions such as CentOS; fedora is for Fedora. Those system management files are not installed by default. Here is an example that installs CentOS specific system management files: % ./configure --with-package-platform=redhat
--help Shows all configure options.
make
After configure succeeds, you can build Groonga with make: % make If you have a multi-core CPU, you can build faster by using the -j option. For example, with a 4-core CPU the -j4 option is a good choice: % make -j4 If you get any errors from make, please report them to us: /contribution/report
make install
Now you can install the built Groonga: % sudo make install If you have write permission for ${PREFIX} (e.g. in the --prefix=$HOME/local case), you don't need sudo. In that case, use make install: % make install
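After installation, a quick sanity check is to ask the installed binary for its version (a minimal check; the --version option prints the version and exits): % groonga --version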
COMMUNITY
There are some places for sharing Groonga information. We welcome you to join our community.
Mailing List
There are mailing lists for discussion about Groonga. For English speakers: groonga-talk@lists.sourceforge.net For Japanese speakers: groonga-dev@lists.osdn.me
Chat room
There are chat rooms for discussion about Groonga. For English speakers: the groonga/en chat room on Gitter For Japanese speakers: the groonga/ja chat room on Gitter
Twitter
@groonga tweets Groonga related information. Please follow the account to get the latest Groonga related information!
Facebook
The Groonga page on Facebook shares Groonga related information. Please like the page to get the latest Groonga related information!
TUTORIAL
Basic operations
A Groonga package provides a C library (libgroonga) and a command line tool (groonga). This tutorial explains how to use the command line tool, with which you can create/operate databases, start a server, establish a connection with a server, etc.
Create a database
The first step to using Groonga is to create a new database. The following shows how to do it.
Form: groonga -n DB_PATH
The -n option specifies to create a new database and DB_PATH specifies the path of the new database. Actually, a database consists of a series of files and DB_PATH specifies the file which will be the entrance to the new database. DB_PATH also specifies the path prefix for the other files. Note that database creation fails if DB_PATH points to an existing file (for example: db open failed (DB_PATH): syscall error 'DB_PATH' (File exists)). How to operate an existing database is described in the next section. This command creates a new database and then enters an interactive mode in which Groonga prompts you to enter commands for operating that database. You can terminate this mode with Ctrl-d.
Execution example: % groonga -n /tmp/groonga-databases/introduction.db
After this database creation, you can find a series of files in /tmp/groonga-databases.
Operate a database
The following shows how to operate an existing database.
Form: groonga DB_PATH [COMMAND]
DB_PATH specifies the path of a target database. This command fails if the specified database does not exist. If COMMAND is specified, Groonga executes COMMAND and returns the result. Otherwise, Groonga starts in an interactive mode that reads commands from the standard input and executes them one by one. This tutorial focuses on the interactive mode. Let's see the status of a Groonga process by using a /reference/commands/status command.
Execution example: % groonga /tmp/groonga-databases/introduction.db status # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "uptime": 0, # "max_command_version": 2, # "n_queries": 0, # "cache_hit_rate": 0.0, # "version": "5.0.6-128-g8029ddb", # "alloc_count": 206, # "command_version": 1, # "starttime": 1439995916, # "default_command_version": 1 # } # ]
As shown in the above example, a command returns a JSON array. The first element contains an error code, execution time, etc. The second element is the result of the operation. NOTE: You can format the JSON output using additional tools, for example grnwrap, Grnline, jq and so on.
Command format
Commands for operating a database accept arguments as follows: Form_1: COMMAND VALUE_1 VALUE_2 .. Form_2: COMMAND --NAME_1 VALUE_1 --NAME_2 VALUE_2 .. In the first form, arguments must be passed in order. This kind of argument is called a positional argument, because the position of each argument determines its meaning. In the second form, you specify a parameter name together with its value, so the order of arguments is not significant. This kind of argument is known as a named parameter or keyword argument. If you want to specify a value which contains white-spaces or special characters, such as quotes and parentheses, please enclose the value in single-quotes or double-quotes. For details, see also the paragraph on "command" in /reference/executables/groonga.
Basic commands
/reference/commands/status shows the status of a Groonga process. /reference/commands/table_list shows a list of tables in a database. /reference/commands/column_list shows a list of columns in a table. /reference/commands/table_create adds a table to a database.
/reference/commands/column_create adds a column to a table. /reference/commands/select searches records in a table and shows the result. /reference/commands/load inserts records into a table.
Create a table
A /reference/commands/table_create command creates a new table. In most cases, a table has a primary key, which must be specified with its data type and index type. There are various data types such as integers, strings, etc. See also /reference/types for more details. The index type determines the search performance and the availability of prefix searches. The details will be described later. Let's create a table. The following example creates a table with a primary key. The name parameter specifies the name of the table. The flags parameter specifies the index type for the primary key. The key_type parameter specifies the data type of the primary key.
Execution example: table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true]
The second element of the result indicates that the operation succeeded.
View a table
A /reference/commands/select command can enumerate the records in a table.
Execution example: select --table Site # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 0 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ] # ] # ] # ]
When only a table name is specified with a table parameter, a /reference/commands/select command returns the first (at most) 10 records in the table. [0] in the result shows the number of records in the table. The next array is a list of columns. ["_id","UInt32"] is a column of UInt32, named _id. ["_key","ShortText"] is a column of ShortText, named _key. The above two columns, _id and _key, are required columns. The _id column stores IDs that are automatically allocated by Groonga. The _key column is associated with the primary key. You are not allowed to rename these columns.
Create a column
A /reference/commands/column_create command creates a new column. Let's add a column. The following example adds a column to the Site table. The table parameter specifies the target table. The name parameter specifies the name of the column. The type parameter specifies the data type of the column.
Execution example: column_create --table Site --name title --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] select --table Site # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 0 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ] # ] # ] # ]
Load records
A /reference/commands/load command loads JSON-formatted records into a table. The following example loads nine records into the Site table.
Execution example: load --table Site [ {"_key":"http://example.org/","title":"This is test record 1!"}, {"_key":"http://example.net/","title":"test record 2."}, {"_key":"http://example.com/","title":"test test record three."}, {"_key":"http://example.net/afr","title":"test record four."}, {"_key":"http://example.org/aba","title":"test test test record five."}, {"_key":"http://example.com/rab","title":"test test test test record six."}, {"_key":"http://example.net/atv","title":"test test test record seven."}, {"_key":"http://example.org/gat","title":"test test record eight."}, {"_key":"http://example.com/vdw","title":"test test record nine."}, ] # [[0, 1337566253.89858, 0.000355720520019531], 9]
The second element of the result indicates how many records were successfully loaded. In this case, all the records were successfully loaded. Let's make sure that these records are correctly stored.
Execution example: select --table Site # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ], # [ # 2, # "http://example.net/", # "test record 2." # ], # [ # 3, # "http://example.com/", # "test test record three." # ], # [ # 4, # "http://example.net/afr", # "test record four." # ], # [ # 5, # "http://example.org/aba", # "test test test record five." # ], # [ # 6, # "http://example.com/rab", # "test test test test record six." # ], # [ # 7, # "http://example.net/atv", # "test test test record seven." # ], # [ # 8, # "http://example.org/gat", # "test test record eight." # ], # [ # 9, # "http://example.com/vdw", # "test test record nine." # ] # ] # ] # ]
Get a record
A /reference/commands/select command can search records in a table. If a search condition is specified with a query parameter, a /reference/commands/select command searches records matching the search condition and returns the matched records. Let's get a record having a specified record ID. The following example gets the first record in the Site table. More precisely, the query parameter specifies a record whose _id column stores 1.
Execution example: select --table Site --query _id:1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ] # ] # ] # ]
Next, let's get a record having a specified key. The following example gets the record whose primary key is "http://example.org/". More precisely, the query parameter specifies a record whose _key column stores "http://example.org/".
Execution example: select --table Site --query '_key:"http://example.org/"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ] # ] # ] # ]
Create a lexicon table for full text search
Let's move on to full text search. Groonga uses an inverted index to provide fast full text search. So, the first step is to create a lexicon table which stores an inverted index, also known as postings lists. The primary key of this table is associated with a vocabulary made up of index terms and each record stores postings lists for one index term.
The following shows a command which creates a lexicon table named Terms. The data type of its primary key is ShortText.
Execution example: table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true]
The /reference/commands/table_create command takes many parameters but you don't need to understand all of them. Please skip the next paragraph if you are not interested in how it works. The TABLE_PAT_KEY flag specifies to store index terms in a patricia trie. The default_tokenizer parameter specifies the method for tokenizing text; this example uses TokenBigram, a kind of N-gram tokenizer. The normalizer parameter specifies to normalize index terms.
Create an index column for full text search
The second step is to create an index column, which allows you to search records by its associated column. That is to say, this step specifies which column needs an index. Let's create an index column. The following example creates an index column for a column in the Site table.
Execution example: column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title # [[0, 1337566253.89858, 0.000355720520019531], true]
The table parameter specifies the index table and the name parameter specifies the index column. The type parameter specifies the target table and the source parameter specifies the target column. The COLUMN_INDEX flag specifies to create an index column and the WITH_POSITION flag specifies to create a full inverted index, which contains the positions of each index term. This combination, COLUMN_INDEX|WITH_POSITION, is recommended for general use. NOTE: You can create a lexicon table and index columns before/during/after loading records. If a target column already has records, Groonga creates the inverted index in a static manner. In contrast, if you load records into an already indexed column, Groonga updates the inverted index in a dynamic manner.
Full text search
It's time. You can perform full text search with a /reference/commands/select command. A query for full text search is specified with a query parameter. The following example searches for records whose "title" column contains "this". The '@' specifies full text search. Note that a lower case query matches upper case and capitalized terms in a record if NormalizerAuto was specified when creating the lexicon table.
Execution example: select --table Site --query title:@this # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ] # ] # ] # ]
In this example, the first record matches the query because its title contains "This", the capitalized form of the query. A /reference/commands/select command accepts an optional parameter, named match_columns, that specifies the default target columns. This parameter is used if target columns are not specified in a query. [1] The combination of "--match_columns title" and "--query this" brings you the same result that "--query title:@this" does.
Execution example: select --table Site --match_columns title --query this # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ] # ] # ] # ]
Specify output columns
An output_columns parameter of a /reference/commands/select command specifies the columns to appear in the search result. If you want to specify more than one column, please separate the column names with commas (',').
Execution example: select --table Site --output_columns _key,title,_score --query title:@test # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://example.org/", # "This is test record 1!", # 1 # ], # [ # "http://example.net/", # "test record 2.", # 1 # ], # [ # "http://example.com/", # "test test record three.", # 2 # ], # [ # "http://example.net/afr", # "test record four.", # 1 # ], # [ # "http://example.org/aba", # "test test test record five.", # 3 # ], # [ # "http://example.com/rab", # "test test test test record six.", # 4 # ], # [ # "http://example.net/atv", # "test test test record seven.", # 3 # ], # [ # "http://example.org/gat", # "test test record eight.", # 2 # ], # [ # "http://example.com/vdw", # "test test record nine.", # 2 # ] # ] # ] # ]
This example specifies three output columns including the _score column, which stores the relevance score of each record.
Specify output ranges
A /reference/commands/select command returns a part of its search result if offset and/or limit parameters are specified. These parameters are useful for paginating a search result, a widely-used interface which shows a search result on a page by page basis. An offset parameter specifies the starting point and a limit parameter specifies the maximum number of records to be returned. If you need the first record in a search result, the offset parameter must be 0 or omitted.
Execution example: select --table Site --offset 0 --limit 3 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ], # [ # 2, # "http://example.net/", # "test record 2." # ], # [ # 3, # "http://example.com/", # "test test record three." # ] # ] # ] # ] select --table Site --offset 3 --limit 3 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 4, # "http://example.net/afr", # "test record four." # ], # [ # 5, # "http://example.org/aba", # "test test test record five." # ], # [ # 6, # "http://example.com/rab", # "test test test test record six." # ] # ] # ] # ] select --table Site --offset 7 --limit 3 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 8, # "http://example.org/gat", # "test test record eight." # ], # [ # 9, # "http://example.com/vdw", # "test test record nine." # ] # ] # ] # ]
Sort a search result
A /reference/commands/select command sorts its result when used with a sortby parameter. A sortby parameter specifies a column as the sorting criterion.
A search result is arranged in ascending order of the column values. If you want to sort a search result in reverse order, please add a leading hyphen ('-') to the column name in the parameter. The following example shows the records in the Site table in reverse order.
Execution example: select --table Site --sortby -_id # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 9, # "http://example.com/vdw", # "test test record nine." # ], # [ # 8, # "http://example.org/gat", # "test test record eight." # ], # [ # 7, # "http://example.net/atv", # "test test test record seven." # ], # [ # 6, # "http://example.com/rab", # "test test test test record six." # ], # [ # 5, # "http://example.org/aba", # "test test test record five." # ], # [ # 4, # "http://example.net/afr", # "test record four." # ], # [ # 3, # "http://example.com/", # "test test record three." # ], # [ # 2, # "http://example.net/", # "test record 2." # ], # [ # 1, # "http://example.org/", # "This is test record 1!" # ] # ] # ] # ]
The next example uses the _score column as the sorting criterion for ranking the search result. The result is sorted in relevance order.
Execution example: select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_score", # "Int32" # ], # [ # "title", # "ShortText" # ] # ], # [ # 6, # 4, # "test test test test record six." # ], # [ # 5, # 3, # "test test test record five." # ], # [ # 7, # 3, # "test test test record seven." # ], # [ # 8, # 2, # "test test record eight." # ], # [ # 3, # 2, # "test test record three." # ], # [ # 9, # 2, # "test test record nine." # ], # [ # 1, # 1, # "This is test record 1!" # ], # [ # 4, # 1, # "test record four." # ], # [ # 2, # 1, # "test record 2." # ] # ] # ] # ]
If you want to specify more than one column, please separate the column names with commas (','). In such a case, the search result is sorted by the values in the first column, and then records having the same values in the first column are sorted by the second column values.
Execution example: select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score,_id # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_score", # "Int32" # ], # [ # "title", # "ShortText" # ] # ], # [ # 6, # 4, # "test test test test record six." # ], # [ # 5, # 3, # "test test test record five." # ], # [ # 7, # 3, # "test test test record seven." # ], # [ # 3, # 2, # "test test record three." # ], # [ # 8, # 2, # "test test record eight." # ], # [ # 9, # 2, # "test test record nine." # ], # [ # 1, # 1, # "This is test record 1!" # ], # [ # 2, # 1, # "test record 2." # ], # [ # 4, # 1, # "test record four." # ] # ] # ] # ]
footnote
[1] Currently, a match_columns parameter is available only if there is an inverted index for full text search. A match_columns parameter for a regular column is not supported.
Remote access
You can use Groonga as a server which allows remote access. Groonga supports the original protocol (GQTP), the memcached binary protocol and HTTP.
Hypertext transfer protocol (HTTP)
How to run an HTTP server
Groonga supports the hypertext transfer protocol (HTTP). The following form shows how to run Groonga as an HTTP server daemon.
Form: groonga [-p PORT_NUMBER] -d --protocol http DB_PATH
The --protocol option and its argument specify the protocol of the server. "http" specifies to use HTTP. If the -p option is not specified, Groonga uses the default port number 10041. The following command runs an HTTP server that listens on port number 80.
Execution example: % sudo groonga -p 80 -d --protocol http /tmp/groonga-databases/introduction.db %
NOTE: You must have root privileges to listen on port number 80 (a well known port). There is no such limitation for port numbers 1024 and over.
How to send a command to an HTTP server
You can send a command to an HTTP server by sending a GET request to /d/COMMAND_NAME. Command parameters can be passed as parameters of the GET request. The format is "?NAME_1=VALUE_1&NAME_2=VALUE_2&...". The following example shows how to send commands to an HTTP server.
Execution example: http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/status Executed command: status # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "uptime": 0, # "max_command_version": 2, # "n_queries": 0, # "cache_hit_rate": 0.0, # "version": "5.0.6-128-g8029ddb", # "alloc_count": 185, # "command_version": 1, # "starttime": 1439995935, # "default_command_version": 1 # } # ] http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/select?table=Site&query=title:@this Executed command: select --table Site --query title:@this # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "http://example.org/", # "japan", # ".org", # "http://example.net/", # [ # "http://example.net/", # "http://example.org/", # "http://example.com/" # ], # "128452975x503157902", # "This is test record 1!" # ] # ] # ] # ]
Administration tool (HTTP)
An HTTP server of Groonga provides a browser based administration tool that makes database management easy. After starting an HTTP server, you can use the administration tool by accessing http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/. Note that JavaScript must be enabled for the tool to work properly.
Security issues
Groonga servers don't support user authentication: everyone can view and modify the databases hosted by a Groonga server. It is recommended that you restrict the IP addresses that can access Groonga servers. You can use iptables or similar for this purpose.
Various data types
Groonga is a full text search engine but also serves as a column-oriented data store. Groonga supports various data types, such as numeric types, string types, a date and time type, longitude and latitude types, etc. This tutorial shows the list of data types and explains how to use them.
Overview
The basic data types of Groonga are roughly divided into 5 groups --- the boolean type, numeric types, string types, the date/time type and longitude/latitude types. The numeric types are further divided according to whether they are integer or floating point numbers, signed or unsigned, and the number of bits allocated to each integer. The string types are further divided according to the maximum length. The longitude/latitude types are further divided according to the geographic coordinate system. For more details, see /reference/types. In addition, Groonga supports reference types and vector types.
Reference types are designed for accessing other tables. Vector types are designed for storing a variable number of values in one element. First, let's create a table for this tutorial.
Execution example: table_create --name ToyBox --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true]
Boolean type
The boolean type is used to store true or false. To create a boolean type column, specify Bool to the type parameter of the /reference/commands/column_create command. The default value of the boolean type is false. The following example creates a boolean type column and adds three records. Note that the third record has the default value because no value is specified.
Execution example: column_create --table ToyBox --name is_animal --type Bool # [[0, 1337566253.89858, 0.000355720520019531], true] load --table ToyBox [ {"_key":"Monkey","is_animal":true} {"_key":"Flower","is_animal":false} {"_key":"Block"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select --table ToyBox --output_columns _key,is_animal # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "is_animal", # "Bool" # ] # ], # [ # "Monkey", # true # ], # [ # "Flower", # false # ], # [ # "Block", # false # ] # ] # ] # ]
Numeric types
The numeric types are divided into integer types and a floating point number type. The integer types are further divided into the signed integer types and unsigned integer types. In addition, you can choose the number of bits allocated to each integer. For more details, see /reference/types. The default value of the numeric types is 0. The following example creates an Int8 column and a Float column, and then updates existing records. The /reference/commands/load command updates the weight column as expected. On the other hand, the price column values are different from the specified values because 15.9 is not an integer and 200 is too large. 15.9 is converted to 15 by removing the fractional part. 200 causes an overflow and the result becomes -56. Note that the result of an overflow/underflow is undefined.
Execution example: column_create --table ToyBox --name price --type Int8 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table ToyBox --name weight --type Float # [[0, 1337566253.89858, 0.000355720520019531], true] load --table ToyBox [ {"_key":"Monkey","price":15.9} {"_key":"Flower","price":200,"weight":0.13} {"_key":"Block","weight":25.7} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select --table ToyBox --output_columns _key,price,weight # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "price", # "Int8" # ], # [ # "weight", # "Float" # ] # ], # [ # "Monkey", # 15, # 0.0 # ], # [ # "Flower", # -56, # 0.13 # ], # [ # "Block", # 0, # 25.7 # ] # ] # ] # ]
String types
The string types are divided according to the maximum length. For more details, see /reference/types. The default value is the zero-length string. The following example creates a ShortText column and updates existing records. The third record ("Block" key record) has the default value (zero-length string) because it's not updated.
String types

The string types are divided according to the maximum length. For more details, see /reference/types. The default value is the zero-length string. The following example creates a ShortText column and updates existing records. The third record ("Block" key record) has the default value (zero-length string) because it's not updated.

Execution example:
column_create --table ToyBox --name name --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] load --table ToyBox [ {"_key":"Monkey","name":"Grease"} {"_key":"Flower","name":"Rose"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select --table ToyBox --output_columns _key,name # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "name", # "ShortText" # ] # ], # [ # "Monkey", # "Grease" # ], # [ # "Flower", # "Rose" # ], # [ # "Block", # "" # ] # ] # ] # ]

Date and time type

The date and time type of Groonga is Time. A Time column actually stores a date and time as the number of microseconds since the Epoch, 1970-01-01 00:00:00. A Time value can represent a date and time before the Epoch because the actual data type is a signed integer. Note that the /reference/commands/load and /reference/commands/select commands use a decimal number to represent a date and time in seconds. The default value is 0.0, which means the Epoch.

NOTE: Groonga internally holds a Time value as a pair of integers: the first integer represents the number of seconds and the second integer represents the number of microseconds. Accordingly, Groonga shows a Time value as a floating point number whose integral part is the number of seconds and whose fractional part is the number of microseconds.

The following example creates a Time column and updates existing records. The first record ("Monkey" key record) has the default value (0.0) because it's not updated.

Execution example:
column_create --table ToyBox --name time --type Time # [[0, 1337566253.89858, 0.000355720520019531], true] load --table ToyBox [ {"_key":"Flower","time":1234567890.1234569999} {"_key":"Block","time":-1234567890} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select --table ToyBox --output_columns _key,time # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "time", # "Time" # ] # ], # [ # "Monkey", # 0.0 # ], # [ # "Flower", # 1234567890.12346 # ], # [ # "Block", # -1234567890.0 # ] # ] # ] # ]
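Time values like the ones above can be interpreted with standard date/time tools. The following Python sketch converts Groonga's decimal-seconds representation into calendar dates; it is illustrative only and assumes UTC.

from datetime import datetime, timezone

def time_to_datetime(value):
    # The integral part is seconds since the Epoch and the
    # fractional part is microseconds.
    return datetime.fromtimestamp(value, tz=timezone.utc)

print(time_to_datetime(0.0))               # 1970-01-01 00:00:00 (the Epoch)
print(time_to_datetime(1234567890.12346))  # 2009-02-13 23:31:30.123460
print(time_to_datetime(-1234567890.0))     # 1930-11-18 00:28:30 (before the Epoch)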
Longitude and latitude types

The longitude and latitude types are divided according to the geographic coordinate system. For more details, see /reference/types. To represent a latitude and longitude, Groonga uses a string formatted as follows:

• "latitude x longitude" in milliseconds (e.g.: "128452975x503157902")

• "latitude x longitude" in degrees (e.g.: "35.6813819x139.7660839")

A number with a decimal point represents a coordinate in degrees; a number without a decimal point represents a coordinate in milliseconds. Note that a combination of a number with a decimal point and a number without a decimal point (e.g. 35.1x139) must not be used. A comma (',') is also available as a delimiter. The default value is "0x0". The following example creates a WGS84GeoPoint column and updates existing records. The second record ("Flower" key record) has the default value ("0x0") because it's not updated.

Execution example:
column_create --table ToyBox --name location --type WGS84GeoPoint # [[0, 1337566253.89858, 0.000355720520019531], true] load --table ToyBox [ {"_key":"Monkey","location":"128452975x503157902"} {"_key":"Block","location":"35.6813819x139.7660839"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select --table ToyBox --output_columns _key,location # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ] # ], # [ # "Monkey", # "128452975x503157902" # ], # [ # "Flower", # "0x0" # ], # [ # "Block", # "128452975x503157902" # ] # ] # ] # ]
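The two notations convert into each other with simple arithmetic: one degree is 3,600,000 milliseconds, which is why "Block" above comes back with the same value as "Monkey". The following Python sketch is illustrative only and is not Groonga code.

MSEC_PER_DEGREE = 3600 * 1000  # 1 degree = 3,600,000 milliseconds

def parse_geo_point(text):
    # Accept both "x" and "," as the delimiter.
    latitude, longitude = text.replace(",", "x").split("x")
    def to_degrees(part):
        # A value with a decimal point is already in degrees;
        # otherwise it is in milliseconds.
        return float(part) if "." in part else int(part) / MSEC_PER_DEGREE
    return to_degrees(latitude), to_degrees(longitude)

print(parse_geo_point("128452975x503157902"))     # (35.68138..., 139.76608...)
print(parse_geo_point("35.6813819x139.7660839"))  # the same point, given in degrees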
Reference types

Groonga supports reference columns, which store references to records in an associated table. In practice, a reference column stores the IDs of the referred records in the associated table and enables access to those records. You can specify a column of the associated table in the output_columns parameter of a /reference/commands/select command. The format is Src.Dest, where Src is the name of the reference column and Dest is the name of the target column. If only the reference column is specified, it is handled as Src._key. Note that if a reference does not point to a valid record, a /reference/commands/select command outputs the default value of the target column. The following example adds a reference column to the Site table that was created in tutorial-introduction-create-table. The new column, named link, is designed for storing links among records in the Site table.

Execution example:
column_create --table Site --name link --type Site # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Site [ {"_key":"http://example.org/","link":"http://example.net/"} ] # [[0, 1337566253.89858, 0.000355720520019531], 1] select --table Site --output_columns _key,title,link._key,link.title --query title:@this # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ], # [ # "link._key", # "ShortText" # ], # [ # "link.title", # "ShortText" # ] # ], # [ # "http://example.org/", # "This is test record 1!", # "http://example.net/", # "test record 2." # ] # ] # ] # ]

The type parameter of the /reference/commands/column_create command specifies the table to be associated with the reference column. In this example, the reference column is associated with its own table. Then, the /reference/commands/load command registers a link from "http://example.org" to "http://example.net". Note that a reference column requires the primary key, not the ID, of the record to be referred to. After that, the link is confirmed by the /reference/commands/select command. In this case, the primary key and the title of the referred record are output because link._key and link.title are specified in the output_columns parameter.

Vector types

Groonga supports vector columns, in which each element can store a variable number of values. To create a vector column, specify the COLUMN_VECTOR flag in the flags parameter of a /reference/commands/column_create command. A vector column is useful to represent a many-to-many relationship. The previous example used a regular column, so each record could have at most one link. Obviously, this is insufficient because a site usually has more than one link. To solve this problem, the following example uses a vector column.

Execution example:
column_create --table Site --name links --flags COLUMN_VECTOR --type Site # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Site [ {"_key":"http://example.org/","links":["http://example.net/","http://example.org/","http://example.com/"]}, ] # [[0, 1337566253.89858, 0.000355720520019531], 1] select --table Site --output_columns _key,title,links._key,links.title --query title:@this # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "title", # "ShortText" # ], # [ # "links._key", # "ShortText" # ], # [ # "links.title", # "ShortText" # ] # ], # [ # "http://example.org/", # "This is test record 1!", # [ # "http://example.net/", # "http://example.org/", # "http://example.com/" # ], # [ # "test record 2.", # "This is test record 1!", # "test test record three." # ] # ] # ] # ] # ]

The only difference at the first step is the flags parameter, which specifies to create a vector column. The type parameter of the /reference/commands/column_create command is the same as in the previous example. Then, the /reference/commands/load command registers three links from "http://example.org/" to "http://example.net/", "http://example.org/" and "http://example.com/". After that, the links are confirmed by the /reference/commands/select command. In this case, the primary keys and the titles are output as arrays because links._key and links.title are specified in the output_columns parameter.

Various search conditions

Groonga supports narrowing down search results with JavaScript-like syntax and sorting by calculated values. Groonga also supports narrowing down and sorting search results by location information (latitude and longitude).

Narrow down & full-text search by using JavaScript-like syntax

The filter parameter of the select command accepts a search condition. The difference from the query parameter is that the filter parameter takes a condition written in JavaScript-like syntax.

Execution example:
select --table Site --filter "_id <= 1" --output_columns _id,_key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 1, # "http://example.org/" # ] # ] # ] # ]

Look at the details of the above query. The condition specified as the filter parameter is: _id <= 1. In this case, the query returns the records whose _id value is equal to or less than 1. Moreover, you can use && for AND search and || for OR search.
Execution example:
select --table Site --filter "_id >= 4 && _id <= 6" --output_columns _id,_key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 4, # "http://example.net/afr" # ], # [ # 5, # "http://example.org/aba" # ], # [ # 6, # "http://example.com/rab" # ] # ] # ] # ] select --table Site --filter "_id <= 2 || _id >= 7" --output_columns _id,_key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 1, # "http://example.org/" # ], # [ # 2, # "http://example.net/" # ], # [ # 7, # "http://example.net/atv" # ], # [ # 8, # "http://example.org/gat" # ], # [ # 9, # "http://example.com/vdw" # ] # ] # ] # ]

If you specify the query parameter and the filter parameter at the same time, you get the records that meet both conditions.

Sort by using scorer

The select command accepts a scorer parameter, which is used to process each record of the full-text search results. Like the filter parameter, it takes an expression written in JavaScript-like syntax.

Execution example:
select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # 6, # "http://example.com/rab", # 424238335 # ], # [ # 9, # "http://example.com/vdw", # 596516649 # ], # [ # 7, # "http://example.net/atv", # 719885386 # ], # [ # 2, # "http://example.net/", # 846930886 # ], # [ # 8, # "http://example.org/gat", # 1649760492 # ], # [ # 3, # "http://example.com/", # 1681692777 # ], # [ # 4, # "http://example.net/afr", # 1714636915 # ], # [ # 1, # "http://example.org/", # 1804289383 # ], # [ # 5, # "http://example.org/aba", # 1957747793 # ] # ] # ] # ] select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # 4, # "http://example.net/afr", # 783368690 # ], # [ # 2, # "http://example.net/", # 1025202362 # ], # [ # 5, # "http://example.org/aba", # 1102520059 # ], # [ # 1, # "http://example.org/", # 1189641421 # ], # [ # 3, # "http://example.com/", # 1350490027 # ], # [ # 8, # "http://example.org/gat", # 1365180540 # ], # [ # 9, # "http://example.com/vdw", # 1540383426 # ], # [ # 7, # "http://example.net/atv", # 1967513926 # ], # [ # 6, # "http://example.com/rab", # 2044897763 # ] # ] # ] # ]

_score is a pseudo column that holds the score of the full-text search. See /reference/columns/pseudo about the _score column. In the above query, the expression given to the scorer parameter is: _score = rand(). In this case, the score of the full-text search is overwritten by the value of the rand() function. The sortby parameter is: _score. This means sorting the search results by _score in ascending order. As a result, the order of the search results is randomized.
Narrow down & sort by using location information

Groonga can store location information (longitude and latitude) and can both narrow down and sort by it. Groonga supports two column types to store location information: TokyoGeoPoint and WGS84GeoPoint. TokyoGeoPoint is for the Japanese geodetic system and WGS84GeoPoint is for the world geodetic system. Specify latitude and longitude as follows:

• "[latitude in milliseconds]x[longitude in milliseconds]" (e.g.: "128452975x503157902")

• "[latitude in milliseconds],[longitude in milliseconds]" (e.g.: "128452975,503157902")

• "[latitude in degrees]x[longitude in degrees]" (e.g.: "35.6813819x139.7660839")

• "[latitude in degrees],[longitude in degrees]" (e.g.: "35.6813819,139.7660839")

Let's store the locations of two stations in Japan in the world geodetic system: Tokyo station and Shinjyuku station. The latitude of Tokyo station is 35 degrees 40 minutes 52.975 seconds and its longitude is 139 degrees 45 minutes 57.902 seconds. The latitude of Shinjyuku station is 35 degrees 41 minutes 27.316 seconds and its longitude is 139 degrees 42 minutes 0.929 seconds. Thus their locations in milliseconds are "128452975x503157902" and "128487316x502920929" respectively, and in degrees "35.6813819x139.7660839" and "35.6909211x139.7002581" respectively. Let's register the location information in milliseconds.

Execution example:
column_create --table Site --name location --type WGS84GeoPoint # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Site [ {"_key":"http://example.org/","location":"128452975x503157902"} {"_key":"http://example.net/","location":"128487316x502920929"}, ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select --table Site --query "_id:1 OR _id:2" --output_columns _key,location # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ] # ], # [ # "http://example.org/", # "128452975x503157902" # ], # [ # "http://example.net/", # "128487316x502920929" # ] # ] # ] # ]

Then assign the geo distance calculated by the /reference/functions/geo_distance function to the _score pseudo column via the scorer parameter. Let's show the geo distance from Akihabara station in Japan. In the world geodetic system, the latitude of Akihabara station is 35 degrees 41 minutes 55.259 seconds and its longitude is 139 degrees 46 minutes 27.188 seconds. So specify "128515259x503187188" to the geo_distance function.

Execution example:
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://example.org/", # "128452975x503157902", # 2054 # ], # [ # "http://example.net/", # "128487316x502920929", # 6720 # ] # ] # ] # ]

As you can see, the geo distance between Tokyo station and Akihabara station is 2054 meters, and the geo distance between Akihabara station and Shinjyuku station is 6720 meters.
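You can sanity-check these numbers without Groonga. The following Python sketch computes the great-circle (haversine) distance between the same points; geo_distance uses its own approximation, so the two results agree only roughly.

import math

MSEC_PER_DEGREE = 3600 * 1000

def haversine_m(p1, p2):
    # p1 and p2 are "latitude x longitude" strings in milliseconds.
    lat1, lon1 = (int(v) / MSEC_PER_DEGREE for v in p1.split("x"))
    lat2, lon2 = (int(v) / MSEC_PER_DEGREE for v in p2.split("x"))
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))  # mean Earth radius in meters

akihabara = "128515259x503187188"
print(haversine_m(akihabara, "128452975x503157902"))  # roughly 2 km (Tokyo station)
print(haversine_m(akihabara, "128487316x502920929"))  # roughly 6.7 km (Shinjyuku station)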
The return value of the geo_distance function can also be used for sorting, by specifying the _score pseudo column to the sortby parameter.

Execution example:
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' --sortby -_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://example.net/", # "128487316x502920929", # 6720 # ], # [ # "http://example.org/", # "128452975x503157902", # 2054 # ] # ] # ] # ]

Groonga can also narrow down results to the points within a specified distance. In such a case, use the /reference/functions/geo_in_circle function in the filter parameter. For example, search for the records that exist within 5000 meters from Akihabara station.

Execution example:
select --table Site --output_columns _key,location --filter 'geo_in_circle(location, "128515259x503187188", 5000)' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ] # ], # [ # "http://example.org/", # "128452975x503157902" # ] # ] # ] # ]

There is also the /reference/functions/geo_in_rectangle function, which searches for points within a specified rectangular region.

Drilldown

You learned how to search and how to sort search results in the previous sections. Now you can search as you like, but how do you count the records that have a specific value in a column? A naive solution is to execute one query per column value and count the records in each result. That is simple, but not practical when there are many values. If you are familiar with SQL, you may wonder whether Groonga has functionality similar to GROUP BY. Of course, Groonga provides such functionality: it is called drilldown. Drilldown lets you get the number of records that have each value of a column at once. To illustrate this feature, imagine classifying sites by domain and grouping them by the country the domain belongs to. Here is a concrete example of how to use this feature. In this example, we add two columns to the Site table: the domain column is used for the TLD (top-level domain) and the country column for the country name. The types of these columns are the SiteDomain table, which uses the domain name as its primary key, and the SiteCountry table, which uses the country name as its primary key.
Execution example:
table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Site [ {"_key":"http://example.org/","domain":".org","country":"japan"}, {"_key":"http://example.net/","domain":".net","country":"brazil"}, {"_key":"http://example.com/","domain":".com","country":"japan"}, {"_key":"http://example.net/afr","domain":".net","country":"usa"}, {"_key":"http://example.org/aba","domain":".org","country":"korea"}, {"_key":"http://example.com/rab","domain":".com","country":"china"}, {"_key":"http://example.net/atv","domain":".net","country":"china"}, {"_key":"http://example.org/gat","domain":".org","country":"usa"}, {"_key":"http://example.com/vdw","domain":".com","country":"japan"} ] # [[0, 1337566253.89858, 0.000355720520019531], 9]

Here is an example of drilldown with the domain column. Three kinds of values are used in the domain column: ".org", ".net" and ".com".

Execution example:
select --table Site --limit 0 --drilldown domain # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # ".org", # 3 # ], # [ # ".net", # 3 # ], # [ # ".com", # 3 # ] # ] # ] # ]

Here is a summary of the above query.

Drilldown by domain column
┌──────────┬───────────────────────┬───────────────────────────┐
│ Group by │ Number of records     │ Records in the group      │
├──────────┼───────────────────────┼───────────────────────────┤
│ .org     │ 3                     │ • http://example.org/     │
│          │                       │ • http://example.org/aba  │
│          │                       │ • http://example.org/gat  │
├──────────┼───────────────────────┼───────────────────────────┤
│ .net     │ 3                     │ • http://example.net/     │
│          │                       │ • http://example.net/afr  │
│          │                       │ • http://example.net/atv  │
├──────────┼───────────────────────┼───────────────────────────┤
│ .com     │ 3                     │ • http://example.com/     │
│          │                       │ • http://example.com/rab  │
│          │                       │ • http://example.com/vdw  │
└──────────┴───────────────────────┴───────────────────────────┘

The drilldown counts are returned as the values of the _nsubrecs column. In this case, the Site table is grouped by the ".org", ".net" and ".com" domains, and _nsubrecs shows that each of the three domains has three records. If you drill down on a column whose type is a table, you can also retrieve the values of columns stored in the referenced table. The _nsubrecs pseudo column is added to the table used for the drilldown; it stores the number of records in each group.
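Conceptually, drilldown is grouping records by a column value and counting each group. The following Python sketch reproduces the result above on the same nine records; it is a model of the behavior, not how Groonga implements it.

from collections import Counter

records = [
    ("http://example.org/", ".org"), ("http://example.net/", ".net"),
    ("http://example.com/", ".com"), ("http://example.net/afr", ".net"),
    ("http://example.org/aba", ".org"), ("http://example.com/rab", ".com"),
    ("http://example.net/atv", ".net"), ("http://example.org/gat", ".org"),
    ("http://example.com/vdw", ".com"),
]

# Group by the domain value and count the records in each group;
# the counts play the role of the _nsubrecs column.
nsubrecs = Counter(domain for _key, domain in records)
print(nsubrecs)  # Counter({'.org': 3, '.net': 3, '.com': 3})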
Then let's investigate the referenced table in detail. Since the Site table uses the SiteDomain table as the type of its domain column, you can use --drilldown_output_columns to see the details of the referenced columns.

Execution example:
select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # 1, # ".org", # 3 # ], # [ # 2, # ".net", # 3 # ], # [ # 3, # ".com", # 3 # ] # ] # ] # ]

Now that you can see the details of each grouped domain, let's drill down on the country column within the group whose domain value is ".org".

Execution example:
select --table Site --limit 0 --filter "domain._id == 1" --drilldown country # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "japan", # 1 # ], # [ # "korea", # 1 # ], # [ # "usa", # 1 # ] # ] # ] # ]

Drilldown with multiple columns

The drilldown feature supports multiple columns: pass comma-separated column names as the drilldown parameter, and you get the drilldown result for each column at once.

Execution example:
select --table Site --limit 0 --drilldown domain,country # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # ".org", # 3 # ], # [ # ".net", # 3 # ], # [ # ".com", # 3 # ] # ], # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "japan", # 3 # ], # [ # "brazil", # 1 # ], # [ # "usa", # 2 # ], # [ # "korea", # 1 # ], # [ # "china", # 2 # ] # ] # ] # ]

Sorting drilldown results

Use --drilldown_sortby if you want to sort the results of a drilldown. For example, specify _nsubrecs for ascending order.

Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "brazil", # 1 # ], # [ # "korea", # 1 # ], # [ # "usa", # 2 # ], # [ # "china", # 2 # ], # [ # "japan", # 3 # ] # ] # ] # ]

Limiting drilldown results

The number of drilldown results is limited to 10 by default. Use the drilldown_limit and drilldown_offset parameters to customize the drilldown results.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 9 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "country", # "SiteCountry" # ], # [ # "domain", # "SiteDomain" # ], # [ # "link", # "Site" # ], # [ # "links", # "Site" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "title", # "ShortText" # ] # ] # ], # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "usa", # 2 # ], # [ # "china", # 2 # ] # ] # ] # ]

Note that drilldown on a column that stores strings is slower than on columns that store other types. If you need to drill down on string values, create a table whose primary key type is a string, then create a column that refers to that table.

Tag search and reverse resolution of reference relationships

As you know, Groonga can store, in a column, an array of references to another table. In fact, such array data enables tag search. Tag search is very fast because Groonga uses an inverted index as the data structure.

Tag search

Let's consider building a search engine for a video sharing site. Each video may be associated with multiple keywords that represent its content. Let's create tables for the video information, then search the videos. First, create the Video table, which stores video information in two columns: the title column stores the title of the video, and the tags column stores multiple tags as references to the Tag table. Next, create the Tag table, which stores tag information in one column: the tag string is stored as the primary key, and the index_tags column stores the indexes for the tags column of the Video table.

Execution example:
table_create --name Video --flags TABLE_HASH_KEY --key_type UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] table_create --name Tag --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Video --name title --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Video --name tags --flags COLUMN_VECTOR --type Tag # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Tag --name index_tags --flags COLUMN_INDEX --type Video --source tags # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Video [ {"_key":1,"title":"Soccer 2010","tags":["Sports","Soccer"]}, {"_key":2,"title":"Zenigata Kinjirou","tags":["Variety","Money"]}, {"_key":3,"title":"groonga Demo","tags":["IT","Server","groonga"]}, {"_key":4,"title":"Moero!! Ultra Baseball","tags":["Sports","Baseball"]}, {"_key":5,"title":"Hex Gone!","tags":["Variety","Quiz"]}, {"_key":6,"title":"Pikonyan 1","tags":["Animation","Pikonyan"]}, {"_key":7,"title":"Draw 8 Month","tags":["Animation","Raccoon"]}, {"_key":8,"title":"K.O.","tags":["Animation","Music"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 8]

After creating the indexed column, you can do full-text search very fast. The indexed column is also updated automatically when the stored data changes.
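The index_tags column is what makes tag search fast: instead of scanning every video, Groonga keeps a mapping from each tag to the videos that carry it (an inverted index). The following Python sketch models that idea on the same data; it is a conceptual stand-in, not Groonga's actual data structure.

videos = {
    1: ["Sports", "Soccer"],        2: ["Variety", "Money"],
    3: ["IT", "Server", "groonga"], 4: ["Sports", "Baseball"],
    5: ["Variety", "Quiz"],         6: ["Animation", "Pikonyan"],
    7: ["Animation", "Raccoon"],    8: ["Animation", "Music"],
}

# Build the inverted index: tag -> list of video keys.
index = {}
for key, tags in videos.items():
    for tag in tags:
        index.setdefault(tag, []).append(key)

# A tag lookup is now a single dictionary access whose cost does not
# depend on the number of videos that do NOT carry the tag.
print(index["Variety"])    # [2, 5]
print(index["Animation"])  # [6, 7, 8]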
Let's list the videos that have specific keywords.

Execution example:
select --table Video --query tags:@Variety --output_columns _key,title # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "title", # "ShortText" # ] # ], # [ # 2, # "Zenigata Kinjirou" # ], # [ # 5, # "Hex Gone!" # ] # ] # ] # ] select --table Video --query tags:@Sports --output_columns _key,title # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "Soccer 2010" # ], # [ # 4, # "Moero!! Ultra Baseball" # ] # ] # ] # ] select --table Video --query tags:@Animation --output_columns _key,title # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "title", # "ShortText" # ] # ], # [ # 6, # "Pikonyan 1" # ], # [ # 7, # "Draw 8 Month" # ], # [ # 8, # "K.O." # ] # ] # ] # ]

You can search by tags such as "Variety", "Sports" and "Animation".

Reverse resolution of reference relationships

Groonga supports indexes for reverse resolution among tables; tag search is one concrete example. For example, on a social networking site you can search friendships by reverse resolution. The following example creates a User table that stores user information: the username column stores the user name, the friends column stores the list of the user's friends as an array, and the index_friends column is the index column for it.

Execution example:
table_create --name User --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table User --name username --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table User --name friends --flags COLUMN_VECTOR --type User # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table User --name index_friends --flags COLUMN_INDEX --type User --source friends # [[0, 1337566253.89858, 0.000355720520019531], true] load --table User [ {"_key":"ken","username":"健作","friends":["taro","jiro","tomo","moritapo"]} {"_key":"moritapo","username":"森田","friends":["ken","tomo"]} {"_key":"taro","username":"ぐるんが太郎","friends":["jiro","tomo"]} {"_key":"jiro","username":"ぐるんが次郎","friends":["taro","tomo"]} {"_key":"tomo","username":"トモちゃん","friends":["ken","hana"]} {"_key":"hana","username":"花子","friends":["ken","taro","jiro","moritapo","tomo"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 6]

Let's show the list of users whose friend list contains a specified user.

Execution example:
select --table User --query friends:@tomo --output_columns _key,username # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "username", # "ShortText" # ] # ], # [ # "ken", # "健作" # ], # [ # "taro", # "ぐるんが太郎" # ], # [ # "jiro", # "ぐるんが次郎" # ], # [ # "moritapo", # "森田" # ], # [ # "hana", # "花子" # ] # ] # ] # ] select --table User --query friends:@jiro --output_columns _key,username # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "username", # "ShortText" # ] # ], # [ # "ken", # "健作" # ], # [ # "taro", # "ぐるんが太郎" # ], # [ # "hana", # "花子" # ] # ] # ] # ]

Then let's drill down to count how many times each user is listed as a friend.
Execution example:
select --table User --limit 0 --drilldown friends # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 6 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "friends", # "User" # ], # [ # "index_friends", # "UInt32" # ], # [ # "username", # "ShortText" # ] # ] # ], # [ # [ # 6 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "taro", # 3 # ], # [ # "jiro", # 3 # ], # [ # "tomo", # 5 # ], # [ # "moritapo", # 2 # ], # [ # "ken", # 3 # ], # [ # "hana", # 1 # ] # ] # ] # ]

As you can see, the results follow the reverse resolution of the reference relationships.

Geo location search with index

Groonga supports adding indexes to columns that store geolocation information, and it uses those indexes to search an enormous number of records by geolocation very fast.

Execution example:
table_create --name GeoSite --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table GeoSite --name location --type WGS84GeoPoint # [[0, 1337566253.89858, 0.000355720520019531], true] table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table GeoIndex --name index_point --type GeoSite --flags COLUMN_INDEX --source location # [[0, 1337566253.89858, 0.000355720520019531], true] load --table GeoSite [ {"_key":"http://example.org/","location":"128452975x503157902"}, {"_key":"http://example.net/","location":"128487316x502920929"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 5000)' --output_columns _key,location # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ] # ], # [ # "http://example.org/", # "128452975x503157902" # ] # ] # ] # ]

These indexes are also used when sorting the records of a geolocation search.

Execution example:
select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 50000)' --output_columns _key,location,_score --sortby '-geo_distance(location, "128515259x503187188")' --scorer '_score = geo_distance(location, "128515259x503187188")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "location", # "WGS84GeoPoint" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://example.org/", # "128452975x503157902", # 2054 # ], # [ # "http://example.net/", # "128487316x502920929", # 6720 # ] # ] # ] # ]

match_columns parameter

Full-text search against multiple columns

Groonga supports full-text search against multiple columns. Consider a blog site: usually, a blog has a table that contains a title column and a content column. How do you search for the blog entries that contain specified keywords in the title or the content? In such a case, there are two ways to create indexes: one way is to create a column index for each column, and the other is to create one column index for multiple columns. Either way, Groonga supports a similar full-text search syntax.

Creating a column index for each column

Here is an example that creates a column index for each column. First, create the Blog1 table, and add the title column, which stores the title string, and the message column, which stores the content of the blog entry.
Then create the IndexBlog1 table for the column indexes, and add the index_title column for the title column and the index_message column for the message column.

Execution example:
table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Blog1 [ {"_key":"grn1","title":"Groonga test","message":"Groonga message"}, {"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"}, {"_key":"grn3","title":"Groonga message","message":"none"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

The match_columns option of the select command accepts multiple columns as the search target, and the query option takes the query string. With these, you can do full-text search on the title and content of blog entries. Let's try to search for blog entries.

Execution example:
select --table Blog1 --match_columns title||message --query groonga # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "grn1", # "Groonga message", # "Groonga test" # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ], # [ # 2, # "grn2", # "rakutan eggs 4 - 4 Groonga moritars", # "baseball result" # ] # ] # ] # ] select --table Blog1 --match_columns title||message --query message # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ], # [ # 1, # "grn1", # "Groonga message", # "Groonga test" # ] # ] # ] # ] select --table Blog1 --match_columns title --query message # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ] # ] # ] # ]
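The title||message notation means "match if the query hits either column". As a rough model, the following Python sketch applies a case-insensitive substring match over both columns of the three records loaded above; real Groonga matching goes through the TokenBigram-indexed lexicon, so this only approximates the semantics.

blog1 = [
    {"_key": "grn1", "title": "Groonga test", "message": "Groonga message"},
    {"_key": "grn2", "title": "baseball result",
     "message": "rakutan eggs 4 - 4 Groonga moritars"},
    {"_key": "grn3", "title": "Groonga message", "message": "none"},
]

def match(record, columns, query):
    # "title||message" semantics: a hit in any listed column matches.
    return any(query.lower() in record[c].lower() for c in columns)

print([r["_key"] for r in blog1 if match(r, ["title", "message"], "groonga")])
# ['grn1', 'grn2', 'grn3']
print([r["_key"] for r in blog1 if match(r, ["title", "message"], "message")])
# ['grn1', 'grn3']
print([r["_key"] for r in blog1 if match(r, ["title"], "message")])
# ['grn3']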
Creating one column index for multiple columns

Groonga also supports one column index for multiple columns. The difference from the previous example is that only one column index exists: there is one common column index for the title and message columns. Even though the same column index is used, Groonga can search against the title column only, the message column only, or the title or message columns.

Execution example:
table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Blog2 [ {"_key":"grn1","title":"Groonga test","message":"Groonga message"}, {"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"}, {"_key":"grn3","title":"Groonga message","message":"none"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

Let's run the same queries as in the previous section. You get the same search results.

Execution example:
select --table Blog2 --match_columns title||message --query groonga # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "grn1", # "Groonga message", # "Groonga test" # ], # [ # 2, # "grn2", # "rakutan eggs 4 - 4 Groonga moritars", # "baseball result" # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ] # ] # ] # ] select --table Blog2 --match_columns title||message --query message # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 1, # "grn1", # "Groonga message", # "Groonga test" # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ] # ] # ] # ] select --table Blog2 --match_columns title --query message # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "message", # "ShortText" # ], # [ # "title", # "ShortText" # ] # ], # [ # 3, # "grn3", # "none", # "Groonga message" # ] # ] # ] # ]

NOTE: You may wonder which is the better solution for indexing. It depends on the case.

• Indexes for each column - The update performance tends to be better than with a multiple-column index because there is enough buffer for updating. On the other hand, the efficiency of disk usage is not so good.

• Index for multiple columns - It saves disk usage because it shares a common buffer. On the other hand, the update performance is not so good.

Full text search with specific index name

TODO

Nested index search among related tables by column index

If there are relationships among multiple tables with column indexes, you can search those tables at once by specifying a reference column name. Here is a concrete example. There are tables that store blog articles and comments on the articles. The table that stores articles has columns for the article content and for a comment, and the comment column refers to the Comments table. The table that stores comments has a column for the comment content and a column index to the articles table.
If you want to search for the articles that contain a specified keyword in a comment, you would normally run a full-text search on the comments table first, and then search for the records that refer to the matching comments. With a reference column index, however, you can search those records in one step. Here is the sample schema.

Execution example:
table_create Comments TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Comments content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Articles TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Articles content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Articles comment COLUMN_SCALAR Comments # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon articles_content COLUMN_INDEX|WITH_POSITION Articles content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon comments_content COLUMN_INDEX|WITH_POSITION Comments content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Comments article COLUMN_INDEX Articles comment # [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:
load --table Comments [ {"_key": 1, "content": "I'm using Groonga too!"}, {"_key": 2, "content": "I'm using Groonga and Mroonga!"}, {"_key": 3, "content": "I'm using Mroonga too!"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] load --table Articles [ {"content": "Groonga is fast!", "comment": 1}, {"content": "Groonga is useful!"}, {"content": "Mroonga is fast!", "comment": 3} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

You can write a query that searches for the records containing a specified keyword as a comment and then fetches the articles that refer to them. Query for searching the records described above: select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"

Specify the comment column of the Articles table and the content column of the Comments table, concatenated with a period (.), as the --match_columns argument. This query first executes a full-text search on the content column of the Comments table, then fetches the records of the Articles table that refer to the matched records of the Comments table. (Because of this, if you omit the command that creates the article index column of the Comments table, you can't get the intended search results.)

Execution example:
select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_score", # "Int32" # ], # [ # "comment", # "Comments" # ], # [ # "content", # "Text" # ] # ], # [ # 1, # 1, # 1, # "Groonga is fast!" # ] # ] # ] # ]

Now you can search for the articles that contain specific keywords in their comments.
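In effect, --match_columns comment.content performs those two steps for you. The following Python sketch spells them out on the same sample data as a conceptual model: first match the comments, then follow the reference back from matching comments to articles (the step the article index column makes fast).

comments = {1: "I'm using Groonga too!",
            2: "I'm using Groonga and Mroonga!",
            3: "I'm using Mroonga too!"}
articles = [{"_id": 1, "content": "Groonga is fast!", "comment": 1},
            {"_id": 2, "content": "Groonga is useful!", "comment": None},
            {"_id": 3, "content": "Mroonga is fast!", "comment": 3}]

# Step 1: full-text match on the comments (a substring match here).
hits = {key for key, text in comments.items() if "groonga" in text.lower()}

# Step 2: resolve the reference column back to the articles.
print([a["_id"] for a in articles if a["comment"] in hits])  # [1]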
The nested index search feature is not limited to relationships between two tables. Here is a sample schema similar to the previous one; the difference is an added table that represents replies, so the relationship is extended to three tables.

Execution example:
table_create Replies2 TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Replies2 content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Comments2 TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Comments2 content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Comments2 comment COLUMN_SCALAR Replies2 # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Articles2 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Articles2 content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Articles2 comment COLUMN_SCALAR Comments2 # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Lexicon2 TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon2 articles_content COLUMN_INDEX|WITH_POSITION Articles2 content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon2 comments_content COLUMN_INDEX|WITH_POSITION Comments2 content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon2 replies_content COLUMN_INDEX|WITH_POSITION Replies2 content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Comments2 article COLUMN_INDEX Articles2 comment # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Replies2 reply_to COLUMN_INDEX Comments2 comment # [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:
load --table Replies2 [ {"_key": 1, "content": "I'm using Rroonga too!"}, {"_key": 2, "content": "I'm using Groonga and Mroonga and Rroonga!"}, {"_key": 3, "content": "I'm using Nroonga too!"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] load --table Comments2 [ {"_key": 1, "content": "I'm using Groonga too!", "comment": 1}, {"_key": 2, "content": "I'm using Groonga and Mroonga!", "comment": 2}, {"_key": 3, "content": "I'm using Mroonga too!"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] load --table Articles2 [ {"content": "Groonga is fast!", "comment": 1}, {"content": "Groonga is useful!", "comment": 2}, {"content": "Mroonga is fast!", "comment": 3} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

Queries for searching the records described above: select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *" select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"

The first query searches for mroonga in the Comments2 table; the second one searches for mroonga through the Replies2 and Comments2 tables by using the reference column indexes.

Execution example:
select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_score", # "Int32" # ], # [ # "comment", # "Comments2" # ], # [ # "content", # "Text" # ] # ], # [ # 2, # 1, # 2, # "Groonga is useful!" # ], # [ # 3, # 1, # 3, # "Mroonga is fast!"
# ] # ] # ] # ] select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_score", # "Int32" # ], # [ # "comment", # "Comments2" # ], # [ # "content", # "Text" # ] # ], # [ # 2, # 1, # 2, # "Groonga is useful!" # ] # ] # ] # ]

As a result, the first query matches two articles because the Comments2 table has two records that contain mroonga as a keyword. The second one matches only one article because only one record in the Replies2 table contains mroonga, and only one record in the Comments2 table refers to that reply.

Indexes with Weight

TODO

Prefix search with patricia trie

Groonga supports creating a table with the patricia trie option. With it, you can do prefix search, and with an additional option you can also do suffix search against the primary key.

Prefix search by primary key

A table created by the table_create command with TABLE_PAT_KEY in its flags option supports prefix search by primary key.

Execution example:
table_create --name PatPrefix --flags TABLE_PAT_KEY --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] load --table PatPrefix [ {"_key":"James"} {"_key":"Jason"} {"_key":"Jennifer"}, {"_key":"Jeff"}, {"_key":"John"}, {"_key":"Joseph"}, ] # [[0, 1337566253.89858, 0.000355720520019531], 6] select --table PatPrefix --query _key:^Je # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 3, # "Jennifer" # ], # [ # 4, # "Jeff" # ] # ] # ] # ]
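A patricia trie stores keys so that all keys sharing a prefix form one contiguous range. The following Python sketch imitates that property with a sorted list and binary search; it is a conceptual stand-in for TABLE_PAT_KEY, not Groonga's implementation.

import bisect

keys = sorted(["James", "Jason", "Jennifer", "Jeff", "John", "Joseph"])

def prefix_search(prefix):
    # All keys sharing a prefix are adjacent in sorted order, so two
    # binary searches bound the matching range.
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]

print(prefix_search("Je"))  # ['Jeff', 'Jennifer']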
Suffix search by primary key

A table created by the table_create command with TABLE_PAT_KEY and KEY_WITH_SIS in its flags option supports both prefix search and suffix search by primary key. If you set the KEY_WITH_SIS flag, records for suffix search are added automatically when you add data. So a simple search hits the automatically added records in addition to the original ones. To search only the original records, you need a workaround: for example, add an original column indicating that a record is an original one, and add 'original == true' to the search condition. Note that you should use the --filter option here, because the --query option does not handle Bool values intuitively.

Execution example:
table_create --name PatSuffix --flags TABLE_PAT_KEY|KEY_WITH_SIS --key_type ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create --table PatSuffix --name original --type Bool # [[0, 1337566253.89858, 0.000355720520019531], true] load --table PatSuffix [ {"_key":"ひろゆき","original":true}, {"_key":"まろゆき","original":true}, {"_key":"ひろあき","original":true}, {"_key":"ゆきひろ","original":true} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] select --table PatSuffix --query _key:$ゆき # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "original", # "Bool" # ] # ], # [ # 3, # "ゆき", # false # ], # [ # 2, # "ろゆき", # false # ], # [ # 5, # "まろゆき", # true # ], # [ # 1, # "ひろゆき", # true # ] # ] # ] # ] select --table PatSuffix --filter '_key @$ "ゆき" && original == true' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "original", # "Bool" # ] # ], # [ # 5, # "まろゆき", # true # ], # [ # 1, # "ひろゆき", # true # ] # ] # ] # ]

Additional information about lexicon for full text search

Groonga uses a lexicon for full-text search as a table, so Groonga can hold multiple pieces of information for each entry in the lexicon. For example, Groonga holds the frequency of a word, flags for stop words, the importance of a word and so on. TODO: Write document.

Let's create micro-blog

Let's create a micro-blog with full-text search powered by Groonga. A micro-blog is a broadcast medium in the form of a blog, mainly used to post small messages like on Twitter.

Create a table

Let's create the tables.

table_create --name Users --flags TABLE_HASH_KEY --key_type ShortText table_create --name Comments --flags TABLE_HASH_KEY --key_type ShortText table_create --name HashTags --flags TABLE_HASH_KEY --key_type ShortText table_create --name Bigram --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint column_create --table Users --name name --flags COLUMN_SCALAR --type ShortText column_create --table Users --name follower --flags COLUMN_VECTOR --type Users column_create --table Users --name favorites --flags COLUMN_VECTOR --type Comments column_create --table Users --name location --flags COLUMN_SCALAR --type WGS84GeoPoint column_create --table Users --name location_str --flags COLUMN_SCALAR --type ShortText column_create --table Users --name description --flags COLUMN_SCALAR --type ShortText column_create --table Users --name followee --flags COLUMN_INDEX --type Users --source follower column_create --table Comments --name comment --flags COLUMN_SCALAR --type ShortText column_create --table Comments --name last_modified --flags COLUMN_SCALAR --type Time column_create --table Comments --name replied_to --flags COLUMN_SCALAR --type Comments column_create --table Comments --name replied_users --flags COLUMN_VECTOR --type Users column_create --table Comments --name hash_tags --flags COLUMN_VECTOR --type HashTags column_create --table Comments --name location --flags COLUMN_SCALAR --type WGS84GeoPoint column_create --table Comments --name posted_by --flags COLUMN_SCALAR --type Users column_create --table Comments --name favorited_by --flags COLUMN_INDEX --type Users --source favorites column_create --table HashTags --name hash_index --flags COLUMN_INDEX --type Comments --source hash_tags
column_create --table Bigram --name users_index --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Users --source name,location_str,description column_create --table Bigram --name comment_index --flags COLUMN_INDEX|WITH_POSITION --type Comments --source comment column_create --table GeoIndex --name users_location --type Users --flags COLUMN_INDEX --source location column_create --table GeoIndex --name comments_location --type Comments --flags COLUMN_INDEX --source location

Users table
This table stores user information: the user name, profile, list of followed users and so on.
_key           User ID
name           User name
follower       List of users this user follows
favorites      List of favorite comments
location       Current location of the user (geolocation)
location_str   Current location of the user (string)
description    User profile
followee       Index for the follower column in the Users table. With this index, you can find the users who follow a given user.

Comments table
This table stores comments and their metadata: the content of the comment, the posted date, the comment it replies to, and so on.
_key           Comment ID
comment        Content of the comment
last_modified  Posted date
replied_to     The comment this comment replies to
replied_users  List of users this comment replies to
hash_tags      List of hash tags for the comment
location       Posted place (geolocation)
posted_by      The user who wrote the comment
favorited_by   Index for the favorites column in the Users table. With this index, you can find the users who marked a comment as a favorite.

HashTags table
This table stores the hash tags of comments.
_key           Hash tag
hash_index     Index for Comments.hash_tags. With this index, you can list the comments that have a specified hash tag.

Bigram table
This table stores the indexes for full-text search over user information and comments.
_key           Word
users_index    Index of user information. This column contains the indexes of the user name (Users.name), current location (Users.location_str) and profile (Users.description).
comment_index  Index of the content of comments (Comments.comment).

GeoIndex table
This table stores the indexes of the location columns, to search by geolocation efficiently.
users_location     Index of the location column of the Users table
comments_location  Index of the location column of the Comments table

Loading data

Then, load the example data.

load --table Users [ { "_key": "alice", "name": "Alice", "follower": ["bob"], "favorites": [], "location": "152489000x-255829000", "location_str": "Boston, Massachusetts", "description": "Groonga developer" }, { "_key": "bob", "name": "Bob", "follower": ["alice","charlie"], "favorites": ["alice:1","charlie:1"], "location": "146249000x-266228000", "location_str": "Brooklyn, New York City", "description": "" }, { "_key": "charlie", "name": "Charlie", "follower": ["alice","bob"], "favorites": ["alice:1","bob:1"], "location": "146607190x-267021260", "location_str": "Newark, New Jersey", "description": "Hmm,Hmm" } ] load --table Comments [ { "_key": "alice:1", "comment": "I've created micro-blog!", "last_modified": "2010/03/17 12:05:00", "posted_by": "alice", }, { "_key": "bob:1", "comment": "First post.
test,test...", "last_modified": "2010/03/17 12:00:00", "posted_by": "bob", }, { "_key": "alice:2", "comment": "@bob Welcome!!!", "last_modified": "2010/03/17 12:05:00", "replied_to": "bob:1", "replied_users": ["bob"], "posted_by": "alice", }, { "_key": "bob:2", "comment": "@alice Thanks!", "last_modified": "2010/03/17 13:00:00", "replied_to": "alice:2", "replied_users": ["alice"], "posted_by": "bob", }, { "_key": "bob:3", "comment": "I've just used 'Try-Groonga' now! #groonga", "last_modified": "2010/03/17 14:00:00", "hash_tags": ["groonga"], "location": "146566000x-266422000", "posted_by": "bob", }, { "_key": "bob:4", "comment": "I'm come at city of New York for development camp! #groonga #travel", "last_modified": "2010/03/17 14:05:00", "hash_tags": ["groonga", "travel"], "location": "146566000x-266422000", "posted_by": "bob", }, { "_key": "charlie:1", "comment": "@alice @bob I've tried to register!", "last_modified": "2010/03/17 15:00:00", "replied_users": ["alice", "bob"], "location": "146607190x-267021260", "posted_by": "charlie", } { "_key": "charlie:2", "comment": "I'm at the Museum of Modern Art in NY now!", "last_modified": "2010/03/17 15:05:00", "location": "146741340x-266319590", "posted_by": "charlie", } ] follower column and favorites column in Users table and replied_users column in Comments table are vector column, so specify the value as an array. location column in Users table, location column in Comments table use GeoPoint type. This type accepts "[latitude]x[longitude]". last_modified column in Comments table use Time type. There are two way to specify the value. First, specify epoch (seconds since Jan, 1, 1970 AM 00:00:00) directly. In this case, you can specify micro seconds as fractional part. The value is converted from factional part to the time which is micro seconds based one when data is loaded. The second, specify the timestamp as string in following format: "(YEAR)/(MONTH)/(DAY) (HOUR):(MINUTE):(SECOND)". In this way, the string is casted to proper micro seconds when data is loaded. Search Let's search micro-blog. Search users by keyword In this section, we search micro-blog against multiple column by keyword. See match_columns to search multiple column at once. Let's search user from micro-blog's user name, location, description entries. 
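As an aside, match_columns also accepts per-column weights. The weighted form is not used in this tutorial; here is a minimal sketch under that assumption, which would rank hits on name higher in _score:
select --table Users --match_columns "name * 10 || location_str || description" --query "New York" --output_columns _key,name,_score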
Execution example:
select --table Users --match_columns name,location_str,description --query "New York" --output_columns _key,name
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[1],
#    [["_key","ShortText"],["name","ShortText"]],
#    ["bob","Bob"]]]]
Searching for the keyword "New York" lists "Bob", who lives in New York, in the results.
Search users by geolocation data (GeoPoint)
In this section, we search for users by a column of type GeoPoint. See search for details about searching GeoPoint columns. The following example searches for users who live within 20km of a specified location.
Execution example:
select --table Users --filter 'geo_in_circle(location,"146710080x-266315480",20000)' --output_columns _key,name
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["_key","ShortText"],["name","ShortText"]],
#    ["charlie","Charlie"],
#    ["bob","Bob"]]]]
It shows that "Bob" and "Charlie" live within 20km of Grand Central Terminal.
Search users who follow a specific user
In this section, we do reverse resolution of reference relationships, which is described at index. The following example reverse-resolves the follower column of the Users table.
Execution example:
select --table Users --query follower:@bob --output_columns _key,name
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["_key","ShortText"],["name","ShortText"]],
#    ["alice","Alice"],
#    ["charlie","Charlie"]]]]
It shows that "Alice" and "Charlie" follow "Bob".
Search comments by using a GeoPoint value
In this section, we search for comments written within a specific area. We also use drilldown, which is described at drilldown. The following example shows how to drill down on search results: we get counts grouped by hash tag and by poster, respectively.
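In the call geo_in_circle(location,"146867000x-266280000",20000) used below, the first argument is the GeoPoint column to test, the second is the center point and the third is the radius in meters, so 20000 means "within 20km of the center".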
Execution example:
select --table Comments --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[4],
#    [["posted_by.name","ShortText"],["comment","ShortText"]],
#    ["Charlie","I'm at the Museum of Modern Art in NY now!"],
#    ["Bob","I've just used 'Try-Groonga' now! #groonga"],
#    ["Bob","I'm come at city of New York for development camp! #groonga #travel"],
#    ["Charlie","@alice @bob I've tried to register!"]],
#   [[2],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["groonga",2],
#    ["travel",1]],
#   [[2],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["charlie",2],
#    ["bob",2]]]]
The above query searches for comments posted within 20km of Central Park in New York City. Because the specified range is 20km, all comments that have a location are matched. The drilldown shows that the results contain two #groonga hash tags and one #travel hash tag, and that bob and charlie each posted two comments.
Search comments by keyword
In this section, we search for comments that contain a specific keyword, and we also retrieve the value of _score, which is described at search.
Execution example:
select --table Comments --query comment:@Now --output_columns comment,_score
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["comment","ShortText"],["_score","Int32"]],
#    ["I've just used 'Try-Groonga' now! #groonga",1],
#    ["I'm at the Museum of Modern Art in NY now!",1]]]]
Using 'Now' as the keyword, the above query returns two comments. The _score value holds the number of occurrences of 'Now' in each comment.
Search comments by keyword and geolocation
In this section, we search for comments by a keyword and a location at the same time. By combining the --query and --filter options, the following query returns the records that match both conditions.
Execution example:
select --table Comments --query comment:@New --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[1],
#    [["posted_by.name","ShortText"],["comment","ShortText"]],
#    ["Bob","I'm come at city of New York for development camp! #groonga #travel"]],
#   [[2],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["groonga",1],
#    ["travel",1]],
#   [[1],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["bob",1]]]]
It returns one comment that meets both conditions, together with the drilldown result: the single matching comment was posted by Bob.
Search comments by hash tags
In this section, we search for comments that carry a specific hash tag, again using reverse resolution of reference relationships.
Execution example:
select --table Comments --query hash_tags:@groonga --output_columns posted_by.name,comment --drilldown posted_by
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["posted_by.name","ShortText"],["comment","ShortText"]],
#    ["Bob","I've just used 'Try-Groonga' now! #groonga"],
#    ["Bob","I'm come at city of New York for development camp! #groonga #travel"]],
#   [[1],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["bob",2]]]]
The above query returns the two comments that carry the #groonga hash tag, together with a drilldown grouped by poster: both comments were posted by Bob.
Search comments by user ID
In this section, we search for comments posted by a specific user.
Execution example:
select --table Comments --query posted_by:bob --output_columns comment --drilldown hash_tags
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[4],
#    [["comment","ShortText"]],
#    ["First post. test,test..."],
#    ["@alice Thanks!"],
#    ["I've just used 'Try-Groonga' now! #groonga"],
#    ["I'm come at city of New York for development camp! #groonga #travel"]],
#   [[2],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["groonga",2],
#    ["travel",1]]]]
The above query returns the four comments posted by Bob, together with a drilldown by hash tag: two comments carry #groonga and one carries #travel.
Search a user's favorite comments
In this section, we search for a user's favorite comments.
Execution example:
select --table Users --query _key:bob --output_columns favorites.posted_by,favorites.comment
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[1],
#    [["favorites.posted_by","Users"],["favorites.comment","ShortText"]],
#    [["alice","charlie"],["I've created micro-blog!","@alice @bob I've tried to register!"]]]]]
The above query returns Bob's favorite comments.
Search comments by posted time
In this section, we search for comments by their posted time. See the Time type in data. Let's search for comments posted at or before a specified time.
Execution example:
select Comments --filter 'last_modified<=1268802000' --output_columns posted_by.name,comment,last_modified --drilldown hash_tags,posted_by
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[5],
#    [["posted_by.name","ShortText"],["comment","ShortText"],["last_modified","Time"]],
#    ["Alice","I've created micro-blog!",1268795100.0],
#    ["Bob","First post. test,test...",1268794800.0],
#    ["Alice","@bob Welcome!!!",1268795100.0],
#    ["Bob","@alice Thanks!",1268798400.0],
#    ["Bob","I've just used 'Try-Groonga' now! #groonga",1268802000.0]],
#   [[1],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["groonga",1]],
#   [[2],
#    [["_key","ShortText"],["_nsubrecs","Int32"]],
#    ["alice",2],
#    ["bob",3]]]]
The above query returns the five comments posted at or before 2010/03/17 14:00:00 (epoch value 1268802000), together with a drilldown by poster: two comments by Alice and three by Bob.
Query expansion
Groonga accepts the query_expander parameter in the /reference/commands/select command. It enables you to expand the query string. For example, if a user searches for "theatre" instead of "theater", query expansion lets the search return the results of "theatre OR theater". Expanding queries this way reduces search misses and returns what the user really wants.
Preparation
To use query expansion, you need a table that stores documents and a synonym table that stores query strings and their replacements. In the synonym table, the primary key represents the original string, and a ShortText column represents the expanded string. Let's create the document table and the synonym table.
Execution example:
table_create Doc TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Doc body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Term TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Term Doc_body COLUMN_INDEX|WITH_POSITION Doc body
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Synonym TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Synonym body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Doc
[
  {"_key": "001", "body": "Play all night in this theater."},
  {"_key": "002", "body": "theatre is British spelling."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Synonym
[
  {"_key": "theater", "body": "(theater OR theatre)"},
  {"_key": "theatre", "body": "(theater OR theatre)"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
With this synonym table, neither "theater" nor "theatre" causes a search miss, because both are registered as query strings.
Search
Now, let's use the prepared synonym table. First, use the select command without the query_expander parameter.
Execution example:
select Doc --match_columns body --query "theater"
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[1],
#    [["_id","UInt32"],["_key","ShortText"],["body","ShortText"]],
#    [1,"001","Play all night in this theater."]]]]
select Doc --match_columns body --query "theatre"
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[1],
#    [["_id","UInt32"],["_key","ShortText"],["body","ShortText"]],
#    [2,"002","theatre is British spelling."]]]]
Each query above returns only the record that literally matches the query string. Next, pass the body column of the Synonym table to the query_expander parameter.
Execution example:
select Doc --match_columns body --query "theater" --query_expander Synonym.body
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["_id","UInt32"],["_key","ShortText"],["body","ShortText"]],
#    [1,"001","Play all night in this theater."],
#    [2,"002","theatre is British spelling."]]]]
select Doc --match_columns body --query "theatre" --query_expander Synonym.body
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[[2],
#    [["_id","UInt32"],["_key","ShortText"],["body","ShortText"]],
#    [1,"001","Play all night in this theater."],
#    [2,"002","theatre is British spelling."]]]]
In both cases, the query string is expanded to "(theater OR theatre)", so synonyms are taken into account in the full text search.
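The synonym table is an ordinary table, so you can extend it at any time with load. Here is a minimal sketch of registering another synonym pair (the colour/color pair is an illustrative assumption, not part of the example above):
load --table Synonym
[
  {"_key": "color",  "body": "(color OR colour)"},
  {"_key": "colour", "body": "(color OR colour)"}
]
After this load, a query for either spelling is expanded to match both.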
SERVER
Server packages
The groonga package provides the minimum set of the full text search engine. If you want to use Groonga as a server, you can install additional preconfigured packages. There are two packages for server use.
• groonga-httpd (nginx and HTTP protocol based server package)
• groonga-server-gqtp (/spec/gqtp protocol based server package)
There is a reason why Groonga provides HTTP server packages in addition to GQTP. /spec/gqtp - Groonga Query Transfer Protocol is designed to reduce overhead and improve performance, but it has fewer client libraries than HTTP. Because HTTP is a mature protocol, you can take advantage of existing tools, and many client libraries are available (see related projects for details). If you use the groonga-httpd package, you can also benefit from nginx functionality.
We recommend starting with groonga-httpd, because it provides full server functionality. If you hit performance issues caused by protocol overhead, consider using groonga-server-gqtp.
NOTE: In previous versions, there was a groonga-server-http package (a simple HTTP protocol based server package). It is now obsolete; please use the groonga-httpd package instead. groonga-server-http became a transitional package for groonga-httpd.
groonga-httpd
groonga-httpd is an nginx and HTTP protocol based server package. Preconfigured setting:
┌───────────────────┬───────────────────────────────────────┐
│Item               │ Default value                         │
├───────────────────┼───────────────────────────────────────┤
│Port number        │ 10041                                 │
├───────────────────┼───────────────────────────────────────┤
│Access log path    │ /var/log/groonga/httpd/access.log     │
├───────────────────┼───────────────────────────────────────┤
│Error log path     │ /var/log/groonga/http-query.log       │
├───────────────────┼───────────────────────────────────────┤
│Database           │ /var/lib/groonga/db/*                 │
├───────────────────┼───────────────────────────────────────┤
│Configuration file │ /etc/groonga/httpd/groonga-httpd.conf │
└───────────────────┴───────────────────────────────────────┘
Start HTTP server
Starting the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd start
Starting the Groonga HTTP server (Fedora):
% sudo systemctl start groonga-httpd
Stop HTTP server
Stopping the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd stop
Stopping the Groonga HTTP server (Fedora):
% sudo systemctl stop groonga-httpd
Restart HTTP server
Restarting the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-httpd restart
Restarting the Groonga HTTP server (Fedora):
% sudo systemctl restart groonga-httpd
groonga-server-gqtp
groonga-server-gqtp is a /spec/gqtp protocol based server package.
Preconfigured setting:
┌────────────┬───────────────────────────────────┐
│Item        │ Default value                     │
├────────────┼───────────────────────────────────┤
│Port number │ 10043                             │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-gqtp.log │
├────────────┼───────────────────────────────────┤
│query-log   │ /var/log/groonga/gqtp-query.log   │
├────────────┼───────────────────────────────────┤
│Database    │ /var/lib/groonga/db/*             │
└────────────┴───────────────────────────────────┘
Configuration file for server settings (Debian/Ubuntu): /etc/default/groonga/groonga-server-gqtp
Configuration file for server settings (CentOS): /etc/sysconfig/groonga-server-gqtp
Start GQTP server
Starting the Groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp start
Starting the Groonga GQTP server (Fedora):
% sudo systemctl start groonga-server-gqtp
Stop GQTP server
Stopping the Groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp stop
Stopping the Groonga GQTP server (Fedora):
% sudo systemctl stop groonga-server-gqtp
Restart GQTP server
Restarting the Groonga GQTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-gqtp restart
Restarting the Groonga GQTP server (Fedora):
% sudo systemctl restart groonga-server-gqtp
groonga-server-http
groonga-server-http is a simple HTTP protocol based server package.
NOTE: groonga-server-http has been a transitional package since Groonga 4.0.8. Please use groonga-httpd instead.
Preconfigured setting:
┌────────────┬───────────────────────────────────┐
│Item        │ Default value                     │
├────────────┼───────────────────────────────────┤
│Port number │ 10041                             │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-http.log │
├────────────┼───────────────────────────────────┤
│query-log   │ /var/log/groonga/http-query.log   │
├────────────┼───────────────────────────────────┤
│Database    │ /var/lib/groonga/db/*             │
└────────────┴───────────────────────────────────┘
Configuration file for server settings (Debian/Ubuntu): /etc/default/groonga/groonga-server-http
Configuration file for server settings (CentOS): /etc/sysconfig/groonga-server-http
Start HTTP server
Starting the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-http start
Starting the Groonga HTTP server (Fedora):
% sudo systemctl start groonga-server-http
Stop HTTP server
Stopping the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-http stop
Stopping the Groonga HTTP server (Fedora):
% sudo systemctl stop groonga-server-http
Restart HTTP server
Restarting the Groonga HTTP server (Debian/Ubuntu/CentOS):
% sudo service groonga-server-http restart
Restarting the Groonga HTTP server (Fedora):
% sudo systemctl restart groonga-server-http
HTTP
Groonga provides two HTTP server implementations.
• http/groonga
• http/groonga-httpd
http/groonga is a simple implementation. It is fast but doesn't have many HTTP features. It is convenient for trying Groonga because it requires just a few command line options to run. http/groonga-httpd is an nginx based implementation. It is also fast and has many HTTP features.
Comparison
There are many differences between groonga and groonga-httpd. Here is a comparison table.
┌─────────────────────────┬────────────────────────┬──────────────────────┐
│                         │ groonga                │ groonga-httpd        │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Performance              │ o                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Using multi CPU cores    │ o (by multi threading) │ o (by multi process) │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Configuration file       │ optional               │ required             │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom prefix path       │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom command version   │ o                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Multi databases          │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Authentication           │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Gzip compression         │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│POST                     │ o                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│HTTPS                    │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Access log               │ x                      │ o                    │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Upgrading without        │ x                      │ o                    │
│downtime                 │                        │                      │
└─────────────────────────┴────────────────────────┴──────────────────────┘
Performance
Both groonga and groonga-httpd are very fast and deliver about the same throughput.
Using multi CPU cores
Groonga scales across CPU cores: groonga scales by multi-threading, while groonga-httpd scales by multiple processes. groonga uses as many threads as CPU cores by default; if you have 8 CPU cores, 8 threads are used. groonga-httpd uses 1 process by default, so you need to set the worker_processes directive to use more cores. If you have 8 CPU cores, specify worker_processes 8 in the configuration file like the following:

worker_processes 8;

http {
  # ...
}

Configuration file
groonga can work without a configuration file; all configuration items, such as the port number and the maximum number of threads, can be specified on the command line. A configuration file can also be used to specify them. It's very easy to run an HTTP server with groonga because it requires just a few options. Here is the simplest command line that starts an HTTP server with groonga:

% groonga --protocol http -d /PATH/TO/DATABASE

groonga-httpd requires a configuration file to run. Here is the simplest configuration file that starts an HTTP server with groonga-httpd:

events {
}

http {
  server {
    listen 10041;

    location /d/ {
      groonga on;
      groonga_database /PATH/TO/DATABASE;
    }
  }
}

Custom prefix path
groonga accepts a path that starts with /d/ as the command URL, such as http://localhost:10041/d/status. You cannot change the /d/ prefix path. groonga-httpd can customize the prefix path. For example, you can use http://localhost:10041/api/status as the command URL. Here is a sample configuration that uses /api/ as the prefix path:

events {
}

http {
  server {
    listen 10041;

    location /api/ { # <- change this
      groonga on;
      groonga_database /PATH/TO/DATABASE;
    }
  }
}

Custom command version
Groonga has the /reference/command/command_version mechanism. It is for upgrading Groonga commands while keeping backward compatibility. groonga can change the default command version with the --default-command-version option.
Here is a sample command line that uses command version 2 as the default command version:

% groonga --protocol http --default-command-version 2 -d /PATH/TO/DATABASE

groonga-httpd cannot customize the default command version yet, but it will be supported soon. Once supported, you will be able to provide Groonga commands with different command versions in the same groonga-httpd process. Here is a sample configuration that provides command version 1 commands under /api/1/ and command version 2 commands under /api/2/:

events {
}

http {
  server {
    listen 10041;
    groonga_database /PATH/TO/DATABASE;

    location /api/1/ {
      groonga on;
      groonga_default_command_version 1;
    }

    location /api/2/ {
      groonga on;
      groonga_default_command_version 2;
    }
  }
}

Multi databases
groonga can use only one database per process. groonga-httpd can use one or more databases per process. Here is a sample configuration that serves the /tmp/db1 database under the /db1/ path and the /tmp/db2 database under the /db2/ path:

events {
}

http {
  server {
    listen 10041;

    location /db1/ {
      groonga on;
      groonga_database /tmp/db1;
    }

    location /db2/ {
      groonga on;
      groonga_database /tmp/db2;
    }
  }
}

Authentication
HTTP supports authentication methods such as basic authentication and digest authentication. They can be used to restrict dangerous commands such as /reference/commands/shutdown. groonga doesn't support any authentication; to restrict dangerous commands, other tools such as iptables or a reverse proxy are needed. groonga-httpd supports basic authentication. Here is a sample configuration that restricts use of the /reference/commands/shutdown command:

events {
}

http {
  server {
    listen 10041;
    groonga_database /PATH/TO/DATABASE;

    location /d/shutdown {
      groonga on;
      auth_basic "manager is required!";
      auth_basic_user_file "/etc/managers.htpasswd";
    }

    location /d/ {
      groonga on;
    }
  }
}

Gzip compression
HTTP supports response compression by gzip with the Content-Encoding: gzip response header. It reduces network traffic, which is useful for large search responses. groonga doesn't support compression; a reverse proxy is needed for it. groonga-httpd supports gzip compression. Here is a sample configuration that compresses responses with gzip:

events {
}

http {
  server {
    listen 10041;
    groonga_database /PATH/TO/DATABASE;

    location /d/ {
      groonga on;
      gzip on;
      gzip_types *;
    }
  }
}

Note that gzip_types * is specified; this is an important configuration item. gzip_types specifies the gzip target data formats by MIME type. groonga-httpd returns data in JSON, XML or MessagePack format, but those formats aren't included in the default value of gzip_types, which is text/html. To compress response data from groonga-httpd with gzip, you need to specify gzip_types * or gzip_types application/json text/xml application/x-msgpack explicitly. gzip_types * is recommended, for two reasons: first, Groonga may support more formats in the future; second, all requests for the location are processed by Groonga, so you don't need to worry about other modules.
POST
You can load your data by POSTing JSON data. You need to follow these rules to load by POST:
• The Content-Type header value must be application/json.
• The JSON data is sent as the body.
• The table name is specified by a query parameter such as table=NAME.
Here is an example curl command line that loads two users alice and bob into the Users table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"

HTTPS
TODO
Access log
TODO
Upgrading without downtime
TODO
groonga
TODO
groonga-httpd
TODO
GQTP
Summary
GQTP is an acronym standing for "Groonga Query Transfer Protocol". GQTP is a protocol designed for Groonga. It is a stateful protocol: you can send multiple commands in one session. GQTP will be faster than /server/http when you send many light commands like /reference/commands/status, and will perform about the same as HTTP when you send heavy commands like /reference/commands/select.
We recommend HTTP in most cases, because there are many HTTP client libraries. If you want to use GQTP, you can use the following libraries:
• Ruby: groonga-client
• Python: poyonga
• Go: goroo
• PHP: proonga
• C/C++: Groonga (Groonga can also be used as a library)
It's not a library, but you can also use /reference/executables/groonga as a GQTP client.
How to run
/reference/executables/groonga is a GQTP server implementation. You can run a Groonga server with the following command line:

groonga --protocol gqtp -s [options] DB_PATH

You can run a Groonga server as a daemon with the following command line:

groonga --protocol gqtp -d [options] DB_PATH

See /reference/executables/groonga for available options.
Memcached binary protocol
Groonga supports the memcached binary protocol. The following form shows how to run Groonga as a memcached binary protocol server daemon.
Form:

groonga [-p PORT_NUMBER] -d --protocol memcached DB_PATH

The --protocol option and its argument specify the protocol of the server. "memcached" specifies the memcached binary protocol. You don't need to create a table: when Groonga receives a request, it creates the table automatically. The table name will be Memcache.
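Whichever HTTP based server package or mode you start, a quick smoke test is to request the status command (a minimal check, assuming the default HTTP port 10041 on the local host):

% curl http://localhost:10041/d/status

A JSON response indicates that the server is up and accepting commands.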
CLIENT
Groonga supports its original protocol (/spec/gqtp), the memcached binary protocol and HTTP. Because HTTP and the memcached binary protocol are mature protocols, you can use existing client libraries for them. There are also client libraries in several programming languages that provide a convenient API for connecting to a Groonga server. See Client libraries for details.
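You can also try a server without installing any library by using the groonga executable file itself as a GQTP client (a minimal sketch, assuming a GQTP server on the default port 10043 of the local host; the client-mode options are described in the groonga executable file section below):

% groonga -c --host 127.0.0.1 --port 10043 status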
REFERENCE MANUAL
Executables
This section describes the executable files provided by the groonga package.
grndb
Summary
NOTE: This executable command is an experimental feature.
New in version 4.0.9.
grndb manages a Groonga database. Here are its features:
• Checks whether the database is broken or not.
• Recovers a broken database automatically if the database is recoverable.
Syntax
grndb requires a command and a database path:

grndb COMMAND [OPTIONS] DATABASE_PATH

Here are the available commands:
• check - Checks whether the database is broken or not.
• recover - Recovers the database.
Usage
Here is an example that checks the database at /var/lib/groonga/db/db:

% grndb check /var/lib/groonga/db/db

Here is an example that recovers the database at /var/lib/groonga/db/db:

% grndb recover /var/lib/groonga/db/db

Commands
This section describes the available commands.
check
It checks an existing Groonga database. If the database is broken, grndb reports the reasons and exits with a non-0 exit status.
NOTE: You must not use this command on a database that is opened by another process. If the database is opened, this command may report a wrong result.
check has some options.
--target
New in version 5.1.2.
It specifies a check target object. If your database is large and you know which object is unreliable, this option will help you: check needs more time for a large database, and you can reduce the check time by narrowing the check target with --target.
The check target is checked recursively, because objects related to an unreliable object may also be unreliable. If the check target is a table, all columns of the table are also checked recursively. If the check target is a table and its key type is another table, that table is also checked recursively. If the check target is a column and its value type is a table, that table is also checked recursively. If the check target is an index column, the table specified as its value type and all of its sources are also checked recursively.
Here is an example that checks only the Entries table and its columns:

% grndb check --target Entries /var/lib/groonga/db/db

Here is an example that checks only the Entries.name column:

% grndb check --target Entries.name /var/lib/groonga/db/db

recover
It recovers an existing broken Groonga database. If the database is not broken, grndb does nothing and exits with a 0 exit status. If the database is broken and only index columns are broken, grndb recovers those index columns and exits with a 0 exit status; this may take a long time for large indexed data. If the database is broken and tables or data columns are broken, grndb reports the reasons and exits with a non-0 exit status. You can find out whether the database is recoverable with the check command.
NOTE: You must not use this command on a database that is opened by another process. If the database is opened, this command may break the database.
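Because check exits with a non-zero status only when it finds a problem, the two commands can be chained in maintenance scripts; a minimal sketch (assuming the groonga process using the database has been stopped first, as the notes above require):

% grndb check /var/lib/groonga/db/db || grndb recover /var/lib/groonga/db/db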
grnslap
Name
grnslap - a tool that checks the performance of the communication layer of a groonga process
Synopsis
grnslap [options] [dest]
Description
grnslap is a tool that issues many concurrent requests to a groonga process in order to check its performance. It can send requests over both GQTP, Groonga's original protocol, and HTTP, and the request concurrency can be specified.
Query bodies can be given on standard input. By feeding queries that are close to the query patterns of your production environment, you can test under conditions close to production.
Currently, it is not installed by make install.
Options
-P  Specifies the request protocol.
    http  Sends requests over HTTP. Given HTTP paths (including GET parameters) on standard input in LF-separated form, it accesses those paths one by one.
    gqtp  Sends requests over GQTP. Given GQTP requests on standard input in LF-separated form, it issues those requests one by one.
-m  Specifies the request concurrency. The default is 10.
Arguments
dest  Specifies the host name and port number to connect to (default: 'localhost:10041'). If the port number is omitted, 10041 is used.
Example
Send requests to http://localhost:10041/d/status with a concurrency of 100:

> yes /d/status | head -n 100 | grnslap -P http -m 100 localhost:10041
2009-11-12 19:34:09.998696|begin: max_concurrency=100 max_tp=10000
2009-11-12 19:34:10.011208|end : n=100 min=46 max=382 avg=0 qps=7992.966190 etime=0.012511

groonga executable file
Summary
The groonga executable file provides the following features:
• Fulltext search server
• Fulltext search shell
• Client for a Groonga fulltext search server
Groonga can be used as a library. If you want to use Groonga as a library, you need to write a program in C, C++ and so on. Library use is useful for embedding the fulltext search feature into your application, but it's not easy to use. You can use the groonga executable file to get the fulltext search feature without writing code.
If you want to try Groonga, the fulltext search shell usage is useful. You don't need any server or client; you just need one terminal. You can try Groonga like the following:

% groonga -n db
> status
[[0,1429687763.70845,0.000115633010864258],{"alloc_count":195,...}]
> quit
%

If you want to create an application with a fulltext search feature, the fulltext search server usage is useful. You can use Groonga as a server like an RDBMS (Relational DataBase Management System). The client-server model is a popular architecture. Normally, the client usage for a Groonga fulltext search server isn't used.
Syntax
The groonga executable file has the following four modes:
• Standalone mode
• Server mode
• Daemon mode
• Client mode
There are options common to all of these modes; they are described in a later section.
Standalone mode
In standalone mode, the groonga executable file runs one or more Groonga /reference/command against a local Groonga database.
Here is the syntax to run a shell that executes Groonga commands against a temporary database:

groonga [options]

Here is the syntax to create a new database and run a shell that executes Groonga commands against the new database:

groonga [options] -n DB_PATH

Here is the syntax to run a shell that executes Groonga commands against an existing database:

groonga [options] DB_PATH

Here is the syntax to run a Groonga command against an existing database and exit:

groonga [options] DB_PATH COMMAND [command arguments]

Server mode
In server mode, the groonga executable file runs as a server. The server accepts connections from other processes on the local machine or remote machines and executes received Groonga /reference/command against a local Groonga database. You can choose the protocol from /server/http and /server/gqtp. Normally, HTTP is suitable, but GQTP is the default protocol. This section describes only HTTP usage.
In server mode, the groonga executable file runs in the foreground. If you want to run a Groonga server in the background, see Daemon mode.
Here is the syntax to run a Groonga server with a temporary database:

groonga [options] --protocol http -s

Here is the syntax to create a new database and run a Groonga server with the new database:

groonga [options] --protocol http -s -n DB_PATH

Here is the syntax to run a Groonga server with an existing database:

groonga [options] --protocol http -s DB_PATH

Daemon mode
In daemon mode, the groonga executable file runs as a daemon. A daemon is similar to a server, but it runs in the background. See Server mode about servers.
Here is the syntax to run a Groonga daemon with a temporary database:

groonga [options] --protocol http -d

Here is the syntax to create a new database and run a Groonga daemon with the new database:

groonga [options] --protocol http -d -n DB_PATH

Here is the syntax to run a Groonga daemon with an existing database:

groonga [options] --protocol http -d DB_PATH

The --pid-path option is useful for daemon mode.
Client mode
In client mode, the groonga executable file runs as a client for a GQTP protocol Groonga server. Its usage is similar to Standalone mode: you can run a shell or execute one command, but you specify a server address instead of a local database. Note that you can't use the groonga executable file as a client for an HTTP protocol Groonga server.
Here is the syntax to run a shell that executes Groonga commands against a Groonga server running at 192.168.0.1:10043:

groonga [options] -c --host 192.168.0.1 --port 10043

Here is the syntax to run a Groonga command against a Groonga server running at 192.168.0.1:10043 and exit:

groonga [options] -c --host 192.168.0.1 --port 10043 COMMAND [command arguments]

Options
-n  Creates a new database.
-c  Runs the groonga command in client mode.
-s  Runs the groonga command in server mode. Use "Ctrl+C" to stop the groonga process.
-d  Runs the groonga command in daemon mode. In contrast to server mode, the groonga process forks in daemon mode. For example, to stop a local daemon process, use "curl http://127.0.0.1:10041/d/shutdown".
-e, --encoding <encoding>
Specifies the encoding used for the Groonga database. This option takes effect when you create a new Groonga database. One of the following values can be specified: none, euc, utf8, sjis, latin or koi8r.
-l, --log-level <log level>
Specifies the log level as an integer value between 0 and 8. The meaning of each value is:
┌──────────┬─────────────┐
│log level │ description │
├──────────┼─────────────┤
│0         │ Nothing     │
├──────────┼─────────────┤
│1         │ Emergency   │
├──────────┼─────────────┤
│2         │ Alert       │
├──────────┼─────────────┤
│3         │ Critical    │
├──────────┼─────────────┤
│4         │ Error       │
├──────────┼─────────────┤
│5         │ Warning     │
├──────────┼─────────────┤
│6         │ Notice      │
├──────────┼─────────────┤
│7         │ Info        │
├──────────┼─────────────┤
│8         │ Debug       │
└──────────┴─────────────┘
-a, --address <ip/hostname>
Deprecated since version 1.2.2: Use --bind-address instead.
--bind-address <ip/hostname>
New in version 1.2.2.
Specifies the address to listen on when running in server or daemon mode. (The default is the host name returned by hostname.)
-p, --port <port number>
Specifies the TCP port number used in client, server or daemon mode. (The client-mode default is 10043; the server- and daemon-mode default is 10041 for HTTP and 10043 for GQTP.)
-i, --server-id <ip/hostname>
Specifies the address used as the server ID when running in server or daemon mode. (The default is the host name returned by `hostname`.)
-h, --help
Prints a help message.
--document-root <path>
Specifies the directory that stores static pages when groonga is used as an HTTP server. By default, files for a general-purpose database administration tool are installed under /usr/share/groonga/admin_html. If you start groonga with this directory as the value of --document-root, you can use the web-based database administration tool by accessing http://hostname:port/index.html from a web browser.
--protocol <protocol>
Specifies either http or gqtp. (The default is gqtp.)
--log-path <path>
Specifies the path of the log file. (The default is /var/log/groonga/groonga.log.)
--log-rotate-threshold-size <threshold>
New in version 5.0.3. Specifies the threshold for log rotation. The log file is rotated when its size is larger than or equal to the threshold (default: 0; disabled).
--query-log-path <path>
Specifies the path of the query log file. (By default, no query log is output.)
--query-log-rotate-threshold-size <threshold>
New in version 5.0.3. Specifies the threshold for query log rotation. The query log file is rotated when its size is larger than or equal to the threshold (default: 0; disabled).
-t, --max-threads <max threads>
Specifies the maximum number of threads to use. (The default is the number of CPU cores on the machine.)
--pid-path <path>
Specifies the path where the PID is saved. (By default, it is not saved.)
--config-path <path>
Specifies the path of the configuration file. The configuration file has the following format:

# Everything after '#' is a comment.
; Everything after ';' is also a comment.
# Options are specified as 'key = value'.
pid-path = /var/run/groonga.pid
# Whitespace around '=' is ignored. The following means the same as the above.
pid-path=/var/run/groonga.pid
# A 'key' is the same as the corresponding '--XXX' style option name.
# For example, the key for '--pid-path' is 'pid-path'.
# However, an option whose key is 'config-path' is ignored.

--cache-limit <limit>
Specifies the maximum number of cache entries. (The default is 100.)
--default-match-escalation-threshold <threshold>
Specifies the threshold for escalating search behavior. (The default is 0.)
Command line parameters
dest
Specifies the path of the database to use. In client mode, it instead specifies the host name and port number to connect to (default: 'localhost:10043'). If the port number is omitted, 10043 is used.
command [args]
In standalone and client mode, you can pass the command to execute and its arguments as command line arguments. If no command is given on the command line, commands are read from standard input one line at a time until EOF and executed in order.
Command
Instructions that operate on a database through the groonga command are called commands. Commands are mainly written in C and become usable by being loaded into the groonga process. Each command has a unique name and takes zero or more arguments.
Arguments can be specified in either of the following two ways:

Format 1: COMMAND_NAME VALUE1 VALUE2 ...
Format 2: COMMAND_NAME --ARGUMENT_NAME1 VALUE1 --ARGUMENT_NAME2 VALUE2 ...
When running a command in Format 1, values must be specified in the defined order and intermediate arguments cannot be omitted. When running a command in Format 2, each argument must be named explicitly like '--ARGUMENT_NAME', but arguments can then be given in any order and intermediate arguments can be omitted.
When a command string is given on standard input, the command name, argument names and values are separated by spaces. To specify a value that contains a space or any of the characters "'(), surround the value with single quotes (') or double quotes ("). Inside a string value, a newline is written as '\n'. To include the quote character used for the value, put a backslash ('\') before it. To specify a backslash itself, escape it with another backslash.
You can write a command over continued lines, where continuation is represented by the '\' character:

table_create --name Terms \
  --flags TABLE_PAT_KEY \
  --key_type ShortText \
  --default_tokenizer TokenBigram

Builtin command
The following commands are predefined as built-in commands.
status         Shows the status of the groonga process.
table_list     Shows the list of tables defined in the database.
column_list    Shows the list of columns defined in a table.
table_create   Adds a table to the database.
column_create  Adds a column to a table.
table_remove   Removes a table defined in the database.
column_remove  Removes a column defined in a table.
load           Inserts records into a table.
select         Searches a table for records and shows them.
define_selector Defines a new search command with customized search conditions.
quit           Ends the session with the database.
shutdown       Stops the server (daemon) process.
log_level      Sets the log output level.
log_put        Outputs a log entry.
clearlock      Releases locks.
Usage
Create a new database:

% groonga -n /tmp/hoge.db quit
%

Define a table in an existing database:

% groonga /tmp/hoge.db table_create Table 0 ShortText
[[0]]
%

Start the server:

% groonga -d /tmp/hoge.db
%

Start as an HTTP server:

% groonga -d -p 80 --protocol http --document-root /usr/share/groonga/admin_html /tmp/hoge.db
%

Connect to the server and show the table list:

% groonga -c localhost table_list
[[0],[["id","name","path","flags","domain"],[256,"Table","/tmp/hoge.db.0000100",49152,14]]]
%

groonga-benchmark
Name
groonga-benchmark - groonga test program
Synopsis
groonga-benchmark [options...] [script] [db]
Description
groonga-benchmark is a general-purpose benchmark tool for groonga. It can verify behavior and measure execution speed both when groonga is used as a standalone process and when it is used as a server program.
You can create data files for groonga-benchmark yourself or use existing ones. Existing data files are downloaded from ftp.groonga.org as needed, so as long as groonga and groonga-benchmark run and the machine can connect to the Internet, you can check groonga's behavior without knowledge of groonga commands.
It currently runs on Linux and Windows. It is not installed by make install.
Options
-i, --host <ip/hostname>
Specifies the groonga server to connect to, by IP address or host name. Note that if no groonga server is running at the specified destination, the connection fails. If this option is not given, groonga-benchmark automatically starts a groonga server on localhost and connects to it.
-p, --port <port number>
Specifies the port number used by the automatically started groonga server, or by the explicitly specified groonga server. Note that if the port used by the target groonga server differs from the port specified by this option, the connection fails.
--dir
Shows the script files available on ftp.groonga.org.
--ftp
Communicates with ftp.groonga.org over FTP to synchronize script files and send log files.
--log-output-dir
By default, the log file produced when groonga-benchmark finishes is written to the current directory. With this option you can change the output directory.
--groonga <groonga_path>
Specifies the path of the groonga command. By default, the groonga command is searched for in PATH.
--protocol <gqtp|http>
Specifies gqtp or http as the protocol used by the groonga command.
Arguments
script
A text file that describes how groonga-benchmark should behave (called groonga-benchmark directives below). Its extension is .scr.
db
The groonga database used by groonga-benchmark. If the specified database does not exist, groonga-benchmark creates it. This database is also used when a groonga server is started automatically. Note that when you explicitly specify a groonga server to connect to, the database used is the one in use by that server.
Usage
First, type the following on a shell (a command prompt on Windows):

groonga-benchmark test.scr ANY_DB_NAME

If groonga-benchmark works correctly, a file named:

test-USERNAME-NUMBER.log

is created. If it is not created, see the "Troubleshooting" section of this document.
Script file
A script file is a text file that contains groonga-benchmark directives. Multiple directives can be written on one line, separated by ';' (semicolons); in that case the directives are executed in parallel. Lines starting with '#' are treated as comments.
groonga-benchmark directives
The following 11 groonga-benchmark directives are currently supported.
do_local command-file [threads] [repeats]
Runs the command file with groonga-benchmark alone. If a thread count is specified, multiple threads run the same command file simultaneously. If a repeat count is specified, the contents of the command file are executed repeatedly. Both the thread count and the repeat count default to 1. To run a single thread multiple times, specify it explicitly as: do_local command-file 1 [repeats].
do_gqtp command-file [threads] [repeats]
Runs the command file on a groonga server via GQTP. The thread and repeat counts mean the same as for do_local.
do_http command-file [threads] [repeats]
Runs the command file on a groonga server via HTTP. The thread and repeat counts mean the same as for do_local.
rep_local command-file [threads] [repeats]
Runs the command file with groonga-benchmark alone and reports in more detail.
rep_gqtp command-file [threads] [repeats]
Runs the command file on a groonga server via GQTP and reports in more detail. The thread and repeat counts mean the same as for do_local.
rep_http command-file [threads] [repeats]
Runs the command file on a groonga server via HTTP and reports in more detail. The thread and repeat counts mean the same as for do_local.
out_local command-file output-file
Runs the command file with groonga-benchmark alone and writes all execution results of each command to the "output file". The results are used by the test_local and test_gqtp directives. Note that the "output file" of this directive is different from the log that groonga-benchmark creates automatically. Except that comments can be used, this is the same as:

groonga < command-file > output-file

out_gqtp command-file output-file
Runs the command file on a groonga server via GQTP. Otherwise the same as out_local.
out_http command-file output-file
Runs the command file on a groonga server via HTTP. Otherwise the same as out_local.
test_local command-file input-file
Runs the command file with groonga-benchmark alone and compares the execution result of each command with the input file. If there are differences other than non-essential elements such as processing time, the differences are written to a file named input-file.diff.
Command file
A command file is a text file with one groonga built-in command per line. There is no restriction on its extension. For groonga built-in commands, see /reference/command.
Sample
A sample script file:

# sample script
rep_local test.ddl
do_local test.load;
do_gqtp test.select 10 10; do_local test.status 10

The meaning of the above is:
Line 1: a comment line.
Line 2: runs the command file test.ddl with groonga alone and reports in detail.
Line 3: runs the command file test.load with groonga alone. (The trailing ';' semicolon is required when writing multiple groonga-benchmark directives on one line, but it is harmless when, as in this example, only one directive is given.)
Line 4: runs the command file test.select on a groonga server with 10 concurrent threads, each thread repeating the contents of test.select 10 times; at the same time, runs the command file test.status with groonga alone in 10 threads.
Special directives
Special commands can be embedded in comment lines of a script file. The following two special directives are currently supported.
#SET_HOST <ip/hostname>
Equivalent to the -i, --host option. If the IP address/host name given by SET_HOST differs from the one given on the command line, or if no command line option was given, SET_HOST takes precedence. As with the command line option, the server is not started automatically when SET_HOST is used.
#SET_PORT <port number>
Equivalent to the -p, --port option. If the port given by SET_PORT differs from the one given on the command line, or if no command line option was given, SET_PORT takes precedence.
Special directives can be written anywhere in the script file. If the same special directive appears more than once in a file, the last one wins.
For example, even if you specify a port on the command line:

$ ./groonga-benchmark --port 20010 test.scr testdb

if the contents of test.scr are:

#SET_PORT 10900
rep_local test.ddl
do_local test.load;
rep_gqtp test.select 10 10; rep_local test.status 10
#SET_PORT 10400

the automatically started groonga server uses port 10400.
groonga-benchmark results
When groonga-benchmark finishes normally, a log file named SCRIPTNAME-USERNAME-STARTTIME.log (without the script extension) is created in the current directory. The log file is automatically sent to ftp.groonga.org. It is JSON-formatted text like the following:

[{"script": "test.scr", "user": "homepage", "date": "2010-04-14 22:47:04",
  "CPU": "Intel(R) Pentium(R) 4 CPU 2.80GHz", "BIT": 32, "CORE": 1,
  "RAM": "975MBytes", "HDD": "257662232KBytes", "OS": "Linux 2.4.20-24.7-i686",
  "HOST": "localhost", "PORT": "10041", "VERSION": "0.1.8-100-ga54c5f8"},
 {"jobs": "rep_local test.ddl",
  "detail": [
   [0, "table_create res_table --key_type ShortText", 1490, 3086, [0,1271252824.25846,0.001447]],
   [0, "column_create res_table res_column --type Text", 3137, 5956, [0,1271252824.2601,0.002741]],
   [0, "column_create res_table user_column --type Text", 6020, 8935, [0,1271252824.26298,0.002841]],
   [0, "column_create res_table mail_column --type Text", 8990, 11925, [0,1271252824.26595,0.002861]],
   [0, "column_create res_table time_column --type Time", 12008, 13192, [0,1271252824.26897,0.001147]],
   [0, "status", 13214, 13277, [0,1271252824.27018,3.0e-05]],
   [0, "table_create thread_table --key_type ShortText", 13289, 14541, [0,1271252824.27025,0.001213]],
   [0, "column_create thread_table thread_title_column --type ShortText", 14570, 17380, [0,1271252824.27153,0.002741]],
   [0, "status", 17435, 17480, [0,1271252824.2744,2.7e-05]],
   [0, "table_create lexicon_table --flags 129 --key_type ShortText --default_tokenizer TokenBigram", 17491, 18970, [0,1271252824.27446,0.001431]],
   [0, "column_create lexicon_table inv_res_column 514 res_table res_column ", 18998, 33248, [0,1271252824.27596,0.01418]],
   [0, "column_create lexicon_table inv_thread_column 514 thread_table thread_title_column ", 33285, 48472, [0,1271252824.29025,0.015119]],
   [0, "status", 48509, 48554, [0,1271252824.30547,2.7e-05]]],
  "summary": [{"job": "rep_local test.ddl", "latency": 48607, "self": 47719, "qps": 272.428173, "min": 45, "max": 15187, "queries": 13}]},
 {"jobs": "do_local test.load; ",
  "summary": [{"job": "do_local test.load", "latency": 68693, "self": 19801, "qps": 1010.049997, "min": 202, "max": 5453, "queries": 20}]},
 {"jobs": "do_gqtp test.select 10 10; do_local test.status 10",
  "summary": [{"job": " do_local test.status 10", "latency": 805990, "self": 737014, "qps": 54.273053, "min": 24, "max": 218, "queries": 40},
              {"job": "do_gqtp test.select 10 10", "latency": 831495, "self": 762519, "qps": 1967.164097, "min": 73, "max": 135631, "queries": 1500}]},
 {"total": 915408, "qps": 1718.359464, "queries": 1573}]

Limitations
• Although multiple groonga-benchmark directives can be written on one line of a script file, the total number of threads is limited to at most 64.
• The maximum length of a groonga command in a command file is 5000000 bytes.
Troubleshooting
If groonga-benchmark does not work correctly, check the following first.
• Are you connected to the Internet? With the --ftp option, groonga-benchmark communicates with ftp.groonga.org on every run. If ftp.groonga.org is not reachable, groonga-benchmark does not work correctly.
• Is a groonga server already running? Unless a server is explicitly specified with the -i, --host option, groonga-benchmark automatically starts a groonga server on localhost. If a groonga server is already running, groonga-benchmark may not work correctly.
• Is the specified database appropriate? groonga-benchmark does not check the contents of the database given as an argument. If the specified database does not exist, it is created automatically, but if a file already exists at the path, groonga-benchmark keeps running regardless of its contents and the results may be abnormal.
If none of the above is the cause, the problem lies in groonga-benchmark or groonga. Please report it.
groonga-httpd
Summary
groonga-httpd is a program to communicate with a Groonga server using the HTTP protocol. It provides the same functionality as groonga-server-http. While groonga-server-http supports HTTP only minimally with a small built-in HTTP server, groonga-httpd supports HTTP fully with an embedded nginx. All standards compliance and features provided by nginx are also available in groonga-httpd. groonga-httpd has a Web-based administration tool implemented in HTML and JavaScript. You can access it at http://hostname:port/.
Synopsis

groonga-httpd [nginx options]

Usage
Set up
First, you'll need to edit the groonga-httpd configuration file to specify a database. Edit /etc/groonga/httpd/groonga-httpd.conf to enable the groonga_database directive like this:

# Match this to the file owner of groonga database files if groonga-httpd is
# run as root.
#user groonga;
...
http {
  ...
  # Don't change the location; currently only /d/ is supported.
  location /d/ {
    groonga on; # <= This means to turn on groonga-httpd.

    # Specify an actual database and enable this.
    groonga_database /var/lib/groonga/db/db;
  }
  ...
}

Then, run groonga-httpd.
Note that control immediately returns to the console, because groonga-httpd runs as a daemon process by default:

% groonga-httpd

Request queries
To check it, request a simple query (/reference/commands/status).
Execution example:

% curl http://localhost:10041/d/status
[
  [
    0,
    1337566253.89858,
    0.000355720520019531
  ],
  {
    "uptime": 0,
    "max_command_version": 2,
    "n_queries": 0,
    "cache_hit_rate": 0.0,
    "version": "4.0.1",
    "alloc_count": 161,
    "command_version": 1,
    "starttime": 1395806036,
    "default_command_version": 1
  }
]

Loading data by POST
You can load data by POSTing JSON data. Here is an example curl command line that loads two users alice and bob into the Users table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"

If you load users from a JSON file, prepare the JSON file like this:

[
  {"_key": "alice"},
  {"_key": "bob"}
]

Then specify the JSON file on the curl command line:

% curl -X POST 'http://localhost:10041/d/load?table=Users' -H 'Content-Type: application/json' -d @users.json

Browse the administration tool
You can also browse the Web-based administration tool at http://localhost:10041/.
Shut down
Finally, to terminate the running groonga-httpd daemon, run this:

% groonga-httpd -s stop

Configuration directives
This section describes only the important directives: the groonga-httpd specific directives and the performance related directives. The following directives can be used in the groonga-httpd configuration file. By default, it's located at /etc/groonga/httpd/groonga-httpd.conf.
Groonga-httpd specific directives
The following directives aren't provided by nginx. They are provided by groonga-httpd to configure groonga-httpd specific behavior.
groonga
Synopsis:

groonga on | off;

Default: groonga off;
Context: location
Specifies whether Groonga is enabled in the location block. The default is off; you need to specify on to enable Groonga. Examples:

location /d/ {
  groonga on; # Enables groonga under the /d/... path
}

location /d/ {
  groonga off; # Disables groonga under the /d/... path
}

groonga_database
Synopsis:

groonga_database /path/to/groonga/database;

Default: groonga_database /usr/local/var/lib/groonga/db/db;
Context: http, server, location
Specifies the path to a Groonga database. This directive is required.
groonga_database_auto_create
Synopsis:

groonga_database_auto_create on | off;

Default: groonga_database_auto_create on;
Context: http, server, location
Specifies whether the Groonga database is created automatically or not. If the value is on and the Groonga database specified by groonga_database doesn't exist, the Groonga database is created automatically. If the Groonga database exists, groonga-httpd does nothing. If the parent directory doesn't exist, the parent directory is also created recursively. The default value is on. Normally, the value doesn't need to be changed.
groonga_base_path
Synopsis:

groonga_base_path /d/;

Default: the same value as the location name.
Context: location
Specifies the base path in the URI. Groonga uses the /d/command?parameter1=value1&... form of path to run a command. This form of path is used in groonga-httpd, but groonga-httpd also supports the /other-prefix/command?parameter1=value1&... form. To support that form, groonga-httpd removes the base path from the head of the request URI and prepends /d/ to the processed request URI. With this path conversion, users can use a custom path prefix while Groonga always sees the /d/command?parameter1=value1&... form. Normally, this directive isn't needed.
It is needed for per-command configuration. Here is an example configuration that adds authorization to the /reference/commands/shutdown command:

groonga_database /var/lib/groonga/db/db;

location /d/shutdown {
  groonga on;
  # groonga_base_path is needed,
  # because /d/shutdown is handled as the base path.
  # Without this configuration, the /d/shutdown/shutdown path would be
  # required to run the shutdown command.
  groonga_base_path /d/;
  auth_basic "manager is required!";
  auth_basic_user_file "/etc/managers.htpasswd";
}

location /d/ {
  groonga on;
  # groonga_base_path isn't needed,
  # because the location name is the base path.
}

groonga_log_path
Synopsis:

groonga_log_path path | off;

Default: /var/log/groonga/httpd/groonga.log
Context: http, server, location
Specifies the Groonga log path in the http, server or location block. The default is /var/log/groonga/httpd/groonga.log. You can disable logging by specifying off. Examples:

location /d/ {
  groonga on;
  # You can disable the log for groonga.
  groonga_log_path off;
}

groonga_log_level
Synopsis:

groonga_log_level none | emergency | alert | critical | error | warning | notice | info | debug | dump;

Default: notice
Context: http, server, location
Specifies the Groonga log level in the http, server or location block. The default is notice. You can disable logging by specifying none as the log level. Examples:

location /d/ {
  groonga on;
  # You can customize the log level for groonga.
  groonga_log_level notice;
}

groonga_query_log_path
Synopsis:

groonga_query_log_path path | off;

Default: /var/log/groonga/httpd/groonga-query.log
Context: http, server, location
Specifies Groonga's query log path in the http, server or location block. The default is /var/log/groonga/httpd/groonga-query.log. You can disable logging by specifying off. Examples:

location /d/ {
  groonga on;
  # You can disable the query log for groonga.
  groonga_query_log_path off;
}

The query log is useful for the following cases:
• Detecting slow queries.
• Debugging.
You can analyze your query log with the groonga-query-log package. The package provides useful tools: for example, there is a tool that analyzes your query log and detects slow queries, and there is a tool that replays the queries in your query log, which can be used to test a new Groonga version before updating the production environment.
Performance related directives
The following directives are related to the performance of groonga-httpd.
worker_processes
For optimum performance, set this equal to the number of CPUs or cores. In many cases, Groonga queries are CPU-intensive work, so to fully utilize multi-CPU/core systems it's essential to set this accordingly. This isn't a groonga-httpd specific directive but an nginx one. For details, see http://wiki.nginx.org/CoreModule#worker_processes. By default, this is set to 1, which is nginx's default.
groonga_cache_limit
This directive customizes the query cache limit for each worker process. Synopsis:

groonga_cache_limit CACHE_LIMIT;

Default: 100
Context: http, server, location
Specifies Groonga's query cache limit in the http, server or location block. The default value is 100. You can disable the query cache by specifying 0 explicitly. Examples:

location /d/ {
  groonga on;
  # You can customize the query cache limit for groonga.
  groonga_cache_limit 100;
}

proxy_cache
In short, you can use nginx's reverse proxy and cache mechanism instead of Groonga's built-in query cache feature.
Query cache
Groonga has a query cache feature for the /reference/commands/select command. The feature improves performance in many cases.
proxy_cache
In short, you can use nginx's reverse proxy and cache mechanism instead of Groonga's built-in query cache feature.

Query cache
Groonga has a query cache feature for the /reference/commands/select command. The feature improves performance in many cases. The query cache works well with groonga-httpd unless you use the /reference/commands/cache_limit command with 2 or more workers. Normally, the /reference/commands/cache_limit command isn't used, so there is no problem in most cases.

Here is a description of the problem with using the /reference/commands/cache_limit command with 2 or more workers. Groonga's query cache is kept per process; this means that workers can't share the cache. If you don't change the cache size, this isn't a big problem. If you want to change the cache size with the /reference/commands/cache_limit command, there is a problem: there is no portable way to change the cache size for all workers. For example, there are 3 workers:

                                   +-- worker 1
client -- groonga-httpd (master) --+-- worker 2
                                   +-- worker 3

The client requests the /reference/commands/cache_limit command and worker 1 receives it:

                                   +-> worker 1 (changed!)
client -> groonga-httpd (master) --+-- worker 2
                                   +-- worker 3

The client requests the /reference/commands/cache_limit command again and worker 1 receives it again:

                                   +-> worker 1 (changed again!!!)
client -> groonga-httpd (master) --+-- worker 2
                                   +-- worker 3

In this case, worker 2 and worker 3 haven't received any requests, so they don't change their cache size. You can't choose a worker, so you can't change the cache size of all workers with the /reference/commands/cache_limit command.
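To see the problem concretely (a sketch; which worker answers each request is not under your control), repeated cache_limit requests may be handled by different workers and so report inconsistent values:

% curl 'http://localhost:10041/d/cache_limit?max=50'   # handled by, say, worker 1
% curl 'http://localhost:10041/d/cache_limit'          # may still report 100 if worker 2 answers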
Reverse proxy and cache
You can use nginx's reverse proxy and cache feature for the query cache:

                                                            +-- worker 1
client -- groonga-httpd (master) -- reverse proxy + cache --+-- worker 2
                                                            +-- worker 3

You can use the same cache configuration for all workers, but you can't change the cache configuration dynamically by HTTP. Here is a sample configuration:

...
http {
  proxy_cache_path /var/cache/groonga-httpd levels=1:2 keys_zone=groonga:10m;
  proxy_cache_valid 10m;
  ...
  # Reverse proxy and cache
  server {
    listen 10041;
    ...
    # Only select command
    location /d/select {
      # Pass through groonga with cache
      proxy_cache groonga;
      proxy_pass http://localhost:20041;
    }
    location / {
      # Pass through groonga
      proxy_pass http://localhost:20041;
    }
  }
  # groonga
  server {
    listen 20041;
    location /d/ {
      groonga on;
      groonga_database /var/lib/groonga/db/db;
    }
  }
  ...
}

See the following nginx documentation for parameter details:
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
• http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass

Note that you need to remove the cache files created by nginx by hand after you load new data into Groonga. For the above sample configuration, run the following commands to remove the cache files:

% groonga DB_PATH < load.grn
% rm -rf /var/cache/groonga-httpd/*

If you use Groonga's query cache feature, you don't need to expire the cache by hand; it is done automatically.

Available nginx modules
All standard HTTP modules are available. HttpRewriteModule is disabled when you don't have PCRE (Perl Compatible Regular Expressions). For the list of standard HTTP modules, see http://wiki.nginx.org/Modules.

Groonga HTTP server
Name
Groonga HTTP server
Synopsis
groonga -d --protocol http DB_PATH
Summary
You can communicate over HTTP by specifying http as the --protocol option. If you also specify a static page path with --document-root, the file under that path which corresponds to the URI of the HTTP request is output. Groonga has a Web-based administration tool implemented in HTML and JavaScript. If you don't specify --document-root, the installed path of the administration tool is used, so you can use the administration tool by accessing http://HOSTNAME:PORT/ in a Web browser.

Command
You can use the same Groonga commands against a Groonga server started with http as against Groonga started in the other modes. A command takes arguments, and each argument has a name. There are also the special arguments output_type and command_version. In standalone mode or client mode, a command is specified in the following formats:

Format 1: COMMAND_NAME VALUE1 VALUE2,..
Format 2: COMMAND_NAME --PARAMETER_NAME1 VALUE1 --PARAMETER_NAME2 VALUE2,..

Format 1 and Format 2 can be mixed. The output type is specified by output_type in these formats. In HTTP server mode, a command is specified in the following format:

Format: /d/COMMAND_NAME.OUTPUT_TYPE?ARGUMENT_NAME1=VALUE1&ARGUMENT_NAME2=VALUE2&...

Note that command names, argument names and values must be URL-encoded. You can use the GET method only. You can specify JSON, TSV or XML as the output type. command_version is specified for command specification compatibility. See /reference/command/command_version for details.
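For example, the following Format 2 command and HTTP request are equivalent (it assumes a Users table exists; note that the : in the query value is URL-encoded as %3A):

select --table Users --query _key:alice

% curl 'http://localhost:10041/d/select?table=Users&query=_key%3Aalice'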
Return value
The execution result is output according to the output type specified for the command.

groonga-suggest-create-dataset
NAME
groonga-suggest-create-dataset - Defines the schema for a suggestion dataset
SYNOPSIS
groonga-suggest-create-dataset [options] DATABASE DATASET
DESCRIPTION
groonga-suggest-create-dataset creates a dataset for /reference/suggest. A database can have many datasets. This command just defines the schema for a suggestion dataset. This command generates some tables and columns for /reference/suggest. Here is the list of such tables. If you specify 'query' as the dataset name, the '_DATASET' suffix below is replaced; thus, the 'item_query', 'pair_query', 'sequence_query' and 'event_query' tables are generated.
• event_type
• bigram
• kana
• item_DATASET
• pair_DATASET
• sequence_DATASET
• event_DATASET
• configuration
OPTIONS
None.
EXIT STATUS
TODO
FILES
TODO
EXAMPLE
TODO
SEE ALSO
/reference/suggest groonga-suggest-httpd groonga-suggest-learner

groonga-suggest-httpd
Summary
groonga-suggest-httpd is a program that provides an interface which accepts HTTP requests, returns suggestion results, and saves logs for learning. In terms of suggestion functionality it behaves similarly to /reference/suggest, but the parameter names are different.
Synopsis
groonga-suggest-httpd [options] database_path
Usage
Set up
First you need to set up a database for suggestion. Execution example:
% groonga-suggest-create-dataset /tmp/groonga-databases/groonga-suggest-httpd query
Launch groonga-suggest-httpd
Execute the groonga-suggest-httpd command. Execution example:
% groonga-suggest-httpd /tmp/groonga-databases/groonga-suggest-httpd
After executing the above command, groonga-suggest-httpd accepts HTTP requests on port 8080. If you just want to save requests into a log file, use the -l option. Here is an example that saves log files under the logs directory with the log prefix for each file:
% groonga-suggest-httpd -l logs/log /tmp/groonga-databases/groonga-suggest-httpd
Under the logs directory, log files such as logYYYYmmddHHMMSS-00 are created.

Request to groonga-suggest-httpd
Here are sample requests that teach the query dataset the word groonga:
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=92619&t=complete&q=g'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=93850&t=complete&q=gr'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94293&t=complete&q=gro'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94734&t=complete&q=groo'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95147&t=complete&q=grooon'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95553&t=complete&q=groonga'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95959&t=submit&q=groonga'

Options
-p, --port Specifies the HTTP server port number. The default value is 8080.
-t, --n-threads Specifies the number of threads. The default value is 8. This option accepts a maximum value of 128, but use the number of CPU cores for performance.
-s, --send-endpoint Specifies the endpoint for the sender.
-r, --receive-endpoint Specifies the endpoint for the receiver.
-l, --log-base-path Specifies the path prefix of logs.
--n-lines-per-log-file Specifies the number of lines in a log file. The default value is 1,000,000.
-d, --daemon Specify this option to daemonize.
--disable-max-fd-check Specify this option to disable checking the max fd on start.

Command line parameters
There is one required parameter - database_path.
database_path Specifies the path to a Groonga database. This database must be created by the groonga-suggest-create-dataset command because that command performs the initialization required for suggestion.

GET parameters
groonga-suggest-httpd accepts the following GET parameters. Some parameters are required, depending on the type of query.

Required parameters
┌────┬──────────────────────────┬──────┐
│Key │ Description              │ Note │
├────┼──────────────────────────┼──────┤
│q   │ UTF-8 encoded string     │      │
│    │ which the user fills in  │      │
│    │ the form                 │      │
├────┼──────────────────────────┼──────┤
│t   │ The type of query. The   │      │
│    │ value of type must be    │      │
│    │ complete, correct,       │      │
│    │ suggest or submit. It    │      │
│    │ also accepts multiple    │      │
│    │ query types concatenated │      │
│    │ by |. Note that submit   │      │
│    │ is an invalid value when │      │
│    │ you specify multiple     │      │
│    │ query types.             │      │
└────┴──────────────────────────┴──────┘

Required parameters for learning
┌────┬──────────────────────────┬──────────────────────────┐
│Key │ Description              │ Note                     │
├────┼──────────────────────────┼──────────────────────────┤
│s   │ Elapsed time since 0:00  │ Note that you need to    │
│    │ January 1, 1970          │ specify the value of s   │
│    │                          │ in milliseconds          │
├────┼──────────────────────────┼──────────────────────────┤
│i   │ Unique ID to distinguish │ Use a session ID or IP   │
│    │ users                    │ address, for example     │
├────┼──────────────────────────┼──────────────────────────┤
│l   │ Specifies the name of    │ Note that the dataset    │
│    │ the dataset for          │ name must match the      │
│    │ learning. It also        │ following regular        │
│    │ accepts multiple dataset │ expression:              │
│    │ names concatenated by |  │ [A-Za-z ][A-Za-z0-9 ]    │
│    │                          │ {0,15}                   │
└────┴──────────────────────────┴──────────────────────────┘

Required parameters for suggestion
┌────┬──────────────────────────┬──────────────────────────┐
│Key │ Description              │ Note                     │
├────┼──────────────────────────┼──────────────────────────┤
│n   │ Specifies the name of    │ This dataset name is     │
│    │ the dataset for          │ used to calculate        │
│    │ suggestion               │ suggestion results       │
└────┴──────────────────────────┴──────────────────────────┘

Optional parameter
┌─────────┬──────────────────────────┬──────────────────────────┐
│Key      │ Description              │ Note                     │
├─────────┼──────────────────────────┼──────────────────────────┤
│callback │ Specifies the name of a  │ The name of the function │
│         │ function if you prefer   │ must match the regular   │
│         │ JSONP as the response    │ expression               │
│         │ format                   │ [A-Za-z ][A-Za-z0-9 ]    │
│         │                          │ {0,15}                   │
└─────────┴──────────────────────────┴──────────────────────────┘

Return value
groonga-suggest-httpd returns the following response in JSON or JSONP format.
In JSON format:
{TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]}
In JSONP format:
FUNCTION({TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]})
TYPE One of complete, correct and suggest.
CANDIDATE_N The string of the candidate (UTF-8).
SCORE_N The score.
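For example, a completion request and an illustrative response might look like the following (the candidate and score values are invented; actual results depend on the learned data):

% curl 'http://localhost:8080/?q=g&t=complete&n=query'
{"complete": [["groonga", 100]]}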
groonga-suggest-learner
Summary
groonga-suggest-learner is a program that learns suggestion data from the data sent by groonga-suggest-httpd. Usually it is used together with groonga-suggest-httpd, but it can also be launched standalone. In that case, groonga-suggest-learner loads data from a log directory.
Synopsis
groonga-suggest-learner [options] database_path
Usage
groonga-suggest-learner supports two ways of learning data: learning data from groonga-suggest-httpd, and learning data from already existing log files.
Learning data from groonga-suggest-httpd
Execute groonga-suggest-learner:
groonga-suggest-learner testdb/db
Learning data from log files
Execute groonga-suggest-learner with the -l option. Here is a sample that loads log data under the logs directory:
groonga-suggest-learner -l logs testdb/db
Options
-r <endpoint>, --receive-endpoint <endpoint> Uses <endpoint> as the receiver endpoint.
-s <endpoint>, --send-endpoint <endpoint> Uses <endpoint> as the sender endpoint.
-d, --daemon Runs as a daemon.
-l <directory>, --log-base-path <directory> Reads logs from <directory>.
--log-path <path> Outputs the log to <path>.
--log-level <level> Uses <level> as the log level. <level> must be between 1 and 9. A larger level outputs more logs.
Parameters
There is one required parameter - database_path.
database_path Specifies the path to a Groonga database.
Related tables
Here is the list of tables in which learned data is stored. If you specify query as the dataset name, the _DATASET suffix below is replaced; thus, the event_query table is used.
• event_DATASET

Output
Groonga supports the following output format types:
• JSON
• XML
• TSV (Tab Separated Values)
• MessagePack
JSON is the default output format.
Usage
Groonga has the following query interfaces:
• command line
• HTTP
They provide different ways to change the output format type.
Command line
You can use the command line query interface by groonga DB_PATH or groonga -c. These groonga commands show a > prompt. In this query interface, you can specify the output format type by the output_type option.
If you don't specify output_type option, you will get a result in JSON format: > status [[0,1327721628.10738,0.000131845474243164],{"alloc_count":142,"starttime":1327721626,"uptime":2,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}] You can specify json as output_type value to get a result in JSON format explicitly: > status --output_type json [[0,1327721639.08321,7.93933868408203e-05],{"alloc_count":144,"starttime":1327721626,"uptime":13,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}] You need to specify xml as output_type value to get a result in XML format: > status --output_type xml <?xml version="1.0" encoding="utf-8"?> <RESULT CODE="0" UP="1327721649.61095" ELAPSED="0.000126361846923828"> <RESULT> <TEXT>alloc_count</TEXT> <INT>146</INT> <TEXT>starttime</TEXT> <INT>1327721626</INT> <TEXT>uptime</TEXT> <INT>23</INT> <TEXT>version</TEXT> <TEXT>1.2.9-92-gb87d9f8</TEXT> <TEXT>n_queries</TEXT> <INT>0</INT> <TEXT>cache_hit_rate</TEXT> <FLOAT>0.0</FLOAT> <TEXT>command_version</TEXT> <INT>1</INT> <TEXT>default_command_version</TEXT> <INT>1</INT> <TEXT>max_command_version</TEXT> <INT>2</INT></RESULT> </RESULT> You need to specify tsv as output_type value to get a result in TSV format: > status --output_type tsv 0 1327721664.82675 0.000113964080810547 "alloc_count" 146 "starttime" 1327721626 "uptime" 38 "version" "1.2.9-92-gb87d9f8" "n_queries" 0 "cache_hit_rate" 0.0 "command_version" 1 "default_command_version" 1 "max_command_version" 2 END You need to specify msgpack as output_type value to get a result in MessagePack format: > status --output_type msgpack (... omitted because MessagePack is binary data format. ...) HTTP You can use HTTP query interface by groonga --protocol http -s DB_PATH. Groonga HTTP server starts on port 10041 by default. In this query interface, you can specify output format type by extension. 
If you don't specify an extension, you will get a result in JSON format:

% curl http://localhost:10041/d/status
[[0,1327809294.54311,0.00082087516784668],{"alloc_count":155,"starttime":1327809282,"uptime":12,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You can specify json as the extension to get a result in JSON format explicitly:

% curl http://localhost:10041/d/status.json
[[0,1327809319.01929,9.5367431640625e-05],{"alloc_count":157,"starttime":1327809282,"uptime":37,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You need to specify xml as the extension to get a result in XML format:

% curl http://localhost:10041/d/status.xml
<?xml version="1.0" encoding="utf-8"?> <RESULT CODE="0" UP="1327809339.5782" ELAPSED="9.56058502197266e-05"> <RESULT> <TEXT>alloc_count</TEXT> <INT>159</INT> <TEXT>starttime</TEXT> <INT>1327809282</INT> <TEXT>uptime</TEXT> <INT>57</INT> <TEXT>version</TEXT> <TEXT>1.2.9-92-gb87d9f8</TEXT> <TEXT>n_queries</TEXT> <INT>0</INT> <TEXT>cache_hit_rate</TEXT> <FLOAT>0.0</FLOAT> <TEXT>command_version</TEXT> <INT>1</INT> <TEXT>default_command_version</TEXT> <INT>1</INT> <TEXT>max_command_version</TEXT> <INT>2</INT></RESULT> </RESULT>

You need to specify tsv as the extension to get a result in TSV format:

% curl http://localhost:10041/d/status.tsv
0 1327809366.84187 8.44001770019531e-05 "alloc_count" 159 "starttime" 1327809282 "uptime" 84 "version" "1.2.9-92-gb87d9f8" "n_queries" 0 "cache_hit_rate" 0.0 "command_version" 1 "default_command_version" 1 "max_command_version" 2 END

You need to specify msgpack as the extension to get a result in MessagePack format:

% curl http://localhost:10041/d/status.msgpack
(... omitted because MessagePack is binary data format. ...)

Command
A command is the most important processing unit in the query API. You request processing from Groonga by a command. This section describes commands and the built-in commands.
Command version
Summary
Groonga 1.1 introduces the concept of a command version. The command version expresses the compatibility of the specification of Groonga commands such as select and load. Even when the Groonga package version changes, compatibility is guaranteed for all commands as long as the same command version is available. If the command version differs, commands with the same name may behave incompatibly.

A given version of Groonga supports two command versions at the same time. The command version to use can be specified by giving the default-command-version parameter as a command line option when starting groonga, or in the config file. It can also be specified per command by giving the command_version parameter when executing that command.

Command versions start at 1 and increase by 1 with each update. The current specification of Groonga commands is treated as command-version 1. The next Groonga release will support both command-version 1 and command-version 2.

Version status
A command version supported by a given version of Groonga has one of the following statuses: develop, stable or deprecated.
develop Still under development; the specification may change.
stable Usable and the specification is stable; recommended at that point in time.
deprecated Usable and the specification is stable, but scheduled for removal; its use is not recommended.
Of the two command versions supported by a given version of Groonga, exactly one is always stable. The other is either develop or deprecated. For example, the command versions supported by Groonga transition as follows:

groonga1.1: command-version1=stable command-version2=develop
groonga1.2: command-version1=deprecated command-version2=stable
groonga1.3: command-version2=stable command-version3=develop
groonga1.4: command-version2=deprecated command-version3=stable
groonga1.5: command-version3=stable command-version4=develop

A command version is first released with develop status and later becomes stable. Two generations after that, the command version becomes deprecated. When the next command version is released, the formerly deprecated command version becomes unsupported.

When a groonga command is executed without the default-command-version parameter or the command_version parameter, the command version that is stable at that time is assumed. If you start a groonga process with a default-command-version that is not stable, a warning message is written to the log file. If you specify a command version that is outside the supported range, an error occurs and the process stops immediately.

How to specify the command version
The command version can be specified as an argument of the groonga executable or as an argument of each command.
default-command-version parameter
You can specify the default-command-version parameter as an argument of the groonga executable (it can also be specified in the config file). Execution example:
groonga --default-command-version 1
The specified version is used as the default command version for all commands executed in that process. If the specified command version is stable, groonga starts without any message. If the specified command version is develop or deprecated, a warning message is written to the groonga.log file. If the specified command version is unsupported, an error message is written to standard error output and the process exits immediately.
command_version parameter
command_version can be specified for every groonga command, such as select and load. Execution example:
select --command_version 1 --table tablename
The command is executed with the specified command version. If the specified command version is unsupported, an error is returned. If command_version is not specified, the value given to default-command-version at process startup is assumed.

Output format
Summary
Commands output their result in JSON, MessagePack, XML or TSV format. JSON and MessagePack outputs have the same structure. XML and TSV have their own structures. JSON or MessagePack is the recommended format. XML is useful for visual result checks. TSV is just for special uses; normally you don't need to use TSV.
JSON and MessagePack
This section describes the structure of a command result in JSON and MessagePack format. JSON is used to show the structure because MessagePack is a binary format, and a binary format isn't suitable for documentation.
JSON and MessagePack use the following structure:
[HEADER, BODY]
For example:

[
  [0, 1337566253.89858, 0.000355720520019531],
  [[[1],
    [["_id", "UInt32"], ["_key", "ShortText"], ["content", "Text"], ["n_likes", "UInt32"]],
    [2, "Groonga", "I started to use groonga. It's very fast!", 10]]]
]
It's very fast!", 10 ] ] ] ] In the example, the following part is HEADER: [ 0, 1337566253.89858, 0.000355720520019531 ] The following part is BODY: [ [ [ 1 ], [ [ "_id", "UInt32" ], [ "_key", "ShortText" ], [ "content", "Text" ], [ "n_likes", "UInt32" ] ], [ 2, "Groonga", "I started to use groonga. It's very fast!", 10 ] ] ] HEADER HEADER is an array. The content of HEADER has some patterns. Success case HEADER has three elements on success: [0, UNIX_TIME_WHEN_COMMAND_IS_STARTED, ELAPSED_TIME] The first element is always 0. UNIX_TIME_WHEN_COMMAND_IS_STARTED is the number of seconds since 1970-01-01 00:00:00 UTC when the command is started processing. ELAPSED_TIME is the elapsed time for processing the command in seconds. Both UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are float value. The precision of them are nanosecond. Error case HEADER has four or five elements on error: [ RETURN_CODE, UNIX_TIME_WHEN_COMMAND_IS_STARTED, ELAPSED_TIME, ERROR_MESSAGE, ERROR_LOCATION ] ERROR_LOCATION may not be included in HEADER but other four elements are always included. RETURN_CODE is non 0 value. See return_code about available return codes. UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are the same as success case. ERROR_MESSAGE is an error message in string. ERROR_LOCATION is optional. If error location is collected, ERROR_LOCATION is included. ERROR_LOCATION is an array. ERROR_LOCATION has one ore two elements: [ LOCATION_IN_GROONGA, LOCATION_IN_INPUT ] LOCATION_IN_GROONGA is the source location that error is occurred in groonga. It is useful for groonga developers but not useful for users. LOCATION_IN_GROONGA is an array. LOCATION_IN_GROONGA has three elements: [ FUNCTION_NAME, SOURCE_FILE_NAME, LINE_NUMBER ] FUNCTION_NAME is the name of function that error is occurred. SOURCE_FILE_NAME is the name of groonga's source file that error is occurred. LINE_NUMBER is the line number of SOURCE_FILE_NAME that error is occurred. LOCATION_IN_INPUT is optional. LOCATION_IN_INPUT is included when the location that error is occurred in input file is collected. Input file can be specified by --file command line option for groonga command. LOCATION_IN_GROONGA is an array. LOCATION_IN_GROONGA has three elements: [ INPUT_FILE_NAME, LINE_NUMBER, LINE_CONTENT ] INPUT_FILE_NAME is the input file name that error is occurred. LINE_NUMBER is the line number of INPUT_FILE_NAME that error is occurred. LINE_CONTENT is the content at LINE_NUMBER in INPUT_FILE_NAME. BODY BODY content depends on the executed command. It may be omitted. BODY may be an error message on error case. XML TODO TSV TODO See also • return_code describes about return code. Pretty print Summary New in version 5.1.0. Groonga supports pretty print when you choose JSON for output_format. Usage Just specify yes to output_pretty parameter: > status --output_pretty yes [ [ 0, 1448344438.43783, 5.29289245605469e-05 ], { "alloc_count": 233, "starttime": 1448344437, "start_time": 1448344437, "uptime": 1, "version": "5.0.9-135-g0763d91", "n_queries": 0, "cache_hit_rate": 0.0, "command_version": 1, "default_command_version": 1, "max_command_version": 2 } ] Here is a result without output_pretty parameter: > status [[0,1448344438.43783,5.29289245605469e-05],{"alloc_count":233,"starttime":1448344437,...}] Request ID Summary New in version 4.0.9. You can assign ID to each request. The ID can be used by canceling the request. See also /reference/commands/request_cancel for details about canceling a request. Request ID should be managed by user. 
BODY
The BODY content depends on the executed command. It may be omitted. BODY may be an error message in the error case.
XML
TODO
TSV
TODO
See also
• return_code describes the return code.

Pretty print
Summary
New in version 5.1.0. Groonga supports pretty printing when you choose JSON as the output_format.
Usage
Just specify yes to the output_pretty parameter:

> status --output_pretty yes
[
  [
    0,
    1448344438.43783,
    5.29289245605469e-05
  ],
  {
    "alloc_count": 233,
    "starttime": 1448344437,
    "start_time": 1448344437,
    "uptime": 1,
    "version": "5.0.9-135-g0763d91",
    "n_queries": 0,
    "cache_hit_rate": 0.0,
    "command_version": 1,
    "default_command_version": 1,
    "max_command_version": 2
  }
]

Here is a result without the output_pretty parameter:

> status
[[0,1448344438.43783,5.29289245605469e-05],{"alloc_count":233,"starttime":1448344437,...}]

Request ID
Summary
New in version 4.0.9. You can assign an ID to each request. The ID can be used to cancel the request. See /reference/commands/request_cancel for details about canceling a request. Request IDs should be managed by the user. If you assign the same ID to multiple running requests, you can't cancel the request. The simplest ID sequence is incrementing numbers such as 1, 2, .... A request ID is a string. The maximum request ID size is 4096 bytes.
How to assign an ID to a request
All commands accept the request_id parameter. You can assign an ID to a request by adding the request_id parameter. Here is an example that assigns the ID id-1 to a request:
select Users --request_id id-1
See also
• /reference/commands/request_cancel

Return code
Summary
The return code shows whether processing succeeded or not. If the processing did not succeed, the return code shows the error type. The return code is used in the C API and the query API. You can check the return code via grn_ctx_t::rc in the C API. You can check the return code by looking at the header element in the query API. See output_format about the header element in the query API.
List
Here is the list of return codes. GRN_SUCCESS (= 0) means that the processing succeeded. Return codes with negative values show the error type. GRN_END_OF_DATA is a special return code: it is used only in the C API and is not shown in the query API.
• 0: GRN_SUCCESS
• 1: GRN_END_OF_DATA
• -1: GRN_UNKNOWN_ERROR
• -2: GRN_OPERATION_NOT_PERMITTED
• -3: GRN_NO_SUCH_FILE_OR_DIRECTORY
• -4: GRN_NO_SUCH_PROCESS
• -5: GRN_INTERRUPTED_FUNCTION_CALL
• -6: GRN_INPUT_OUTPUT_ERROR
• -7: GRN_NO_SUCH_DEVICE_OR_ADDRESS
• -8: GRN_ARG_LIST_TOO_LONG
• -9: GRN_EXEC_FORMAT_ERROR
• -10: GRN_BAD_FILE_DESCRIPTOR
• -11: GRN_NO_CHILD_PROCESSES
• -12: GRN_RESOURCE_TEMPORARILY_UNAVAILABLE
• -13: GRN_NOT_ENOUGH_SPACE
• -14: GRN_PERMISSION_DENIED
• -15: GRN_BAD_ADDRESS
• -16: GRN_RESOURCE_BUSY
• -17: GRN_FILE_EXISTS
• -18: GRN_IMPROPER_LINK
• -19: GRN_NO_SUCH_DEVICE
• -20: GRN_NOT_A_DIRECTORY
• -21: GRN_IS_A_DIRECTORY
• -22: GRN_INVALID_ARGUMENT
• -23: GRN_TOO_MANY_OPEN_FILES_IN_SYSTEM
• -24: GRN_TOO_MANY_OPEN_FILES
• -25: GRN_INAPPROPRIATE_I_O_CONTROL_OPERATION
• -26: GRN_FILE_TOO_LARGE
• -27: GRN_NO_SPACE_LEFT_ON_DEVICE
• -28: GRN_INVALID_SEEK
• -29: GRN_READ_ONLY_FILE_SYSTEM
• -30: GRN_TOO_MANY_LINKS
• -31: GRN_BROKEN_PIPE
• -32: GRN_DOMAIN_ERROR
• -33: GRN_RESULT_TOO_LARGE
• -34: GRN_RESOURCE_DEADLOCK_AVOIDED
• -35: GRN_NO_MEMORY_AVAILABLE
• -36: GRN_FILENAME_TOO_LONG
• -37: GRN_NO_LOCKS_AVAILABLE
• -38: GRN_FUNCTION_NOT_IMPLEMENTED
• -39: GRN_DIRECTORY_NOT_EMPTY
• -40: GRN_ILLEGAL_BYTE_SEQUENCE
• -41: GRN_SOCKET_NOT_INITIALIZED
• -42: GRN_OPERATION_WOULD_BLOCK
• -43: GRN_ADDRESS_IS_NOT_AVAILABLE
• -44: GRN_NETWORK_IS_DOWN
• -45: GRN_NO_BUFFER
• -46: GRN_SOCKET_IS_ALREADY_CONNECTED
• -47: GRN_SOCKET_IS_NOT_CONNECTED
• -48: GRN_SOCKET_IS_ALREADY_SHUTDOWNED
• -49: GRN_OPERATION_TIMEOUT
• -50: GRN_CONNECTION_REFUSED
• -51: GRN_RANGE_ERROR
• -52: GRN_TOKENIZER_ERROR
• -53: GRN_FILE_CORRUPT
• -54: GRN_INVALID_FORMAT
• -55: GRN_OBJECT_CORRUPT
• -56: GRN_TOO_MANY_SYMBOLIC_LINKS
• -57: GRN_NOT_SOCKET
• -58: GRN_OPERATION_NOT_SUPPORTED
• -59: GRN_ADDRESS_IS_IN_USE
• -60: GRN_ZLIB_ERROR
• -61: GRN_LZO_ERROR
• -62: GRN_STACK_OVER_FLOW
• -63: GRN_SYNTAX_ERROR
• -64: GRN_RETRY_MAX
• -65: GRN_INCOMPATIBLE_FILE_FORMAT
• -66: GRN_UPDATE_NOT_ALLOWED
• -67: GRN_TOO_SMALL_OFFSET
• -68: GRN_TOO_LARGE_OFFSET
• -69: GRN_TOO_SMALL_LIMIT
• -70: GRN_CAS_ERROR
• -71: GRN_UNSUPPORTED_COMMAND_VERSION
See also
• output_format shows where the return code appears in a query API response.
• /spec/gqtp: the GQTP protocol also uses the return code as its status, but it uses a 2-byte unsigned integer, so return codes that have a negative value are statuses that have a positive value in the GQTP protocol. You can convert a status value in the GQTP protocol to a return code by handling it as a 2-byte signed integer.
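For example (a worked illustration): GQTP status 65514, interpreted as a 2-byte signed integer, is 65514 - 65536 = -22, which is GRN_INVALID_ARGUMENT.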
cache_limit
Summary
cache_limit gets or sets the max number of query cache entries. The query cache is used only by the select command. If the max number of query cache entries is 100, only the 100 most recent select commands are cached. The cache expiry algorithm is LRU (least recently used).
Syntax
This command takes only one optional parameter:
cache_limit [max=null]
Usage
You can get the current max number of cache entries by executing cache_limit without a parameter. Execution example:
cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 100]
You can set the max number of cache entries by executing cache_limit with the max parameter. Here is an example that sets 10 as the max number of cache entries. Execution example:
cache_limit 10
# [[0, 1337566253.89858, 0.000355720520019531], 100]
cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 10]
If the max parameter is used, the return value is the max number of cache entries before the max parameter was set.
Parameters
This section describes all parameters.
max
Specifies the max number of query cache entries as a number. If the max parameter isn't specified, the current max number of query cache entries isn't changed; cache_limit just returns the current max number of query cache entries.
Return value
cache_limit returns the current max number of query cache entries:
[HEADER, N_ENTRIES]
HEADER
See /reference/command/output_format about HEADER.
N_ENTRIES
N_ENTRIES is the current max number of query cache entries. It is a number.
See also
• select
check
Summary
check - displays the status of an object.
This section describes check, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via arguments of the groonga executable, standard input, or a socket.
The check command displays the status of the specified object in the groonga process. It is mainly intended for troubleshooting abnormal situations, such as a corrupted database. Because it is for debugging, the format of the return value is not guaranteed to be stable (the format is likely to change).
Syntax
check obj
Usage
Display the status of the index column name of the table Terms:
check Terms.name
[{"flags":"00008202", "max sid":1, "number of garbage segments":0, "number of array segments":1, "max id of array segment":1, "number of buffer segments":110, "max id of buffer segment":111, "max id of physical segment in use":111, "number of unmanaged segments":4294967185, "total chunk size":7470239, "max id of chunk segments in use":127, "number of garbage chunk":[0,0,0,0,0,0,0,0,2,2,0,0,0,0,0]}, {"buffer id":0, "chunk size":94392, "buffer term":["596","59777","6",...], "buffer free":152944, "size in buffer":7361, "nterms":237, "nterms with chunk":216, "buffer id":1, "chunk size":71236, "buffer term":[["に述",18149,18149,2,25,6,6], ["に追",4505,4505,76,485,136,174], ["に退",26568,26568,2,9,2,2], ...], "buffer free":120000, "size in buffer":11155, "nterms":121, "nterms with chunk":116}, {"buffer id":1, ...}, ...]
Parameters
obj Specifies the name of the object whose status is displayed.
Return value
The returned value depends on the object being checked. For an index column, an array like the following is output:
[index status, buffer status 1, buffer status 2, ...]
The index status contains the following items in hash form:
flags The specified flag values, expressed in hexadecimal.
max sid The largest ID among the segments.
number of garbage segments The number of garbage segments.
number of array segments The number of array segments.
max id of array segment The largest ID among the array segments.
number of buffer segments The number of buffer segments.
max id of buffer segment The largest ID among the buffer segments.
max id of physical segment in use The largest ID among the logical segments in use.
number of unmanaged segments The number of unmanaged segments.
total chunk size The total chunk size.
max id of chunk segments in use The largest ID among the chunk segments in use.
number of garbage chunk The number of garbage entries per chunk.
The buffer status contains the following items in hash form:
buffer id The buffer ID.
chunk size The size of the chunk.
buffer term The list of terms in the buffer. The status of each term is an array of the form: [term, ID of the term registered in the buffer, ID of the term registered in the lexicon, size in the buffer, size in the chunk].
buffer free The free space in the buffer.
size in buffer The amount of the buffer in use.
nterms The number of terms in the buffer.
nterms with chunk The number of terms in the buffer that use a chunk.

clearlock
Summary
Deprecated since version 4.0.9: use lock_clear instead.
clearlock - releases the locks set on an object.
This section describes clearlock, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via arguments of the groonga executable, standard input, or a socket.
clearlock takes a target object (database, table, index, etc.) and recursively releases the locks set on that object.
Syntax
clearlock objname
Usage
Release all locks on the open database:
clearlock
[true]
Release the lock on the column body of the table Entry:
clearlock Entry.body
[true]
Parameters
objname Specifies the name of the target object. If it is empty, the open db object is the target.
Return value
[success flag]
success flag Returns true if no error occurred and false if an error occurred.
See also
load

column_copy
Summary
New in version 5.0.7. column_copy copies all column values to another column. You can implement the following features with this command:
• Changing column configuration
• Changing table configuration
You can change column configuration by the following steps:
1. Create a new column with the new configuration
2. Copy all values from the current column to the new column
3. Remove the current column
4. Rename the new column to the current column
You can change table configuration by the following steps:
1. Create a new table with the new configuration
2. Create all the same columns in the new table
3. Copy all column values from the current table to the new table
4. Remove the current table
5. Rename the new table to the current table
Concrete examples are shown later.
You can't copy column values from a TABLE_NO_KEY table to another table, and you can't copy column values to a TABLE_NO_KEY table from another table, because Groonga can't map records without a record key. You can copy column values from a TABLE_NO_KEY table to the same TABLE_NO_KEY table. You can copy column values from a TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table to the same or another TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table.
Syntax
This command takes four parameters. All parameters are required:
column_copy from_table from_name to_table to_name
Usage
Here are the use cases of this command:
• Changing column configuration
• Changing table configuration
How to change column configuration
You can change the column value type. For example, you can change a UInt32 column value to a ShortText column value. You can change the column type. For example, you can change a COLUMN_SCALAR column to a COLUMN_VECTOR column. You can move a column to another table. For example, you can move a high_score column to a Users table from a Players table.
Here are the basic steps to change column configuration:
1. Create a new column with the new configuration
2. Copy all values from the current column to the new column
3. Remove the current column
4. Rename the new column to the current column
Here is an example to change the column value type to ShortText from Int32. Here are the schema and data:
Execution example:
table_create Logs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs serial COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs [ {"_key": "log1", "serial": 1} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
The following commands change the Logs.serial column value type to ShortText from Int32:
Execution example:
column_create Logs new_serial COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Logs serial Logs new_serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Logs serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Logs new_serial serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Logs
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_id", "UInt32"], ["_key", "ShortText"], ["serial", "ShortText"]], [1, "log1", "1"]]]]
You can see from the response of select that Logs.serial now stores ShortText values.
Here is an example to change the column type to COLUMN_VECTOR from COLUMN_SCALAR. Here are the schema and data:
Execution example:
table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries [ {"_key": "entry1", "tag": "Groonga"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
The following commands change the Entries.tag column to COLUMN_VECTOR from COLUMN_SCALAR:
Execution example:
column_create Entries new_tag COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Entries tag Entries new_tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Entries tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Entries new_tag tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Entries
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_id", "UInt32"], ["_key", "ShortText"], ["tag", "ShortText"]], [1, "entry1", ["Groonga"]]]]]
You can see from the response of select that Entries.tag now stores a vector value.
Here is an example that moves the high_score column to the Users table from the Players table. Here are the schema and data:
Execution example:
table_create Players TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Players high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Players [ {"_key": "player1", "high_score": 100} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
The following commands move the high_score column to the Users table from the Players table:
Execution example:
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Players high_score Users high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Players high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_id", "UInt32"], ["_key", "ShortText"], ["high_score", "Int32"]], [1, "player1", 100]]]]
You can see from the response of select that Users.high_score has been moved from Players.high_score.
How to change table configuration
You can change the table key type. For example, you can change the key type to ShortText from Int32. You can change the table type. For example, you can change a TABLE_HASH_KEY table to a TABLE_PAT_KEY table. You can also change other options such as the default tokenizer and the normalizer. For example, you can change the default tokenizer to TokenBigramSplitSymbolAlphaDigit from TokenBigram.
NOTE: You can't change a TABLE_NO_KEY table, because TABLE_NO_KEY doesn't have a record key and Groonga can't identify the copy destination record without a record key.
Here are the basic steps to change table configuration:
1. Create a new table with the new configuration
2. Create all the same columns in the new table
3. Copy all column values from the current table to the new table
4. Remove the current table
5. Rename the new table to the current table
Here is an example to change the table key type to ShortText from Int32.
Here are the schema and data:
Execution example:
table_create IDs TABLE_HASH_KEY Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table IDs [ {"_key": 100, "label": "ID 100", "used": true} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
The following commands change the IDs table key type to ShortText from Int32:
Execution example:
table_create NewIDs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs label NewIDs label
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs used NewIDs used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewIDs IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
select IDs
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_id", "UInt32"], ["_key", "ShortText"], ["label", "ShortText"], ["used", "Bool"]], [1, "100", "ID 100", true]]]]
You can see from the response of select that IDs now stores ShortText keys.
Here is an example to change the table type to TABLE_PAT_KEY from TABLE_HASH_KEY. Here are the schema and data:
Execution example:
table_create Names TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Names used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Names [ {"_key": "alice", "used": false} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
The following commands change the Names table to TABLE_PAT_KEY from TABLE_HASH_KEY:
Execution example:
table_create NewNames TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewNames used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Names used NewNames used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewNames Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Names --filter '_key @^ "ali"'
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_id", "UInt32"], ["_key", "ShortText"], ["used", "Bool"]], [1, "alice", false]]]]
You can see that Names is a TABLE_PAT_KEY table because select can use the script-syntax-prefix-search-operator. You can't use the script-syntax-prefix-search-operator with TABLE_HASH_KEY.
Parameters
This section describes the parameters.
Required parameters
All parameters are required.
from_table
Specifies the table name of the source column. You can specify any table, including a TABLE_NO_KEY table. If you specify a TABLE_NO_KEY table, to_table must be the same table. Here is an example using from_table.
Here are the schema and data:
Execution example:
table_create FromTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable from_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table FromTable [ {"_key": "key1", "from_column": "value1"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select FromTable --output_columns _key,from_column,to_column
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_key", "ShortText"], ["from_column", "ShortText"], ["to_column", "ShortText"]], ["key1", "value1", ""]]]]
You can copy all values to to_column from from_column:
Execution example:
column_copy FromTable from_column FromTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select FromTable --output_columns _key,from_column,to_column
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_key", "ShortText"], ["from_column", "ShortText"], ["to_column", "ShortText"]], ["key1", "value1", "value1"]]]]
from_name
Specifies the name of the column whose values are copied. See from_table for an example.
to_table
Specifies the table name of the destination column. You can specify the same table name as from_table when you want to copy column values within the same table. You can't specify a TABLE_NO_KEY table as to_table, because Groonga can't identify destination records without a record key. There is one exception: if you specify the same name as from_table for to_table, you can use a TABLE_NO_KEY table as to_table, because Groonga can identify destination records when the source table and the destination table are the same table.
Here is an example using to_table. Here are the schema and data:
Execution example:
table_create Table TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create ToTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create ToTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Table [ {"_key": "key1", "column": "value1"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
You can copy all values to ToTable.to_column from Table.column:
Execution example:
column_copy Table column ToTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select ToTable --output_columns _key,to_column
# [[0, 1337566253.89858, 0.000355720520019531], [[[1], [["_key", "ShortText"], ["to_column", "ShortText"]], ["key1", "value1"]]]]
to_name
Specifies the destination column name. See to_table for an example.
Optional parameters
There is no optional parameter.
Return value
The command returns true as the body on success, such as:
[HEADER, true]
If the command fails, the error details are in HEADER. See /reference/command/output_format for HEADER.
column_create
Summary
column_create - adds a column.
This section describes column_create, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via arguments of the groonga executable, standard input, or a socket.
column_create adds a column to a table in the database being used.
Syntax
column_create table name flags type [source]
Usage
Create a column named body in the table Entry that stores ShortText values:
column_create Entry body --type ShortText
[true]
Create a column named entry_body in the table Term: a full inverted index column whose target is the body column of the Entry table:
column_create Term entry_body COLUMN_INDEX|WITH_POSITION Entry body
[true]
Parameters
table Specifies the name of the table the column is added to.
name Specifies the name of the column to be created. The column name must be unique within the table. Columns whose name contains a period ('.') or colon (':') can't be created. Names beginning with an underscore ('_') are reserved and can't be used.
flags Specifies the column attributes, either as one of the following numbers or as symbol names combined with a pipe ('|').
0, COLUMN_SCALAR Creates a column that can store a single value.
1, COLUMN_VECTOR Creates a column that can store an array of multiple values.
2, COLUMN_INDEX Creates an index column.
There are two flags to compress the value of a column, but you can't specify these flags for now because of a memory leak issue (GitHub#6) that occurs when the value of such a column is referenced. This issue affects both of them (zlib and lzo).
16, COMPRESS_ZLIB Compresses the value of the column using zlib. This flag is enabled when you build Groonga with --with-zlib.
32, COMPRESS_LZO Compresses the value of the column using lzo. This flag is enabled when you build Groonga with --with-lzo.
For index columns, additional attributes can be specified by adding the following values to flags:
128, WITH_SECTION Creates an index that stores section information.
256, WITH_WEIGHT Creates an index that stores weight information.
512, WITH_POSITION Creates an index that stores position information (a full inverted index).
type Specifies the value type: one of Groonga's built-in types, a user-defined type already defined in the same database, or an already defined table.
source When creating an index column, specifies the column to be indexed in the source argument.
Return value
[HEADER, SUCCEEDED]
HEADER See /reference/command/output_format about HEADER.
SUCCEEDED It is true on success, false otherwise.
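As an additional illustration (a sketch; it assumes the Entry table also has a title column, and entry_text is a hypothetical column name), an index covering multiple source columns can be created by combining WITH_SECTION with a comma-separated source list:

column_create Term entry_text COLUMN_INDEX|WITH_POSITION|WITH_SECTION Entry title,body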
column_list
Summary
The column_list command lists the columns in a table.
Syntax
This command takes only one required parameter:
column_list table
Usage
Here is a simple example of the column_list command. Execution example:
table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[["id", "UInt32"], ["name", "ShortText"], ["path", "ShortText"], ["type", "ShortText"], ["flags", "ShortText"], ["domain", "ShortText"], ["range", "ShortText"], ["source", "ShortText"]],
#   [256, "_key", "", "", "COLUMN_SCALAR", "Users", "ShortText", []],
#   [257, "age", "/tmp/groonga-databases/commands_column_list.0000101", "fix", "COLUMN_SCALAR|PERSISTENT", "Users", "UInt8", []],
#   [258, "tags", "/tmp/groonga-databases/commands_column_list.0000102", "var", "COLUMN_VECTOR|PERSISTENT", "Users", "ShortText", []]]]
Parameters
This section describes the parameters of column_list.
Required parameters
All parameters are required.
table Specifies the name of the table whose columns are to be listed.
Return value
column_list returns the list of column information in the table:
[HEADER, [COLUMN_LIST_HEADER, COLUMN_INFORMATION1, COLUMN_INFORMATION2, ...]]
HEADER See /reference/command/output_format about HEADER.
COLUMN_LIST_HEADER
COLUMN_LIST_HEADER describes the content of each COLUMN_INFORMATION. COLUMN_LIST_HEADER has the following format:
[["id", "UInt32"], ["name", "ShortText"], ["path", "ShortText"], ["type", "ShortText"], ["flags", "ShortText"], ["domain", "ShortText"], ["range", "ShortText"], ["source", "ShortText"]]
It means the following:
• The first content in COLUMN_INFORMATION is the id value and the value type is UInt32.
• The second content in COLUMN_INFORMATION is the name value and the value type is ShortText.
• The third content ....
See the following COLUMN_INFORMATION description for details. This field provides meta-data about the column information, so it will be more useful for programs than for humans.
COLUMN_INFORMATION
Each COLUMN_INFORMATION has the following format:
[ID, NAME, PATH, TYPE, FLAGS, DOMAIN, RANGE, SOURCES]
ID The column ID in the Groonga database. Normally, you don't need to care about it.
NAME The column name.
PATH The path where the column data is stored.
TYPE The type of the column. It is one of the following:
┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│fix   │ The column is a fixed size       │
│      │ column. A scalar column whose    │
│      │ type is a fixed size type is a   │
│      │ fixed size column.               │
├──────┼──────────────────────────────────┤
│var   │ The column is a variable size    │
│      │ column. A vector column, or a    │
│      │ scalar column whose type is a    │
│      │ variable size type, is a         │
│      │ variable size column.            │
├──────┼──────────────────────────────────┤
│index │ The column is an index column.   │
└──────┴──────────────────────────────────┘
FLAGS The flags of the column. Flags are separated by | like COLUMN_VECTOR|WITH_WEIGHT. FLAGS must include one of COLUMN_SCALAR, COLUMN_VECTOR or COLUMN_INDEX. The other flags are optional. Here are the available flags:
┌──────────────┬──────────────────────────────────┐
│Flag          │ Description                      │
├──────────────┼──────────────────────────────────┤
│COLUMN_SCALAR │ The column is a scalar column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_VECTOR │ The column is a vector column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_INDEX  │ The column is an index column.   │
├──────────────┼──────────────────────────────────┤
│WITH_WEIGHT   │ The column can have weights.     │
│              │ COLUMN_VECTOR and COLUMN_INDEX   │
│              │ may have it. COLUMN_SCALAR       │
│              │ doesn't have it.                 │
├──────────────┼──────────────────────────────────┤
│WITH_SECTION  │ The column can have section      │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A multiple column index has it.  │
├──────────────┼──────────────────────────────────┤
│WITH_POSITION │ The column can have position     │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A full text search index must    │
│              │ have it.                         │
├──────────────┼──────────────────────────────────┤
│PERSISTENT    │ The column is a persistent       │
│              │ column. It means that the column │
│              │ isn't a                          │
│              │ /reference/columns/pseudo.       │
└──────────────┴──────────────────────────────────┘
DOMAIN The name of the table that has the column.
RANGE The value type name of the column. It is a type name or a table name.
SOURCES An array of the source column names of the index. If the index column is a multiple column index, the array has two or more source column names. It is always an empty array for COLUMN_SCALAR and COLUMN_VECTOR.
See also
• /reference/commands/column_create
• /reference/column

column_remove
Summary
column_remove - removes a column defined in a table.
This section describes column_remove, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via arguments of the groonga executable, standard input, or a socket.
column_remove removes a column defined in a table. The associated indexes are also removed. [1]
Syntax
column_remove table name
Usage
column_remove Entry body
[true]
Footnote
[1] The index is also removed when it is part of a multi-section index.
Parameters
table Specifies the name of the table in which the column to be removed is defined.
name Specifies the name of the column to be removed.
Return value
[success flag]
success flag Returns true if no error occurred and false if an error occurred.

column_rename
Summary
The column_rename command renames a column. It is a light operation: it just changes the relationship between the name and the column object. It doesn't copy column values.
It is a dangerous operation: you must stop all operations, including read operations, while you run column_rename. If the following case occurs, the Groonga process may crash:
• An operation (like select) starts that accesses the column to be renamed by the current column name. The current column name is called the old column name below, because the column is renamed.
• column_rename runs. The select is still running.
• The select accesses the column by the old column name. But the select can't find the column by the old name, because the column has been renamed to the new column name. It may crash the Groonga process.
Syntax
This command takes three parameters. All parameters are required:
column_rename table name new_name
Usage
Here is a simple example of the column_rename command. Execution example:
table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users [ {"_key": "Alice", "score": 2}, {"_key": "Bob", "score": 0}, {"_key": "Carlos", "score": -1} ]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
column_rename Users score point
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [[0, 1337566253.89858, 0.000355720520019531],
#  [[["id", "UInt32"], ["name", "ShortText"], ["path", "ShortText"], ["type", "ShortText"], ["flags", "ShortText"], ["domain", "ShortText"], ["range", "ShortText"], ["source", "ShortText"]],
#   [256, "_key", "", "", "COLUMN_SCALAR", "Users", "ShortText", []],
#   [257, "point", "/tmp/groonga-databases/commands_column_rename.0000101", "fix", "COLUMN_SCALAR|PERSISTENT", "Users", "Int32", []]]]
select Users
# [[0, 1337566253.89858, 0.000355720520019531], [[[3], [["_id", "UInt32"], ["_key", "ShortText"], ["point", "Int32"]], [1, "Alice", 2], [2, "Bob", 0], [3, "Carlos", -1]]]]
Parameters
This section describes the parameters of column_rename.
Required parameters
All parameters are required.
table Specifies the name of the table that has the column to be renamed.
name Specifies the column name to be renamed.
new_name Specifies the new column name.
Return value
[HEADER, SUCCEEDED_OR_NOT]
HEADER See /reference/command/output_format about HEADER.
SUCCEEDED_OR_NOT It is true on success, false otherwise.

config_delete
Summary
New in version 5.1.2. The config_delete command deletes the specified configuration item.
Syntax
This command takes only one required parameter:
config_delete key
Usage
Here is an example that deletes the alias.column configuration item. Execution example:
config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]
config_delete alias.column
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], ""]
Here is an example that deletes a nonexistent configuration item. Execution example:
config_delete nonexistent
# [[-22, 1337566253.89858, 0.000355720520019531, "[config][delete] failed to delete", [["grn_config_delete", "config.c", 166]]], false]
config_delete returns an error when you try to delete a nonexistent configuration item.
Parameters
This section describes all parameters.
Required parameters
There is one required parameter.
key Specifies the key of the target configuration item. The max key size is 4KiB. You can't use an empty string as a key.
Optional parameters
There is no optional parameter.
Return value
The config_delete command returns whether deleting the configuration item succeeded or not:
[HEADER, SUCCEEDED_OR_NOT]
HEADER See /reference/command/output_format about HEADER.
SUCCEEDED_OR_NOT If the command succeeded, it returns true; otherwise it returns false.
See also
• /reference/configuration
• config_get
• config_set

config_get
Summary
New in version 5.1.2. The config_get command returns the value of the specified configuration item.
Syntax
This command takes only one required parameter:
config_get key
Usage
Here is an example that sets a value to the alias.column configuration item and gets the value. Execution example:
config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]
Here is an example that gets the value of a nonexistent configuration item. Execution example:
config_get nonexistent
# [[0, 1337566253.89858, 0.000355720520019531], ""]
config_get returns an empty string for a nonexistent configuration item key.
Parameters
This section describes all parameters.
Required parameters
There is one required parameter.
key Specifies the key of the target configuration item. The max key size is 4KiB. You can't use an empty string as a key.
Optional parameters
There is no optional parameter.
Return value
The config_get command returns the value of the specified configuration item:
[HEADER, VALUE]
HEADER See /reference/command/output_format about HEADER.
VALUE VALUE is the value of the configuration item specified by key. It is a string.
See also
• /reference/configuration
• config_set
• config_delete

config_set
Summary
New in version 5.1.2. The config_set command sets a value to the specified configuration item.
Syntax
This command takes two required parameters:
config_set key value
Usage
Here is an example that sets a value to the alias.column configuration item and confirms the set value. Execution example:
config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]
Parameters
This section describes all parameters.
Required parameters
There are two required parameters.
key Specifies the key of the target configuration item. The max key size is 4KiB. You can't use an empty string as a key.
config_set Summary New in version 5.1.2. config_set command sets a value to the specified configuration item. Syntax This command takes two required parameters: config_set key value Usage Here is an example to set a value to the alias.column configuration item and confirm the set value: Execution example: config_set alias.column Aliases.real_name # [[0, 1337566253.89858, 0.000355720520019531], true] config_get alias.column # [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"] Parameters This section describes all parameters. Required parameters There are two required parameters. key Specifies the key of the target configuration item. The max key size is 4KiB. You can't use an empty string as the key. value Specifies the value of the target configuration item specified by key. The max value size is 4091B (= 4KiB - 5B). Optional parameters There is no optional parameter. Return value config_set command returns whether setting the configuration item value succeeded or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It returns true if the command succeeded, false on error. See also • /reference/configuration • config_get • config_delete database_unmap Summary New in version 5.0.7. database_unmap unmaps already mapped tables and columns in the database. "Map" means loading from disk to memory. "Unmap" means releasing the mapped memory. NOTE: Normally, you don't need to use database_unmap because the OS manages memory cleverly. If the remaining system memory runs low, the OS moves memory used by Groonga to disk until Groonga needs it, moving unused memory preferentially. CAUTION: You can use this command only when thread_limit returns 1. This means that this command doesn't work with multithreading. Syntax This command takes no parameters: database_unmap Usage You can unmap the database after you change the max number of threads to 1: Execution example: thread_limit --max 1 # [[0, 1337566253.89858, 0.000355720520019531], 2] database_unmap # [[0, 1337566253.89858, 0.000355720520019531], true] If the max number of threads is larger than 1, database_unmap fails: Execution example: thread_limit --max 2 # [[0, 1337566253.89858, 0.000355720520019531], 1] database_unmap # [ # [ # -2, # 1337566253.89858, # 0.000355720520019531, # "[database_unmap] the max number of threads must be 1: <2>", # [ # [ # "proc_database_unmap", # "proc.c", # 6931 # ] # ] # ], # false # ] Parameters This section describes all parameters. Required parameters There is no required parameter. Optional parameters There is no optional parameter. Return value The command returns true as its body on success, such as: [HEADER, true] If the command fails, error details are in HEADER. See /reference/command/output_format for HEADER.
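As a supplementary sketch (not an example from this manual), a maintenance session typically saves the previous thread limit and restores it after unmapping; the restored value 2 below is illustrative:
thread_limit --max 1
database_unmap
thread_limit --max 2
thread_limit returns the previous limit, so the first call tells you which value to restore afterwards.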
define_selector Summary define_selector - Defines a search command. This section describes define_selector, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. define_selector defines a new search command with customized search conditions. Syntax define_selector name table [match_columns [query [filter [scorer [sortby [output_columns [offset [limit [drilldown [drilldown_sortby [drilldown_output_columns [drilldown_offset [drilldown_limit]]]]]]]]]]]]] Usage Define a selector command that outputs all records and all columns of the Entry table: define_selector entry_selector Entry [true] Parameters name Specifies the name of the selector command to be defined. table Specifies the table to be searched. match_columns Specifies the default value of the match_columns argument of the selector command to be added. query Specifies the default value of the query argument of the selector command to be added. filter Specifies the default value of the filter argument of the selector command to be added. scorer Specifies the default value of the scorer argument of the selector command to be added. sortby Specifies the default value of the sortby argument of the selector command to be added. output_columns Specifies the default value of the output_columns argument of the selector command to be added. offset Specifies the default value of the offset argument of the selector command to be added. limit Specifies the default value of the limit argument of the selector command to be added. drilldown Specifies the default value of the drilldown argument of the selector command to be added. drilldown_sortby Specifies the default value of the drilldown_sortby argument of the selector command to be added. drilldown_output_columns Specifies the default value of the drilldown_output_columns argument of the selector command to be added. drilldown_offset Specifies the default value of the drilldown_offset argument of the selector command to be added. drilldown_limit Specifies the default value of the drilldown_limit argument of the selector command to be added. Return value [SUCCEEDED_OR_NOT] SUCCEEDED_OR_NOT It returns true if no error occurred, false on error. See also /reference/grn_expr defrag Summary defrag command resolves fragmentation of specified objects. This section describes defrag, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. defrag takes a target object (a database or a variable-size column) and resolves fragmentation of that object. Syntax defrag objname threshold Usage Resolve fragmentation of the open database: defrag [300] Resolve fragmentation of the body column of the Entry table: defrag Entry.body [30] Parameters objname Specifies the name of the target object. If it is empty, the open database object is the target. Return value [NUMBER_OF_DEFRAGGED_SEGMENTS] NUMBER_OF_DEFRAGGED_SEGMENTS Returns the number of segments whose fragmentation was resolved. delete Summary delete command deletes the specified record of a table. Cascade delete Multiple tables may be associated; for example, the keys of one table may be referenced by another table's records. In such a case, if you delete a key of one table, the references to that key in the other table's records are also removed. Note that if the type of the referencing column is COLUMN_VECTOR, only the value of the referenced key is removed from the vector value. Syntax delete table [key [id [filter]]] Usage Here are a schema definition and sample data to show usage. Delete the record from the Entry table which has "2" as its key. Execution example: delete Entry 2 # [[0, 1337566253.89858, 0.000355720520019531], true] select Entry # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "UInt32" # ], # [ # "status", # "ShortText" # ] # ], # [ # 1, # 1, # "OK" # ] # ] # ] # ] Here is an example of cascade delete. The country column of the Users table references the Country table. "Cascade delete" removes the record that matches the specified key and clears the references to that key.
Execution example: table_create Country TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Users TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users name COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users country COLUMN_SCALAR Country # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": 1, "name": "John", "country": "United States"}, {"_key": 2, "name": "Mike", "country": "United States"}, {"_key": 3, "name": "Takashi", "country": "Japan"}, {"_key": 4, "name": "Hanako", "country": "Japan"} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] load --table Country [ {"_key": "United States"}, {"_key": "Japan"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] delete Country "United States" # [[0, 1337566253.89858, 0.000355720520019531], true] select Country # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 2, # "Japan" # ] # ] # ] # ] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "UInt32" # ], # [ # "country", # "Country" # ], # [ # "name", # "ShortText" # ] # ], # [ # 1, # 1, # "", # "John" # ], # [ # 2, # 2, # "", # "Mike" # ], # [ # 3, # 3, # "Japan", # "Takashi" # ], # [ # 4, # 4, # "Japan", # "Hanako" # ] # ] # ] # ] Parameters table Specifies the name of the table to delete records from. key Specifies the key of the record to delete. If the table uses TABLE_NO_KEY, the key is just ignored. (Use the id parameter in such a case.) id Specifies the id of the record to delete. If you specify the id parameter, you must not specify the key parameter. filter Specifies a grn_expr expression that identifies the records. If you specify the filter parameter, you must not specify the key or id parameters. Return value [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It returns true if the command succeeded, false on error.
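The examples above delete by key. As a hedged sketch (these exact invocations are not in the original manual; they simply apply the id and filter parameters described above to the same Users table), a record can also be deleted by its _id or by a grn_expr filter:
delete --table Users --id 3
delete --table Users --filter '_key == 4'
Remember that key, id and filter are mutually exclusive: specify exactly one of them per request.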
See also load dump Summary dump - Outputs the schema and data of a database. This section describes dump, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. dump outputs the schema and data of a database in a format that can be loaded back later. Because the result of dump tends to be large, it is mainly intended to be used from the command line. Backing up a database is its main use. The format that dump outputs can be interpreted directly by Groonga, so you can copy a database as follows: % groonga original/db dump > dump.grn % mkdir backup % groonga -n backup/db < dump.grn Syntax dump [tables] [dump_plugins] [dump_schema] [dump_records] [dump_indexes] Usage Here is the sample schema and data to check dump behaviour: plugin_register token_filters/stop_word table_create Bookmarks TABLE_HASH_KEY ShortText column_create Bookmarks title COLUMN_SCALAR ShortText table_create Lexicon TABLE_PAT_KEY ShortText table_create Sites TABLE_NO_KEY column_create Sites url COLUMN_SCALAR ShortText column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title load --table Bookmarks [ {"_key":"Groonga", "title":"Introduction to Groonga"}, {"_key":"Mroonga", "title":"Introduction to Mroonga"} ] load --table Sites [ {"_id": 1, "url":"http://groonga.org"}, {"_id": 2, "url":"http://mroonga.org"} ] Dump all data in the database: > dump plugin_register token_filters/stop_word table_create Sites TABLE_NO_KEY column_create Sites url COLUMN_SCALAR ShortText table_create Bookmarks TABLE_HASH_KEY ShortText column_create Bookmarks title COLUMN_SCALAR ShortText table_create Lexicon TABLE_PAT_KEY ShortText load --table Sites [ ["_id","url"], [1,"http://groonga.org"], [2,"http://mroonga.org"] ] load --table Bookmarks [ ["_key","title"], ["Groonga","Introduction to Groonga"], ["Mroonga","Introduction to Mroonga"] ] column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title Dump schema and specific table data: > dump Bookmarks plugin_register token_filters/stop_word table_create Sites TABLE_NO_KEY column_create Sites url COLUMN_SCALAR ShortText table_create Bookmarks TABLE_HASH_KEY ShortText column_create Bookmarks title COLUMN_SCALAR ShortText table_create Lexicon TABLE_PAT_KEY ShortText load --table Bookmarks [ ["_key","title"], ["Groonga","Introduction to Groonga"], ["Mroonga","Introduction to Mroonga"] ] column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title Dump plugin only: > dump --dump_schema no --dump_records no --dump_indexes no plugin_register token_filters/stop_word Dump records only: > dump --dump_schema no --dump_plugins no --dump_indexes no load --table Sites [ ["_id","url"], [1,"http://groonga.org"], [2,"http://mroonga.org"] ] load --table Bookmarks [ ["_key","title"], ["Groonga","Introduction to Groonga"], ["Mroonga","Introduction to Mroonga"] ] Dump schema only: > dump --dump_records no --dump_plugins no --dump_indexes no table_create Sites TABLE_NO_KEY column_create Sites url COLUMN_SCALAR ShortText table_create Bookmarks TABLE_HASH_KEY ShortText column_create Bookmarks title COLUMN_SCALAR ShortText table_create Lexicon TABLE_PAT_KEY ShortText
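As an illustrative sketch (this exact invocation is not in the original manual), the tables parameter described below accepts a comma-separated list, so dumping only the two record-holding tables from the sample schema might look like:
> dump --tables Bookmarks,Sites
Nonexistent table names in the list are simply ignored.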
Parameters There are optional parameters. Optional parameters tables Specifies the tables to be output, separated by "," (commas). Nonexistent tables are ignored. dump_plugins New in version 5.0.3. You can customize whether the output contains registered plugins or not. To exclude registered plugins from the output, specify no. The default value is yes. dump_schema New in version 5.0.3. You can customize whether the output contains the database schema or not. To exclude the database schema from the output, specify no. The default value is yes. dump_records New in version 5.0.3. You can customize whether the output contains records or not. To exclude records from the output, specify no. The default value is yes. dump_indexes New in version 5.0.3. You can customize whether the output contains indexes or not. To exclude indexes from the output, specify no. The default value is yes. Return value Outputs the schema and data of the database in the form of Groonga built-in command calls. The output_type specification is ignored. io_flush Summary NOTE: This command is an experimental feature. New in version 5.0.5. io_flush flushes all changes in memory to disk explicitly. Normally, you don't need to use io_flush explicitly, because flushing is done automatically by the OS, and flushing by the OS is efficient. You need to use io_flush explicitly when your system may crash unexpectedly or when your Groonga process may not be shut down in a normal way. (For example, using shutdown is a normal shutdown process.) In such cases, it's better to run io_flush after you change your Groonga database. Here are the commands that change your Groonga database: • load • delete • truncate • table_create • table_remove • table_rename • column_create • column_remove • column_rename • plugin_register • plugin_unregister If you're using the select-scorer parameter in select to change existing column values, select is also added to the above list. Note that io_flush may be a heavy operation: if there are many changes in memory, flushing them to disk takes time. Syntax This command takes two parameters. All parameters are optional: io_flush [target_name=null] [recursive=yes] Usage You can flush all changes in memory to disk with no arguments: Execution example: io_flush # [[0, 1337566253.89858, 0.000355720520019531], true] If you know what is changed, you can narrow the flush targets. Here is a correspondence table between commands and flush targets. ┌──────────────────────────────────────┬───────────────────────────────┐ │Command │ Flush targets │ ├──────────────────────────────────────┼───────────────────────────────┤ │load and delete │ Target table and its columns │ ├──────────────────────────────────────┼───────────────────────────────┤ │truncate │ Target table and its columns │ ├──────────────────────────────────────┼───────────────────────────────┤ │table_create │ Target table and database │ ├──────────────────────────────────────┼───────────────────────────────┤ │table_remove and table_rename │ Database │ ├──────────────────────────────────────┼───────────────────────────────┤ │column_create │ Target column and database │ ├──────────────────────────────────────┼───────────────────────────────┤ │column_remove and column_rename │ Database │ ├──────────────────────────────────────┼───────────────────────────────┤ │plugin_register and plugin_unregister │ Database │ └──────────────────────────────────────┴───────────────────────────────┘ To flush a table or a column, pass its name via the target_name parameter described below; to flush the database, run io_flush without target_name.
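For instance, following the table above (a hedged sketch, not an example from the original manual; the Users table is illustrative), after loading records you would flush the target table together with its columns:
load --table Users
[
{"_key": "Alice"}
]
io_flush --target_name Users --recursive yes
The target_name and recursive parameters are described below.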
Parameters This section describes all parameters. Required parameters There is no required parameter. Optional parameters There are optional parameters. target_name Specifies a flush target object name. The target object is one of database, table or column. If you omit this parameter, the database is the flush target object: Execution example: io_flush # [[0, 1337566253.89858, 0.000355720520019531], true] If you specify a table name, the table is the flush target object: Execution example: table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] io_flush --target_name Users # [[0, 1337566253.89858, 0.000355720520019531], true] If you specify a column name, the column is the flush target object: Execution example: column_create Users age COLUMN_SCALAR UInt8 # [[0, 1337566253.89858, 0.000355720520019531], true] io_flush --target_name Users.age # [[0, 1337566253.89858, 0.000355720520019531], true] recursive Specifies whether child objects of the flush target object are also flush target objects. Child objects of the database are all tables and all their columns. Child objects of a table are all its columns. A column has no child objects. The recursive value must be yes or no. yes means that the specified flush target object and all of its child objects are flush target objects. no means that only the specified flush target object is the flush target object. The following io_flush flushes all changes in the database, all tables and all columns: Execution example: io_flush --recursive yes # [[0, 1337566253.89858, 0.000355720520019531], true] The following io_flush flushes all changes only in the database: Execution example: io_flush --recursive no # [[0, 1337566253.89858, 0.000355720520019531], true] If you specify another value (neither yes nor no) or omit the recursive parameter, yes is used. yes is used in the following case because an invalid recursive argument is specified: Execution example: io_flush --recursive invalid # [[0, 1337566253.89858, 0.000355720520019531], true] yes is used in the following case because the recursive parameter isn't specified: Execution example: io_flush # [[0, 1337566253.89858, 0.000355720520019531], true] Return value The command returns true as its body on success, such as: [HEADER, true] If the command fails, error details are in HEADER. See /reference/command/output_format for HEADER. load Summary load loads data as records into the current database and updates the values of each column. Syntax load values table [columns [ifexists [input_type]]] Parameters This section describes all parameters. values Specifies the values to be loaded into records. Values must satisfy the input_type format. If you specify "json" as input_type, you can choose one of the following formats: Format 1: [[COLUMN_NAME1, COLUMN_NAME2,..], [VALUE1, VALUE2,..], [VALUE1, VALUE2,..],..] Format 2: [{COLUMN_NAME1: VALUE1, COLUMN_NAME2: VALUE2}, {COLUMN_NAME1: VALUE1, COLUMN_NAME2: VALUE2},..] The [COLUMN_NAME1, COLUMN_NAME2,..] header in Format 1 is effective only when the columns parameter isn't specified. When the target table has a primary key, you must specify the _key column (the pseudo column associated with the primary key) as one of the COLUMN_NAMEs. If no values are specified, they are read from the standard input until all opened parentheses are matched by closing ones. In that case you don't have to enclose them in single quotes or double quotes, but if you pass them with the values parameter, you should. Values read from the standard input also don't require quoting of spaces (' '). table Specifies the name of the table you want to add records to. columns Specifies the column names of the added records, separated by commas.
ifexists Specifies a grn_expr string that is executed when a record with the same primary key as an added record already exists in the table. If the ifexists grn_expr string (default: true) evaluates to true, the values of the other columns (all columns except the _key column) are updated. input_type Specifies an input format for values. It supports JSON only. Usage Here is an example to add records to the "Entry" table. load --table Entry --input_type json --values [{\"_key\":\"Groonga\",\"body\":\"It's very fast!!\"}] [1] This example shows how to add values from the standard input. load --table Entry --input_type json [ {"_key": "Groonga", "body": "It's very fast!!"} ] [1] Return value In the JSON format, load returns the number of added records, such as [NUMBER] See also /reference/grn_expr
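To make the two values formats above concrete, here is a hedged sketch (the second record's contents are hypothetical) of two equivalent ways to load the same records into the Entry table from the usage example:
Format 1 (a header row of column names followed by value rows):
load --table Entry
[
["_key", "body"],
["Groonga", "It's very fast!!"],
["Mroonga", "It's also very fast!!"]
]
Format 2 (one object per record):
load --table Entry
[
{"_key": "Groonga", "body": "It's very fast!!"},
{"_key": "Mroonga", "body": "It's also very fast!!"}
]
Both return the number of added records, [2] in this sketch.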
lock_acquire Summary New in version 5.1.2. lock_acquire command acquires the lock of the target object. The target object is one of database, table or column. NOTE: This is a dangerous command. You must release the locks you acquire with lock_release when they are no longer needed. If you forget to release these locks, your database may be broken. Syntax This command takes only one optional parameter: lock_acquire [target_name=null] If the target_name parameter is omitted, the database is used as the target object. Usage Here is an example to acquire the lock of the database: Execution example: lock_acquire # [[0, 1337566253.89858, 0.000355720520019531], true] If the database is locked, you can't create a new table or column. Release the lock of the database to show the other examples. Execution example: lock_release # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to acquire the lock of the Entries table: Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] lock_acquire Entries # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to acquire the lock of the Sites.title column: Execution example: table_create Sites TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Sites title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] lock_acquire Sites.title # [[0, 1337566253.89858, 0.000355720520019531], true] Parameters This section describes all parameters. target_name Specifies the name of a table or column. If you don't specify it, the database is used as the target object. The default is none, which means that the target object is the database. Return value lock_acquire command returns whether the lock was acquired or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It returns true if the command succeeded, false on error. See also • lock_release • lock_clear lock_clear Summary New in version 4.0.9. lock_clear command clears the lock of the target object recursively. The target object is one of database, table or column. NOTE: This is a dangerous command. You must not use this command while another process or thread is doing a write operation on the target object. If you do, your database may be broken and/or your process may crash. Syntax This command takes only one optional parameter: lock_clear [target_name=null] If the target_name parameter is omitted, the database is used as the target object. This means that all locks in the database are cleared. Usage Here is an example to clear all locks in the database: Execution example: lock_clear # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to clear the locks of the Entries table and its columns: Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries body COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] lock_clear Entries # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to clear the lock of the Sites.title column: Execution example: table_create Sites TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Sites title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] lock_clear Sites.title # [[0, 1337566253.89858, 0.000355720520019531], true] Parameters This section describes all parameters. target_name Specifies the name of a table or column. If you don't specify it, the database is used as the target object. The default is none, which means that the target object is the database. Return value lock_clear command returns whether the lock was cleared successfully or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It returns true if the command succeeded, false on error. lock_release Summary New in version 5.1.2. lock_release command releases the lock of the target object. The target object is one of database, table or column. NOTE: This is a dangerous command. You must only release locks that you acquired with lock_acquire. If you release locks that you didn't acquire, your database may be broken. Syntax This command takes only one optional parameter: lock_release [target_name=null] If the target_name parameter is omitted, the database is used as the target object. Usage Here is an example to release the lock of the database: Execution example: lock_acquire # [[0, 1337566253.89858, 0.000355720520019531], true] lock_release # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to release the lock of the Entries table: Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] lock_acquire Entries # [[0, 1337566253.89858, 0.000355720520019531], true] lock_release Entries # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to release the lock of the Sites.title column: Execution example: table_create Sites TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Sites title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] lock_acquire Sites.title # [[0, 1337566253.89858, 0.000355720520019531], true] lock_release Sites.title # [[0, 1337566253.89858, 0.000355720520019531], true] Parameters This section describes all parameters. target_name Specifies the name of a table or column. If you don't specify it, the database is used as the target object. The default is none, which means that the target object is the database. Return value lock_release command returns whether the lock was released successfully or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It returns true if the command succeeded, false on error.
See also • lock_acquire • lock_clear log_level Summary log_level - Sets the log output level. This section describes log_level, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. log_level sets the log output level. Syntax log_level level Usage log_level warning [true] Parameters level Specifies the log output level to set, as one of the following values: EMERG ALERT CRIT error warning notice info debug Return value [SUCCEEDED_OR_NOT] SUCCEEDED_OR_NOT It returns true if no error occurred, false on error. See also log_put log_reopen log_put Summary log_put - Outputs a log message. This section describes log_put, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. log_put outputs message to the log. Syntax log_put level message Usage log_put ERROR ****MESSAGE**** [true] Parameters level Specifies the log output level as one of the following values: EMERG ALERT CRIT error warning notice info debug message Specifies the string to output. Return value [SUCCEEDED_OR_NOT] SUCCEEDED_OR_NOT It returns true if no error occurred, false on error. See also log_level log_reopen log_reopen Summary log_reopen - Reopens the log file. This section describes log_reopen, one of the Groonga built-in commands. Built-in commands are executed by specifying them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server over a socket. log_reopen reopens the log file. Currently, this is supported only when the default log function is used. Syntax log_reopen Usage log_reopen [true] Log rotation with log_reopen 1. Move the log file, e.g. with mv. Logs continue to be written to the moved file. 2. Run the log_reopen command. 3. A new log file is created with the same name as the old log file. Subsequent logs are written to the new log file. Parameters None. Return value [SUCCEEDED_OR_NOT] SUCCEEDED_OR_NOT It returns true if no error occurred, false on error. See also log_level log_put
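The rotation procedure above can be scripted. Here is a minimal sketch, assuming a Groonga HTTP server listening on localhost:10041 and writing its log to /var/log/groonga/groonga.log (both the port and the path are illustrative assumptions):
% mv /var/log/groonga/groonga.log /var/log/groonga/groonga.log.1
% curl 'http://localhost:10041/d/log_reopen'
After the second step, Groonga creates a fresh groonga.log and continues logging there, while groonga.log.1 keeps the rotated history.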
logical_count Summary New in version 5.0.0. logical_count is a command that counts matched records even though the actual records are stored in parted tables. It is useful because there is less need to care about the maximum number of records per table (see /limitations). Note that this feature is not mature yet, so there are some limitations: • Create parted tables whose names contain the "_YYYYMMDD" postfix. This naming rule is hardcoded, so you must create one table per day. • Load the proper data into each parted table on your own. Syntax This command takes many parameters. The required parameters are logical_table and shard_key: logical_count logical_table shard_key [min] [min_border] [max] [max_border] [filter] Usage Register the sharding plugin in advance to use the logical_count command. Note that logical_count is implemented as an experimental plugin, and the specification may change in the future. Here is a simple example which shows how to use this feature. Let's count specific logs which are stored in multiple tables. Here are the schema and data. Execution example: table_create Logs_20150203 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150203 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150203 message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20150204 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150204 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150204 message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20150205 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150205 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150205 message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] Execution example: load --table Logs_20150203 [ {"timestamp": "2015-02-03 23:59:58", "message": "Start"}, {"timestamp": "2015-02-03 23:59:58", "message": "Shutdown"}, {"timestamp": "2015-02-03 23:59:59", "message": "Start"}, {"timestamp": "2015-02-03 23:59:59", "message": "Shutdown"} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] load --table Logs_20150204 [ {"timestamp": "2015-02-04 00:00:00", "message": "Start"}, {"timestamp": "2015-02-04 00:00:00", "message": "Shutdown"}, {"timestamp": "2015-02-04 00:00:01", "message": "Start"}, {"timestamp": "2015-02-04 00:00:01", "message": "Shutdown"}, {"timestamp": "2015-02-04 23:59:59", "message": "Start"}, {"timestamp": "2015-02-04 23:59:59", "message": "Shutdown"} ] # [[0, 1337566253.89858, 0.000355720520019531], 6] load --table Logs_20150205 [ {"timestamp": "2015-02-05 00:00:00", "message": "Start"}, {"timestamp": "2015-02-05 00:00:00", "message": "Shutdown"}, {"timestamp": "2015-02-05 00:00:01", "message": "Start"}, {"timestamp": "2015-02-05 00:00:01", "message": "Shutdown"} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] There are three tables, one for each day from February 3, 2015 to February 5, 2015: • Logs_20150203 • Logs_20150204 • Logs_20150205 The data is then loaded into the corresponding tables. Let's count the logs which contain "Shutdown" in the message column and whose timestamp value is "2015-02-04 00:00:00" or later. Here is the query to achieve the above purpose. Execution example: logical_count Logs timestamp --filter 'message == "Shutdown"' --min "2015-02-04 00:00:00" --min_border "include" # [[0, 1337566253.89858, 0.000355720520019531], 5] There is a well-known limitation on the number of records per table. With the sharding feature, you can overcome such limitations because the limitation applies per table. NOTE: There is no convenient feature such as PARTITION BY in SQL. Thus, you must create each table with table_create, using the "_YYYYMMDD" postfix in the table name. Parameters This section describes parameters of logical_count. Required parameters There are required parameters, logical_table and shard_key. logical_table Specifies the logical table name, which means the table name without the "_YYYYMMDD" postfix. If you use actual tables such as "Logs_20150203", "Logs_20150204" and so on, the logical table name is "Logs". shard_key Specifies the name of the column that is treated as the shard key in each parted table. Optional parameters There are optional parameters. min Specifies the min value of shard_key. min_border Specifies whether the minimum value of the border is included or not.
Specify include or exclude as the value of this parameter. max Specifies the max value of shard_key. max_border Specifies whether the maximum value of the border is included or not. Specify include or exclude as the value of this parameter. filter Specifies the filter condition used to narrow down the counted records, as in the --filter usage example above. Return value TODO [HEADER, LOGICAL_COUNT] logical_parameters Summary New in version 5.0.6. logical_parameters is a command for testing. Normally, you don't need to use this command. logical_parameters provides the following two features: • It returns the current parameters for logical_* commands. • It sets new parameters for logical_* commands. Here is a list of parameters: • range_index NOTE: The parameters are independent in each thread (to be exact, in each grn_ctx). If you want to control the parameters completely, you should reduce the max number of threads to 1 with /reference/commands/thread_limit while you're using the parameters. Syntax This command takes only one optional parameter: logical_parameters [range_index=null] Usage You need to register the sharding plugin to use this command: Execution example: plugin_register sharding # [[0, 1337566253.89858, 0.000355720520019531], true] You can get all the current parameter values by calling the command without parameters: Execution example: logical_parameters # [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}] You can set new values by calling it with parameters: Execution example: logical_parameters --range_index never # [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}] When you set new values, logical_parameters returns the parameter values from before the new values were set. Parameters This section describes parameters. Required parameters There is no required parameter. Optional parameters There is one optional parameter. range_index Specifies, by keyword, how range indexes are used in logical_range_filter. Here are the available keywords: • auto (default) • always • never If auto is specified, a range index is used only when it'll be efficient. This is the default value. Execution example: logical_parameters --range_index auto # [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "never"}] If always is specified, a range index is always used. It'll be useful for testing cases where a range index is used. Execution example: logical_parameters --range_index always # [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}] If never is specified, a range index is never used. It'll be useful for testing cases where a range index isn't used. Execution example: logical_parameters --range_index never # [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "always"}] Return value The command returns the current parameters for logical_* commands: [ HEADER, {"range_index": HOW_TO_USE_RANGE_INDEX} ] The HOW_TO_USE_RANGE_INDEX value is one of the following: • "auto" • "always" • "never" See /reference/command/output_format for HEADER. logical_range_filter Summary New in version 5.0.0. TODO: Write summary Syntax This command takes many parameters. The required parameters are logical_table and shard_key: logical_range_filter logical_table shard_key [min=null] [min_border=null] [max=null] [max_border=null] [order=ascending] [filter=null] [offset=0] [limit=10] [output_columns=_key,*] [use_range_index=null] Some parameters can only be used as named parameters. You can't use them as ordered parameters; you must specify the parameter name.
Here are the parameters that can only be used as named parameters: • cache=no Usage Register the sharding plugin in advance to use the logical_range_filter command. TODO: Add examples Parameters This section describes parameters of logical_range_filter. Required parameters There are required parameters, logical_table and shard_key. logical_table Specifies the logical table name, which means the table name without the "_YYYYMMDD" postfix. If you use actual tables such as "Logs_20150203", "Logs_20150204" and so on, the logical table name is "Logs". TODO: Add examples shard_key Specifies the name of the column that is treated as the shard key in each parted table. TODO: Add examples Optional parameters There are optional parameters. min Specifies the min value of shard_key. TODO: Add examples min_border Specifies whether the minimum value of the border is included or not. Specify include or exclude as the value of this parameter. TODO: Add examples max Specifies the max value of shard_key. TODO: Add examples max_border Specifies whether the maximum value of the border is included or not. Specify include or exclude as the value of this parameter. TODO: Add examples order TODO filter TODO offset TODO limit TODO output_columns TODO use_range_index Specifies whether a range index is used or not. Note that it's a parameter for testing. It should not be used in production. TODO: Add examples Cache related parameter cache Specifies whether the result of this query is cached or not. If the result of this query is cached, the next identical query returns its response quickly by using the cache. It doesn't control whether an existing cached result is used or not. Here are the available values: ┌──────┬──────────────────────────────────┐ │Value │ Description │ ├──────┼──────────────────────────────────┤ │no │ Don't cache the output of this │ │ │ query. │ ├──────┼──────────────────────────────────┤ │yes │ Cache the output of this query. │ │ │ It's the default value. │ └──────┴──────────────────────────────────┘ TODO: Add examples The default value is yes. Return value TODO [HEADER, LOGICAL_FILTERED] logical_select Summary New in version 5.0.5. logical_select is a sharding version of select. logical_select searches records from multiple tables and outputs them. You need to register the sharding plugin with plugin_register because logical_select is included in the sharding plugin. Syntax This command takes many parameters. The required parameters are logical_table and shard_key. Other parameters are optional: logical_select logical_table shard_key [min=null] [min_border="include"] [max=null] [max_border="include"] [filter=null] [sortby=null] [output_columns="_id, _key, *"] [offset=0] [limit=10] [drilldown=null] [drilldown_sortby=null] [drilldown_output_columns="_key, _nsubrecs"] [drilldown_offset=0] [drilldown_limit=10] [drilldown_calc_types=NONE] [drilldown_calc_target=null] logical_select has the following named parameters for advanced drilldown: • drilldown[${LABEL}].keys=null • drilldown[${LABEL}].sortby=null • drilldown[${LABEL}].output_columns="_key, _nsubrecs" • drilldown[${LABEL}].offset=0 • drilldown[${LABEL}].limit=10 • drilldown[${LABEL}].calc_types=NONE • drilldown[${LABEL}].calc_target=null You can use one or more alphabetic characters, digits, _ and . for ${LABEL}. For example, parent.sub1 is a valid ${LABEL}. Parameters that have the same ${LABEL} are grouped.
For example, the following parameters specify one drilldown: • --drilldown[label].keys column • --drilldown[label].sortby -_nsubrecs The following parameters specify two drilldowns: • --drilldown[label1].keys column1 • --drilldown[label1].sortby -_nsubrecs • --drilldown[label2].keys column2 • --drilldown[label2].sortby _key Differences from select Most logical_select features can be used like the corresponding select features. For example, the parameter names are the same, the output format is the same, and so on. But there are some differences from select: • The logical_table and shard_key parameters are required instead of the table parameter. • sortby isn't supported when multiple shards are used. (When only one shard is used, it is supported.) • _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards. It works with one shard. _key in drilldown[${LABEL}].sortby works with multiple shards. • match_columns and query aren't supported yet. • cache isn't supported yet. • match_escalation_threshold isn't supported yet. • query_flags isn't supported yet. • query_expander isn't supported yet. • adjuster isn't supported yet. Usage Let's learn about logical_select usage with examples. This section shows many popular usages. You need to register the sharding plugin because logical_select is included in the sharding plugin. Execution example: plugin_register sharding # [[0, 1337566253.89858, 0.000355720520019531], true] Here are a schema definition and sample data to show usage. Execution example: table_create Entries_20150708 TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150708 created_at COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150708 content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150708 n_likes COLUMN_SCALAR UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150708 tag COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Entries_20150709 TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150709 created_at COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150709 content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150709 n_likes COLUMN_SCALAR UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries_20150709 tag COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_key_index_20150708 \ COLUMN_INDEX|WITH_POSITION Entries_20150708 _key # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_content_index_20150708 \ COLUMN_INDEX|WITH_POSITION Entries_20150708 content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_key_index_20150709 \ COLUMN_INDEX|WITH_POSITION Entries_20150709 _key # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_content_index_20150709 \ COLUMN_INDEX|WITH_POSITION Entries_20150709 content # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Entries_20150708 [ {"_key": "The first post!", "created_at": "2015/07/08 00:00:00", "content": "Welcome!
This is my first post!", "n_likes": 5, "tag": "Hello"}, {"_key": "Groonga", "created_at": "2015/07/08 01:00:00", "content": "I started to use Groonga. It's very fast!", "n_likes": 10, "tag": "Groonga"}, {"_key": "Mroonga", "created_at": "2015/07/08 02:00:00", "content": "I also started to use Mroonga. It's also very fast! Really fast!", "n_likes": 15, "tag": "Groonga"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] load --table Entries_20150709 [ {"_key": "Good-bye Senna", "created_at": "2015/07/09 00:00:00", "content": "I migrated all Senna system!", "n_likes": 3, "tag": "Senna"}, {"_key": "Good-bye Tritonn", "created_at": "2015/07/09 01:00:00", "content": "I also migrated all Tritonn system!", "n_likes": 3, "tag": "Senna"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] There are two tables, Entries_20150708 and Entries_20150709, for blog entries. NOTE: You need to use the ${LOGICAL_TABLE_NAME}_${YYYYMMDD} naming rule for table names. In this example, LOGICAL_TABLE_NAME is Entries and YYYYMMDD is 20150708 or 20150709. An entry has a title, a created time, content, the number of likes, and a tag. The title is the key of Entries_YYYYMMDD. The created time is the value of the Entries_YYYYMMDD.created_at column. The content is the value of the Entries_YYYYMMDD.content column. The number of likes is the value of the Entries_YYYYMMDD.n_likes column. The tag is the value of the Entries_YYYYMMDD.tag column. The Entries_YYYYMMDD._key column and the Entries_YYYYMMDD.content column are indexed using the TokenBigram tokenizer, so both Entries_YYYYMMDD._key and Entries_YYYYMMDD.content are ready for full text search. OK. The schema and data for the examples are ready. Simple usage TODO Parameters This section describes parameters of logical_select. Required parameters There are required parameters, logical_table and shard_key. logical_table Specifies the logical table name, which means the table name without the _YYYYMMDD postfix. If you use actual tables such as Entries_20150708, Entries_20150709 and so on, the logical table name is Entries. You can show 10 records by specifying the logical_table and shard_key parameters. They are required parameters. Execution example: logical_select --logical_table Entries --shard_key created_at # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1436288400.0, # 15, # "Groonga" # ], # [ # 1, # "Good-bye Senna", # "I migrated all Senna system!", # 1436367600.0, # 3, # "Senna" # ], # [ # 2, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] If a nonexistent table is specified, an error is returned. Execution example: logical_select --logical_table Nonexistent --shard_key created_at # [ # [ # -22, # 1337566253.89858, # 0.000355720520019531, # "[logical_select] no shard exists: logical_table: <Nonexistent>: shard_key: <created_at>", # [ # [ # "Groonga::Context.set_groonga_error", # "lib/mrb/scripts/context.rb", # 27 # ] # ] # ] # ] shard_key Specifies the name of the column that is treated as the shard key.
The shard key is a column that stores the data used for distributing records to suitable shards. The shard key must be of Time type for now. See logical_table for how to specify shard_key. Optional parameters There are optional parameters. min Specifies the minimum value of the shard_key column. If a shard doesn't have any matching records, the shard isn't searched. For example, if min is "2015/07/09 00:00:00", Entries_20150708 isn't searched, because Entries_20150708 only has records for "2015/07/08". The following example only uses the Entries_20150709 table. Entries_20150708 isn't used. Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --min "2015/07/09 00:00:00" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "Good-bye Senna", # "I migrated all Senna system!", # 1436367600.0, # 3, # "Senna" # ], # [ # 2, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] min_border Specifies whether the minimum value is included or not. Here are the available values. ┌────────┬──────────────────────────────────┐ │Value │ Description │ ├────────┼──────────────────────────────────┤ │include │ Includes min value. This is the │ │ │ default. │ ├────────┼──────────────────────────────────┤ │exclude │ Doesn't include min value. │ └────────┴──────────────────────────────────┘ Here is an example for exclude. The result doesn't include the "Good-bye Senna" record because its created_at value is "2015/07/09 00:00:00". Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --min "2015/07/09 00:00:00" \ --min_border "exclude" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] max Specifies the maximum value of the shard_key column. If a shard doesn't have any matching records, the shard isn't searched. For example, if max is "2015/07/08 23:59:59", Entries_20150709 isn't searched, because Entries_20150709 only has records for "2015/07/09". The following example only uses the Entries_20150708 table. Entries_20150709 isn't used. Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --max "2015/07/08 23:59:59" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1436288400.0, # 15, # "Groonga" # ] # ] # ] # ] max_border Specifies whether the maximum value is included or not. Here are the available values.
┌────────┬──────────────────────────────────┐ │Value │ Description │ ├────────┼──────────────────────────────────┤ │include │ Includes max value. This is the │ │ │ default. │ ├────────┼──────────────────────────────────┤ │exclude │ Doesn't include max value. │ └────────┴──────────────────────────────────┘ Here is an example for exclude. The result doesn't include the "Good-bye Senna" record because its created_at value is "2015/07/09 00:00:00". Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --max "2015/07/09 00:00:00" \ --max_border "exclude" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1436288400.0, # 15, # "Groonga" # ] # ] # ] # ] Search related parameters logical_select provides select-compatible search related parameters. match_columns and query aren't supported yet; only filter is supported for now. match_columns Not implemented yet. query Not implemented yet. filter Corresponds to select-filter in select. See select-filter for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --filter "n_likes <= 5" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # 1, # "Good-bye Senna", # "I migrated all Senna system!", # 1436367600.0, # 3, # "Senna" # ], # [ # 2, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] Advanced search parameters logical_select doesn't implement advanced search parameters yet. match_escalation_threshold Not implemented yet. query_flags Not implemented yet. query_expander Not implemented yet. Output related parameters output_columns Corresponds to select-output-columns in select. See select-output-columns for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --output_columns '_key, *' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ], # [ # "Mroonga", # "I also started to use Mroonga. It's also very fast!
Really fast!", # 1436288400.0, # 15, # "Groonga" # ], # [ # "Good-bye Senna", # "I migrated all Senna system!", # 1436367600.0, # 3, # "Senna" # ], # [ # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] sortby Corresponds to select-sortby in select. See select-sortby for details. sortby has a limitation. It works only when the number of search target shards is one. If the number of search target shards is larger than one, sortby doesn't work. Here is an example that uses only one shard: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --min "2015/07/08 00:00:00" \ --min_border "include" \ --max "2015/07/09 00:00:00" \ --max_border "exclude" \ --sortby _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1436288400.0, # 15, # "Groonga" # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ] # ] # ] # ] offset Corresponds to select-offset in select. See select-offset for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --offset 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1436288400.0, # 15, # "Groonga" # ], # [ # 1, # "Good-bye Senna", # "I migrated all Senna system!", # 1436367600.0, # 3, # "Senna" # ], # [ # 2, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1436371200.0, # 3, # "Senna" # ] # ] # ] # ] limit Corresponds to select-limit in select. See select-limit for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "created_at", # "Time" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 1436281200.0, # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 1436284800.0, # 10, # "Groonga" # ] # ] # ] # ] scorer Not implemented yet. Drilldown related parameters All drilldown related parameters in select are supported. See select-drilldown-related-parameters for details. drilldown Corresponds to select-drilldown in select. See select-drilldown for details. 
Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --output_columns _key,tag \ --drilldown tag # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "The first post!", # "Hello" # ], # [ # "Groonga", # "Groonga" # ], # [ # "Mroonga", # "Groonga" # ], # [ # "Good-bye Senna", # "Senna" # ], # [ # "Good-bye Tritonn", # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ] # ] # ] drilldown_sortby Corresponds to select-drilldown-sortby in select. See select-drilldown-sortby for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown tag \ --drilldown_sortby -_nsubrecs,_key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ], # [ # "Hello", # 1 # ] # ] # ] # ] drilldown_output_columns Corresponds to select-drilldown-output-columns in select. See select-drilldown-output-columns for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown tag \ --drilldown_output_columns _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Hello" # ], # [ # "Groonga" # ], # [ # "Senna" # ] # ] # ] # ] drilldown_offset Corresponds to select-drilldown-offset in select. See select-drilldown-offset for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown tag \ --drilldown_offset 1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ] # ] # ] drilldown_limit Corresponds to select-drilldown-limit in select. See select-drilldown-limit for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown tag \ --drilldown_limit 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ] # ] # ] # ] drilldown_calc_types Corresponds to select-drilldown-calc-types in select. See select-drilldown-calc-types for details. 
Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit -1 \ --output_columns tag,n_likes \ --drilldown tag \ --drilldown_calc_types MAX,MIN,SUM,AVG \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "tag", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # "Hello", # 5 # ], # [ # "Groonga", # 10 # ], # [ # "Groonga", # 15 # ], # [ # "Senna", # 3 # ], # [ # "Senna", # 3 # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ], # [ # "_max", # "Int64" # ], # [ # "_min", # "Int64" # ], # [ # "_sum", # "Int64" # ], # [ # "_avg", # "Float" # ] # ], # [ # "Hello", # 1, # 5, # 5, # 5, # 5.0 # ], # [ # "Groonga", # 2, # 15, # 10, # 25, # 12.5 # ], # [ # "Senna", # 2, # 3, # 3, # 6, # 3.0 # ] # ] # ] # ] drilldown_calc_target Corresponds to select-drilldown-calc-target in select. See select-drilldown-calc-target for details. See also drilldown_calc_types for an example. Advanced drilldown related parameters All advanced drilldown related parameters in select are supported. See select-advanced-drilldown-related-parameters for details. There are some limitations: • _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards. It works only with one shard. • _key in drilldown[${LABEL}].sortby works with multiple shards. drilldown[${LABEL}].keys Corresponds to select-drilldown-label-keys in select. See select-drilldown-label-keys for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag.n_likes].keys tag,n_likes \ --drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag.n_likes": [ # [ # 4 # ], # [ # [ # "tag", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 5, # 1 # ], # [ # "Groonga", # 10, # 1 # ], # [ # "Groonga", # 15, # 1 # ], # [ # "Senna", # 3, # 2 # ] # ] # } # ] # ] drilldown[${LABEL}].output_columns Corresponds to select-drilldown-label-output-columns in select. See select-drilldown-label-output-columns for details. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag].keys tag \ --drilldown[tag].output_columns _key,_nsubrecs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag": [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ] # } # ] # ] drilldown[${LABEL}].sortby Corresponds to drilldown_sortby in the non-labeled drilldown. drilldown[${LABEL}].sortby has a limitation. _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards. It works only with one shard. _key in drilldown[${LABEL}].sortby works with multiple shards.
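As a rough illustration of the multiple-shard case, here is a hedged sketch that sorts groups by _key (this exact invocation isn't in the original manual and its output is omitted; it only combines parameters documented above): logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag].keys tag \ --drilldown[tag].sortby _key \ --drilldown[tag].output_columns _key,_nsubrecs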
Here is an example that uses _value.${KEY_NAME} with only one shard: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --min "2015/07/08 00:00:00" \ --min_border "include" \ --max "2015/07/09 00:00:00" \ --max_border "exclude" \ --limit 0 \ --output_columns _id \ --drilldown[tag.n_likes].keys tag,n_likes \ --drilldown[tag.n_likes].output_columns _nsubrecs,_value.n_likes,_value.tag \ --drilldown[tag.n_likes].sortby -_nsubrecs,_value.n_likes,_value.tag # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag.n_likes": [ # [ # 3 # ], # [ # [ # "_nsubrecs", # "Int32" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # 5, # "Hello" # ], # [ # 1, # 10, # "Groonga" # ], # [ # 1, # 15, # "Groonga" # ] # ] # } # ] # ] drilldown[${LABEL}].offset Corresponds to drilldown_offset in the non-labeled drilldown. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag.n_likes].keys tag \ --drilldown[tag.n_likes].offset 1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag.n_likes": [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ] # } # ] # ] drilldown[${LABEL}].limit Corresponds to drilldown_limit in the non-labeled drilldown. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag.n_likes].keys tag \ --drilldown[tag.n_likes].limit 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag.n_likes": [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ] # ] # } # ] # ] drilldown[${LABEL}].calc_types Corresponds to drilldown_calc_types in the non-labeled drilldown. Here is an example: Execution example: logical_select \ --logical_table Entries \ --shard_key created_at \ --limit 0 \ --output_columns _id \ --drilldown[tag].keys tag \ --drilldown[tag].calc_types MAX,MIN,SUM,AVG \ --drilldown[tag].calc_target n_likes \ --drilldown[tag].output_columns _key,_nsubrecs,_max,_min,_sum,_avg # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ] # ] # ], # { # "tag": [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ], # [ # "_max", # "Int64" # ], # [ # "_min", # "Int64" # ], # [ # "_sum", # "Int64" # ], # [ # "_avg", # "Float" # ] # ], # [ # "Hello", # 1, # 5, # 5, # 5, # 5.0 # ], # [ # "Groonga", # 2, # 15, # 10, # 25, # 12.5 # ], # [ # "Senna", # 2, # 3, # 3, # 6, # 3.0 # ] # ] # } # ] # ] drilldown[${LABEL}].calc_target Corresponds to drilldown_calc_target in the non-labeled drilldown. See also drilldown[${LABEL}].calc_types for an example. Return value The return value format of logical_select is compatible with select. See select-return-value for details. logical_shard_list Summary New in version 5.0.7. logical_shard_list returns all existing shard names for the specified logical table name.
Syntax This command takes only one required parameter: logical_shard_list logical_table Usage You need to register the sharding plugin to use this command: Execution example: plugin_register sharding # [[0, 1337566253.89858, 0.000355720520019531], true] Here are sample shards: Execution example: table_create Logs_20150801 TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150801 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20150802 TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150802 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20150930 TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20150930 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] You can get all the shard names in ascending order by specifying Logs as the logical table name: Execution example: logical_shard_list --logical_table Logs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "name": "Logs_20150801" # }, # { # "name": "Logs_20150802" # }, # { # "name": "Logs_20150930" # } # ] # ] Parameters This section describes parameters. Required parameters There is one required parameter. logical_table Specifies the logical table name. logical_shard_list returns a list of shard names of the logical table: Execution example: logical_shard_list --logical_table Logs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "name": "Logs_20150801" # }, # { # "name": "Logs_20150802" # }, # { # "name": "Logs_20150930" # } # ] # ] The list is sorted by shard name in ascending order. Optional parameters There is no optional parameter. Return value The command returns a list of shard names in ascending order: [ HEADER, [ {"name": "SHARD_NAME_1"}, {"name": "SHARD_NAME_2"}, ... {"name": "SHARD_NAME_N"} ] ] See /reference/command/output_format for HEADER. See also • /reference/sharding logical_table_remove Summary New in version 5.0.5. logical_table_remove removes tables and their columns for the specified logical table. If there are one or more indexes on the keys of the tables or on their columns, they are also removed. If you specify only part of a shard, the table for the shard isn't removed; logical_table_remove just deletes records in the table. For example, suppose there are the following records in a table: • Record1: 2016-03-18 00:30:00 • Record2: 2016-03-18 01:00:00 • Record3: 2016-03-18 02:00:00 logical_table_remove deletes "Record1" and "Record2" when you specify the range between 2016-03-18 00:00:00 and 2016-03-18 01:30:00. logical_table_remove doesn't delete "Record3" and doesn't remove the table. New in version 6.0.1: You can also remove tables and columns that reference the target table, and tables related to the target shard, by using the dependent parameter. Syntax This command takes many parameters. The required parameters are logical_table and shard_key: logical_table_remove logical_table shard_key [min=null] [min_border="include"] [max=null] [max_border="include"] [dependent=no] Usage You specify the logical table name and the shard key for the records you want to remove. This section describes the following: • Basic usage • Removes parts of a logical table • Unremovable cases • Removes with related tables • Decreases used resources Basic usage Register the sharding plugin in advance to use this command.
Execution example: register sharding # [[0, 1337566253.89858, 0.000355720520019531], true] You can remove all tables for the logical table by specifying only logical_table and shard_key. Here are commands to create 2 shards: Execution example: table_create Logs_20160318 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160318 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20160319 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160319 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] You can confirm existing shards by logical_shard_list: Execution example: logical_shard_list --logical_table Logs # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "name": "Logs_20160318" # }, # { # "name": "Logs_20160319" # } # ] # ] You can remove all shards: Execution example: logical_table_remove \ --logical_table Logs \ --shard_key timestamp # [[0, 1337566253.89858, 0.000355720520019531], true] There are no shards after you remove all shards: Execution example: logical_shard_list --logical_table Logs # [[0, 1337566253.89858, 0.000355720520019531], []] Removes parts of a logical table You can specify a range of shards by the following parameters: • min • min_border • max • max_border See the following documents of logical_select for each parameter: • logical-select-min • logical-select-min-border • logical-select-max • logical-select-max-border If the specified range doesn't cover all records in a shard, the table for the shard isn't removed; only the target records in the table are deleted. If the specified range covers all records in a shard, the table for the shard is removed. Here is a logical table to show the behavior. The logical table has two shards: Execution example: table_create Logs_20160318 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160318 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Logs_20160318 [ {"timestamp": "2016-03-18 00:30:00"}, {"timestamp": "2016-03-18 01:00:00"}, {"timestamp": "2016-03-18 02:00:00"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] table_create Logs_20160319 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160319 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Logs_20160319 [ {"timestamp": "2016-03-19 00:30:00"}, {"timestamp": "2016-03-19 01:00:00"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] There are the following records in Logs_20160318 table: • Record1: "2016-03-18 00:30:00" • Record2: "2016-03-18 01:00:00" • Record3: "2016-03-18 02:00:00" There are the following records in Logs_20160319 table: • Record1: "2016-03-19 00:30:00" • Record2: "2016-03-19 01:00:00" The following range doesn't cover "Record1" in Logs_20160318 table but covers all records in Logs_20160319 table: ┌───────────┬───────────────────────┐ │Parameter │ Value │ ├───────────┼───────────────────────┤ │min │ "2016-03-18 01:00:00" │ ├───────────┼───────────────────────┤ │min_border │ "include" │ ├───────────┼───────────────────────┤ │max │ "2016-03-19 01:30:00" │ ├───────────┼───────────────────────┤ │max_border │ "include" │ └───────────┴───────────────────────┘ logical_table_remove with the range deletes "Record2" and "Record3" in Logs_20160318 table but doesn't remove Logs_20160318 table,
Because there is "Record1" in Logs_20160318 table. logical_table_remove with the range removes Logs_20160319 table because the range covers all records in Logs_20160319 table. Here is an example to use logical_table_remove with the range: Execution example: logical_table_remove \ --logical_table Logs \ --shard_key timestamp \ --min "2016-03-18 01:00:00" \ --min_border "include" \ --max "2016-03-19 01:30:00" \ --max_border "include" # [[0, 1337566253.89858, 0.000355720520019531], true] dump shows that there is "Record1" in Logs_20160318 table: Execution example: dump # plugin_register sharding # # table_create Logs_20160318 TABLE_NO_KEY # column_create Logs_20160318 timestamp COLUMN_SCALAR Time # # load --table Logs_20160318 # [ # ["_id","timestamp"], # [1,1458228600.0] # ] Unremovable cases There are some unremovable cases. See table-remove-unremovable-cases for details. Because logical_table_remove uses the same checks. Removes with related tables New in version 6.0.1. If you understand what you'll do, you can also remove tables and columns that depend on the target shard with one logical_table_remove command by using --dependent yes parameter. Here are conditions for dependent. If table or column satisfies one of the conditions, the table or column depends on the target shard: • Tables and columns that reference the target shard • Tables for the shard (= The table has the same _YYYYMMDD postfix as the target shard and is referenced from the target shard) If there are one or more tables and columns that reference the target shard, logical_table_remove is failed. It's for avoiding dangling references. Bookmarks.log_20160320 column in the following is the column that references the target shard: Execution example: table_create Logs_20160320 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160320 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Bookmarks TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Bookmarks log_20160320 COLUMN_SCALAR Logs_20160320 # [[0, 1337566253.89858, 0.000355720520019531], true] You can't remove Logs_20160320 by logical_table_remove by default: Execution example: logical_table_remove \ --logical_table Logs \ --shard_key timestamp # [ # [ # -2, # 1337566253.89858, # 0.000355720520019531, # "operation not permitted: <[table][remove] a column that references the table exists: <Bookmarks.log_20160320> -> <Logs_20160320", # [ # [ # "Groonga::Sharding::LogicalTableRemoveCommand.remove_table", # "/home/kou/work/c/groonga.clean/plugins/sharding/logical_table_remove.rb", # 80 # ] # ] # ] # ] You can remove Logs_20160320 by logical_table_remove with --dependent yes parameter. Bookmarks.log_20160320 is also removed: Execution example: logical_table_remove \ --logical_table Logs \ --shard_key timestamp \ --dependent yes # [[0, 1337566253.89858, 0.000355720520019531], true] object_exist shows that Logs_20160320 table and Bookmarks.log_20160320 column are removed: Execution example: object_exist Logs_20160320 # [[0, 1337566253.89858, 0.000355720520019531], false] object_exist Bookmarks.log_20160320 # [[0, 1337566253.89858, 0.000355720520019531], false] If there is one or more tables for the target shard, logical_table_remove with --dependent yes also removes them. Tables that have the same _YYYYMMDD postfix as the target shard are treated as tables for the target shard. Here are two tables that have _20160320 postfix. 
NotRelated_20160320 table isn't used by Logs_20160320 table. Users_20160320 table is used by Logs_20160320 table. Servers table exists and is used by Logs_20160320 table: Execution example: table_create NotRelated_20160320 TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Users_20160320 TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Servers TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Logs_20160320 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160320 timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160320 user COLUMN_SCALAR Users_20160320 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs_20160320 server COLUMN_SCALAR Servers # [[0, 1337566253.89858, 0.000355720520019531], true] logical_table_remove with the --dependent yes parameter removes only Logs_20160320 table and Users_20160320 table, because Users_20160320 table has the _20160320 postfix and is used by Logs_20160320. NotRelated_20160320 table and Servers table aren't removed: NotRelated_20160320 table has the _20160320 postfix but isn't used by Logs_20160320, and Servers table is used by Logs_20160320 but doesn't have the _20160320 postfix: Execution example: logical_table_remove \ --logical_table Logs \ --shard_key timestamp \ --dependent yes # [[0, 1337566253.89858, 0.000355720520019531], true] You can confirm that Logs_20160320 table and Users_20160320 table are removed but NotRelated_20160320 table and Servers table aren't removed: Execution example: object_exist Logs_20160320 # [[0, 1337566253.89858, 0.000355720520019531], false] object_exist Users_20160320 # [[0, 1337566253.89858, 0.000355720520019531], false] object_exist NotRelated_20160320 # [[0, 1337566253.89858, 0.000355720520019531], true] object_exist Servers # [[0, 1337566253.89858, 0.000355720520019531], true] Decreases used resources You can decrease the resources used by this command. See table-remove-decreases-used-resources for details; logical_table_remove uses the same logic as table_remove. Parameters This section describes parameters of logical_table_remove. Required parameters There are two required parameters. logical_table Specifies the logical table name. It means the table name without the _YYYYMMDD postfix. If you use actual tables such as Logs_20150203, Logs_20150204 and so on, the logical table name is Logs. See also logical-select-logical-table. shard_key Specifies the column name which is treated as the shard key. See also logical-select-shard-key. Optional parameters There are optional parameters. min Specifies the minimum value of the shard_key column. See also logical-select-min. min_border Specifies whether the minimum value is included or not. include and exclude are available. The default is include. See also logical-select-min-border. max Specifies the maximum value of the shard_key column. See also logical-select-max. max_border Specifies whether the maximum value is included or not. include and exclude are available. The default is include. See also logical-select-max-border. dependent New in version 6.0.1. Specifies whether tables and columns that depend on the target shard are also removed or not. Here are the conditions for dependent.
If a table or column satisfies one of the following conditions, it depends on the target shard: • Tables and columns that reference the target shard • Tables for the shard (= tables that have the same _YYYYMMDD postfix as the target shard and are referenced from the target shard) If this value is yes, tables and columns that depend on the target shard are also removed. Otherwise, they aren't removed: if there are one or more tables or columns that reference the target shard, an error is returned, and tables for the shard are not touched. This is a dangerous parameter; use it carefully. See Removes with related tables for how to use this parameter. Return value The command returns true as body on success such as: [HEADER, true] If the command fails, error details are in HEADER. See /reference/command/output_format for HEADER. normalize NOTE: This command is an experimental feature. This command may be changed in the future. Summary normalize command normalizes text with the specified normalizer. There is no need to create a table to use the normalize command. It is useful for checking the results of a normalizer. Syntax This command takes three parameters. normalizer and string are required; flags is optional: normalize normalizer string [flags=NONE] Usage Here is a simple example of the normalize command. Execution example: normalize NormalizerAuto "aBcDe 123" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "normalized": "abcde 123", # "types": [], # "checks": [] # } # ] Parameters This section describes the parameters of normalize. Required parameters There are two required parameters, normalizer and string. normalizer Specifies the normalizer name. normalize command uses the normalizer that is named normalizer. See /reference/normalizers about built-in normalizers. Here is an example to use the built-in NormalizerAuto normalizer. TODO If you want to use other normalizers, you need to register an additional normalizer plugin by the register command. For example, you can use the MySQL compatible normalizer by registering groonga-normalizer-mysql. string Specifies any string which you want to normalize. If you want to include spaces in string, you need to quote the string with single quotes (') or double quotes ("). Here is an example to use spaces in string. TODO Optional parameters There is one optional parameter. flags Specifies normalization options. You can specify multiple options separated by "|". For example, REMOVE_BLANK|WITH_TYPES. Here are the available flags. ┌───────────────────────────┬──────────────────────────────────┐ │Flag │ Description │ ├───────────────────────────┼──────────────────────────────────┤ │NONE │ Just ignored. │ ├───────────────────────────┼──────────────────────────────────┤ │REMOVE_BLANK │ Removes blank characters such as spaces from the normalized text. │ ├───────────────────────────┼──────────────────────────────────┤ │WITH_TYPES │ Includes character type information in types. │ ├───────────────────────────┼──────────────────────────────────┤ │WITH_CHECKS │ Includes information in checks for mapping each normalized character back to the original text. │ ├───────────────────────────┼──────────────────────────────────┤ │REMOVE_TOKENIZED_DELIMITER │ Removes the tokenized delimiter character (U+FFFE) from the text. │ └───────────────────────────┴──────────────────────────────────┘ Here is an example that uses REMOVE_BLANK. TODO Here is an example that uses WITH_TYPES. TODO Here is an example that uses REMOVE_TOKENIZED_DELIMITER. TODO Return value [HEADER, normalized_text] HEADER See /reference/command/output_format about HEADER. normalized_text normalized_text is an object that has the following attributes. ┌───────────┬──────────────────────────────────┐ │Name │ Description │ ├───────────┼──────────────────────────────────┤ │normalized │ The normalized text. │ ├───────────┼──────────────────────────────────┤ │types │ An array of types of the normalized text. The N-th types shows the type of the N-th character in normalized. │ └───────────┴──────────────────────────────────┘ See also • /reference/normalizers normalizer_list Summary normalizer_list command lists normalizers in a database. Syntax This command takes no parameters: normalizer_list Usage Here is a simple example. Execution example: normalizer_list # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "name": "NormalizerAuto" # }, # { # "name": "NormalizerNFKC51" # } # ] # ] It returns the normalizers in a database. Return value normalizer_list command returns normalizers. Each normalizer has an attribute that contains its name. More attributes may be added in the future: [HEADER, normalizers] HEADER See /reference/command/output_format about HEADER. normalizers normalizers is an array of normalizer objects. Normalizer is an object that has the following attributes. ┌─────┬──────────────────┐ │Name │ Description │ ├─────┼──────────────────┤ │name │ Normalizer name. │ └─────┴──────────────────┘ See also • /reference/normalizers • /reference/commands/normalize object_exist Summary New in version 5.0.6. object_exist returns whether an object with the specified name exists in the database or not. It's a light operation. It just checks the existence of the name in the database. It doesn't load the specified object from disk. object_exist doesn't check the object type. The existing object may be a table, a column, a function and so on. Syntax This command takes only one required parameter: object_exist name Usage You can check whether the name is already used in the database: Execution example: object_exist Users # [[0, 1337566253.89858, 0.000355720520019531], false] table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] object_exist Users # [[0, 1337566253.89858, 0.000355720520019531], true] The object_exist Users returns false before you create Users table and returns true after you create it. Parameters This section describes all parameters. Required parameters There is only one required parameter. name Specifies the object name to be checked. If you want to check the existence of a column, use the TABLE_NAME.COLUMN_NAME format like the following: Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] object_exist Logs.timestamp # [[0, 1337566253.89858, 0.000355720520019531], true] In Logs.timestamp, Logs is the table name and timestamp is the column name. Optional parameters There is no optional parameter. Return value The command returns true as body if an object with the specified name exists in the database, such as: [HEADER, true] The command returns false otherwise, such as: [HEADER, false] See /reference/command/output_format for HEADER. object_inspect Summary New in version 6.0.0. object_inspect inspects an object. You can confirm details of an object. For example: • If the object is a table, you can confirm the number of records in the table. • If the object is a column, you can confirm the type of the column's value.
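As a hedged sketch (not an example from the original manual), you could inspect a column with the TABLE_NAME.COLUMN_NAME format used by object_exist; the output, omitted here, should include details such as the column's value type: object_inspect Logs.timestamp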
Syntax This command takes only one optional parameter: object_inspect [name=null] Usage You can inspect an object in the database specified by name: Execution example: table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "Alice"} ] # [[0, 1337566253.89858, 0.000355720520019531], 1] object_inspect Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "name": "Users", # "n_records": 1, # "value": { # "type": null # }, # "key": { # "total_size": 5, # "max_total_size": 4294967295, # "type": { # "size": 4096, # "type": { # "id": 32, # "name": "type" # }, # "id": 14, # "name": "ShortText" # } # }, # "type": { # "id": 48, # "name": "table:hash_key" # }, # "id": 256 # } # ] The object_inspect Users returns the following information: • The name of the table: "name": Users • The total used key size: "key": {"total_size": 5} ("Alice" is 5 bytes of data) • The maximum total key size: "key": {"max_total_size": 4294967295} • and so on. You can inspect the database by not specifying name: Execution example: object_inspect # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "name_table": { # "name": "", # "n_records": 256, # "value": null, # "key": { # "type": null # }, # "type": { # "id": 50, # "name": "table:dat_key" # }, # "id": 0 # }, # "type": { # "id": 55, # "name": "db" # } # } # ] The object_inspect returns the following information: • The table type for object name management: "key": {"type": {"name": "table:dat_key"}} • and so on. Parameters This section describes all parameters. Required parameters There is no required parameter. Optional parameters There is only one optional parameter. name Specifies the object name to be inspected. If name isn't specified, the database is inspected. Return value The command returns an object (nested key and value pairs) that includes details of the object (such as a table) as body: [HEADER, object] See /reference/command/output_format for HEADER. The format of the details depends on the object type. For example, a table has key information but a function doesn't. Database Database inspection returns the following information: { "type": { "id": DATABASE_TYPE_ID, "name": DATABASE_TYPE_NAME }, "name_table": DATABASE_NAME_TABLE } DATABASE_TYPE_ID DATABASE_TYPE_ID is always 55. DATABASE_TYPE_NAME DATABASE_TYPE_NAME is always "db". DATABASE_NAME_TABLE DATABASE_NAME_TABLE is a table for managing object names in the database. The table is table-pat-key or table-dat-key. Normally, it's table-dat-key. See Table for format details. Table Table inspection returns the following information: { "name": TABLE_NAME, "type": { "id": TABLE_TYPE_ID, "name": TABLE_TYPE_NAME }, "key": { "type": TABLE_KEY_TYPE, "total_size": TABLE_KEY_TOTAL_SIZE, "max_total_size": TABLE_KEY_MAX_TOTAL_SIZE }, "value": { "type": TABLE_VALUE_TYPE }, "n_records": TABLE_N_RECORDS } There are some exceptions: • table-no-key doesn't return key information because it doesn't have a key. • table-dat-key doesn't return value information because it doesn't have a value. TABLE_NAME The name of the inspected table. TABLE_TYPE_ID The type ID of the inspected table. Here is a list of type IDs: ┌───────────────┬────┐ │Table type │ ID │ ├───────────────┼────┤ │table-hash-key │ 48 │ ├───────────────┼────┤ │table-pat-key │ 49 │ ├───────────────┼────┤ │table-dat-key │ 50 │ ├───────────────┼────┤ │table-no-key │ 51 │ └───────────────┴────┘ TABLE_TYPE_NAME The type name of the inspected table.
Here is a list of type names: ┌───────────────┬──────────────────┐ │Table type │ Name │ ├───────────────┼──────────────────┤ │table-hash-key │ "table:hash_key" │ ├───────────────┼──────────────────┤ │table-pat-key │ "table:pat_key" │ ├───────────────┼──────────────────┤ │table-dat-key │ "table:dat_key" │ ├───────────────┼──────────────────┤ │table-no-key │ "table:no_key" │ └───────────────┴──────────────────┘ TABLE_KEY_TYPE The type of the key of the inspected table. See Type for format details. TABLE_KEY_TOTAL_SIZE The total key size of the inspected table in bytes. TABLE_KEY_MAX_TOTAL_SIZE The maximum total key size of the inspected table in bytes. TABLE_VALUE_TYPE The type of the value of the inspected table. See Type for format details. TABLE_N_RECORDS The number of records of the inspected table. It's a 64bit unsigned integer value. Type Type inspection returns the following information: { "id": TYPE_ID, "name": TYPE_NAME, "type": { "id": TYPE_ID_OF_TYPE, "name": TYPE_NAME_OF_TYPE }, "size": TYPE_SIZE } TYPE_ID The ID of the inspected type. Here is an ID list of builtin types: ┌─────────────────────────────┬────┐ │Type │ ID │ ├─────────────────────────────┼────┤ │builtin-type-bool │ 3 │ ├─────────────────────────────┼────┤ │builtin-type-int8 │ 4 │ ├─────────────────────────────┼────┤ │builtin-type-uint8 │ 5 │ ├─────────────────────────────┼────┤ │builtin-type-int16 │ 6 │ ├─────────────────────────────┼────┤ │builtin-type-uint16 │ 7 │ ├─────────────────────────────┼────┤ │builtin-type-int32 │ 8 │ ├─────────────────────────────┼────┤ │builtin-type-uint32 │ 9 │ ├─────────────────────────────┼────┤ │builtin-type-int64 │ 10 │ ├─────────────────────────────┼────┤ │builtin-type-uint64 │ 11 │ ├─────────────────────────────┼────┤ │builtin-type-float │ 12 │ ├─────────────────────────────┼────┤ │builtin-type-time │ 13 │ ├─────────────────────────────┼────┤ │builtin-type-short-text │ 14 │ ├─────────────────────────────┼────┤ │builtin-type-text │ 15 │ ├─────────────────────────────┼────┤ │builtin-type-long-text │ 16 │ ├─────────────────────────────┼────┤ │builtin-type-tokyo-geo-point │ 17 │ ├─────────────────────────────┼────┤ │builtin-type-wgs84-geo-point │ 18 │ └─────────────────────────────┴────┘ TYPE_NAME The name of the inspected type. Here is a name list of builtin types: • builtin-type-bool • builtin-type-int8 • builtin-type-uint8 • builtin-type-int16 • builtin-type-uint16 • builtin-type-int32 • builtin-type-uint32 • builtin-type-int64 • builtin-type-uint64 • builtin-type-float • builtin-type-time • builtin-type-short-text • builtin-type-text • builtin-type-long-text • builtin-type-tokyo-geo-point • builtin-type-wgs84-geo-point TYPE_ID_OF_TYPE TYPE_ID_OF_TYPE is always 32. TYPE_NAME_OF_TYPE TYPE_NAME_OF_TYPE is always "type". TYPE_SIZE TYPE_SIZE is the size of the inspected type in bytes. If the inspected type is a variable size type, the size means the maximum size. object_remove Summary New in version 6.0.0. object_remove removes an object. You can remove any object including tables, columns, commands and so on. Normally, you should use a specific remove command such as table_remove or column_remove. object_remove is dangerous because you can remove any object; use it carefully. object_remove has "force mode". You can remove a broken object in "force mode". "Force mode" is useful to resolve problems reported by /reference/executables/grndb.
Syntax This command takes two parameters: object_remove name [force=no] Usage You can remove an object in the database specified by name: Execution example: object_remove Users # [ # [ # -22, # 1337566253.89858, # 0.000355720520019531, # "[object][remove] target object doesn't exist: <Users>", # [ # [ # "command_object_remove", # "proc_object.c", # 121 # ] # ] # ], # false # ] table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] object_remove Users # [[0, 1337566253.89858, 0.000355720520019531], true] The object_remove Users returns false before you create Users table and returns true after you create it. You can't remove a broken object by default: Execution example: table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] thread_limit 1 # [[0, 1337566253.89858, 0.000355720520019531], 1] database_unmap # [[0, 1337566253.89858, 0.000355720520019531], true] echo "BROKEN" > ${DB_PATH}.0000100 object_remove Users # [ # [ # -22, # 1337566253.89858, # 0.000355720520019531, # "[object][remove] failed to open the target object: <Users>", # [ # [ # "command_object_remove", # "proc_object.c", # 116 # ] # ] # ], # false # ] object_exist Users # [[0, 1337566253.89858, 0.000355720520019531], true] You can remove a broken object by --force yes: Execution example: object_remove Users --force yes # [ # [ # -65, # 1337566253.89858, # 0.000355720520019531, # "[io][open] file size is too small: <7>(required: >= 64): </tmp/groonga-databases/commands_object_remove.0000100>", # [ # [ # "grn_io_open", # "io.c", # 565 # ] # ] # ], # false # ] object_exist Users # [[0, 1337566253.89858, 0.000355720520019531], false] --force yes means you enable "force mode". You can remove a broken object in "force mode". Parameters This section describes all parameters. Required parameters There is only one required parameter. name Specifies the object name to be removed. If you want to remove a column, use the TABLE_NAME.COLUMN_NAME format like the following: Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] object_remove Logs.timestamp # [[0, 1337566253.89858, 0.000355720520019531], true] In Logs.timestamp, Logs is the table name and timestamp is the column name. Optional parameters There is one optional parameter. force Specifies whether to remove the object in "force mode". You can't remove a broken object by default, but you can remove a broken object in "force mode". The force value must be yes or no. yes enables "force mode"; no disables it. The default value is no, so "force mode" is disabled by default. Return value The command returns true as body when the command removed the specified object without any error. For example: [HEADER, true] The command returns false as body when the command gets any error. For example: [HEADER, false] See /reference/command/output_format for HEADER. Note that false doesn't mean that "the command can't remove the object". If you enable "force mode", the command removes the object even if the object is broken. In that case, the object is removed and false is returned as body. plugin_register New in version 5.0.1. Summary plugin_register command registers a plugin. You need to register a plugin before you use it.
You need just one plugin_register command for a plugin in the same database because registered plugin information is written into the database. When you restart your groonga process, the groonga process loads all registered plugins without the plugin_register command. You can unregister a registered plugin by plugin_unregister. Syntax This command takes only one required parameter: plugin_register name Usage Here is a sample that registers the QueryExpanderTSV query expander, which is included in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so. Execution example: plugin_register query_expanders/tsv # [[0, 1337566253.89858, 0.000355720520019531], true] You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so); they are completed automatically. You can specify an absolute path such as plugin_register /usr/lib/groonga/plugins/query_expanders/tsv.so. Return value plugin_register returns true as body on success such as: [HEADER, true] If plugin_register fails, error details are in HEADER. See /reference/command/output_format for HEADER. See also • plugin_unregister plugin_unregister NOTE: This command is an experimental feature. New in version 5.0.1. Summary plugin_unregister command unregisters a plugin. Syntax This command takes only one required parameter: plugin_unregister name Usage Here is a sample that unregisters the QueryExpanderTSV query expander, which is included in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so. Execution example: plugin_unregister query_expanders/tsv # [[0, 1337566253.89858, 0.000355720520019531], true] You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so); they are completed automatically. You can specify an absolute path such as plugin_unregister /usr/lib/groonga/plugins/query_expanders/tsv.so. Return value plugin_unregister returns true as body on success such as: [HEADER, true] If plugin_unregister fails, error details are in HEADER. See /reference/command/output_format for HEADER. See also • plugin_register quit Summary quit - ends the session. This section describes quit, one of the Groonga built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket. quit ends the session with the groonga process. If the caller is a client process, it disconnects from the groonga process. Syntax quit Usage quit Parameters There are no parameters. Return value There is no return value. range_filter Summary TODO: write me Syntax Usage Return value See also • /reference/commands/select register Deprecated since version 5.0.1: Use plugin_register instead. Summary register command registers a plugin. You need to register a plugin before you use it. You need just one register command for a plugin in the same database because registered plugin information is written into the database. When you restart your groonga process, the groonga process loads all registered plugins without the register command. NOTE: Registered plugins can be removed since Groonga 5.0.1. Use plugin_unregister in such a case. Syntax This command takes only one required parameter: register path Usage Here is a sample that registers the QueryExpanderTSV query expander, which is included in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so. Execution example: register query_expanders/tsv # [[0, 1337566253.89858, 0.000355720520019531], true] You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so); they are completed automatically. You can specify an absolute path such as register /usr/lib/groonga/plugins/query_expanders/tsv.so. Return value register returns true as body on success such as: [HEADER, true] If register fails, error details are in HEADER. See /reference/command/output_format for HEADER.
See also • plugin_register • plugin_unregister reindex Summary New in version 5.1.0. reindex command recreates one or more index columns. If you specify a database as the target object, all index columns are recreated. If you specify a table as the target object, all index columns in the table are recreated. If you specify a data column as the target object, all index columns for the data column are recreated. If you specify an index column as the target object, the index column is recreated. This command is useful when your index column is broken. The target object is one of database, table and column. NOTE: You can't use the target index columns while the reindex command is running. If you use the same database from multiple processes, all processes except the one running reindex should reopen the database. You can use database_unmap for reopening the database. Syntax This command takes only one optional parameter: reindex [target_name=null] If the target_name parameter is omitted, the database is used as the target object. It means that all index columns in the database are recreated. Usage Here is an example to recreate all index columns in the database: Execution example: reindex # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to recreate all index columns (Lexicon.entry_key and Lexicon.entry_body) in Lexicon table: Execution example: table_create Entry TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entry body COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Lexicon TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon entry_key COLUMN_INDEX|WITH_POSITION \ Entry _key # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon entry_body COLUMN_INDEX|WITH_POSITION \ Entry body # [[0, 1337566253.89858, 0.000355720520019531], true] reindex Lexicon # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to recreate all index columns (BigramLexicon.site_title and RegexpLexicon.site_title) of Site.title data column: Execution example: table_create Site TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Site title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create BigramLexicon TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create BigramLexicon site_title COLUMN_INDEX|WITH_POSITION \ Site title # [[0, 1337566253.89858, 0.000355720520019531], true] table_create RegexpLexicon TABLE_PAT_KEY ShortText \ --default_tokenizer TokenRegexp \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create RegexpLexicon site_title COLUMN_INDEX|WITH_POSITION \ Site title # [[0, 1337566253.89858, 0.000355720520019531], true] reindex Site.title # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an example to recreate an index column (Timestamp.logs_timestamp): Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs timestamp COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Timestamp TABLE_PAT_KEY Time # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Timestamp logs_timestamp COLUMN_INDEX Logs timestamp # [[0, 1337566253.89858, 0.000355720520019531], true] reindex Timestamp.logs_timestamp # [[0, 1337566253.89858, 0.000355720520019531], true]
Parameters This section describes all parameters. target_name Specifies the name of a table or column. If you don't specify it, the database is used as the target object. The default is none, which means that the target object is the database. Return value reindex command returns whether recreation succeeded or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT If the command succeeded, it returns true; otherwise it returns false. request_cancel Summary NOTE: This command is an experimental feature. New in version 4.0.9. request_cancel command cancels a running request. There are some limitations: • The request ID must be managed by the user. (You need to assign a unique key to each request.) • The cancel request may be ignored. (You can send the request_cancel command multiple times for the same request ID.) • Only the multithreading type of Groonga server is supported. (You can use it with a /reference/executables/groonga based server but not with /reference/executables/groonga-httpd.) See /reference/command/request_id about request IDs. If a request is canceled, the canceled request has -5 (GRN_INTERRUPTED_FUNCTION_CALL) as /reference/command/return_code. Syntax This command takes only one required parameter: request_cancel id Usage Here is an example of the request_cancel command: $ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' & # The above "select" takes a long time... # Point: "request_id=unique-id-1" $ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1' [[...], {"id": "unique-id-1", "canceled": true}] # Point: "id=unique-id-1" Assume that the first select command takes a long time. The unique-id-1 request ID is assigned to the select command by the request_id=unique-id-1 parameter. The second request_cancel command passes the id=unique-id-1 parameter, the same request ID passed to the select command. The select command may not be canceled immediately, and the cancel request may be ignored. You can send a cancel request for the same request ID multiple times. If the target request is canceled or finished, the "canceled" value in the return value changes from true to false: $ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1' [[...], {"id": "unique-id-1", "canceled": true}] # "select" is still running... ("canceled" is "true") $ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1' [[...], {"id": "unique-id-1", "canceled": true}] # "select" is still running... ("canceled" is "true") $ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1' [[...], {"id": "unique-id-1", "canceled": false}] # "select" is canceled or finished. ("canceled" is "false") If the select command is canceled, the response of the select command has -5 (GRN_INTERRUPTED_FUNCTION_CALL) as /reference/command/return_code: $ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' & [[-5, ...], ...] Parameters This section describes parameters of request_cancel. Required parameters There is one required parameter, id. id Specifies the ID of the target request. Return value request_cancel command returns the result of the cancel request: [ HEADER, { "id": ID, "canceled": CANCEL_REQUEST_IS_ACCEPTED_OR_NOT } ] HEADER See /reference/command/output_format about HEADER. ID The ID of the target request. CANCEL_REQUEST_IS_ACCEPTED_OR_NOT If the cancel request is accepted, this is true; otherwise this is false.
Note that "cancel request is accepted" doesn't means that "the target request is canceled". It just means "cancel request is notified to the target request but the cancel request may be ignored by the target request". If request assigned with the request ID doesn't exist, this is false. See also • /reference/command/request_id ruby_eval Summary ruby_eval command evaluates Ruby script and returns the result. Syntax This command takes only one required parameter: ruby_eval script Usage You can execute any scripts which mruby supports by calling ruby_eval. Here is an example that just calculate 1 + 2 as Ruby script. Execution example: register ruby/eval # [[0, 1337566253.89858, 0.000355720520019531], true] ruby_eval "1 + 2" # [[0, 1337566253.89858, 0.000355720520019531], {"value": 3}] Register ruby/eval plugin to use ruby_eval command in advance. Note that ruby_eval is implemented as an experimental plugin, and the specification may be changed in the future. Parameters This section describes all parameters. script Specifies the Ruby script which you want to evaluate. Return value ruby_eval returns the evaluated result with metadata such as exception information (Including metadata isn't implemented yet): [HEADER, {"value": EVALUATED_VALUE}] HEADER See /reference/command/output_format about HEADER. EVALUATED_VALUE EVALUATED_VALUE is the evaludated value of ruby_script. ruby_eval supports only a number for evaluated value for now. Supported types will be increased in the future. See also ruby_load Summary ruby_load command loads specified Ruby script. Syntax This command takes only one required parameter: ruby_load path Usage You can load any script file which mruby supports by calling ruby_load. Here is an example that just load expression.rb as Ruby script. Execution example: register ruby/load # [[0, 1337566253.89858, 0.000355720520019531], true] ruby_load "expression.rb" # [[0, 1337566253.89858, 0.000355720520019531], {"value": null}] Register ruby/load plugin to use ruby_load command in advance. Note that ruby_load is implemented as an experimental plugin, and the specification may be changed in the future. Parameters This section describes all parameters. path Specifies the Ruby script path which you want to load. Return value ruby_load returns the loaded result with metadata such as exception information (Including metadata isn't implemented yet): [HEADER, {"value": LOADED_VALUE}] HEADER See /reference/command/output_format about HEADER. LOADED_VALUE LOADED_VALUE is the loaded value of ruby script. ruby_load just return null as LOADED_VALUE for now, it will be supported in the future. See also /reference/commands/ruby_eval schema Summary New in version 5.0.9. schema command returns schema in the database. This command is useful when you want to inspect the database. For example, visualizing the database, creating GUI for the database and so on. 
Syntax This command takes no parameters: schema Usage Here is an example schema to show example output: Execution example: table_create Memos TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms memos_content_index \ COLUMN_INDEX|WITH_POSITION \ Memos content # [[0, 1337566253.89858, 0.000355720520019531], true] Here is an output of schema command against this example schema: Execution example: schema # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "tables": { # "Terms": { # "normalizer": { # "name": "NormalizerAuto" # }, # "name": "Terms", # "tokenizer": { # "name": "TokenBigram" # }, # "command": { # "command_line": "table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto", # "name": "table_create", # "arguments": { # "key_type": "ShortText", # "default_tokenizer": "TokenBigram", # "normalizer": "NormalizerAuto", # "flags": "TABLE_PAT_KEY", # "name": "Terms" # } # }, # "indexes": [], # "key_type": { # "type": "type", # "name": "ShortText" # }, # "value_type": null, # "token_filters": [], # "type": "patricia trie", # "columns": { # "memos_content_index": { # "name": "memos_content_index", # "weight": false, # "section": false, # "compress": null, # "command": { # "command_line": "column_create --table Terms --name memos_content_index --flags COLUMN_INDEX|WITH_POSITION --type Memos --sources content", # "name": "column_create", # "arguments": { # "table": "Terms", # "flags": "COLUMN_INDEX|WITH_POSITION", # "name": "memos_content_index", # "sources": "content", # "type": "Memos" # } # }, # "indexes": [], # "sources": [ # { # "table": "Memos", # "name": "content", # "full_name": "Memos.content" # } # ], # "value_type": { # "type": "reference", # "name": "Memos" # }, # "full_name": "Terms.memos_content_index", # "position": true, # "table": "Terms", # "type": "index" # } # } # }, # "Memos": { # "normalizer": null, # "name": "Memos", # "tokenizer": null, # "command": { # "command_line": "table_create --name Memos --flags TABLE_HASH_KEY --key_type ShortText", # "name": "table_create", # "arguments": { # "key_type": "ShortText", # "flags": "TABLE_HASH_KEY", # "name": "Memos" # } # }, # "indexes": [], # "key_type": { # "type": "type", # "name": "ShortText" # }, # "value_type": null, # "token_filters": [], # "type": "hash table", # "columns": { # "content": { # "name": "content", # "weight": false, # "section": false, # "compress": null, # "command": { # "command_line": "column_create --table Memos --name content --flags COLUMN_SCALAR --type Text", # "name": "column_create", # "arguments": { # "table": "Memos", # "flags": "COLUMN_SCALAR", # "name": "content", # "type": "Text" # } # }, # "indexes": [ # { # "table": "Terms", # "section": 0, # "name": "memos_content_index", # "full_name": "Terms.memos_content_index" # } # ], # "sources": [], # "value_type": { # "type": "type", # "name": "Text" # }, # "full_name": "Memos.content", # "position": false, # "table": "Memos", # "type": "scalar" # } # } # } # }, # "normalizers": { # "NormalizerNFKC51": { # "name": "NormalizerNFKC51" # }, # "NormalizerAuto": { # "name": "NormalizerAuto" # } # }, # "token_filters": {}, # "tokenizers": { # 
"TokenBigramSplitSymbolAlphaDigit": { # "name": "TokenBigramSplitSymbolAlphaDigit" # }, # "TokenRegexp": { # "name": "TokenRegexp" # }, # "TokenBigramIgnoreBlankSplitSymbolAlphaDigit": { # "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit" # }, # "TokenBigram": { # "name": "TokenBigram" # }, # "TokenDelimit": { # "name": "TokenDelimit" # }, # "TokenUnigram": { # "name": "TokenUnigram" # }, # "TokenBigramSplitSymbol": { # "name": "TokenBigramSplitSymbol" # }, # "TokenDelimitNull": { # "name": "TokenDelimitNull" # }, # "TokenBigramIgnoreBlankSplitSymbolAlpha": { # "name": "TokenBigramIgnoreBlankSplitSymbolAlpha" # }, # "TokenBigramSplitSymbolAlpha": { # "name": "TokenBigramSplitSymbolAlpha" # }, # "TokenTrigram": { # "name": "TokenTrigram" # }, # "TokenMecab": { # "name": "TokenMecab" # }, # "TokenBigramIgnoreBlankSplitSymbol": { # "name": "TokenBigramIgnoreBlankSplitSymbol" # }, # "TokenBigramIgnoreBlank": { # "name": "TokenBigramIgnoreBlank" # } # }, # "plugins": {}, # "types": { # "UInt64": { # "can_be_key_type": true, # "name": "UInt64", # "can_be_value_type": true, # "size": 8 # }, # "Int32": { # "can_be_key_type": true, # "name": "Int32", # "can_be_value_type": true, # "size": 4 # }, # "Int16": { # "can_be_key_type": true, # "name": "Int16", # "can_be_value_type": true, # "size": 2 # }, # "LongText": { # "can_be_key_type": false, # "name": "LongText", # "can_be_value_type": false, # "size": 2147483648 # }, # "TokyoGeoPoint": { # "can_be_key_type": true, # "name": "TokyoGeoPoint", # "can_be_value_type": true, # "size": 8 # }, # "Text": { # "can_be_key_type": false, # "name": "Text", # "can_be_value_type": false, # "size": 65536 # }, # "ShortText": { # "can_be_key_type": true, # "name": "ShortText", # "can_be_value_type": false, # "size": 4096 # }, # "Float": { # "can_be_key_type": true, # "name": "Float", # "can_be_value_type": true, # "size": 8 # }, # "UInt8": { # "can_be_key_type": true, # "name": "UInt8", # "can_be_value_type": true, # "size": 1 # }, # "UInt32": { # "can_be_key_type": true, # "name": "UInt32", # "can_be_value_type": true, # "size": 4 # }, # "Object": { # "can_be_key_type": true, # "name": "Object", # "can_be_value_type": true, # "size": 8 # }, # "UInt16": { # "can_be_key_type": true, # "name": "UInt16", # "can_be_value_type": true, # "size": 2 # }, # "Int64": { # "can_be_key_type": true, # "name": "Int64", # "can_be_value_type": true, # "size": 8 # }, # "Time": { # "can_be_key_type": true, # "name": "Time", # "can_be_value_type": true, # "size": 8 # }, # "Bool": { # "can_be_key_type": true, # "name": "Bool", # "can_be_value_type": true, # "size": 1 # }, # "WGS84GeoPoint": { # "can_be_key_type": true, # "name": "WGS84GeoPoint", # "can_be_value_type": true, # "size": 8 # }, # "Int8": { # "can_be_key_type": true, # "name": "Int8", # "can_be_value_type": true, # "size": 1 # } # } # } # ] Parameters This section describes all parameters. Required parameters There is no required parameter. Optional parameters There is no optional parameter. Return value schema command returns schema in the database: [HEADER, SCHEMA] HEADER See /reference/command/output_format about HEADER. SCHEMA SCHEMA is an object that consists of the following information: { "plugins": PLUGINS, "types": TYPES, "tokenizers": TOKENIZERS, "normalizers": NORMALIZERS, "token_filters": TOKEN_FITLERS, "tables": TABLES } PLUGINS PLUGINS is an object. Its key is plugin name and its value is plugin detail: { "PLUGIN_NAME_1": PLUGIN_1, "PLUGIN_NAME_2": PLUGIN_2, ... 
"PLUGIN_NAME_n": PLUGIN_n } PLUGIN PLUGIN is an object that describes plugin detail: { "name": PLUGIN_NAME } Here are properties of PLUGIN: ┌─────┬──────────────────────────────────┐ │Name │ Description │ ├─────┼──────────────────────────────────┤ │name │ The plugin name. It's used in │ │ │ plugin_register. │ └─────┴──────────────────────────────────┘ TYPES TYPES is an object. Its key is type name and its value is type detail: { "TYPE_NAME_1": TYPE_1, "TYPE_NAME_2": TYPE_2, ... "TYPE_NAME_n": TYPE_n } TYPE TYPE is an object that describes type detail: { "name": TYPE_NAME, "size": SIZE_OF_ONE_VALUE_IN_BYTE, "can_be_key_type": BOOLEAN, "can_be_value_type": BOOLEAN } Here are properties of TYPE: ┌──────────────────┬──────────────────────────────────┐ │Name │ Description │ ├──────────────────┼──────────────────────────────────┤ │name │ The type name. │ ├──────────────────┼──────────────────────────────────┤ │size │ The number of bytes of one │ │ │ value. │ ├──────────────────┼──────────────────────────────────┤ │can_be_key_type │ true when the type can be used │ │ │ for table key, false otherwise. │ ├──────────────────┼──────────────────────────────────┤ │can_be_value_type │ true when the type can be used │ │ │ for table value, false │ │ │ otherwise. │ └──────────────────┴──────────────────────────────────┘ TOKENIZERS TOKENIZERS is an object. Its key is tokenizer name and its value is tokenizer detail: { "TOKENIZER_NAME_1": TOKENIZER_1, "TOKENIZER_NAME_2": TOKENIZER_2, ... "TOKENIZER_NAME_n": TOKENIZER_n } TOKENIZER TOKENIZER is an object that describes tokenizer detail: { "name": TOKENIZER_NAME } Here are properties of TOKENIZER: ┌─────┬──────────────────────────────────┐ │Name │ Description │ ├─────┼──────────────────────────────────┤ │name │ The tokenizer name. It's used │ │ │ for │ │ │ table-create-default-tokenizer. │ └─────┴──────────────────────────────────┘ NORMALIZERS NORMALIZERS is an object. Its key is normalizer name and its value is normalizer detail: { "NORMALIZER_NAME_1": NORMALIZER_1, "NORMALIZER_NAME_2": NORMALIZER_2, ... "NORMALIZER_NAME_n": NORMALIZER_n } NORMALIZER NORMALIZER is an object that describes normalizer detail: { "name": NORMALIZER_NAME } Here are properties of NORMALIZER: ┌─────┬──────────────────────────────────┐ │Name │ Description │ └─────┴──────────────────────────────────┘ │name │ The normalizer name. It's used │ │ │ for table-create-normalizer. │ └─────┴──────────────────────────────────┘ TOKEN_FILTERS TOKEN_FILTERS is an object. Its key is token filter name and its value is token filter detail: { "TOKEN_FILTER_NAME_1": TOKEN_FILTER_1, "TOKEN_FILTER_NAME_2": TOKEN_FILTER_2, ... "TOKEN_FILTER_NAME_n": TOKEN_FILTER_n } TOKEN_FILTER TOKEN_FILTER is an object that describes token filter detail: { "name": TOKEN_FILTER_NAME } Here are properties of TOKEN_FILTER: ┌─────┬──────────────────────────────────┐ │Name │ Description │ ├─────┼──────────────────────────────────┤ │name │ The token filter name. It's used │ │ │ for table-create-token-filters. │ └─────┴──────────────────────────────────┘ TABLES TABLES is an object. Its key is table name and its value is table detail: { "TABLE_NAME_1": TABLE_1, "TABLE_NAME_2": TABLE_2, ... 
"TABLE_NAME_n": TABLE_n } TABLE TABLE is an object that describes table detail: { "name": TABLE_NAME "type": TYPE, "key_type": KEY_TYPE, "value_type": VALUE_TYPE, "tokenizer": TOKENIZER, "normalizer": NORMALIZER, "token_filters": [ TOKEN_FILTER_1, TOKEN_FILTER_2, ..., TOKEN_FILTER_n, ], "indexes": [ INDEX_1, INDEX_2, ..., INDEX_n ], "command": COMMAND, "columns": { "COLUMN_NAME_1": COLUMN_1, "COLUMN_NAME_2": COLUMN_2, ..., "COLUMN_NAME_3": COLUMN_3, } } Here are properties of TABLE: ┌──────────────┬──────────────────────────────────┐ │Name │ Description │ ├──────────────┼──────────────────────────────────┤ │name │ The table name. │ ├──────────────┼──────────────────────────────────┤ │type │ The table type. │ │ │ │ │ │ This is one of the followings: │ │ │ │ │ │ • array: table-no-key │ │ │ │ │ │ • hash: table-hash-key │ │ │ │ │ │ • patricia trie: │ │ │ table-pat-key │ │ │ │ │ │ • double array trie: │ │ │ table-dat-key │ └──────────────┴──────────────────────────────────┘ │key_type │ The type of the table's key. │ │ │ │ │ │ If the table type is array, this │ │ │ is null. │ │ │ │ │ │ If the table type isn't array, │ │ │ this is an object that has the │ │ │ following properties: │ │ │ │ │ │ • name: The type name. │ │ │ │ │ │ • type: reference if │ │ │ the type is an │ │ │ table, type │ │ │ otherwise. │ ├──────────────┼──────────────────────────────────┤ │value_type │ The type of the table's value. │ │ │ │ │ │ If the table doesn't use value, │ │ │ this is null. │ │ │ │ │ │ If the table uses value, this is │ │ │ an object that has the following │ │ │ properties: │ │ │ │ │ │ • name: The type name. │ │ │ │ │ │ • type: reference if │ │ │ the type is an │ │ │ table, type │ │ │ otherwise. │ ├──────────────┼──────────────────────────────────┤ │tokenizer │ The tokenizer of the table. It's │ │ │ specified by │ │ │ table-create-default-tokenizer. │ │ │ │ │ │ If the table doesn't use │ │ │ tokenizer, this is null. │ │ │ │ │ │ If the table uses tokenizer, │ │ │ this is an object that has the │ │ │ following properties: │ │ │ │ │ │ • name: The tokenizer │ │ │ name. │ ├──────────────┼──────────────────────────────────┤ │normalizer │ The normalizer of the table. │ │ │ It's specified by │ │ │ table-create-normalizer. │ │ │ │ │ │ If the table doesn't use │ │ │ normalizer, this is null. │ │ │ │ │ │ If the table uses normalizer, │ │ │ this is an object that has the │ │ │ following properties: │ │ │ │ │ │ • name: The normalizer │ │ │ name. │ ├──────────────┼──────────────────────────────────┤ │token_filters │ The token filters of the table. │ │ │ It's specified by │ │ │ table-create-token-filters. │ │ │ │ │ │ This is an array of an object. │ │ │ The object has the following │ │ │ properties: │ │ │ │ │ │ • name: The token │ │ │ filter name. │ ├──────────────┼──────────────────────────────────┤ │indexes │ The indexes of the table's key. │ │ │ │ │ │ This is an array of INDEX. │ ├──────────────┼──────────────────────────────────┤ │command │ The Groonga command information │ │ │ to create the table. │ │ │ │ │ │ This is COMMAND. │ ├──────────────┼──────────────────────────────────┤ │columns │ The columns of the table. │ │ │ │ │ │ This is an object that its key │ │ │ is a column name and its value │ │ │ is COLUMN. 
│ └──────────────┴──────────────────────────────────┘ INDEX INDEX is an object that describes the index detail: { "full_name": INDEX_COLUMN_NAME_WITH_TABLE_NAME, "table": TABLE_NAME, "name": INDEX_COLUMN_NAME, "section": SECTION } Here are properties of INDEX: ┌──────────┬──────────────────────────────────┐ │Name │ Description │ ├──────────┼──────────────────────────────────┤ │full_name │ The index column name with table │ │ │ name. │ │ │ │ │ │ For example, Terms.index. │ ├──────────┼──────────────────────────────────┤ │table │ The table name of the index │ │ │ column. │ │ │ │ │ │ For example, Terms. │ ├──────────┼──────────────────────────────────┤ │name │ The index column name. │ │ │ │ │ │ For example, index. │ ├──────────┼──────────────────────────────────┤ │section │ The section number in the index │ │ │ column for the table's key. │ │ │ │ │ │ If the index column isn't a │ │ │ multiple column index, this is │ │ │ 0. │ └──────────┴──────────────────────────────────┘

COMMAND COMMAND is an object that describes how to create the table or column: { "name": COMMAND_NAME, "arguments": { "KEY_1": "VALUE_1", "KEY_2": "VALUE_2", ..., "KEY_n": "VALUE_n" }, "command_line": COMMAND_LINE } Here are properties of COMMAND: ┌─────────────┬──────────────────────────────────┐ │Name │ Description │ ├─────────────┼──────────────────────────────────┤ │name │ The Groonga command name to │ │ │ create the table or column. │ ├─────────────┼──────────────────────────────────┤ │arguments │ The arguments of the Groonga │ │ │ command to create the table or │ │ │ column. │ │ │ │ │ │ This is an object whose key is │ │ │ an argument name and whose value │ │ │ is the argument value. │ ├─────────────┼──────────────────────────────────┤ │command_line │ The Groonga command line to │ │ │ create the table or column. │ │ │ │ │ │ This is a string that can be │ │ │ evaluated by Groonga. │ └─────────────┴──────────────────────────────────┘

COLUMN COLUMN is an object that describes the column detail: { "name": COLUMN_NAME, "table": TABLE_NAME, "full_name": COLUMN_NAME_WITH_TABLE, "type": TYPE, "value_type": VALUE_TYPE, "compress": COMPRESS, "section": BOOLEAN, "weight": BOOLEAN, "position": BOOLEAN, "sources": [ SOURCE_1, SOURCE_2, ..., SOURCE_n ], "indexes": [ INDEX_1, INDEX_2, ..., INDEX_n ], "command": COMMAND } Here are properties of COLUMN: ┌───────────┬───────────────────────────────────────┐ │Name │ Description │ ├───────────┼───────────────────────────────────────┤ │name │ The column name. │ │ │ │ │ │ For example, age. │ ├───────────┼───────────────────────────────────────┤ │table │ The table name of the column. │ │ │ │ │ │ For example, Users. │ ├───────────┼───────────────────────────────────────┤ │full_name │ The column name with table name. │ │ │ │ │ │ For example, Users.age. │ ├───────────┼───────────────────────────────────────┤ │type │ The column type. │ │ │ │ │ │ This is one of the following: │ │ │ │ │ │ • scalar: │ │ │ /reference/columns/scalar │ │ │ │ │ │ • vector: │ │ │ /reference/columns/vector │ │ │ │ │ │ • index: │ │ │ /reference/columns/index │ ├───────────┼───────────────────────────────────────┤ │value_type │ The type of the column's value. │ │ │ │ │ │ This is an object that has the │ │ │ following properties: │ │ │ │ │ │ • name: The type name. │ │ │ │ │ │ • type: reference if the │ │ │ type is a table, type │ │ │ otherwise. │ ├───────────┼───────────────────────────────────────┤ │compress │ The compression method of the column.
│ │ │ │ │ │ If the column doesn't use any │ │ compression methods, this is null. │ │ │ │ │ If the column uses a compression │ │ method, this is one of the │ │ following: │ │ │ │ │ • zlib: The column uses │ │ zlib to compress column │ │ values. │ │ │ │ │ • lz4: The column uses LZ4 │ │ to compress column values. │ ├───────────┼───────────────────────────────────────┤ │section │ Whether the column can store section │ │ │ information or not. │ │ │ │ │ │ true if the column is created with │ │ │ the WITH_SECTION flag, false │ │ │ otherwise. │ │ │ │ │ │ Normally, if the column isn't an │ │ │ index column, this is false. │ ├───────────┼───────────────────────────────────────┤ │weight │ Whether the column can store weight │ │ │ information or not. │ │ │ │ │ │ true if the column is created with │ │ │ the WITH_WEIGHT flag, false otherwise. │ ├───────────┼───────────────────────────────────────┤ │position │ Whether the column can store position │ │ │ information or not. │ │ │ │ │ │ true if the column is created with │ │ │ the WITH_POSITION flag, false │ │ │ otherwise. │ │ │ │ │ │ Normally, if the column isn't an │ │ │ index column, this is false. │ ├───────────┼───────────────────────────────────────┤ │sources │ The source columns of the index │ │ │ column. │ │ │ │ │ │ This is an array of SOURCE. │ │ │ │ │ │ Normally, if the column isn't an │ │ │ index column, this is an empty array. │ ├───────────┼───────────────────────────────────────┤ │indexes │ The indexes of the column. │ │ │ │ │ │ This is an array of INDEX. │ ├───────────┼───────────────────────────────────────┤ │command │ The Groonga command information to │ │ │ create the column. │ │ │ │ │ │ This is COMMAND. │ └───────────┴───────────────────────────────────────┘

SOURCE SOURCE is an object that describes the source detail: { "name": COLUMN_NAME, "table": TABLE_NAME, "full_name": COLUMN_NAME_WITH_TABLE_NAME } Here are properties of SOURCE: ┌──────────┬──────────────────────────────────┐ │Name │ Description │ ├──────────┼──────────────────────────────────┤ │name │ The source column name. │ │ │ │ │ │ For example, content. │ │ │ │ │ │ This may be a _key pseudo │ │ │ column. │ ├──────────┼──────────────────────────────────┤ │table │ The table name of the source │ │ │ column. │ │ │ │ │ │ For example, Memos. │ ├──────────┼──────────────────────────────────┤ │full_name │ The source column name with │ │ │ table name. │ │ │ │ │ │ For example, Memos.content. │ └──────────┴──────────────────────────────────┘ See also • table_create • column_create

select Summary select searches a table for records that match the specified conditions and then outputs them. select is the most important command in Groonga. You need to understand select to use the full power of Groonga.

Syntax This command takes many parameters. The only required parameter is table.
Other parameters are optional: select table [match_columns=null] [query=null] [filter=null] [scorer=null] [sortby=null] [output_columns="_id, _key, *"] [offset=0] [limit=10] [drilldown=null] [drilldown_sortby=null] [drilldown_output_columns="_key, _nsubrecs"] [drilldown_offset=0] [drilldown_limit=10] [cache=yes] [match_escalation_threshold=0] [query_expansion=null] [query_flags=ALLOW_PRAGMA|ALLOW_COLUMN|ALLOW_UPDATE|ALLOW_LEADING_NOT|NONE] [query_expander=null] [adjuster=null] [drilldown_calc_types=NONE] [drilldown_calc_target=null]

select has the following named parameters for advanced drilldown: • drilldown[${LABEL}].keys=null • drilldown[${LABEL}].sortby=null • drilldown[${LABEL}].output_columns="_key, _nsubrecs" • drilldown[${LABEL}].offset=0 • drilldown[${LABEL}].limit=10 • drilldown[${LABEL}].calc_types=NONE • drilldown[${LABEL}].calc_target=null

You can use one or more letters, digits, _ and . for ${LABEL}. For example, parent.sub1 is a valid ${LABEL}. Parameters that have the same ${LABEL} are grouped. For example, the following parameters specify one drilldown: • --drilldown[label].keys column • --drilldown[label].sortby -_nsubrecs The following parameters specify two drilldowns (a short sketch of this labeled syntax follows the sample data below): • --drilldown[label1].keys column1 • --drilldown[label1].sortby -_nsubrecs • --drilldown[label2].keys column2 • --drilldown[label2].sortby _key

Usage Let's learn about select usage with examples. This section shows many popular usages. Here are a schema definition and sample data to show usage.

Execution example: table_create Entries TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries n_likes COLUMN_SCALAR UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries tag COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Entries [ {"_key": "The first post!", "content": "Welcome! This is my first post!", "n_likes": 5, "tag": "Hello"}, {"_key": "Groonga", "content": "I started to use Groonga. It's very fast!", "n_likes": 10, "tag": "Groonga"}, {"_key": "Mroonga", "content": "I also started to use Mroonga. It's also very fast! Really fast!", "n_likes": 15, "tag": "Groonga"}, {"_key": "Good-bye Senna", "content": "I migrated all Senna system!", "n_likes": 3, "tag": "Senna"}, {"_key": "Good-bye Tritonn", "content": "I also migrated all Tritonn system!", "n_likes": 3, "tag": "Senna"} ] # [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content, the number of likes for the entry and a tag. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column. The tag is the value of the Entries.tag column. The Entries._key column and Entries.content column are indexed using the TokenBigram tokenizer. So both Entries._key and Entries.content are fulltext search ready. OK. The schema and data for examples are ready.
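As promised above, here is a minimal sketch of the labeled drilldown syntax against this sample schema. It only combines parameters that are documented in this section; the labels tags and likes are arbitrary names chosen for this illustration, and the output is omitted:

select Entries \
  --limit 0 \
  --drilldown[tags].keys tag \
  --drilldown[tags].sortby -_nsubrecs \
  --drilldown[likes].keys n_likes \
  --drilldown[likes].limit 3

Each group of drilldown[${LABEL}].* parameters describes one independent drilldown, so this single command should group the matched records both by tag and by n_likes.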
Simple usage Here is the simplest usage with the above schema and data. It outputs all records in the Entries table.

Execution example: select Entries # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ]

Why does the command output all records? There are two reasons. The first reason is that the command doesn't specify any search conditions, and no search condition means all records are matched. The second reason is that the total number of records is 5. The select command outputs at most 10 records by default. There are only 5 records, which is less than 10, so the command outputs all records.

Search conditions Search conditions are specified by query or filter. You can also specify both query and filter. It means that selected records must be matched against both query and filter.

Search condition: query query is designed for a search box in a Web page. Imagine a search box in google.com. You specify search conditions for query as space-separated keywords. For example, search engine means a matched record should contain two words, search and engine. Normally, the query parameter is used for specifying fulltext search conditions. It can also be used for non-fulltext search conditions, but filter is normally used for that purpose. The query parameter is used with the match_columns parameter when the query parameter specifies fulltext search conditions. match_columns specifies which columns and indexes are matched against query. Here is a simple query usage example.

Execution example: select Entries --match_columns content --query fast # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ]

The select command searches records that contain the word fast in content column value from the Entries table. query has its own query syntax but its details aren't described here. See /reference/grn_expr/query_syntax for details.

Search condition: filter filter is designed for complex search conditions. You specify search conditions for filter in an ECMAScript-like syntax. Here is a simple filter usage example.

Execution example: select Entries --filter 'content @ "fast" && _key == "Groonga"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga.
It's very fast!", # 10, # "Groonga" # ] # ] # ] # ] The select command searches records that contain a word fast in content column value and has Groonga as _key from Entries table. There are three operators in the command, @, && and ==. @ is fulltext search operator. && and == are the same as ECMAScript. && is logical AND operator and == is equality operator. filter has more operators and syntax like grouping by (...) its details aren't described here. See /reference/grn_expr/script_syntax for datails. Paging You can specify range of outputted records by offset and limit. Here is an example to output only the 2nd record. Execution example: select Entries --offset 1 --limit 1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ] # ] # ] # ] offset is zero-based. --offset 1 means output range is started from the 2nd record. limit specifies the max number of output records. --limit 1 means the number of output records is 1 at a maximium. If no records are matched, select command outputs no records. The total number of records You can use --limit 0 to retrieve the total number of recrods without any contents of records. Execution example: select Entries --limit 0 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ] # ] # ] --limit 0 is also useful for retrieving only the number of matched records. Drilldown You can get additional grouped results against the search result in one select. You need to use two or more SELECT``s in SQL but ``select in Groonga can do it in one select. This feature is called as drilldown in Groonga. It's also called as faceted search in other search engine. For example, think about the following situation. You search entries that has fast word: Execution example: select Entries --filter 'content @ "fast"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ] You want to use tag for additional search condition like --filter 'content @ "fast" && tag == "???". But you don't know suitable tag until you see the result of content @ "fast". If you know the number of matched records of each available tag, you can choose suitable tag. You can use drilldown for the case: Execution example: select Entries --filter 'content @ "fast"' --drilldown tag # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! 
Really fast!", # 15, # "Groonga" # ] # ], # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ] # ] # ] # ] --drilldown tag returns a list of pair of available tag and the number of matched records. You can avoid "no hit search" case by choosing a tag from the list. You can also avoid "too many search results" case by choosing a tag that the number of matched records is few from the list. You can create the following UI with the drilldown results: • Links to narrow search results. (Users don't need to input a search query by their keyboard. They just click a link.) Most EC sites use the UI. See side menu at Amazon. Groonga supports not only counting grouped records but also finding the maximum and/or minimum value from grouped records, summing values in grouped records and so on. See Drilldown related parameters for details. Parameters This section describes all parameters. Parameters are categorized. Required parameters There is a required parameter, table. table Specifies a table to be searched. table must be specified. If nonexistent table is specified, an error is returned. Execution example: select Nonexistent # [ # [ # -22, # 1337566253.89858, # 0.000355720520019531, # "invalid table name: <Nonexistent>", # [ # [ # "grn_select", # "proc.c", # 1217 # ] # ] # ] # ] Search related parameters There are search related parameters. Typically, match_columns and query parameters are used for implementing a search box. filter parameters is used for implementing complex search feature. If both query and filter are specified, selected records must be matched against both query and filter. If both query and filter aren't specified, all records are selected. match_columns Specifies the default target column for fulltext search by query parameter value. A target column for fulltext search can be specified in query parameter. The difference between match_columns and query is whether weight and score function are supported or not. match_columns supports them but query doesn't. Weight is relative importance of target column. A higher weight target column gets more hit score rather than a lower weight target column when a record is matched by fulltext search. The default weight is 1. Here is a simple match_columns usage example. Execution example: select Entries --match_columns content --query fast --output_columns '_key, _score' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga", # 1 # ], # [ # "Mroonga", # 2 # ] # ] # ] # ] --match_columns content means the default target column for fulltext search is content column and its weight is 1. --output_columns '_key, _score' means that the select command outputs _key value and _score value for matched records. Pay attention to _score value. _score value is the number of matched counts against query parameter value. In the example, query parameter value is fast. The fact that _score value is 1 means that fast appers in content column only once. The fact that _score value is 2 means that fast appears in content column twice. To specify weight, column * weight syntax is used. Here is a weight usage example. 
Execution example: select Entries --match_columns 'content * 2' --query fast --output_columns '_key, _score' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Mroonga", # 4 # ] # ] # ] # ]

--match_columns 'content * 2' means the default target column for fulltext search is the content column and its weight is 2. Pay attention to the _score value. The _score value is doubled because the weight is 2. You can specify one or more columns as the default target columns for fulltext search. If multiple columns are specified, fulltext search is done for all columns and scores are accumulated. If one of the columns is matched against the query parameter value, the record is treated as matched. To specify multiple columns, the column1 * weight1 || column2 * weight2 || ... syntax is used. * weight can be omitted. If it is omitted, 1 is used for the weight. Here is an example that uses more than one column.

Execution example: select Entries --match_columns '_key * 10 || content' --query groonga --output_columns '_key, _score' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga", # 11 # ] # ] # ] # ]

--match_columns '_key * 10 || content' means the default target columns for fulltext search are the _key and content columns, the _key column's weight is 10 and the content column's weight is 1. This weight allocation means the _key column value is more important than the content column value. In this example, the title of a blog entry is more important than the content of the blog entry. You can also specify a score function. See /reference/scorer for details. Note that the score function isn't related to the scorer parameter.

query Specifies the query text. Normally, it is used for fulltext search with the match_columns parameter. The query parameter is designed for a fulltext search form in a Web page. A query text should be formatted in /reference/grn_expr/query_syntax. The syntax is similar to common search forms like Google's search form. For example, word1 word2 means that Groonga searches records that contain both word1 and word2. word1 OR word2 means that Groonga searches records that contain either word1 or word2. Here is a simple logical AND search example.

Execution example: select Entries --match_columns content --query "fast groonga" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ] # ] # ] # ]

The select command searches records that contain the two words fast and groonga in content column value from the Entries table. Here is a simple logical OR search example.

Execution example: select Entries --match_columns content --query "groonga OR mroonga" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast!
Really fast!", # 15, # "Groonga" # ] # ] # ] # ] The select command searches records that contain one of two words groonga or mroonga in content column value from Entries table. See /reference/grn_expr/query_syntax for other syntax. It can be used for not only fulltext search but also other conditions. For example, column:value means the value of column column is equal to value. column:<value means the value of column column is less than value. Here is a simple equality operator search example. Execution example: select Entries --query _key:Groonga # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ] # ] # ] # ] The select command searches records that _key column value is Groonga from Entries table. Here is a simple less than operator search example. Execution example: select Entries --query n_likes:<11 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ] The select command searches records that n_likes column value is less than 11 from Entries table. See /reference/grn_expr/query_syntax for other operations. filter Specifies the filter text. Normally, it is used for complex search conditions. filter can be used with query parameter. If both filter and query are specified, there are conbined with logical and. It means that matched records should be matched against both filter and query. filter parameter is designed for complex conditions. A filter text should be formatted in /reference/grn_expr/script_syntax. The syntax is similar to ECMAScript. For example, column == "value" means that the value of column column is equal to "value". column < value means that the value of column column is less than value. Here is a simple equality operator search example. Execution example: select Entries --filter '_key == "Groonga"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ] # ] # ] # ] The select command searches records that _key column value is Groonga from Entries table. Here is a simple less than operator search example. Execution example: select Entries --filter 'n_likes < 11' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. 
It's very fast!", # 10, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ] The select command searches records that n_likes column value is less than 11 from Entries table. See /reference/grn_expr/script_syntax for other operators. Advanced search parameters match_escalation_threshold Specifies threshold to determine whether search storategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, the search storategy escalation is used. See /spec/search about the search storategy escalation. The default threshold is 0. It means that search storategy escalation is used only when no records are matched. The default threshold can be customized by one of the followings. • --with-match-escalation-threshold option of configure • --match-escalation-threshold option of groogna command • match-escalation-threshold configuration item in configuration file Here is a simple match_escalation_threshold usage example. The first select doesn't have match_escalation_threshold parameter. The second select has match_escalation_threshold parameter. Execution example: select Entries --match_columns content --query groo # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ] # ] # ] # ] select Entries --match_columns content --query groo --match_escalation_threshold -1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 0 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ] # ] # ] The first select command searches records that contain a word groo in content column value from Entries table. But no records are matched because the TokenBigram tokenizer tokenizes groonga to groonga not gr|ro|oo|on|ng|ga. (The TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga to gr|ro|oo|on|ng|ga. See /reference/tokenizers for details.) It means that groonga is indexed but groo isn't indexed. So no records are matched against groo by exact match. In the case, the search storategy escalation is used because the number of matched records (0) is equal to match_escalation_threshold (0). One record is matched against groo by unsplit search. The second select command also searches records that contain a word groo in content column value from Entries table. And it also doesn't found matched records. In this case, the search storategy escalation is not used because the number of matched records (0) is larger than match_escalation_threshold (-1). So no more searches aren't executed. And no records are matched. query_expansion Deprecated. Use query_expander instead. query_flags It customs query parameter syntax. You cannot update column value by query parameter by default. But if you specify ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you can update column value by query. Here are available values: • ALLOW_PRAGMA • ALLOW_COLUMN • ALLOW_UPDATE • ALLOW_LEADING_NOT • NONE ALLOW_PRAGMA enables pragma at the head of query. This is not implemented yet. 
ALLOW_COLUMN enables search against columns that are not included in match_columns. To specify a column, use the COLUMN:... syntaxes. ALLOW_UPDATE enables column updates by query with the COLUMN:=NEW_VALUE syntax. ALLOW_COLUMN is also required to update a column because the column update syntax specifies a column. ALLOW_LEADING_NOT enables a leading NOT condition with the -WORD syntax. The query searches records that don't match WORD. A leading NOT condition query is a heavy query in many cases because it matches many records, so this flag is disabled by default. Be careful when you use this flag. NONE is just ignored. You can use NONE to specify no flags. Flags can be combined, separated by |, such as ALLOW_COLUMN|ALLOW_UPDATE. The default value is ALLOW_PRAGMA|ALLOW_COLUMN. Here is a usage example of ALLOW_COLUMN.

Execution example: select Entries --query content:@mroonga --query_flags ALLOW_COLUMN # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ]

The select command searches records that contain mroonga in content column value from the Entries table. Here is a usage example of ALLOW_UPDATE.

Execution example: table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users age COLUMN_SCALAR UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "alice", "age": 18}, {"_key": "bob", "age": 20} ] # [[0, 1337566253.89858, 0.000355720520019531], 2] select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "age", # "UInt32" # ] # ], # [ # 1, # "alice", # 19 # ], # [ # 2, # "bob", # 19 # ] # ] # ] # ] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "age", # "UInt32" # ] # ], # [ # 1, # "alice", # 19 # ], # [ # 2, # "bob", # 19 # ] # ] # ] # ]

The first select command sets the age column value of all records to 19. The second select command outputs the updated age column values. Here is a usage example of ALLOW_LEADING_NOT.

Execution example: select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ]

The select command searches records that don't contain mroonga in content column value from the Entries table. Here is a usage example of NONE.
Execution example: select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ]

The select command searches records that contain one of the two words mroonga or _key:Groonga in content from the Entries table. Note that _key:Groonga doesn't mean that the value of the _key column is equal to Groonga, because the ALLOW_COLUMN flag is not specified. See also /reference/grn_expr/query_syntax.

query_expander It's for query expansion. Query expansion substitutes specific words with other words in the query. Normally, it's used for synonym search. It specifies a column that is used to substitute the query parameter value. The format of this parameter value is "${TABLE}.${COLUMN}". For example, "Terms.synonym" specifies the synonym column in the Terms table. The table for query expansion is called the "substitution table". The substitution table's key must be ShortText, so an array table (TABLE_NO_KEY) can't be used for query expansion, because an array table doesn't have a key. The column for query expansion is called the "substitution column". The substitution column's value type must be ShortText, and its column type must be vector (COLUMN_VECTOR). Query expansion substitutes keys of the substitution table in the query with values in the substitution column. If a word in the query is a key of the substitution table, the word is substituted with the substitution column value that is associated with the key. Substitution isn't performed recursively. It means that substitution target words in the substituted query aren't substituted again. Here is a sample substitution table to show a simple query_expander usage example.

Execution example: table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Thesaurus synonym COLUMN_VECTOR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Thesaurus [ {"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]}, {"_key": "groonga", "synonym": ["groonga", "senna"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 2]

The Thesaurus substitution table has two synonyms, "mroonga" and "groonga". If a user searches with "mroonga", Groonga searches with "((mroonga) OR (tritonn) OR (groonga mysql))". If a user searches with "groonga", Groonga searches with "((groonga) OR (senna))". Normally, it's a good idea for the substitution table to use a normalizer. For example, if a normalizer is used, substitution target words are matched in a case-insensitive manner. See /reference/normalizers for available normalizers. Note that those synonym values include the key value itself, such as "mroonga" and "groonga". It's recommended that you include the key value. If you don't include the key value, the substituted value doesn't include the original substitution target value. Normally, including the original value gives better search results. If you have a word that you don't want to be searched, you should not include the original word. For example, you can implement "stop words" with an empty vector value. Here is a simple query_expander usage example.
Execution example: select Entries --match_columns content --query "mroonga" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ] select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ] select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ] The first select command doesn't use query expansion. So a record that has "tritonn" isn't found. The second select command uses query expansion. So a record that has "tritonn" is found. The third select command doesn't use query expansion but it is same as the second select command. The third one uses expanded query. Each substitute value can contain any /reference/grn_expr/query_syntax syntax such as (...) and OR. You can use complex substitution by using those syntax. Here is a complex substitution usage example that uses query syntax. Execution example: load --table Thesaurus [ {"_key": "popular", "synonym": ["popular", "n_likes:>=10"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 1] select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ] The load command registers a new synonym "popular". It is substituted with ((popular) OR (n_likes:>=10)). The substituted query means that "popular" is containing the word "popular" or 10 or more liked entries. The select command outputs records that n_likes column value is equal to or more than 10 from Entries table. Output related parameters output_columns Specifies output columns separated by ,. Here is a simple output_columns usage example. Execution example: select Entries --output_columns '_id, _key' --limit 1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 1, # "The first post!" 
# ] # ] # ]

The select command just outputs _id and _key column values. * is a special value. It means all columns that are not pseudo columns (/reference/columns/pseudo). Here is a * usage example.

Execution example: select Entries --output_columns '_key, *' --limit 1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ] # ] # ] # ]

The select command outputs the _key pseudo column and the content, n_likes and tag column values, but doesn't output the _id pseudo column value. The default value is _id, _key, *. It means that all column values except _score are output.

sortby Specifies sort keys separated by ,. Each sort key is a column name. Here is a simple sortby usage example.

Execution example: select Entries --sortby 'n_likes, _id' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ] # ] # ] # ]

The select command sorts by n_likes column value in ascending order. Records that have the same n_likes value are sorted by _id in ascending order. "Good-bye Senna" and "Good-bye Tritonn" are such a case. If you want to sort in descending order, add - before the column name. Here is a descending order sortby usage example.

Execution example: select Entries --sortby '-n_likes, _id' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ]

The select command sorts by n_likes column value in descending order. But ascending order is used for sorting by _id. You can use the _score pseudo column in sortby if you use the query or filter parameter.

Execution example: select Entries --match_columns content --query fast --sortby -_score --output_columns '_key, _score' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Mroonga", # 2 # ], # [ # "Groonga", # 1 # ] # ] # ] # ]

The select command sorts matched records by hit score in descending order and outputs the record key and hit score. If you use _score without the query or filter parameter, it is just ignored, but you get a warning in the log file.
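As a closing sketch for the output related parameters, the pieces above can be combined: sort matched records by hit score first and break ties with regular columns. This only combines parameters already documented in this section, and the output is omitted because it is purely illustrative:

select Entries \
  --match_columns content \
  --query fast \
  --sortby '-_score, -n_likes, _id' \
  --output_columns '_key, _score, n_likes'

Records with an equal _score should be ordered by n_likes in descending order, and remaining ties fall back to _id in ascending order.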
offset Specifies the offset to determine the output records range. Offset is zero-based. --offset 1 means the output range starts from the 2nd record.

Execution example: select Entries --sortby _id --offset 3 --output_columns _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Good-bye Senna" # ], # [ # "Good-bye Tritonn" # ] # ] # ] # ]

The select command outputs from the 4th record. You can specify a negative value. It means the number of matched records + offset. If you have 3 matched records and specify --offset -2, you get records from the 2nd (3 + -2 = 1. 1 means 2nd. Remember that offset is zero-based.) record to the 3rd record.

Execution example: select Entries --sortby _id --offset -2 --output_columns _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Good-bye Senna" # ], # [ # "Good-bye Tritonn" # ] # ] # ] # ]

The select command outputs from the 4th record because the total number of records is 5. The default value is 0.

limit Specifies the max number of output records. If the number of matched records is less than limit, all records are output. Here is a simple limit usage example.

Execution example: select Entries --sortby _id --offset 2 --limit 3 --output_columns _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Mroonga" # ], # [ # "Good-bye Senna" # ], # [ # "Good-bye Tritonn" # ] # ] # ] # ]

The select command outputs the 3rd, the 4th and the 5th records. You can specify a negative value. It means the number of matched records + limit + 1. For example, --limit -1 outputs all records. It's a very useful value to show all records. Here is a simple negative limit value usage example.

Execution example: select Entries --limit -1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ]

The select command outputs all records. The default value is 10.

scorer Specifies a grn_expr in script syntax that is applied to every record that matches the search conditions. scorer is invoked after the search has finished and before sorting is executed. Therefore, if you specify an expression that manipulates each record's score, you can customize the sort order of the search results.

Drilldown related parameters This section describes basic drilldown related parameters. Advanced drilldown related parameters are described in another section.

drilldown Specifies keys for grouping, separated by ,. Records matched by the specified search conditions are grouped by each key. If you specify no search condition, all records are grouped by each key.
Here is a simple drilldown example:

Execution example: select Entries \ --output_columns _key,tag \ --drilldown tag # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "The first post!", # "Hello" # ], # [ # "Groonga", # "Groonga" # ], # [ # "Mroonga", # "Groonga" # ], # [ # "Good-bye Senna", # "Senna" # ], # [ # "Good-bye Tritonn", # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ] # ] # ]

The select command outputs the following information: • There is one record that has the "Hello" tag. • There are two records that have the "Groonga" tag. • There are two records that have the "Senna" tag.

Here is a drilldown with search condition example:

Execution example: select Entries \ --output_columns _key,tag \ --filter 'n_likes >= 5' \ --drilldown tag # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "The first post!", # "Hello" # ], # [ # "Groonga", # "Groonga" # ], # [ # "Mroonga", # "Groonga" # ] # ], # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ] # ] # ] # ]

The select command outputs the following information: • Among records whose n_likes value is 5 or larger: • There is one record that has the "Hello" tag. • There are two records that have the "Groonga" tag.

Here is a drilldown with multiple group keys example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag,n_likes # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ] # ], # [ # [ # 4 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # 5, # 1 # ], # [ # 10, # 1 # ], # [ # 15, # 1 # ], # [ # 3, # 2 # ] # ] # ] # ]

The select command outputs the following information: • About tag: • There is one record that has the "Hello" tag. • There are two records that have the "Groonga" tag. • There are two records that have the "Senna" tag. • About n_likes: • There is one record each whose n_likes value is 5, 10 or 15. • There are two records whose n_likes value is 3.

drilldown_sortby Specifies sort keys for drilldown outputs, separated by ,. Each sort key is a column name. You can refer to the number of grouped records by the _nsubrecs pseudo column (/reference/columns/pseudo).
Here is a simple drilldown_sortby example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown 'tag, n_likes' \ --drilldown_sortby '-_nsubrecs, _key'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ], # [ # "Hello", # 1 # ] # ], # [ # [ # 4 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # 3, # 2 # ], # [ # 5, # 1 # ], # [ # 10, # 1 # ], # [ # 15, # 1 # ] # ] # ] # ]

The drilldown result is sorted by the number of grouped records (= _nsubrecs) in descending order. If multiple groups have the same number of records, those groups are sorted by grouped key (= _key) in ascending order.

The sort keys are used for all group keys specified in drilldown:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown 'tag, n_likes' \ --drilldown_sortby '-_nsubrecs, _key'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Senna", # 2 # ], # [ # "Hello", # 1 # ] # ], # [ # [ # 4 # ], # [ # [ # "_key", # "UInt32" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # 3, # 2 # ], # [ # 5, # 1 # ], # [ # 10, # 1 # ], # [ # 15, # 1 # ] # ] # ] # ]

The same sort keys are used in the tag drilldown and the n_likes drilldown. If you want to use different sort keys for each drilldown, use Advanced drilldown related parameters.

drilldown_output_columns

Specifies output columns for drilldown, separated by ,.

Here is a drilldown_output_columns example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_output_columns _key
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Hello" # ], # [ # "Groonga" # ], # [ # "Senna" # ] # ] # ] # ]

The select command outputs only the grouped key. If the grouped key is a reference type column (= a column whose type is a table), you can access columns of the table referenced by the reference type column.
Here are a schema definition and sample data to show drilldown against a reference type column:

Execution example: table_create Tags TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags priority COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Items TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Items tag COLUMN_SCALAR Tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Tags [ {"_key": "groonga", "label": "Groonga", "priority": 10}, {"_key": "mroonga", "label": "Mroonga", "priority": 5} ]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Items [ {"_key": "A", "tag": "groonga"}, {"_key": "B", "tag": "groonga"}, {"_key": "C", "tag": "mroonga"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Tags is the referenced table. Items.tag is a reference type column. You can refer to Tags.label by label in drilldown_output_columns:

Execution example: select Items \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_output_columns '_key, label'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "tag", # "Tags" # ] # ] # ], # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "label", # "ShortText" # ] # ], # [ # "groonga", # "Groonga" # ], # [ # "mroonga", # "Mroonga" # ] # ] # ] # ]

You can use * to refer to all columns in the referenced table (= Tags):

Execution example: select Items \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_output_columns '_key, *'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "tag", # "Tags" # ] # ] # ], # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "label", # "ShortText" # ], # [ # "priority", # "Int32" # ] # ], # [ # "groonga", # "Groonga", # 10 # ], # [ # "mroonga", # "Mroonga", # 5 # ] # ] # ] # ]

* is expanded to label, priority.

The default value of drilldown_output_columns is _key, _nsubrecs. It means that the grouped key and the number of records in the group are output.

You can use more pseudo columns (/reference/columns/pseudo) in drilldown_output_columns, such as _max, _min, _sum and _avg, when you use drilldown_calc_types. See the drilldown_calc_types document for details.

drilldown_offset

Specifies the offset that determines the range of drilldown output records. Offset is zero-based. --drilldown_offset 1 means that the output range starts from the 2nd record.

Here is a drilldown_offset example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_sortby _key \ --drilldown_offset 1
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Senna", # 2 # ] # ] # ] # ]

The select command outputs from the 2nd record.

You can specify a negative value. It means the number of grouped results + offset. If you have 3 grouped results and specify --drilldown_offset -2, you get grouped results from the 2nd grouped result (3 + -2 = 1; 1 means the 2nd because offset is zero-based) to the 3rd grouped result.
Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_sortby _key \ --drilldown_offset -2
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Senna", # 2 # ] # ] # ] # ]

The select command outputs from the 2nd grouped result because the total number of grouped results is 3.

The default value of drilldown_offset is 0.

drilldown_limit

Specifies the maximum number of groups in a drilldown. If the number of groups is less than drilldown_limit, all groups are output.

Here is a drilldown_limit example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_sortby _key \ --drilldown_offset 1 \ --drilldown_limit 2
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 1 # ], # [ # "Senna", # 2 # ] # ] # ] # ]

The select command outputs the 2nd and the 3rd groups.

You can specify a negative value. It means the number of groups + drilldown_limit + 1. For example, --drilldown_limit -1 outputs all groups. It's a very useful value for showing all groups.

Here is a negative drilldown_limit value example:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown tag \ --drilldown_sortby _key \ --drilldown_limit -1
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Groonga", # 2 # ], # [ # "Hello", # 1 # ], # [ # "Senna", # 2 # ] # ] # ] # ]

The select command outputs all groups.

The default value of drilldown_limit is 10.

drilldown_calc_types

Specifies how to calculate (aggregate) values in the records grouped by a drilldown. You can specify multiple calculation types separated by ",", for example, MAX,MIN.

Calculation target values are read from a column of the grouped records. The column is specified by drilldown_calc_target. You can read each calculated value with a pseudo column (/reference/columns/pseudo) such as _max and _min in drilldown_output_columns.

You can use the following calculation types:

┌──────────┬───────────────────┬───────────────────────┬─────────────────────┐
│Type name │ Pseudo column     │ Needs                 │ Description         │
│          │ name              │ drilldown_calc_target │                     │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│NONE      │ Nothing.          │ No.                   │ Just ignored.       │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│COUNT     │ _nsubrecs         │ No.                   │ Counts grouped      │
│          │                   │                       │ records. It's       │
│          │                   │                       │ always enabled, so  │
│          │                   │                       │ you don't need to   │
│          │                   │                       │ specify it.         │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│MAX       │ _max              │ Yes.                  │ Finds the maximum   │
│          │                   │                       │ integer value among │
│          │                   │                       │ the integer values  │
│          │                   │                       │ in grouped records. │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│MIN       │ _min              │ Yes.                  │ Finds the minimum   │
│          │                   │                       │ integer value among │
│          │                   │                       │ the integer values  │
│          │                   │                       │ in grouped records. │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│SUM       │ _sum              │ Yes.                  │ Sums integer values │
│          │                   │                       │ in grouped records. │
├──────────┼───────────────────┼───────────────────────┼─────────────────────┤
│AVG       │ _avg              │ Yes.                  │ Averages            │
│          │                   │                       │ integer/float       │
│          │                   │                       │ values in grouped   │
│          │                   │                       │ records.            │
└──────────┴───────────────────┴───────────────────────┴─────────────────────┘

Here is a MAX example:

Execution example: select Entries \ --limit -1 \ --output_column _id,n_likes \ --drilldown tag \ --drilldown_calc_types MAX \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_max
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_max", # "Int64" # ] # ], # [ # "Hello", # 5 # ], # [ # "Groonga", # 15 # ], # [ # "Senna", # 3 # ] # ] # ] # ]

The select command groups all records by the tag column value, finds the maximum n_likes column value for each group and outputs pairs of the grouped key and the maximum n_likes column value of the group. It uses the _max pseudo column to read the maximum n_likes column value.

Here is a MIN example:

Execution example: select Entries \ --limit -1 \ --output_column _id,n_likes \ --drilldown tag \ --drilldown_calc_types MIN \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_min
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_min", # "Int64" # ] # ], # [ # "Hello", # 5 # ], # [ # "Groonga", # 10 # ], # [ # "Senna", # 3 # ] # ] # ] # ]

The select command groups all records by the tag column value, finds the minimum n_likes column value for each group and outputs pairs of the grouped key and the minimum n_likes column value of the group. It uses the _min pseudo column to read the minimum n_likes column value.

Here is a SUM example:

Execution example: select Entries \ --limit -1 \ --output_column _id,n_likes \ --drilldown tag \ --drilldown_calc_types SUM \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_sum
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_sum", # "Int64" # ] # ], # [ # "Hello", # 5 # ], # [ # "Groonga", # 25 # ], # [ # "Senna", # 6 # ] # ] # ] # ]

The select command groups all records by the tag column value, sums all n_likes column values for each group and outputs pairs of the grouped key and the summed n_likes column values of the group. It uses the _sum pseudo column to read the summed n_likes column values.

Here is an AVG example:

Execution example: select Entries \ --limit -1 \ --output_column _id,n_likes \ --drilldown tag \ --drilldown_calc_types AVG \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_avg
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_avg", # "Float" # ] # ], # [ # "Hello", # 5.0 # ], # [ # "Groonga", # 12.5 # ], # [ # "Senna", # 3.0 # ] # ] # ] # ]

The select command groups all records by the tag column value, averages all n_likes column values for each group and outputs pairs of the grouped key and the averaged n_likes column values of the group. It uses the _avg pseudo column to read the averaged n_likes column values.
Here is an example that uses all calculation types:

Execution example: select Entries \ --limit -1 \ --output_column _id,n_likes \ --drilldown tag \ --drilldown_calc_types MAX,MIN,SUM,AVG \ --drilldown_calc_target n_likes \ --drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_nsubrecs", # "Int32" # ], # [ # "_max", # "Int64" # ], # [ # "_min", # "Int64" # ], # [ # "_sum", # "Int64" # ], # [ # "_avg", # "Float" # ] # ], # [ # "Hello", # 1, # 5, # 5, # 5, # 5.0 # ], # [ # "Groonga", # 2, # 15, # 10, # 25, # 12.5 # ], # [ # "Senna", # 2, # 3, # 3, # 6, # 3.0 # ] # ] # ] # ]

The select command specifies multiple calculation types separated by "," like MAX,MIN,SUM,AVG. You can use the _nsubrecs pseudo column in drilldown_output_columns without specifying COUNT in drilldown_calc_types, because COUNT is always enabled.

The default value of drilldown_calc_types is NONE. It means that only COUNT is enabled, because NONE is just ignored and COUNT is always enabled.

drilldown_calc_target

Specifies the target column for drilldown_calc_types. If you specify a calculation type that needs a target column, such as MAX, in drilldown_calc_types but omit drilldown_calc_target, the calculation result is always 0.

You can specify only one column name like --drilldown_calc_target n_likes. You can't specify multiple column names like --drilldown_calc_target _key,n_likes.

You can use a value referenced from the target record by chaining "." like --drilldown_calc_target reference_column.nested_reference_column.value.

See drilldown_calc_types to learn how to use drilldown_calc_target.

The default value of drilldown_calc_target is null. It means that no calculation target column is specified.
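Here is a hedged sketch of the "." reference syntax; it is not an execution example from the original manual. It reuses the Items/Tags schema shown earlier: Items records are grouped by tag, and the calculation target is read through the tag reference column with tag.priority (output omitted):

select Items --limit 0 --drilldown tag --drilldown_calc_types MAX --drilldown_calc_target tag.priority --drilldown_output_columns _key,_max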
Advanced drilldown related parameters

You can get multiple drilldown results by specifying multiple group keys in drilldown, but then you need to use the same configuration for all drilldowns. For example, drilldown_output_columns is used by all drilldowns.

You can use a separate configuration for each drilldown with the following parameters:
• drilldown[${LABEL}].keys
• drilldown[${LABEL}].sortby
• drilldown[${LABEL}].output_columns
• drilldown[${LABEL}].offset
• drilldown[${LABEL}].limit
• drilldown[${LABEL}].calc_types
• drilldown[${LABEL}].calc_target

${LABEL} is a variable. You can use the following characters for ${LABEL}:
• Alphabets
• Digits
• .
• _

NOTE: You can use more characters, but it's better to use only these characters.

Parameters that have the same ${LABEL} value are grouped, and the grouped parameters are used for one drilldown.

For example, there are 2 groups for the following parameters:
• --drilldown[label1].keys _key
• --drilldown[label1].output_columns _nsubrecs
• --drilldown[label2].keys tag
• --drilldown[label2].output_columns _key,_nsubrecs

drilldown[label1].keys and drilldown[label1].output_columns are grouped. drilldown[label2].keys and drilldown[label2].output_columns are also grouped. In the label1 group, _key is used as the group key and _nsubrecs is used as the output columns. In the label2 group, tag is used as the group key and _key,_nsubrecs is used as the output columns.

See the document for the corresponding drilldown_XXX parameter to learn how to use each of the following parameters:
• drilldown[${LABEL}].sortby: drilldown_sortby
• drilldown[${LABEL}].offset: drilldown_offset
• drilldown[${LABEL}].limit: drilldown_limit
• drilldown[${LABEL}].calc_types: drilldown_calc_types
• drilldown[${LABEL}].calc_target: drilldown_calc_target

The following parameters need more description:
• drilldown[${LABEL}].keys
• drilldown[${LABEL}].output_columns

The output format is also a bit different; it is described below as well.

drilldown[${LABEL}].keys

drilldown can specify multiple keys for multiple drilldowns, but it can't specify multiple keys for one drilldown. drilldown[${LABEL}].keys can't specify multiple keys for multiple drilldowns, but it can specify multiple keys for one drilldown.

You can specify multiple keys separated by ",". Here is an example that groups by multiple keys, the tag and n_likes column values:

Execution example: select Entries \ --limit -1 \ --output_column tag,n_likes \ --drilldown[tag.n_likes].keys tag,n_likes \ --drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ], # { # "tag.n_likes": [ # [ # 4 # ], # [ # [ # "tag", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_nsubrecs", # "Int32" # ] # ], # [ # "Hello", # 5, # 1 # ], # [ # "Groonga", # 10, # 1 # ], # [ # "Groonga", # 15, # 1 # ], # [ # "Senna", # 3, # 2 # ] # ] # } # ] # ]

tag.n_likes is used as the label for the drilldown parameter group. You can refer to grouped keys with the _value.${KEY_NAME} syntax in drilldown[${LABEL}].output_columns. ${KEY_NAME} is the name of a column used as a group key. tag and n_likes are the ${KEY_NAME}s in this case.

Note that you can't use the _value.${KEY_NAME} syntax when you specify just one key in drilldown[${LABEL}].keys like --drilldown[tag].keys tag. You should use _key in that case. It's the same rule as in drilldown_output_columns.

drilldown[${LABEL}].output_columns

It's almost the same as drilldown_output_columns. The difference between drilldown_output_columns and drilldown[${LABEL}].output_columns is how to refer to group keys.

drilldown_output_columns uses the _key pseudo column (/reference/columns/pseudo) to refer to the group key.
drilldown[${LABEL}].output_columns also uses the _key pseudo column to refer to the group key when you specify only one group key in drilldown[${LABEL}].keys.

Here is an example that refers to a single group key by the _key pseudo column:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown[tag].keys tag \ --drilldown[tag].output_columns _key
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # { # "tag": [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Hello" # ], # [ # "Groonga" # ], # [ # "Senna" # ] # ] # } # ] # ]

But you can't refer to each group key by the _key pseudo column in drilldown[${LABEL}].output_columns when there are multiple group keys. You need to use the _value.${KEY_NAME} syntax. ${KEY_NAME} is the name of a column that is used as a group key in drilldown[${LABEL}].keys.

Here is an example that refers to each group key of multiple group keys by the _value.${KEY_NAME} syntax:

Execution example: select Entries \ --limit 0 \ --output_column _id \ --drilldown[tag.n_likes].keys tag,n_likes \ --drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ] # ], # { # "tag.n_likes": [ # [ # 4 # ], # [ # [ # "tag", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # "Hello", # 5 # ], # [ # "Groonga", # 10 # ], # [ # "Groonga", # 15 # ], # [ # "Senna", # 3 # ] # ] # } # ] # ]

TIP: Why the _value.${KEY_NAME} syntax? It's implementation specific information. _key is a vector value. The vector value consists of all group keys. You can see the byte sequence of the vector value by referring to _key in drilldown[${LABEL}].output_columns. When you specify multiple group keys in drilldown[${LABEL}].keys, there is one grouped record in _value to refer to each grouped value, so you can refer to each group key with the _value.${KEY_NAME} syntax. On the other hand, when you specify only one group key in drilldown[${LABEL}].keys, there is no grouped record in _value, so you can't refer to the group key with the _value.${KEY_NAME} syntax.

Output format for drilldown[${LABEL}] style

There is a difference in output format between drilldown and drilldown[${LABEL}].keys. drilldown uses an array to output multiple drilldown results. drilldown[${LABEL}].keys uses pairs of label and drilldown result.

drilldown uses the following output format:

[ HEADER, [ SEARCH_RESULT, DRILLDOWN_RESULT1, DRILLDOWN_RESULT2, ... ] ]

drilldown[${LABEL}].keys uses the following output format:

[ HEADER, [ SEARCH_RESULT, { "LABEL1": DRILLDOWN_RESULT1, "LABEL2": DRILLDOWN_RESULT2, ... } ] ]

Cache related parameter

cache

Specifies whether to cache the result of this query. If the result is cached, the next identical query returns its response quickly by using the cache. This parameter doesn't control whether an existing cached result is used.

Here are the available values:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│no    │ Don't cache the output of this   │
│      │ query.                           │
├──────┼──────────────────────────────────┤
│yes   │ Cache the output of this query.  │
│      │ It's the default value.          │
└──────┴──────────────────────────────────┘

Here is an example that disables caching the result of this query:

Execution example: select Entries --cache no
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ], # [ # "tag", # "ShortText" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5, # "Hello" # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10, # "Groonga" # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15, # "Groonga" # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3, # "Senna" # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3, # "Senna" # ] # ] # ] # ]

The default value is yes.

Score related parameters

There is one score related parameter, adjuster.

adjuster

Specifies one or more score adjust expressions. You need to use adjuster with query or filter; adjuster doesn't work for a request that doesn't search.

You can increase the score of specific records with adjuster. You can use adjuster to give a high score to important records. For example, you can use adjuster to increase the score of records that have a groonga tag.

Here is the syntax:

--adjuster "SCORE_ADJUST_EXPRESSION1 + SCORE_ADJUST_EXPRESSION2 + ..."

Here is the SCORE_ADJUST_EXPRESSION syntax:

COLUMN @ "KEYWORD" * FACTOR

Note the following:
• COLUMN must be indexed.
• "KEYWORD" must be a string.
• FACTOR must be a positive integer.

Here is a sample adjuster usage example that uses just one SCORE_ADJUST_EXPRESSION:

Execution example: select Entries \ --filter true \ --adjuster 'content @ "groonga" * 5' \ --output_columns _key,content,_score
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "The first post!", # "Welcome! This is my first post!", # 1 # ], # [ # "Groonga", # "I started to use Groonga. It's very fast!", # 6 # ], # [ # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1 # ], # [ # "Good-bye Senna", # "I migrated all Senna system!", # 1 # ], # [ # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1 # ] # ] # ] # ]

The select command matches all records. Then it applies adjuster. The adjuster increases the score of records that have "groonga" in the Entries.content column by 5. There is only one record that has "groonga" in the Entries.content column, so the record whose key is "Groonga" has score 6 (= 1 + 5).

You can omit FACTOR. If you omit FACTOR, it is treated as 1.

Here is a sample adjuster usage example that omits FACTOR:

Execution example: select Entries \ --filter true \ --adjuster 'content @ "groonga"' \ --output_columns _key,content,_score
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "The first post!", # "Welcome! This is my first post!", # 1 # ], # [ # "Groonga", # "I started to use Groonga. It's very fast!", # 2 # ], # [ # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 1 # ], # [ # "Good-bye Senna", # "I migrated all Senna system!", # 1 # ], # [ # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1 # ] # ] # ] # ]

The adjuster in the select command doesn't have a FACTOR, so the factor is treated as 1. There is only one record that has "groonga" in the Entries.content column, so the record whose key is "Groonga" has score 2 (= 1 + 1).

Here is a sample adjuster usage example that uses multiple SCORE_ADJUST_EXPRESSIONs:

Execution example: select Entries \ --filter true \ --adjuster 'content @ "groonga" * 5 + content @ "started" * 3' \ --output_columns _key,content,_score
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "The first post!", # "Welcome! This is my first post!", # 1 # ], # [ # "Groonga", # "I started to use Groonga. It's very fast!", # 9 # ], # [ # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 4 # ], # [ # "Good-bye Senna", # "I migrated all Senna system!", # 1 # ], # [ # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 1 # ] # ] # ] # ]

The adjuster in the select command has two SCORE_ADJUST_EXPRESSIONs. The final increased score is the sum of the scores of these SCORE_ADJUST_EXPRESSIONs. Both SCORE_ADJUST_EXPRESSIONs in the select command apply to the record whose key is "Groonga", so the final increased score of that record is the sum of their scores. The first SCORE_ADJUST_EXPRESSION is content @ "groonga" * 5. It increases the score by 5. The second SCORE_ADJUST_EXPRESSION is content @ "started" * 3. It increases the score by 3. The final score is 9 (= 1 + 5 + 3).

A SCORE_ADJUST_EXPRESSION has one factor for a "KEYWORD". This means that the increased score is the same for all records that have that "KEYWORD". You can also change the increased score per record that has the same "KEYWORD". It is useful for tuning search scores. See weight-vector-column for details.

Return value

select returns a response with the following format:

[ HEADER, [ SEARCH_RESULT, DRILLDOWN_RESULT_1, DRILLDOWN_RESULT_2, ..., DRILLDOWN_RESULT_N ] ]

If select fails, error details are in HEADER. See /reference/command/output_format for HEADER.

There are zero or more DRILLDOWN_RESULTs. If neither drilldown nor drilldown[${LABEL}].keys is specified, they are omitted like the following:

[ HEADER, [ SEARCH_RESULT ] ]

If drilldown has two or more keys like --drilldown "_key, column1, column2", multiple DRILLDOWN_RESULTs exist:

[ HEADER, [ SEARCH_RESULT, DRILLDOWN_RESULT_FOR_KEY, DRILLDOWN_RESULT_FOR_COLUMN1, DRILLDOWN_RESULT_FOR_COLUMN2 ] ]

If drilldown[${LABEL}].keys is used, only one DRILLDOWN_RESULT exists:

[ HEADER, [ SEARCH_RESULT, DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN ] ]

The DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys. It's described later.

SEARCH_RESULT uses the following format:

[ [N_HITS], COLUMNS, RECORDS ]

See Simple usage for a concrete example of the format.

N_HITS is the number of matched records before limit is applied.

COLUMNS describes the output columns specified by output_columns. It uses the following format:

[ [COLUMN_NAME_1, COLUMN_TYPE_1], [COLUMN_NAME_2, COLUMN_TYPE_2], ..., [COLUMN_NAME_N, COLUMN_TYPE_N] ]

COLUMNS includes information for one or more output columns.
Each output column information entry includes the following:
• Column name as a string
• Column type as a string or null

The column name is extracted from the value specified in output_columns. The column type is a Groonga type name or null. It doesn't describe whether the column value is a vector or a scalar; you need to determine that by checking whether the real column value is an array. See /reference/types for type details. null is used when the column value type isn't determined. For example, a function call in output_columns such as --output_columns "snippet_html(content)" uses null.

Here is an example of COLUMNS:

[ ["_id", "UInt32"], ["_key", "ShortText"], ["n_likes", "UInt32"] ]

RECORDS includes the column values of each matched record. The included records are selected by offset and limit. It uses the following format:

[ [ RECORD_1_COLUMN_1, RECORD_1_COLUMN_2, ..., RECORD_1_COLUMN_N ], [ RECORD_2_COLUMN_1, RECORD_2_COLUMN_2, ..., RECORD_2_COLUMN_N ], ... [ RECORD_N_COLUMN_1, RECORD_N_COLUMN_2, ..., RECORD_N_COLUMN_N ] ]

Here is an example of RECORDS:

[ [ 1, "The first post!", 5 ], [ 2, "Groonga", 10 ], [ 3, "Mroonga", 15 ] ]

The DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys.

drilldown uses the same format as SEARCH_RESULT:

[ [N_HITS], COLUMNS, RECORDS ]

And drilldown generates one or more DRILLDOWN_RESULTs when drilldown has one or more keys.

drilldown[${LABEL}].keys uses the following format. Multiple drilldown[${LABEL}].keys are mapped to one object (key-value pairs):

{ "LABEL_1": [ [N_HITS], COLUMNS, RECORDS ], "LABEL_2": [ [N_HITS], COLUMNS, RECORDS ], ..., "LABEL_N": [ [N_HITS], COLUMNS, RECORDS ] }

Each drilldown[${LABEL}].keys corresponds to the following:

"LABEL": [ [N_HITS], COLUMNS, RECORDS ]

The value part uses the same format as SEARCH_RESULT:

[ [N_HITS], COLUMNS, RECORDS ]

See also Output format for drilldown[${LABEL}] style for the drilldown[${LABEL}] style drilldown output format.

See also
• /reference/grn_expr/query_syntax
• /reference/grn_expr/script_syntax

shutdown

Summary

shutdown stops the Groonga server process.

shutdown performs a graceful shutdown by default. If there are some running commands, the Groonga server process stops after those running commands are finished. New command requests aren't processed after the shutdown command is executed.

New in version 6.0.1: shutdown performs an immediate shutdown when immediate is specified as the mode parameter. The Groonga server process stops immediately even when there are some running commands.

NOTE: You need to set /reference/command/request_id on all requests to use immediate shutdown.

Syntax

This command takes only one optional parameter:

shutdown [mode=graceful]

Usage

shutdown uses graceful shutdown by default:

Execution example: shutdown
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can specify graceful as the mode parameter explicitly:

Execution example: shutdown --mode graceful
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can choose immediate shutdown by specifying immediate as the mode parameter:

Execution example: shutdown --mode immediate
# [[0, 1337566253.89858, 0.000355720520019531], true]

Immediate shutdown is useful when you don't have time for a graceful shutdown. For example, on system shutdown, Windows kills services that take a long time to stop.

Parameters

This section describes the parameters of this command.

Required parameters

There is no required parameter.

Optional parameters

There are optional parameters.

mode

Specifies the shutdown mode.
Here are the available shutdown modes:

┌──────────┬──────────────────────────────────┐
│Value     │ Description                      │
├──────────┼──────────────────────────────────┤
│graceful  │ Stops after running commands are │
│          │ finished.                        │
│          │                                  │
│          │ This is the default.             │
├──────────┼──────────────────────────────────┤
│immediate │ New in version 6.0.1: Stops      │
│          │ immediately even if there are    │
│          │ some running commands.           │
└──────────┴──────────────────────────────────┘

Return value

shutdown returns true as the body when the shutdown is accepted:

[HEADER, true]

If the shutdown isn't accepted, error details are in HEADER. See /reference/command/output_format for HEADER.

status

Summary

status returns the current status of the context that processes the request. A context is a unit that processes requests. Normally, a context is created for each thread.

Syntax

This command takes no parameters:

status

Usage

Here is a simple example:

Execution example: status
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "uptime": 0, # "max_command_version": 2, # "start_time": 1441980651, # "cache_hit_rate": 0.0, # "version": "5.0.7-126-gb6fd7f7", # "alloc_count": 206, # "command_version": 1, # "starttime": 1441980651, # "default_command_version": 1, # "n_queries": 0 # } # ]

It returns the current status of the context that processes the request. See Return value for details.

Parameters

This section describes all parameters.

Required parameters

There is no required parameter.

Optional parameters

There is no optional parameter.

Return value

The command returns the current status as an object:

[ HEADER, { "alloc_count": ALLOC_COUNT, "cache_hit_rate": CACHE_HIT_RATE, "command_version": COMMAND_VERSION, "default_command_version": DEFAULT_COMMAND_VERSION, "max_command_version": MAX_COMMAND_VERSION, "n_queries": N_QUERIES, "start_time": START_TIME, "starttime": STARTTIME, "uptime": UPTIME, "version": VERSION } ]

See /reference/command/output_format for HEADER.

Here are descriptions of the values. See Usage for real values:

┌────────────────────────┬────────────────────────────────────┬────────────┐
│Key                     │ Description                        │ Example    │
├────────────────────────┼────────────────────────────────────┼────────────┤
│alloc_count             │ The number of allocated memory     │ 1400       │
│                        │ blocks that aren't freed. If this  │            │
│                        │ value increases continuously,      │            │
│                        │ there may be a memory leak.        │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│cache_hit_rate          │ Percentage of responses that are   │ 29.4       │
│                        │ returned from the cache in the     │            │
│                        │ Groonga process. If there are 10   │            │
│                        │ requests and 7 responses are       │            │
│                        │ created from the cache,            │            │
│                        │ cache_hit_rate is 70.0. The        │            │
│                        │ percentage is computed only from   │            │
│                        │ requests that use commands that    │            │
│                        │ support the cache.                 │            │
│                        │                                    │            │
│                        │ Here are the commands that support │            │
│                        │ the cache:                         │            │
│                        │                                    │            │
│                        │ • select                           │            │
│                        │ • logical_select                   │            │
│                        │ • logical_range_filter             │            │
│                        │ • logical_count                    │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│command_version         │ The                                │ 1          │
│                        │ /reference/command/command_version │            │
│                        │ that is used by the context.       │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│default_command_version │ The default                        │ 1          │
│                        │ /reference/command/command_version │            │
│                        │ of the Groonga process.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│max_command_version     │ The max                            │ 2          │
│                        │ /reference/command/command_version │            │
│                        │ of the Groonga process.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│n_queries               │ The number of requests processed   │ 29         │
│                        │ by the Groonga process. It counts  │            │
│                        │ only requests that use commands    │            │
│                        │ that support the cache (the same   │            │
│                        │ commands as for cache_hit_rate).   │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│start_time              │ New in version 5.0.8. The time     │ 1441761403 │
│                        │ when the Groonga process started,  │            │
│                        │ in UNIX time.                      │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│starttime               │ Deprecated since version 5.0.8:    │ 1441761403 │
│                        │ Use start_time instead.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│uptime                  │ The elapsed time since the Groonga │ 216639     │
│                        │ process started, in seconds. For   │            │
│                        │ example, 216639 means about 2.5    │            │
│                        │ days (216639 / 60 / 60 / 24 ≈      │            │
│                        │ 2.507).                            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│version                 │ The version of the Groonga         │ 5.0.7      │
│                        │ process.                           │            │
└────────────────────────┴────────────────────────────────────┴────────────┘

suggest

NOTE: The suggest feature specification isn't stable. The specification may be changed.

Summary

suggest - returns completion, correction and/or suggestion for a query.

The suggest command returns completion, correction and/or suggestion for a specified query. See /reference/suggest/introduction about completion, correction and suggestion.

Syntax

suggest types table column query [sortby [output_columns [offset [limit [frequency_threshold [conditional_probability_threshold [prefix_search]]]]]]]

Usage

Here are learned data for completion:

Execution example: load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)' [ {"sequence": "1", "time": 1312950803.86057, "item": "e"}, {"sequence": "1", "time": 1312950803.96857, "item": "en"}, {"sequence": "1", "time": 1312950804.26057, "item": "eng"}, {"sequence": "1", "time": 1312950804.56057, "item": "engi"}, {"sequence": "1", "time": 1312950804.76057, "item": "engin"}, {"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Here are learned data for correction:

Execution example: load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)' [ {"sequence": "2", "time": 1312950803.86057, "item": "s"}, {"sequence": "2", "time": 1312950803.96857, "item": "sa"}, {"sequence": "2", "time": 1312950804.26057, "item": "sae"}, {"sequence": "2", "time": 1312950804.56057, "item": "saer"}, {"sequence": "2", "time": 1312950804.76057, "item": "saerc"}, {"sequence": "2", "time": 1312950805.76057, "item": "saerch", "type": "submit"}, {"sequence": "2", "time": 1312950809.76057, "item": "serch"}, {"sequence": "2", "time": 1312950810.86057, "item": "search", "type": "submit"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 8]

Here are learned data for suggestion:
Execution example: load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)' [ {"sequence": "3", "time": 1312950803.86057, "item": "search engine", "type": "submit"}, {"sequence": "3", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"} ]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Here is a completion example:

Execution example: suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "complete": [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "engine", # 1 # ] # ] # } # ]

Here is a correction example:

Execution example: suggest --table item_query --column kana --types correct --frequency_threshold 1 --query saerch
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "correct": [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "search", # 1 # ] # ] # } # ]

Here is a suggestion example:

Execution example: suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "suggest": [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "search engine", # 1 # ], # [ # "web search realtime", # 1 # ] # ] # } # ]

Here is a mixed example:

Execution example: suggest --table item_query --column kana --types complete|correct|suggest --frequency_threshold 1 --query search
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # { # "suggest": [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "search engine", # 1 # ], # [ # "web search realtime", # 1 # ] # ], # "complete": [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "search", # 2 # ], # [ # "search engine", # 2 # ] # ], # "correct": [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "search", # 2 # ] # ] # } # ]

Parameters

types

Specifies which types are returned by the suggest command. Here are the available types:

complete: The suggest command does completion.
correct: The suggest command does correction.
suggest: The suggest command does suggestion.

You can specify one or more types separated by |. Here are examples:

It returns correction: correct
It returns correction and suggestion: correct|suggest
It returns completion, correction and suggestion: complete|correct|suggest

table

Specifies a table name that has the item_${DATA_SET_NAME} format. For example, item_query is the table name if you created the dataset by the following command:

groonga-suggest-create-dataset /tmp/db-path query

column

Specifies the name of the column in table that stores furigana in Katakana.

query

Specifies the query for completion, correction and/or suggestion.

sortby

Specifies the sort key. Default: -_score

output_columns

Specifies the output columns. Default: _key,_score

offset

Specifies the offset of returned records. Default: 0

limit

Specifies the number of returned records. Default: 10

frequency_threshold

Specifies the threshold for item frequency. Returned records must have a _score that is greater than or equal to frequency_threshold. Default: 100

conditional_probability_threshold

Specifies the threshold for conditional probability. Conditional probability is used with learned data. It is the probability that a query is submitted, given that the query has occurred.
Returned records must have a conditional probability that is greater than or equal to conditional_probability_threshold. Default: 0.2

prefix_search

Specifies whether optional prefix search is used in completion. Here are the available values:

yes: Prefix search is always used.
no: Prefix search is never used.
auto: Prefix search is used only when other searches can't find any records.

Default: auto

similar_search

Specifies whether optional similar search is used in correction. Here are the available values:

yes: Similar search is always used.
no: Similar search is never used.
auto: Similar search is used only when other searches can't find any records.

Default: auto
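Here is a hedged sketch that forces prefix search during completion; it is not an execution example from the original manual, but it only combines parameters described above and reuses the item_query dataset from the completion example (output omitted):

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en --prefix_search yes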
Return value

Here is the returned JSON format:

{"type1": [["candidate1", score of candidate1], ["candidate2", score of candidate2], ...], "type2": [["candidate1", score of candidate1], ["candidate2", score of candidate2], ...], ...}

type
  A type specified by types.
candidate
  A candidate for completion, correction or suggestion.
score of candidate
  The score of the corresponding candidate. A candidate with a higher score is a more likely candidate for completion, correction or suggestion. Returned candidates are sorted by score of candidate in descending order by default.

See also
• /reference/suggest
• /reference/executables/groonga-suggest-create-dataset

table_create

Summary

table_create creates a new table in the current database. You need to create one or more tables to store and search data.

Syntax

This command takes many parameters. The only required parameter is name; the others are optional:

table_create name [flags=TABLE_HASH_KEY] [key_type=null] [value_type=null] [default_tokenizer=null] [normalizer=null] [token_filters=null]

Usage

The table_create command creates a new persistent table. See /reference/tables for table details.

Create data store table

You can use all table types for a data store table. See /reference/tables for all table types. The table type is specified as TABLE_${TYPE} in the flags parameter.

Here is an example that creates a TABLE_NO_KEY table:

Execution example: table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Logs and is of TABLE_NO_KEY type. If your records aren't searched by key, a TABLE_NO_KEY type table is suitable, because TABLE_NO_KEY doesn't support keys but is a fast and small table. Storing logs in a Groonga database is such a case. If your records are searched by key or referenced by one or more columns, the TABLE_NO_KEY type isn't suitable. A lexicon for full text search is such a case.

Create large data store table

If you want to store many large keys, your table may not be able to store them: if the total key data is larger than 4GiB, you can't store all key data in your table by default. You can expand the maximum total key size from 4GiB to 1TiB with the KEY_LARGE flag. The KEY_LARGE flag can be used only with TABLE_HASH_KEY. You can't use the KEY_LARGE flag with TABLE_NO_KEY, TABLE_PAT_KEY nor TABLE_DAT_KEY.

Here is an example that creates a table that can store many large keys:

Execution example: table_create Paths TABLE_HASH_KEY|KEY_LARGE ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Paths and is of TABLE_HASH_KEY type. The Paths table can store many large keys.

Create lexicon table

You can use all table types except TABLE_NO_KEY for a lexicon table. A lexicon table needs key support, but TABLE_NO_KEY doesn't support keys.

Here is an example that creates a TABLE_PAT_KEY table:

Execution example: table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates the following table:
• The table is named Lexicon.
• The table is a TABLE_PAT_KEY type table.
• The table's key is ShortText type.
• The table uses the TokenBigram tokenizer to extract tokens from a normalized text.
• The table uses the NormalizerAuto normalizer to normalize a text.

TABLE_PAT_KEY is the suitable table type for a lexicon table. A lexicon table is used for full text search. In full text search, predictive search may be used for fuzzy search; predictive search is supported by TABLE_PAT_KEY and TABLE_DAT_KEY. A lexicon table has many keys because a full text search target text has many tokens. A table that has many keys should consider table size, because a large table requires a lot of memory, and requiring a lot of memory causes disk I/O, which blocks fast search. So table size is important for a table that has many keys. TABLE_PAT_KEY has a smaller table size than TABLE_DAT_KEY. For the above reasons, TABLE_PAT_KEY is the suitable table type for a lexicon table.

Create tag index table

You can use all table types except TABLE_NO_KEY for a tag index table. A tag index table needs key support, but TABLE_NO_KEY doesn't support keys.

Here is an example that creates a TABLE_HASH_KEY table:

Execution example: table_create Tags TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Tags, is of TABLE_HASH_KEY type and has a ShortText type key.

TABLE_HASH_KEY and TABLE_DAT_KEY are suitable table types for a tag index table. If you need only the exact match tag search feature, TABLE_HASH_KEY is suitable; that is the common case. If you also need the predictive tag search feature (for example, searching "groonga" by the "gr" keyword), TABLE_DAT_KEY is suitable. TABLE_DAT_KEY has a large table size, but that is not important because the number of tags will not be large.

Create range index table

You can use the TABLE_PAT_KEY and TABLE_DAT_KEY table types for a range index table. A range index table needs range search support, but TABLE_NO_KEY and TABLE_HASH_KEY don't support it.

Here is an example that creates a TABLE_DAT_KEY table:

Execution example: table_create Ages TABLE_DAT_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Ages, is of TABLE_DAT_KEY type and has a UInt32 type key.

TABLE_PAT_KEY and TABLE_DAT_KEY are suitable table types for a range index table. If you don't have many indexed items, TABLE_DAT_KEY is suitable. An index for ages, as in the above example, is such a case: it will have only about 0-100 items because humans don't live that long. If you have many indexed items, TABLE_PAT_KEY is suitable, because TABLE_PAT_KEY is smaller than TABLE_DAT_KEY.

Parameters

This section describes all parameters.

name

Specifies the name of the table to be created. name must be specified.

Here are the available characters:
• 0 .. 9 (digit)
• a .. z (alphabet, lower case)
• A .. Z (alphabet, upper case)
• # (hash)
• @ (at mark)
• - (hyphen)
• _ (underscore) (NOTE: Underscore can't be used as the first character.)

You need to create a name with one or more of the above characters. Note that you cannot use _ as the first character, such as _name.

flags

Specifies the table type and table customize options.
Here are the available flags:

┌───────────────┬──────────────────────────────────┐
│Flag           │ Description                      │
├───────────────┼──────────────────────────────────┤
│TABLE_NO_KEY   │ Array table. See also            │
│               │ table-no-key.                    │
├───────────────┼──────────────────────────────────┤
│TABLE_HASH_KEY │ Hash table. See also             │
│               │ table-hash-key.                  │
├───────────────┼──────────────────────────────────┤
│TABLE_PAT_KEY  │ Patricia trie. See also          │
│               │ table-pat-key.                   │
├───────────────┼──────────────────────────────────┤
│TABLE_DAT_KEY  │ Double array trie. See also      │
│               │ table-dat-key.                   │
├───────────────┼──────────────────────────────────┤
│KEY_WITH_SIS   │ Enable Semi Infinite String.     │
│               │ Requires TABLE_PAT_KEY.          │
├───────────────┼──────────────────────────────────┤
│KEY_LARGE      │ Expand the maximum total key     │
│               │ size from 4GiB to 1TiB. Requires │
│               │ TABLE_HASH_KEY.                  │
└───────────────┴──────────────────────────────────┘

NOTE: Since Groonga 2.1.0, the KEY_NORMALIZE flag is deprecated. Use the normalizer option with NormalizerAuto instead.

You must specify one of the TABLE_${TYPE} flags. You cannot specify two or more TABLE_${TYPE} flags; for example, TABLE_NO_KEY|TABLE_HASH_KEY is invalid.

You can combine flags with | (vertical bar), such as TABLE_PAT_KEY|KEY_WITH_SIS.

See /reference/tables for the differences between table types.

The default flags are TABLE_HASH_KEY.

key_type

Specifies the key type. If you specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY as the flags parameter, you need to specify the key_type option. See /reference/types for all types.

The default value is none.

value_type

Specifies the value type. You can use a value when you specify TABLE_NO_KEY, TABLE_HASH_KEY or TABLE_PAT_KEY as the flags parameter. The value type must be a fixed size type. For example, UInt32 can be used but ShortText cannot be used. Use columns instead of values for variable size data.

The default value is none.

default_tokenizer

Specifies the default tokenizer that is used on searching and data loading. You must specify default_tokenizer for a table that is used as the lexicon of a full text search index. See /reference/tokenizers for available tokenizers; you must choose a tokenizer from that list for full text search.

You don't need to specify default_tokenizer in the following cases:
• You don't use the table as a lexicon.
• You use the table as a lexicon but you don't need full text search. For example:
  • The index target data isn't text data, such as Int32 and Time.
  • You just need exact match search, prefix search and so on.

You can't use default_tokenizer with the TABLE_NO_KEY flag, because a table that uses the TABLE_NO_KEY flag can't be used as a lexicon. You must specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY as flags when you want to use the table as a lexicon.

The default value is none.

normalizer

Specifies a normalizer that is used to normalize keys. You cannot use normalizer with TABLE_NO_KEY because TABLE_NO_KEY doesn't support keys. See /reference/normalizers for all normalizers.

The default value is none.

token_filters

Specifies token filters that are used to process tokenized tokens. You cannot use token_filters with TABLE_NO_KEY because TABLE_NO_KEY doesn't support keys. See /reference/token_filters for all token filters.

The default value is none.
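Here is a hedged sketch of creating a lexicon with a token filter; it is not an execution example from the original manual. It assumes the TokenFilterStopWord token filter that ships with Groonga, which must be registered as a plugin first; the table name FilteredTerms is hypothetical (output omitted):

plugin_register token_filters/stop_word
table_create FilteredTerms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord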
See also • /reference/tables • /reference/commands/column_create • /reference/tokenizers • /reference/normalizers • /reference/command/output_format

table_list

Summary

table_list - lists the tables defined in a database. This section describes table_list, one of the Groonga built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket. table_list lists the tables defined in a database.

Syntax table_list

Usage

Execution example: table_list # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # "id", # "UInt32" # ], # [ # "name", # "ShortText" # ], # [ # "path", # "ShortText" # ], # [ # "flags", # "ShortText" # ], # [ # "domain", # "ShortText" # ], # [ # "range", # "ShortText" # ], # [ # "default_tokenizer", # "ShortText" # ], # [ # "normalizer", # "ShortText" # ] # ], # [ # 259, # "Ages", # "/tmp/groonga-databases/commands_table_create.0000103", # "TABLE_DAT_KEY|PERSISTENT", # "UInt32", # null, # null, # null # ], # [ # 257, # "Lexicon", # "/tmp/groonga-databases/commands_table_create.0000101", # "TABLE_PAT_KEY|PERSISTENT", # "ShortText", # null, # "TokenBigram", # "NormalizerAuto" # ], # [ # 256, # "Logs", # "/tmp/groonga-databases/commands_table_create.0000100", # "TABLE_NO_KEY|PERSISTENT", # null, # null, # null, # null # ], # [ # 258, # "Tags", # "/tmp/groonga-databases/commands_table_create.0000102", # "TABLE_HASH_KEY|PERSISTENT", # "ShortText", # null, # null, # null # ] # ] # ]

Parameters

None.

Return value

The list of table names is returned in the following format: [[[table information name 1, table information type 1], ...], table information 1, ...]

table information name n: Each table information entry contains multiple values; this part outputs the names that indicate what each value means. The names are as follows: id - the ID assigned to the table object. name - the table name. path - the name of the file that stores the table's records. flags - the table's flags attribute. domain - the type of the primary key values. range - the type of the values.

table information type n: Outputs the type of each table information value.

table information n: Outputs an array of the values described by the table information names. The order of the values is the same as the order of the table information names.

table_remove

Summary

table_remove removes a table and its columns. If there are one or more indexes against the key of the table and its columns, they are also removed.

New in version 6.0.1: You can also remove tables and columns that reference the target table by using the dependent parameter.

Syntax

This command takes two parameters: table_remove name [dependent=no]

Usage

You just specify the name of the table that you want to remove. table_remove removes the table and its columns. If the table and its columns are indexed, all index columns for the table and its columns are also removed. This section describes the following: • Basic usage • Unremovable cases • Removes a table with tables and columns that reference the target table • Decreases used resources

Basic usage

Let's think about the following case: • There is one table, Entries. • Entries table has some columns. • Entries table's key is indexed. • A column of Entries is indexed.
Here are commands that create the Entries table:

Execution example: table_create Entries TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for the Entries table's key:

Execution example: table_create EntryKeys TABLE_HASH_KEY UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create EntryKeys key_index COLUMN_INDEX Entries _key # [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for the Entries table's column:

Execution example: table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms content_index COLUMN_INDEX Entries content # [[0, 1337566253.89858, 0.000355720520019531], true]

Let's confirm the current schema before running table_remove:

Execution example: dump # table_create Entries TABLE_HASH_KEY UInt32 # column_create Entries content COLUMN_SCALAR Text # column_create Entries title COLUMN_SCALAR ShortText # # table_create EntryKeys TABLE_HASH_KEY UInt32 # # table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # # column_create EntryKeys key_index COLUMN_INDEX Entries _key # column_create Terms content_index COLUMN_INDEX Entries content

If you remove the Entries table, the following tables and columns are removed: • Entries • Entries.title • Entries.content • EntryKeys.key_index • Terms.content_index The following tables (lexicons) aren't removed: • EntryKeys • Terms

Let's run table_remove:

Execution example: table_remove Entries # [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the schema after table_remove. Only EntryKeys and Terms exist:

Execution example: dump # table_create EntryKeys TABLE_HASH_KEY UInt32 # # table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto

Unremovable cases

There are some unremovable cases: • One or more tables use the table as their key type. • One or more columns use the table as their value type. Both restrictions block dangling references: if a table that is referenced as a type were removed, the tables and columns that refer to it would be broken. If the target table satisfies one of these conditions, table_remove fails. The target table and its columns aren't removed. Here is an example of the case where the table is used as a key type.
The following commands create a table to be removed and a table that uses the table to be removed as its key type:

Execution example: table_create ReferencedByTable TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create ReferenceTable TABLE_HASH_KEY ReferencedByTable # [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByTable fails:

Execution example: table_remove ReferencedByTable # [ # [ # -2, # 1337566253.89858, # 0.000355720520019531, # "[table][remove] a table that references the table exists: <ReferenceTable._key> -> <ReferencedByTable>", # [ # [ # "is_removable_table", # "db.c", # 8831 # ] # ] # ], # false # ]

You need to remove ReferenceTable before you remove ReferencedByTable:

Execution example: table_remove ReferenceTable # [[0, 1337566253.89858, 0.000355720520019531], true] table_remove ReferencedByTable # [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example of the case where the table is used as a value type. The following commands create a table to be removed and a column that uses the table to be removed as its value type:

Execution example: table_create ReferencedByColumn TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Table TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Table reference_column COLUMN_SCALAR ReferencedByColumn # [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByColumn fails:

Execution example: table_remove ReferencedByColumn # [ # [ # -2, # 1337566253.89858, # 0.000355720520019531, # "[table][remove] a column that references the table exists: <Table.reference_column> -> <ReferencedByColumn>", # [ # [ # "is_removable_table", # "db.c", # 8851 # ] # ] # ], # false # ]

You need to remove Table.reference_column before you remove ReferencedByColumn:

Execution example: column_remove Table reference_column # [[0, 1337566253.89858, 0.000355720520019531], true] table_remove ReferencedByColumn # [[0, 1337566253.89858, 0.000355720520019531], true]

Removes a table with tables and columns that reference the target table

New in version 6.0.1. If you understand the consequences, you can also remove the tables and columns that reference the target table with one table_remove command by using the --dependent yes parameter. ReferencedTable in the following schema is referenced from a table and a column:

Execution example: table_create ReferencedTable TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Table1 TABLE_HASH_KEY ReferencedTable # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Table2 TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Table2 reference_column COLUMN_SCALAR ReferencedTable # [[0, 1337566253.89858, 0.000355720520019531], true]

You can't remove ReferencedTable by default:

Execution example: table_remove ReferencedTable # [ # [ # -2, # 1337566253.89858, # 0.000355720520019531, # "[table][remove] a table that references the table exists: <Table1._key> -> <ReferencedTable>", # [ # [ # "is_removable_table", # "db.c", # 8831 # ] # ] # ], # false # ]

You can remove ReferencedTable, Table1 and Table2.reference_column by using the --dependent yes parameter.
Table1 and Table2.reference_column reference ReferencedTable:

Execution example: table_remove ReferencedTable --dependent yes # [[0, 1337566253.89858, 0.000355720520019531], true]

Decreases used resources

table_remove opens all tables and columns in the database to check the unremovable cases above. If you have many tables and columns, table_remove may use many resources. There is a workaround: table_remove closes the temporarily opened tables and columns during the check when the max number of threads is 1. You can confirm and change the current max number of threads with thread_limit. The feature is used in the following case:

Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] thread_limit 1 # [[0, 1337566253.89858, 0.000355720520019531], 2] table_remove Entries # [[0, 1337566253.89858, 0.000355720520019531], true]

The feature isn't used in the following case:

Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] thread_limit 2 # [[0, 1337566253.89858, 0.000355720520019531], 1] table_remove Entries # [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters

This section describes all parameters.

Required parameters

There is only one required parameter.

name

Specifies the name of the table to be removed. See Usage for how to use this parameter.

Optional parameters

There is only one optional parameter.

dependent

New in version 6.0.1. Specifies whether tables and columns that reference the target table are also removed or not. If this value is yes, tables and columns that reference the target table are also removed. Otherwise, they aren't removed and an error is returned. In other words, if there are any tables or columns that reference the target table, the target table isn't removed by default. You should use this parameter carefully; it is a dangerous parameter. See Removes a table with tables and columns that reference the target table for how to use this parameter.

Return value

The command returns true as body on success, such as: [HEADER, true] If the command fails, error details are in HEADER. See /reference/command/output_format for HEADER.

table_rename

Summary

table_rename command renames a table. It is a light operation: it just changes the association between the name and the table object. It doesn't copy the table or its column values. It is also a dangerous operation: you must stop all operations, including read operations, while you run table_rename. If the following sequence occurs, the Groonga process may crash: • An operation (like select) that accesses the table to be renamed by the current table name starts. (The current table name is called the old table name below, because the table is about to be renamed.) • table_rename runs while the select is still running. • The select accesses the table by the old table name, but it can't find the table by that name because the table has been renamed to the new name. This may crash the Groonga process.

Syntax

This command takes two parameters. All parameters are required: table_rename name new_name

Usage

Here is a simple example of the table_rename command.
Execution example: table_create Users TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users score COLUMN_SCALAR Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "Alice", "score": 2}, {"_key": "Bob", "score": 0}, {"_key": "Carlos", "score": -1} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] table_rename Users Players # [[0, 1337566253.89858, 0.000355720520019531], true] table_list # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # "id", # "UInt32" # ], # [ # "name", # "ShortText" # ], # [ # "path", # "ShortText" # ], # [ # "flags", # "ShortText" # ], # [ # "domain", # "ShortText" # ], # [ # "range", # "ShortText" # ], # [ # "default_tokenizer", # "ShortText" # ], # [ # "normalizer", # "ShortText" # ] # ], # [ # 256, # "Players", # "/tmp/groonga-databases/commands_table_rename.0000100", # "TABLE_PAT_KEY|PERSISTENT", # "ShortText", # null, # null, # null # ] # ] # ] select Players # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "score", # "Int32" # ] # ], # [ # 1, # "Alice", # 2 # ], # [ # 2, # "Bob", # 0 # ], # [ # 3, # "Carlos", # -1 # ] # ] # ] # ]

Parameters

This section describes the parameters of table_rename.

Required parameters

All parameters are required.

name

Specifies the name of the table to be renamed.

new_name

Specifies the new table name.

Return value

The command returns true as body on success, such as: [HEADER, true] If the command fails, error details are in HEADER. See /reference/command/output_format for HEADER.

table_tokenize

Summary

table_tokenize command tokenizes text by the specified table's tokenizer.

Syntax

This command takes many parameters. table and string are required parameters. Others are optional: table_tokenize table string [flags=NONE] [mode=GET]

Usage

Here is a simple example.

Execution example: register token_filters/stop_word # [[0,0.0,0.0],true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord # [[0,0.0,0.0],true] column_create Terms is_stop_word COLUMN_SCALAR Bool # [[0,0.0,0.0],true] load --table Terms [ {"_key": "and", "is_stop_word": true} ] # [[0,0.0,0.0],1] table_tokenize Terms "Hello and Good-bye" --mode GET # [ # [ # 0, # 0.0, # 0.0 # ], # [ # { # "value": "hello", # "position": 0 # }, # { # "value": "good", # "position": 2 # }, # { # "value": "-", # "position": 3 # }, # { # "value": "bye", # "position": 4 # } # ] # ]

The Terms table is set up with the TokenBigram tokenizer, the NormalizerAuto normalizer and the TokenFilterStopWord token filter. The command returns the tokens that are generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. They are normalized by the NormalizerAuto normalizer, and the and token is removed by the TokenFilterStopWord token filter.
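As a supplementary sketch, you could run the same command in ADD mode. This is a hedged illustration under the assumption, based on the mode descriptions of the tokenize command later in this document, that stop word filtering applies only at search (GET) time:

Execution example: table_tokenize Terms "Hello and Good-bye" --mode ADD # Expected: the output should contain the "and" token in addition to # "hello", "good", "-" and "bye", because TokenFilterStopWord only # removes stop words when tokenizing for search.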
Parameters

This section describes all parameters. Parameters are categorized.

Required parameters

There are two required parameters, table and string.

table

Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer and the token filters that are set to the lexicon table.

string

Specifies any string which you want to tokenize. See the tokenize-string option in /reference/commands/tokenize for details.

Optional parameters

There are optional parameters.

flags

Specifies tokenization customization options. You can specify multiple options separated by "|". The default value is NONE. See the tokenize-flags option in /reference/commands/tokenize for details.

mode

Specifies a tokenize mode. The default value is GET. See the tokenize-mode option in /reference/commands/tokenize for details.

Return value

table_tokenize command returns tokenized tokens. See tokenize-return-value in /reference/commands/tokenize for details.

See also • /reference/tokenizers • /reference/commands/tokenize

thread_limit

Summary

New in version 5.0.7. thread_limit has the following two features: • It returns the max number of threads. • It sets the max number of threads. /reference/executables/groonga is the only Groonga server that supports the full thread_limit features. /reference/executables/groonga-httpd supports only the feature that returns the max number of threads. The max number of threads of /reference/executables/groonga-httpd is always 1 because /reference/executables/groonga-httpd uses a single thread model. If you're using Groonga as a library, thread_limit doesn't work unless you set custom functions by grn_thread_set_get_limit_func() and grn_thread_set_set_limit_func(). If you set a function by grn_thread_set_get_limit_func(), the feature that returns the max number of threads works. If you set a function by grn_thread_set_set_limit_func(), the feature that sets the max number of threads works.

Syntax

This command takes only one optional parameter: thread_limit [max=null]

Usage

You can get the max number of threads by calling it without any parameters:

Execution example: thread_limit # [[0, 1337566253.89858, 0.000355720520019531], 2]

If it returns 0, your Groonga server doesn't support the feature. You can set the max number of threads by passing the max parameter:

Execution example: thread_limit --max 4 # [[0, 1337566253.89858, 0.000355720520019531], 2]

It returns the previous max number of threads when you pass the max parameter.

Parameters

This section describes all parameters.

Required parameters

There is no required parameter.

Optional parameters

There is one optional parameter.

max

Specifies the new max number of threads. You must specify a positive integer:

Execution example: thread_limit --max 3 # [[0, 1337566253.89858, 0.000355720520019531], 4]

If you specify the max parameter, thread_limit returns the max number of threads before max was applied.

Return value

The command returns the max number of threads as body: [HEADER, N_MAX_THREADS] If max is specified, N_MAX_THREADS is the max number of threads before max was applied. See /reference/command/output_format for HEADER.

tokenize

Summary

tokenize command tokenizes text by the specified tokenizer. It is useful for debugging tokenization.

Syntax

This command takes many parameters. tokenizer and string are required parameters. Others are optional: tokenize tokenizer string [normalizer=null] [flags=NONE] [mode=ADD] [token_filters=NONE]

Usage

Here is a simple example.
Execution example: tokenize TokenBigram "Fulltext Search" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Fu" # }, # { # "position": 1, # "force_prefix": false, # "value": "ul" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lt" # }, # { # "position": 4, # "force_prefix": false, # "value": "te" # }, # { # "position": 5, # "force_prefix": false, # "value": "ex" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt" # }, # { # "position": 7, # "force_prefix": false, # "value": "t " # }, # { # "position": 8, # "force_prefix": false, # "value": " S" # }, # { # "position": 9, # "force_prefix": false, # "value": "Se" # }, # { # "position": 10, # "force_prefix": false, # "value": "ea" # }, # { # "position": 11, # "force_prefix": false, # "value": "ar" # }, # { # "position": 12, # "force_prefix": false, # "value": "rc" # }, # { # "position": 13, # "force_prefix": false, # "value": "ch" # }, # { # "position": 14, # "force_prefix": false, # "value": "h" # } # ] # ]

This example uses only the required parameters: tokenizer is TokenBigram and string is "Fulltext Search". It returns the tokens that are generated by tokenizing "Fulltext Search" with the TokenBigram tokenizer. It doesn't normalize "Fulltext Search".

Parameters

This section describes all parameters. Parameters are categorized.

Required parameters

There are two required parameters, tokenizer and string.

tokenizer

Specifies the tokenizer name. The tokenize command uses the tokenizer of that name. See /reference/tokenizers about built-in tokenizers. Here is an example that uses the built-in TokenTrigram tokenizer.

Execution example: tokenize TokenTrigram "Fulltext Search" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Ful" # }, # { # "position": 1, # "force_prefix": false, # "value": "ull" # }, # { # "position": 2, # "force_prefix": false, # "value": "llt" # }, # { # "position": 3, # "force_prefix": false, # "value": "lte" # }, # { # "position": 4, # "force_prefix": false, # "value": "tex" # }, # { # "position": 5, # "force_prefix": false, # "value": "ext" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt " # }, # { # "position": 7, # "force_prefix": false, # "value": "t S" # }, # { # "position": 8, # "force_prefix": false, # "value": " Se" # }, # { # "position": 9, # "force_prefix": false, # "value": "Sea" # }, # { # "position": 10, # "force_prefix": false, # "value": "ear" # }, # { # "position": 11, # "force_prefix": false, # "value": "arc" # }, # { # "position": 12, # "force_prefix": false, # "value": "rch" # }, # { # "position": 13, # "force_prefix": false, # "value": "ch" # }, # { # "position": 14, # "force_prefix": false, # "value": "h" # } # ] # ]

If you want to use other tokenizers, you need to register an additional tokenizer plugin with the register command. For example, you can use a KyTea based tokenizer by registering tokenizers/kytea.

string

Specifies any string which you want to tokenize. If you want to include spaces in string, you need to quote string with single quotes (') or double quotes ("). Here is an example that uses spaces in string.

Execution example: tokenize TokenBigram "Groonga is a fast fulltext earch engine!"
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Gr" # }, # { # "position": 1, # "force_prefix": false, # "value": "ro" # }, # { # "position": 2, # "force_prefix": false, # "value": "oo" # }, # { # "position": 3, # "force_prefix": false, # "value": "on" # }, # { # "position": 4, # "force_prefix": false, # "value": "ng" # }, # { # "position": 5, # "force_prefix": false, # "value": "ga" # }, # { # "position": 6, # "force_prefix": false, # "value": "a " # }, # { # "position": 7, # "force_prefix": false, # "value": " i" # }, # { # "position": 8, # "force_prefix": false, # "value": "is" # }, # { # "position": 9, # "force_prefix": false, # "value": "s " # }, # { # "position": 10, # "force_prefix": false, # "value": " a" # }, # { # "position": 11, # "force_prefix": false, # "value": "a " # }, # { # "position": 12, # "force_prefix": false, # "value": " f" # }, # { # "position": 13, # "force_prefix": false, # "value": "fa" # }, # { # "position": 14, # "force_prefix": false, # "value": "as" # }, # { # "position": 15, # "force_prefix": false, # "value": "st" # }, # { # "position": 16, # "force_prefix": false, # "value": "t " # }, # { # "position": 17, # "force_prefix": false, # "value": " f" # }, # { # "position": 18, # "force_prefix": false, # "value": "fu" # }, # { # "position": 19, # "force_prefix": false, # "value": "ul" # }, # { # "position": 20, # "force_prefix": false, # "value": "ll" # }, # { # "position": 21, # "force_prefix": false, # "value": "lt" # }, # { # "position": 22, # "force_prefix": false, # "value": "te" # }, # { # "position": 23, # "force_prefix": false, # "value": "ex" # }, # { # "position": 24, # "force_prefix": false, # "value": "xt" # }, # { # "position": 25, # "force_prefix": false, # "value": "t " # }, # { # "position": 26, # "force_prefix": false, # "value": " e" # }, # { # "position": 27, # "force_prefix": false, # "value": "ea" # }, # { # "position": 28, # "force_prefix": false, # "value": "ar" # }, # { # "position": 29, # "force_prefix": false, # "value": "rc" # }, # { # "position": 30, # "force_prefix": false, # "value": "ch" # }, # { # "position": 31, # "force_prefix": false, # "value": "h " # }, # { # "position": 32, # "force_prefix": false, # "value": " e" # }, # { # "position": 33, # "force_prefix": false, # "value": "en" # }, # { # "position": 34, # "force_prefix": false, # "value": "ng" # }, # { # "position": 35, # "force_prefix": false, # "value": "gi" # }, # { # "position": 36, # "force_prefix": false, # "value": "in" # }, # { # "position": 37, # "force_prefix": false, # "value": "ne" # }, # { # "position": 38, # "force_prefix": false, # "value": "e!" # }, # { # "position": 39, # "force_prefix": false, # "value": "!" # } # ] # ]

Optional parameters

There are optional parameters.

normalizer

Specifies the normalizer name. The tokenize command uses the normalizer of that name. A normalizer is important for N-gram family tokenizers such as TokenBigram. The normalizer detects the character type of each character while normalizing, and N-gram family tokenizers use those character types while tokenizing. Here is an example that doesn't use a normalizer.
Execution example: tokenize TokenBigram "Fulltext Search" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Fu" # }, # { # "position": 1, # "force_prefix": false, # "value": "ul" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lt" # }, # { # "position": 4, # "force_prefix": false, # "value": "te" # }, # { # "position": 5, # "force_prefix": false, # "value": "ex" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt" # }, # { # "position": 7, # "force_prefix": false, # "value": "t " # }, # { # "position": 8, # "force_prefix": false, # "value": " S" # }, # { # "position": 9, # "force_prefix": false, # "value": "Se" # }, # { # "position": 10, # "force_prefix": false, # "value": "ea" # }, # { # "position": 11, # "force_prefix": false, # "value": "ar" # }, # { # "position": 12, # "force_prefix": false, # "value": "rc" # }, # { # "position": 13, # "force_prefix": false, # "value": "ch" # }, # { # "position": 14, # "force_prefix": false, # "value": "h" # } # ] # ]

All alphabet characters are tokenized by two characters. For example, Fu is a token. Here is an example that uses a normalizer.

Execution example: tokenize TokenBigram "Fulltext Search" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "fulltext" # }, # { # "position": 1, # "force_prefix": false, # "value": "search" # } # ] # ]

Continuous alphabet characters are tokenized as one token. For example, fulltext is a token. If you want to tokenize by two characters even with a normalizer, use TokenBigramSplitSymbolAlpha.

Execution example: tokenize TokenBigramSplitSymbolAlpha "Fulltext Search" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "fu" # }, # { # "position": 1, # "force_prefix": false, # "value": "ul" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lt" # }, # { # "position": 4, # "force_prefix": false, # "value": "te" # }, # { # "position": 5, # "force_prefix": false, # "value": "ex" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt" # }, # { # "position": 7, # "force_prefix": false, # "value": "t" # }, # { # "position": 8, # "force_prefix": false, # "value": "se" # }, # { # "position": 9, # "force_prefix": false, # "value": "ea" # }, # { # "position": 10, # "force_prefix": false, # "value": "ar" # }, # { # "position": 11, # "force_prefix": false, # "value": "rc" # }, # { # "position": 12, # "force_prefix": false, # "value": "ch" # }, # { # "position": 13, # "force_prefix": false, # "value": "h" # } # ] # ]

All alphabet characters are tokenized by two characters, and they are normalized to lower case. For example, fu is a token.

flags

Specifies tokenization customization options. You can specify multiple options separated by "|". For example, NONE|ENABLE_TOKENIZED_DELIMITER. Here are the available flags.
┌───────────────────────────┬──────────────────────────────────┐
│Flag                       │ Description                      │
├───────────────────────────┼──────────────────────────────────┤
│NONE                       │ Just ignored.                    │
├───────────────────────────┼──────────────────────────────────┤
│ENABLE_TOKENIZED_DELIMITER │ Enables tokenized delimiter. See │
│                           │ /reference/tokenizers about      │
│                           │ tokenized delimiter details.     │
└───────────────────────────┴──────────────────────────────────┘
Here is an example that uses ENABLE_TOKENIZED_DELIMITER.

Execution example: tokenize TokenDelimit "Fulltext Seacrch" NormalizerAuto ENABLE_TOKENIZED_DELIMITER # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "full" # }, # { # "position": 1, # "force_prefix": false, # "value": "text sea" # }, # { # "position": 2, # "force_prefix": false, # "value": "crch" # } # ] # ]

TokenDelimit is one of the tokenizers that support the tokenized delimiter. ENABLE_TOKENIZED_DELIMITER enables the tokenized delimiter. The tokenized delimiter is a special character that indicates a token border. It is U+FFFE. No character is assigned to that code point, which means the character does not appear in normal strings, so it is a good fit for this purpose. If ENABLE_TOKENIZED_DELIMITER is enabled, the target string is treated as an already tokenized string, and the tokenizer just splits it by the tokenized delimiter.

mode

Specifies a tokenize mode. If ADD is specified as the mode, the text is tokenized by the rule used when adding a document. If GET is specified as the mode, the text is tokenized by the rule used when searching for a document. If the mode is omitted, the text is tokenized in the ADD mode. The default mode is ADD. Here is an example of the ADD mode.

Execution example: tokenize TokenBigram "Fulltext Search" --mode ADD # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Fu" # }, # { # "position": 1, # "force_prefix": false, # "value": "ul" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lt" # }, # { # "position": 4, # "force_prefix": false, # "value": "te" # }, # { # "position": 5, # "force_prefix": false, # "value": "ex" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt" # }, # { # "position": 7, # "force_prefix": false, # "value": "t " # }, # { # "position": 8, # "force_prefix": false, # "value": " S" # }, # { # "position": 9, # "force_prefix": false, # "value": "Se" # }, # { # "position": 10, # "force_prefix": false, # "value": "ea" # }, # { # "position": 11, # "force_prefix": false, # "value": "ar" # }, # { # "position": 12, # "force_prefix": false, # "value": "rc" # }, # { # "position": 13, # "force_prefix": false, # "value": "ch" # }, # { # "position": 14, # "force_prefix": false, # "value": "h" # } # ] # ]

The last alphabet character is tokenized as one character. Here is an example of the GET mode.
Execution example: tokenize TokenBigram "Fulltext Search" --mode GET # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "Fu" # }, # { # "position": 1, # "force_prefix": false, # "value": "ul" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lt" # }, # { # "position": 4, # "force_prefix": false, # "value": "te" # }, # { # "position": 5, # "force_prefix": false, # "value": "ex" # }, # { # "position": 6, # "force_prefix": false, # "value": "xt" # }, # { # "position": 7, # "force_prefix": false, # "value": "t " # }, # { # "position": 8, # "force_prefix": false, # "value": " S" # }, # { # "position": 9, # "force_prefix": false, # "value": "Se" # }, # { # "position": 10, # "force_prefix": false, # "value": "ea" # }, # { # "position": 11, # "force_prefix": false, # "value": "ar" # }, # { # "position": 12, # "force_prefix": false, # "value": "rc" # }, # { # "position": 13, # "force_prefix": false, # "value": "ch" # } # ] # ]

The last alphabet character is tokenized as two characters.

token_filters

Specifies the token filter names. The tokenize command uses the token filters of those names. See /reference/token_filters about token filters.

Return value

tokenize command returns tokenized tokens. Each token has some attributes in addition to the token itself. The attributes may be increased in the future: [HEADER, tokens] HEADER See /reference/command/output_format about HEADER. tokens tokens is an array of tokens. Each token is an object that has the following attributes.
┌─────────┬─────────────────┐
│Name     │ Description     │
├─────────┼─────────────────┤
│value    │ Token itself.   │
├─────────┼─────────────────┤
│position │ The N-th token. │
└─────────┴─────────────────┘
See also • /reference/tokenizers

tokenizer_list

Summary

tokenizer_list command lists the tokenizers in a database.

Syntax

This command takes no parameters: tokenizer_list

Usage

Here is a simple example.

Execution example: tokenizer_list # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "name": "TokenMecab" # }, # { # "name": "TokenDelimit" # }, # { # "name": "TokenUnigram" # }, # { # "name": "TokenBigram" # }, # { # "name": "TokenTrigram" # }, # { # "name": "TokenBigramSplitSymbol" # }, # { # "name": "TokenBigramSplitSymbolAlpha" # }, # { # "name": "TokenBigramSplitSymbolAlphaDigit" # }, # { # "name": "TokenBigramIgnoreBlank" # }, # { # "name": "TokenBigramIgnoreBlankSplitSymbol" # }, # { # "name": "TokenBigramIgnoreBlankSplitSymbolAlpha" # }, # { # "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit" # }, # { # "name": "TokenDelimitNull" # }, # { # "name": "TokenRegexp" # } # ] # ]

It returns the tokenizers in the database.

Return value

tokenizer_list command returns tokenizers. Each tokenizer has an attribute that contains its name. The attributes may be increased in the future: [HEADER, tokenizers] HEADER See /reference/command/output_format about HEADER. tokenizers tokenizers is an array of tokenizers. Each tokenizer is an object that has the following attributes.
┌─────┬─────────────────┐
│Name │ Description     │
├─────┼─────────────────┤
│name │ Tokenizer name. │
└─────┴─────────────────┘
See also • /reference/tokenizers • /reference/commands/tokenize

truncate

Summary

truncate command deletes all records from a specified table or all values from a specified column.
Syntax

This command takes only one required parameter: truncate target_name

New in version 4.0.9: The target_name parameter can be used since 4.0.9. You need to use the table parameter for 4.0.8 or earlier. For backward compatibility, the truncate command accepts the table parameter, but it should not be used in newly written code.

Usage

Here is a simple example of the truncate command against a table.

Execution example: table_create Users TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users score COLUMN_SCALAR Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "Alice", "score": 2}, {"_key": "Bob", "score": 0}, {"_key": "Carlos", "score": -1} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "score", # "Int32" # ] # ], # [ # 1, # "Alice", # 2 # ], # [ # 2, # "Bob", # 0 # ], # [ # 3, # "Carlos", # -1 # ] # ] # ] # ] truncate Users # [[0, 1337566253.89858, 0.000355720520019531], true] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 0 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "score", # "Int32" # ] # ] # ] # ] # ]

Here is a simple example of the truncate command against a column.

Execution example: table_create Users TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users score COLUMN_SCALAR Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "Alice", "score": 2}, {"_key": "Bob", "score": 0}, {"_key": "Carlos", "score": -1} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "score", # "Int32" # ] # ], # [ # 1, # "Alice", # 2 # ], # [ # 2, # "Bob", # 0 # ], # [ # 3, # "Carlos", # -1 # ] # ] # ] # ] truncate Users.score # [[0, 1337566253.89858, 0.000355720520019531], true] select Users # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "score", # "Int32" # ] # ], # [ # 1, # "Alice", # 0 # ], # [ # 2, # "Bob", # 0 # ], # [ # 3, # "Carlos", # 0 # ] # ] # ] # ]

Parameters

This section describes the parameters of truncate.

Required parameters

There is one required parameter, target_name.

target_name

Specifies the name of a table or column.

Return value

truncate command returns whether the truncation succeeded or not: [HEADER, SUCCEEDED_OR_NOT] HEADER See /reference/command/output_format about HEADER. SUCCEEDED_OR_NOT It is true on success, false on error.

Data types

Name Groonga data types

Description

Groonga classifies the data it stores into data types. A table's primary key and each column value belong to one of the data types in a Groonga database. Normally, the type of a column value is common to all records in one table. A primary key type and a column type can be one of the Groonga defined types, user defined types, or a user defined table. If you specify another table as the primary key type, this table becomes a subset of the table of the primary key type. If you specify another table as a column type, this column becomes a reference key to the table of the column type.
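Here is a minimal sketch of the table-as-type rule described above. The table and column names (Colors, FavoriteColors, Items, color) are hypothetical; the success bodies reuse this document's placeholder header.

Execution example: table_create Colors TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create FavoriteColors TABLE_HASH_KEY Colors # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Items TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Items color COLUMN_SCALAR Colors # [[0, 1337566253.89858, 0.000355720520019531], true]

FavoriteColors uses Colors as its key type, so its keys are records of Colors (a subset of Colors). Items.color uses Colors as its value type, so it stores references to Colors records.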
Builtin types

The following types are defined as builtin types.

Bool Boolean type. The possible values are true and false. (default: false) When storing a value with the /reference/commands/load command, false, 0 and the empty string become false; any other value becomes true.

Int8 Signed 8bit integer. It's -128 or more and 127 or less. (default: 0)

UInt8 Unsigned 8bit integer. It's 0 or more and 255 or less. (default: 0)

Int16 Signed 16bit integer. It's -32,768 or more and 32,767 or less. (default: 0)

UInt16 Unsigned 16bit integer. It's 0 or more and 65,535 or less. (default: 0)

Int32 Signed 32bit integer. It's -2,147,483,648 or more and 2,147,483,647 or less. (default: 0)

UInt32 Unsigned 32bit integer. It's 0 or more and 4,294,967,295 or less. (default: 0)

Int64 Signed 64bit integer. It's -9,223,372,036,854,775,808 or more and 9,223,372,036,854,775,807 or less. (default: 0)

UInt64 Unsigned 64bit integer. It's 0 or more and 18,446,744,073,709,551,615 or less. (default: 0)

Float Double-precision floating-point number of IEEE 754, as a real number. (default: 0.0) See IEEE floating point - Wikipedia, the free encyclopedia or IEEE 754: Standard for Binary Floating-Point for details of the IEEE 754 format.

Time Date and time, represented as the number of seconds elapsed since 1970-01-01 00:00:00, in a 64bit signed integer. (default: 0) To store a value with the /reference/commands/load command, specify the number of seconds elapsed since 1970-01-01 00:00:00. To specify a date and time more precisely than seconds, use a decimal.

ShortText String of 4,095 or less bytes. (default: "")

Text String of 65,535 or less bytes. (default: "")

LongText String of 2,147,483,647 or less bytes. (default: "")

TokyoGeoPoint Latitude and longitude in the old Japanese geodetic system, represented as a pair of integers that express longitude and latitude in milliseconds. (default: 0x0) A longitude or latitude of x degrees, y minutes, z seconds in degrees-minutes-seconds notation is converted to milliseconds by the formula (((x * 60) + y) * 60 + z) * 1000. When storing a value with the /reference/commands/load command, specify it with the string representation "longitude-in-milliseconds x latitude-in-milliseconds" or "decimal-longitude x decimal-latitude". As the separator between longitude and latitude, ',' can be used in addition to 'x'. For details about geodetic systems, see 測地系 - Wikipedia.

WGS84GeoPoint Latitude and longitude in the World Geodetic System (WGS 84), represented as a pair of integers that express longitude and latitude in milliseconds. (default: 0x0) The conversion from degrees-minutes-seconds notation to milliseconds and the way to specify values in the /reference/commands/load command are the same as for TokyoGeoPoint.

Limitations about types

Types that can't be specified as the primary key of a table: Text and LongText can't be specified as the primary key of a table.

Types that can't be stored as a vector: A Groonga column can store a vector of some type. However, for the three types ShortText, Text and LongText, vectors can be stored and output, but they cannot be specified in search conditions or drilldown conditions. Table types can be stored as vectors. So if you want to use a vector of ShortText in search conditions or drilldown conditions, create a separate table whose primary key is of ShortText type and use that table as the type.
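To close this section, here is a minimal sketch of the Bool and Time load conventions described above. The table and column names (Events, is_public, created_at) are hypothetical; the bodies reuse this document's placeholder header.

Execution example: table_create Events TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Events is_public COLUMN_SCALAR Bool # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Events created_at COLUMN_SCALAR Time # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Events [ {"_key": "release", "is_public": true, "created_at": 1337566253.89858} ] # [[0, 1337566253.89858, 0.000355720520019531], 1]

created_at uses the decimal notation described above to keep sub-second precision. is_public would also become true for any loaded value other than false, 0 or the empty string.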
Tables

Summary

A table in Groonga manages the relation between IDs and keys. Groonga provides four table types: TABLE_NO_KEY, TABLE_HASH_KEY, TABLE_PAT_KEY and TABLE_DAT_KEY. All tables except TABLE_NO_KEY provide both fast ID search by key and fast key search by ID. TABLE_NO_KEY doesn't support keys; it only manages IDs, so it provides neither ID search nor key search.

Characteristics

Here is a characteristics table of all tables in Groonga. (The TABLE_ prefix is omitted in the table.)
┌─────────────────┬────────┬────────────┬───────────────┬──────────────────┐
│                 │ NO_KEY │ HASH_KEY   │ PAT_KEY       │ DAT_KEY          │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Data structure   │ Array  │ Hash table │ Patricia trie │ Double array     │
│                 │        │            │               │ trie             │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│ID support       │ o      │ o          │ o             │ o                │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key support      │ x      │ o          │ o             │ o                │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Value support    │ o      │ o          │ o             │ x                │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key -> ID speed  │ -      │ oo         │ x             │ o                │
│ • o: fast       │        │            │               │                  │
│ • x: slow       │        │            │               │                  │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Update speed     │ ooo    │ o          │ o             │ x                │
│ • o: fast       │        │            │               │                  │
│ • x: slow       │        │            │               │                  │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Size             │ ooo    │ o          │ oo            │ x                │
│ • o: small      │        │            │               │                  │
│ • x: large      │        │            │               │                  │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key update       │ -      │ x          │ x             │ o                │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Common prefix    │ -      │ x          │ o             │ o                │
│search           │        │            │               │                  │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Predictive       │ -      │ x          │ o             │ o                │
│search           │        │            │               │                  │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Range search     │ -      │ x          │ o             │ o                │
└─────────────────┴────────┴────────────┴───────────────┴──────────────────┘

TABLE_NO_KEY

TABLE_NO_KEY is very fast and very small, but it doesn't support keys. TABLE_NO_KEY is the only table type that doesn't support keys. You cannot use TABLE_NO_KEY as a lexicon for fulltext search, because a lexicon stores tokens as keys. TABLE_NO_KEY is useful for records without keys, such as logs (see the sketch below).

TABLE_HASH_KEY

TABLE_HASH_KEY is fast, but it doesn't support advanced search functions such as common prefix search and predictive search. TABLE_HASH_KEY is useful as an index for exact search, such as tag search.

TABLE_PAT_KEY

TABLE_PAT_KEY is small and supports advanced search functions. TABLE_PAT_KEY is useful as a lexicon for fulltext search and as an index for range search.

TABLE_DAT_KEY

TABLE_DAT_KEY is fast and supports key update, but it is large. It is not suitable for storing many records. TABLE_DAT_KEY is the only table type that supports key update. TABLE_DAT_KEY is used inside the Groonga database itself: a Groonga database needs to convert object names such as ShortText, TokenBigram and table names to object IDs, and it needs to rename objects. Those features are implemented with TABLE_DAT_KEY. The number of such objects is small, so the large size demerit of TABLE_DAT_KEY can be ignored.
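Here is a minimal sketch of the log use case for TABLE_NO_KEY mentioned above. The table and column names (AccessLogs, message) are hypothetical; the bodies reuse this document's placeholder header.

Execution example: table_create AccessLogs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create AccessLogs message COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] load --table AccessLogs [ {"message": "GET /index.html"}, {"message": "GET /robots.txt"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2]

Because log records have no natural unique key, the automatically assigned record IDs are enough, and the array table keeps the table small and the appends fast.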
Record ID

Record IDs are assigned automatically. You cannot assign record IDs. The record IDs of deleted records may be reused. The valid record ID range is between 1 and 268435455 (both 1 and 268435455 are valid IDs).

Persistent table and temporary table

A table is either a persistent table or a temporary table.

Persistent table

A persistent table is named and registered to the database. Records in a persistent table aren't deleted after closing the table or the database. A persistent table can be created by the /reference/commands/table_create command.

Temporary table

A temporary table is anonymous. Records in a temporary table are deleted after closing the table. A temporary table is used to store search results, sort results, group (drilldown) results and so on. TABLE_HASH_KEY is used for search results and group results. TABLE_NO_KEY is used for sort results.

Limitations

The max number of records is 268435455. You cannot add 268435456 or more records to a table. The max key size is 4096 bytes. You cannot use a key of 4097 bytes or larger. You can use a column instead of the key for data of 4097 bytes or larger; the Text and LongText types support such data. The max total key size is 4GiB. You need to split a table, split a database (sharding) or reduce each key size to handle a total key size of 4GiB or larger.

See also • /reference/commands/table_create

Column

A column is a data store object or an index object for fast search. A column belongs to a table; a table has zero or more columns. Both data store columns and index columns have a type. The type of a data store column specifies the data range; in other words, it is the "value type". The type of an index column specifies the set of documents to be indexed. A set of documents is a table in Groonga; in other words, the type of an index column must be a table. Here are the data store columns:

Scalar column

Summary TODO Usage TODO

Vector column

Summary

Vector column is a data store object. It can store zero or more scalar values. In short, a scalar value is a single value such as a number or a string. See scalar about scalar value details. One of the vector column use cases is storing tags: you can use a vector column to store tag values. You can use a vector column as an index search target in the same way as a scalar column. You can set a weight for each element. When an element that has a weight of one or more is matched, the record gets a higher score than in the no-weight case. This is a vector column specific feature. A vector column that can store weights is called a weight vector column. You can also do full text search against each text element, but the search score becomes very high when weight is used, so you should use full text search together with weight carefully.

Usage

There are three vector column types: • Normal vector column • Reference vector column • Weight vector column This section describes how to use these types.

Normal vector column

A normal vector column stores zero or more scalar data values, such as numbers and strings. A normal vector column can store only elements of the same type; you can't mix types. For example, you can't store a number and a string in the same normal vector column. A normal vector column is useful when a record has multiple values for one key. Tags are the most popular use case.

How to create

Use the /reference/commands/column_create command to create a normal vector column. The point is the COLUMN_VECTOR flag:

Execution example: table_create Bookmarks TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Bookmarks tags COLUMN_VECTOR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true]

You can set zero or more tags to a bookmark.

How to load

You can load vector data with the JSON array syntax: [ELEMENT1, ELEMENT2, ELEMENT3, ...]
Let's load the following data:
┌────────────────────┬─────────────────────────────────┐
│_key                │ tags                            │
├────────────────────┼─────────────────────────────────┤
│http://groonga.org/ │ ["groonga"]                     │
├────────────────────┼─────────────────────────────────┤
│http://mroonga.org/ │ ["mroonga", "mysql", "groonga"] │
├────────────────────┼─────────────────────────────────┤
│http://ranguba.org/ │ ["ruby", "groonga"]             │
└────────────────────┴─────────────────────────────────┘
Here is a command that loads the data:

Execution example: load --table Bookmarks [ {"_key": "http://groonga.org/", "tags": ["groonga"]}, {"_key": "http://mroonga.org/", "tags": ["mroonga", "mysql", "groonga"]}, {"_key": "http://ranguba.org/", "tags": ["ruby", "groonga"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON array syntax:

Execution example: select Bookmarks # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ] # ], # [ # 1, # "http://groonga.org/", # [ # "groonga" # ] # ], # [ # 2, # "http://mroonga.org/", # [ # "mroonga", # "mysql", # "groonga" # ] # ], # [ # 3, # "http://ranguba.org/", # [ # "ruby", # "groonga" # ] # ] # ] # ] # ]

How to search

You need to create an index to search a normal vector column:

Execution example: table_create Tags TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Tags bookmark_index COLUMN_INDEX Bookmarks tags # [[0, 1337566253.89858, 0.000355720520019531], true]

There is no vector column specific way; you create an index just like for a scalar column. You can search for an element in tags with the full text search syntax. With select-match-columns and select-query:

Execution example: select Bookmarks --match_columns tags --query mysql --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://mroonga.org/", # [ # "mroonga", # "mysql", # "groonga" # ], # 1 # ] # ] # ] # ]

You can also use weight in select-match-columns:

Execution example: select Bookmarks --match_columns 'tags * 3' --query mysql --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://mroonga.org/", # [ # "mroonga", # "mysql", # "groonga" # ], # 3 # ] # ] # ] # ]

With select-filter:

Execution example: select Bookmarks --filter 'tags @ "mysql"' --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://mroonga.org/", # [ # "mroonga", # "mysql", # "groonga" # ], # 1 # ] # ] # ] # ]

Reference vector column

TODO (see the hedged sketch after the weight vector column section below). A reference vector column is space-efficient if there are many elements with the same value. A reference vector column keeps the referenced record IDs, not the values themselves. A record ID is smaller than the value itself.

How to create TODO How to load TODO How to search TODO

Weight vector column

A weight vector column is similar to a normal vector column. It can store elements, and it can also store weights for them. The weight is the degree of importance of the element. The weight is a positive integer. 0 is the default weight; it means no weight. If the weight is one or larger, the search score is increased by the weight.
If the weight is 0, the score is 1. If the weight is 10, the score is 11 (= 1 + 10). A weight vector column is useful for tuning the search score: you can increase the search score of specific records. See also select-adjuster.

Limitations

There are some limitations for now. They will be resolved in the future. Here are the limitations: • You need to use a string representation for element values on load. For example, you can't use 29 for the number 29. You need to use "29" for the number 29.

How to create

Use the /reference/commands/column_create command to create a weight vector column. The point is the COLUMN_VECTOR|WITH_WEIGHT flags:

Execution example: table_create Bookmarks TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Bookmarks tags COLUMN_VECTOR|WITH_WEIGHT ShortText # [[0, 1337566253.89858, 0.000355720520019531], true]

If you don't specify the WITH_WEIGHT flag, it is just a normal vector column. You can set zero or more tags with weights to a bookmark.

How to load

You can load vector data with the JSON object syntax: {"ELEMENT1": WEIGHT1, "ELEMENT2": WEIGHT2, "ELEMENT3": WEIGHT3, ...} Let's load the following data:
┌────────────────────┬──────────────────────────────────┐
│_key                │ tags                             │
├────────────────────┼──────────────────────────────────┤
│http://groonga.org/ │ {"groonga": 100}                 │
├────────────────────┼──────────────────────────────────┤
│http://mroonga.org/ │ {"mroonga": 100, "mysql": 50,    │
│                    │ "groonga": 10}                   │
├────────────────────┼──────────────────────────────────┤
│http://ranguba.org/ │ {"ruby": 100, "groonga": 50}     │
└────────────────────┴──────────────────────────────────┘
Here is a command that loads the data:

Execution example: load --table Bookmarks [ {"_key": "http://groonga.org/", "tags": {"groonga": 100}}, {"_key": "http://mroonga.org/", "tags": {"mroonga": 100, "mysql": 50, "groonga": 10}}, {"_key": "http://ranguba.org/", "tags": {"ruby": 100, "groonga": 50}} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON object syntax:

Execution example: select Bookmarks # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ] # ], # [ # 1, # "http://groonga.org/", # { # "groonga": 100 # } # ], # [ # 2, # "http://mroonga.org/", # { # "mroonga": 100, # "groonga": 10, # "mysql": 50 # } # ], # [ # 3, # "http://ranguba.org/", # { # "ruby": 100, # "groonga": 50 # } # ] # ] # ] # ]

How to search

You need to create an index to search a weight vector column. Don't forget to specify the WITH_WEIGHT flag to column_create:

Execution example: table_create Tags TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Tags bookmark_index COLUMN_INDEX|WITH_WEIGHT Bookmarks tags # [[0, 1337566253.89858, 0.000355720520019531], true]

There is no weight vector column specific way except the WITH_WEIGHT flag; you create an index just like for a scalar column. You can search for an element in tags with the full text search syntax.
With select-match-columns and select-query:

Execution example: select Bookmarks --match_columns tags --query groonga --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://groonga.org/", # { # "groonga": 100 # }, # 101 # ], # [ # "http://mroonga.org/", # { # "mroonga": 100, # "groonga": 10, # "mysql": 50 # }, # 11 # ], # [ # "http://ranguba.org/", # { # "ruby": 100, # "groonga": 50 # }, # 51 # ] # ] # ] # ]

You can also use weight in select-match-columns. The score is (1 + weight_in_weight_vector) * weight_in_match_columns:

Execution example: select Bookmarks --match_columns 'tags * 3' --query groonga --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://groonga.org/", # { # "groonga": 100 # }, # 303 # ], # [ # "http://mroonga.org/", # { # "mroonga": 100, # "groonga": 10, # "mysql": 50 # }, # 33 # ], # [ # "http://ranguba.org/", # { # "ruby": 100, # "groonga": 50 # }, # 153 # ] # ] # ] # ]

With select-filter:

Execution example: select Bookmarks --filter 'tags @ "groonga"' --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://groonga.org/", # { # "groonga": 100 # }, # 101 # ], # [ # "http://mroonga.org/", # { # "mroonga": 100, # "groonga": 10, # "mysql": 50 # }, # 11 # ], # [ # "http://ranguba.org/", # { # "ruby": 100, # "groonga": 50 # }, # 51 # ] # ] # ] # ]

How to apply just weight

You can use the weights in a weight vector column to just increase the search score without changing the set of matched records. Use select-adjuster for this purpose:

Execution example: select Bookmarks \ --filter true \ --adjuster 'tags @ "mysql" * 10 + tags @ "groonga" * 5' \ --output_columns _key,tags,_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tags", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "http://groonga.org/", # { # "groonga": 100 # }, # 506 # ], # [ # "http://mroonga.org/", # { # "mroonga": 100, # "groonga": 10, # "mysql": 50 # }, # 566 # ], # [ # "http://ranguba.org/", # { # "ruby": 100, # "groonga": 50 # }, # 256 # ] # ] # ] # ]

The select command uses --filter true, so all records are matched with score 1. Then it applies --adjuster. The adjuster does the following: • tags @ "mysql" * 10 increases the score by (1 + weight) * 10 for records that have the "mysql" tag. • tags @ "groonga" * 5 increases the score by (1 + weight) * 5 for records that have the "groonga" tag. For example, the record "http://mroonga.org/" has both the "mysql" tag and the "groonga" tag, so its score is increased by 565 (= ((1 + 50) * 10) + ((1 + 10) * 5) = (51 * 10) + (11 * 5) = 510 + 55). The search score is 1 by --filter true before applying --adjuster, so the final search score of the record "http://mroonga.org/" is 566 (= 1 + 565).
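As promised above, here is a hedged sketch of a reference vector column, a minimal illustration of the space-efficiency point rather than a definitive recipe. The table and column names (People, friends) are hypothetical; the column's type is the People table itself, so each friends element stores a record ID of People rather than the string value. The bodies reuse this document's placeholder header.

Execution example: table_create People TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create People friends COLUMN_VECTOR People # [[0, 1337566253.89858, 0.000355720520019531], true] load --table People [ {"_key": "bob"}, {"_key": "carol"}, {"_key": "alice", "friends": ["bob", "carol"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]

In a larger dataset where keys such as "bob" appear in many records' friends vectors, storing small record IDs instead of the repeated string values is what makes the reference vector column space-efficient.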
Pseudo column Several columns are automatically defined in tables created in a Groonga database. All of these columns are given names that start with an underscore ('_'). The pseudo columns that are defined differ depending on the type of the table. _id A unique number assigned to each record. It is defined for all tables. Its value is an integer in the range 1 to 1073741824, normally incremented by 1 in the order records are added. The _id value is immutable and cannot be changed as long as the record exists. However, the _id values of deleted records are reused. _key The primary key value of the record. It is defined only for tables that have a primary key. Primary key values are unique within the table and cannot be changed. _value The value of the record. It is defined only for tables created with value_type. It can be changed freely. _score The score value of each record. It is defined only for tables generated as search results. Its value is set during search processing, but it can be changed freely. _nsubrecs The number of records that had the same primary key value. It is defined only for tables generated as search results. When a grouping (drilldown) operation is executed, the number of records in the pre-grouping table that had the same grouping key value is recorded in _nsubrecs of the table that stores the grouping result. Index column Summary TODO Usage TODO Normalizers Summary Groonga has a normalizer module that normalizes text. It is used when tokenizing text and when storing table keys. For example, A and a are processed as the same character after normalization. Normalizer modules can be added as plugins. You can customize text normalization by registering your normalizer plugins to Groonga. A normalizer module is attached to a table. A table can have zero or one normalizer module. You can attach a normalizer module to a table by the table-create-normalizer option in /reference/commands/table_create. Here is an example table_create that uses the NormalizerAuto normalizer module: Execution example: table_create Dictionary TABLE_HASH_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] NOTE: Groonga 2.0.9 or earlier doesn't have the --normalizer option in table_create. The KEY_NORMALIZE flag was used instead. You can open an old database with Groonga 2.1.0 or later. An old database means a database created by Groonga 2.0.9 or earlier. But once opened, the database can no longer be opened by Groonga 2.0.9 or earlier. Once you open the old database with Groonga 2.1.0 or later, the KEY_NORMALIZE flag information in the old database is converted to normalizer information. So Groonga 2.0.9 or earlier cannot find the KEY_NORMALIZE flag information in the opened old database. Keys of a table that has a normalizer module are normalized: Execution example: load --table Dictionary [ {"_key": "Apple"}, {"_key": "black"}, {"_key": "COLOR"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select Dictionary # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 1, # "apple" # ], # [ # 2, # "black" # ], # [ # 3, # "color" # ] # ] # ] # ] The NormalizerAuto normalizer normalizes text to downcased text. For example, "Apple" is normalized to "apple", "black" is normalized to "black" and "COLOR" is normalized to "color". If a table is a lexicon for full text search, tokenized tokens are normalized, because tokens are stored as table keys. Table keys are normalized as described above. Built-in normalizers Here is a list of built-in normalizers: • NormalizerAuto • NormalizerNFKC51 NormalizerAuto Normally you should use the NormalizerAuto normalizer. NormalizerAuto is the normalizer that Groonga 2.0.9 or earlier used. The KEY_NORMALIZE flag in table_create on Groonga 2.0.9 or earlier is equal to the --normalizer NormalizerAuto option in table_create on Groonga 2.1.0 or later. NormalizerAuto supports all encodings. It uses Unicode NFKC (Normalization Form Compatibility Composition) for UTF-8 encoded text. It uses encoding-specific original normalizations for other encodings. The results of those original normalizations are similar to NFKC.
For example, half-width katakana (such as U+FF76 HALFWIDTH KATAKANA LETTER KA) + half-width katakana voiced sound mark (U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK) is normalized to full-width katakana with voiced sound mark (U+30AC KATAKANA LETTER GA). The former is two characters but the latter is one character. Here is an example that uses the NormalizerAuto normalizer: Execution example: table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] NormalizerNFKC51 NormalizerNFKC51 normalizes texts by Unicode NFKC (Normalization Form Compatibility Composition) for Unicode version 5.1. It supports only UTF-8 encoding. Normally you don't need to use NormalizerNFKC51 explicitly. You can use NormalizerAuto instead. Here is an example that uses the NormalizerNFKC51 normalizer: Execution example: table_create NFKC51Lexicon TABLE_HASH_KEY ShortText --normalizer NormalizerNFKC51 # [[0, 1337566253.89858, 0.000355720520019531], true] Additional normalizers There are additional normalizers: • groonga-normalizer-mysql See also • /reference/commands/table_create Tokenizers Summary Groonga has a tokenizer module that tokenizes text. It is used in the following cases: • Indexing text (a tokenizer is used when indexing text) • Searching by query (a tokenizer is used when searching by query) The tokenizer is an important module for full-text search. You can change the trade-off between precision and recall by changing the tokenizer. Normally, TokenBigram is a suitable tokenizer. If you don't know much about tokenizers, it's recommended that you choose TokenBigram. You can try a tokenizer by /reference/commands/tokenize and /reference/commands/table_tokenize. Here is an example that tries the TokenBigram tokenizer by /reference/commands/tokenize: Execution example: tokenize TokenBigram "Hello World" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "He" # }, # { # "position": 1, # "force_prefix": false, # "value": "el" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lo" # }, # { # "position": 4, # "force_prefix": false, # "value": "o " # }, # { # "position": 5, # "force_prefix": false, # "value": " W" # }, # { # "position": 6, # "force_prefix": false, # "value": "Wo" # }, # { # "position": 7, # "force_prefix": false, # "value": "or" # }, # { # "position": 8, # "force_prefix": false, # "value": "rl" # }, # { # "position": 9, # "force_prefix": false, # "value": "ld" # }, # { # "position": 10, # "force_prefix": false, # "value": "d" # } # ] # ] What is "tokenize"? "Tokenize" is the process that extracts zero or more tokens from a text. There are several tokenize methods. For example, Hello World is tokenized to the following tokens by the bigram tokenize method: • He • el • ll • lo • o_ (_ means a white-space) • _W (_ means a white-space) • Wo • or • rl • ld In the above example, 10 tokens are extracted from the one text Hello World. By contrast, Hello World is tokenized to the following tokens by the white-space-separate tokenize method: • Hello • World In the above example, 2 tokens are extracted from the one text Hello World. Tokens are used as search keys. You can find indexed documents only by tokens that are extracted by the tokenize method used. For example, you can find Hello World by ll with the bigram tokenize method but you can't find Hello World by ll with the white-space-separate tokenize method. That is because the white-space-separate tokenize method doesn't extract the ll token; it just extracts the Hello and World tokens.
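You can check this difference directly with the /reference/commands/tokenize command. TokenDelimit (described later in this section) is a white-space-separate tokenizer, so the following sketch (an addition; output omitted) extracts only the Hello and World tokens:
tokenize TokenDelimit "Hello World"
Compare this with the tokenize TokenBigram "Hello World" example above, which extracts the bigram tokens, including ll.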
In general, a tokenize method that generates small tokens increases recall but decreases precision. A tokenize method that generates large tokens increases precision but decreases recall. For example, we can find both Hello World and A or B by or with the bigram tokenize method. Hello World is noise for people who want to search for the word or (as in logical or). It means that precision is decreased. But recall is increased. We can find only A or B by or with the white-space-separate tokenize method, because World is tokenized to the single token World with the white-space-separate tokenize method. It means that precision is increased for people who want to search for the word or. But recall is decreased, because Hello World, which contains or inside World, isn't found. Built-in tokenizers Here is a list of built-in tokenizers: • TokenBigram • TokenBigramSplitSymbol • TokenBigramSplitSymbolAlpha • TokenBigramSplitSymbolAlphaDigit • TokenBigramIgnoreBlank • TokenBigramIgnoreBlankSplitSymbol • TokenBigramIgnoreBlankSplitSymbolAlpha • TokenBigramIgnoreBlankSplitSymbolAlphaDigit • TokenUnigram • TokenTrigram • TokenDelimit • TokenDelimitNull • TokenMecab • TokenRegexp TokenBigram TokenBigram is a bigram based tokenizer. It's recommended to use this tokenizer for most cases. The bigram tokenize method tokenizes a text into tokens of two adjacent characters. For example, Hello is tokenized to the following tokens: • He • el • ll • lo The bigram tokenize method is good for recall because you can find all texts by a query that consists of two or more characters. In general, you can't find all texts by a query that consists of one character, because one-character tokens don't exist. But in Groonga you can find all texts even by a one-character query, because Groonga finds tokens that start with the query by predictive search. For example, Groonga can find the ll and lo tokens by the l query. The bigram tokenize method isn't good for precision because you can find texts that include the query inside a word. For example, you can find world by or. This is more of a problem for ASCII-only languages than for non-ASCII languages. TokenBigram has a solution for this problem, described below. TokenBigram behaves differently when it is used with a normalizer (see /reference/normalizers). If no normalizer is used, TokenBigram uses the pure bigram tokenize method (all tokens except the last token have two characters): Execution example: tokenize TokenBigram "Hello World" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "He" # }, # { # "position": 1, # "force_prefix": false, # "value": "el" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lo" # }, # { # "position": 4, # "force_prefix": false, # "value": "o " # }, # { # "position": 5, # "force_prefix": false, # "value": " W" # }, # { # "position": 6, # "force_prefix": false, # "value": "Wo" # }, # { # "position": 7, # "force_prefix": false, # "value": "or" # }, # { # "position": 8, # "force_prefix": false, # "value": "rl" # }, # { # "position": 9, # "force_prefix": false, # "value": "ld" # }, # { # "position": 10, # "force_prefix": false, # "value": "d" # } # ] # ] If a normalizer is used, TokenBigram uses a white-space-separate-like tokenize method for ASCII characters and the bigram tokenize method for non-ASCII characters. You may be confused by this combined behavior.
But it's reasonable for most use cases such as English text (only ASCII characters) and Japanese text (ASCII and non-ASCII characters are mixed). Most languages that consist of only ASCII characters use white-space as the word separator. The white-space-separate tokenize method is suitable for that case. Languages that consist of non-ASCII characters don't use white-space as the word separator. The bigram tokenize method is suitable for that case. The mixed tokenize method is suitable for the mixed-language case. If you want to use the bigram tokenize method for ASCII characters, see TokenBigramSplitXXX type tokenizers such as TokenBigramSplitSymbolAlpha. Let's confirm TokenBigram behavior with examples. TokenBigram uses one or more white-spaces as the token delimiter for ASCII characters: Execution example: tokenize TokenBigram "Hello World" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "hello" # }, # { # "position": 1, # "force_prefix": false, # "value": "world" # } # ] # ] TokenBigram uses character type changes as token delimiters for ASCII characters. A character type is one of the following: • Alphabet • Digit • Symbol (such as (, ) and !) • Hiragana • Katakana • Kanji • Others The following example shows two token delimiters: • between 100 (digits) and cents (alphabets) • between cents (alphabets) and !!! (symbols) Execution example: tokenize TokenBigram "100cents!!!" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "100" # }, # { # "position": 1, # "force_prefix": false, # "value": "cents" # }, # { # "position": 2, # "force_prefix": false, # "value": "!!!" # } # ] # ] Here is an example where TokenBigram uses the bigram tokenize method for non-ASCII characters. Execution example: tokenize TokenBigram "日本語の勉強" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "日本" # }, # { # "position": 1, # "force_prefix": false, # "value": "本語" # }, # { # "position": 2, # "force_prefix": false, # "value": "語の" # }, # { # "position": 3, # "force_prefix": false, # "value": "の勉" # }, # { # "position": 4, # "force_prefix": false, # "value": "勉強" # }, # { # "position": 5, # "force_prefix": false, # "value": "強" # } # ] # ] TokenBigramSplitSymbol TokenBigramSplitSymbol is similar to TokenBigram. The difference between them is symbol handling. TokenBigramSplitSymbol tokenizes symbols by the bigram tokenize method: Execution example: tokenize TokenBigramSplitSymbol "100cents!!!" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "100" # }, # { # "position": 1, # "force_prefix": false, # "value": "cents" # }, # { # "position": 2, # "force_prefix": false, # "value": "!!" # }, # { # "position": 3, # "force_prefix": false, # "value": "!!" # }, # { # "position": 4, # "force_prefix": false, # "value": "!" # } # ] # ] TokenBigramSplitSymbolAlpha TokenBigramSplitSymbolAlpha is similar to TokenBigram. The difference between them is symbol and alphabet handling. TokenBigramSplitSymbolAlpha tokenizes symbols and alphabets by the bigram tokenize method: Execution example: tokenize TokenBigramSplitSymbolAlpha "100cents!!!"
NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "100" # }, # { # "position": 1, # "force_prefix": false, # "value": "ce" # }, # { # "position": 2, # "force_prefix": false, # "value": "en" # }, # { # "position": 3, # "force_prefix": false, # "value": "nt" # }, # { # "position": 4, # "force_prefix": false, # "value": "ts" # }, # { # "position": 5, # "force_prefix": false, # "value": "s!" # }, # { # "position": 6, # "force_prefix": false, # "value": "!!" # }, # { # "position": 7, # "force_prefix": false, # "value": "!!" # }, # { # "position": 8, # "force_prefix": false, # "value": "!" # } # ] # ] TokenBigramSplitSymbolAlphaDigit TokenBigramSplitSymbolAlphaDigit is similar to TokenBigram. The difference between them is symbol, alphabet and digit handling. TokenBigramSplitSymbolAlphaDigit tokenizes symbols, alphabets and digits by the bigram tokenize method. It means that all characters are tokenized by the bigram tokenize method: Execution example: tokenize TokenBigramSplitSymbolAlphaDigit "100cents!!!" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "10" # }, # { # "position": 1, # "force_prefix": false, # "value": "00" # }, # { # "position": 2, # "force_prefix": false, # "value": "0c" # }, # { # "position": 3, # "force_prefix": false, # "value": "ce" # }, # { # "position": 4, # "force_prefix": false, # "value": "en" # }, # { # "position": 5, # "force_prefix": false, # "value": "nt" # }, # { # "position": 6, # "force_prefix": false, # "value": "ts" # }, # { # "position": 7, # "force_prefix": false, # "value": "s!" # }, # { # "position": 8, # "force_prefix": false, # "value": "!!" # }, # { # "position": 9, # "force_prefix": false, # "value": "!!" # }, # { # "position": 10, # "force_prefix": false, # "value": "!" # } # ] # ] TokenBigramIgnoreBlank TokenBigramIgnoreBlank is similar to TokenBigram. The difference between them is blank handling. TokenBigramIgnoreBlank ignores white-spaces in continuous symbols and non-ASCII characters. You can see the difference between them with the 日 本 語 ! ! ! text, because it has symbols and non-ASCII characters. Here is a result by TokenBigram: Execution example: tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "日" # }, # { # "position": 1, # "force_prefix": false, # "value": "本" # }, # { # "position": 2, # "force_prefix": false, # "value": "語" # }, # { # "position": 3, # "force_prefix": false, # "value": "!" # }, # { # "position": 4, # "force_prefix": false, # "value": "!" # }, # { # "position": 5, # "force_prefix": false, # "value": "!" # } # ] # ] Here is a result by TokenBigramIgnoreBlank: Execution example: tokenize TokenBigramIgnoreBlank "日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "日本" # }, # { # "position": 1, # "force_prefix": false, # "value": "本語" # }, # { # "position": 2, # "force_prefix": false, # "value": "語" # }, # { # "position": 3, # "force_prefix": false, # "value": "!!!" # } # ] # ] TokenBigramIgnoreBlankSplitSymbol TokenBigramIgnoreBlankSplitSymbol is similar to TokenBigram. The differences between them are the following: • Blank handling • Symbol handling TokenBigramIgnoreBlankSplitSymbol ignores white-spaces in continuous symbols and non-ASCII characters.
TokenBigramIgnoreBlankSplitSymbol tokenizes symbols by the bigram tokenize method. You can see the difference between them with the 日 本 語 ! ! ! text, because it has symbols and non-ASCII characters. Here is a result by TokenBigram: Execution example: tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "日" # }, # { # "position": 1, # "force_prefix": false, # "value": "本" # }, # { # "position": 2, # "force_prefix": false, # "value": "語" # }, # { # "position": 3, # "force_prefix": false, # "value": "!" # }, # { # "position": 4, # "force_prefix": false, # "value": "!" # }, # { # "position": 5, # "force_prefix": false, # "value": "!" # } # ] # ] Here is a result by TokenBigramIgnoreBlankSplitSymbol: Execution example: tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "日本" # }, # { # "position": 1, # "force_prefix": false, # "value": "本語" # }, # { # "position": 2, # "force_prefix": false, # "value": "語!" # }, # { # "position": 3, # "force_prefix": false, # "value": "!!" # }, # { # "position": 4, # "force_prefix": false, # "value": "!!" # }, # { # "position": 5, # "force_prefix": false, # "value": "!" # } # ] # ] TokenBigramIgnoreBlankSplitSymbolAlpha TokenBigramIgnoreBlankSplitSymbolAlpha is similar to TokenBigram. The differences between them are the following: • Blank handling • Symbol and alphabet handling TokenBigramIgnoreBlankSplitSymbolAlpha ignores white-spaces in continuous symbols and non-ASCII characters. TokenBigramIgnoreBlankSplitSymbolAlpha tokenizes symbols and alphabets by the bigram tokenize method. You can see the difference between them with the Hello 日 本 語 ! ! ! text, because it has symbols, non-ASCII characters with white-spaces, and alphabets. Here is a result by TokenBigram: Execution example: tokenize TokenBigram "Hello 日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "hello" # }, # { # "position": 1, # "force_prefix": false, # "value": "日" # }, # { # "position": 2, # "force_prefix": false, # "value": "本" # }, # { # "position": 3, # "force_prefix": false, # "value": "語" # }, # { # "position": 4, # "force_prefix": false, # "value": "!" # }, # { # "position": 5, # "force_prefix": false, # "value": "!" # }, # { # "position": 6, # "force_prefix": false, # "value": "!" # } # ] # ] Here is a result by TokenBigramIgnoreBlankSplitSymbolAlpha: Execution example: tokenize TokenBigramIgnoreBlankSplitSymbolAlpha "Hello 日 本 語 ! ! !" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "he" # }, # { # "position": 1, # "force_prefix": false, # "value": "el" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lo" # }, # { # "position": 4, # "force_prefix": false, # "value": "o日" # }, # { # "position": 5, # "force_prefix": false, # "value": "日本" # }, # { # "position": 6, # "force_prefix": false, # "value": "本語" # }, # { # "position": 7, # "force_prefix": false, # "value": "語!" # }, # { # "position": 8, # "force_prefix": false, # "value": "!!" # }, # { # "position": 9, # "force_prefix": false, # "value": "!!" # }, # { # "position": 10, # "force_prefix": false, # "value": "!"
# } # ] # ] TokenBigramIgnoreBlankSplitSymbolAlphaDigit TokenBigramIgnoreBlankSplitSymbolAlphaDigit is similar to TokenBigram. The differences between them are the following: • Blank handling • Symbol, alphabet and digit handling TokenBigramIgnoreBlankSplitSymbolAlphaDigit ignores white-spaces in continuous symbols and non-ASCII characters. TokenBigramIgnoreBlankSplitSymbolAlphaDigit tokenizes symbols, alphabets and digits by the bigram tokenize method. It means that all characters are tokenized by the bigram tokenize method. You can see the difference between them with the Hello 日 本 語 ! ! ! 777 text, because it has symbols, non-ASCII characters with white-spaces, alphabets and digits. Here is a result by TokenBigram: Execution example: tokenize TokenBigram "Hello 日 本 語 ! ! ! 777" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "hello" # }, # { # "position": 1, # "force_prefix": false, # "value": "日" # }, # { # "position": 2, # "force_prefix": false, # "value": "本" # }, # { # "position": 3, # "force_prefix": false, # "value": "語" # }, # { # "position": 4, # "force_prefix": false, # "value": "!" # }, # { # "position": 5, # "force_prefix": false, # "value": "!" # }, # { # "position": 6, # "force_prefix": false, # "value": "!" # }, # { # "position": 7, # "force_prefix": false, # "value": "777" # } # ] # ] Here is a result by TokenBigramIgnoreBlankSplitSymbolAlphaDigit: Execution example: tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "he" # }, # { # "position": 1, # "force_prefix": false, # "value": "el" # }, # { # "position": 2, # "force_prefix": false, # "value": "ll" # }, # { # "position": 3, # "force_prefix": false, # "value": "lo" # }, # { # "position": 4, # "force_prefix": false, # "value": "o日" # }, # { # "position": 5, # "force_prefix": false, # "value": "日本" # }, # { # "position": 6, # "force_prefix": false, # "value": "本語" # }, # { # "position": 7, # "force_prefix": false, # "value": "語!" # }, # { # "position": 8, # "force_prefix": false, # "value": "!!" # }, # { # "position": 9, # "force_prefix": false, # "value": "!!" # }, # { # "position": 10, # "force_prefix": false, # "value": "!7" # }, # { # "position": 11, # "force_prefix": false, # "value": "77" # }, # { # "position": 12, # "force_prefix": false, # "value": "77" # }, # { # "position": 13, # "force_prefix": false, # "value": "7" # } # ] # ] TokenUnigram TokenUnigram is similar to TokenBigram. The difference between them is the token unit. TokenBigram uses 2 characters per token; TokenUnigram uses 1 character per token. Execution example: tokenize TokenUnigram "100cents!!!" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "100" # }, # { # "position": 1, # "force_prefix": false, # "value": "cents" # }, # { # "position": 2, # "force_prefix": false, # "value": "!!!" # } # ] # ] TokenTrigram TokenTrigram is similar to TokenBigram. The difference between them is the token unit. TokenBigram uses 2 characters per token; TokenTrigram uses 3 characters per token. Execution example: tokenize TokenTrigram "10000cents!!!!!"
NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "10000" # }, # { # "position": 1, # "force_prefix": false, # "value": "cents" # }, # { # "position": 2, # "force_prefix": false, # "value": "!!!!!" # } # ] # ] TokenDelimit TokenDelimit extracts tokens by splitting text on one or more space characters (U+0020). For example, Hello World is tokenized to Hello and World. TokenDelimit is suitable for tag text. You can extract groonga, full-text-search and http as tags from groonga full-text-search http. Here is an example of TokenDelimit: Execution example: tokenize TokenDelimit "Groonga full-text-search HTTP" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "groonga" # }, # { # "position": 1, # "force_prefix": false, # "value": "full-text-search" # }, # { # "position": 2, # "force_prefix": false, # "value": "http" # } # ] # ] TokenDelimitNull TokenDelimitNull is similar to TokenDelimit. The difference between them is the separator character. TokenDelimit uses the space character (U+0020) but TokenDelimitNull uses the NUL character (U+0000). TokenDelimitNull is also suitable for tag text. Here is an example of TokenDelimitNull: Execution example: tokenize TokenDelimitNull "Groonga\u0000full-text-search\u0000HTTP" NormalizerAuto # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "groongau0000full-text-searchu0000http" # } # ] # ] TokenMecab TokenMecab is a tokenizer based on the MeCab part-of-speech and morphological analyzer. MeCab doesn't depend on Japanese. You can use MeCab for other languages by creating a dictionary for those languages. You can use the NAIST Japanese Dictionary for Japanese. TokenMecab is good for precision rather than recall. You can find both 東京都 and 京都 texts by the 京都 query with TokenBigram, but 東京都 isn't expected. You can find only the 京都 text by the 京都 query with TokenMecab. If you want to support neologisms, you need to keep updating your MeCab dictionary. It requires maintenance cost. (TokenBigram doesn't require dictionary maintenance because TokenBigram doesn't use a dictionary.) mecab-ipadic-NEologd : Neologism dictionary for MeCab may help you. Here is an example of TokenMecab. 東京都 is tokenized to 東京 and 都. They don't include 京都: Execution example: tokenize TokenMecab "東京都" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "東京" # }, # { # "position": 1, # "force_prefix": false, # "value": "都" # } # ] # ] TokenRegexp New in version 5.0.1. CAUTION: This tokenizer is experimental. The specification may be changed. CAUTION: This tokenizer can be used only with UTF-8. You can't use this tokenizer with EUC-JP, Shift_JIS and so on. TokenRegexp is a tokenizer for supporting regular expression search by index. In general, regular expression search is evaluated as sequential search. But the following cases can be evaluated as index search: • Literal-only case such as hello • Beginning-of-text and literal case such as \A/home/alice • End-of-text and literal case such as \.txt\z In most cases, index search is faster than sequential search. TokenRegexp is based on the bigram tokenize method.
TokenRegexp adds the beginning-of-text mark (U+FFEF) at the beginning of text and the end-of-text mark (U+FFF0) at the end of text when you index text: Execution example: tokenize TokenRegexp "/home/alice/test.txt" NormalizerAuto --mode ADD # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # { # "position": 0, # "force_prefix": false, # "value": "" # }, # { # "position": 1, # "force_prefix": false, # "value": "/h" # }, # { # "position": 2, # "force_prefix": false, # "value": "ho" # }, # { # "position": 3, # "force_prefix": false, # "value": "om" # }, # { # "position": 4, # "force_prefix": false, # "value": "me" # }, # { # "position": 5, # "force_prefix": false, # "value": "e/" # }, # { # "position": 6, # "force_prefix": false, # "value": "/a" # }, # { # "position": 7, # "force_prefix": false, # "value": "al" # }, # { # "position": 8, # "force_prefix": false, # "value": "li" # }, # { # "position": 9, # "force_prefix": false, # "value": "ic" # }, # { # "position": 10, # "force_prefix": false, # "value": "ce" # }, # { # "position": 11, # "force_prefix": false, # "value": "e/" # }, # { # "position": 12, # "force_prefix": false, # "value": "/t" # }, # { # "position": 13, # "force_prefix": false, # "value": "te" # }, # { # "position": 14, # "force_prefix": false, # "value": "es" # }, # { # "position": 15, # "force_prefix": false, # "value": "st" # }, # { # "position": 16, # "force_prefix": false, # "value": "t." # }, # { # "position": 17, # "force_prefix": false, # "value": ".t" # }, # { # "position": 18, # "force_prefix": false, # "value": "tx" # }, # { # "position": 19, # "force_prefix": false, # "value": "xt" # }, # { # "position": 20, # "force_prefix": false, # "value": "t" # }, # { # "position": 21, # "force_prefix": false, # "value": "" # } # ] # ] Token filters Summary Groonga has token filter modules that process tokenized tokens. Token filter modules can be added as plugins. You can customize tokenized tokens by registering your token filter plugins to Groonga. A table can have zero or more token filters. You can attach token filters to a table by the table-create-token-filters option in /reference/commands/table_create. Here is an example table_create that uses the TokenFilterStopWord token filter module: Execution example: register token_filters/stop_word # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord # [[0, 1337566253.89858, 0.000355720520019531], true] Available token filters Here is the list of available token filters: • TokenFilterStopWord • TokenFilterStem TokenFilterStopWord TokenFilterStopWord removes stop words from tokenized tokens when searching documents. Because TokenFilterStopWord removes tokens at search time, you can specify stop words after adding documents. Stop words are specified by the is_stop_word column on the lexicon table.
Here is an example that uses the TokenFilterStopWord token filter: Execution example: register token_filters/stop_word # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Memos TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStopWord # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms is_stop_word COLUMN_SCALAR Bool # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Terms [ {"_key": "and", "is_stop_word": true} ] # [[0, 1337566253.89858, 0.000355720520019531], 1] load --table Memos [ {"content": "Hello"}, {"content": "Hello and Good-bye"}, {"content": "Good-bye"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select Memos --match_columns content --query "Hello and" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "content", # "ShortText" # ] # ], # [ # 1, # "Hello" # ], # [ # 2, # "Hello and Good-bye" # ] # ] # ] # ] The and token is marked as a stop word in the Terms table. "Hello", which doesn't contain and in content, is matched, because and is a stop word and is removed from the query. TokenFilterStem TokenFilterStem stems tokenized tokens. Here is an example that uses the TokenFilterStem token filter: Execution example: register token_filters/stem # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Memos TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto \ --token_filters TokenFilterStem # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Memos [ {"content": "I develop Groonga"}, {"content": "I'm developing Groonga"}, {"content": "I developed Groonga"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3] select Memos --match_columns content --query "develops" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "content", # "ShortText" # ] # ], # [ # 1, # "I develop Groonga" # ], # [ # 2, # "I'm developing Groonga" # ], # [ # 3, # "I developed Groonga" # ] # ] # ] # ] All of the develop, developing, developed and develops tokens are stemmed as develop. So we can find develop, developing and developed by the develops query. See also • /reference/commands/table_create Query expanders QueryExpanderTSV Summary QueryExpanderTSV is a query expander plugin that reads synonyms from a TSV (Tab Separated Values) file. This plugin provides fewer features than the embedded query expansion feature. For example, it doesn't support word normalization. But it may be easier to use, because you can manage your synonyms in a TSV file. You can edit your synonyms with a spreadsheet application such as Excel. With the embedded query expansion feature, you manage your synonyms in a Groonga table.
Install You need to register query_expanders/tsv as a plugin before you use QueryExpanderTSV: plugin_register query_expanders/tsv Usage You just add the --query_expander QueryExpanderTSV parameter to the select command: select --query "QUERY" --query_expander QueryExpanderTSV If QUERY has registered synonyms, they are expanded. For example, suppose there are the following synonyms.
┌────────┬───────────┬───────────────┐
│word    │ synonym 1 │ synonym 2     │
├────────┼───────────┼───────────────┤
│groonga │ groonga   │ Senna         │
├────────┼───────────┼───────────────┤
│mroonga │ mroonga   │ groonga MySQL │
└────────┴───────────┴───────────────┘
The table means that synonym 1 and synonym 2 are synonyms of word. For example, groonga and Senna are synonyms of groonga. And mroonga and groonga MySQL are synonyms of mroonga. Here is an example of query expansion that uses groonga as the query: select --query "groonga" --query_expander QueryExpanderTSV The above command is equivalent to the following command: select --query "groonga OR Senna" --query_expander QueryExpanderTSV Here is another example of query expansion that uses mroonga search as the query: select --query "mroonga search" --query_expander QueryExpanderTSV The above command is equivalent to the following command: select --query "(mroonga OR (groonga MySQL)) search" --query_expander QueryExpanderTSV Note that only registered words (groonga and mroonga) are expanded to synonyms; unregistered words (search) are not expanded. Query expansion doesn't occur recursively. groonga appears in (mroonga OR (groonga MySQL)) as a query expansion result, but it isn't expanded again. Normally, you need to include the word itself in its synonyms. For example, groonga and mroonga are included in the synonyms of themselves. If you want to ignore the word itself, don't include it in the synonyms. For example, if you want to use query expansion as spelling correction, you should use the following synonyms.
┌───────┬─────────┐
│word   │ synonym │
├───────┼─────────┤
│gronga │ groonga │
└───────┴─────────┘
gronga in word has a typo. An o is missing. groonga in synonym is the correct word. Here is an example of using query expansion as spelling correction: select --query "gronga" --query_expander QueryExpanderTSV The above command is equivalent to the following command: select --query "groonga" --query_expander QueryExpanderTSV The former command has a typo in the --query value but the latter command doesn't have any typos. TSV File Synonyms are defined in a TSV format file. This section describes it. Location The file name should be synonyms.tsv and it is located in the configuration directory. For example, /etc/groonga/synonyms.tsv is a TSV file location. The location is decided at build time. You can change the location by the environment variable GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE at run time: % env GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE=/tmp/synonyms.tsv groonga With the above command, the /tmp/synonyms.tsv file is used. Format You can define zero or more synonyms in a TSV file. You define a word and its synonyms in one line. word is expanded to synonyms in the --query value. Synonyms are combined by OR. For example, the groonga and Senna synonyms are expanded as groonga OR Senna. The first column is word and the rest of the columns are synonyms of the word. Here is a sample line where word is groonga and the synonyms are groonga and Senna. (TAB) means a tab character (U+0009): groonga(TAB)groonga(TAB)Senna Comment lines are supported. Lines that start with # are ignored. Here is an example of a comment line.
The groonga line is ignored as a comment line: #groonga(TAB)groonga(TAB)Senna mroonga(TAB)mroonga(TAB)groonga MySQL Limitation You need to restart groonga to reload your synonyms. The TSV file is loaded only at plugin load time. See also • select-query-expansion Scorer Summary Groonga has a scorer module that customizes the score function. A score function computes the score of a matched record. The default score function uses the number of appearances of terms. It is also known as TF (term frequency). TF is a fast score function but it's not suitable for the following cases: • The search query contains one or more frequently-appearing words such as "the" and "a". • The document contains many occurrences of the same keyword, such as "They are keyword, keyword, keyword ... and keyword". Search engine spammers may use this technique. Score functions can solve these cases. For example, TF-IDF (term frequency-inverse document frequency) can solve the first case. Okapi BM25 can solve the second case. But they are slower than TF. Groonga provides a TF-IDF based scorer as /reference/scorers/scorer_tf_idf but doesn't provide an Okapi BM25 based scorer yet. You don't need to solve scoring only with a score function. A score function depends highly on the search query. You may be able to use metadata of the matched record. For example, Google uses PageRank for scoring. You may be able to use the data type ("title" data are more important than "memo" data), tags, geolocation and so on. Don't think only about score functions for scoring. Usage This section describes how to use scorers. Here are a schema definition and sample data to show usage. Sample schema: Execution example: table_create Memos TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos title COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms title_index COLUMN_INDEX|WITH_POSITION Memos title # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms content_index COLUMN_INDEX|WITH_POSITION Memos content # [[0, 1337566253.89858, 0.000355720520019531], true] Sample data: Execution example: load --table Memos [ { "_key": "memo1", "title": "Groonga is easy", "content": "Groonga is very easy full text search engine!" }, { "_key": "memo2", "title": "Mroonga is easy", "content": "Mroonga is more easier full text search engine!" }, { "_key": "memo3", "title": "Rroonga is easy", "content": "Ruby is very helpful." }, { "_key": "memo4", "title": "Groonga is fast", "content": "Groonga! Groonga! Groonga! Groonga is very fast!" }, { "_key": "memo5", "title": "PGroonga is fast", "content": "PGroonga is very fast!" }, { "_key": "memo6", "title": "PGroonga is useful", "content": "SQL is easy because many client libraries exist." }, { "_key": "memo7", "title": "Mroonga is also useful", "content": "MySQL has replication feature. Mroonga can use it." } ] # [[0, 1337566253.89858, 0.000355720520019531], 7] You can specify a custom score function in select-match-columns. There are several syntaxes.
For a score function that doesn't require any parameters, such as /reference/scorers/scorer_tf_idf: SCORE_FUNCTION(COLUMN) You can specify a weight: SCORE_FUNCTION(COLUMN) * WEIGHT For a score function that requires one or more parameters, such as /reference/scorers/scorer_tf_at_most: SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...) You can specify a weight: SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...) * WEIGHT You can use a different score function for each match column: SCORE_FUNCTION1(COLUMN1) || SCORE_FUNCTION2(COLUMN2) * WEIGHT || SCORE_FUNCTION3(COLUMN3, ARGUMENT1) || ... Here is the simplest example: Execution example: select Memos \ --match_columns "scorer_tf_idf(content)" \ --query "Groonga" \ --output_columns "content, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga! Groonga! Groonga! Groonga is very fast!", # 2 # ], # [ # "Groonga is very easy full text search engine!", # 1 # ] # ] # ] # ] Groonga! Groonga! Groonga! Groonga is very fast! contains 4 occurrences of Groonga. If you use the TF based scorer, which is the default scorer, _score is 4. But the actual _score is 2, because the select command uses the TF-IDF based scorer scorer_tf_idf(). Here is an example that uses a weight: Execution example: select Memos \ --match_columns "scorer_tf_idf(content) * 10" \ --query "Groonga" \ --output_columns "content, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga! Groonga! Groonga! Groonga is very fast!", # 22 # ], # [ # "Groonga is very easy full text search engine!", # 10 # ] # ] # ] # ] Groonga! Groonga! Groonga! Groonga is very fast! has 22 as _score. It had 2 as _score in the previous example, which didn't specify a weight. Here is an example that uses a scorer that requires one argument. The /reference/scorers/scorer_tf_at_most scorer requires one argument. You can limit the TF score with this scorer. Execution example: select Memos \ --match_columns "scorer_tf_at_most(content, 2.0)" \ --query "Groonga" \ --output_columns "content, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga! Groonga! Groonga! Groonga is very fast!", # 2 # ], # [ # "Groonga is very easy full text search engine!", # 1 # ] # ] # ] # ] Groonga! Groonga! Groonga! Groonga is very fast! contains 4 occurrences of Groonga. If you use the normal TF based scorer, which is the default scorer, _score is 4. But the actual _score is 2, because the scorer used in the select command limits the maximum score value to 2. Here is an example that uses multiple scorers: Execution example: select Memos \ --match_columns "scorer_tf_idf(title) || scorer_tf_at_most(content, 2.0)" \ --query "Groonga" \ --output_columns "title, content, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "title", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga is fast", # "Groonga! Groonga! Groonga! Groonga is very fast!", # 3 # ], # [ # "Groonga is easy", # "Groonga is very easy full text search engine!", # 2 # ] # ] # ] # ] The --match_columns uses scorer_tf_idf(title) and scorer_tf_at_most(content, 2.0). The _score value is the sum of them.
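Combining the syntaxes above, a weight can also be attached to one scorer inside a multi-scorer match expression. Here is a minimal sketch (an addition, based only on the SCORE_FUNCTION(COLUMN) * WEIGHT syntax listed earlier; output omitted):
select Memos --match_columns "scorer_tf_idf(title) * 10 || scorer_tf_at_most(content, 2.0)" --query "Groonga" --output_columns "title, _score" --sortby "-_score"
The weight multiplies only the scorer_tf_idf(title) part; the scorer_tf_at_most(content, 2.0) part contributes its score unweighted.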
You can use the default scorer and a custom scorer in the same --match_columns. You can use the default scorer by just specifying a match column: Execution example: select Memos \ --match_columns "title || scorer_tf_at_most(content, 2.0)" \ --query "Groonga" \ --output_columns "title, content, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "title", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Groonga is fast", # "Groonga! Groonga! Groonga! Groonga is very fast!", # 3 # ], # [ # "Groonga is easy", # "Groonga is very easy full text search engine!", # 2 # ] # ] # ] # ] The --match_columns uses the default scorer (TF) for title and /reference/scorers/scorer_tf_at_most for content. The _score value is the sum of them. Built-in scorers Here are the built-in scorers: scorer_tf_at_most NOTE: This scorer is an experimental feature. New in version 5.0.1. Summary scorer_tf_at_most is a scorer based on TF (term frequency). TF based scorers, including the TF-IDF based scorer, have a problem in the following case: If a document contains many occurrences of the same keyword, such as "They are keyword, keyword, keyword ... and keyword", the document gets a high score. That's not expected. Search engine spammers may use this technique. scorer_tf_at_most is a TF based scorer, but it can solve this case. scorer_tf_at_most limits the maximum score value. It means that scorer_tf_at_most limits the effect of a match. If a document contains many occurrences of the same keyword, such as "They are keyword, keyword, keyword ... and keyword", scorer_tf_at_most(column, 2.0) returns at most 2 as the score. You don't need to solve scoring only with a score function. A score function depends highly on the search query. You may be able to use metadata of the matched record. For example, Google uses PageRank for scoring. You may be able to use the data type ("title" data are more important than "memo" data), tags, geolocation and so on. Don't think only about score functions for scoring. Syntax This scorer has two parameters: scorer_tf_at_most(column, max) scorer_tf_at_most(index, max) Usage This section describes how to use this scorer. Here are a schema definition and sample data to show usage.
Sample schema: Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message # [[0, 1337566253.89858, 0.000355720520019531], true] Sample data: Execution example: load --table Logs [ {"message": "Notice"}, {"message": "Notice Notice"}, {"message": "Notice Notice Notice"}, {"message": "Notice Notice Notice Notice"}, {"message": "Notice Notice Notice Notice Notice"} ] # [[0, 1337566253.89858, 0.000355720520019531], 5] You specify scorer_tf_at_most in select-match-columns like the following: Execution example: select Logs \ --match_columns "scorer_tf_at_most(message, 3.0)" \ --query "Notice" \ --output_columns "message, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "message", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Notice Notice Notice Notice Notice", # 3 # ], # [ # "Notice Notice Notice Notice", # 3 # ], # [ # "Notice Notice Notice", # 3 # ], # [ # "Notice Notice", # 2 # ], # [ # "Notice", # 1 # ] # ] # ] # ] If a document has three or more Notice terms, its score is 3, because the select command specifies 3.0 as the max score. If a document has one or two Notice terms, its score is 1 or 2, because the score is less than the max score 3.0. Parameters This section describes all parameters. Required parameters There is only one required parameter; specify either of the following: column The data column that is the match target. The data column must be indexed. index The index column to be used for search. Optional parameters There is no optional parameter. Return value This scorer returns the score as builtin-type-float. /reference/commands/select returns _score as Int32, not Float, because it casts Float to Int32 to keep backward compatibility. The score is computed as TF with a limit. See also • ../scorer scorer_tf_idf NOTE: This scorer is an experimental feature. New in version 5.0.1. Summary scorer_tf_idf is a scorer based on the TF-IDF (term frequency-inverse document frequency) score function. To put it simply, TF (term frequency) divided by DF (document frequency) is TF-IDF. "TF" means that "the number of occurrences is more important". "TF divided by DF" means that "the number of occurrences of an important term is more important". The default score function in Groonga is TF (term frequency). It doesn't care about term importance but is fast. TF-IDF cares about term importance but is slower than TF. TF-IDF will compute a more suitable score than TF in many cases. But it's not perfect. If a document contains many occurrences of the same keyword, such as "They are keyword, keyword, keyword ... and keyword", this increases the score with both TF and TF-IDF. Search engine spammers may use this technique. But TF-IDF doesn't guard against this technique. Okapi BM25 can solve this case. But it's slower than TF-IDF and not yet implemented in Groonga. Groonga provides the scorer_tf_at_most scorer that can also solve this case. You don't need to solve scoring only with a score function. A score function depends highly on the search query. You may be able to use metadata of the matched record. For example, Google uses PageRank for scoring.
You may be able to use the data type ("title" data are more important than "memo" data), tags, geolocation and so on. Don't think only about score functions for scoring. Syntax This scorer has only one parameter: scorer_tf_idf(column) scorer_tf_idf(index) Usage This section describes how to use this scorer. Here are a schema definition and sample data to show usage. Sample schema: Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer TokenBigram \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message # [[0, 1337566253.89858, 0.000355720520019531], true] Sample data: Execution example: load --table Logs [ {"message": "Error"}, {"message": "Warning"}, {"message": "Warning Warning"}, {"message": "Warning Warning Warning"}, {"message": "Info"}, {"message": "Info Info"}, {"message": "Info Info Info"}, {"message": "Info Info Info Info"}, {"message": "Notice"}, {"message": "Notice Notice"}, {"message": "Notice Notice Notice"}, {"message": "Notice Notice Notice Notice"}, {"message": "Notice Notice Notice Notice Notice"} ] # [[0, 1337566253.89858, 0.000355720520019531], 13] You specify scorer_tf_idf in select-match-columns like the following: Execution example: select Logs \ --match_columns "scorer_tf_idf(message)" \ --query "Error OR Info" \ --output_columns "message, _score" \ --sortby "-_score" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "message", # "Text" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Info Info Info Info", # 3 # ], # [ # "Error", # 2 # ], # [ # "Info Info Info", # 2 # ], # [ # "Info Info", # 1 # ], # [ # "Info", # 1 # ] # ] # ] # ] Both the score of Info Info Info and the score of Error are 2, even though Info Info Info includes three Info terms. That's because Error is a more important term than Info. The number of documents that include Info is 4. The number of documents that include Error is 1. A term that is included in fewer documents is a more characteristic term. A characteristic term is an important term. Parameters This section describes all parameters. Required parameters There is only one required parameter; specify either of the following: column The data column that is the match target. The data column must be indexed. index The index column to be used for search. Optional parameters There is no optional parameter. Return value This scorer returns the score as builtin-type-float. /reference/commands/select returns _score as Int32, not Float, because it casts Float to Int32 to keep backward compatibility. The score is computed by a TF-IDF based algorithm. See also • ../scorer grn_expr Grn_expr is an object that searches records with specified conditions and manipulates a database. It's pronounced as gurun expression. Conditions for searching records from a database can be represented by combining condition expressions, such as the equal condition expression and the less-than condition expression, with set operations such as AND, OR and NOT. Grn_expr executes those conditions to search records. You can also use advanced searches such as similar search and near search by grn_expr. You can also use flexible full text search. For example, you can control hit scores for specified words and improve recall by dynamically re-searching with a high-recall algorithm. To determine whether to re-search or not, the number of matched records is used.
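As a quick orientation before the details (an addition; it uses the Entries table defined later in this section, and output is omitted), the same full text search condition can be written in query syntax and in script syntax:
select Entries --query 'content:@fast'
select Entries --filter 'content @ "fast"'
Both commands search records whose content column contains the word fast.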
There are three ways to create grn_expr: • Parsing a /reference/grn_expr/query_syntax string. • Parsing a /reference/grn_expr/script_syntax string. • Calling grn_expr related APIs. /reference/grn_expr/query_syntax is for the common search form used on Internet search sites. It's simple and easy to use but it has a limitation. You can not use all condition expressions and set operations in /reference/grn_expr/query_syntax. You can use /reference/grn_expr/query_syntax with the query option in /reference/commands/select. /reference/grn_expr/script_syntax is an ECMAScript-like syntax. You can use all condition expressions and set operations in /reference/grn_expr/script_syntax. You can use /reference/grn_expr/script_syntax with the filter option and the scorer option in /reference/commands/select. You can use groonga as a library and create a grn_expr by calling grn_expr related APIs. You can use the full features by calling APIs, like with /reference/grn_expr/script_syntax. Calling APIs is useful for creating a custom syntax to create grn_expr. They are used in rroonga, the Ruby bindings of Groonga. Rroonga can create a grn_expr by Ruby's syntax instead of parsing a string. Query syntax Query syntax is a syntax to specify a search condition for a common Web search form. It is similar to the syntax of Google's search form. For example, word1 word2 means that groonga searches records that contain both word1 and word2. word1 OR word2 means that groonga searches records that contain either word1 or word2. Query syntax consists of conditional expressions, combined expressions and assignment expressions. Normally, assignment expressions can be ignored, because assignment expressions are disabled in the --query option of /reference/commands/select. You can use them if you use groonga as a library and customize the query syntax parser options. A conditional expression specifies a condition. A combined expression consists of one or more conditional expressions, combined expressions or assignment expressions. An assignment expression assigns a value to a column. Sample data Here are a schema definition and sample data to show usage. Execution example: table_create Entries TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries n_likes COLUMN_SCALAR UInt32 # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Entries [ {"_key": "The first post!", "content": "Welcome! This is my first post!", "n_likes": 5}, {"_key": "Groonga", "content": "I started to use Groonga. It's very fast!", "n_likes": 10}, {"_key": "Mroonga", "content": "I also started to use Mroonga. It's also very fast!
Really fast!", "n_likes": 15}, {"_key": "Good-bye Senna", "content": "I migrated all Senna system!", "n_likes": 3}, {"_key": "Good-bye Tritonn", "content": "I also migrated all Tritonn system!", "n_likes": 3} ] # [[0, 1337566253.89858, 0.000355720520019531], 5] There is a table, Entries, for blog entries. An entry has title, content and the number of likes for the entry. Title is key of Entries. Content is value of Entries.content column. The number of likes is value of Entries.n_likes column. Entries._key column and Entries.content column are indexed using TokenBigram tokenizer. So both Entries._key and Entries.content are fulltext search ready. OK. The schema and data for examples are ready. Escape There are special characters in query syntax. To use a special character as itself, it should be escaped by prepending \. For example, " is a special character. It is escaped as \". Here is a special character list: • [space] (escaped as [backslash][space]) (You should substitute [space] with a white space character that is 0x20 in ASCII and [backslash] with \\.) • " (escaped as \") • ' (escaped as \') • ( (escaped as \() • ) (escaped as \)) • \ (escaped as \\) You can use quote instead of escape special characters except \ (backslash). You need to use backslash for escaping backslash like \\ in quote. Quote syntax is "..." or '...'. You need escape " as \" in "..." quote syntax. You need escape ' as \' in '...' quote syntax. For example, Alice's brother (Bob) can be quoted "Alice's brother (Bob)" or 'Alice\'s brother (Bob)'. NOTE: There is an important point which you have to care. The \ (backslash) character is interpreted by command line shell. So if you want to search ( itself for example, you need to escape twice (\\() in command line shell. The command line shell interprets \\( as \(, then pass such a literal to Groonga. Groonga regards \( as (, then search ( itself from database. If you can't do intended search by Groonga, confirm whether special character is escaped properly. Conditional expression Here is available conditional expression list. Full text search condition Its syntax is keyword. Full text search condition specifies a full text search condition against the default match columns. Match columns are full text search target columns. You should specify the default match columns for full text search. They can be specified by --match_columns option of /reference/commands/select. If you don't specify the default match columns, this conditional expression fails. This conditional expression does full text search with keyword. keyword should not contain any spaces. If keyword contains a space such as search keyword, it means two full text search conditions; search and keyword. If you want to specifies a keyword that contains one or more spaces, you can use phrase search condition that is described below. Here is a simple example. Execution example: select Entries --match_columns content --query fast # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression matches records that contain a word fast in content column value. content column is the default match column. Phrase search condition Its syntax is "search keyword". 
Phrase search condition
Its syntax is "search keyword".

The phrase search condition specifies a phrase search condition against the default match columns. You should specify the default match columns for full text search. They can be specified by the --match_columns option of /reference/commands/select. If you don't specify the default match columns, this conditional expression fails.

This conditional expression does phrase search with search keyword. Phrase search searches records that contain search and keyword where those terms appear adjacent and in this order. Thus, Put a search keyword in the form is matched, but Search by the keyword and There is a keyword. Search by it! aren't matched.

Here is a simple example.

Execution example:
select Entries --match_columns content --query '"I started"'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records that contain the phrase I started in the content column value. I also started isn't matched because I and started aren't adjacent. The content column is the default match column.

Full text search condition (with explicit match column)
Its syntax is column:@keyword.

It's similar to the full text search condition but it doesn't require the default match columns. You specify the match column for the full text search condition by column: instead of the --match_columns option of /reference/commands/select.

This conditional expression is useful when you want to use two or more full text searches against different columns. The default match columns specified by the --match_columns option can't be specified multiple times. You need to specify the second match column with this conditional expression.

The difference between the full text search condition and the full text search condition (with explicit match column) is whether advanced match columns are supported or not. The full text search condition supports advanced match columns but the full text search condition (with explicit match column) doesn't. Advanced match columns have the following features:
• Weight is supported.
• Using multiple columns is supported.
• Using an index column as a match column is supported.

See the description of the --match_columns option of /reference/commands/select about them.

Here is a simple example.

Execution example:
select Entries --query content:@fast
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records that contain the word fast in the content column value.
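Because the explicit form doesn't consume --match_columns, it can be repeated to search different columns in one query. Here is a hedged sketch (not from the original document) that combines full text searches against the indexed _key and content columns of the sample schema; output omitted:

select Entries --query '_key:@Groonga content:@fast'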
Phrase search condition (with explicit match column)
Its syntax is column:@"search keyword".

It's similar to the phrase search condition but it doesn't require the default match columns. You specify the match column for the phrase search condition by column: instead of the --match_columns option of /reference/commands/select.

The difference between the phrase search condition and the phrase search condition (with explicit match column) is similar to the difference between the full text search condition and the full text search condition (with explicit match column): the phrase search condition supports advanced match columns but the phrase search condition (with explicit match column) doesn't. See the description of the full text search condition (with explicit match column) about advanced match columns.

Here is a simple example.

Execution example:
select Entries --query 'content:@"I started"'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records that contain the phrase I started in the content column value. I also started isn't matched because I and started aren't adjacent.

Prefix search condition
Its syntax is column:^value or value*.

This conditional expression does prefix search with value. Prefix search searches records that contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table (TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a patricia trie table or a double array trie table. You don't need to index _key. Prefix search can be used with other table types, but it causes a scan of all records. That's not a problem for a small number of records, but it takes more time for a large number of records.

It doesn't require the default match columns, unlike the full text search condition and the phrase search condition.

Here is a simple example.

Execution example:
select Entries --query '_key:^Goo'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ] # ] # ] # ]

The expression matches records that contain a word that starts with Goo in the _key pseudo column value. Good-bye Senna and Good-bye Tritonn are matched by the expression.
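The value* form isn't demonstrated by the original example. As a hedged sketch (not from the original document), it should express the same prefix search against the default match columns; output omitted:

select Entries --match_columns _key --query 'Goo*'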
Suffix search condition
Its syntax is column:$value.

This conditional expression does suffix search with value. Suffix search searches records that contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can also use fast suffix search against the _key pseudo column of a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend that you use index column based fast suffix search instead of _key based fast suffix search, because _key based fast suffix search returns automatically registered substrings. (TODO: write document about suffix search and link to it from here.)

NOTE: Fast suffix search can be used only for non-ASCII characters such as hiragana in Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types, or with a patricia trie table without the KEY_WITH_SIS flag, but it causes a scan of all records. That's not a problem for a small number of records, but it takes more time for a large number of records.

It doesn't require the default match columns, unlike the full text search condition and the phrase search condition.

Here is a simple example. It uses fast suffix search for hiragana in Japanese, one of the non-ASCII scripts.

Execution example:
table_create Titles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Titles --query 'content:$んが'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "content", # "ShortText" # ] # ], # [ # 2, # "むるんが" # ], # [ # 1, # "ぐるんが" # ] # ] # ] # ]

The expression matches records that have a value that ends with んが in the content column value. ぐるんが and むるんが are matched by the expression.

Equal condition
Its syntax is column:value.

It matches records whose column value is equal to value. It doesn't require the default match columns, unlike the full text search condition and the phrase search condition.

Here is a simple example.

Execution example:
select Entries --query _key:Groonga
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records whose _key column value is equal to Groonga.

Not equal condition
Its syntax is column:!value.

It matches records whose column value isn't equal to value. It doesn't require the default match columns, unlike the full text search condition and the phrase search condition.

Here is a simple example.

Execution example:
select Entries --query _key:!Groonga
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose _key column value is not equal to Groonga.

Less than condition
Its syntax is column:<value.

It matches records whose column value is less than value. If the column type is a numerical type such as Int32, the column value and value are compared as numbers. If the column type is a text type such as ShortText, the column value and value are compared as bit sequences. It doesn't require the default match columns, unlike the full text search condition and the phrase search condition.

Here is a simple example.

Execution example:
select Entries --query n_likes:<10
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]
This is my first post!", # 5 # ] # ] # ] # ] The expression matches records that n_likes column value is less than 10. Greater than condition Its syntax is column:>value. It matches records that column value is greater than value. If column type is numerical type such as Int32, column value and value are compared as number. If column type is text type such as ShortText, column value and value are compared as bit sequence. It doesn't require the default match columns such as full text search condition and phrase search condition. Here is a simple example. Execution example: select Entries --query n_likes:>10 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression matches records that n_likes column value is greater than 10. Less than or equal to condition Its syntax is column:<=value. It matches records that column value is less than or equal to value. If column type is numerical type such as Int32, column value and value are compared as number. If column type is text type such as ShortText, column value and value are compared as bit sequence. It doesn't require the default match columns such as full text search condition and phrase search condition. Here is a simple example. Execution example: select Entries --query n_likes:<=10 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ] The expression matches records that n_likes column value is less than or equal to 10. Greater than or equal to condition Its syntax is column:>=value. It matches records that column value is greater than or equal to value. If column type is numerical type such as Int32, column value and value are compared as number. If column type is text type such as ShortText, column value and value are compared as bit sequence. It doesn't require the default match columns such as full text search condition and phrase search condition. Here is a simple example. Execution example: select Entries --query n_likes:>=10 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression matches records that n_likes column value is greater than or equal to 10. Regular expression condition New in version 5.0.1. Its syntax is column:~pattern. It matches records that column value is matched to pattern. pattern must be valid /reference/regular_expression. The following example uses .roonga as pattern. It matches Groonga, Mroonga and so on. 
Regular expression condition
New in version 5.0.1.

Its syntax is column:~pattern.

It matches records whose column value matches pattern. pattern must be a valid /reference/regular_expression.

The following example uses .roonga as the pattern. It matches Groonga, Mroonga and so on.

Execution example:
select Entries --query content:~.roonga
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

In most cases, a regular expression is evaluated sequentially, so it may be slow against many records. In some cases, Groonga evaluates a regular expression by index, which is very fast. See /reference/regular_expression for details.

Combined expression
Here is the list of available combined expressions.

Logical OR
Its syntax is a OR b.

a and b are conditional expressions, combined expressions or assignment expressions. If at least one of a and b is matched, a OR b is matched.

Here is a simple example.

Execution example:
select Entries --query 'n_likes:>10 OR content:@senna'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ] # ] # ] # ]

The expression matches records whose n_likes column value is greater than 10 or which contain the word senna in the content column value.

Logical AND
Its syntax is a + b or just a b.

a and b are conditional expressions, combined expressions or assignment expressions. If both a and b are matched, a + b is matched. You can put + before the first expression, such as +a. The + is just ignored.

Here is a simple example.

Execution example:
select Entries --query 'n_likes:>=10 + content:@groonga'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records whose n_likes column value is greater than or equal to 10 and which contain the word groonga in the content column value.

Logical NOT
Its syntax is a - b.

a and b are conditional expressions, combined expressions or assignment expressions. If a is matched and b is not matched, a - b is matched. You cannot put - before the first expression, such as -a. That's a syntax error.

Here is a simple example.

Execution example:
select Entries --query 'n_likes:>=10 - content:@groonga'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records whose n_likes column value is greater than or equal to 10 and which don't contain the word groonga in the content column value.

Grouping
Its syntax is (...). ... is a space separated expression list. (...) groups one or more expressions so that they can be processed as a single expression. a b OR c means that a and b are matched or c is matched. a (b OR c) means that a and one of b and c are matched.

Here is a simple example.
Execution example:
select Entries --query 'n_likes:<5 content:@senna OR content:@fast'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]
select Entries --query 'n_likes:<5 (content:@senna OR content:@fast)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ] # ] # ] # ]

The first expression doesn't use grouping. It matches records where n_likes:<5 and content:@senna are matched, or content:@fast is matched. The second expression uses grouping. It matches records where n_likes:<5 and one of content:@senna or content:@fast are matched.

Assignment expression
This section is for advanced users, because assignment expressions are disabled in the --query option of /reference/commands/select by default. You need to specify ALLOW_COLUMN|ALLOW_UPDATE as the --query_flags option value to enable assignment expressions.

Assignment expressions in query syntax have some limitations, so you should use /reference/grn_expr/script_syntax instead of query syntax for assignment.

There is only one syntax for assignment expressions: column:=value. value is assigned to column. value is always processed as a string in query syntax and is cast to the type of column automatically. This causes some limitations. For example, you cannot use a boolean literal such as true or false for a Bool type column. You need to use an empty string for false, but query syntax doesn't support the column:="" syntax. See /reference/cast about casting.
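Putting the pieces together, an assignment in query syntax would presumably look like the following hedged sketch (not from the original document). It selects the Groonga record and assigns 0 to its n_likes column; the quoting keeps the shell from interpreting the | character; output omitted:

select Entries --query '_key:Groonga n_likes:=0' --query_flags 'ALLOW_COLUMN|ALLOW_UPDATE'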
Script syntax
Script syntax is a syntax to specify complex search conditions. It is similar to ECMAScript. For example, _key == "book" means that Groonga searches records whose _key value is "book".

All values are strings in query syntax, but each value has its own type in script syntax. For example, "book" is a string, 1 is an integer, TokenBigram is the object whose name is TokenBigram, and so on.

Script syntax doesn't support the full ECMAScript syntax. For example, script syntax doesn't support statements, such as the if control statement, the for iteration statement and variable definition statements. Function definition is not supported either. But script syntax adds its own additional operators. They are described after the ECMAScript syntax is described.

Security
For security reasons, you should not pass input from users to Groonga directly. If there is an evil user, the user may input a query that retrieves records that should not be shown to the user.

Think about the following case. A Groonga application constructs a Groonga request by the following program:

filter = "column @ \"#{user_input}\""
select_options = {
  # ...
  :filter => filter,
}
groonga_client.select(select_options)

user_input is an input from the user. If the input is query, here is the constructed select-filter parameter:

column @ "query"

If the input is x" || true || ", here is the constructed select-filter parameter:

column @ "x" || true || ""

This query matches all records. The user will get all records from your database. The user may be evil.

It's better that you just receive user input as a value. That means you don't accept user input that can contain operators such as @ and &&. If you accept operators, a user can create an evil query. If user input can contain only a value, you can block evil queries by escaping the user input value.

Here is a list of how to escape user input values:
• True value: Convert it to true.
• False value: Convert it to false.
• Numerical value: Convert it to Integer or Float. For example, 1.2, -10, 314e-2 and so on.
• String value: Replace " with \" and \ with \\ in the string value, and surround the substituted string value with ". For example, double " quote and back \ slash should be converted to "double \" quote and back \\ slash".

Sample data
Here are a schema definition and sample data to show usage.

Execution example:
table_create Entries TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!", "content": "Welcome! This is my first post!", "n_likes": 5},
{"_key": "Groonga", "content": "I started to use Groonga. It's very fast!", "n_likes": 10},
{"_key": "Mroonga", "content": "I also started to use Mroonga. It's also very fast! Really fast!", "n_likes": 15},
{"_key": "Good-bye Senna", "content": "I migrated all Senna system!", "n_likes": 3},
{"_key": "Good-bye Tritonn", "content": "I also migrated all Tritonn system!", "n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content and the number of likes for the entry. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column.

The Entries._key column and the Entries.content column are indexed using the TokenBigram tokenizer. So both Entries._key and Entries.content are ready for full text search.

OK. The schema and data for the examples are ready.

Literals

Integer
An integer literal is a sequence of 0 to 9, such as 1234567890. + or - can be prepended as a sign, such as +29 and -29. Integer literals must be decimal. Octal notation, hexadecimal notation and so on can't be used. The maximum value of an integer literal is 9223372036854775807 (= 2 ** 63 - 1). The minimum value of an integer literal is -9223372036854775808 (= -(2 ** 63)).

Float
A float literal is a sequence of 0 to 9, . and 0 to 9, such as 3.14. + or - can be prepended as a sign, such as +3.14 and -3.14. The ${RADIX}e${EXPONENT} and ${RADIX}E${EXPONENT} formats are also supported. For example, 314e-2 is the same as 3.14.

String
A string literal is "...". You need to escape " in a literal by prepending \, such as \". For example, "Say \"Hello!\"." is a literal for the Say "Hello!". string.

The string encoding must be the same as the encoding of the database. The default encoding is UTF-8. It can be changed by the --with-default-encoding configure option, the --encoding option of /reference/executables/groonga and so on.
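As a quick hedged sketch (not from the original document) tying the literal types together, the following filter uses a string literal and an integer literal with the comparison and logical operators described later in this section; output omitted:

select Entries --filter '_key == "Groonga" && n_likes == 10'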
Boolean
The boolean literals are true and false. true means true and false means false.

Null
The null literal is null. Groonga doesn't support null values, but the null literal is supported.

Time
NOTE: This is a Groonga original notation.

A time literal doesn't exist. There are the string time notation, the integer time notation and the float time notation.

The string time notation is "YYYY/MM/DD hh:mm:ss.uuuuuu" or "YYYY-MM-DD hh:mm:ss.uuuuuu". YYYY is the year, MM is the month, DD is the day, hh is the hour, mm is the minute, ss is the second and uuuuuu is the microsecond. It is local time. For example, "2012/07/23 02:41:10.436218" is 2012-07-23T02:41:10.436218 in ISO 8601 format.

The integer time notation is the number of seconds that have elapsed since midnight UTC, January 1, 1970. It is also known as POSIX time. For example, 1343011270 is 2012-07-23T02:41:10Z in ISO 8601 format.

The float time notation is the number of seconds and microseconds that have elapsed since midnight UTC, January 1, 1970. For example, 1343011270.436218 is 2012-07-23T02:41:10.436218Z in ISO 8601 format.

Geo point
NOTE: This is a Groonga original notation.

A geo point literal doesn't exist. There is the string geo point notation. The string geo point notation has the following patterns:
• "LATITUDE_IN_MSECxLONGITUDE_IN_MSEC"
• "LATITUDE_IN_MSEC,LONGITUDE_IN_MSEC"
• "LATITUDE_IN_DEGREExLONGITUDE_IN_DEGREE"
• "LATITUDE_IN_DEGREE,LONGITUDE_IN_DEGREE"

x and , can be used as the separator. Latitude and longitude can be represented in milliseconds or degrees.

Array
An array literal is [element1, element2, ...].

Object literal
An object literal is {name1: value1, name2: value2, ...}. Groonga doesn't support object literals yet.

Control syntaxes
Script syntax doesn't support statements, so you cannot use control statements such as if. You can only use the A ? B : C expression as a control syntax. A ? B : C returns B if A is true, C otherwise.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == (_id == 1 ? 5 : 3)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records where the _id column value is equal to 1 and the n_likes column value is equal to 5, or the _id column value is not equal to 1 and the n_likes column value is equal to 3.
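The ternary expression can also be useful in the --scorer option described later in this section. A hedged sketch (not from the original document) that doubles the score of well-liked matched entries; output omitted:

select Entries --filter 'content @ "fast"' --scorer '_score = n_likes >= 10 ? _score * 2 : _score'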
Really fast!", # 15 # ] # ] # ] # ] select Entries --filter 'n_likes < 5 && (content @ "senna" || content @ "fast")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ] # ] # ] # ] The first expression doesn't use grouping. It matches records that n_likes < 5 and content @ "senna" are matched or content @ "fast" is matched. The second expression uses grouping. It matches records that n_likes < 5 and one of content @ "senna" or content @ "fast" are matched. Function call Its syntax is name(arugment1, argument2, ...). name(argument1, argument2, ...) calls a function that is named name with arguments argument1, argument2 and .... See /reference/function for available functin list. Here is a simple example. Execution example: select Entries --filter 'edit_distance(_key, "Groonga") <= 1' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression uses /reference/functions/edit_distance. It matches records that _key column value is similar to "Groonga". Similality of "Groonga" is computed as edit distance. If edit distance is less than or equal to 1, the value is treated as similar. In this case, "Groonga" and "Mroonga" are treated as similar. Basic operators Groonga supports operators defined in ECMAScript. Arithmetic operators Here are arithmetic operators. Addition operator Its syntax is number1 + number2. The operator adds number1 and number2 and returns the result. Here is a simple example. Execution example: select Entries --filter 'n_likes == 10 + 5' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression matches records that n_likes column value is equal to 15 (= 10 + 5). Subtraction operator Its syntax is number1 - number2. The operator subtracts number2 from number1 and returns the result. Here is a simple example. Execution example: select Entries --filter 'n_likes == 20 - 5' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ] The expression matches records that n_likes column value is equal to 15 (= 20 - 5). Multiplication operator Its syntax is number1 * number2. The operator multiplies number1 and number2 and returns the result. Here is a simple example. 
Execution example:
select Entries --filter 'n_likes == 3 * 5'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 15 (= 3 * 5).

Division operator
Its syntax is number1 / number2 or number1 % number2. The operator divides number1 by number2. / returns the quotient of the result. % returns the remainder of the result.

Here are simple examples.

Execution example:
select Entries --filter 'n_likes == 26 / 7'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 3 (= 26 / 7).

Execution example:
select Entries --filter 'n_likes == 26 % 7'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 (= 26 % 7).

Logical operators
Here are the logical operators.

Logical NOT operator
Its syntax is !condition. The operator inverts the boolean value of condition.

Here is a simple example.

Execution example:
select Entries --filter '!(n_likes == 5)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records whose n_likes column value is not equal to 5.

Logical AND operator
Its syntax is condition1 && condition2. The operator returns true if both condition1 and condition2 are true, false otherwise.

Here is a simple example.

Execution example:
select Entries --filter 'content @ "fast" && n_likes >= 10'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records whose content column value has the word fast and whose n_likes column value is greater than or equal to 10.

Logical OR operator
Its syntax is condition1 || condition2. The operator returns true if at least one of condition1 and condition2 is true, false otherwise.

Here is a simple example.
Execution example:
select Entries --filter 'n_likes == 5 || n_likes == 10'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 or 10.

Logical AND NOT operator
Its syntax is condition1 &! condition2. The operator returns true if condition1 is true but condition2 is false, false otherwise. It returns the difference set.

Here is a simple example.

Execution example:
select Entries --filter 'content @ "fast" &! content @ "mroonga"'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records whose content column value has the word fast but doesn't have the word mroonga.

Bitwise operators
Here are the bitwise operators.

Bitwise NOT operator
Its syntax is ~number. The operator returns the bitwise NOT of number.

Here is a simple example.

Execution example:
select Entries --filter '~n_likes == -6'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5, because the bitwise NOT of 5 is equal to -6.

Bitwise AND operator
Its syntax is number1 & number2. The operator returns the bitwise AND between number1 and number2.

Here is a simple example.

Execution example:
select Entries --filter '(n_likes & 1) == 1'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is an odd number, because the bitwise AND between an odd number and 1 is equal to 1, while the bitwise AND between an even number and 1 is equal to 0.

Bitwise OR operator
Its syntax is number1 | number2. The operator returns the bitwise OR between number1 and number2.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == (1 | 4)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 (= 1 | 4).

Bitwise XOR operator
Its syntax is number1 ^ number2. The operator returns the bitwise XOR between number1 and number2.
Here is a simple example.

Execution example:
select Entries --filter 'n_likes == (10 ^ 15)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 (= 10 ^ 15).

Shift operators
Here are the shift operators.

Left shift operator
Its syntax is number1 << number2. The operator performs a bitwise left shift operation on number1 by number2.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == (5 << 1)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 10 (= 5 << 1).

Signed right shift operator
Its syntax is number1 >> number2. The operator shifts the bits of number1 to the right by number2. The sign of the result is the same as that of number1.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == -(-10 >> 1)'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 (= -(-10 >> 1) = -(-5)).

Unsigned right shift operator
Its syntax is number1 >>> number2. The operator shifts the bits of number1 to the right by number2. The leftmost number2 bits are filled with 0.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == (2147483648 - (-10 >>> 1))'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5 (= 2147483648 - (-10 >>> 1) = 2147483648 - 2147483643).

Comparison operators
Here are the comparison operators.

Equal operator
Its syntax is object1 == object2. The operator returns true if object1 equals object2, false otherwise.

Here is a simple example.

Execution example:
select Entries --filter 'n_likes == 5'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 1, # "The first post!", # "Welcome! This is my first post!", # 5 # ] # ] # ] # ]

The expression matches records whose n_likes column value is equal to 5.

Not equal operator
Its syntax is object1 != object2. The operator returns true if object1 does not equal object2, false otherwise.

Here is a simple example.
Execution example:
select Entries --filter 'n_likes != 5'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 4, # "Good-bye Senna", # "I migrated all Senna system!", # 3 # ], # [ # 5, # "Good-bye Tritonn", # "I also migrated all Tritonn system!", # 3 # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]

The expression matches records whose n_likes column value is not equal to 5.

Less than operator
TODO: ...

Less than or equal to operator
TODO: ...

Greater than operator
TODO: ...

Greater than or equal to operator
TODO: ...

Assignment operators

Addition assignment operator
Its syntax is column1 += column2. The operator performs an addition assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score += n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 4 # ], # [ # "Good-bye Tritonn", # 3, # 4 # ], # [ # "Groonga", # 10, # 11 # ], # [ # "Mroonga", # 15, # 16 # ], # [ # "The first post!", # 5, # 6 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the addition assignment operation '_score = _score + n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 + 3 is evaluated and stored into the _score column as the execution result.
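In practice, the addition assignment is typically combined with a real search condition instead of --filter true. Here is a hedged sketch (not from the original document) that boosts each matched record's score by its number of likes and sorts by the adjusted score; output omitted:

select Entries --filter 'content @ "fast"' --scorer '_score += n_likes' --output_columns _key,n_likes,_score --sortby -_score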
Subtraction assignment operator
Its syntax is column1 -= column2. The operator performs a subtraction assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score -= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # -2 # ], # [ # "Good-bye Tritonn", # 3, # -2 # ], # [ # "Groonga", # 10, # -9 # ], # [ # "Mroonga", # 15, # -14 # ], # [ # "The first post!", # 5, # -4 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the subtraction assignment operation '_score = _score - n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 - 3 is evaluated and stored into the _score column as the execution result.

Multiplication assignment operator
Its syntax is column1 *= column2. The operator performs a multiplication assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score *= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 3 # ], # [ # "Good-bye Tritonn", # 3, # 3 # ], # [ # "Groonga", # 10, # 10 # ], # [ # "Mroonga", # 15, # 15 # ], # [ # "The first post!", # 5, # 5 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the multiplication assignment operation '_score = _score * n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 * 3 is evaluated and stored into the _score column as the execution result.

Division assignment operator
Its syntax is column1 /= column2. The operator performs a division assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score /= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 0 # ], # [ # "Good-bye Tritonn", # 3, # 0 # ], # [ # "Groonga", # 10, # 0 # ], # [ # "Mroonga", # 15, # 0 # ], # [ # "The first post!", # 5, # 0 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the division assignment operation '_score = _score / n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 / 3 is evaluated and stored into the _score column as the execution result.

Modulo assignment operator
Its syntax is column1 %= column2. The operator performs a modulo assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score %= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 1 # ], # [ # "Good-bye Tritonn", # 3, # 1 # ], # [ # "Groonga", # 10, # 1 # ], # [ # "Mroonga", # 15, # 1 # ], # [ # "The first post!", # 5, # 1 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the modulo assignment operation '_score = _score % n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 % 3 is evaluated and stored into the _score column as the execution result.

Bitwise left shift assignment operator
Its syntax is column1 <<= column2. The operator performs a left shift assignment operation on column1 by column2.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score <<= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 8 # ], # [ # "Good-bye Tritonn", # 3, # 8 # ], # [ # "Groonga", # 10, # 1024 # ], # [ # "Mroonga", # 15, # 32768 # ], # [ # "The first post!", # 5, # 32 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the left shift assignment operation '_score = _score << n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 << 3 is evaluated and stored into the _score column as the execution result.

Bitwise signed right shift assignment operator
Its syntax is column1 >>= column2. The operator performs a signed right shift assignment operation on column1 by column2.

Bitwise unsigned right shift assignment operator
Its syntax is column1 >>>= column2. The operator performs an unsigned right shift assignment operation on column1 by column2.

Bitwise AND assignment operator
Its syntax is column1 &= column2. The operator performs a bitwise AND assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score &= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 1 # ], # [ # "Good-bye Tritonn", # 3, # 1 # ], # [ # "Groonga", # 10, # 0 # ], # [ # "Mroonga", # 15, # 1 # ], # [ # "The first post!", # 5, # 1 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the bitwise AND assignment operation '_score = _score & n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Groonga" is 10, so the expression 1 & 10 is evaluated and stored into the _score column as the execution result.

Bitwise OR assignment operator
Its syntax is column1 |= column2. The operator performs a bitwise OR assignment operation on column1 by column2.

Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score |= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 3 # ], # [ # "Good-bye Tritonn", # 3, # 3 # ], # [ # "Groonga", # 10, # 11 # ], # [ # "Mroonga", # 15, # 15 # ], # [ # "The first post!", # 5, # 5 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the bitwise OR assignment operation '_score = _score | n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Groonga" is 10, so the expression 1 | 10 is evaluated and stored into the _score column as the execution result.

Bitwise XOR assignment operator
Its syntax is column1 ^= column2. The operator performs a bitwise XOR assignment operation on column1 by column2.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score ^= n_likes'
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "n_likes", # "UInt32" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Good-bye Senna", # 3, # 2 # ], # [ # "Good-bye Tritonn", # 3, # 2 # ], # [ # "Groonga", # 10, # 11 # ], # [ # "Mroonga", # 15, # 14 # ], # [ # "The first post!", # 5, # 4 # ] # ] # ] # ]

The value of _score given by --filter is always 1 in this case. Then the bitwise XOR assignment operation '_score = _score ^ n_likes' is performed for each record. For example, the value of n_likes of the record whose _key is "Good-bye Senna" is 3, so the expression 1 ^ 3 is evaluated and stored into the _score column as the execution result.

Original operators
Script syntax adds its own binary operators to the ECMAScript syntax. They perform search specific operations. They start with @ or *.

Match operator
Its syntax is column @ value.

The operator searches for value using the inverted index of column. Normally full text search is performed, but tag search can also be performed, because tag search is also implemented with an inverted index. Query syntax uses this operator by default.

Here is a simple example.

Execution example:
select Entries --filter 'content @ "fast"' --output_columns content
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "content", # "Text" # ] # ], # [ # "I started to use Groonga. It's very fast!" # ], # [ # "I also started to use Mroonga. It's also very fast! Really fast!" # ] # ] # ] # ]

The expression matches records that contain the word fast in the content column value.

Prefix search operator
Its syntax is column @^ value.

The operator does prefix search with value. Prefix search searches records that contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table (TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a patricia trie table or a double array trie table. You don't need to index _key. Prefix search can be used with other table types, but it causes a scan of all records. That's not a problem for a small number of records, but it takes more time for a large number of records.

Here is a simple example.

Execution example:
select Entries --filter '_key @^ "Goo"' --output_columns _key
# [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "Good-bye Tritonn" # ], # [ # "Good-bye Senna" # ] # ] # ] # ]

The expression matches records that contain a word that starts with Goo in the _key pseudo column value. Good-bye Senna and Good-bye Tritonn are matched by the expression.

Suffix search operator
Its syntax is column @$ value.

This operator does suffix search with value. Suffix search searches records that contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can also use fast suffix search against the _key pseudo column of a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend that you use index column based fast suffix search instead of _key based fast suffix search, because _key based fast suffix search returns automatically registered substrings.
(TODO: write document about suffix search and link to it from here.)
NOTE: Fast suffix search can be used only for non-ASCII characters such as hiragana in Japanese. You cannot use fast suffix search for ASCII characters.
Suffix search can be used with other table types, or with a patricia trie table without the KEY_WITH_SIS flag, but it causes a full scan of all records. That is not a problem for a small number of records, but it takes more time for a large number of records. Here is a simple example. It uses fast suffix search for hiragana in Japanese, which consists of non-ASCII characters.
Execution example: table_create Titles TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Titles content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create SuffixSearchTerms index COLUMN_INDEX Titles content # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Titles [ {"content": "ぐるんが"}, {"content": "むるんが"}, {"content": "せな"}, {"content": "とりとん"} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] select Titles --query 'content:$んが' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "content", # "ShortText" # ] # ], # [ # 2, # "むるんが" # ], # [ # 1, # "ぐるんが" # ] # ] # ] # ]
The expression matches records whose content column value ends with んが. ぐるんが and むるんが are matched by the expression.
Near search operator
Its syntax is column *N "word1 word2 ...". The operator does near search with the words word1 word2 .... Near search searches for records that contain the specified words, where the words appear within the near distance. The near distance is always 10 for now. The unit of near distance is the number of characters in N-gram family tokenizers and the number of words in morphological analysis family tokenizers. (TODO: Add a description about the fact that TokenBigram doesn't split an ASCII only word into tokens. So the unit for ASCII words with TokenBigram is the number of words even though TokenBigram is an N-gram family tokenizer.) Note that an index column for full text search must be defined for column. Here is a simple example.
Execution example: select Entries --filter 'content *N "I fast"' --output_columns content # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "content", # "Text" # ] # ], # [ # "I started to use Groonga. It's very fast!" # ] # ] # ] # ] select Entries --filter 'content *N "I Really"' --output_columns content # [[0, 1337566253.89858, 0.000355720520019531], [[[0], [["content", "Text"]]]]] select Entries --filter 'content *N "also Really"' --output_columns content # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "content", # "Text" # ] # ], # [ # "I also started to use Mroonga. It's also very fast! Really fast!" # ] # ] # ] # ]
The first expression matches records that contain I and fast where the near distance of those words is within 10 words. So the record whose content is I started to use Groonga. It's very fast! is matched.
The second expression matches records that contain I and Really where the near distance of those words is within 10 words. The record whose content is I also started to use Mroonga. It's also very fast! Really fast! is not matched, because the number of words between I and Really is 11.
The third expression matches records that contain also and Really where the near distance of those words is within 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is matched. The number of words between also and Really is 10.
Similar search
Its syntax is column *S "document". The operator performs similar search with the document document. Similar search searches for records that have content similar to document. Note that an index column for full text search must be defined for column. Here is a simple example.
Execution example: select Entries --filter 'content *S "I migrated all Solr system!"' --output_columns content # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "content", # "Text" # ] # ], # [ # "I migrated all Senna system!" # ], # [ # "I also migrated all Tritonn system!" # ] # ] # ] # ]
The expression matches records that have content similar to I migrated all Solr system!. In this case, records that have I migrated all XXX system! content are matched.
Term extract operator
Its syntax is _key *T "document". The operator extracts terms from document. Terms must be registered as keys of the table that _key belongs to. Note that the table must be a patricia trie (TABLE_PAT_KEY) or a double array trie (TABLE_DAT_KEY). You can't use a hash table (TABLE_HASH_KEY) or an array (TABLE_NO_KEY) because they don't support longest common prefix search. Longest common prefix search is used to implement the operator. Here is a simple example.
Execution example: table_create Words TABLE_PAT_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Words [ {"_key": "groonga"}, {"_key": "mroonga"}, {"_key": "Senna"}, {"_key": "Tritonn"} ] # [[0, 1337566253.89858, 0.000355720520019531], 4] select Words --filter '_key *T "Groonga is the successor project to Senna."' --output_columns _key # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ] # ], # [ # "groonga" # ], # [ # "senna" # ] # ] # ] # ]
The expression extracts terms that are included in the document Groonga is the successor project to Senna.. In this case, the NormalizerAuto normalizer is specified for Words. So Groonga can be extracted even though it is loaded as groonga into Words. All extracted terms are also normalized.
Regular expression operator
New in version 5.0.1. Its syntax is column @~ "pattern". The operator searches records by the regular expression pattern. If a record's column value matches pattern, the record is matched. pattern must use valid regular expression syntax. See /reference/regular_expression about regular expression syntax details. The following example uses .roonga as pattern. It matches Groonga, Mroonga and so on.
Execution example: select Entries --filter 'content @~ ".roonga"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "content", # "Text" # ], # [ # "n_likes", # "UInt32" # ] # ], # [ # 2, # "Groonga", # "I started to use Groonga. It's very fast!", # 10 # ], # [ # 3, # "Mroonga", # "I also started to use Mroonga. It's also very fast! Really fast!", # 15 # ] # ] # ] # ]
In most cases, a regular expression is evaluated sequentially, so it may be slow against many records. In some cases, Groonga can evaluate a regular expression by index; that is very fast. See /reference/regular_expression for details.
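These script syntax operators can be combined with ECMAScript logical operators such as && and || in a single --filter expression. Here is a hedged sketch against the sample Entries table above (results omitted; this is not an actual execution example from the original document):
select Entries --output_columns _key,n_likes --filter 'n_likes >= 10 && content @ "fast"'
Under the sample data, this should match only the records whose n_likes is at least 10 and whose content contains the word fast, that is, Groonga and Mroonga.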
See also • /reference/api/grn_expr: grn_expr related APIs
Regular expression
Summary
NOTE: Regular expression support is an experimental feature.
New in version 5.0.1. Groonga supports pattern match by regular expression. A regular expression is a widely used format to describe a pattern. Regular expressions are useful to represent complex patterns. In most cases, pattern match by regular expression is evaluated as sequential search. It will be slow for many records and long texts. In some cases, pattern match by regular expression can be evaluated by index. That is much faster than sequential search. Patterns that can be evaluated by index are described later.
New in version 5.0.7: Groonga normalizes the match target text with the normalizer-auto normalizer when Groonga doesn't use an index for regular expression search. It means that a regular expression that contains upper case characters, such as Groonga, never matches, because the normalizer-auto normalizer normalizes all alphabetic characters to lower case. groonga matches both Groonga and groonga. Why is the match target text normalized? It is for increasing the number of patterns that are searchable by index. If Groonga didn't normalize the match target text, you would need to write a complex regular expression such as [Dd][Ii][Ss][Kk] or (?i)disk for case-insensitive match. Groonga can't use an index against such complex regular expressions. If you write the disk regular expression for case-insensitive match, Groonga can search the pattern with an index. That is fast. You may feel the behavior is strange, but fast search based on this behavior will help you.
There are many regular expression syntaxes. Groonga uses the same syntax as Ruby, because Groonga uses the same regular expression engine as Ruby. The regular expression engine is Onigmo. A characteristic difference from other regular expression syntaxes is the meaning of ^ and $. In the regular expression syntax in Ruby, ^ means the beginning of a line and $ means the end of a line. In most other regular expression syntaxes, ^ means the beginning of the text and $ means the end of the text. The regular expression syntax in Ruby uses \A for the beginning of text and \z for the end of text.
New in version 5.0.6: Groonga uses multiline mode since 5.0.6. It means that . matches \n. But this is meaningless in practice, because \n is removed by the normalizer-auto normalizer.
You can use regular expressions in the select-query and select-filter options of the /reference/commands/select command.
Usage
Here are a schema definition and sample data to show usage. There is only one table, Logs. The Logs table has only a message column. Log messages are stored into the message column.
Execution example: table_create Logs TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Logs message COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Logs [ {"message": "host1:[error]: No memory"}, {"message": "host1:[warning]: Remained disk space is less than 30%"}, {"message": "host1:[error]: Disk full"}, {"message": "host2:[error]: No memory"}, {"message": "host2:[info]: Shutdown"} ] # [[0, 1337566253.89858, 0.000355720520019531], 5]
Here is an example that uses a regular expression in select-query. You need to use the ${COLUMN}:~${REGULAR_EXPRESSION} syntax.
Execution example: select Logs --query 'message:~"disk (space|full)"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 3, # "host1:[error]: Disk full" # ] # ] # ] # ]
Here is an example that uses a regular expression in select-filter. You need to use the ${COLUMN} @~ ${REGULAR_EXPRESSION} syntax.
Execution example: select Logs --filter 'message @~ "disk (space|full)"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 3, # "host1:[error]: Disk full" # ] # ] # ] # ]
Index
Groonga can search records by regular expression with an index. That is much faster than sequential search. But it doesn't support all regular expression patterns. It supports only the following regular expression patterns. The supported patterns will be increased in the future.
• A literal only pattern such as disk
• The beginning of text followed by a literal only pattern, such as \Adisk
• A literal only pattern followed by the end of text, such as disk\z
You need to create an index for fast regular expression search. Here are the requirements of the index:
• The lexicon must be a table-pat-key table.
• The lexicon must use the token-regexp tokenizer.
• The index column must have the WITH_POSITION flag.
Other configurations, such as the lexicon's normalizer, are optional. You can choose what you like. If you want to use case-insensitive search, use the normalizer-auto normalizer. Here are the recommended index definitions. In general, they are reasonable index definitions.
Execution example: table_create RegexpLexicon TABLE_PAT_KEY ShortText \ --default_tokenizer TokenRegexp \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create RegexpLexicon logs_message_index \ COLUMN_INDEX|WITH_POSITION Logs message # [[0, 1337566253.89858, 0.000355720520019531], true]
Now you can use the index for regular expression search. The following regular expression can be evaluated by index because it uses only "the beginning of text" and "literal".
Execution example: select Logs --query message:~\\\\Ahost1 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 1, # "host1:[error]: No memory" # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 3, # "host1:[error]: Disk full" # ] # ] # ] # ]
Here is an example that uses select-filter instead of select-query. It uses a similar regular expression to the previous example.
Execution example: select Logs --filter 'message @~ "\\\\Ahost1:"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 1, # "host1:[error]: No memory" # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 3, # "host1:[error]: Disk full" # ] # ] # ] # ]
\ escapes may confuse you, because there are several layers between you and Groonga that each require escaping.
Here are the steps that require \ escape:
• Shell (only when you pass a Groonga command from the command line), for example: % groonga /tmp/db select Logs --filter '"message @~ \"\\\\Ahost1:\""'
--filter '"message @~ \"\\\\Ahost1:\""' is evaluated as the following two arguments by the shell:
• --filter
• "message @~ \"\\\\Ahost1:\""
• Groonga command parser (only when you pass a Groonga command in command line style (COMMAND ARG1_VALUE ARG2_VALUE ...), not in HTTP path style (/d/COMMAND?ARG1_NAME=ARG1_VALUE&ARG2_NAME=ARG2_VALUE)). "message @~ \"\\\\Ahost1:\"" is evaluated as the following value by the Groonga command parser:
• message @~ "\\Ahost1:"
• /reference/grn_expr parser. \ escape is required in both /reference/grn_expr/query_syntax and /reference/grn_expr/script_syntax. The "\\Ahost1:" string literal in script syntax is evaluated as the following value:
• \Ahost1:
The value is evaluated as a regular expression.
Syntax
This section describes only commonly used syntaxes. See the Onigmo syntax documentation for other syntaxes and details.
Escape
In regular expressions, there are the following special characters: • \ • | • ( • ) • [ • ] • . • * • + • ? • { • } • ^ • $
If you want to write a pattern that matches these special characters as is, you need to escape them. You can escape them by putting \ before the special character. Here are regular expressions that match the special characters themselves: • \\ • \| • \( • \) • \[ • \] • \. • \* • \+ • \? • \{ • \} • \^ • \$
Execution example: select Logs --filter 'message @~ "warning|info"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 5, # "host2:[info]: Shutdown" # ] # ] # ] # ]
Note that the example above matches warning or info: the unescaped | works as choice syntax, not as a literal |. If your regular expression doesn't work as you expected, confirm whether some special characters are used without escaping.
Choice
Choice syntax is A|B. The regular expression matches when either pattern A or pattern B matches.
Execution example: select Logs --filter 'message @~ "warning|info"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 5, # "host2:[info]: Shutdown" # ] # ] # ] # ]
CAUTION: A regular expression that uses this syntax can't be evaluated by index.
Group
Group syntax is (...). A group provides the following features:
• Back reference
• Scope reducing
You can refer to matched groups with the \n syntax (n is the group number). For example, e(r)\1o\1 matches error, because \1 is replaced with the match result of the first group (r).
Execution example: select Logs --filter 'message @~ "e(r)\\\\1o\\\\1"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 1, # "host1:[error]: No memory" # ], # [ # 3, # "host1:[error]: Disk full" # ], # [ # 4, # "host2:[error]: No memory" # ] # ] # ] # ]
You can also use more powerful back reference features. See the "8. Back reference" section in the Onigmo documentation for details.
Group syntax also reduces scope. For example, in \[(warning|info)\], the group reduces the scope of the choice syntax. The regular expression matches [warning] and [info].
Execution example: select Logs --filter 'message @~ "\\\\[(warning|info)\\\\]"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 5, # "host2:[info]: Shutdown" # ] # ] # ] # ] You can also use more powerful group related features. See "7. Extended groups" section in Onigmo documentation for details. CAUTION: Regular expression that uses this syntax can't be evaluated by index. Character class Character class syntax is [...]. Character class is useful to specify multiple characters to be matched. For example, [12] matches 1 or 2. Execution example: select Logs --filter 'message @~ "host[12]"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 1, # "host1:[error]: No memory" # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ], # [ # 3, # "host1:[error]: Disk full" # ], # [ # 4, # "host2:[error]: No memory" # ], # [ # 5, # "host2:[info]: Shutdown" # ] # ] # ] # ] You can specify characters by range. For example, [0-9] matches one digit. Execution example: select Logs --filter 'message @~ "[0-9][0-9]%"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ] # ] # ] # ] You can also use more powerful character class related features. See "6. Character class" section in Onigmo documentation for details. CAUTION: Regular expression that uses this syntax can't be evaluated by index. Anchor There are the following commonly used anchor syntaxes. Some anchors can be evaluated by index. ┌───────┬───────────────────────┬─────────────┐ │Anchor │ Description │ Index ready │ ├───────┼───────────────────────┼─────────────┤ │^ │ The beginning of line │ o │ ├───────┼───────────────────────┼─────────────┤ │$ │ The end of line │ x │ ├───────┼───────────────────────┼─────────────┤ │\A │ The beginning of text │ o │ ├───────┼───────────────────────┼─────────────┤ │\z │ The end of text │ x │ └───────┴───────────────────────┴─────────────┘ Here is an example that uses \z. Execution example: select Logs --filter 'message @~ "%\\\\z"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 2, # "host1:[warning]: Remained disk space is less than 30%" # ] # ] # ] # ] You can also use more anchors. See "5. Anchors" section in Onigmo documentation for details. CAUTION: Regular expression that uses this syntax except \A and \z can't be evaluated by index. Quantifier There are the following commonly used quantifier syntaxes. ┌───────────┬─────────────────┐ │Quantifier │ Description │ ├───────────┼─────────────────┤ │? │ 0 or 1 time │ ├───────────┼─────────────────┤ │* │ 0 or more times │ ├───────────┼─────────────────┤ │+ │ 1 or more times │ └───────────┴─────────────────┘ For example, er+or matches error, errror and so on. 
Execution example: select Logs --filter 'message @~ "er+or"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "message", # "Text" # ] # ], # [ # 1, # "host1:[error]: No memory" # ], # [ # 3, # "host1:[error]: Disk full" # ], # [ # 4, # "host2:[error]: No memory" # ] # ] # ] # ] You can also use more quantifiers. See "4. Quantifier" section in Onigmo documentation for details. CAUTION: Regular expression that uses this syntax can't be evaluated by index. Others There are more syntaxes. If you're interested in them, see Onigmo documentation for details. You may be interested in "character type" and "character" syntaxes. Function Function can be used in some commands. For example, you can use function in --filter, --scorer and output_columns options of commands/select. This section describes about function and built-in functions. TODO: Add documentations about function. between Summary between is used for checking the specified value exists in the specific range. It is often used for combination with select-filter option in /reference/commands/select. Syntax between has five parameters: between(column_or_value, min, min_border, max, max_border) Usage Here are a schema definition and sample data to show usage: Execution example: table_create Users TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users age COLUMN_SCALAR Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Ages TABLE_HASH_KEY Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Ages user_age COLUMN_INDEX Users age # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Users [ {"_key": "Alice", "age": 12}, {"_key": "Bob", "age": 13}, {"_key": "Calros", "age": 15}, {"_key": "Dave", "age": 16}, {"_key": "Eric", "age": 20} {"_key": "Frank", "age": 21} ] # [[0, 1337566253.89858, 0.000355720520019531], 6] Here is the query to show the persons to match PG-13 rating (MPAA). Execution example: select Users --filter 'between(age, 13, "include", 16, "include")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 3 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "age", # "Int32" # ] # ], # [ # 2, # "Bob", # 13 # ], # [ # 3, # "Calros", # 15 # ], # [ # 4, # "Dave", # 16 # ] # ] # ] # ] It returns 13, 14, 15 and 16 years old users. between function accepts not only a column of table, but also the value. If you specify the value as 1st parameter, it is checked whether the value is included or not. if it matches to the specified range, it returns the all records because between function returns true. If it doesn't match to the specified range, it returns no records because between function returns false. Execution example: select Users --filter 'between(14, 13, "include", 16, "include")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 6 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "age", # "Int32" # ] # ], # [ # 1, # "Alice", # 12 # ], # [ # 2, # "Bob", # 13 # ], # [ # 3, # "Calros", # 15 # ], # [ # 4, # "Dave", # 16 # ], # [ # 5, # "Eric", # 20 # ], # [ # 6, # "Frank", # 21 # ] # ] # ] # ] In the above case, it returns all the records, because 14 exists in between 13 and 16. This behavior is used for checking the specified value exists or not in the table. Parameters There are five required parameters, column_or_value, and min, min_border, max and max_border. 
Parameters
There are five required parameters: column_or_value, min, min_border, max and max_border.
column_or_value Specifies a column of the table or a value.
min Specifies the minimum border value of the range. You can control whether the value of min is included or excluded with the min_border parameter.
min_border Specifies whether the specified range contains the value of min or not. The value of min_border is either "include" or "exclude". If it is "include", the min value is included. If it is "exclude", the min value is not included.
max Specifies the maximum border value of the range. You can control whether the value of max is included or excluded with the max_border parameter.
max_border Specifies whether the specified range contains the value of max or not. The value of max_border is either "include" or "exclude". If it is "include", the max value is included. If it is "exclude", the max value is not included.
Return value
between returns whether the value of the column exists in the specified range or not. If a record matches the specified range, it returns true. Otherwise, it returns false.
edit_distance
Name
edit_distance - calculates the edit distance between two strings
Syntax
edit_distance(string1, string2)
Description
This section describes edit_distance, one of the Groonga built-in functions. A built-in function can be called in a script syntax grn_expr. The edit_distance() function calculates the edit distance between the string specified as string1 and the string specified as string2.
Parameters
string1 Specifies a string.
string2 Specifies the other string.
Return value
Returns the edit distance between the two specified strings as a UInt32 value.
Example
edit_distance(title, "hoge") 1
geo_distance
Summary
geo_distance calculates the value of the distance between two specified points.
Syntax
geo_distance requires two points. The parameter approximate_type is optional: geo_distance(point1, point2) geo_distance(point1, point2, approximate_type)
The default value of approximate_type is "rectangle". If you omit approximate_type, geo_distance calculates the value of the distance as if "rectangle" were specified.
Usage
geo_distance is one of the Groonga built-in functions. You can call a built-in function in /reference/grn_expr. The geo_distance function calculates the (approximate) value of the distance between the coordinate of point1 and the coordinate of point2.
NOTE: Groonga provides three built-in functions for calculating the value of distance: geo_distance(), geo_distance2() and geo_distance3(). The difference among them is the algorithm used to calculate the distance. geo_distance2() and geo_distance3() have been deprecated since version 1.2.9. Use geo_distance(point1, point2, "sphere") instead of geo_distance2(point1, point2). Use geo_distance(point1, point2, "ellipsoid") instead of geo_distance3(point1, point2).
Let's learn about geo_distance usage with examples. This section shows simple usages. Here are two schema definitions and sample data to show the difference depending on usage. Both samples show how to calculate the value of the distance between New York City and London.
1. Using the column value of location for calculating the distance (Cities table)
2. Using the explicitly specified coordinates for calculating the distance (Geo table)
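As a side note that is not part of the original examples: a WGS84GeoPoint literal such as "146566000x-266422000" encodes latitude x longitude in milliseconds of arc, so you can convert each value to degrees by dividing by 3600000 (60 x 60 x 1000):
146566000 / 3600000 = about 40.71 (latitude of New York City, in degrees north)
-266422000 / 3600000 = about -74.01 (longitude of New York City, 74.01 degrees west)
185428000 / 3600000 = about 51.51 and -461000 / 3600000 = about -0.13 (latitude and longitude of London)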
Using the column value of location
Here are a schema definition of the Cities table and sample data to show usage. table_create Cities TABLE_HASH_KEY ShortText column_create Cities location COLUMN_SCALAR WGS84GeoPoint load --table Cities [ ["_key", "location"], ["New York City", "146566000x-266422000"] ]
Execution example: table_create Cities TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Cities location COLUMN_SCALAR WGS84GeoPoint # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Cities [ ["_key", "location"], ["New York City", "146566000x-266422000"] ] # [[0, 1337566253.89858, 0.000355720520019531], 1]
This execution example creates a table named Cities which has one column named location. The location column stores coordinate values. The coordinate of New York City is stored as sample data.
Execution example: select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_score", # "Int32" # ] # ], # [ # 5715104 # ] # ] # ] # ]
This sample shows that geo_distance uses the value of the location column and the value of a coordinate to calculate the distance. The value ("185428000x-461000") passed to geo_distance as the second argument is the coordinate of London.
Using the explicitly specified value of location
Here are a schema definition of the Geo table and sample data to show usage. table_create Geo TABLE_HASH_KEY ShortText column_create Geo distance COLUMN_SCALAR Int32 load --table Geo [ { "_key": "the record for geo_distance() result" } ]
Execution example: table_create Geo TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Geo distance COLUMN_SCALAR Int32 # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Geo [ { "_key": "the record for geo_distance() result" } ] # [[0, 1337566253.89858, 0.000355720520019531], 1]
This execution example creates a table named Geo which has one column named distance. The distance column stores the value of the distance.
Execution example: select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "185428000x-461000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "distance", # "Int32" # ] # ], # [ # 5807750 # ] # ] # ] # ]
This sample shows that geo_distance uses the coordinate of London and the coordinate of New York to calculate the distance.
Parameters
Required parameters
There are two required parameters, point1 and point2.
point1 Specifies the start point for calculating the value of the distance between two points. You can specify a value of GeoPoint type. [1] See /reference/types about GeoPoint.
point2 Specifies the end point for calculating the value of the distance between two points. You can specify a value of GeoPoint type or a string indicating a coordinate. See /reference/types about GeoPoint and the coordinate format.
Optional parameter
There is an optional parameter, approximate_type.
approximate_type Specifies how to approximate the geographical features for calculating the value of the distance. You can specify one of the following values for approximate_type.
• rectangle
• sphere
• ellipsoid
NOTE: There is a limitation in geo_distance. geo_distance cannot calculate the value of the distance between two points across the meridian, the equator or the date line if you use sphere or ellipsoid as the approximate type. There is no such limitation for rectangle.
This is a temporary limitation of the current Groonga implementation; it will be fixed in a future release.
rectangle
This value approximates the geographical features with square approximation for calculating the distance. Since the value of the distance is calculated with a simple formula, the calculation is fast. But the error of the distance increases as the points approach the poles. You can also specify rect as an abbreviation.
Here is a sample that calculates the value of the distance with a column value.
Execution example: select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_score", # "Int32" # ] # ], # [ # 5715104 # ] # ] # ] # ]
Here is a sample that calculates the value of the distance with an explicitly specified point.
Execution example: select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "185428000x-461000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "distance", # "Int32" # ] # ], # [ # 5807750 # ] # ] # ] # ]
Here are samples that calculate the value of the distance with explicitly specified points across the meridian, the equator and the date line.
Execution example: select Geo --output_columns distance --scorer 'distance = geo_distance("175904000x8464000", "145508000x-13291000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "distance", # "Int32" # ] # ], # [ # 1051293 # ] # ] # ] # ]
This sample shows the value of the distance across the meridian. The return value of geo_distance("175904000x8464000", "145508000x-13291000", "rectangle") is the value of the distance from Paris, France to Madrid, Spain.
Execution example: select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "-56880000x-172310000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "distance", # "Int32" # ] # ], # [ # 6880439 # ] # ] # ] # ]
This sample shows the value of the distance across the equator. The return value of geo_distance("146566000x-266422000", "-56880000x-172310000", "rectangle") is the value of the distance from New York, the United States to Brasília, Brazil.
Execution example: select Geo --output_columns distance --scorer 'distance = geo_distance("143660000x419009000", "135960000x-440760000", "rectangle")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "distance", # "Int32" # ] # ], # [ # 10475205 # ] # ] # ] # ]
This sample shows the value of the distance across the date line. The return value of geo_distance("143660000x419009000", "135960000x-440760000", "rectangle") is the value of the distance from Beijing, China to San Francisco, the United States.
NOTE: geo_distance uses square approximation by default. If you omit approximate_type, geo_distance behaves as if rectangle were specified.
NOTE: geo_distance accepts a string indicating a coordinate as the value of point1 only when the value of approximate_type is "rectangle". If you specify a string indicating a coordinate as the value of point1 with sphere or ellipsoid, geo_distance returns 0 as the value of the distance.
sphere
This value approximates the geographical features with spherical approximation for calculating the distance.
It is slower than rectangle, but the error of the distance is smaller than with rectangle. You can also specify sphr as an abbreviation.
Here is a sample that calculates the value of the distance with a column value.
Execution example: select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "sphere")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_score", # "Int32" # ] # ], # [ # 5715102 # ] # ] # ] # ]
ellipsoid
This value approximates the geographical features with ellipsoid approximation for calculating the distance. It calculates the distance with Hubeny's formula. It is slower than sphere, but the error of the distance is smaller than with sphere. You can also specify ellip as an abbreviation.
Here is a sample that calculates the value of the distance with a column value.
Execution example: select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "ellipsoid")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_score", # "Int32" # ] # ], # [ # 5706263 # ] # ] # ] # ]
Return value
geo_distance returns the value of the distance as a Float value. The unit of the return value is meters.
Footnote
[1] You can specify either TokyoGeoPoint or WGS84GeoPoint.
geo_in_circle
Name
geo_in_circle - checks whether a coordinate is within a circle.
Syntax
geo_in_circle(point, center, radius_or_point[, approximate_type])
Description
This section describes geo_in_circle, one of the Groonga built-in functions. A built-in function can be called in a script syntax grn_expr. The geo_in_circle() function checks whether the coordinate specified as point is within the circle whose center is the coordinate specified as center.
Parameters
point Specifies the coordinate to be checked against the circle. You can specify a Point type value. [1]
center Specifies the coordinate of the center of the circle. You can specify a Point type value or a string that represents a coordinate.
radius_or_point Specifies the radius of the circle. If a number is specified, it is treated as the radius in meters. If a Point type value or a string that represents a coordinate is specified, it is treated as one point on the circumference of the circle.
approximate_type Specifies how to approximate the geographical features for calculating the distance from the radius. The following values can be specified. "rectangle" Uses square approximation. The distance is calculated with a simple formula, so it is fast, but the error increases near the poles. It can be abbreviated as "rect". This approximation is the default; if approximate_type is omitted, square approximation is used. "sphere" Uses spherical approximation. It is slower than "rectangle", but the error is smaller. It can be abbreviated as "sphr". "ellipsoid" Uses ellipsoid approximation. The distance is calculated with Hubeny's formula. It is slower than "sphere", but the error is smaller. It can be abbreviated as "ellip".
Return value
Returns whether the coordinate specified as point is within the circle as a Bool value.
Example
geo_in_circle(pos, "100x100", 100) true
Footnote
[1] Either TokyoGeoPoint (Japanese geodetic system) or WGS84GeoPoint (world geodetic system) can be specified.
geo_in_rectangle
Name
geo_in_rectangle - checks whether a coordinate is within a rectangle.
Syntax
geo_in_rectangle(point, top_left, bottom_right)
Description
This section describes geo_in_rectangle, one of the Groonga built-in functions. A built-in function can be called in a script syntax grn_expr. The geo_in_rectangle() function checks whether the coordinate specified as point is within the rectangle formed by top_left and bottom_right.
Parameters
point Specifies the coordinate to be checked against the rectangle. You can specify a Point type value. [1]
top_left Specifies the coordinate of the top left corner of the rectangle. You can specify a Point type value or a string that represents a coordinate.
bottom_right Specifies the coordinate of the bottom right corner of the rectangle. You can specify a Point type value or a string that represents a coordinate.
Return value
Returns whether the coordinate specified as point is within the rectangle as a Bool value.
Example
geo_in_rectangle(pos, "150x100", "100x150") true
Footnote
[1] Either TokyoGeoPoint (Japanese geodetic system) or WGS84GeoPoint (world geodetic system) can be specified.
highlight_full
CAUTION: This feature is experimental. The API will be changed.
Summary
highlight_full tags target text. It can be used to highlight search keywords. You can specify whether to use HTML escape, the normalizer name, and a different tag for each keyword.
Syntax
highlight_full has required parameters and optional parameters: highlight_full(column, normalizer_name, use_html_escape, keyword1, open_tag1, close_tag1, ...
[keywordN, open_tagN, close_tagN])
Usage
Here are a schema definition and sample data to show usage.
Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries body COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Entries [ {"body": "Mroonga is a MySQL storage engine based on Groonga. <b>Rroonga</b> is a Ruby binding of Groonga."} ] # [[0, 1337566253.89858, 0.000355720520019531], 1]
highlight_full can be used only in --output_columns in /reference/commands/select. highlight_full requires Groonga 4.0.5 or later. highlight_full requires /reference/command/command_version 2 or later.
The following example uses HTML escape and the NormalizerAuto normalizer. It specifies the tags <span class="keyword1"> and </span> for the keyword groonga, and the tags <span class="keyword2"> and </span> for the keyword mysql.
Execution example: select Entries --output_columns 'highlight_full(body, "NormalizerAuto", true, "Groonga", "<span class=\\"keyword1\\">", "</span>", "mysql", "<span class=\\"keyword2\\">", "</span>")' --command_version 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "highlight_full", # "null" # ] # ], # [ # "Mroonga is a <span class=\"keyword2\">MySQL</span> storage engine based on <span class=\"keyword1\">Groonga</span>. <b>Rroonga</b> is a Ruby binding of <span class=\"keyword1\">Groonga</span>." # ] # ] # ] # ]
The text is scanned for the keywords for tagging after it is normalized by the NormalizerAuto normalizer. highlight_full surrounds the keyword groonga contained in the text with <span class="keyword1"> and </span>, and the keyword mysql contained in the text with <span class="keyword2"> and </span>. Special characters such as < and > are escaped as &lt; and &gt;.
You can specify a string literal instead of a column.
Execution example: select Entries --output_columns 'highlight_full("Groonga is very fast fulltext search engine.", "NormalizerAuto", true, "Groonga", "<span class=\\"keyword1\\">", "</span>", "mysql", "<span class=\\"keyword2\\">", "</span>")' --command_version 2 --match_columns body --query "groonga" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "highlight_full", # "null" # ] # ], # [ # "<span class=\"keyword1\">Groonga</span> is very fast fulltext search engine." # ] # ] # ] # ]
Parameters
There are three required parameters: column, normalizer_name and use_html_escape. The optional parameters are specified in groups of three: keywordN, open_tagN and close_tagN.
column Specifies a column of the table.
normalizer_name Specifies a normalizer name.
use_html_escape Specifies whether to use HTML escape. If it is true, HTML escape is used. If it is false, HTML escape is not used.
keywordN Specifies a keyword for tagging. You can specify multiple keywords, one per group of three arguments.
open_tagN Specifies an open tag. You can specify multiple open tags, one per group of three arguments.
close_tagN Specifies a close tag. You can specify multiple close tags, one per group of three arguments.
Return value
highlight_full returns a tagged string or null.
If highlight_full can't find any keywords, it returns null.
See also
• /reference/commands/select
• /reference/functions/highlight_html
highlight_html
CAUTION: This feature is experimental. The API will be changed.
New in version 4.0.5.
Summary
highlight_html tags target text. It can be used to highlight search keywords. The tagged text is prepared for embedding in HTML. Special characters such as < and > are escaped as &lt; and &gt;. A keyword is surrounded with <span class="keyword"> and </span>. For example, the tagged text of I am a groonga user. <3 for the keyword groonga is I am a <span class="keyword">groonga</span> user. &lt;3.
Syntax
This function has only one parameter: highlight_html(text)
Usage
Here are a schema definition and sample data to show usage.
Execution example: table_create Entries TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Entries body COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Entries [ {"body": "Mroonga is a MySQL storage engine based on Groonga. <b>Rroonga</b> is a Ruby binding of Groonga."} ] # [[0, 1337566253.89858, 0.000355720520019531], 1]
highlight_html can be used only in --output_columns in /reference/commands/select. highlight_html requires /reference/command/command_version 2 or later. You also need to specify --query and/or --filter. Keywords are extracted from the --query and --filter arguments. The following example uses --query "groonga mysql". In this case, groonga and mysql are used as keywords.
Execution example: select Entries --match_columns body --query 'groonga mysql' --output_columns 'highlight_html(body)' --command_version 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "highlight_html", # "null" # ] # ], # [ # "Mroonga is a <span class=\"keyword\">MySQL</span> storage engine based on <span class=\"keyword\">Groonga</span>. <b>Rroonga</b> is a Ruby binding of <span class=\"keyword\">Groonga</span>." # ] # ] # ] # ]
The text is scanned for the keywords for tagging after it is normalized by the NormalizerAuto normalizer. --query "groonga mysql" matches only the first record's body. highlight_html(body) surrounds the keywords groonga or mysql contained in the text with <span class="keyword"> and </span>.
You can specify a string literal instead of a column.
Execution example: select Entries --output_columns 'highlight_html("Groonga is very fast fulltext search engine.")' --command_version 2 --match_columns body --query "groonga" # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "highlight_html", # "null" # ] # ], # [ # "<span class=\"keyword\">Groonga</span> is very fast fulltext search engine." # ] # ] # ] # ]
Parameters
This section describes all parameters.
Required parameters
There is only one required parameter.
text The text to be highlighted in HTML.
Optional parameters
There is no optional parameter.
Return value
highlight_html returns a tagged string or null. If highlight_html can't find any keywords, it returns null.
See also
• /reference/commands/select
• /reference/functions/highlight_full
html_untag
Summary
html_untag strips HTML tags from HTML and outputs plain text.
html_untag is used in --output_columns, described at select-output-columns.
Syntax
html_untag requires only one argument. It is html. html_untag(html)
Requirements
html_untag requires Groonga 3.0.5 or later. html_untag requires /reference/command/command_version 2 or later.
Usage
Here are a schema definition and sample data to show usage.
Sample schema:
Execution example: table_create WebClips TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create WebClips content COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true]
Sample data:
Execution example: load --table WebClips [ {"_key": "http://groonga.org", "content": "groonga is <span class='emphasize'>fast</span>"}, {"_key": "http://mroonga.org", "content": "mroonga is <span class=\"emphasize\">fast</span>"} ] # [[0, 1337566253.89858, 0.000355720520019531], 2]
Here is the simple usage of the html_untag function, which strips HTML tags from the value of the content column.
Execution example: select WebClips --output_columns "html_untag(content)" --command_version 2 # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "html_untag", # "null" # ] # ], # [ # "groonga is fast" # ], # [ # "mroonga is fast" # ] # ] # ] # ]
When executing the above query, you can see that the "span" tag with the "class" attribute is stripped. Note that you must specify --command_version 2 to use the html_untag function.
Parameters
There is one required parameter, html.
html Specifies the HTML text to be untagged.
Return value
html_untag returns plain text with the HTML tags stripped from the HTML text.
in_values
Summary
New in version 4.0.7. in_values enables you to simplify a query that uses multiple OR or == conditions. Using this function is recommended from the viewpoint of performance in such a case.
Syntax
in_values requires two or more arguments: target_value and one or more values. in_values(target_value, value1, ..., valueN)
Usage
Here are a schema definition and sample data.
Sample schema:
Execution example: table_create Tags TABLE_PAT_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Memos TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Memos tag COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Tags memos_tag COLUMN_INDEX Memos tag # [[0, 1337566253.89858, 0.000355720520019531], true]
Sample data:
Execution example: load --table Memos [ {"_key": "Groonga is fast", "tag": "groonga"}, {"_key": "Mroonga is fast", "tag": "mroonga"}, {"_key": "Rroonga is fast", "tag": "rroonga"}, {"_key": "Droonga is fast", "tag": "droonga"}, {"_key": "Groonga is a HTTP server", "tag": "groonga"} ] # [[0, 1337566253.89858, 0.000355720520019531], 5]
Here is the simple usage of the in_values function, which selects the records whose tag column value is "groonga", "mroonga" or "droonga".
Execution example: select Memos --output_columns _key,tag --filter 'in_values(tag, "groonga", "mroonga", "droonga")' --sortby _id # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 4 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "tag", # "ShortText" # ] # ], # [ # "Groonga is fast", # "groonga" # ], # [ # "Mroonga is fast", # "mroonga" # ], # [ # "Droonga is fast", # "droonga" # ], # [ # "Groonga is a HTTP server", # "groonga" # ] # ] # ] # ]
When executing the above query, you get all records except the "rroonga" one, because "rroonga" is not specified as a value in in_values.
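For reference, the in_values call above is a shorthand for the following equivalent --filter expression that uses multiple == and || (results omitted; this is not an actual execution example from the original document, but it should select the same records):
select Memos --output_columns _key,tag --filter 'tag == "groonga" || tag == "mroonga" || tag == "droonga"' --sortby _id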
Parameters
There are two or more required parameters: target_value and one or more values.
target_value Specifies a column of the table that is specified by the table parameter in select.
value Specifies a value of the column that you want to select.
Return value
in_values returns whether the value of the column matches one of the specified values or not. If a record matches one of the specified values, it returns true. Otherwise, it returns false.
now
Name
now - returns the current time
Syntax
now()
Description
This section describes now, one of the Groonga built-in functions. A built-in function can be called in a script syntax grn_expr. The now() function returns a Time type value corresponding to the current time.
Return value
Returns a Time type object corresponding to the current time.
Example
now() 1256791194.55541
prefix_rk_search()
Summary
prefix_rk_search() selects records by /reference/operations/prefix_rk_search. You need to create a table-pat-key table for prefix RK search. You can't use prefix_rk_search() for sequential scan. It's a selector only procedure.
Syntax
prefix_rk_search() requires two arguments. They are column and query: prefix_rk_search(column, query)
column must be _key for now. query must be a string.
Usage
Here are a schema definition and sample data to show usage:
Execution example: table_create Readings TABLE_PAT_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Readings [ {"_key": "ニホン"}, {"_key": "ニッポン"}, {"_key": "ローマジ"} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]
Here is the simple usage of the prefix_rk_search() function, which selects ニホン and ニッポン by ni:
Execution example: select Readings --filter 'prefix_rk_search(_key, "ni")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ] # ], # [ # 2, # "ニッポン" # ], # [ # 1, # "ニホン" # ] # ] # ] # ]
You can implement a /reference/suggest/completion like feature by combining it with sub_filter. Create a table that has candidates of completion as records. Each record has zero or more readings. They are stored into the Readings table. Don't forget to define an index column for Items.readings in the Readings table. The index column is needed for sub_filter:
Execution example: table_create Items TABLE_HASH_KEY ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Items readings COLUMN_VECTOR Readings # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Readings items_index COLUMN_INDEX Items readings # [[0, 1337566253.89858, 0.000355720520019531], true] load --table Items [ {"_key": "日本", "readings": ["ニホン", "ニッポン"]}, {"_key": "ローマ字", "readings": ["ローマジ"]}, {"_key": "漢字", "readings": ["カンジ"]} ] # [[0, 1337566253.89858, 0.000355720520019531], 3]
You can find the 日本 record in the Items table by niho, because prefix RK search with niho selects the reading ニホン, and ニホン is one of the readings of the 日本 record:
Execution example: select Items \ --filter 'sub_filter(readings, "prefix_rk_search(_key, \\"niho\\")")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "readings", # "Readings" # ] # ], # [ # 1, # "日本", # [ # "ニホン", # "ニッポン" # ] # ] # ] # ] # ]
You need to combine script-syntax-prefix-search-operator to support completion targets without readings.
Add one completion target without readings:
Execution example: load --table Items [ {"_key": "nihon", "readings": []} ] # [[0, 1337566253.89858, 0.000355720520019531], 1]
Combine script-syntax-prefix-search-operator to support completion targets without readings:
Execution example: select Items \ --filter 'sub_filter(readings, "prefix_rk_search(_key, \\"niho\\")") || \ _key @^ "niho"' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_id", # "UInt32" # ], # [ # "_key", # "ShortText" # ], # [ # "readings", # "Readings" # ] # ], # [ # 1, # "日本", # [ # "ニホン", # "ニッポン" # ] # ], # [ # 4, # "nihon", # [] # ] # ] # ] # ]
Normally, you want to use case insensitive search for completion. Use --normalizer NormalizerAuto and a label column in that case:
Execution example: table_create LooseItems TABLE_HASH_KEY ShortText --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create LooseItems label COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create LooseItems readings COLUMN_VECTOR Readings # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Readings loose_items_index COLUMN_INDEX LooseItems readings # [[0, 1337566253.89858, 0.000355720520019531], true] load --table LooseItems [ {"_key": "日本", "label": "日本", "readings": ["ニホン", "ニッポン"]}, {"_key": "ローマ字", "label": "ローマ字", "readings": ["ローマジ"]}, {"_key": "漢字", "label": "漢字", "readings": ["カンジ"]}, {"_key": "Nihon", "label": "日本", "readings": []} ] # [[0, 1337566253.89858, 0.000355720520019531], 4]
Use LooseItems.label for display:
Execution example: select LooseItems \ --filter 'sub_filter(readings, "prefix_rk_search(_key, \\"nIhO\\")") || \ _key @^ "nIhO"' \ --output_columns '_key,label' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 2 # ], # [ # [ # "_key", # "ShortText" # ], # [ # "label", # "ShortText" # ] # ], # [ # "日本", # "日本" # ], # [ # "nihon", # "日本" # ] # ] # ] # ]
Parameters
There are two required parameters, column and query.
column Always specify _key for now.
query Specifies a query in romaji, katakana or hiragana as a string.
Return value
The prefix_rk_search() function returns matched records.
See also
• /reference/operations/prefix_rk_search
• /reference/functions/sub_filter
query
Summary
query provides the functionality of the --match_columns and --query parameters of /reference/commands/select as a function. You can specify multiple query functions in the --filter parameter in /reference/commands/select. Because of such flexibility, you can control full text search behavior by combining multiple query functions. query can be used only in --filter in /reference/commands/select.
Syntax
query requires two arguments: match_columns and query_string. The parameter query_expander or substitution_table is optional. query(match_columns, query_string) query(match_columns, query_string, query_expander) query(match_columns, query_string, substitution_table)
Usage
Here are a schema definition and sample data to show usage.
Sample schema:
Execution example: table_create Documents TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Documents content COLUMN_SCALAR Text # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Users TABLE_NO_KEY # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users name COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Users memo COLUMN_SCALAR ShortText # [[0, 1337566253.89858, 0.000355720520019531], true] table_create Lexicon TABLE_HASH_KEY ShortText \ --default_tokenizer TokenBigramSplitSymbolAlphaDigit \ --normalizer NormalizerAuto # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name # [[0, 1337566253.89858, 0.000355720520019531], true] column_create Lexicon users_memo COLUMN_INDEX|WITH_POSITION Users memo # [[0, 1337566253.89858, 0.000355720520019531], true]
Sample data:
Execution example: load --table Users [ {"name": "Alice", "memo": "groonga user"}, {"name": "Alisa", "memo": "mroonga user"}, {"name": "Bob", "memo": "rroonga user"}, {"name": "Tom", "memo": "nroonga user"}, {"name": "Tobby", "memo": "groonga and mroonga user. mroonga is ..."} ] # [[0, 1337566253.89858, 0.000355720520019531], 5]
Here is the simple usage of the query function, which executes a full text search for the keyword 'alice' in --filter without using the --match_columns and --query arguments.
Execution example: select Users --output_columns name,_score --filter 'query("name * 10", "alice")' # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 1 # ], # [ # [ # "name", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Alice", # 10 # ] # ] # ] # ]
When executing the above query, the match of the keyword 'alice' is weighted with the value 10.
Here are contrasting examples with and without query.
Execution example: select Users --output_columns name,memo,_score --match_columns "memo * 10" --query "memo:@groonga OR memo:@mroonga OR memo:@user" --sortby -_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "name", # "ShortText" # ], # [ # "memo", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Tobby", # "groonga and mroonga user. mroonga is ...", # 4 # ], # [ # "Alice", # "groonga user", # 2 # ], # [ # "Alisa", # "mroonga user", # 2 # ], # [ # "Bob", # "rroonga user", # 1 # ], # [ # "Tom", # "nroonga user", # 1 # ] # ] # ] # ]
In this case, the keywords 'groonga', 'mroonga' and 'user' are given the same weight value. You can't pass a different weight value to each keyword in this way.
Execution example: select Users --output_columns name,memo,_score --filter 'query("memo * 10", "groonga") || query("memo * 20", "mroonga") || query("memo * 1", "user")' --sortby -_score # [ # [ # 0, # 1337566253.89858, # 0.000355720520019531 # ], # [ # [ # [ # 5 # ], # [ # [ # "name", # "ShortText" # ], # [ # "memo", # "ShortText" # ], # [ # "_score", # "Int32" # ] # ], # [ # "Tobby", # "groonga and mroonga user.
Return value
query returns whether any record is matched or not. If one or more records are matched, it returns true. Otherwise, it returns false.
TODO
• Support query_flags
See also
• /reference/commands/select

rand
Name
rand - generates a random number
Syntax
rand([max])
Description
This section describes rand, one of the Groonga built-in functions. Built-in functions can be called in a script-syntax grn_expr.
The rand() function returns a pseudo-random integer between 0 and max.
Parameters
max
Specifies the maximum of the return value. If omitted, RAND_MAX is assumed.
Return value
Returns an Int32 value between 0 and max.
Example
rand(10)
3
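Since --filter takes a script-syntax grn_expr, rand() can be called there. Here is a minimal sketch of random sampling against the Users table defined above, assuming rand() is evaluated once per record (the threshold 3 is illustrative):

select Users --output_columns name --filter 'rand(10) < 3'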
snippet_html
CAUTION: This feature is experimental. The API will be changed.
Summary
snippet_html extracts snippets of target text around search keywords (KWIC. KeyWord In Context). The snippets are prepared for embedding into HTML. Special characters such as < and > are escaped as &lt; and &gt;. Each keyword is surrounded with <span class="keyword"> and </span>. For example, a snippet of I am a groonga user. <3 for the keyword groonga is I am a <span class="keyword">groonga</span> user. &lt;3.
Syntax
snippet_html has only one parameter:
snippet_html(column)
snippet_html has many parameters internally but they can't be specified for now. You will be able to customize those parameters soon.
Usage
Here are a schema definition and sample data to show usage.

Execution example:
table_create Documents TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Documents content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Documents
[
["content"],
["Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, groonga allows updates without read locks. These characteristics result in superior performance on real-time applications."],
["Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, groonga can cover weakness of row-oriented systems."]
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

snippet_html can be used only in --output_columns in /reference/commands/select.
You need to specify the --command_version 2 argument explicitly because function calls in --output_columns are an experimental feature as of Groonga 2.0.9. It will be enabled by default soon.
You also need to specify --query and/or --filter. Keywords are extracted from the --query and --filter arguments.
The following example uses --query "fast performance". In this case, fast and performance are used as keywords.

Execution example:
select Documents --output_columns "snippet_html(content)" --command_version 2 --match_columns content --query "fast performance"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "snippet_html",
#           "null"
#         ]
#       ],
#       [
#         [
#           "Groonga is a <span class=\"keyword\">fast</span> and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, gro",
#           "onga allows updates without read locks. These characteristics result in superior <span class=\"keyword\">performance</span> on real-time applications."
#         ]
#       ]
#     ]
#   ]
# ]

--query "fast performance" matches only the first record's content. snippet_html(content) extracts two text parts that include the keywords fast or performance and surrounds the keywords with <span class="keyword"> and </span>.
The max number of text parts is 3. If there are 4 or more text parts that include the keywords, only the leading 3 parts are used.
The max size of a text part is 200 bytes. The unit is bytes, not characters. The size doesn't include the inserted <span class="keyword"> and </span>.
Both the max number of text parts and the max size of a text part aren't customizable.
You can specify a string literal instead of a column.

Execution example:
select Documents --output_columns 'snippet_html("Groonga is very fast fulltext search engine.")' --command_version 2 --match_columns content --query "fast performance"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "snippet_html",
#           "null"
#         ]
#       ],
#       [
#         [
#           "Groonga is very <span class=\"keyword\">fast</span> fulltext search engine."
#         ]
#       ]
#     ]
#   ]
# ]

Return value
snippet_html returns an array of strings or null. If snippet_html can't find any snippets, it returns null.
An element of the array is a snippet:
[SNIPPET1, SNIPPET2, SNIPPET3]
A snippet includes one or more keywords. The max byte size of a snippet except <span class="keyword"> and </span> is 200 bytes. The unit isn't the number of characters.
The array size is larger than or equal to 0 and less than or equal to 3. The max size 3 will be customizable soon.
TODO
• Make the max number of text parts customizable.
• Make the max size of a text part customizable.
• Make keywords customizable.
• Make the tag that surrounds a keyword customizable.
• Make normalization customizable.
• Support options by object literal.
See also
• /reference/commands/select

sub_filter
Summary
sub_filter evaluates filter_string in the scope context.
sub_filter can be used only in --filter in /reference/commands/select.
Syntax
sub_filter requires two arguments. They are scope and filter_string.
sub_filter(scope, filter_string)
Usage
Here are a schema definition and sample data to show usage.
Sample schema:

Execution example:
table_create Comment TABLE_PAT_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Blog TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog comments COLUMN_VECTOR Comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment blog_comment_index COLUMN_INDEX Blog comments
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comment_content COLUMN_INDEX|WITH_POSITION Comment content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comment_name COLUMN_INDEX|WITH_POSITION Comment name
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon blog_content COLUMN_INDEX|WITH_POSITION Blog content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:
load --table Comment
[
{"_key": 1, "name": "A", "content": "groonga"},
{"_key": 2, "name": "B", "content": "groonga"},
{"_key": 3, "name": "C", "content": "rroonga"},
{"_key": 4, "name": "A", "content": "mroonga"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Blog
[
{"_key": "groonga's blog", "content": "content of groonga's blog", "comments": [1, 2, 3]},
{"_key": "mroonga's blog", "content": "content of mroonga's blog", "comments": [2, 3, 4]},
{"_key": "rroonga's blog", "content": "content of rroonga's blog", "comments": [3]},
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

First, here is a query without sub_filter that tries to extract the blog entries on which user 'A' commented about 'groonga'.

Execution example:
select Blog --output_columns _key --filter "comments.name @ \"A\" && comments.content @ \"groonga\""
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         "groonga's blog"
#       ],
#       [
#         "mroonga's blog"
#       ]
#     ]
#   ]
# ]

The above query returns not only "groonga's blog" but also "mroonga's blog". This is not what you want, because user "A" does not mention "groonga" in "mroonga's blog". Without sub_filter, a record matches when the following conditions are met:
• There is at least one comment by user "A".
• There is at least one comment that mentions "groonga".

Execution example:
select Blog --output_columns _key --filter 'sub_filter(comments, "name @ \\"A\\" && content @ \\"groonga\\"")'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         "groonga's blog"
#       ]
#     ]
#   ]
# ]

On the other hand, executing the above query returns the intended result, because the second argument of sub_filter is evaluated in the comments column's context. It means that sub_filter requires that the following condition is met:
• There is at least one comment in which user "A" mentions "groonga".

Parameters
There are two required parameters, scope and filter_string.
scope
Specifies a column of the table that is specified by the table parameter in select. The column has a limitation; the limitation is described later. filter_string is evaluated in the column's context. It means that filter_string is evaluated like select --table TYPE_OF_THE_COLUMN --filter FILTER_STRING.
The specified column's type must be a table. In other words, the column's type must be a reference type.
You can chain columns by the COLUMN_1.COLUMN_2.COLUMN_3...COLUMN_N syntax. For example, user.group.name.
See select-table about the table parameter in select.
filter_string
Specifies a search condition in /reference/grn_expr/script_syntax. It is evaluated in the scope context.
Return value
sub_filter returns whether any record is matched or not. If one or more records are matched, it returns true. Otherwise, it returns false.
See also
• /reference/commands/select
• /reference/grn_expr/script_syntax

vector_size
Summary
New in version 5.0.3.
vector_size returns the number of elements of a vector column value.
To enable this function, register the functions/vector plugin by the following command:
plugin_register functions/vector
Then, use the vector_size function with the --command_version 2 option. Note that you must specify --command_version 2 to use the vector_size function.
Syntax
vector_size requires one argument - target.
vector_size(target)
Usage
Here is a schema definition and sample data.
Sample schema:

Execution example:
table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:
load --table Memos
[
{"_key": "Groonga", "tags": ["Groonga"]},
{"_key": "Rroonga", "tags": ["Groonga", "Ruby"]},
{"_key": "Nothing"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Here is the simple usage of the vector_size function, which returns the value of the tags column and its size.

Execution example:
select Memos --output_columns 'tags, vector_size(tags)' --command_version 2
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "tags",
#           "ShortText"
#         ],
#         [
#           "vector_size",
#           "Object"
#         ]
#       ],
#       [
#         [
#           "Groonga"
#         ],
#         1
#       ],
#       [
#         [
#           "Groonga",
#           "Ruby"
#         ],
#         2
#       ],
#       [
#         [],
#         0
#       ]
#     ]
#   ]
# ]

Parameters
There is one required parameter, target.
target
Specifies a vector column of the table that is specified by the table parameter in select.
Return value
vector_size returns the number of elements of the target vector column value.

Operations
Groonga has multiple search operations. This section describes these search operations.
Geolocation search
Groonga supports geolocation search. It uses an index for search. It means that you can search by geolocation as fast as full text search.
Supported features
Groonga supports only points as the data type. Lines, surfaces and so on aren't supported yet. Here is a feature list:
1. Groonga can store a point in a column.
2. Groonga can search records that have a point in the specified rectangle.
3. Groonga can search records that have a point in the specified circle.
4. Groonga can calculate the distance between two points.
5. Groonga can sort records by distance from the specified point in ascending order.
Here are use cases for Groonga's geolocation search:
• You list McDonald's around a station.
• You list KFC around the current location, sorted by distance from the current location in ascending order, with the distance shown.
Here are unsupported use cases:
• You search McDonald's in a city.
(Groonga doesn't support geolocation search by shapes other than a rectangle and a circle.)
• You store a region instead of a point as a lake record. (A column can't have geolocation data other than a point.)
The following figure shows how Groonga's geolocation search treats records. A black point describes a record.
[image: only records]
Coming soon...
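While the illustrated examples above are still marked "Coming soon", here is a minimal sketch of the supported operations. The Shops table, its point values and the search center are hypothetical; a WGS84GeoPoint value is written as "latitude x longitude" and geo_in_circle takes a radius in meters:

table_create Shops TABLE_HASH_KEY ShortText
column_create Shops location COLUMN_SCALAR WGS84GeoPoint
table_create Locations TABLE_PAT_KEY WGS84GeoPoint
column_create Locations shops_location COLUMN_INDEX Shops location
load --table Shops
[
{"_key": "shop A", "location": "35.6740x139.7630"},
{"_key": "shop B", "location": "35.7000x139.8000"}
]
select Shops --filter 'geo_in_circle(location, "35.6735x139.7665", 3000)' --output_columns '_key, location'

To sort the result by distance, something like --scorer '_score = geo_distance(location, "35.6735x139.7665")' with --sortby '+_score' should work on the same data.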
Prefix RK search
Summary
Groonga supports prefix RK search. RK means Romaji and Kana (reading). Prefix RK search can find registered text in katakana by a query in romaji, hiragana or katakana. The found texts start with the query.
Prefix RK search is useful for completing Japanese text, because romaji is widely used to input Japanese on computers. See also Japanese input methods on Wikipedia.
If users can search Japanese text in romaji, they don't need to convert romaji to hiragana, katakana or kanji by themselves. For example, if you register a reading for "日本" as "ニホン", users can find "日本" by "ni", "に" or "ニ". The feature is helpful because it reduces users' operations.
This feature is used in /reference/suggest/completion. You can use this feature in select-filter by /reference/functions/prefix_rk_search.
Usage
You need a table-pat-key table to use prefix RK search. You need to put readings in katakana into the TABLE_PAT_KEY table as keys:

Execution example:
table_create Readings TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Readings
[
{"_key": "ニホン"},
{"_key": "ニッポン"},
{"_key": "ローマジ"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

You can find ニホン and ニッポン by prefix RK search with ni as the query from the Readings table. You can find ローマジ by prefix RK search with r as the query from the Readings table.
How to convert romaji to reading
Prefix RK search is based on the JIS X 4063:2000 specification. The specification has been obsoleted. See ローマ字入力 on Japanese Wikipedia for JIS X 4063:2000.
Normally, you can get converted results as expected.
See also
• /reference/suggest/completion
• /reference/functions/prefix_rk_search

Configuration
New in version 5.1.2.
Groonga can manage configuration items in each database. These configuration items are persistent. It means that these configuration items are still usable after a Groonga process exits.
Summary
You can change some Groonga behaviors such as /spec/search in several ways, such as a request parameter (select-match-escalation-threshold) and a build parameter (install-configure-with-match-escalation-threshold).
Configuration is one of these ways. You can change some Groonga behaviors per database by configuration.
A configuration item consists of a key and a value. Both key and value are strings. The max key size is 4KiB. The max value size is 4091B (= 4KiB - 5B).
You can set a configuration item by /reference/commands/config_set. You can get a configuration item by /reference/commands/config_get. You can delete a configuration item by /reference/commands/config_delete. You can confirm all configuration items by /reference/commands/dump.
Commands
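As a minimal sketch of the commands named above, here is the round trip with a hypothetical key my_app.theme (any key and value strings within the size limits work the same way):

config_set my_app.theme dark
config_get my_app.theme
config_delete my_app.theme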
Alias
New in version 5.1.2.
You can refer to a table and a column by multiple names by using the alias feature.
Summary
The alias feature is useful in the following cases:
• You want to rename a table but you can't change some Groonga clients that use the current table name.
• You want to change a column's type without downtime.
In the former case, some Groonga clients can keep using the current table name after you rename the table, because the alias feature maps the current table name to the new table name.
In the latter case, all Groonga clients access the column by an aliased name such as aliased_column. aliased_column refers to current_column. You create a new column new_column with the new type and copy data from current_column by /reference/commands/column_copy. Then you change aliased_column to refer to new_column instead of current_column. Now, all Groonga clients access new_column by aliased_column without stopping search requests.
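Here is a minimal sketch of that sequence with hypothetical names (Values, data_text, data_int), assuming the Aliases table and the alias.column configuration are already set up as in the Usage section below; clients keep referring to Values.data the whole time:

table_create Values TABLE_NO_KEY
column_create Values data_text COLUMN_SCALAR ShortText
load --table Aliases
[
{"_key": "Values.data", "real_name": "Values.data_text"}
]
column_create Values data_int COLUMN_SCALAR Int32
column_copy Values data_text Values data_int
load --table Aliases
[
{"_key": "Values.data", "real_name": "Values.data_int"}
]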
Usage
You manage the alias to real name mapping with a normal table and a normal column. You can use any table type except table-no-key for the table. table-hash-key is recommended because the alias feature uses only exact key match search, and table-hash-key is the fastest table type for exact key match search.
The column must be a /reference/columns/scalar whose type is ShortText. You can also use the Text and LongText types, but they are meaningless because the max table/column name size is 4KiB and ShortText can store 4KiB of data.
Here are example definitions of a table and a column for managing aliases:

Execution example:
table_create Aliases TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Aliases real_name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You need to register the table and column by configuration. The alias feature uses the alias.column configuration item. You can register the table and column by the following /reference/commands/config_set:

Execution example:
config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are schema and data to show how to use alias:

Execution example:
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 14},
{"_key": "bob", "age": 29}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

You can use Users.age in /reference/commands/select:

Execution example:
select Users --filter 'age < 20'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         14
#       ]
#     ]
#   ]
# ]

You can't use Users.age when you rename Users.age to Users.years by /reference/commands/column_rename:

Execution example:
column_rename Users age years
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users --filter 'age < 20'
# [
#   [
#     -63,
#     1337566253.89858,
#     0.000355720520019531,
#     "Syntax error: <age| |< 20>",
#     [
#       [
#         "yy_syntax_error",
#         "grn_ecmascript.lemon",
#         34
#       ]
#     ]
#   ],
#   []
# ]

But you can use Users.age by registering the Users.age to Users.years mapping in Aliases.

Execution example:
load --table Aliases
[
{"_key": "Users.age", "real_name": "Users.years"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Users --filter 'age < 20'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "years",
#           "UInt8"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         14
#       ]
#     ]
#   ]
# ]

Now, you can use Users.age as an alias of Users.years.
How to resolve alias
This section describes how aliases are resolved.
Groonga uses the alias feature when a nonexistent object name (table name, column name, command name, function name and so on) is referred to. It means that you can't override an existing object (table, column, command, function and so on) by the alias feature. For example, the Users.age record in Aliases above takes effect only because the real Users.age column was renamed; while a real Users.age column existed, it was used directly.
Alias is resolved recursively. If you rename Users.years to Users.years_old and you refer to Users.age, Groonga replaces Users.age with Users.years and then Users.years with Users.years_old, because the Aliases table has the following records:
┌────────────┬─────────────────┐
│_key        │ real_name       │
├────────────┼─────────────────┤
│Users.age   │ Users.years     │
├────────────┼─────────────────┤
│Users.years │ Users.years_old │
└────────────┴─────────────────┘
Here is an example in which Users.age is resolved recursively:

Execution example:
load --table Aliases
[
{"_key": "Users.years", "real_name": "Users.years_old"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
column_rename Users years years_old
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users --filter 'age < 20'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "years_old",
#           "UInt8"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         14
#       ]
#     ]
#   ]
# ]

See also
• /reference/configuration
• /reference/commands/config_set
• /reference/commands/table_create
• /reference/commands/column_create
• /reference/commands/select

Suggest
Groonga has the suggest feature. This section describes how to use it and how it works.
Introduction
The suggest feature in Groonga provides the following features:
• Completion
• Correction
• Suggestion
Completion
Completion helps user input. If a user inputs a partial word, Groonga can return complete words from registered words.
For example, there are registered words:
• "groonga"
• "complete"
• "correction"
• "suggest"
A user inputs "co" and Groonga returns "complete" and "correction" because they start with "co". A user inputs "sug" and Groonga returns "suggest" because "suggest" starts with "sug". A user inputs "ab" and Groonga returns nothing because no word starts with "ab".
Correction
Correction also helps user input. If a user inputs a wrong word, Groonga can return correct words from registered correction pairs.
For example, there are registered correction pairs:
┌───────────┬──────────────┐
│wrong word │ correct word │
├───────────┼──────────────┤
│grroonga   │ groonga      │
├───────────┼──────────────┤
│gronga     │ groonga      │
├───────────┼──────────────┤
│gronnga    │ groonga      │
└───────────┴──────────────┘
A user inputs "gronga" and Groonga returns "groonga" because "gronga" is in the wrong word column and the corresponding correct word is "groonga". A user inputs "roonga" and Groonga returns nothing because "roonga" isn't in the wrong word column.
Suggestion
Suggestion helps users filter many found documents. If a user inputs a query, Groonga can return new queries that have additional keywords, from registered related query pairs.
For example, there are registered related query pairs:
┌────────┬───────────────────────┐
│keyword │ related query         │
├────────┼───────────────────────┤
│groonga │ groonga search engine │
├────────┼───────────────────────┤
│search  │ Google search         │
├────────┼───────────────────────┤
│speed   │ groonga speed         │
└────────┴───────────────────────┘
A user inputs "groonga" and Groonga returns "groonga search engine" because "groonga" is in the keyword column and the corresponding related query is "groonga search engine". A user inputs "MySQL" and Groonga returns nothing because "MySQL" isn't in the keyword column.
Learning
The suggest feature requires registered data before using the feature. Those data can be registered from user inputs. The groonga-suggest-httpd and groonga-suggest-learner commands are provided for the purpose.
Completion
This section describes the following completion features:
• How it works
• How to use
• How to learn
How it works
The completion feature uses three searches to compute completed words:
1. Prefix RK search against registered words.
2. Cooccurrence search against learned data.
3. Prefix search against registered words. (optional)
Prefix RK search
See /reference/operations/prefix_rk_search for prefix RK search.
If you create a dataset named example by the /reference/executables/groonga-suggest-create-dataset executable, you can update pairs of a registered word and its reading for prefix RK search by loading data into the _key and kana columns of the item_example table explicitly.
Cooccurrence search
Cooccurrence search can find registered words from a user's partial input. It uses user input sequences that are learned from query logs, access logs and so on.
For example, there is the following user input sequence:
┌────────┬────────────┐
│input   │ submit     │
├────────┼────────────┤
│s       │ no         │
├────────┼────────────┤
│se      │ no         │
├────────┼────────────┤
│sea     │ no         │
├────────┼────────────┤
│sear    │ no         │
├────────┼────────────┤
│searc   │ no         │
├────────┼────────────┤
│search  │ yes        │
├────────┼────────────┤
│e       │ no         │
├────────┼────────────┤
│en      │ no         │
├────────┼────────────┤
│eng     │ no         │
├────────┼────────────┤
│engi    │ no         │
├────────┼────────────┤
│engin   │ no         │
├────────┼────────────┤
│engine  │ no         │
├────────┼────────────┤
│enginen │ no (typo!) │
├────────┼────────────┤
│engine  │ yes        │
└────────┴────────────┘
Groonga creates the following completion pairs:
┌────────┬────────────────┐
│input   │ completed word │
├────────┼────────────────┤
│s       │ search         │
├────────┼────────────────┤
│se      │ search         │
├────────┼────────────────┤
│sea     │ search         │
├────────┼────────────────┤
│sear    │ search         │
├────────┼────────────────┤
│searc   │ search         │
├────────┼────────────────┤
│e       │ engine         │
├────────┼────────────────┤
│en      │ engine         │
├────────┼────────────────┤
│eng     │ engine         │
├────────┼────────────────┤
│engi    │ engine         │
├────────┼────────────────┤
│engin   │ engine         │
├────────┼────────────────┤
│engine  │ engine         │
├────────┼────────────────┤
│enginen │ engine         │
└────────┴────────────────┘
All not-submitted user inputs (e.g. "s", "se" and so on) before a user submission map to the submitted input (e.g. "search").
To be precise, this description isn't complete because it omits time stamps. Groonga doesn't care about "all not-submitted user inputs before each user submission". Groonga cares only about "all not-submitted user inputs within a minute before a user submission". Groonga ignores user inputs that are older than a minute.
If a user inputs "sea", cooccurrence search returns "search" because "sea" is in the input column and the corresponding completed word is "search".
Prefix search
Prefix search can find registered words that start with the user's input. Unlike prefix RK search, this search doesn't care about romaji, katakana and hiragana.
This search isn't always run. It runs only when it's requested explicitly or when both prefix RK search and cooccurrence search return nothing.
For example, there is a registered word "search". A user can find "search" by "s", "se", "sea", "sear", "searc" and "search".
How to use
Groonga provides the /reference/commands/suggest command to use completion. The --types complete option requests completion.
For example, here is a command to get completion results by "en":

Execution example:
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "engine",
#         1
#       ]
#     ]
#   }
# ]

How it learns
Cooccurrence search uses learned data. The data are based on query logs, access logs and so on.
To create learned data, Groonga needs user input sequences with time stamps and submitted inputs with time stamps.
For example, a user wants to search by "engine". The user inputs the query with the following sequence:
1. 2011-08-10T13:33:23+09:00: e
2. 2011-08-10T13:33:23+09:00: en
3. 2011-08-10T13:33:24+09:00: eng
4. 2011-08-10T13:33:24+09:00: engi
5. 2011-08-10T13:33:24+09:00: engin
6. 2011-08-10T13:33:25+09:00: engine (submit!)
Groonga can learn from the input sequence by the following command:
load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "e"},
{"sequence": "1", "time": 1312950803.96857, "item": "en"},
{"sequence": "1", "time": 1312950804.26057, "item": "eng"},
{"sequence": "1", "time": 1312950804.56057, "item": "engi"},
{"sequence": "1", "time": 1312950804.76057, "item": "engin"},
{"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"}
]
How to update reading data
Groonga requires a registered word and its readings for prefix RK search. This section describes how to register a word and its readings.
Here is an example to register "日本", which means Japan in English:

Execution example:
load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86058, "item": "日本", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Here is an example to update reading data to complete "日本":

Execution example:
load --table item_query
[
{"_key":"日本", "kana":["ニホン", "ニッポン"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Then you can complete the registered word "日本" by the romaji input "nihon".

Execution example:
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ]
#     ]
#   }
# ]

Without loading the above reading data, you can't complete the registered word "日本" by the query "nihon".
You can register multiple readings for a registered word because the kana column in the item_query table is defined as a /reference/columns/vector.
This is why you can also complete the registered word "日本" by the query "nippon".

Execution example:
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nippon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ]
#     ]
#   }
# ]

This feature is very convenient because you can search a registered word even when the Japanese input method is disabled.
If there are multiple candidates in the completed result, you can customize their priority by setting the value of the boost column in the item_query table.
Here is an example to customize the priority for prefix RK search:

Execution example:
load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86059, "item": "日本語", "type": "submit"},
{"sequence": "1", "time": 1312950805.86060, "item": "日本人", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table item_query
[
{"_key":"日本語", "kana":"ニホンゴ"},
{"_key":"日本人", "kana":"ニホンジン"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ],
#       [
#         "日本人",
#         2
#       ],
#       [
#         "日本語",
#         2
#       ]
#     ]
#   }
# ]
load --table item_query
[
{"_key":"日本人", "boost": 100},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本人",
#         102
#       ],
#       [
#         "日本",
#         2
#       ],
#       [
#         "日本語",
#         2
#       ]
#     ]
#   }
# ]

Correction
This section describes the following correction features:
• How it works
• How to use
• How to learn
How it works
The correction feature uses two searches to compute corrected words:
1. Cooccurrence search against learned data.
2. Similar search against registered words. (optional)
Cooccurrence search
Cooccurrence search can find registered words from a user's wrong input. It uses user submission sequences that are learned from query logs, access logs and so on.
For example, there are the following user submissions:
┌────────────────┬───────────────────────────┐
│query           │ time                      │
├────────────────┼───────────────────────────┤
│serach (typo!)  │ 2011-08-10T22:20:50+09:00 │
├────────────────┼───────────────────────────┤
│search (fixed!) │ 2011-08-10T22:20:52+09:00 │
└────────────────┴───────────────────────────┘
Groonga creates the following correction pair from the above submissions:
┌───────┬────────────────┐
│input  │ corrected word │
├───────┼────────────────┤
│serach │ search         │
└───────┴────────────────┘
Groonga treats continuous submissions within a minute as an input correction by the user. The not-submitted user input sequence between two submissions isn't used as learned data for correction.
If a user inputs "serach", cooccurrence search returns "search" because "serach" is in the input column and the corresponding corrected word is "search".
Similar search
Similar search can find registered words that have one or more tokens in common with the user input.
The TokenBigram tokenizer is used for tokenization because the suggest dataset schema created by /reference/executables/groonga-suggest-create-dataset uses the TokenBigram tokenizer as the default tokenizer.
For example, there is a registered query "search engine". A user can find "search engine" by "web search service", "sound engine" and so on, because "search engine" and "web search service" have the same token "search", and "search engine" and "sound engine" have the same token "engine".
"search engine" is tokenized to the "search" and "engine" tokens. (Groonga's TokenBigram tokenizer doesn't tokenize runs of alphabets or runs of digits into two-character tokens, to reduce search noise. The TokenBigramSplitSymbolAlphaDigit tokenizer should be used to force tokenizing into two-character tokens.) "web search service" is tokenized to "web", "search" and "service". "sound engine" is tokenized to "sound" and "engine".
How to use
Groonga provides the /reference/commands/suggest command to use correction. The --types correct option requests correction.
For example, here is a command to get correction results by "saerch":

Execution example:
suggest --table item_query --column kana --types correct --frequency_threshold 1 --query saerch
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "correct": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "search",
#         1
#       ]
#     ]
#   }
# ]

How it learns
Cooccurrence search uses learned data. The data are based on query logs, access logs and so on.
To create learned data, Groonga needs submitted inputs with time stamps.
For example, a user wants to search by "search" but types "saerch" by mistake before inputting the correct query. The user inputs the query with the following sequence:
1. 2011-08-10T13:33:23+09:00: s
2. 2011-08-10T13:33:23+09:00: sa
3. 2011-08-10T13:33:24+09:00: sae
4. 2011-08-10T13:33:24+09:00: saer
5. 2011-08-10T13:33:24+09:00: saerc
6. 2011-08-10T13:33:25+09:00: saerch (submit!)
7. 2011-08-10T13:33:29+09:00: serch (correcting...)
8. 2011-08-10T13:33:30+09:00: search (submit!)
Groonga can learn from the input sequence by the following command:
load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "s"},
{"sequence": "1", "time": 1312950803.96857, "item": "sa"},
{"sequence": "1", "time": 1312950804.26057, "item": "sae"},
{"sequence": "1", "time": 1312950804.56057, "item": "saer"},
{"sequence": "1", "time": 1312950804.76057, "item": "saerc"},
{"sequence": "1", "time": 1312950805.76057, "item": "saerch", "type": "submit"},
{"sequence": "1", "time": 1312950809.76057, "item": "serch"},
{"sequence": "1", "time": 1312950810.86057, "item": "search", "type": "submit"}
]
Suggestion
This section describes the following suggestion features:
• How it works
• How to use
• How to learn
How it works
The suggestion feature uses one search to compute suggested words:
1. Cooccurrence search against learned data.
Cooccurrence search
Cooccurrence search can find related words from a user's input. It uses user submissions that are learned from query logs, access logs and so on.
For example, there are the following user submissions:
┌────────────────────┐
│query               │
├────────────────────┤
│search engine       │
├────────────────────┤
│web search realtime │
└────────────────────┘
Groonga creates the following suggestion pairs:
┌─────────┬─────────────────────┐
│input    │ suggested words     │
├─────────┼─────────────────────┤
│search   │ search engine       │
├─────────┼─────────────────────┤
│engine   │ search engine       │
├─────────┼─────────────────────┤
│web      │ web search realtime │
├─────────┼─────────────────────┤
│search   │ web search realtime │
├─────────┼─────────────────────┤
│realtime │ web search realtime │
└─────────┴─────────────────────┘
Those pairs are created by the following steps:
1. Tokenize the user input query by the TokenDelimit tokenizer, which uses a space as the token delimiter. (e.g. "search engine" is tokenized to the two tokens "search" and "engine".)
2. Create a pair that consists of a token and the original query, for each token.
If a user inputs "search", cooccurrence search returns "search engine" and "web search realtime" because "search" is in two input rows and the corresponding suggested words are "search engine" and "web search realtime".
How to use
Groonga provides the /reference/commands/suggest command to use suggestion. The --types suggest option requests suggestion.
For example, here is a command to get suggestion results by "search":

Execution example:
suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "suggest": [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "search engine",
#         1
#       ],
#       [
#         "web search realtime",
#         1
#       ]
#     ]
#   }
# ]

How it learns
Cooccurrence search uses learned data. The data are based on query logs, access logs and so on.
To create learned data, Groonga needs user submissions with time stamps.
For example, a user submits queries with the following sequence:
1. 2011-08-10T13:33:23+09:00: search engine (submit)
2. 2011-08-10T13:33:28+09:00: web search realtime (submit)
Groonga can learn from the submissions by the following command:
load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "search engine", "type": "submit"},
{"sequence": "1", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"}
]
How to extract learning data
The learning data are stored in the item_DATASET and pair_DATASET tables. By using the select command on those tables, you can extract all learning data.
Here are the queries to extract all learning data:
select item_DATASET --limit -1
select pair_DATASET --filter 'freq0 > 0 || freq1 > 0 || freq2 > 0' --limit -1
Without '--limit -1', you can't get all data. In the pair table, a valid record has a freq0, freq1 or freq2 column value larger than 0.
Don't execute the above queries via HTTP requests because an enormous number of records will be fetched.

Indexing
Groonga supports both online index construction and offline index construction since 2.0.0.
Online index construction
In online index construction, registered documents quickly become searchable while indexing. But indexing costs more than offline index construction.
Online index construction is suitable for a search system that values freshness.
For example, a search system for tweets, news, blog posts and so on will value freshness. Online index construction can make fresh documents searchable and keeps them searchable while indexing.
Offline index construction
In offline index construction, the indexing cost is lower than that of online index construction. Indexing time will be shorter. The index will be smaller. Resources required for indexing will be smaller. But a newly registered document isn't searchable until all registered documents are indexed.
Offline index construction is suitable for a search system that values low resource usage. If a search system doesn't value freshness, offline index construction will be suitable. For example, a reference manual search system doesn't value freshness because a reference manual will be updated only at a release.
How to use
Groonga uses online index construction by default. Once we register a document, we can search it quickly.
Groonga uses offline index construction when an index is added to a column that already has data.
We define a schema:

Execution example:
table_create Tweets TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tweets content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_HASH_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

We register data:

Execution example:
load --table Tweets
[
{"content":"Hello!"},
{"content":"I just start it!"},
{"content":"I'm sleepy... Have a nice day... Good night..."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

We can search with sequential search when we don't have an index:

Execution example:
select Tweets --match_columns content --query 'good nice'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "content",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "I'm sleepy... Have a nice day... Good night..."
#       ]
#     ]
#   ]
# ]

We create an index for Tweets.content. Already registered data in Tweets.content are indexed by offline index construction:

Execution example:
column_create Lexicon tweet COLUMN_INDEX|WITH_POSITION Tweets content
# [[0, 1337566253.89858, 0.000355720520019531], true]

We search with the index and get a matched record:

Execution example:
select Tweets --match_columns content --query 'good nice'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "content",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "I'm sleepy... Have a nice day... Good night..."
#       ]
#     ]
#   ]
# ]

We register data again. They are indexed by online index construction:

Execution example:
load --table Tweets
[
{"content":"Good morning! Nice day."},
{"content":"Let's go shopping."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

We can also get newly registered records by searching:

Execution example:
select Tweets --match_columns content --query 'good nice'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "content",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "I'm sleepy... Have a nice day... Good night..."
#       ],
#       [
#         4,
#         "Good morning! Nice day."
#       ]
#     ]
#   ]
# ]

Sharding
New in version 5.0.0.
Groonga has /limitations against table size. You can't add more than 268,435,455 records to one table.
Groonga supports time based sharding to resolve the limitation. It works in the same database. It doesn't work with multiple databases.
It means that this sharding feature isn't for distributing large data to multiple hosts. If you want a distributed sharding feature, use Mroonga or PGroonga; then you can use the sharding features of MySQL or PostgreSQL. You'll be able to use Droonga for distributed sharding soon.
Summary
Sharding is implemented in the sharding plugin. The plugin is written in mruby. You need to enable mruby when you build Groonga. You can confirm whether your Groonga supports mruby or not by the --version command line argument of /reference/executables/groonga:
% groonga --version
groonga 5.0.5 [...,mruby,...]
configure options: <...>
If you find mruby, your Groonga supports mruby.
The sharding plugin provides only search commands. They have the logical_ prefix in their names, such as /reference/commands/logical_select and /reference/commands/logical_range_filter.
The sharding plugin doesn't provide schema definition commands or data load commands yet. You need to use existing commands such as /reference/commands/table_create, /reference/commands/column_create and /reference/commands/load.
The sharding plugin requires some rules for tables and columns. You need to follow these rules. They are described later.
Glossary
┌───────────────────┬───────────────────────────────────┐
│Name               │ Description                       │
├───────────────────┼───────────────────────────────────┤
│Logical table      │ It's a table that consists of     │
│                   │ shards. It doesn't exist in the   │
│                   │ Groonga database. It just exists  │
│                   │ in our minds.                     │
├───────────────────┼───────────────────────────────────┤
│Logical table name │ The name of a logical table. It's │
│                   │ the prefix of shard names. For    │
│                   │ example, Logs is a logical table  │
│                   │ name and Logs_20150814 and        │
│                   │ Logs_20150815 are shard names.    │
├───────────────────┼───────────────────────────────────┤
│Shard              │ It's a table that has the records │
│                   │ of one day or month. One shard    │
│                   │ has only partial records.         │
│                   │                                   │
│                   │ Shard name (= table name) must    │
│                   │ follow the                        │
│                   │ ${LOGICAL_TABLE_NAME}_${YYYYMMDD} │
│                   │ format or the                     │
│                   │ ${LOGICAL_TABLE_NAME}_${YYYYMM}   │
│                   │ format. ${LOGICAL_TABLE_NAME} is  │
│                   │ expanded to the logical table     │
│                   │ name. ${YYYYMMDD} is expanded to  │
│                   │ a day. ${YYYYMM} is expanded to   │
│                   │ a month.                          │
│                   │                                   │
│                   │ For example, Logs_20150814        │
│                   │ consists of the logical table     │
│                   │ name Logs and the day 20150814.   │
└───────────────────┴───────────────────────────────────┘
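As a minimal sketch of the naming rule and a logical_ command, here are two hypothetical day shards of a Logs logical table and a search across them (the column layout and the shard_key value are illustrative; see /reference/commands/logical_select for the command's parameters):

table_create Logs_20150814 TABLE_NO_KEY
column_create Logs_20150814 timestamp COLUMN_SCALAR Time
column_create Logs_20150814 message COLUMN_SCALAR Text
table_create Logs_20150815 TABLE_NO_KEY
column_create Logs_20150815 timestamp COLUMN_SCALAR Time
column_create Logs_20150815 message COLUMN_SCALAR Text
logical_select --logical_table Logs --shard_key timestamp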
Rules
TODO
Commands

Log
Groonga has two log files. They are the process log and the query log. The process log covers all of the groonga executable's works. The query log is just for query processing.
Process log
Process log is enabled by default. The log path can be customized by the --log-path option.
Each log has a log level. If a log's level is lower than the groonga process's log level, it isn't logged. The log level can be customized by -l or commands/log_level.
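For example, here is a sketch of changing the level at runtime (the chosen level is illustrative; see commands/log_level for the accepted levels):

log_level --level warning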
Format
Process log uses the following format:
#{TIME_STAMP}|#{L}| #{MESSAGE}
TIME_STAMP
The time stamp uses the following format:
YYYY-MM-DD hh:mm:ss.SSSSSS
YYYY
Year with four digits.
MM
Month with two digits.
DD
Day with two digits.
hh
Hour with two digits.
mm
Minute with two digits.
ss
Second with two digits.
SSSSSS
Microsecond with six digits.
Example:
2011-07-05 06:25:18.345734
L
Log level with a character. Here is the character to log level map.
E
Emergency
A
Alert
C
Critical
e
Error
w
Warning
n
Notification
i
Information
d
Debug
-
Dump
Example:
E
MESSAGE
Details about the log in free format.
Example:
log opened.
Example:
2011-07-05 08:35:09.276421|n| grn_init
2011-07-05 08:35:09.276553|n| RLIMIT_NOFILE(4096,4096)
Query log
Query log is disabled by default. It can be enabled by the --query-log-path option.
Format
Query log uses the following formats:
#{TIME_STAMP}|#{MESSAGE}
#{TIME_STAMP}|#{ID}|>#{QUERY}
#{TIME_STAMP}|#{ID}|:#{ELAPSED_TIME} #{PROGRESS}
#{TIME_STAMP}|#{ID}|<#{ELAPSED_TIME} #{RETURN_CODE}
TIME_STAMP
The time stamp uses the following format:
YYYY-MM-DD hh:mm:ss.SSSSSS
YYYY
Year with four digits.
MM
Month with two digits.
DD
Day with two digits.
hh
Hour with two digits.
mm
Minute with two digits.
ss
Second with two digits.
SSSSSS
Microsecond with six digits.
Example:
2011-07-05 06:25:18.345734
ID
The ID of a thread. A Groonga process creates threads to process requests concurrently, and each thread outputs some logs for a request. This ID can be used to extract the log sequence of one thread.
Example:
45ea3034
>
A character that indicates that a query is started.
:
A character that indicates that a query is being processed.
<
A character that indicates that a query is finished.
MESSAGE
Details about the log in free format.
Example:
query log opened.
QUERY
A query to be processed.
Example:
select users --match_columns hobby --query music
ELAPSED_TIME
Elapsed time in nanoseconds since the query started.
Example:
000000000075770 (It means 75,770 nanoseconds.)
PROGRESS
The work processed so far.
Example:
select(313401) (It means that 'select' is processed and 313,401 records remain.)
RETURN_CODE
A return code for the query.
Example:
rc=0 (It means the return code is 0. 0 means GRN_SUCCESS.)
Example:
2011-07-05 06:25:19.458756|45ea3034|>select Properties --limit 0
2011-07-05 06:25:19.458829|45ea3034|:000000000072779 select(19)
2011-07-05 06:25:19.458856|45ea3034|:000000000099998 output(0)
2011-07-05 06:25:19.458875|45ea3034|<000000000119062 rc=0
2011-07-05 06:25:19.458986|45ea3034|>quit

Tuning
Summary
There are some tuning parameters for handling a large database.
Parameters
This section describes the tuning parameters.
The max number of open files per process
This parameter is for handling a large database.
Groonga creates one or more files per table and column. If your database has many tables and columns, the Groonga process needs to open many files.
The system limits the max number of open files per process, so you need to relax the limitation.
Here is an expression that computes how many files are opened by Groonga:
3 (for DB) + N tables + N columns (except index columns) + (N index columns * 2) + X (the number of plugins etc.)
Here is an example schema:
table_create Entries TABLE_HASH_KEY ShortText
column_create Entries content COLUMN_SCALAR Text
column_create Entries n_likes COLUMN_SCALAR UInt32
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
This example opens at least 11 files:
3 + 2 (Entries and Terms) + 2 (Entries.content and Entries.n_likes) + 4 (Terms.entries_key_index and Terms.entries_content_index) + X = 11 + X
Memory usage
This parameter is for handling a large database.
Groonga maps database files onto memory and accesses them. Groonga doesn't map unnecessary files onto memory; it maps files when they are needed.
If you access all data in the database, all database files are mapped onto memory. If the total size of your database files is 6GiB, your Groonga process uses 6GiB of memory.
Normally, not all of your database files are mapped onto memory, but it can happen; dumping your database is an example case.
Normally, you must have memory and swap that are larger than your database. Linux has tuning parameters to work with less memory and swap than the database size.
Linux
This section describes how to configure parameters on Linux.
nofile
You can relax the max number of open files per process by creating a configuration file /etc/security/limits.d/groonga.conf that has the following content:
${USER} soft nofile ${MAX_VALUE}
${USER} hard nofile ${MAX_VALUE}
If you run the Groonga process as the groonga user and it needs to open fewer than 10000 files, use the following configuration:
groonga soft nofile 10000
groonga hard nofile 10000
The configuration is applied after your Groonga service is restarted or you re-login as the groonga user.
vm.overcommit_memory
This is a memory usage related parameter. You can handle a database that is larger than your memory and swap by setting the vm.overcommit_memory kernel parameter to 1. 1 means that Groonga can always map database files onto memory. This configuration is recommended for Groonga.
See the Linux kernel documentation about overcommit for vm.overcommit_memory parameter details.
You can set the configuration by putting a configuration file /etc/sysctl.d/groonga.conf that has the following content:
vm.overcommit_memory = 1
The configuration can be applied by restarting your system or by running the following command:
% sudo sysctl --system
vm.max_map_count
This is a memory usage related parameter. You can handle a database larger than 16GiB by increasing the vm.max_map_count kernel parameter. The parameter limits the max number of memory maps.
The default value of the kernel parameter may be 65530 or 65536. Groonga maps a 256KiB memory chunk at one time. If a database is larger than 16GiB, Groonga reaches the limitation. (256KiB * 65536 = 16GiB)
You need to increase the value of the kernel parameter to handle a database of 16GiB or larger. For example, you can handle an almost 32GiB database with 65536 * 2 = 131072. You can set the configuration by putting a configuration file /etc/sysctl.d/groonga.conf that has the following content:
vm.max_map_count = 131072
Note that your real configuration file will be the following because you already have the vm.overcommit_memory configuration:
vm.overcommit_memory = 1
vm.max_map_count = 131072
The configuration can be applied by restarting your system or by running the following command:
% sudo sysctl -p
FreeBSD
This section describes how to configure parameters on FreeBSD.
kern.maxfileperproc
TODO

API
Groonga can be used as a full text search library. This section describes the APIs that are provided by Groonga.
Overview
Summary
You can use Groonga as a library. You need to use the following APIs to initialize and finalize Groonga. grn_init() initializes Groonga. In contrast, grn_fin() finalizes Groonga.
You must call grn_init() only once before you use APIs provided by Groonga. You must call grn_fin() only once after you finish using APIs provided by Groonga.
Example
Here is an example that uses Groonga as a full-text search library.
grn_rc rc;
/* It initializes resources used by Groonga. */
rc = grn_init();
if (rc != GRN_SUCCESS) {
  return EXIT_FAILURE;
}
/* Some Groonga API calling codes... */
/* It releases resources used by Groonga. */
grn_fin();
return EXIT_SUCCESS;
Reference
grn_rc grn_init(void)
grn_init() initializes resources that are used by Groonga.
You must call it just once before you call other Groonga APIs.
Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_fin(void)
grn_fin() releases resources that are used by Groonga. You can't call other Groonga APIs after you call grn_fin().
Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.
Global configurations
Summary
Groonga has global configurations. You can access them by API.
Reference
int grn_get_lock_timeout(void)
Returns the lock timeout.
A grn_ctx acquires a lock to update a shared value. If another grn_ctx is already updating the same value, the grn_ctx that tries to acquire the lock fails to acquire it. The grn_ctx that failed to acquire the lock waits 1 millisecond and then tries to acquire the lock again. The try is repeated up to timeout times. If the grn_ctx can't acquire the lock within timeout tries, the operation fails.
The default lock timeout is 10000000. It means that Groonga doesn't report a lock failure for about 3 hours. (1 * 10000000 [msec] = 10000 [sec] = 166.666... [min] = 2.777... [hour])
Returns
The lock timeout.
grn_rc grn_set_lock_timeout(int timeout)
Sets the lock timeout. See grn_get_lock_timeout() about the lock timeout.
There are some special values for timeout.
• 0: It means that Groonga doesn't retry acquiring a lock. Groonga reports a failure after one lock acquisition failure.
• negative value: It means that Groonga retries acquiring a lock until Groonga can acquire it.
Parameters
• timeout -- The new lock timeout.
Returns
GRN_SUCCESS. It doesn't fail.
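Here is a minimal sketch of reading and adjusting the timeout with the two functions above (the chosen value is illustrative):

#include <groonga.h>
#include <stdio.h>

int
main(void)
{
  if (grn_init() != GRN_SUCCESS) {
    return 1;
  }
  /* Report the current timeout: 10000000 by default. */
  printf("lock timeout: %d\n", grn_get_lock_timeout());
  /* Fail after roughly 10 seconds of 1 millisecond retries
     instead of about 3 hours. */
  grn_set_lock_timeout(10000);
  /* Some Groonga API calling codes... */
  grn_fin();
  return 0;
}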
void grn_plugin_mutex_close(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_close() finalizes an object of grn_plugin_mutex and then frees memory allocated for that object.
void grn_plugin_mutex_lock(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_lock() locks a mutex object. If the object is already locked, the calling thread waits until the object is unlocked.
void grn_plugin_mutex_unlock(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_unlock() unlocks a mutex object. grn_plugin_mutex_unlock() should not be called for an unlocked object.
grn_obj *grn_plugin_proc_alloc(grn_ctx *ctx, grn_user_data *user_data, grn_id domain, grn_obj_flags flags)
grn_plugin_proc_alloc() allocates a grn_obj object. You can use it in a function that is registered as GRN_PROC_FUNCTION.
grn_obj *grn_plugin_proc_get_var(grn_ctx *ctx, grn_user_data *user_data, const char *name, int name_size)
It gets a variable value from grn_user_data by specifying the variable name.
Parameters
• name -- The variable name.
• name_size -- The number of bytes of name. If name_size is negative, name must be NUL-terminated. In that case, name_size is computed by strlen(name).
Returns A variable value on success, NULL otherwise.
grn_obj *grn_plugin_proc_get_var_by_offset(grn_ctx *ctx, grn_user_data *user_data, unsigned int offset)
It gets a variable value from grn_user_data by specifying the offset position of the variable.
Parameters
• offset -- The offset position of the variable.
Returns A variable value on success, NULL otherwise.
const char *grn_plugin_win32_base_dir(void)
Deprecated since version 5.0.9: Use grn_plugin_windows_base_dir() instead. It returns the Groonga install directory. The install directory is computed from the directory that has groonga.dll. You can use the directory to generate install-directory-aware paths. It only works on Windows. It returns NULL on other platforms.
const char *grn_plugin_windows_base_dir(void)
New in version 5.0.9. It returns the Groonga install directory. The install directory is computed from the directory that has groonga.dll. You can use the directory to generate install-directory-aware paths. It only works on Windows. It returns NULL on other platforms.
int grn_plugin_charlen(grn_ctx *ctx, const char *str_ptr, unsigned int str_length, grn_encoding encoding)
grn_plugin_charlen() returns the length (in bytes) of the first character in the string specified by str_ptr and str_length. If the starting bytes are invalid as a character, grn_plugin_charlen() returns 0. See grn_encoding in "groonga.h" for more details of encoding.
int grn_plugin_isspace(grn_ctx *ctx, const char *str_ptr, unsigned int str_length, grn_encoding encoding)
grn_plugin_isspace() returns the length (in bytes) of the first character in the string specified by str_ptr and str_length if it is a space character. Otherwise, grn_plugin_isspace() returns 0.
grn_rc grn_plugin_expr_var_init(grn_ctx *ctx, grn_expr_var *var, const char *name, int name_size)
It initializes a grn_expr_var.
Parameters
• var -- The pointer of the grn_expr_var object to be initialized.
• name -- The name of the grn_expr_var object to be initialized.
• name_size -- The number of bytes of name. If name_size is negative, name must be NUL-terminated. In that case, name_size is computed by strlen(name).
Returns GRN_SUCCESS. It doesn't fail.
grn_obj *grn_plugin_command_create(grn_ctx *ctx, const char *name, int name_size, grn_proc_func func, unsigned int n_vars, grn_expr_var *vars)
It creates a command.
Parameters
• name -- The proc name of the command to be created.
• name_size -- The number of bytes of name. If name_size is negative, name must be NUL-terminated. In that case, name_size is computed by strlen(name).
• func -- The function to be called by the created command.
• n_vars -- The number of the variables of the command to create.
• vars -- The pointer of initialized grn_expr_var objects.
Returns The created command object if it creates a command successfully, NULL otherwise. See ctx for error details.

grn_cache
Summary
NOTE: This API is experimental.
grn_cache is a data store that keeps responses of the /reference/commands/select command. It is not a general-purpose cache object. It is only for the /reference/commands/select command. You can just change the current cache object by grn_cache_current_set(). Caching of /reference/commands/select command responses is done internally.
The /reference/commands/select command uses one global cache object. If you open multiple databases, that one cache is shared among them. This is an important problem. If you open multiple databases and use the /reference/commands/select command, you need to use grn_cache objects. This is the /reference/executables/groonga-httpd case. If you open only one database or don't use the /reference/commands/select command, you don't need to use grn_cache objects. This is the rroonga case.

Example
Here is an example that changes the current cache object.

grn_cache *cache;
grn_cache *cache_previous;
cache = grn_cache_open(ctx);
cache_previous = grn_cache_current_get(ctx);
grn_cache_current_set(ctx, cache);
/* grn_ctx_send(ctx, ...); */
grn_cache_current_set(ctx, cache_previous);

Reference
grn_cache
It is an opaque cache object. You can create a grn_cache by grn_cache_open() and free the created object by grn_cache_close().
grn_cache *grn_cache_open(grn_ctx *ctx)
Creates a new cache object. If memory allocation for the new cache object fails, NULL is returned. Error information is stored into ctx.
Parameters
• ctx -- The context.
Returns A newly allocated cache object on success, NULL otherwise. The returned cache object must be freed by grn_cache_close().
grn_rc grn_cache_close(grn_ctx *ctx, grn_cache *cache)
Frees resources of the cache.
Parameters
• ctx -- The context.
• cache -- The cache object to be freed.
Returns GRN_SUCCESS on success, not GRN_SUCCESS otherwise.
grn_rc grn_cache_current_set(grn_ctx *ctx, grn_cache *cache)
Sets the cache object that is used in the /reference/commands/select command.
Parameters
• ctx -- The context.
• cache -- The cache object that is used in the /reference/commands/select command.
Returns GRN_SUCCESS on success, not GRN_SUCCESS otherwise.
grn_cache *grn_cache_current_get(grn_ctx *ctx)
Gets the cache object that is used in the /reference/commands/select command.
Parameters
• ctx -- The context.
Returns The cache object that is used in the /reference/commands/select command. It may be NULL.
grn_rc grn_cache_set_max_n_entries(grn_ctx *ctx, grn_cache *cache, unsigned int n)
Sets the max number of entries of the cache object.
Parameters
• ctx -- The context.
• cache -- The cache object to be changed.
• n -- The new max number of entries of the cache object.
Returns GRN_SUCCESS on success, not GRN_SUCCESS otherwise.
unsigned int grn_cache_get_max_n_entries(grn_ctx *ctx, grn_cache *cache)
Gets the max number of entries of the cache object.
Parameters
• ctx -- The context.
• cache -- The target cache object.
Returns The max number of entries of the cache object.

grn_column
Summary
TODO...
Example
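Here is a minimal sketch that defines an Int32 scalar column named age on an existing table; it assumes that ctx is an initialized grn_ctx that uses a database and that table is an existing table object, so treat the names as illustrative. grn_column_create() and its flags are described in the reference below.

grn_obj *column;
column = grn_column_create(ctx, table,
                           "age", strlen("age"),
                           NULL, /* Assign the file path automatically. */
                           GRN_OBJ_PERSISTENT | GRN_OBJ_COLUMN_SCALAR,
                           grn_ctx_at(ctx, GRN_DB_INT32));
if (!column) {
  /* See ctx->rc and ctx->errbuf for error details. */
}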
Reference GRN_COLUMN_NAME_ID It returns the name of /reference/columns/pseudo _id. It is useful to use with GRN_COLUMN_NAME_ID_LEN like the following: grn_obj *id_column; id_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_ID, GRN_COLUMN_NAME_ID_LEN); Since 3.1.1. GRN_COLUMN_NAME_ID_LEN It returns the byte size of GRN_COLUMN_NAME_ID. Since 3.1.1. GRN_COLUMN_NAME_KEY It returns the name of /reference/columns/pseudo _key. It is useful to use with GRN_COLUMN_NAME_KEY_LEN like the following: grn_obj *key_column; key_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_KEY, GRN_COLUMN_NAME_KEY_LEN); Since 3.1.1. GRN_COLUMN_NAME_KEY_LEN It returns the byte size of GRN_COLUMN_NAME_KEY. Since 3.1.1. GRN_COLUMN_NAME_VALUE It returns the name of /reference/columns/pseudo _value. It is useful to use with GRN_COLUMN_NAME_VALUE_LEN like the following: grn_obj *value_column; value_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_VALUE, GRN_COLUMN_NAME_VALUE_LEN); Since 3.1.1. GRN_COLUMN_NAME_VALUE_LEN It returns the byte size of GRN_COLUMN_NAME_VALUE. Since 3.1.1. GRN_COLUMN_NAME_SCORE It returns the name of /reference/columns/pseudo _score. It is useful to use with GRN_COLUMN_NAME_SCORE_LEN like the following: grn_obj *score_column; score_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_SCORE, GRN_COLUMN_NAME_SCORE_LEN); Since 3.1.1. GRN_COLUMN_NAME_SCORE_LEN It returns the byte size of GRN_COLUMN_NAME_SCORE. Since 3.1.1. GRN_COLUMN_NAME_NSUBRECS It returns the name of /reference/columns/pseudo _nsubrecs. It is useful to use with GRN_COLUMN_NAME_NSUBRECS_LEN like the following: grn_obj *nsubrecs_column; nsubrecs_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_NSUBRECS, GRN_COLUMN_NAME_NSUBRECS_LEN); Since 3.1.1. GRN_COLUMN_NAME_NSUBRECS_LEN It returns the byte size of GRN_COLUMN_NAME_NSUBRECS. Since 3.1.1. 
grn_obj *grn_column_create(grn_ctx *ctx, grn_obj *table, const char *name, unsigned int name_size, const char *path, grn_obj_flags flags, grn_obj *type)
Defines a new column in table. name can't be omitted. You can't define multiple columns with the same name in one table.
Parameters
• table -- Specifies the target table.
• name -- Specifies the column name.
• name_size -- Specifies the size (in bytes) of the name parameter.
• path -- Specifies the file path where the column is stored. It is valid only when GRN_OBJ_PERSISTENT is specified in flags. If NULL, a file path is assigned automatically.
• flags -- Specifying GRN_OBJ_PERSISTENT creates a persistent column. Specifying GRN_OBJ_COLUMN_INDEX creates an inverted index. Specifying GRN_OBJ_COLUMN_SCALAR stores a scalar value (a single value). Specifying GRN_OBJ_COLUMN_VECTOR stores an array of values. Specifying GRN_OBJ_COMPRESS_ZLIB stores values compressed with zlib. Specifying GRN_OBJ_COMPRESS_LZO stores values compressed with LZO. Specifying GRN_OBJ_WITH_SECTION together with GRN_OBJ_COLUMN_INDEX stores section (paragraph) information in the inverted index. Specifying GRN_OBJ_WITH_WEIGHT together with GRN_OBJ_COLUMN_INDEX stores weight information in the inverted index. Specifying GRN_OBJ_WITH_POSITION together with GRN_OBJ_COLUMN_INDEX stores occurrence position information in the inverted index.
• type -- Specifies the type of the column values. A predefined type or a table can be specified.
grn_rc grn_column_index_update(grn_ctx *ctx, grn_obj *column, grn_id id, unsigned int section, grn_obj *oldvalue, grn_obj *newvalue)
Updates the entries corresponding to id and section in the column values that correspond to the keys derived from oldvalue and newvalue. column must be a column of the GRN_OBJ_COLUMN_INDEX type.
Parameters
• column -- Specifies the target column.
• id -- Specifies the ID of the target record.
• section -- Specifies the section number of the target record.
• oldvalue -- Specifies the value before the update.
• newvalue -- Specifies the value after the update.
grn_obj *grn_column_table(grn_ctx *ctx, grn_obj *column)
Returns the table that column belongs to.
Parameters
• column -- Specifies the target column.
grn_rc grn_column_rename(grn_ctx *ctx, grn_obj *column, const char *name, unsigned int name_size)
Updates the name corresponding to column to name in the db used by ctx. column must be a persistent object.
Parameters
• column -- Specifies the target column.
• name -- Specifies the new name.
• name_size -- Specifies the size (in bytes) of the name parameter.
int grn_column_name(grn_ctx *ctx, grn_obj *obj, char *namebuf, int buf_size)
Returns the length of the name of the column obj. If buf_size is equal to or larger than the name length, the name is copied into namebuf.
Parameters
• obj -- Specifies the target object.
• namebuf -- Specifies a buffer (prepared by the caller) to store the name.
• buf_size -- Specifies the size (in bytes) of namebuf.
int grn_column_index(grn_ctx *ctx, grn_obj *column, grn_operator op, grn_obj **indexbuf, int buf_size, int *section)
Returns the number of indexes on column that can execute the operation op. It also returns their IDs into indexbuf, up to the number specified by buf_size.
Parameters
• column -- Specifies the target column.
• op -- Specifies the operation you want to execute with the index.
• indexbuf -- Specifies a buffer (prepared by the caller) to store the indexes.
• buf_size -- Specifies the size of indexbuf.
• section -- Specifies an int-sized buffer (prepared by the caller) to store the section number.
grn_rc grn_column_truncate(grn_ctx *ctx, grn_obj *column)
NOTE: This is a dangerous API. You must not use this API when another thread or process accesses the target column. If you use this API against a shared column, the processes that access the column may crash and the column may be broken.
New in version 4.0.9.
Clears all values in the column.
Parameters
• column -- The column to be truncated.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_command_version
Summary
TODO...
Example
TODO...
Reference
grn_command_version
GRN_COMMAND_VERSION_MIN
GRN_COMMAND_VERSION_STABLE
GRN_COMMAND_VERSION_MAX
grn_command_version grn_get_default_command_version(void)
Returns the default command_version.
grn_rc grn_set_default_command_version(grn_command_version version)
Changes the default command_version.
Parameters
• version -- Specifies the new default command_version.

grn_content_type
Summary
grn_content_type shows an input type and output type. Currently, it is used only for the output type. Normally, you don't need to use this type. It is used internally in grn_ctx_send().
Reference
grn_content_type
Here are the available values:
GRN_CONTENT_NONE
It means outputting nothing or using the original format. /reference/commands/dump uses this type.
GRN_CONTENT_TSV
It means the tab separated values format.
GRN_CONTENT_JSON
It means the JSON format.
GRN_CONTENT_XML
It means the XML format.
GRN_CONTENT_MSGPACK
It means the MessagePack format. You need the MessagePack library when building Groonga. If you don't have the MessagePack library, you can't use this type.

grn_ctx
Summary
grn_ctx is the most important object. grn_ctx keeps the current information such as:
• The last occurred error.
• The current encoding.
• The default thresholds. (e.g. select-match-escalation-threshold)
• The default command version. (See /reference/command/command_version)
grn_ctx provides platform features such as:
• Memory management.
• Logging.
Most APIs receive grn_ctx as the first argument. You can't use the same grn_ctx from two or more threads. You need to create a grn_ctx per thread. You can use two or more grn_ctx objects in a thread, but it is not needed for usual use-cases.
Example
TODO...
Reference
grn_ctx
TODO...
grn_rc grn_ctx_init(grn_ctx *ctx, int flags)
Initializes ctx.
Parameters
• ctx -- Specifies a pointer to the ctx structure to be initialized.
• flags -- Specifies options for the ctx to be initialized.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_ctx_fin(grn_ctx *ctx)
Releases the memory managed by ctx and finishes using it. If ctx is initialized by grn_ctx_open(), not grn_ctx_init(), you need to use grn_ctx_close() instead of grn_ctx_fin().
Parameters
• ctx -- Specifies a pointer to the ctx structure to be released.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_ctx *grn_ctx_open(int flags)
Returns an initialized grn_ctx object. While the entity of a grn_ctx object initialized by grn_ctx_init() is allocated by the API caller, grn_ctx_open() allocates the entity inside the Groonga library. A grn_ctx initialized by either function can be released by grn_ctx_fin(). For a grn_ctx structure allocated by grn_ctx_open(), it is safe to release a grn_obj created with that grn_ctx by grn_obj_close() even after the grn_ctx has been released by grn_ctx_fin().
Parameters
• flags -- Specifies options for the ctx to be initialized.
Returns An initialized grn_ctx object.
grn_rc grn_ctx_close(grn_ctx *ctx)
It calls grn_ctx_fin() and frees the memory allocated for ctx by grn_ctx_open().
Parameters
• ctx -- The no longer needed grn_ctx.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_ctx_set_finalizer(grn_ctx *ctx, grn_proc_func *func)
Sets a function that is called when ctx is destroyed.
Parameters
• ctx -- Specifies the target ctx.
• func -- Specifies the function that is called when ctx is destroyed.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_command_version grn_ctx_get_command_version(grn_ctx *ctx)
Returns the command_version.
grn_rc grn_ctx_set_command_version(grn_ctx *ctx, grn_command_version version)
Changes the command_version.
Parameters
• version -- Specifies the new command_version.
grn_rc grn_ctx_use(grn_ctx *ctx, grn_obj *db)
Specifies the db that ctx operates on. If NULL is specified, ctx enters the state in which it operates on no db (the state just after init). Don't use it with a grn_ctx that has the GRN_CTX_PER_DB flag.
Parameters
• db -- Specifies the db that ctx uses.
grn_obj *grn_ctx_db(grn_ctx *ctx)
Returns the db that ctx currently operates on. Returns NULL if no db is used.
grn_obj *grn_ctx_get(grn_ctx *ctx, const char *name, int name_size)
Searches the db used by ctx for the object corresponding to name and returns it. Returns NULL if no object matches name.
Parameters
• name -- The name of the object to search for.
• name_size -- The number of bytes of name. If a negative value is specified, name is assumed to be a NUL-terminated string.
grn_obj *grn_ctx_at(grn_ctx *ctx, grn_id id)
Searches ctx, or the db used by ctx, for the object corresponding to id and returns it. Returns NULL if no object matches id.
Parameters
• id -- Specifies the id of the object to search for.
grn_rc grn_ctx_get_all_tables(grn_ctx *ctx, grn_obj *tables_buffer)
It pushes all tables in the database of ctx into tables_buffer.
tables_buffer should be initialized as GRN_PVECTOR. You can use GRN_PTR_INIT() with the GRN_OBJ_VECTOR flag to initialize tables_buffer. Here is an example:

grn_rc rc;
grn_obj tables;
int i;
int n_tables;
GRN_PTR_INIT(&tables, GRN_OBJ_VECTOR, GRN_ID_NIL);
rc = grn_ctx_get_all_tables(ctx, &tables);
if (rc != GRN_SUCCESS) {
  GRN_OBJ_FIN(ctx, &tables);
  /* Handle error. */
  return;
}
n_tables = GRN_BULK_VSIZE(&tables) / sizeof(grn_obj *);
for (i = 0; i < n_tables; i++) {
  grn_obj *table = GRN_PTR_VALUE_AT(&tables, i);
  /* Use table. */
}
/* Free resources. */
for (i = 0; i < n_tables; i++) {
  grn_obj *table = GRN_PTR_VALUE_AT(&tables, i);
  grn_obj_unlink(ctx, table);
}
GRN_OBJ_FIN(ctx, &tables);

Parameters
• ctx -- The context object.
• tables_buffer -- The output buffer to store tables.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_content_type grn_ctx_get_output_type(grn_ctx *ctx)
Gets the current output type of the context. Normally, this function isn't needed.
Parameters
• ctx -- The context object.
Returns The output type of the context.
grn_rc grn_ctx_set_output_type(grn_ctx *ctx, grn_content_type type)
Sets the new output type to the context. It is used when executing a command by grn_expr_exec(). If you use grn_ctx_send(), the new output type isn't used. grn_ctx_send() sets the output type from the command line internally. Normally, this function isn't needed.
Parameters
• ctx -- The context object.
• type -- The new output type.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_bool grn_ctx_is_opened(grn_ctx *ctx, grn_id id)
Checks whether the object with the ID is opened or not.
Parameters
• ctx -- The context object.
• id -- The object ID to be checked.
Returns GRN_TRUE if the object with the ID is opened, GRN_FALSE otherwise.

grn_db
Summary
TODO...
Example
TODO...
Reference
TODO...
grn_db
TODO...
grn_db_create_optarg
It is used for specifying options for grn_db_create().
char **grn_db_create_optarg.builtin_type_names
Specifies an array of NUL-terminated strings that are the names of the built-in types.
int grn_db_create_optarg.n_builtin_type_names
Specifies the number of strings specified by optarg.builtin_type_names. The offsets in the array correspond to the values of the enum type grn_builtin_type.
grn_obj *grn_db_create(grn_ctx *ctx, const char *path, grn_db_create_optarg *optarg)
Creates a new db.
Parameters
• ctx -- Specifies an initialized grn_ctx.
• path -- Specifies the file path where the created db is stored. If NULL, the db becomes a temporary db. If a non-NULL path is specified, the db becomes a persistent db.
• optarg -- Currently, it is not used. It is just ignored. It was meant to be specified to change the names of the built-in types of the db to be created: optarg.builtin_type_names specifies an array of NUL-terminated strings that are the names of the built-in types, optarg.n_builtin_type_names specifies the number of those strings, and the offsets in the array correspond to the values of the enum type grn_builtin_type.
grn_obj *grn_db_open(grn_ctx *ctx, const char *path)
Opens an existing db.
Parameters
• path -- Specifies the file path of the db to be opened.
void grn_db_touch(grn_ctx *ctx, grn_obj *db)
Sets the last modified time of the db's content to the current time. The last modified time is used, for example, to decide whether caches are valid.
Parameters
• db -- Specifies the db whose content was modified.
grn_obj *grn_obj_db(grn_ctx *ctx, grn_obj *obj)
Returns the db that obj belongs to.
Parameters
• obj -- Specifies the target object.
grn_rc grn_db_recover(grn_ctx *ctx, grn_obj *db)
NOTE: This is an experimental API.
NOTE: This is a dangerous API. You must not use this API when another thread or process opens the target database. If you use this API against a shared database, the database may be broken.
New in version 4.0.9.
Checks the passed database and recovers it if it is broken and can be recovered. This API uses lock existence for checking whether the database is broken or not. Here are the recoverable cases:
• An index column is broken. The index column must have a source column.
Here are the unrecoverable cases:
• The object name management feature is broken.
• A table is broken.
• A data column is broken.
The object name management feature is used for managing table names, column names and so on. If the feature is broken, the database can't be recovered. Please re-create the database from a backup. Tables and data columns can be recovered by removing the remaining lock and re-adding data.
Parameters
• db -- The database to be recovered.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_db_unmap(grn_ctx *ctx, grn_obj *db)
NOTE: This is an experimental API.
NOTE: This is a thread unsafe API. You can't touch the database while this API is running.
New in version 5.0.7.
Unmaps all opened tables and columns in the passed database. Resources used by these opened tables and columns are freed. Normally, this API isn't needed, because resources used by opened tables and columns are managed by the OS automatically.
Parameters
• db -- The database to be unmapped.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_encoding
Summary
TODO...
Example
TODO...
Reference
grn_encoding
TODO...
grn_encoding grn_get_default_encoding(void)
Returns the default encoding.
grn_rc grn_set_default_encoding(grn_encoding encoding)
Changes the default encoding.
Parameters
• encoding -- Specifies the new default encoding.
const char *grn_encoding_to_string(grn_encoding encoding)
Returns the string representation of the encoding. For example, 'grn_encoding_to_string(GRN_ENC_UTF8)' returns '"utf8"'. "unknown" is returned for an invalid encoding.
Parameters
• encoding -- The encoding.
grn_encoding grn_encoding_parse(const char *name)
Parses an encoding name and returns a grn_encoding. For example, 'grn_encoding_parse("UTF8")' returns 'GRN_ENC_UTF8'. GRN_ENC_UTF8 is returned for an invalid encoding name.
Parameters
• name -- The encoding name.

grn_expr
grn_expr is a grn_obj that represents an expression. Here is a list of what an expression can do:
• An expression can apply some operations to a record by grn_expr_exec().
• An expression can represent a search condition. grn_table_select() can select records that match against the search condition represented by an expression.
There are two string representations of expressions:
• /reference/grn_expr/query_syntax
• /reference/grn_expr/script_syntax
grn_expr_parse() parses a string representation of an expression and appends the parsed expression to another expression.
Example
TODO...
Reference
GRN_API grn_obj *grn_expr_create(grn_ctx *ctx, const char *name, unsigned int name_size)
GRN_API grn_rc grn_expr_close(grn_ctx *ctx, grn_obj *expr)
GRN_API grn_obj *grn_expr_add_var(grn_ctx *ctx, grn_obj *expr, const char *name, unsigned int name_size)
GRN_API grn_obj *grn_expr_get_var_by_offset(grn_ctx *ctx, grn_obj *expr, unsigned int offset)
GRN_API grn_obj *grn_expr_append_obj(grn_ctx *ctx, grn_obj *expr, grn_obj *obj, grn_operator op, int nargs)
GRN_API grn_obj *grn_expr_append_const(grn_ctx *ctx, grn_obj *expr, grn_obj *obj, grn_operator op, int nargs)
GRN_API grn_obj *grn_expr_append_const_str(grn_ctx *ctx, grn_obj *expr, const char *str, unsigned int str_size, grn_operator op, int nargs)
GRN_API grn_obj *grn_expr_append_const_int(grn_ctx *ctx, grn_obj *expr, int i, grn_operator op, int nargs)
GRN_API grn_rc grn_expr_append_op(grn_ctx *ctx, grn_obj *expr, grn_operator op, int nargs)
grn_rc grn_expr_get_keywords(grn_ctx *ctx, grn_obj *expr, grn_obj *keywords)
Extracts keywords from expr and stores them into keywords. Keywords in keywords are owned by expr. Don't unlink them.
Each keyword is a GRN_BULK and its domain is GRN_DB_TEXT. keywords must be a GRN_PVECTOR. Here is an example code:

grn_obj keywords;
GRN_PTR_INIT(&keywords, GRN_OBJ_VECTOR, GRN_ID_NIL);
grn_expr_get_keywords(ctx, expr, &keywords);
{
  int i, n_keywords;
  n_keywords = GRN_BULK_VSIZE(&keywords) / sizeof(grn_obj *);
  for (i = 0; i < n_keywords; i++) {
    grn_obj *keyword = GRN_PTR_VALUE_AT(&keywords, i);
    const char *keyword_content;
    int keyword_size;
    keyword_content = GRN_TEXT_VALUE(keyword);
    keyword_size = GRN_TEXT_LEN(keyword);
    /* Use keyword_content and keyword_size.
       You don't need to unlink keyword.
       keyword is owned by expr. */
  }
}
GRN_OBJ_FIN(ctx, &keywords);

Parameters
• ctx -- The context that creates the expr.
• expr -- The expression to be extracted.
• keywords -- The container to store the extracted keywords. It must be a GRN_PVECTOR. Each extracted keyword is a GRN_BULK and its domain is GRN_DB_TEXT. Extracted keywords are owned by expr. Don't unlink them.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_expr_syntax_escape(grn_ctx *ctx, const char *string, int string_size, const char *target_characters, char escape_character, grn_obj *escaped_string)
Escapes target_characters in string by escape_character.
Parameters
• ctx -- Its encoding must be the same encoding as string. It is used for allocating the buffer for escaped_string.
• string -- A string representation of an expression.
• string_size -- The byte size of string. -1 means that string is a NUL-terminated string.
• target_characters -- The NUL-terminated escape target characters. For example, "+-><~*()\"\\:" is target_characters for /reference/grn_expr/query_syntax.
• escape_character -- The character used to escape a character in target_characters. For example, \\ (backslash) is escape_character for /reference/grn_expr/query_syntax.
• escaped_string -- The output of the escaped string. It should be a text typed bulk.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
grn_rc grn_expr_syntax_escape_query(grn_ctx *ctx, const char *query, int query_size, grn_obj *escaped_query)
Escapes special characters in /reference/grn_expr/query_syntax.
Parameters
• ctx -- Its encoding must be the same encoding as query. It is used for allocating the buffer for escaped_query.
• query -- A string representation of an expression in /reference/grn_expr/query_syntax.
• query_size -- The byte size of query. -1 means that query is a NUL-terminated string.
• escaped_query -- The output of the escaped query. It should be a text typed bulk.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.
GRN_API grn_rc grn_expr_compile(grn_ctx *ctx, grn_obj *expr)
GRN_API grn_obj *grn_expr_exec(grn_ctx *ctx, grn_obj *expr, int nargs)
GRN_API grn_obj *grn_expr_alloc(grn_ctx *ctx, grn_obj *expr, grn_id domain, grn_obj_flags flags)

grn_geo
Summary
TODO...
Example
TODO...
Reference
grn_geo_point
grn_rc grn_geo_select_in_rectangle(grn_ctx *ctx, grn_obj *index, grn_obj *top_left_point, grn_obj *bottom_right_point, grn_obj *res, grn_operator op)
It selects records that are in the rectangle specified by the top_left_point parameter and the bottom_right_point parameter. Records are searched by the index parameter. Found records are added to the res parameter table with the op parameter operation.
Parameters
• index -- the index column for the TokyoGeoPoint or WGS84GeoPoint type.
• top_left_point -- the top left point of the target rectangle. (ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
• bottom_right_point -- the bottom right point of the target rectangle.
(ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
• res -- the table to store the found record IDs. It must be a GRN_TABLE_HASH_KEY type table.
• op -- the operator for matched records.
int grn_geo_estimate_in_rectangle(grn_ctx *ctx, grn_obj *index, grn_obj *top_left_point, grn_obj *bottom_right_point)
It estimates the number of records in the rectangle specified by the top_left_point parameter and the bottom_right_point parameter. The number of records is estimated by the index parameter. If an error occurs, -1 is returned.
Parameters
• index -- the index column for the TokyoGeoPoint or WGS84GeoPoint type.
• top_left_point -- the top left point of the target rectangle. (ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
• bottom_right_point -- the bottom right point of the target rectangle. (ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
grn_obj *grn_geo_cursor_open_in_rectangle(grn_ctx *ctx, grn_obj *index, grn_obj *top_left_point, grn_obj *bottom_right_point, int offset, int limit)
It opens a cursor to get records in the rectangle specified by the top_left_point parameter and the bottom_right_point parameter.
Parameters
• index -- the index column for the TokyoGeoPoint or WGS84GeoPoint type.
• top_left_point -- the top left point of the target rectangle. (ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
• bottom_right_point -- the bottom right point of the target rectangle. (ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)
• offset -- the cursor returns records from the offset position. offset is 0-based.
• limit -- the cursor returns at most limit records. -1 means no limit.
grn_posting *grn_geo_cursor_next(grn_ctx *ctx, grn_obj *cursor)
It returns the next posting that has a record ID. It returns NULL after all records are returned.
Parameters
• cursor -- the geo cursor.

grn_hook
Summary
TODO...
Example
TODO...
Reference
grn_hook_entry
TODO...
grn_rc grn_obj_add_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset, grn_obj *proc, grn_obj *data)
Adds a hook to obj.
Parameters
• obj -- Specifies the target object.
• entry -- GRN_HOOK_GET defines a hook that is called when the object is referenced. GRN_HOOK_SET defines a hook that is called when the object is updated. GRN_HOOK_SELECT defines a hook that is called as needed during search processing; it can inspect the progress of the search and instruct Groonga to abort it.
• offset -- The execution order of the hook. The new hook is inserted just before the hook corresponding to offset. If 0 is specified, the new hook is inserted at the head; if -1 is specified, it is appended at the tail. When multiple hooks are defined on an object, they are called in order.
• proc -- Specifies the procedure.
• data -- Specifies hook-specific data.
int grn_obj_get_nhooks(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry)
Returns the number of hooks defined on obj.
Parameters
• obj -- Specifies the target object.
• entry -- Specifies the hook type.
grn_obj *grn_obj_get_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset, grn_obj *data)
Returns the procedure (proc) of a hook defined on obj. If hook-specific data is defined, its content is copied into data.
Parameters
• obj -- Specifies the target object.
• entry -- Specifies the hook type.
• offset -- Specifies the execution order.
• data -- Specifies a buffer to store the hook-specific data.
grn_rc grn_obj_delete_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset)
Deletes a hook defined on obj.
Parameters
• obj -- Specifies the target object.
• entry -- Specifies the hook type.
• offset -- Specifies the execution order.

grn_ii
Summary
buffered index builder. This is an internal API prepared for specific applications.
TODO...
Example
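Here is a hedged sketch of building an inverted index in bulk with the grn_ii_buffer functions declared in the reference below; it assumes that ii is an existing GRN_OBJ_COLUMN_INDEX column, that 0 is an acceptable update_buffer_size, and that value is a text bulk holding each record's content, so treat the details as illustrative.

grn_ii_buffer *ii_buffer;
ii_buffer = grn_ii_buffer_open(ctx, ii, 0); /* 0: assumed default buffer size. */
if (ii_buffer) {
  grn_id rid;
  for (rid = 1; rid <= n_records; rid++) {
    /* value is assumed to hold the text of record rid;
       1 is the first section. */
    grn_ii_buffer_append(ctx, ii_buffer, rid, 1, value);
  }
  grn_ii_buffer_commit(ctx, ii_buffer);
  grn_ii_buffer_close(ctx, ii_buffer);
}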
Reference
grn_ii
grn_ii_buffer
grn_ii_buffer *grn_ii_buffer_open(grn_ctx *ctx, grn_ii *ii, long long unsigned int update_buffer_size)
grn_rc grn_ii_buffer_append(grn_ctx *ctx, grn_ii_buffer *ii_buffer, grn_id rid, unsigned int section, grn_obj *value)
grn_rc grn_ii_buffer_commit(grn_ctx *ctx, grn_ii_buffer *ii_buffer)
grn_rc grn_ii_buffer_close(grn_ctx *ctx, grn_ii_buffer *ii_buffer)

grn_index_cursor
Summary
TODO...
Example
TODO...
Reference
grn_obj *grn_index_cursor_open(grn_ctx *ctx, grn_table_cursor *tc, grn_obj *index, grn_id rid_min, grn_id rid_max, int flags)
Creates and returns a cursor for retrieving, in order, the values of a GRN_OBJ_COLUMN_INDEX column for each record obtained from the grn_table_cursor. You can restrict the record IDs to be retrieved by specifying rid_min and rid_max. The returned grn_index_cursor is released with grn_obj_close().
Parameters
• tc -- Specifies the target cursor.
• index -- Specifies the target index column.
• rid_min -- Specifies the lower bound of the record IDs to output.
• rid_max -- Specifies the upper bound of the record IDs to output.
grn_posting *grn_index_cursor_next(grn_ctx *ctx, grn_obj *ic, grn_id *tid)
Retrieves, one by one, the index values within the range of the cursor. If a value other than NULL is specified as tid, the ID of the current record of the table cursor that was specified when the index cursor was created is returned through it. The returned grn_posting structure doesn't need to be released.
Parameters
• ic -- Specifies the target cursor.
• tid -- Specifies a buffer for the table record ID.

grn_info
Summary
TODO...
Example
TODO...
Reference
grn_info_type
TODO...
grn_obj *grn_obj_get_info(grn_ctx *ctx, grn_obj *obj, grn_info_type type, grn_obj *valuebuf)
Stores the information of obj corresponding to type into valuebuf.
Parameters
• obj -- Specifies the target obj.
• type -- Specifies the kind of information to get.
• valuebuf -- Specifies a buffer (prepared by the caller) to store the value.
grn_rc grn_obj_set_info(grn_ctx *ctx, grn_obj *obj, grn_info_type type, grn_obj *value)
Updates the information of obj corresponding to type with the content of value.
Parameters
• obj -- Specifies the target obj.
• type -- Specifies the kind of information to set.
grn_obj *grn_obj_get_element_info(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_info_type type, grn_obj *value)
Stores into value the information corresponding to type of the record of obj corresponding to id. The caller must prepare a buffer large enough for type.
Parameters
• obj -- Specifies the target obj.
• id -- Specifies the target ID.
• type -- Specifies the kind of information to get.
• value -- Specifies a buffer (prepared by the caller) to store the value.
grn_rc grn_obj_set_element_info(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_info_type type, grn_obj *value)
Updates the information corresponding to type of the record of obj corresponding to id with the content of value.
Parameters
• obj -- Specifies the target object.
• id -- Specifies the target ID.
• type -- Specifies the kind of information to set.
• value -- Specifies the value to set.

grn_match_escalation
Summary
TODO...
Example
TODO...
Reference
long long int grn_ctx_get_match_escalation_threshold(grn_ctx *ctx)
Returns the threshold for escalating the search behavior. See the document about the search specification for details of escalation.
grn_rc grn_ctx_set_match_escalation_threshold(grn_ctx *ctx, long long int threshold)
Changes the threshold for escalating the search behavior. See the document about the search specification for details of escalation.
Parameters
• threshold -- Specifies the new threshold for escalating the search behavior.
long long int grn_get_default_match_escalation_threshold(void)
Returns the default threshold for escalating the search behavior. See the document about the search specification for details of escalation.
grn_rc grn_set_default_match_escalation_threshold(long long int threshold)
Changes the default threshold for escalating the search behavior. See the document about the search specification for details of escalation.
Parameters
• threshold -- Specifies the new default threshold for escalating the search behavior.

grn_obj
Summary
TODO...
Example
TODO...
Reference
grn_obj
TODO...
grn_obj *grn_obj_column(grn_ctx *ctx, grn_obj *table, const char *name, unsigned int name_size)
If name is a column name, returns the corresponding column of table. If no corresponding column exists, returns NULL. If name is an accessor string, returns the corresponding accessor. An accessor string is a string in which column names and so on are concatenated by '.'. '_id' and '_key' are special accessors that return the record ID and the key, respectively. e.g.) 'col1' / 'col2.col3' / 'col2._id'
Parameters
• table -- Specifies the target table.
• name -- Specifies the column name.
grn_bool grn_obj_is_builtin(grn_ctx *ctx, grn_obj *obj)
Checks whether obj is a Groonga built-in object.
Parameters
• ctx -- The context.
• obj -- The target object.
Returns GRN_TRUE for a built-in Groonga object, GRN_FALSE otherwise.
grn_obj *grn_obj_get_value(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_obj *value)
Gets the value of the record of obj corresponding to id. The value is returned as the return value.
Parameters
• obj -- Specifies the target object.
• id -- Specifies the ID of the target record.
• value -- Specifies a buffer (prepared by the caller) to store the value.
int grn_obj_get_values(grn_ctx *ctx, grn_obj *obj, grn_id offset, void **values)
For the column specified by obj, sets into values a pointer to an array that stores, in ascending ID order, the column values of the records whose IDs are consecutive, starting from the record ID specified by offset. The number of obtained values is returned. If an error occurs, -1 is returned.
NOTE: Only columns whose values are fixed-size can be specified as obj. Records corresponding to IDs in the range are not guaranteed to exist. For tables on which delete operations have been performed, you must check the existence of each record separately, for example by grn_table_at().
Parameters
• obj -- Specifies the target object.
• offset -- Specifies the record ID at which the range of values to get starts.
• values -- The array of values is set into it.
grn_rc grn_obj_set_value(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_obj *value, int flags)
Updates the value of the record of obj corresponding to id. If no corresponding record exists, GRN_INVALID_ARGUMENT is returned.
Parameters
• obj -- Specifies the target object.
• id -- Specifies the ID of the target record.
• value -- Specifies the value to store.
• flags -- The following values can be specified:
• GRN_OBJ_SET
• GRN_OBJ_INCR
• GRN_OBJ_DECR
• GRN_OBJ_APPEND
• GRN_OBJ_PREPEND
• GRN_OBJ_GET
• GRN_OBJ_COMPARE
• GRN_OBJ_LOCK
• GRN_OBJ_UNLOCK
GRN_OBJ_SET_MASK
GRN_OBJ_SET
Replaces the record value with value.
GRN_OBJ_INCR
Adds value to the record value.
GRN_OBJ_DECR
Subtracts value from the record value.
GRN_OBJ_APPEND
Appends value to the end of the record value.
GRN_OBJ_PREPEND
Prepends value to the beginning of the record value.
GRN_OBJ_GET
Sets the new record value into value.
GRN_OBJ_COMPARE
Checks whether the record value is equal to value.
GRN_OBJ_LOCK
Locks the record. When specified together with GRN_OBJ_COMPARE, the record is locked only when the record value is equal to value.
GRN_OBJ_UNLOCK
Unlocks the record.
grn_rc grn_obj_remove(grn_ctx *ctx, grn_obj *obj)
Releases obj from memory, and if it is a persistent object, deletes the corresponding set of files.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_rename(grn_ctx *ctx, grn_obj *obj, const char *name, unsigned int name_size)
Updates the name corresponding to obj to name in the db used by ctx. obj must be a persistent object.
Parameters
• obj -- Specifies the target object.
• name -- Specifies the new name.
• name_size -- Specifies the size (in bytes) of the name parameter.
grn_rc grn_obj_close(grn_ctx *ctx, grn_obj *obj)
Releases the temporary object obj from memory. Objects belonging to obj are also released recursively. Persistent objects such as tables, columns, and exprs must not be released with it. In general, you should use grn_obj_unlink(), which doesn't require you to care whether the object is temporary or persistent.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_reinit(grn_ctx *ctx, grn_obj *obj, grn_id domain, unsigned char flags)
Changes the type of obj. obj must have been initialized with the GRN_OBJ_INIT() macro or the like.
Parameters
• obj -- Specifies the target object.
• domain -- Specifies the new type of obj.
• flags -- If GRN_OBJ_VECTOR is specified, the object stores a vector of domain-type values.
void grn_obj_unlink(grn_ctx *ctx, grn_obj *obj)
Releases obj from memory. Objects belonging to obj are also released recursively.
const char *grn_obj_path(grn_ctx *ctx, grn_obj *obj)
Returns the file path corresponding to obj. Returns NULL for a temporary object.
Parameters
• obj -- Specifies the target object.
int grn_obj_name(grn_ctx *ctx, grn_obj *obj, char *namebuf, int buf_size)
Returns the length of the name of obj. Returns 0 for an anonymous object. For a named object, if buf_size is equal to or larger than the name length, the name is copied into namebuf.
Parameters
• obj -- Specifies the target object.
• namebuf -- Specifies a buffer (prepared by the caller) to store the name.
• buf_size -- Specifies the size (in bytes) of namebuf.
grn_id grn_obj_get_range(grn_ctx *ctx, grn_obj *obj)
Returns the ID of the object that represents the range of values the obj parameter can take. For example, it returns values such as GRN_DB_INT in grn_builtin_type.
Parameters
• obj -- Specifies the target object.
int grn_obj_expire(grn_ctx *ctx, grn_obj *obj, int threshold)
Releases as much of the memory occupied by obj as possible, using threshold as a guideline.
Parameters
• obj -- Specifies the target object.
int grn_obj_check(grn_ctx *ctx, grn_obj *obj)
Checks the integrity of the file corresponding to obj.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_lock(grn_ctx *ctx, grn_obj *obj, grn_id id, int timeout)
Locks obj. If the lock can't be acquired after timeout seconds have passed, it returns
GRN_RESOURCE_DEADLOCK_AVOIDED.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_unlock(grn_ctx *ctx, grn_obj *obj, grn_id id)
Unlocks obj.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_clear_lock(grn_ctx *ctx, grn_obj *obj)
Forcibly clears the lock.
Parameters
• obj -- Specifies the target object.
unsigned int grn_obj_is_locked(grn_ctx *ctx, grn_obj *obj)
Returns a nonzero value if obj is currently locked.
Parameters
• obj -- Specifies the target object.
int grn_obj_defrag(grn_ctx *ctx, grn_obj *obj, int threshold)
Defragments as much of the DB file space occupied by obj as possible, using threshold as a guideline. Returns the number of segments that were defragmented.
Parameters
• obj -- Specifies the target object.
grn_id grn_obj_id(grn_ctx *ctx, grn_obj *obj)
Returns the id of obj.
Parameters
• obj -- Specifies the target object.
grn_rc grn_obj_delete_by_id(grn_ctx *ctx, grn_obj *db, grn_id id, grn_bool removep)
Deletes the table, column, or other object corresponding to id from db. This is an internal API prepared for mroonga.
Parameters
• db -- The target database.
• id -- The object (table, column and so on) ID to be deleted.
• removep -- If GRN_TRUE, clear the object cache and remove the relation between the ID and the key in the database. Otherwise, just clear the object cache.
grn_rc grn_obj_path_by_id(grn_ctx *ctx, grn_obj *db, grn_id id, char *buffer)
Returns the path corresponding to id in db. This is an internal API prepared for mroonga.
Parameters
• db -- The target database.
• id -- The object (table, column and so on) ID whose path is returned.
• buffer -- The path string corresponding to the id will be set into this buffer.
grn_rc grn_obj_cast_by_id(grn_ctx *ctx, grn_obj *source, grn_obj *destination, grn_bool add_record_if_not_exist)
It casts the value of source to a value with the type of destination. The casted value is appended to destination. Both source and destination must be bulks. If destination is a reference type bulk (a reference type bulk means that the type of destination is a table), add_record_if_not_exist is used: if the source value doesn't exist in the table that is the type of destination, the source value is added to the table.
Parameters
• ctx -- The context object.
• source -- The bulk to be casted.
• destination -- The bulk that specifies the cast target type and stores the casted value.
• add_record_if_not_exist -- Whether to add a new record if the source value doesn't exist in the cast target table. This parameter is only used when destination is a reference type bulk.
Returns GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_proc
Summary
TODO...
Example
TODO...
Reference
grn_proc_type
TODO...
grn_proc_func
TODO...
grn_obj *grn_proc_create(grn_ctx *ctx, const char *name, int name_size, grn_proc_type type, grn_proc_func *init, grn_proc_func *next, grn_proc_func *fin, unsigned int nvars, grn_expr_var *vars)
Defines a new proc (procedure) corresponding to name in the db used by ctx.
Parameters
• name -- Specifies the name of the proc to create.
• name_size -- The number of bytes of the name parameter. If a negative value is specified, the name parameter is assumed to be a NUL-terminated string.
• type -- Specifies the type of the proc.
• init -- Specifies a pointer to the initialization function.
• next -- Specifies a pointer to the main processing function.
• fin -- Specifies a pointer to the finalization function.
• nvars -- Specifies the number of variables the proc uses.
• vars -- Specifies the definitions of the variables the proc uses. (an array of grn_expr_var structures)
grn_obj *grn_proc_get_info(grn_ctx *ctx, grn_user_data *user_data, grn_expr_var **vars, unsigned int *nvars, grn_obj **caller)
Using user_data as the key, gets the currently running grn_proc_func function, the array of its defined variables (grn_expr_var) and the array's size.
Parameters
• user_data -- Specifies the user_data passed to grn_proc_func.
• nvars -- Receives the number of variables.
grn_rc grn_obj_set_finalizer(grn_ctx *ctx, grn_obj *obj, grn_proc_func *func)
Sets a function that is called when the object is destroyed. It can be set only for tables, columns, procs, and exprs.
Parameters
• obj -- Specifies the target object.
• func -- Specifies the function that is called when the object is destroyed.

grn_search
Summary
TODO...
Example
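Here is a hedged sketch of a full text search via grn_obj_search(), whose reference follows; it assumes that index is an index column on table, that a text bulk can be passed as query, and that a result set is a hash table whose key type is the searched table, so treat the details as illustrative.

grn_obj query;
grn_obj *res;
GRN_TEXT_INIT(&query, 0);
GRN_TEXT_SETS(ctx, &query, "groonga");
/* Result set: records of table that match the query. */
res = grn_table_create(ctx, NULL, 0, NULL,
                       GRN_OBJ_TABLE_HASH_KEY | GRN_OBJ_WITH_SUBREC,
                       table, NULL);
grn_obj_search(ctx, index, &query, res, GRN_OP_OR, NULL);
/* Use res, then free the resources. */
grn_obj_unlink(ctx, res);
GRN_OBJ_FIN(ctx, &query);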
Reference
grn_search_optarg
grn_rc grn_obj_search(grn_ctx *ctx, grn_obj *obj, grn_obj *query, grn_obj *res, grn_operator op, grn_search_optarg *optarg)
Searches obj for records that match query, and adds them to or removes them from res according to op.
Parameters
• obj -- Specifies the object to search.
• query -- Specifies the search query.
• res -- Specifies the table to store the search results.
• op -- Specifies one of GRN_OP_OR, GRN_OP_AND, GRN_OP_AND_NOT, GRN_OP_ADJUST.
• optarg -- Specifies detailed search conditions.

grn_table
Summary
TODO...
Example
TODO...
Reference
grn_obj *grn_table_create(grn_ctx *ctx, const char *name, unsigned int name_size, const char *path, grn_obj_flags flags, grn_obj *key_type, grn_obj *value_type)
Defines a new table, corresponding to the name parameter, in the db used by ctx.
Parameters
• name -- Specifies the name of the table to create. If NULL, the table becomes an anonymous table. To create a named table in a persistent db, GRN_OBJ_PERSISTENT must be specified in flags.
• path -- Specifies the file path of the table to create. It is valid only when GRN_OBJ_PERSISTENT is specified in flags. If NULL, a file path is assigned automatically.
• flags -- Specifying GRN_OBJ_PERSISTENT creates a persistent table. One of GRN_OBJ_TABLE_PAT_KEY, GRN_OBJ_TABLE_HASH_KEY, GRN_OBJ_TABLE_NO_KEY must be specified. Specifying GRN_OBJ_KEY_NORMALIZE uses normalized strings as keys. Specifying GRN_OBJ_KEY_WITH_SIS automatically registers all suffixes of key strings.
• key_type -- Specifies the type of the keys. It is invalid when GRN_OBJ_TABLE_NO_KEY is specified. An existing type or table can be specified. When table B is created with table A as key_type, B is always a subset of A.
• value_type -- Specifies the type of the area that stores the value corresponding to each key. Apart from columns, a table can have exactly one area that stores a value corresponding to each key.
grn_id grn_table_add(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size, int *added)
Adds a new record corresponding to key to table and returns its ID. If a record corresponding to key already exists in table, the ID of that record is returned. In a table created with GRN_OBJ_TABLE_NO_KEY, key and key_size are ignored.
Parameters
• table -- Specifies the target table.
• key -- Specifies the search key.
• added -- If a value other than NULL is specified, it is set to 1 when a new record has been added and to 0 when the record already existed.
grn_id grn_table_get(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size)
It finds a record that has the key parameter and returns the ID of the found record. If the table parameter is a database, it finds an object (table, column and so on) that has the key parameter and returns the ID of the found object.
Parameters
• table -- The table or database.
• key -- The record or object key to be found.
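Here is a minimal sketch that combines grn_table_create(), grn_table_add() and grn_table_get() described above; it assumes that ctx is a grn_ctx that uses a persistent database, and the table name and key are illustrative.

grn_obj *users;
grn_id id;
int added;
users = grn_table_create(ctx, "Users", strlen("Users"),
                         NULL, /* Assign the file path automatically. */
                         GRN_OBJ_TABLE_HASH_KEY | GRN_OBJ_PERSISTENT,
                         grn_ctx_at(ctx, GRN_DB_SHORT_TEXT),
                         NULL);
id = grn_table_add(ctx, users, "alice", strlen("alice"), &added);
/* added is 1 here because "alice" is a new key. */
id = grn_table_get(ctx, users, "alice", strlen("alice"));
/* id is the same ID as the one returned by grn_table_add(). */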
grn_id grn_table_at(grn_ctx *ctx, grn_obj *table, grn_id id)
Checks whether a record corresponding to id exists in table, and returns the specified ID if it exists, or GRN_ID_NIL if it doesn't. Note: this call is rather costly, so don't call it too frequently.
Parameters
• table -- Specifies the target table.
• id -- Specifies the id to search for.
grn_id grn_table_lcp_search(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size)
If table is a table created with GRN_TABLE_PAT_KEY or GRN_TABLE_DAT_KEY, performs a longest common prefix search and returns the corresponding ID. If table is a table created with GRN_TABLE_HASH_KEY, searches for an exactly matching key and returns the corresponding ID.
Parameters
• table -- Specifies the target table.
• key -- Specifies the search key.
int grn_table_get_key(grn_ctx *ctx, grn_obj *table, grn_id id, void *keybuf, int buf_size)
Gets the key of the record of table corresponding to id. If the corresponding record exists, the key length is returned; if it is not found, 0 is returned. If the key is found and buf_size is equal to or larger than the key length, the key is copied into keybuf.
Parameters
• table -- Specifies the target table.
• id -- Specifies the ID of the target record.
• keybuf -- Specifies a buffer (prepared by the caller) to store the key.
• buf_size -- Specifies the size (in bytes) of keybuf.
grn_rc grn_table_delete(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size)
Deletes the record of table corresponding to key. If no corresponding record exists, GRN_INVALID_ARGUMENT is returned.
Parameters
• table -- Specifies the target table.
• key -- Specifies the search key.
• key_size -- Specifies the size of the search key.
grn_rc grn_table_delete_by_id(grn_ctx *ctx, grn_obj *table, grn_id id)
Deletes the record of table corresponding to id. If no corresponding record exists, GRN_INVALID_ARGUMENT is returned.
Parameters
• table -- Specifies the target table.
• id -- Specifies the record ID.
grn_rc grn_table_update_by_id(grn_ctx *ctx, grn_obj *table, grn_id id, const void *dest_key, unsigned int dest_key_size)
Changes the key of the record of table corresponding to id. Specify the new key and its byte length with dest_key and dest_key_size. This operation can be used only for GRN_TABLE_DAT_KEY tables.
Parameters
• table -- Specifies the target table.
• id -- Specifies the record ID.
grn_rc grn_table_update(grn_ctx *ctx, grn_obj *table, const void *src_key, unsigned int src_key_size, const void *dest_key, unsigned int dest_key_size)
Changes the key of the record of table corresponding to src_key. Specify the new key and its byte length with dest_key and dest_key_size. This operation can be used only for GRN_TABLE_DAT_KEY tables.
Parameters
• table -- Specifies the target table.
• src_key -- Specifies the key of the target record.
• src_key_size -- Specifies the length (in bytes) of the key of the target record.
• dest_key -- Specifies the new key.
• dest_key_size -- Specifies the length (in bytes) of the new key.
grn_rc grn_table_truncate(grn_ctx *ctx, grn_obj *table)
Deletes all records of table at once. Note: in a multithreaded environment, access from another thread may touch an address that no longer exists and raise SIGSEGV.
Parameters
• table -- Specifies the target table.
grn_table_sort_key
TODO...
grn_table_sort_flags
TODO...
int grn_table_sort(grn_ctx *ctx, grn_obj *table, int offset, int limit, grn_obj *result, grn_table_sort_key *keys, int n_keys)
Sorts the records in table and stores the top limit records into result. A column, accessor, or proc of table can be specified as keys.key. GRN_TABLE_SORT_ASC or GRN_TABLE_SORT_DESC can be specified as keys.flags: GRN_TABLE_SORT_ASC sorts in ascending order, GRN_TABLE_SORT_DESC in descending order. keys.offset is a member for internal use. (A usage sketch appears below.)
Parameters
• table -- Specifies the target table.
• offset -- Among the sorted records, records are stored into result starting from the (0-based) offset-th one.
• limit -- Specifies the maximum number of records to store into result.
• result -- Specifies the table to store the result.
• keys -- Specifies a pointer to the array of sort keys.
• n_keys -- Specifies the size of the sort key array.
grn_table_group_result
TODO...
grn_table_group_flags
TODO...
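Here is a hedged sketch of grn_table_sort() described above; it assumes that table has an Int32 column named age, and the result table setup follows the common pattern of an anonymous GRN_OBJ_TABLE_NO_KEY table whose value type is the sorted table, so treat the details as illustrative.

grn_table_sort_key sort_key;
grn_obj *sorted;
int n_sorted;
sort_key.key = grn_obj_column(ctx, table, "age", strlen("age"));
sort_key.flags = GRN_TABLE_SORT_DESC;
sort_key.offset = 0; /* For internal use; leave it 0. */
sorted = grn_table_create(ctx, NULL, 0, NULL,
                          GRN_OBJ_TABLE_NO_KEY, NULL, table);
n_sorted = grn_table_sort(ctx, table, 0, 10, sorted, &sort_key, 1);
/* Use the n_sorted (at most 10) records stored in sorted. */
grn_obj_unlink(ctx, sorted);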
grn_rc grn_table_group(grn_ctx *ctx, grn_obj *table, grn_table_sort_key *keys, int n_keys, grn_table_group_result *results, int n_results)
Groups the records of table by the specified conditions.
Parameters
• table -- Specifies the target table.
• keys -- Specifies a pointer to the array of grouping key structures.
• n_keys -- Specifies the size of the array of grouping key structures.
• results -- Specifies a pointer to the array of structures that store the grouping results.
• n_results -- Specifies the size of the array of structures that store the grouping results.
grn_rc grn_table_setoperation(grn_ctx *ctx, grn_obj *table1, grn_obj *table2, grn_obj *res, grn_operator op)
Performs the set operation specified by op on table1 and table2 and stores the result into res. Unless table1 or table2 itself is specified as res, table1 and table2 are not destroyed.
Parameters
• table1 -- Specifies the target table1.
• table2 -- Specifies the target table2.
• res -- Specifies the table to store the result.
• op -- Specifies the kind of operation to execute.
grn_rc grn_table_difference(grn_ctx *ctx, grn_obj *table1, grn_obj *table2, grn_obj *res1, grn_obj *res2)
Stores into res1 and res2 the results of removing the overlapping records from table1 and table2, respectively.
Parameters
• table1 -- Specifies the target table1.
• table2 -- Specifies the target table2.
• res1 -- Specifies a table to store the result.
• res2 -- Specifies a table to store the result.
int grn_table_columns(grn_ctx *ctx, grn_obj *table, const char *name, unsigned int name_size, grn_obj *res)
Stores into the res parameter the IDs of the columns of table whose names start with the name parameter. If the name_size parameter is 0, all column IDs are stored.
Parameters
• table -- Specifies the target table.
• name -- Specifies the prefix of the column names to get.
• name_size -- Specifies the length of the name parameter.
• res -- Specifies a GRN_TABLE_HASH_KEY table to store the result.
Returns The number of column IDs stored.
unsigned int grn_table_size(grn_ctx *ctx, grn_obj *table)
Returns the number of records registered in table.
Parameters
• table -- Specifies the target table.
grn_rc grn_table_rename(grn_ctx *ctx, grn_obj *table, const char *name, unsigned int name_size)
Updates the name corresponding to table to name in the db used by ctx. All columns of the table are renamed at the same time. table must be a persistent object.
Parameters
• name_size -- Specifies the size (in bytes) of the name parameter.

grn_table_cursor
Summary
TODO...
Example
TODO...
Reference
grn_table_cursor
TODO...
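Here is a hedged sketch of a typical cursor loop built from the functions described below; it iterates over all records of table in ascending key order, and the flags and bounds are illustrative.

grn_table_cursor *cursor;
grn_id id;
cursor = grn_table_cursor_open(ctx, table,
                               NULL, 0, /* No lower bound. */
                               NULL, 0, /* No upper bound. */
                               0, -1,   /* All matching records. */
                               GRN_CURSOR_ASCENDING);
while ((id = grn_table_cursor_next(ctx, cursor)) != GRN_ID_NIL) {
  void *key;
  int key_size;
  key_size = grn_table_cursor_get_key(ctx, cursor, &key);
  /* Use id, key and key_size. */
}
grn_table_cursor_close(ctx, cursor);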
grn_table_cursor *grn_table_cursor_open(grn_ctx *ctx, grn_obj *table, const void *min, unsigned int min_size, const void *max, unsigned int max_size, int offset, int limit, int flags)
Creates and returns a cursor for retrieving, one by one, the records registered in table.
Parameters
• table -- Specifies the target table.
• min -- Specifies the lower bound of the keys. (NULL means no lower bound.) See below about GRN_CURSOR_PREFIX.
• min_size -- Specifies the size of min. See below about GRN_CURSOR_PREFIX.
• max -- Specifies the upper bound of the keys. (NULL means no upper bound.) See below about GRN_CURSOR_PREFIX.
• max_size -- Specifies the size of max. It may be ignored with GRN_CURSOR_PREFIX.
• flags -- Specifying GRN_CURSOR_ASCENDING retrieves records in ascending order. Specifying GRN_CURSOR_DESCENDING retrieves records in descending order. (When GRN_CURSOR_PREFIX below is specified and records with near keys are retrieved, or a common prefix search is performed, GRN_CURSOR_ASCENDING / GRN_CURSOR_DESCENDING are ignored.) Specifying GRN_CURSOR_GT excludes the key that equals min from the cursor range. (When min is NULL, or when GRN_CURSOR_PREFIX below is specified and records with near keys are retrieved or a common prefix search is performed, GRN_CURSOR_GT is ignored.) Specifying GRN_CURSOR_LT excludes the key that equals max from the cursor range. (When max is NULL, or when GRN_CURSOR_PREFIX below is specified, GRN_CURSOR_LT is ignored.) Specifying GRN_CURSOR_BY_ID retrieves records in ID order. (When GRN_CURSOR_PREFIX below is specified, GRN_CURSOR_BY_ID is ignored.) For a table created with GRN_OBJ_TABLE_PAT_KEY, specifying GRN_CURSOR_BY_KEY retrieves records in key order. (For tables created with GRN_OBJ_TABLE_HASH_KEY or GRN_OBJ_TABLE_NO_KEY, GRN_CURSOR_BY_KEY is ignored.) When GRN_CURSOR_PREFIX is specified, a cursor that retrieves the following records is created for a table created with GRN_OBJ_TABLE_PAT_KEY: if max is NULL, records whose keys have min as a prefix are retrieved, and the max_size parameter is ignored; if max and max_size are specified and the key of the table is of the ShortText type, a common prefix search with max is performed and records whose common prefix is min_size bytes or longer are retrieved, and min is ignored; if max and max_size are specified and the key of the table is of a fixed-size type, records are retrieved in order from the nodes near max in the PAT tree, except that records on nodes that branch away from max at bits below min_size bytes in the patricia trie of the keys are not retrieved. Note that being close in position in the PAT tree is not the same as having a close key value. In this case, the value pointed to by max must be at least as wide as the key size of the target table, and min is ignored. The three flags GRN_CURSOR_BY_ID / GRN_CURSOR_BY_KEY / GRN_CURSOR_PREFIX can't be specified at the same time. For a table created with GRN_OBJ_TABLE_PAT_KEY, specifying GRN_CURSOR_PREFIX and GRN_CURSOR_RK retrieves the records whose keys have, as a prefix, the string obtained by converting a halfwidth lowercase alphabet string into fullwidth katakana according to the old JIS X 4063:2000 standard. Only GRN_ENC_UTF8 is supported. GRN_CURSOR_ASCENDING / GRN_CURSOR_DESCENDING have no effect, and records can't be retrieved in ascending/descending key order.
• offset -- Retrieves records starting from the (0-based) offset-th record in the matching range. A negative value can't be specified when GRN_CURSOR_PREFIX is specified.
• limit -- Retrieves only limit records in the matching range. If -1 is specified, all records are assumed. A negative value smaller than -1 can't be specified when GRN_CURSOR_PREFIX is specified.
grn_rc grn_table_cursor_close(grn_ctx *ctx, grn_table_cursor *tc)
Releases a cursor created by grn_table_cursor_open().
Parameters
• tc -- Specifies the target cursor.
grn_id grn_table_cursor_next(grn_ctx *ctx, grn_table_cursor *tc)
Advances the current record of the cursor by one and returns its ID. When the cursor reaches the end of its range, GRN_ID_NIL is returned.
Parameters
• tc -- Specifies the target cursor.
int grn_table_cursor_get_key(grn_ctx *ctx, grn_table_cursor *tc, void **key)
Sets the key of the current record of the cursor into the key parameter and returns its length.
Parameters
• tc -- Specifies the target cursor.
• key -- A pointer to the key of the current record is set into it.
int grn_table_cursor_get_value(grn_ctx *ctx, grn_table_cursor *tc, void **value)
Sets the value of the current record of the cursor parameter into the value parameter and returns its length.
Parameters
• tc -- Specifies the target cursor.
• value -- A pointer to the value of the current record is set into it.
grn_rc grn_table_cursor_set_value(grn_ctx *ctx, grn_table_cursor *tc, const void *value, int flags)
Replaces the value of the current record of the cursor with the given content. If the cursor has no current record, GRN_INVALID_ARGUMENT is returned.
Parameters
• tc -- Specifies the target cursor.
• value -- Specifies the new value.
• flags -- The same values as the flags of grn_obj_set_value() can be specified.
grn_rc grn_table_cursor_delete(grn_ctx *ctx, grn_table_cursor *tc)
Deletes the current record of the cursor. If the cursor has no current record, it returns
GRN_INVALID_ARGUMENT.
Parameters
• tc -- Specifies the target cursor.
grn_obj *grn_table_cursor_table(grn_ctx *ctx, grn_table_cursor *tc)
Returns the table that the cursor belongs to.
Parameters
• tc -- Specifies the target cursor.

grn_thread_*
Summary
Groonga provides thread related APIs with the grn_thread_ prefix. Normally, you don't need to use these APIs. You may want to use these APIs when you write a Groonga server.
Example
Here is a real world use case of the grn_thread_* APIs by /reference/executables/groonga. /reference/executables/groonga increases its thread pool size when the max number of threads is increased. /reference/executables/groonga decreases its thread pool size and stops too many threads when the max number of threads is decreased.

static grn_mutex q_mutex;
static grn_cond q_cond;
static uint32_t nfthreads;
static uint32_t max_nfthreads;

static uint32_t
groonga_get_thread_limit(void *data)
{
  return max_nfthreads;
}

static void
groonga_set_thread_limit(uint32_t new_limit, void *data)
{
  uint32_t i;
  uint32_t current_nfthreads;

  MUTEX_LOCK(q_mutex);
  current_nfthreads = nfthreads;
  max_nfthreads = new_limit;
  MUTEX_UNLOCK(q_mutex);

  if (current_nfthreads > new_limit) {
    for (i = 0; i < current_nfthreads; i++) {
      MUTEX_LOCK(q_mutex);
      COND_SIGNAL(q_cond);
      MUTEX_UNLOCK(q_mutex);
    }
  }
}

int
main(int argc, char **argv)
{
  /* ... */
  grn_thread_set_get_limit_func(groonga_get_thread_limit, NULL);
  grn_thread_set_set_limit_func(groonga_set_thread_limit, NULL);
  grn_init();
  /* ... */
}

Reference
uint32_t (*grn_thread_get_limit_func)(void *data)
It's the type of function that returns the max number of threads.
void (*grn_thread_set_limit_func)(uint32_t new_limit, void *data)
It's the type of function that sets the max number of threads.
uint32_t grn_thread_get_limit(void)
It returns the max number of threads. If grn_thread_get_limit_func isn't set by grn_thread_set_get_limit_func(), it always returns 0.
Returns The max number of threads or 0.
void grn_thread_set_limit(uint32_t new_limit)
It sets the max number of threads. If grn_thread_set_limit_func isn't set by grn_thread_set_set_limit_func(), it does nothing.
Parameters
• new_limit -- The new max number of threads.
void grn_thread_set_get_limit_func(grn_thread_get_limit_func func, void *data)
It sets the custom function that returns the max number of threads. data is passed to func when func is called from grn_thread_get_limit().
Parameters
• func -- The custom function that returns the max number of threads.
• data -- User data to be passed to func when func is called.
void grn_thread_set_set_limit_func(grn_thread_set_limit_func func, void *data)
It sets the custom function that sets the max number of threads. data is passed to func when func is called from grn_thread_set_limit().
Parameters
• func -- The custom function that sets the max number of threads.
• data -- User data to be passed to func when func is called.

grn_type
Summary
TODO...
Example
TODO...
Reference
grn_builtin_type
TODO...
grn_obj *grn_type_create(grn_ctx *ctx, const char *name, unsigned int name_size, grn_obj_flags flags, unsigned int size)
Defines a new type corresponding to name in the db.
Parameters
• name -- Specifies the name of the type to create.
• flags -- Specifies one of GRN_OBJ_KEY_VAR_SIZE, GRN_OBJ_KEY_FLOAT, GRN_OBJ_KEY_INT, GRN_OBJ_KEY_UINT.
• size -- Specifies the maximum length for GRN_OBJ_KEY_VAR_SIZE, or the length (in bytes) otherwise.

grn_user_data
Summary
TODO...
Example
TODO...
Reference
grn_user_data
TODO...
grn_user_data *grn_obj_user_data(grn_ctx *ctx, grn_obj *obj)
Returns a pointer to the user data that can be registered on the object. It can be used only for tables, columns, procs, and exprs.
Parameters
• obj -- Specifies the target object.
SPECIFICATION
GQTP
GQTP is an acronym for Groonga Query Transfer Protocol. GQTP is the original protocol of Groonga.

Protocol
GQTP is a stateful, client-server model protocol. The following sequence is one processing unit:
• Client sends a request
• Server receives the request
• Server processes the request
• Server sends a response
• Client receives the response
You can do zero or more processing units in a session.
Both a request and a response consist of a GQTP header and a body. The GQTP header is fixed size data. The body is variable size data and its size is stored in the GQTP header. The content of the body isn't defined in GQTP.

GQTP header
The GQTP header consists of the following unsigned integer values:
┌───────────┬───────┬───────────────────────┐
│Name       │ Size  │ Description           │
├───────────┼───────┼───────────────────────┤
│protocol   │ 1byte │ Protocol type.        │
├───────────┼───────┼───────────────────────┤
│query_type │ 1byte │ Content type of body. │
├───────────┼───────┼───────────────────────┤
│key_length │ 2byte │ Not used.             │
├───────────┼───────┼───────────────────────┤
│level      │ 1byte │ Not used.             │
├───────────┼───────┼───────────────────────┤
│flags      │ 1byte │ Flags.                │
├───────────┼───────┼───────────────────────┤
│status     │ 2byte │ Return code.          │
├───────────┼───────┼───────────────────────┤
│size       │ 4byte │ Body size.            │
├───────────┼───────┼───────────────────────┤
│opaque     │ 4byte │ Not used.             │
├───────────┼───────┼───────────────────────┤
│cas        │ 8byte │ Not used.             │
└───────────┴───────┴───────────────────────┘
All header values are encoded in network byte order. The following sections describe the available values of each header field. The total size of the GQTP header is 24 bytes.

protocol
The value is always 0xc7 in both the request and response GQTP header.

query_type
The value is one of the following values:
┌────────┬───────┬───────────────────────┐
│Name    │ Value │ Description           │
├────────┼───────┼───────────────────────┤
│NONE    │ 0     │ Free format.          │
├────────┼───────┼───────────────────────┤
│TSV     │ 1     │ Tab Separated Values. │
├────────┼───────┼───────────────────────┤
│JSON    │ 2     │ JSON.                 │
├────────┼───────┼───────────────────────┤
│XML     │ 3     │ XML.                  │
├────────┼───────┼───────────────────────┤
│MSGPACK │ 4     │ MessagePack.          │
└────────┴───────┴───────────────────────┘
This is not used in the request GQTP header. It is used in the response GQTP header. The body is formatted as the specified type.

flags
The value is a bitwise OR of the following values:
┌──────┬───────┬─────────────────────────┐
│Name  │ Value │ Description             │
├──────┼───────┼─────────────────────────┤
│MORE  │ 0x01  │ There are more data.    │
├──────┼───────┼─────────────────────────┤
│TAIL  │ 0x02  │ There are no more data. │
├──────┼───────┼─────────────────────────┤
│HEAD  │ 0x04  │ Not used.               │
├──────┼───────┼─────────────────────────┤
│QUIET │ 0x08  │ Be quiet.               │
├──────┼───────┼─────────────────────────┤
│QUIT  │ 0x10  │ Quit.                   │
└──────┴───────┴─────────────────────────┘
You must specify the MORE or TAIL flag. If you use the MORE flag, you should also use the QUIET flag, because you don't need to show a response for each partial request. Use the QUIT flag to quit the session.

status
Here are the available values. New statuses may be added in the future.
• 0: SUCCESS
• 1: END_OF_DATA
• 65535: UNKNOWN_ERROR
• 65534: OPERATION_NOT_PERMITTED
• 65533: NO_SUCH_FILE_OR_DIRECTORY
• 65532: NO_SUCH_PROCESS
• 65531: INTERRUPTED_FUNCTION_CALL
• 65530: INPUT_OUTPUT_ERROR
• 65529: NO_SUCH_DEVICE_OR_ADDRESS
• 65528: ARG_LIST_TOO_LONG
• 65527: EXEC_FORMAT_ERROR
• 65526: BAD_FILE_DESCRIPTOR
• 65525: NO_CHILD_PROCESSES
• 65524: RESOURCE_TEMPORARILY_UNAVAILABLE
• 65523: NOT_ENOUGH_SPACE
• 65522: PERMISSION_DENIED
• 65521: BAD_ADDRESS
• 65520: RESOURCE_BUSY
• 65519: FILE_EXISTS
• 65518: IMPROPER_LINK
• 65517: NO_SUCH_DEVICE
• 65516: NOT_A_DIRECTORY
• 65515: IS_A_DIRECTORY
• 65514: INVALID_ARGUMENT
• 65513: TOO_MANY_OPEN_FILES_IN_SYSTEM
• 65512: TOO_MANY_OPEN_FILES
• 65511: INAPPROPRIATE_I_O_CONTROL_OPERATION
• 65510: FILE_TOO_LARGE
• 65509: NO_SPACE_LEFT_ON_DEVICE
• 65508: INVALID_SEEK
• 65507: READ_ONLY_FILE_SYSTEM
• 65506: TOO_MANY_LINKS
• 65505: BROKEN_PIPE
• 65504: DOMAIN_ERROR
• 65503: RESULT_TOO_LARGE
• 65502: RESOURCE_DEADLOCK_AVOIDED
• 65501: NO_MEMORY_AVAILABLE
• 65500: FILENAME_TOO_LONG
• 65499: NO_LOCKS_AVAILABLE
• 65498: FUNCTION_NOT_IMPLEMENTED
• 65497: DIRECTORY_NOT_EMPTY
• 65496: ILLEGAL_BYTE_SEQUENCE
• 65495: SOCKET_NOT_INITIALIZED
• 65494: OPERATION_WOULD_BLOCK
• 65493: ADDRESS_IS_NOT_AVAILABLE
• 65492: NETWORK_IS_DOWN
• 65491: NO_BUFFER
• 65490: SOCKET_IS_ALREADY_CONNECTED
• 65489: SOCKET_IS_NOT_CONNECTED
• 65488: SOCKET_IS_ALREADY_SHUTDOWNED
• 65487: OPERATION_TIMEOUT
• 65486: CONNECTION_REFUSED
• 65485: RANGE_ERROR
• 65484: TOKENIZER_ERROR
• 65483: FILE_CORRUPT
• 65482: INVALID_FORMAT
• 65481: OBJECT_CORRUPT
• 65480: TOO_MANY_SYMBOLIC_LINKS
• 65479: NOT_SOCKET
• 65478: OPERATION_NOT_SUPPORTED
• 65477: ADDRESS_IS_IN_USE
• 65476: ZLIB_ERROR
• 65475: LZO_ERROR
• 65474: STACK_OVER_FLOW
• 65473: SYNTAX_ERROR
• 65472: RETRY_MAX
• 65471: INCOMPATIBLE_FILE_FORMAT
• 65470: UPDATE_NOT_ALLOWED
• 65469: TOO_SMALL_OFFSET
• 65468: TOO_LARGE_OFFSET
• 65467: TOO_SMALL_LIMIT
• 65466: CAS_ERROR
• 65465: UNSUPPORTED_COMMAND_VERSION

size

The size of the body. The maximum body size is 4GiB because size is a 4-byte unsigned integer. If you want to send data that is 4GiB or larger, use the MORE flag.
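To make the layout concrete, here is a minimal, hedged C sketch of the 24-byte header; the gqtp_header struct and gqtp_header_write() helper are illustrative names, not part of the Groonga API. Because fields are encoded in network byte order, multi-byte fields must be converted with htons()/htonl():

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons(), htonl() */

/* Illustrative layout of the 24-byte GQTP header (not a Groonga API type). */
typedef struct {
  uint8_t  protocol;   /* always 0xc7 */
  uint8_t  query_type; /* NONE/TSV/JSON/XML/MSGPACK in responses */
  uint16_t key_length; /* not used */
  uint8_t  level;      /* not used */
  uint8_t  flags;      /* MORE/TAIL/HEAD/QUIET/QUIT */
  uint16_t status;     /* return code */
  uint32_t size;       /* body size */
  uint32_t opaque;     /* not used */
  uint64_t cas;        /* not used */
} gqtp_header;

/* Serialize a request header for a single self-contained message. */
static void
gqtp_header_write(uint8_t buffer[24], uint32_t body_size)
{
  gqtp_header header;
  memset(&header, 0, sizeof(header));
  header.protocol = 0xc7;
  header.flags = 0x02;             /* TAIL: no more data follows */
  header.status = htons(0);
  header.size = htonl(body_size);  /* network byte order */
  memcpy(buffer, &header, 24);     /* the fields are naturally aligned,
                                      so no padding is added on common
                                      ABIs and the struct is 24 bytes */
}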
Example

How to run a GQTP server

Groonga has a special protocol, named Groonga Query Transfer Protocol (GQTP), for remote access to a database. The following form shows how to run Groonga as a GQTP server.

Form: groonga [-p PORT_NUMBER] -s DB_PATH

The -s option specifies to run Groonga as a server. DB_PATH specifies the path of the existing database to be hosted. The -p option and its argument, PORT_NUMBER, specify the port number of the server. The default port number is 10043, which is used when you don't specify PORT_NUMBER. The following command runs a server that listens on the default port number. The server accepts operations to the specified database.

Execution example:

% groonga -s /tmp/groonga-databases/introduction.db
Ctrl-c
%

How to run a GQTP daemon

You can also run a GQTP server as a daemon by using the -d option, instead of the -s option.

Form: groonga [-p PORT_NUMBER] -d DB_PATH

A Groonga daemon prints its process ID as follows. In this example, the process ID is 12345. Then, the daemon opens the specified database and accepts operations to that database.

Execution example:

% groonga -d /tmp/groonga-databases/introduction.db
12345
%

How to run a GQTP client

You can run Groonga as a GQTP client as follows:

Form: groonga [-p PORT_NUMBER] -c [HOST_NAME_OR_IP_ADDRESS]

This command establishes a connection with a GQTP server and then enters into interactive mode.

HOST_NAME_OR_IP_ADDRESS specifies the hostname or the IP address of the server. If not specified, Groonga uses the default hostname "localhost". The -p option and its argument, PORT_NUMBER, specify the port number of the server. If not specified, Groonga uses the default port number 10043.

Execution example:

% groonga -c
status
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "uptime": 0,
#     "max_command_version": 2,
#     "n_queries": 0,
#     "cache_hit_rate": 0.0,
#     "version": "4.0.1",
#     "alloc_count": 140,
#     "command_version": 1,
#     "starttime": 1395806078,
#     "default_command_version": 1
#   }
# ]
> ctrl-d
%

In interactive mode, Groonga reads commands from the standard input and executes them one by one.

How to terminate a GQTP server

You can terminate a GQTP server with the /reference/commands/shutdown command.

Execution example:

% groonga -c
> shutdown
%

See also

• /reference/executables/groonga
• /server/gqtp

Search

This section describes how the /reference/commands/select command searches with the query parameter.

Search behaviors

There are the following three search behaviors; Groonga switches between them dynamically depending on the search results:

1. Exact-match search
2. Unsplit (non-tokenized) search
3. Partial-match search

Before explaining how Groonga chooses among these behaviors, each behavior is described below.

Exact-match search

Target documents are tokenized (split) into lexemes, and the index is managed in a lexicon keyed by those lexemes. A search keyword is tokenized in the same way. Finding documents that contain the same sequence of lexemes as the sequence obtained by tokenizing the search keyword is called an exact-match search.

For example, in an index that uses the TokenMecab tokenizer, the string 「東京都民」 is stored as the sequence of two lexemes 東京 / 都民. When you search this index with the keyword 「東京都」, the keyword is processed as the sequence of two lexemes 東京 / 都. This sequence is not contained in the sequence 東京 / 都民, so an exact-match search does not hit.

In contrast, in an index that uses the TokenBigram tokenizer, the string 「東京都民」 is stored as the sequence of four lexemes 東京 / 京都 / 都民 / 民. When you search this index with the keyword 「東京都」, the keyword is processed as the sequence of two lexemes 東京 / 京都. This sequence is contained in the sequence 東京 / 京都 / 都民, so an exact-match search hits.

Note that the TokenBigram tokenizer does not generate bigrams for runs of alphabetic, numeric or symbolic characters; such runs are treated as one continuous token. For example, the string 「楽しいbilliard」 is stored as the sequence of three lexemes 楽し / しい / billiard. When you search with the keyword 「bill」, the keyword is processed as the single lexeme bill. This sequence is not contained in 楽し / しい / billiard, so an exact-match search does not hit.

In contrast, the TokenBigramSplitSymbolAlpha tokenizer generates bigrams for alphabetic runs as well, so the string 「楽しいbilliard」 is stored as the sequence of eleven lexemes 楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d. When you search with the keyword 「bill」, the keyword is processed as the three lexemes bi / il / ll. This sequence is contained in 楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d, so an exact-match search hits.

Unsplit search

An unsplit search is available only when the lexicon is built on a patricia trie. Its behavior differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.

With an N-gram tokenizer, a prefix search is performed with the keyword. For example, searching with the keyword 「bill」 hits both 「bill」 and 「billiard」.

With the TokenMecab tokenizer, a prefix search is performed with the keyword before word splitting. For example, searching with the keyword 「スープカレー」 hits 「スープカレーバー」 (treated as one word), but does not hit 「スープカレー」 (treated as the two words スープ and カレー) or 「スープカレーライス」 (treated as the two words スープ and カレーライス).

Partial-match search

A partial-match search is available only when the lexicon is built on a patricia trie and the KEY_WITH_SIS option is specified. Without the KEY_WITH_SIS option, it behaves the same as an unsplit search. Its behavior also differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.

With bigrams, prefix, infix and suffix searches are performed. For example, searching with the keyword 「ill」 hits both 「bill」 and 「billiard」.

With the TokenMecab tokenizer, prefix, infix and suffix searches are performed with the keywords after word splitting. For example, searching with the keyword 「スープカレー」 hits 「スープカレー」 (treated as the two words スープ and カレー), 「スープカレーライス」 (treated as the two words スープ and カレーライス) and also 「スープカレーバー」 (treated as one word).

How the behaviors are chosen

Groonga basically performs only exact-match searches. Only when the number of hits from the exact-match search is at or below a given threshold does it perform an unsplit search, and if the number of hits is still at or below the threshold, it then performs a partial-match search. (The default threshold is 0.) However, when a result set already exists, Groonga performs only an exact-match search, even if the number of hits is at or below the threshold.

For example, with a query like the following, Groonga performs an exact-match search, an unsplit search and a partial-match search in this order, as long as each search hits no more records than the threshold:

select Shops --match_columns description --query スープカレー

However, when a result set already exists before the full text search, as in the following query, Groonga performs only an exact-match search (given that point > 3 already matches more records than the threshold):
\"スープカレー\""' そのため、descriptionに「スープカレーライス」が含まれていても、「スープカレーライス」は「スープカレー」に完全一致しないのでヒットしません。
LIMITATIONS
Groonga has some limitations.

Limitations of table

A table has the following limitations.

• The maximum size of one key: 4KiB
• The maximum total size of keys: 4GiB, or 1TiB by specifying the KEY_LARGE flag in table-create-flags
• The maximum number of records: 268,435,455 (more than 268 million)

Keep in mind that these limitations may vary depending on conditions.

Limitations of indexing

A full-text index has the following limitations.

• The maximum number of distinct terms: 268,435,455 (more than 268 million)
• The maximum index size: 256GiB

Keep in mind that these limitations may vary depending on conditions.

Limitations of column

A column has the following limitation.

• The maximum stored data size of a column: 256GiB
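As a minimal sketch of how the per-key limit surfaces in the C API, the following hypothetical helper validates a key against GRN_TABLE_MAX_KEY_SIZE (4KiB) before adding a record; add_record_checked() is an illustrative name, and table is assumed to be an already-opened key table:

#include <groonga.h>

/* Returns the record ID, or GRN_ID_NIL if the key exceeds the 4KiB limit. */
grn_id
add_record_checked(grn_ctx *ctx, grn_obj *table,
                   const void *key, unsigned int key_size)
{
  int added;
  if (key_size > GRN_TABLE_MAX_KEY_SIZE) {  /* the 4KiB per-key limit */
    return GRN_ID_NIL;
  }
  return grn_table_add(ctx, table, key, key_size, &added);
}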
TROUBLESHOOTING
Different full-text search results for the same search keyword

Even for the same search keyword, full-text search results can differ depending on the other conditions specified together with it. This section explains the cause and the workarounds.

Example

First, here is an example that actually returns different results.

The DDL is as follows. An index on the body column of the Blogs table is created after tokenizing with the TokenMecab tokenizer:

table_create Blogs TABLE_NO_KEY
column_create Blogs body COLUMN_SCALAR ShortText
column_create Blogs updated_at COLUMN_SCALAR Time
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenMecab --normalizer NormalizerAuto
column_create Terms blog_body COLUMN_INDEX|WITH_POSITION Blogs body

Only one record is loaded as test data:

load --table Blogs
[
  ["body", "updated_at"],
  ["東京都民に深刻なダメージを与えました。", "2010/9/21 10:18:34"],
]

First, search with the full-text search condition only. In this case, the record hits:

> select Blogs --filter 'body @ "東京都"'
[[0,4102.268052438,0.000743783],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

Next, search with a range condition combined with the full-text search condition (1285858800 is 2010/10/1 0:0:0 in seconds). The record hits in this case too:

> select Blogs --filter 'body @ "東京都" && updated_at < 1285858800'
[[0,4387.524084839,0.001525487],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

Finally, search with the order of the range condition and the full-text search condition swapped. The individual conditions are the same, but in this case the record does not hit:

> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,4400.292570838,0.000647716],[[[0],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]]]]]

The following explains why this behavior occurs.

Cause

This behavior occurs because Groonga switches between multiple search behaviors during full-text search. Only a brief explanation is given here; see /spec/search for details.

There are the following three search behaviors:

1. Exact-match search
2. Unsplit (non-tokenized) search
3. Partial-match search

Groonga basically performs only exact-match searches. In the example above, the text 「東京都民に深刻なダメージを与えました。」 is searched with the query 「東京都」, but this query does not match when the TokenMecab tokenizer is used.

The indexed text 「東京都民に深刻なダメージを与えました。」 is tokenized as

東京 / 都民 / に / 深刻 / な / ダメージ / を / 与え / まし / た / 。

while the query 「東京都」 is tokenized as

東京 / 都

so there is no exact match.

Only when the number of hits from the exact-match search does not exceed a given threshold does Groonga perform an unsplit search, and if the number of hits still does not exceed the threshold, it then performs a partial-match search (the default threshold is 0). The data in this case hits with a partial-match search, so specifying only the 「東京都」 query produces a hit.

However, when the threshold is already exceeded before the full text search, as below (updated_at < 1285858800 hits one record, exceeding the threshold), Groonga does not fall back to a partial-match search or the like, even if the exact-match search hits nothing:

select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'

As a result, changing the order of the conditions changes the search results. Two ways to avoid this situation are described below; each has trade-offs, so consider carefully whether to adopt them.

Workaround 1: change the tokenizer

The TokenMecab tokenizer tokenizes using a prepared dictionary, so it can be said to favor precision over recall. On the other hand, N-gram tokenizers such as TokenBigram favor recall. For example, with TokenMecab, 「東京都」 never exact-matches 「京都」, but with TokenBigram it does. Likewise, TokenMecab does not exact-match 「東京都民」, but TokenBigram does.

Specifying an N-gram tokenizer in this way raises recall, but lowers precision and makes it more likely that search noise is included. To tune this balance, specify a weight for each index used in the match_columns parameter of /reference/commands/select.

Here is a concrete example, continuing the one above. First, add an index that uses TokenBigram:

table_create Bigram TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Bigram blog_body COLUMN_INDEX|WITH_POSITION Blogs body

With just this, the record that previously did not match now hits:

> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,7163.448064902,0.000418127],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

However, N-gram tokenizers produce more term hits than the TokenMecab tokenizer, so the N-gram hit scores end up weighted more heavily. Since N-gram tokenizers often have lower precision than the TokenMecab tokenizer, leaving things as they are makes it likely that search noise appears at the top of the results.

Therefore, specify weights so that the index built with the TokenMecab tokenizer counts for more than the index built with the TokenBigram tokenizer. This can be specified with the match_columns option:

> select Blogs --match_columns 'Terms.blog_body * 10 || Bigram.blog_body * 3' --query '東京都' --output_columns '_score, body'
[[0,8167.364602632,0.000647003],[[[1],[["_score","Int32"],["body","ShortText"]],[13,"東京都民に深刻なダメージを与えました。"]]]]

In this case the score is 13: 10 because the Terms.blog_body index (which uses the TokenMecab tokenizer) matched, plus 3 because the Bigram.blog_body index (which uses the TokenBigram tokenizer) matched. By weighting the TokenMecab tokenizer more heavily like this, you can raise recall while keeping search noise away from the top of the results.

This example was Japanese, so the TokenBigram tokenizer was sufficient, but for alphabetic text you need a tokenizer such as TokenBigramSplitSymbolAlpha. For example, TokenBigram tokenizes 「楽しいbilliard」 as

楽し / しい / billiard

so 「bill」 does not exact-match. With the TokenBigramSplitSymbolAlpha tokenizer, it becomes

楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d

so 「bill」 exact-matches as well.

When using the TokenBigramSplitSymbolAlpha tokenizer, weighting still needs to be considered in the same way.

The available bigram tokenizers are as follows.

• TokenBigram: Tokenizes with bigrams. Consecutive symbols, alphabetic characters and digits are treated as one word.
• TokenBigramSplitSymbol: Also tokenizes symbols with bigrams. Consecutive alphabetic characters and digits are treated as one word.
• TokenBigramSplitSymbolAlpha: Also tokenizes symbols and alphabetic characters with bigrams. Consecutive digits are treated as one word.
• TokenBigramSplitSymbolAlphaDigit: Also tokenizes symbols, alphabetic characters and digits with bigrams.
• TokenBigramIgnoreBlank: Tokenizes with bigrams. Consecutive symbols, alphabetic characters and digits are treated as one word. Whitespace is ignored.
• TokenBigramIgnoreBlankSplitSymbol: Also tokenizes symbols with bigrams. Consecutive alphabetic characters and digits are treated as one word. Whitespace is ignored.
• TokenBigramIgnoreBlankSplitSymbolAlpha: Also tokenizes symbols and alphabetic characters with bigrams. Consecutive digits are treated as one word. Whitespace is ignored.
• TokenBigramIgnoreBlankSplitSymbolAlphaDigit: Also tokenizes symbols, alphabetic characters and digits with bigrams. Whitespace is ignored.

Workaround 2: raise the threshold

The threshold that decides whether to use unsplit and partial-match searches can be changed with the --with-match-escalation-threshold configure option. With the following setting, unsplit and partial-match searches are performed whenever there are 100 or fewer hits, even if the exact-match search hits:

% ./configure --with-match-escalation-threshold=100

As with workaround 1, note that this makes it more likely that search noise appears at the top of the results. If search noise increases, you need to lower the specified value.

How to avoid mmap Cannot allocate memory error

Example

In some cases, the following mmap error appears in the log file:

2013-06-04 08:19:34.835218|A|4e86e700|mmap(4194304,551,432017408)=Cannot allocate memory <13036498944>

Note that <13036498944> means the total size of mmap (almost 12GB) in this case.

Solution

You need to check the following points.

• Is there enough free memory?
• Is the maximum number of mappings exceeded?

To check whether there is enough free memory, you can use the vmstat command. To check whether the maximum number of mappings is exceeded, you can investigate the value of vm.max_map_count. If the issue is fixed by modifying the value of vm.max_map_count, that was exactly the reason.

As Groonga allocates memory in 256KB chunks, you can estimate the size of the database you can handle with the following formula:

(database size) = vm.max_map_count * (memory chunk size)

If you want to handle a Groonga database over 16GB, you must specify at least 65536 as the value of vm.max_map_count:

database size (16GB) = vm.max_map_count (65536) * memory chunk size (256KB)

You can modify vm.max_map_count temporarily with sudo sysctl -w vm.max_map_count=65536. Then save the configuration value to /etc/sysctl.conf or /etc/sysctl.d/*.conf.

See the /reference/tuning documentation about tuning related parameters.
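If you want to observe the per-process mapping limit directly, here is a minimal, self-contained sketch (not part of Groonga) that keeps creating 256KB anonymous mappings until mmap fails; alternating the protection flags prevents the kernel from merging adjacent mappings into a single region, so each one counts against vm.max_map_count:

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
  const size_t chunk_size = 256 * 1024;  /* Groonga's mapping unit */
  size_t n_mappings = 0;

  for (;;) {
    /* Alternate protections so adjacent mappings stay separate VMAs. */
    int prot = (n_mappings % 2 == 0) ? (PROT_READ | PROT_WRITE) : PROT_READ;
    void *address = mmap(NULL, chunk_size, prot,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (address == MAP_FAILED) {
      printf("mmap failed after %zu mappings: %s\n",
             n_mappings, strerror(errno));
      return 0;
    }
    n_mappings++;
  }
}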
DEVELOPMENT
This section describes developing with Groonga. You may develop an application that uses Groonga as its database, a library that uses libgroonga, language bindings of libgroonga and so on.

Travis CI

This section describes how to use Groonga on Travis CI. Travis CI is a hosted continuous integration service for the open source community. You can use Travis CI for your open source software. This section only describes the Groonga related configuration. See Travis CI: Documentation about Travis CI in general.

Configuration

Travis CI runs on 64-bit Ubuntu 12.04 LTS Server Edition. (See Travis CI: About Travis CI Environment.) You can use the apt-line for Ubuntu 12.04 LTS provided by the Groonga project to install Groonga on Travis CI.

You can customize the build lifecycle with .travis.yml. (See Travis CI: Configuring your Travis CI build with .travis.yml.) You can use the before_install hook or the install hook. You should use before_install if your software uses a language that is supported by Travis CI, such as Ruby. You should use install otherwise.

Add the following sudo and before_install configuration to .travis.yml:

sudo: required
before_install:
  - curl --silent --location https://github.com/groonga/groonga/raw/master/data/travis/setup.sh | sh

The sudo: required configuration is required because the sudo command is used in the setup script. If you need to use the install hook instead of before_install, just substitute before_install: with install:.

With the above configuration, you can use Groonga in your build.

Examples

Here are open source projects that use Groonga on Travis CI:

• rroonga (Ruby bindings)
  • rroonga on Travis CI
  • .travis.yml for rroonga
• nroonga (node.js bindings)
  • nroonga on Travis CI
  • .travis.yml for nroonga
• logaling-command (A glossary management command line tool)
  • logaling-command on Travis CI
  • .travis.yml for logaling-command
HOW TO CONTRIBUTE TO GROONGA
We welcome your contributions to the Groonga project. There are many ways to contribute, such as using Groonga, introducing it to others, and so on. For example, if you find a bug when using Groonga, you are welcome to report the bug. Coding and documentation contributions are also welcome, for Groonga and its related projects.

As a user: If you are interested in Groonga, please read this document and try it.
As a spokesman: Please introduce Groonga to your friends and colleagues.
As a developer: Bug reports, development and documentation.

This section describes the details.

How to report a bug

There are two ways to report a bug:

• Submit a bug to the issue tracker
• Report a bug to the mailing list

You can use either way; it makes no difference to us.

Submit a bug to the issue tracker

The Groonga project uses the GitHub issue tracker. You can use English or Japanese to report a bug.

Report a bug to the mailing list

The Groonga project has /community for discussing Groonga. Please send a mail that describes the bug.

How to contribute to documentation

We use Sphinx as the documentation tool.

Introduction

This section describes how to write, generate and manage the Groonga documentation.

Install depended software

Groonga uses Sphinx as its documentation tool. Here are the command lines to install Sphinx.

Debian GNU/Linux, Ubuntu:

% sudo apt-get install -V -y python-sphinx

CentOS, Fedora:

% sudo yum install -y python-pip
% sudo pip install sphinx

OS X:

% brew install python
% brew install gettext
% export PATH=`brew --prefix gettext`/bin:$PATH
% pip install sphinx

If the version of Python on your platform is too old, you'll need to install a newer version of Python 2.7 by hand. For example, here are installation steps based on pyenv:

% pyenv install 2.7.11
% pyenv global 2.7.11
% pip install sphinx

Run configure with --enable-document

Groonga disables documentation generation by default. You need to enable it explicitly by adding the --enable-document option to configure:

% ./configure --enable-document

Now, your Groonga build is documentation ready.

Generate HTML

You can generate HTML with the following command:

% make -C doc html

You can find the generated HTML documentation at doc/locale/en/html/.

Update

You can find the sources of the documentation at doc/source/. The sources should be written in English. See the I18N section about how to translate documentation.

You can simply update the target file when you update an existing documentation file. You need to update the file list after you add a new file, change a file path or delete an existing file. You can update the file list with the following command:

% make -C doc update-files

The command updates doc/files.am.

I18N

We used to have documentation only in Japanese. We have started to support I18N documentation with the gettext-based Sphinx I18N feature. We use English as the base language and translate English into other languages such as Japanese. We put all documentation into doc/source/ and process it with Sphinx.

But we still use Japanese in doc/source/ for now. We need to translate the Japanese documentation in doc/source/ into English. We welcome your help in translating the documentation.

Translation flow

After doc/source/*.txt are updated, we can start translation. Here is the translation flow:

1. Install Sphinx, if it is not installed.
2. Clone the Groonga repository.
3. Update .po files.
4. Edit .po files.
5. Generate HTML files.
6. Confirm the HTML output.
7. Repeat steps 3.-6. until you get a good result.
8. Send your work to us!

Here are the command lines for the above flow. The following sections describe the details.
# Please fork https://github.com/groonga/groonga on GitHub
% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git
% ./autogen.sh
% ./configure --enable-document
% cd doc/locale/${LANGUAGE}/LC_MESSAGES # ${LANGUAGE} is a language code such as 'ja'.
% make update # *.po are updated
% editor *.po # translate *.po # you can use your favorite editor
% cd ..
% make html
% browser html/index.html # confirm translation
% git add LC_MESSAGES/*.po
% git commit
% git push

How to install Sphinx

See the introduction.

How to clone the Groonga repository

First, please fork the Groonga repository on GitHub. Just access https://github.com/groonga/groonga and press the Fork button. Now you can clone your Groonga repository:

% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git

Then you need to configure your cloned repository:

% cd groonga
% ./autogen.sh
% ./configure --enable-document

The above steps are only needed for the first setup. If you have trouble with the above steps, you can use the source files available at http://packages.groonga.org/source/groonga/ .

How to update .po files

You can update the .po files by running make update in doc/locale/${LANGUAGE}/LC_MESSAGES. (Please substitute ${LANGUAGE} with your language code such as 'ja'.):

% cd doc/locale/ja/LC_MESSAGES
% make update

How to edit .po files

There are some tools to edit .po files. .po files are just text, so you can use your favorite editor. Here is a list of editors specialized for editing .po files.

Emacs's po-mode
  It is bundled in gettext.
Poedit
  It is a .po editor and works on many platforms.
gted
  It is also a .po editor and is implemented as an Eclipse plugin.

How to generate HTML files

You can generate HTML files with the updated .po files by running make html in doc/locale/${LANGUAGE}. (Please substitute ${LANGUAGE} with your language code such as 'ja'.):

% cd doc/locale/ja/
% make html

You can also generate HTML files for all languages by running make html in doc/locale:

% cd doc/locale
% make html

NOTE: .mo files are updated automatically by make html, so you don't need to care about .mo files.

How to confirm the HTML output

HTML files are generated in doc/locale/${LANGUAGE}/html/. (Please substitute ${LANGUAGE} with your language code such as 'ja'.) You can confirm the HTML output with your favorite browser:

% firefox doc/locale/ja/html/index.html

How to send your work

We can receive your work via a pull request on GitHub, or via e-mail with a patch or the .po files themselves attached.

How to send a pull request

Here are the command lines to send a pull request:

% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git push

Now you can send a pull request on GitHub. Just access your repository page on GitHub and press the Pull Request button.

SEE ALSO: Help.GitHub - Sending pull requests.

How to send a patch

Here are the command lines to create a patch:

% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git format-patch origin/master

You can find 000X-YYY.patch files in the current directory. Please send those files to us!

SEE ALSO: /community describes our contact information.

How to send .po files

Please archive doc/locale/${LANGUAGE}/LC_MESSAGES/ (substitute ${LANGUAGE} with your language code such as 'ja') and send it to us! We will extract and merge it into the Groonga repository.

SEE ALSO: /community describes our contact information.

How to add a new language

Here are the command lines to add a new translation language:

% cd doc/locale
% make add LOCALE=${LANGUAGE} # specify your language code such as 'de'.

Please substitute ${LANGUAGE} with your language code such as 'ja'.
SEE ALSO: Codes for the Representation of Names of Languages.

C API

We still have the C API documentation in include/groonga.h, but we want to move it into doc/source/c-api/*.txt. We welcome your help in moving the C API documentation. We will use the C domain markup of Sphinx.

For Groonga developers

Repository

The Groonga repository is hosted on GitHub. If you want to check out Groonga, type the command below:

% git clone --recursive https://github.com/groonga/groonga.git

There is also a list of related projects of Groonga (grntest, fluent-plugin-groonga and so on).

How to build Groonga at the repository

This document describes how to build Groonga at the repository with each build system. You can choose GNU Autotools or CMake if you develop Groonga on GNU/Linux or Unix (*BSD, Solaris, OS X and so on). You need to use CMake if you develop on Windows.

How to build Groonga at the repository by GNU Autotools

This document describes how to build Groonga at the repository with GNU Autotools. You can't choose this way if you develop Groonga on Windows. If you want to use Windows for developing Groonga, see windows_cmake.

Install depended software

TODO

• Autoconf
• Automake
• GNU Libtool
• Ruby
• Git
• Cutter
• ...

Checkout Groonga from the repository

Users use the released source archives, but developers must build Groonga at the repository, because the source code in the repository is the latest. The Groonga repository is hosted on GitHub. Check out the latest source code from the repository:

% git clone --recursive git@github.com:groonga/groonga.git

Create configure

You need to create configure. configure is included in the source archive but not in the repository. configure is a build tool that detects your system and generates build configurations for your environment.

Run autogen.sh to create configure:

% ./autogen.sh

Run configure

You can customize your build configuration by passing options to configure. Here are the recommended configure options for developers:

% ./configure --prefix=/tmp/local --enable-debug --enable-mruby --with-ruby

Here are the descriptions of these options:

--prefix=/tmp/local
  It installs your Groonga into a temporary directory. You can do a "clean install" by removing the /tmp/local directory. It is useful for debugging the install.
--enable-debug
  It enables debug options for the C/C++ compiler. It is useful for debugging on a debugger such as GDB and LLDB.
--enable-mruby
  It enables mruby support. The feature isn't enabled by default, but developers should enable it.
--with-ruby
  It is needed for --enable-mruby and for running the functional tests.

Run make

Now, you can build Groonga. Here is the recommended make command line for developers:

% make -j8 > /dev/null

-j8 decreases build time by enabling parallel build. If you have 8 or more CPU cores, you can increase 8 to decrease build time further. You can see only warning and error messages by > /dev/null. Developers shouldn't add new warnings or errors in a new commit.

See also

• /contribution/development/test

How to build Groonga at the repository by CMake on GNU/Linux or Unix

This document describes how to build Groonga at the repository with CMake on GNU/Linux or Unix. Unix means *BSD, Solaris, OS X and so on. If you want to use Windows for developing Groonga, see windows_cmake.

You can't choose this way if you want to release Groonga. The Groonga release system is only supported by the GNU Autotools build. See unix_autotools about the GNU Autotools build.

Install depended software

TODO

• CMake
• Ruby
• Git
• Cutter
• ...
Checkout Groonga from the repository

Users use the released source archives, but developers must build Groonga at the repository, because the source code in the repository is the latest. The Groonga repository is hosted on GitHub. Check out the latest source code from the repository:

% git clone --recursive git@github.com:groonga/groonga.git

Run cmake

You need to create a Makefile for your environment. You can customize your build configuration by passing options to cmake. Here are the recommended cmake options for developers:

% cmake . -DCMAKE_INSTALL_PREFIX=/tmp/local -DGRN_WITH_DEBUG=on -DGRN_WITH_MRUBY=on

Here are the descriptions of these options:

-DCMAKE_INSTALL_PREFIX=/tmp/local
  It installs your Groonga into a temporary directory. You can do a "clean install" by removing the /tmp/local directory. It is useful for debugging the install.
-DGRN_WITH_DEBUG=on
  It enables debug options for the C/C++ compiler. It is useful for debugging on a debugger such as GDB and LLDB.
-DGRN_WITH_MRUBY=on
  It enables mruby support. The feature isn't enabled by default, but developers should enable it.

Run make

Now, you can build Groonga. Here is the recommended make command line for developers:

% make -j8 > /dev/null

-j8 decreases build time by enabling parallel build. If you have 8 or more CPU cores, you can increase 8 to decrease build time further. You can see only warning and error messages by > /dev/null. Developers shouldn't add new warnings or errors in a new commit.

See also

• /contribution/development/test

How to build Groonga at the repository by CMake on Windows

This document describes how to build Groonga at the repository with CMake on Windows. If you want to use GNU/Linux or Unix for developing Groonga, see unix_cmake. Unix means *BSD, Solaris, OS X and so on.

Install depended software

• Microsoft Visual Studio Express 2013 for Windows Desktop
• CMake
• Ruby
  • RubyInstaller for Windows is recommended.
• Git: There are some Git clients for Windows. For example:
  • The official Git package
  • TortoiseGit
  • Git for Windows
  • GitHub Desktop

Checkout Groonga from the repository

Users use the released source archives, but developers must build Groonga at the repository, because the source code in the repository is the latest. The Groonga repository is hosted on GitHub. Check out the latest source code from the repository:

> git clone --recursive git@github.com:groonga/groonga.git

Run cmake

You need to create a Makefile for your environment. You can customize your build configuration by passing options to cmake. You must pass the -G option. Here are the available -G values:

• "Visual Studio 12 2013": For 32-bit build.
• "Visual Studio 12 2013 Win64": For 64-bit build.

Here are the recommended cmake options for developers:

> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga -DGRN_WITH_MRUBY=on

Here are the descriptions of these options:

-G "Visual Studio 12 2013 Win64"
  It specifies the build target generator.
-DCMAKE_INSTALL_PREFIX=C:\Groonga
  It specifies that your Groonga is installed into the C:\Groonga folder.
-DGRN_WITH_MRUBY=on
  It enables mruby support. The feature isn't enabled by default, but developers should enable it.

Build Groonga

Now, you can build Groonga. You can use Visual Studio or cmake --build. Here is the command line to build Groonga with cmake --build:
> cmake --build . --config Debug

See also

• /contribution/development/test

Groonga communication architecture

Architecture for GQTP

• com accepts connections from the outside.
• com is one thread.
• com creates edges.
• An edge corresponds one-to-one to a connection.
• An edge contains a ctx.
• A worker corresponds one-to-one to a thread.
• The number of workers has a fixed upper limit.
• A worker can be bound to one edge.
• Each edge has its own queue.
• msgs are enqueued by com into the edge's queue. When the edge is not bound to a worker, the edge whose queue received the msg is at the same time enqueued into a queue named ctx_new.

Guidelines for developing well together with users

This section collects practices that help us develop Groonga in cooperation with the users who use it. Writing them down also lets us share them with people who newly join the development.

Twitter

To reach Groonga users, we have the Twitter account Groonga, which we use day to day for release announcements and user support.

For release announcements you don't need to think about conversations, but when several people do support through the Groonga account, the support becomes inconsistent unless we share a common understanding of how to support and why. So that the reassurance of being supported on Twitter helps grow the Groonga user base, here are the things to keep in mind during support.

Review the user's past tweets

Reason
  Nobody feels good about getting a reply from someone who hasn't read what they tweeted.
Action
  Review the user's past tweets and, ideally, reply with a concrete suggestion:

  Good example: With ○○, the cause is □□. Doing ×× will fix it.

Offer information from our side

Reason
  A user in trouble may have tweeted several times and provided information within those limited tweets. If a solution can be found from those limited tweets, the user is spared extra work. If we keep demanding more information, the user has to do that much more checking.
Action
  Ideally, offer one or two possible solutions when first reaching out, so that the user doesn't feel burdened:

  Good example: With ○○, □□ is a possible cause, so could you try ××?

Avoid redirecting Twitter conversations to other places (such as Redmine) as much as possible

Reason
  The point of Twitter is that tweeting is casual; demanding something that isn't casual may scare users off. If we suddenly ask for a bug report on Redmine, they may hesitate:

  Bad example: Could you report the reproduction steps to the ML or Redmine?

  If people can't tweet casually about Groonga, developers can't find the people in trouble and users stay in trouble, which is an unhappy state for both sides.
Action
  Complete the conversation on Twitter.

How queries are executed

A Groonga database stores a large amount of data, and the parts you need can be retrieved from it quickly. Groonga provides several means of expressing and executing the queries that ask the database for the parts you need.

Interfaces for query execution

Groonga provides user programs with several layered interfaces, from a low-level, simple library interface to a high-level, feature-rich command interface. Interfaces for query execution are provided for each of these layers. They are described below, starting from the lowest layer.

DB_API

DB_API provides a set of C API functions for operating a Groonga database. DB_API provides simple operations on the individual parts that make up a database. Complex queries can be executed by combining DB_API functions. All of the query interfaces described below are implemented by combining DB_API functions.

grn_expr

A grn_expr is a data structure that expresses conditions for search and update operations against a Groonga database. Multiple conditions can be combined recursively to express more complex conditions. To execute a query expressed as a grn_expr, use the grn_table_select() function.

Groonga executable

The Groonga executable is a command interpreter for operating Groonga databases. It interprets the commands passed to it and returns the results. The actual processing of a command is written in C. A user-defined C function can be registered as a new command in the Groonga executable. Each command receives several string arguments and interprets them as a query. Whether the arguments are interpreted as a grn_expr or in some other form, operating the database through DB_API, is up to each command.

Queries expressible with grn_expr

A grn_expr can express various operations, such as assignments and function calls. A grn_expr that expresses a search query is specifically called a conditional expression. The individual elements that make up a conditional expression are called relational expressions. A conditional expression is one or more relational expressions, or conditional expressions combined with logical operators.
There are the following three logical operators:

&& (logical AND)
|| (logical OR)
!  (logical NOT)

The following eleven relational expressions are provided. User-defined functions can also be used as new relational expressions.

equal(==)
not_equal(!=)
less(<)
greater(>)
less_equal(<=)
greater_equal(>=)
contain()
near()
similar()
prefix()
suffix()

grn_table_select()

The grn_table_select() function is used to execute a search query expressed as a grn_expr. As arguments, it takes the table to be searched, the grn_expr that expresses the query, the table that stores the search results, and an operator that specifies how records that match the query are reflected in the search results. The following four operators can be specified.

GRN_OP_OR
GRN_OP_AND
GRN_OP_BUT
GRN_OP_ADJUST

GRN_OP_OR adds the records in the searched table that match the query to the result table. The operators other than GRN_OP_OR are only meaningful when the result table is not empty. GRN_OP_AND removes the records that do not match the query from the result table. GRN_OP_BUT removes the records that match the query from the result table. GRN_OP_ADJUST only updates the score values of the records in the result table that match the query.

grn_table_select() combines the tables, indexes and so on defined in the database to execute the specified query as fast as possible.

Relational expressions

A relational expression expresses a condition that the data being searched for must satisfy, as a relation between specified values. Every relational expression can take as arguments a callback function, which is evaluated when the relation holds, and an arg that is passed to the callback function. When no callback is given and only arg is given as a number, it is treated as a score coefficient. The main relational expressions are described below.

equal(v1, v2, arg, callback)
  The value of v1 equals the value of v2.
not_equal(v1, v2, arg, callback)
  The value of v1 does not equal the value of v2.
less(v1, v2, arg, callback)
  The value of v1 is less than the value of v2.
greater(v1, v2, arg, callback)
  The value of v1 is greater than the value of v2.
less_equal(v1, v2, arg, callback)
  The value of v1 is less than or equal to the value of v2.
greater_equal(v1, v2, arg, callback)
  The value of v1 is greater than or equal to the value of v2.
contain(v1, v2, mode, arg, callback)
  The value of v1 contains the value of v2. When the value of v1 is decomposed into elements, one of the following modes can be specified for how the second value must match each element.

  EXACT: When the value of v2 is also decomposed into elements like the value of v1, the elements match exactly (default)
  UNSPLIT: The value of v2 is not decomposed into elements
  PREFIX: An element of the value of v1 prefix-matches the value of v2
  SUFFIX: An element of the value of v1 suffix-matches the value of v2
  PARTIAL: An element of the value of v1 infix-matches the value of v2
near(v1, v2, arg, callback)
  The value of v1 contains the elements of the value of v2 close together. (Pass an array of values as v2.)
similar(v1, v2, arg, callback)
  The value of v1 is similar to the value of v2.
prefix(v1, v2, arg, callback)
  The value of v1 prefix-matches the value of v2.
suffix(v1, v2, arg, callback)
  The value of v1 suffix-matches the value of v2.

Query examples

Various search queries can be expressed with grn_expr.

Example 1

GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);

This returns in result the records of table whose column value contains string. If a record r1 whose column value is 'needle in haystack' and a record r2 whose column value is 'haystack' are registered in table, and 'needle' is specified as string, only record r1 hits.

Example 2

GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);

GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column2, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
grn_table_select(ctx, table, query, result, GRN_OP_ADJUST);
grn_obj_close(ctx, query);

For the records of table whose column1 value hits string in exact mode, the obtained score value is multiplied by score1 and the records are set into result. Next, for the records set into result whose column2 value hits string in exact mode, the obtained score value multiplied by score2 is added to the original score value.
Example 3

GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);

if (grn_table_size(ctx, result) < t1) {
  GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
  grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
  grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
  grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
  grn_expr_append_const(ctx, query, partial, GRN_OP_PUSH, 1);
  grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
  grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
  grn_table_select(ctx, table, query, result, GRN_OP_OR);
  grn_obj_close(ctx, query);
}

For the records of table whose column1 value hits string in exact mode, the obtained score value is multiplied by score1 and the records are set into result. If the number of results obtained is smaller than t1, the search is run again in partial mode, and for the records that hit, the obtained score value is multiplied by score2 and the records are added to result.

Example 4

GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);

This returns in result the records of table whose column value is contained in string. If a record r1 whose column value is 'needle' and a record r2 whose column value is 'haystack' are registered in table, and 'hay in haystack' is specified as string, only record r2 hits.

Release procedure

Prerequisites

The prerequisites of the release procedure are as follows.

• The build environment is Debian GNU/Linux (sid)
• The command line examples assume zsh

The following working directories are used in the examples.

• GROONGA_DIR=$HOME/work/groonga
• GROONGA_CLONE_DIR=$HOME/work/groonga/groonga.clean
• GROONGA_ORG_PATH=$HOME/work/groonga/groonga.org
• CUTTER_DIR=$HOME/work/cutter
• CUTTER_SOURCE_PATH=$HOME/work/cutter/cutter

Preparing the build environment

The packages that should be installed in advance for the Groonga release work are shown below. The explanation assumes Debian GNU/Linux (sid) as the build environment, so adapt accordingly for other environments:

% sudo apt-get install -V debootstrap createrepo rpm mercurial python-docutils python-jinja2 ruby-full mingw-w64 g++-mingw-w64 mecab libmecab-dev nsis gnupg2 dh-autoreconf python-sphinx bison

Vagrant is used to build the Debian (.deb) and Red Hat (.rpm) packages. The version installable with apt-get is old, so it is recommended to download and install the latest version from the website. If you don't have the virtualization software used by Vagrant (VirtualBox, VMware and so on), install it as well. Note that VirtualBox can be installed with apt-get if you add the contrib section to sources.list:

% cat /etc/apt/sources.list
deb http://ftp.jp.debian.org/debian/ sid main contrib
deb-src http://ftp.jp.debian.org/debian/ sid main contrib

% sudo apt-get update
% sudo apt-get install virtualbox

Also install Ruby's rake package with the following command:

% sudo gem install rake

Importing the package-signing secret key

The release work involves signing the RPM packages, which requires the package-signing key. In the Groonga project, the signing key is encrypted with the release manager's public key and stored under the packages/ directory of the repository.

The release manager decrypts the secret key stored in the repository and then imports it with the following commands:

% cd packages
% gpg --decrypt release-key-secret.asc.gpg.(release manager) > (decrypted key file)
% gpg --import (decrypted key file)

When the key import finishes successfully, the Groonga signing key can be confirmed with gpg --list-keys:

pub 1024R/F10399C0 2012-04-24
uid groonga Key (groonga Official Signing Key) <packages@groonga.org>
sub 1024R/BC009774 2012-04-24

The key cannot be used just by importing it; you need to trust and sign the imported key. Run the following commands to sign it (the intermediate prompts are omitted):

% gpg --edit-key packages@groonga.org
gpg> trust
gpg> sign
gpg> save
gpg> quit

This step is needed when someone newly becomes a release manager or when the package-signing key changes.

Creating a working directory for the release
Groonga releases must be built in a dedicated environment (with release compile flags). You can work without separating the release and development directories, but there is then a risk of releasing with the wrong compile flags. Therefore, the following explanation assumes that the source code has been cloned under $GROONGA_DIR into a release working directory (groonga.clean).

To get the source code in a clean state for the release, run the following command in $GROONGA_DIR:

% git clone --recursive git@github.com:groonga/groonga.git groonga.clean

This step is done for every release.

Summarizing the changes

Summarize the changes since the previous release in $GROONGA_CLONE_DIR/doc/source/news.txt. The summarized content is also used for the release announcement.

To browse the change history since the previous release, run the following command:

% git log -p --reverse $(git tag | tail -1)..

Search the log for ^commit and add changes using the following criteria as a guide.

Things to include:

• Changes that affect users
• Changes that break compatibility

Things not to include:

• Internal changes (variable renames and refactorings)

Getting the Groonga website

The source of the Groonga website, like Groonga itself, is in a repository on GitHub. During the release work, the version on the top page can be replaced with a command described later (make update-latest-release).

To get the source code of the Groonga website as $GROONGA_ORG_PATH, run the following command in $GROONGA_DIR:

% git clone git@github.com:groonga/groonga.org.git

This puts the groonga.org source in $GROONGA_ORG_PATH.

Getting the Cutter source code

The Groonga release work uses scripts included in Cutter. Get the Cutter source code in the previously prepared $HOME/work/cutter directory with the following command:

% git clone git@github.com:clear-code/cutter.git

This puts the Cutter source in the $CUTTER_SOURCE_PATH directory.

Generating the configure script

Right after cloning, the Groonga source code does not contain the configure script and cannot be built with make as is. Run autogen.sh in $GROONGA_CLONE_DIR:

% sh autogen.sh

This command generates the configure script.

Running the configure script

Run the configure script to generate the Makefile. To build for a release, run configure with the following options:

% ./configure \
    --prefix=/tmp/local \
    --with-launchpad-uploader-pgp-key=(key ID registered on Launchpad) \
    --with-groonga-org-path=$HOME/work/groonga/groonga.org \
    --enable-document \
    --with-ruby \
    --enable-mruby \
    --with-cutter-source-path=$HOME/work/cutter/cutter

The --with-groonga-org-path configure option specifies where the Groonga website repository was cloned. The --with-cutter-source-path configure option specifies where the Cutter source was cloned. You can also specify relative paths from where the Groonga source was cloned:

% ./configure \
    --prefix=/tmp/local \
    --with-launchpad-uploader-pgp-key=(key ID registered on Launchpad) \
    --with-groonga-org-path=../groonga.org \
    --enable-document \
    --with-ruby \
    --enable-mruby \
    --with-cutter-source-path=../../cutter/cutter

Confirm in advance that you can log in to packages.groonga.org via ssh as the packages user. You can check whether you can log in with the following command:

% ssh packages@packages.groonga.org

Running make update-latest-release

For the make update-latest-release command, specify the date of the previous release as OLD_RELEASE_DATE and the date of the upcoming release as NEW_RELEASE_DATE. For the 2.0.2 release, the following command was run:

% make update-latest-release OLD_RELEASE=2.0.1 OLD_RELEASE_DATE=2012-03-29 NEW_RELEASE_DATE=2012-04-29

This updates the top page sources of the cloned Groonga website (index.html, ja/index.html), the version strings in the RPM package spec files, and so on.

Running make update-files

To update the locale messages and the lists of changed files, run the following command:

% make update-files

Running make update-files lists newly added files and so on in the various .am files. These files are needed for the release, so commit all of them.

Running make update-po

To synchronize the latest documentation with the translated versions, update the po files with the following command:

% make update-po

Running make update-po updates the .po files under doc/locale/ja/LC_MESSAGES.

Translating the po files

Translate the .po files updated by the make update-po command. To check the translation results as HTML, run the following commands:

% make -C doc/locale/ja html
% make -C doc/locale/en html

When you have confirmed them, commit the translated po files.

Setting the release tag

To tag the release, run the following command:

% make tag

NOTE: Running configure after tagging makes the version number appear in the generated documentation.

Creating the release archive file

To create the release source archive file, run the following command in $GROONGA_CLONE_DIR:

% make dist

This creates $GROONGA_CLONE_DIR/groonga-(version).tar.gz.
NOTE: If you run make dist before tagging, the version may remain old, in which case the version string shown by groonga --version will not be updated, so be careful. It is desirable to confirm that the version and version.sh in the tar.gz generated by make dist match the tag.

Building the packages

Now that the release archive file exists, package it. Packaging targets the following three kinds.

• Debian (.deb)
• Red Hat (.rpm)
• Windows (.exe, .zip)

Building the packages consists of several subtasks.

Downloading the packages needed for the build

To download the packages needed to build the deb packages, run the following commands:

% cd packages/apt
% make download

This downloads the related .deb packages, source archives and so on for lucid and later under the current directory.

To download the packages needed to build the rpm packages, run the following commands:

% cd packages/yum
% make download

This downloads the RPM/SRPM packages of Groonga and MySQL and so on under the current directory.

To download the packages needed to build the Windows packages, run the following commands:

% cd packages/windows
% make download

This downloads the Groonga installers and zip archives under the current directory.

To download what is needed for the source package, run the following commands:

% cd packages/source
% make download

This downloads the previously released source archives (.tar.gz) under the packages/source/files directory.

Building the Debian packages

Move to the packages/apt subdirectory of Groonga and run the following commands:

% cd packages/apt
% make build PALALLEL=yes

Running the make build PALALLEL=yes command builds the combinations of distribution releases and architectures in parallel. The following are currently supported.

• Debian GNU/Linux
  • wheezy i386/amd64
  • jessie i386/amd64

When the build finishes successfully, the .deb packages are generated under $GROONGA_CLONE_DIR/packages/apt/repositories.

make build sometimes cannot build everything at once. In that case, you need to isolate the problem by building individually, per distribution or per architecture.

To sign the generated packages, run the following command:

% make sign-packages

To reflect the files to be released in the repository, run the following command:

% make update-repository

To sign the repository with GnuPG, run the following command:

% make sign-repository

Building the Red Hat packages

Move to the packages/yum subdirectory of Groonga and run the following commands:

% cd packages/yum
% make build PALALLEL=yes

Running the make build PALALLEL=yes command builds the combinations of distribution releases and architectures in parallel. The following are currently supported.

• centos-5 i386/x86_64
• centos-6 i386/x86_64
• centos-7 i386/x86_64

When the build finishes successfully, the RPM packages are generated under $GROONGA_CLONE_DIR/packages/yum/repositories.

• repositories/yum/centos/5/i386/Packages
• repositories/yum/centos/5/x86_64/Packages
• repositories/yum/centos/6/i386/Packages
• repositories/yum/centos/6/x86_64/Packages
• repositories/yum/centos/7/i386/Packages
• repositories/yum/centos/7/x86_64/Packages

To sign the RPMs to be released, run the following command:

% make sign-packages

To reflect the files to be released in the repository, run the following command:

% make update-repository

Building the Windows packages

Move to the packages/windows subdirectory and run the following commands:

% cd packages/windows
% make build
% make package
% make installer

Running make release runs everything from build to upload at once, but it can fail partway through, so running the steps in order is recommended.

make build cross-compiles Groonga. When it finishes successfully, it creates x64/x86 binaries under the dist-x64/dist-x86 directories. When make package finishes successfully, it creates zip archives under the files directory. When make installer finishes successfully, it creates Windows installers under the files directory.

Verifying the packages

Verify the built packages before the release. For the Debian and Red Hat packages, before uploading to the production environment, confirm that updating works correctly against a local apt or yum repository.

Here, Ruby is used to serve the repositories via a web server as follows:

% ruby -run -e httpd -- packages/yum/repositories (for yum)
% ruby -run -e httpd -- packages/apt/repositories (for apt)

Preparing grntest

Running grntest requires the Groonga test data and the grntest source. First, extract the Groonga source into an arbitrary directory:

% tar zxvf groonga-(version).tar.gz

Next, extract the grntest source under Groonga's test/function directory, so that the grntest source is placed as test/function/grntest:

% ls test/function/grntest/
README.md binlib license test

How to run grntest

With grntest, the Groonga command can be specified explicitly. In the per-package verification with grntest described below, run it as follows:

% GROONGA=(path to groonga) test/function/run-test.sh

At the end, the grntest results are summarized as follows:

55 tests, 52 passes, 0 failures, 3 not checked tests.
94.55% passed.
Confirm that no errors occur in grntest.

For Debian packages

The verification steps for the Debian packages are as follows.

• Install the old version into a chroot environment
• Rewrite /etc/hosts in the chroot environment so that packages.groonga.org points to the host
• Start a web server on the host with its document root set to the build environment's (repositories/apt/packages)
• Follow the upgrade procedure
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully

For Red Hat packages

The verification steps for the Red Hat packages are as follows.

• Install the old version into a chroot environment
• Rewrite /etc/hosts in the chroot environment so that packages.groonga.org points to the host
• Start a web server on the host with its document root set to the build environment's (packages/yum/repositories)
• Follow the upgrade procedure
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully

For Windows

• Perform a fresh install and an overwrite install
• Extract the grntest archive and run the tests against the installed version
• Confirm that grntest finishes successfully

Verify the zip archives in the same way by running grntest.

Writing the release announcement

For a release, send out a release announcement to let people widely know about Groonga. Write the release announcement based on the changes summarized in news.txt.

The release announcement includes the following.

• A link to the installation instructions
• An introduction to the topics of the release
• A link to the release changes
• The release changes (the content of news.txt)

The introduction to the release topics provides points of appeal for people who will start using Groonga, and the information needed by people upgrading from an existing version. If incompatible changes are included, it is also important to provide guidance such as workarounds.

For reference, here are links to past release announcements.

• [Groonga-talk] [ANN] Groonga 2.0.2
  • http://sourceforge.net/mailarchive/message.php?msg_id=29195195
• [groonga-dev,00794] [ANN] Groonga 2.0.2
  • http://osdn.jp/projects/groonga/lists/archive/dev/2012-April/000794.html

Uploading the packages

After verification is complete, upload the packages and archives for Debian, Red Hat, Windows and the source code.

To upload the Debian packages, run the following commands:

% cd packages/apt
% make upload

To upload the Red Hat packages, run the following commands:

% cd packages/yum
% make upload

To upload the Windows packages, run the following commands:

% cd packages/windows
% make upload

To upload the source archive, run the following commands:

% cd packages/source
% make upload

When the upload finishes successfully, the repository data, packages, archives and so on to be released are reflected on packages.groonga.org.

Uploading the Ubuntu packages

To upload the Ubuntu packages, run the following commands:

% cd packages/ubuntu
% make upload

The following are currently supported.

• precise i386/amd64
• trusty i386/amd64
• vivid i386/amd64

When the upload finishes successfully, builds run on launchpad.net and the build results are notified by e-mail. When the builds succeed, the released packages are reflected in the Groonga team PPA on launchpad.net. The published packages can be confirmed at the following URL.

https://launchpad.net/~groonga/+archive/ubuntu/ppa

Updating blogroonga (the blog)

Write the release announcements published at http://groonga.org/blog/ and http://groonga.org/ja/blog/. Basically, the content of the release announcement is posted as is.

Add the following new files to the cloned website source.

• groonga.org/en/_post/(release date)-release.md
• groonga.org/ja/_post/(release date)-release.md

If you want to check the edited content before pushing, you need Jekyll, RedCloth (a Textile parser), RDiscount (a Markdown parser) and a JavaScript interpreter (therubyracer, Node.js and so on). To install them, run the following command:

% sudo gem install jekyll RedCloth rdiscount therubyracer

After installing jekyll, start a local web server with the following command:

% jekyll serve --watch

Then access http://localhost:4000 in a browser and confirm that the content is fine.

NOTE: To upload an article in an unpublished state, set published: to false in the .md file:

---
layout: post.en
title: Groonga 2.0.5 has been released
published: false
---

Uploading the documentation

With the documentation under doc/source updated and translated, upload the documentation. First run the following command:

% make update-document

This copies the updated documentation under doc/locale of the cloned groonga.org. After confirming that there are no problems with the generated documentation, commit and push to reflect it on groonga.org.

Updating Homebrew

Homebrew is a package management system for OS X. To keep Groonga easy to install, send a pull request to Homebrew:

https://github.com/mxcl/homebrew

The Groonga formula has already been merged, so the work for each release is to update the content of the formula. For Groonga 3.0.6, the following update was sent as a pull request:

https://github.com/mxcl/homebrew/pull/21456/files

As you can see from the URL above, update the url and the sha1 checksum of the source archive.

Release announcement

Send the release announcement you wrote to the mailing lists.

• groonga-dev groonga-dev@lists.osdn.me
• Groonga-talk groonga-talk@lists.sourceforge.net
Announcing the release on Twitter

The blogroonga release entry has a tweet button for "sharing the link with your followers" (placed at the bottom of the page); use that button to announce the release. When you go through this button, the release title (such as "Groonga 2.0.8 has been released") and the URL of the blogroonga release entry are inserted into the tweet automatically.

Do this for both the English and Japanese versions of blogroonga. Logging in with the groonga account beforehand makes the announcement go smoothly.

This completes the release work.

After the release

Once the release announcement has been sent, development of the next version starts.

• Add a new version to the Groonga project
• Update Groonga's base_version

Adding a new version to the Groonga project

Add a new version (example: release-2.0.6) on the Groonga project settings page.

Updating the Groonga version

Run the following command in $GROONGA_CLONE_DIR:

% make update-version NEW_VERSION=2.0.6

This updates $GROONGA_CLONE_DIR/base_version, so commit it.

NOTE: base_version is used in the file names of release files such as the tar.gz.

Build TIPS

Parallelizing the build

Specifying make build PALALLEL=yes runs the builds in parallel in the chroot environments.

Building only for specific environments

For the Debian packages, you can build only specific releases and architectures by explicitly specifying the CODES and ARCHITECTURES options. To build only i386 for squeeze, run the following command:

% make build ARCHITECTURES=i386 CODES=squeeze

The ARCHITECTURES and CODES options are also effective for subtasks such as build-package-deb and build-repository-deb, not only for the build command.

For the Red Hat packages, you can build only specific releases and architectures by explicitly specifying the ARCHITECTURES and DISTRIBUTIONS options. To build only i386 for fedora, run the following command:

% make build ARCHITECTURES=i386 DISTRIBUTIONS=fedora

The ARCHITECTURES and DISTRIBUTIONS options are also effective for subtasks such as build-in-chroot and build-repository-rpm, not only for the build command.

For centos, you can build only a specific version by specifying CENTOS_VERSIONS.

Finding the passphrase for package signing

The passphrase of the secret key needed to sign packages is written on the first line of the decrypted text of the release manager's secret key.

Generating documentation with an explicitly specified version

If you want to replace part of the documentation after a release, the version embedded in the generated HTML can end up as something like "v3.0.1-xxxxx documentation" unless you specify otherwise, because part of the git commit hash is used. To avoid this, specify DOCUMENT_VERSION and DOCUMENT_VERSION_FULL explicitly, as follows:

% make update-document DOCUMENT_VERSION=3.0.1 DOCUMENT_VERSION_FULL=3.0.1

Testing

TODO: Write in English.
TODO: Write about test/command/run-test.sh.
Setting up the test environment

Installing Cutter

Groonga uses Cutter as its test framework. For how to install Cutter, see the per-platform Cutter installation instructions.

Installing lcov

To measure coverage information, lcov 1.6 or later is required. On Debian and Ubuntu, it can be installed as follows:

% sudo aptitude install -y lcov

Installing clang

To run static analysis of the source code, you need to install clang (scan-build). On Debian and Ubuntu, it can be installed as follows:

% sudo aptitude install -y clang

Installing libmemcached

To run the tests for the memcached binary protocol, libmemcached is required. On Debian squeeze and later and Ubuntu Karmic and later, it can be installed as follows:

% sudo aptitude install -y libmemcached-dev

Running the tests

Run the following command in the Groonga top directory:

make check

Coverage information

Run the following command in the Groonga top directory:

make coverage

This outputs HTML containing the coverage information under the coverage directory.

Coverage has three targets: Lines, Functions and Branches, corresponding to lines, functions and branches respectively. Functions is the most important target. Please try to keep all functions tested.

Edit the parts not covered by the tests carefully. Increasing the parts covered by the tests is also important.

Various ways to test

The tests can also be run by executing ./run-test.sh in the test/unit directory. run-test.sh takes several options. For details, run ./run-test.sh --help and see the help.

Testing only specific test functions

You can test only specific test functions (called "tests" in Cutter).

Example:

% ./run-test.sh -n test_text_otoj

Testing only specific test files

You can test only specific test files (called "test cases" in Cutter).

Example:

% ./run-test.sh -t test_string

Detecting invalid memory access and memory leaks

If you set the environment variable CUTTER_CHECK_LEAK to yes, the tests run under valgrind, detecting invalid memory accesses and memory leaks. This can be used not only with run-test.sh but also with make check.

Example:

% CUTTER_CHECK_LEAK=yes make check

Running tests on a debugger

If you set the environment variable CUTTER_DEBUG to yes, gdb is started with an environment in which the tests can be run. Running run in gdb starts the test run. This can be used not only with run-test.sh but also with make check.

Example:

% CUTTER_DEBUG=yes make check

Static analysis

You can run static analysis of the source code with scan-build. The analysis results are output as HTML in the scan_build directory:

% scan-build ./configure --prefix=/usr
% make clean
% scan-build -o ./scan_build make -j4

configure only needs to be run once.
AUTHOR
Groonga Project
COPYRIGHT
2009-2016, Brazil, Inc