Full-text search in Rails with Groonga

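I had been using Google for my blog's search feature, but I disliked the ads that come with it, so I decided to implement search myself.
I first tried PgSearch, but the Groonga full-text search engine looked easier to set up, so I went with that this time.
Apart from getting stuck on the Heroku deploy, it was quite easy to introduce.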

From Rails, Groonga is used through the rroonga gem.

Installation

Gemfile

gem 'rroonga'

bash

$ bundle install

Initial setup

config/groonga.rb

require 'fileutils'
require 'groonga'

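# Use the path from the environment if set, otherwise the default location.
database_path = ENV['GROONGA_DATABASE_PATH'] || 'groonga/database'
if File.exist?(database_path)
  # Reuse the existing database.
  Groonga::Database.open(database_path)
else
  # First run: create the parent directory, then a new database.
  FileUtils.mkdir_p(File.dirname(database_path))
  Groonga::Database.create(path: database_path)
end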

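The database is created if the file does not exist. Locally, the path is groonga/database.
I did not like having a groonga directory in the project root, so I tried moving it under tmp/, but then the deploy failed.
The buildpack used when deploying to Heroku appears to expect a directory named groonga in the root.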
https://github.com/groonga/heroku-buildpack-rroonga/blob/ec507ecc98750dc956a857ec5b1f565b4831da1c/bin/detect#L5
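
The path selection above can be tried in isolation. The following is a minimal sketch in plain Ruby, without rroonga; the helper names groonga_database_path and ensure_parent_directory are hypothetical and exist only for this illustration.

```ruby
require 'fileutils'
require 'tmpdir'

DEFAULT_DATABASE_PATH = 'groonga/database'

# Hypothetical helper (not part of rroonga): take the path from the given
# environment, falling back to the default used in this article.
def groonga_database_path(env = ENV)
  path = env['GROONGA_DATABASE_PATH']
  path.nil? || path.empty? ? DEFAULT_DATABASE_PATH : path
end

# Hypothetical helper: make sure the parent directory exists before the
# database file is created there.
def ensure_parent_directory(path)
  FileUtils.mkdir_p(File.dirname(path))
  path
end

puts groonga_database_path({}) # prints "groonga/database"

Dir.mktmpdir do |dir|
  custom = File.join(dir, 'search', 'database')
  path = groonga_database_path({ 'GROONGA_DATABASE_PATH' => custom })
  ensure_parent_directory(path)
  puts File.directory?(File.dirname(path)) # prints "true"
end
```

Passing a Hash instead of ENV keeps the sketch easy to test; in the real config file ENV is read directly.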

Defining the Groonga database schema

Assume there is a Post model for articles with title and content columns.
Rails normally adds created_at and updated_at automatically; drop them from the schema if you do not need them.

This part draws on an existing article that I used as a reference.

groonga/init.rb

require_relative '../config/environment'

Groonga::Schema.define do |schema|
  schema.create_table('Posts',
                      type: :hash,
                      key_type: :uint32) do |table|
    table.short_text('title')
    table.text('content')
    table.time('created_at')
    table.time('updated_at')
  end
end

if Post.table_exists?
  Post.all.find_each do |post|
    PostIndexer.update(post)
  end
end

Groonga::Schema.define do |schema|
  schema.create_table('Terms',
                      type: :patricia_trie,
                      key_type: :short_text,
                      normalizer: 'NormalizerAuto',
                      default_tokenizer: 'TokenBigram') do |table|
    table.index('Posts.title')
    table.index('Posts.content')
  end

  schema.create_table('Times',
                      type: :patricia_trie,
                      key_type: :time) do |table|
    table.index('Posts.created_at')
    table.index('Posts.updated_at')
  end
end
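
To give a rough idea of what a bigram-tokenized index such as the Terms table is for, here is a simplified sketch in plain Ruby. It is not Groonga's implementation: the functions bigrams, build_index and search_index are hypothetical, normalization is reduced to downcasing and removing whitespace, and a real engine also has to check token positions.

```ruby
# Simplified illustration only: this is NOT how Groonga is implemented.
# Split text into overlapping 2-character tokens (bigrams).
def bigrams(text)
  chars = text.downcase.gsub(/\s+/, '').chars
  return chars if chars.size < 2
  chars.each_cons(2).map(&:join)
end

# Build a toy inverted index: token => list of document ids containing it.
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each do |id, text|
    bigrams(text).uniq.each { |token| index[token] << id }
  end
  index
end

# A document is a candidate when it contains every bigram of the query.
def search_index(index, query)
  tokens = bigrams(query)
  return [] if tokens.empty?
  tokens.map { |token| index.fetch(token, []) }.reduce(:&)
end

documents = {
  1 => 'Groongaで全文検索',
  2 => 'Railsの検索機能',
  3 => 'Herokuへのデプロイ'
}
index = build_index(documents)
p bigrams('全文検索')          # => ["全文", "文検", "検索"]
p search_index(index, '検索')  # => [1, 2]
```

Because the tokens are fixed-length character pairs, no dictionary is needed to split Japanese text, which is one reason bigram tokenizers are a common default for it.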

Adding posts to the Groonga database

PostIndexer, used above, has not been defined yet, so let's create it.
I put it in models/ this time, but place it wherever suits your project.

models/post_indexer.rb

class PostIndexer

  # Columns of the Post model to make searchable
  COLUMNS = ['title', 'content', 'updated_at', 'created_at']

  class << self
    def database
      Groonga['Posts']
    end

    def create(post)
      attributes = post.attributes.select { |k, v| k.in?(COLUMNS) }
      if database
        database.add(post.id, attributes)
      end
    end

    def destroy(post)
      if database && database[post.id]
        database[post.id].delete
      end
    end

    def update(post)
      destroy(post)
      create(post)
    end

    def search(query)
      return Post.none if query.blank?
      matched_records = database.select do |record|
        record.match(query) do |target|
          target.title | target.content
        end
      end
      ids = matched_records.collect(&:_key)
      Post.where(id: ids)
    end
  end

end

Updating the Groonga database

Just run the script from "Defining the Groonga database schema" above.

bash

$ ruby groonga/init.rb

It creates the schema and regenerates the data.
This is the relevant part. It runs over every post, but with a blog-sized amount of data the run time was not noticeable at all.

ruby

if Post.table_exists?
  Post.all.find_each do |post|
    PostIndexer.update(post)
  end
end

gitignore

Running the script above generates files under groonga/.
They do not need to be tracked by Git, so add them to .gitignore.

groonga/init.rb is still needed, so ignore everything except that file.

gitignore

/groonga/database
/groonga/database.*

Searching

Let's use the generated data to implement free-word search.
PostIndexer above already defines a search method, so we use that.

The flow is to first extract only the IDs of matching posts from the Groonga database, then look those IDs up with ActiveRecord.
Post.none returns an empty ActiveRecord::Relation, so a blank query yields no posts.

Because the result is an ActiveRecord::Relation of Post records, the list page's view template does not need to change at all, which is nice.

models/post_indexer.rb

def search(query)
  return Post.none if query.blank?
  matched_records = database.select do |record|
    record.match(query) do |target|
      target.title | target.content
    end
  end
  ids = matched_records.collect(&:_key)
  Post.where(id: ids)
end
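
One thing to keep in mind with this two-step flow: a query like Post.where(id: ids) does not promise to return rows in the order of ids. The search above does not sort its results, so this does not matter here, but if the order of the matched IDs ever becomes meaningful, the records could be rearranged afterwards. A minimal sketch in plain Ruby, where PostRow is a stand-in for the model and order_by_ids is a hypothetical helper:

```ruby
# Stand-in for an ActiveRecord model, used only so that this sketch runs
# without Rails.
PostRow = Struct.new(:id, :title)

# Hypothetical helper: return the records in the same order as the given ids,
# skipping ids that have no matching record.
def order_by_ids(records, ids)
  by_id = records.to_h { |record| [record.id, record] }
  ids.filter_map { |id| by_id[id] }
end

rows = [
  PostRow.new(1, 'first'),
  PostRow.new(2, 'second'),
  PostRow.new(3, 'third')
]
p order_by_ids(rows, [3, 1, 99]).map(&:title) # => ["third", "first"]
```

Note that this returns a plain Array rather than an ActiveRecord::Relation, so it would only fit at the very end of a query chain.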

Deploying to Heroku

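The installation instructions say that installing the gem is all you need, but that did not work on Heroku. While investigating, I read that the Rroonga Heroku buildpack is required. In the end, though, that buildpack turned out to be what was making the deploy fail.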

Heroku > app > Settings

Only the Groonga buildpack was needed.
Note that the build fails if its order relative to the ruby buildpack is reversed.

Procfile

On Heroku, local storage is reset whenever a dyno goes away, so the Groonga database cannot be persisted.
It therefore has to be regenerated on every release.

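There are several ways to run this automatically; this time I put it in the Procfile.
The bundle exec puma -C config/puma.rb part starts the server, so depending on your environment it may be bundle exec rails s or similar.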

Procfile

web: ruby groonga/init.rb && bundle exec puma -C config/puma.rb

Thanks to Groonga, adding full-text search was easy.
Searches are fast too, which is great 🎉
