Full text search in Ruby on Rails 3 - ferret
There are several ways to use Ferret in RoR. This post shows the easy way: using the acts_as_ferret plugin.
To illustrate the syntax and code, I will use the same data objects as in Full text search in Ruby on Rails 2 – MySQL.
Installation
Ferret installation is easy. Running
gem install ferret
will do the job.
In addition, you need to install the acts_as_ferret plugin:
script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
Setup
The simplest setup is:
class Article < ActiveRecord::Base
  acts_as_ferret
end
This is enough to get the full text engine working. You can now test it in the Rails console:
Article.find_by_contents("sybase")
If you have a lot of data to index, be patient with the first search: it is slow because the index has to be built first.
Called with no arguments, acts_as_ferret automatically indexes all fields of Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set:
# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]
Or you can widen it with computed (virtual) fields:
acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]

# Virtual field: true when the body is longer than 40 characters
def long_article
  self.body.length > 40
end
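To see why a query can later filter on long_article:(false), here is a framework-free sketch of what the virtual field contributes to an indexed document. The Struct stand-in and the indexed_fields helper are hypothetical, and the idea that acts_as_ferret stores each value as text (so the boolean becomes "true" or "false") is an assumption, not something stated above.

```ruby
# Plain-Ruby stand-in for the Article model (hypothetical: no ActiveRecord,
# no Ferret). It only shows what the virtual long_article field evaluates to.
Article = Struct.new(:id, :title, :body) do
  # Same rule as in the model above: "long" means more than 40 characters
  def long_article
    body.length > 40
  end
end

# Hypothetical helper. Assumption: every field value is indexed as text,
# so true/false end up in the index as the strings "true"/"false".
def indexed_fields(article, fields)
  fields.each_with_object({}) { |f, doc| doc[f] = article.send(f).to_s }
end

short = Article.new(1, "Tip", "Sybase replication in one line.")
doc = indexed_fields(short, [:id, :body, :title, :long_article])
puts doc[:long_article]  # prints false
```

The body here is 31 characters long, so the article counts as short and would match long_article:(false).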
Note 1: see how long_article is used in the Query syntax section below.
Note 2: whenever you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/~environment~/~Indexed object~ folder; it will be recreated automatically on the next search request.
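The folder deletion from Note 2 can be scripted. This is a minimal sketch: the drop_ferret_index helper is hypothetical, and it assumes the index/<environment>/<model name> layout described above with the model name in lower case (e.g. "article"). Check your own index folder before relying on it.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: removes the Ferret index folder of one model so that
# acts_as_ferret rebuilds it on the next search. Run it only while the
# application is stopped. Assumes index/<environment>/<lower-case model name>.
def drop_ferret_index(rails_root, environment, model_name)
  dir = File.join(rails_root, 'index', environment, model_name.downcase)
  FileUtils.rm_rf(dir)  # no error if the folder does not exist
  dir
end

# Example (RAILS_ROOT and RAILS_ENV are the Rails 1.x constants):
# drop_ferret_index(RAILS_ROOT, RAILS_ENV, 'Article')
```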
Query syntax
Since Ferret is a port of the Lucene engine, it uses the same query syntax. I will show only a few example queries.
For details, see the Lucene documentation.
# Articles with the "sybase" keyword
Article.find_by_contents("sybase")
# Articles with both "sybase" and "replication"
Article.find_by_contents("sybase replication")
# Articles with "sybase" or "replication"
Article.find_by_contents("sybase OR replication")
# Short articles about sybase
Article.find_by_contents("long_article:(false) *:sybase")
# Fuzzy search: articles with words similar to "increase",
# e.g. "increasing"
Article.find_by_contents("increase~")
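Because the query string is parsed, raw user input containing characters such as ( or : can produce parse errors or unintended queries. Below is a sketch of escaping, assuming Ferret treats the same characters as special as the Lucene query syntax does; escape_query is a hypothetical helper, not part of acts_as_ferret.

```ruby
# Characters and operators with special meaning in Lucene query syntax.
# Assumption: Ferret's query parser treats them the same way.
SPECIAL = /[+\-!(){}\[\]^"~*?:\\]|&&|\|\|/

# Hypothetical helper: prefixes every special character with a backslash
# so user input is searched literally.
def escape_query(text)
  text.to_s.gsub(SPECIAL) { |match| "\\" + match }
end

# Example:
# Article.find_by_contents(escape_query(params[:query]))
```

The trade-off: escaped input can no longer use operators like ~ or *, so apply it only to fields where users are not expected to write queries themselves.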
Pagination
Ferret is fast and flexible, but… its results do not come from an ActiveRecord query, so you cannot use the built-in Rails pagination. You have to implement it yourself. Here is how we did it in our project www.tamtami.com.
1. Create a full text search method in the model
def self.full_text_search(q, options = {})
  # Return an empty result (not nil) so the controller can still unpack it
  return [0, []] if q.blank?
  default_options = {:limit => 10, :page => 1}
  options = default_options.merge options
  options[:offset] = options[:limit] * (options[:page].to_i - 1)
  results_ids = []
  # search_each yields each hit and returns the total number of hits
  num = self.ferret_index.search_each("*:(#{q})", {:num_docs => options[:limit], :first_doc => options[:offset]}) do |doc, score|
    results_ids << self.ferret_index[doc]["id"]
  end
  return [num, []] if results_ids.empty?
  # Note: find(ids) does not necessarily keep Ferret's relevance order
  results = Article.find(results_ids)
  return [num, results]
end
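Article.find(results_ids) loads the records with a single query, but the database is free to return them in any order, so the relevance ranking from Ferret can be lost. A minimal sketch of restoring it; order_by_ids is a hypothetical helper, and ids are compared as strings because the values read from the index are text.

```ruby
# Hypothetical helper: returns records in the order of ids (e.g. Ferret's
# relevance order). Ids without a matching record are skipped.
def order_by_ids(records, ids)
  by_id = records.each_with_object({}) { |r, h| h[r.id.to_s] = r }
  ids.map { |id| by_id[id.to_s] }.compact
end

# Example:
# results = order_by_ids(Article.find(results_ids), results_ids)
```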
or, more elegantly, as proposed by Jens Kraemer:
def self.full_text_search(q, options = {})
  # Return an empty result (not nil) so the controller can still unpack it
  return [0, []] if q.blank?
  default_options = {:limit => 10, :page => 1}
  options = default_options.merge options
  options[:offset] = options[:limit] * (options.delete(:page).to_i - 1)
  results = Article.find_by_contents(q, options)
  return [results.total_hits, results]
end
2. Create a method that builds the paginator in application.rb
def pages_for(size, options = {})
  default_options = {:per_page => 10}
  options = default_options.merge options
  Paginator.new self, size, options[:per_page], (params[:page] || 1)
end
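The model and the paginator have to agree on the same arithmetic: the offset of a page and the number of pages for a given hit count. Here is a framework-free sketch of that arithmetic; SimplePaginator is a hypothetical class for illustration, not the Rails Paginator used above.

```ruby
# Hypothetical sketch of the pagination arithmetic; not Rails' Paginator.
class SimplePaginator
  attr_reader :total, :per_page, :page

  def initialize(total, per_page = 10, page = 1)
    @total = total.to_i
    @per_page = per_page
    # Clamp to at least 1, like the (params[:page] || 1) default above
    @page = [page.to_i, 1].max
  end

  # Same formula as options[:offset] in full_text_search
  def offset
    per_page * (page - 1)
  end

  # Integer ceiling division; at least one page even with zero hits
  def page_count
    [(total + per_page - 1) / per_page, 1].max
  end

  def last_page?
    page >= page_count
  end
end
```

For example, 25 hits with 10 per page give 3 pages, and page 3 starts at offset 20.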
3. Perform the search in the controller
def search
  @query = params[:query]
  @total, @articles = Article.full_text_search(@query, :page => (params[:page] || 1))
  @pages = pages_for(@total)
end
4. Use it in the article view
...
<%= pagination_links(@pages, :params => {:query => @query}) %>
...
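The :params option matters because every page link has to carry the search query along with the page number; without it, the next page would call search with no query. A sketch of the URLs such links need, using a hypothetical page_links helper and an assumed /articles/search path.

```ruby
require 'uri'

# Hypothetical helper: builds one link per page, each carrying both the
# search query and the page number (spaces are encoded as "+").
def page_links(base_path, query, page_count)
  (1..page_count).map do |n|
    "#{base_path}?#{URI.encode_www_form(query: query, page: n)}"
  end
end

# Example:
# page_links("/articles/search", "sybase replication", 2)
```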
Final word
The Ferret full text engine is fast and flexible, but it needs more programming than the MySQL full text index.
Published on October 18th, 2006 | Posted by Roman Mackovcak
October 25th, 2006 at 12:29
Nice article, hope you don’t mind I linked it from the acts_as_ferret wiki.
One suggestion regarding your full_text_search implementation: IMHO it could be simplified by getting the total number of results from the find_by_contents result:
http://pastie.caboo.se/19520
October 25th, 2006 at 23:30
2 Jens Kraemer
Thanks, I am more than happy to be connected with such a great plugin.
October 27th, 2006 at 9:31
Hi,
I am using acts_as_ferret in my current application.
It is really fast and helpful,
but it needs some enhancements.
November 6th, 2006 at 18:59
Do searches with acts_as_ferret also search foreign key fields? For example, if Article had a field author_id linking to an Author model, would I be able to search by author name and get articles back?
November 9th, 2006 at 22:36
2 Matthew:
No, by default not. You have to specify what needs to be indexed.
November 22nd, 2006 at 0:10
In order to conform to the usage of options[:limit] in #find_by_contents, Model::full_text_search could be modified like so (in pseudo-diff format):
November 22nd, 2006 at 15:32
@ Roman:
Do you know of an example or tutorial of how to do this?
November 29th, 2006 at 1:35
Matthew, to do what you want, the easiest thing is to define a method in your Article class that acts_as_ferret refers to. For example, in the Article class:
December 14th, 2006 at 19:44
Any chance of Ajaxing this? I would like to figure out how to do instant-feedback search with Ferret… any help would be great.
December 18th, 2006 at 22:04
2 Nico
Well, I would change the search query to something like “YourString*”.
But there is a danger. The Ferret engine first tries to expand a wildcard query into non-wildcard queries. As a result, there might be too many "subqueries", and the search might fail with the error message:
Exception (: Error occured at
Error: exception 6 not handled: Too many clauses
):
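One way to make the "YourString*" approach less likely to hit "Too many clauses" is to add the wildcard only once the user has typed a few characters, because a very short prefix expands into many more terms. A sketch for search-as-you-type input; prefix_query is a hypothetical helper and the minimum of 3 characters is an arbitrary assumption to tune for your data.

```ruby
# Assumed minimum prefix length before a wildcard is appended
MIN_PREFIX = 3

# Hypothetical helper: turns partial user input into a prefix query.
# Only the last (still being typed) word gets the trailing wildcard.
def prefix_query(input, min_prefix = MIN_PREFIX)
  term = input.to_s.strip
  return nil if term.empty?
  words = term.split(/\s+/)
  words[-1] = "#{words[-1]}*" if words[-1].length >= min_prefix
  words.join(' ')
end

# Example:
# Article.find_by_contents(prefix_query(params[:query])) if prefix_query(params[:query])
```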
January 27th, 2007 at 17:34
I use a common pagination function added to my application.rb:
February 19th, 2007 at 20:02
Great tutorial. I just finished writing up one of my own, which goes into more depth.
Definitely worth checking out if you’re adding search capability with acts_as_ferret.
http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial
February 20th, 2007 at 0:18
2 Gregg Pollack
Thanks Gregg, your tutorial is great!
It is a pity I did not find it earlier. It could have saved me a lot of time :o)
March 15th, 2007 at 0:14
Great article, this is what got me onto Ferret after struggling with MySQL text searches.
People need to know that for this to work in production, you need to run the Ferret server as a separate process, using the built-in AAF DrbServer. This link explains it perfectly:
http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer
April 27th, 2007 at 1:48
Related to Matthew's and Tom's posts…
I am trying to search a model's relationships. For example, books have many authors, and I want a search for Shakespeare to return all books by him. So in my Book model I have:
acts_as_ferret :fields => [:title, :abstract, :author_name]
has_many :authors
def author_name
  authors.collect{|a| a.name}
end
But no results are found when I search for Shakespeare.
Any suggestions?
April 27th, 2007 at 22:19
2 Kim:
I would guess the problem is the author_name method. It does not return a string, but a collection. I would try changing it to:
def author_name
  authors.collect{|a| a.name}.join(' ')
end
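The difference between the two versions of author_name is easy to see outside Rails. In this sketch, the Struct-based Book and Author are hypothetical stand-ins for the ActiveRecord models; the first method mirrors the original code (an Array), the second the suggested fix (a single String that can be indexed as plain text).

```ruby
# Plain-Ruby stand-ins for the models (hypothetical; no ActiveRecord).
Author = Struct.new(:name)
Book = Struct.new(:title, :authors) do
  # Original version from the question: returns an Array
  def author_names_array
    authors.collect { |a| a.name }
  end

  # Suggested fix: one space-separated String
  def author_name
    authors.collect { |a| a.name }.join(' ')
  end
end

book = Book.new("The Two Noble Kinsmen",
                [Author.new("William Shakespeare"), Author.new("John Fletcher")])
p book.author_names_array  # ["William Shakespeare", "John Fletcher"]
puts book.author_name      # William Shakespeare John Fletcher
```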
April 28th, 2007 at 1:48
Roman, thanks that worked. But what is the proper syntax when you are searching authors and want to search by books?
May 6th, 2007 at 22:58
2 Kim:
If you want to search for authors by books, you need to create a new index for your Author model, the same way as you did for the Book model.
May 9th, 2007 at 20:49
Hi Roman,
I guess my question was not clear. My question is to do with relationships and the proper syntax for using acts_as_ferret.
So when a book has_many authors and we want to search books by authors, we set the acts_as_ferret fields and then do something like this:
May 9th, 2007 at 22:15
Hello Kim,
this should work. What is not working? What error message do you get?