Friday, January 10, 2014

Using common terms queries in Solr to improve search performance while retaining relevancy


Removing stop words can improve the performance of search queries because it reduces the size of the index: very frequent words such as "the" or "and" have long posting lists that are expensive to traverse. The relevancy of the search results is usually not affected much.

However, there are situations in which it is necessary to search for stop words (e.g., the quotation "to be or not to be", which consists only of stop words). In addition, there can be domain-specific frequent words ("music", "book", ...) that are not in the usual stop word list. Removing them from the index is not desirable, but searching for them can degrade search performance.
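To make the problem concrete, here is a minimal Java sketch of a stop filter. It assumes a hypothetical, deliberately tiny stop word list and plain whitespace tokenization; it is not the list or analyzer chain of any particular Solr configuration. Because the same filter is applied at query time, the quotation is reduced to nothing and can no longer match anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical stop word list for illustration only (not a real analyzer's list).
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "be", "not", "of", "or", "the", "to");

    /** Lower-cases, splits on whitespace and drops stop words, as a stop filter would. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Skip empty tokens (e.g. from leading whitespace) and stop words.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("To be or not to be"));   // []
        System.out.println(analyze("The history of music")); // [history, music]
    }
}
```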

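A possible solution is Lucene's CommonTermsQuery, which Elasticsearch already supports. It builds two sub-queries from the query terms: low-frequency terms go into a required clause, and high-frequency terms go into an optional clause that is only evaluated for documents that match the required one. Here is a GitHub Gist that shows how to use common terms queries with Apache Solr. The common terms query can then be used in a Solr request with q={!commonTermsQueryParser}<query string>&qf=<query field>.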
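The term split at the heart of a common terms query can be modeled in a few lines. The following Java sketch is a simplified model, not Lucene's code: document frequencies are supplied by hand for a hypothetical index of 1,000 documents, and `maxTermFrequency` follows the convention of the CommonTermsQuery constructor argument (a fraction of the document count if below 1, otherwise an absolute count). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Simplified model of the term split performed by a common terms query (not Lucene's code). */
public class CommonTermsSketch {

    /** Low-frequency terms drive matching; high-frequency terms only refine scoring. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    /**
     * Partitions the distinct query terms by document frequency.
     *
     * @param docFreq          hypothetical document frequency of each term (missing = 0)
     * @param maxDoc           number of documents in the hypothetical index
     * @param maxTermFrequency a fraction of maxDoc if below 1, otherwise an absolute count
     */
    static Split split(List<String> queryTerms, Map<String, Integer> docFreq,
                       int maxDoc, float maxTermFrequency) {
        int threshold = maxTermFrequency >= 1f
                ? (int) maxTermFrequency
                : (int) Math.ceil(maxTermFrequency * maxDoc);
        Split result = new Split();
        // Each distinct term is classified once, in query order.
        for (String term : new LinkedHashSet<>(queryTerms)) {
            if (docFreq.getOrDefault(term, 0) > threshold) {
                result.highFreq.add(term);
            } else {
                result.lowFreq.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical statistics; "music" plays the domain-specific frequent word.
        Map<String, Integer> docFreq = Map.of(
                "the", 950, "of", 900, "music", 400,
                "history", 30, "beatles", 12);
        Split s = split(List.of("the", "history", "of", "the", "beatles", "music"),
                docFreq, 1000, 0.1f);
        System.out.println("low frequency:  " + s.lowFreq);  // [history, beatles]
        System.out.println("high frequency: " + s.highFreq); // [the, of, music]
    }
}
```

With a threshold of 10% of the documents, "the", "of" and "music" land in the optional high-frequency group, so matching is driven by the rare terms "history" and "beatles". For a query such as "to be or not to be", every term would be high-frequency; according to the Lucene javadoc, CommonTermsQuery then falls back to a plain conjunction that requires all of those terms.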
