Sunday, January 12, 2014

Some hints for using Solr in production


Although it is easy to get started with the search platform Apache Solr, it is difficult to master Solr in production. Here are some hints that could be helpful (in addition to the usual things to do for Java apps in production):

  • Memory: 
    • Solr requires sufficient memory for both the Java heap and the OS disk cache. Some more background information is given here. Using SSDs can decrease the memory requirements.
    • The required Java heap size depends on the configuration of the Solr caches. A wrongly configured filter cache in particular can result in an OutOfMemoryError, because a single stored filter consumes at most as many bits in memory as there are documents. That is, an upper bound for the Java heap space required by the filter cache (in bits) is the filter cache size (configured in solrconfig.xml) multiplied by the number of documents. If an OutOfMemoryError does occur, a heap dump analysis helps to find the cause.
  • Search performance suffers heavily when other I/O-intensive operations are performed on the Solr server.
  • Do benchmarking in order to know when it becomes necessary to shard with SolrCloud. Search time quite often depends linearly on the document size up to a point beyond which it increases exponentially. Use real-world queries when running performance tests with SolrMeter.
  • This article gives a good overview of what can be done in order to ensure relevancy. Relevancy can't be ensured by the developers alone; it is best measured by content experts.
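
The upper bound for the filter cache from the memory hint above can be turned into a quick back-of-the-envelope estimate. The following is only a minimal sketch: the function name and the example numbers (512 cache entries, 100 million documents) are assumptions for illustration and are not taken from a real installation. The real consumption is usually lower, since this is a worst case of one bit per document for every cached filter.

```python
def filter_cache_upper_bound_bytes(cache_size: int, num_docs: int) -> int:
    """Worst-case heap usage of the filter cache in bytes.

    cache_size: maximum number of cached filters (the size configured
                for the filter cache in solrconfig.xml).
    num_docs:   number of documents in the index.

    Hypothetical helper: assumes every cached filter is stored as a
    bit set with one bit per document, which is the upper bound.
    """
    bytes_per_filter = (num_docs + 7) // 8  # round bits up to whole bytes
    return cache_size * bytes_per_filter


if __name__ == "__main__":
    # Example values are assumptions, not measurements.
    bound = filter_cache_upper_bound_bytes(cache_size=512, num_docs=100_000_000)
    print(f"filter cache upper bound: {bound / 1024**3:.2f} GiB")
```

Comparing this number with the configured Java heap size gives a first indication of whether the filter cache size is too generous for the index.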

This list could be much longer (tune the Solr caches, use the autocommit feature, ...) with things that you will probably find out yourself while putting your Solr app into production ;)


