Effective Use of Solr Index Distribution Scripts
10 02 2009

Operations and automation tasks are sometimes an afterthought at the end of development. With Solr, leaving automation until the very end is actually not that bad, because Solr provides a set of very useful scripts that make automation easy. If you are short on time to build automation, you can consider yourself lucky. I will first talk about basic architecture with Solr and then dive into leveraging Solr’s distribution and operation scripts.
The most basic architecture for a Solr-based application requires only a single application server. Assuming you develop in Java, you can have both Solr and your webapp served by the same application server. A more common and effective architecture involves a dedicated indexing server (or indexer) and one or more slave index servers. The idea is to separate all index building work from normal queries. Conceptually, this is similar to database clustering, where you have a read/write server as master and read-only servers as slaves.
The following setup involves Tomcat, Apache and Linux, and assumes Solr’s home is /solr on every Solr server.
Note: you may be able to replicate similar configuration on a Windows environment running Cygwin. I haven’t tried it on Windows yet so YMMV.
- Scripts configuration
- Environment can be configured in solr/conf/scripts.conf. Here is a sample indexer configuration:
user=solr
solr_hostname=indexer
solr_port=8080
rsyncd_port=18080
data_dir=data
webapp_name=solr
master_host=indexer
master_data_dir=/solr/data
master_status_dir=/solr/logs
- Sample slave server configuration:
user=solr
solr_hostname=slave1
solr_port=8080
rsyncd_port=18080
data_dir=data
webapp_name=solr
master_host=indexer
master_data_dir=/solr/data
master_status_dir=/solr/logs
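Because scripts.conf is just a list of key=value lines, a quick check for missing settings can catch a typo before the distribution scripts fail. The following is a minimal sketch and not something shipped with Solr: the function name check_scripts_conf is hypothetical, the list of keys simply mirrors the sample files above, and treating an empty value as missing is my assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): report settings that are missing
# or empty in a scripts.conf-style file of key=value lines.
# The key list mirrors the sample configuration above and is an assumption.
check_scripts_conf() {
    conf_file=$1
    missing=""
    for key in user solr_hostname solr_port rsyncd_port data_dir \
               webapp_name master_host master_data_dir master_status_dir; do
        # Accept only lines of the form key=<non-empty value>.
        if ! grep -Eq "^${key}=.+" "$conf_file"; then
            missing="$missing $key"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing settings:$missing"
        return 1
    fi
    echo "ok"
}
```

With the function loaded, running check_scripts_conf /solr/conf/scripts.conf on each server prints either "ok" or the names of the settings it could not find.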
- Solr uses SSH and rsync in its index distribution scripts, so we need to make sure SSH keys are configured and public keys are exchanged between the indexer and the slave index servers. If you haven’t configured SSH keys yet, use the ssh-keygen command to generate a public/private key pair on every Solr server.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/solr/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/solr/.ssh/id_rsa.
Your public key has been saved in /home/solr/.ssh/id_rsa.pub.
The key fingerprint is:
0c:27:27:f5:81:36:87:82:0f:4f:39:b5:aa:fd:e4:2f solr@solr
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
$ ssh solr@indexer "cat .ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
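Once the keys are exchanged, it is worth confirming that login really works without a prompt, since the distribution scripts run unattended. Here is a small sketch of such a check. The function name can_ssh_without_password is made up for illustration, and the five-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical check: succeeds only if key-based login to the given
# user@host works without any interactive prompt.
can_ssh_without_password() {
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # the remote command "true" does nothing and exits with status 0.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, on a slave server: can_ssh_without_password solr@indexer && echo "keys OK".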
- Solr uses rsync for index distribution, so make sure rsync works on your operating system. Start rsyncd for the first time with the following commands:
$ /solr/bin/rsyncd-enable
$ /solr/bin/rsyncd-start
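- On the indexer, add a listener to solrconfig.xml so that snapshooter runs after every optimize:
<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">/solr/bin/snapshooter</str>
  <str name="dir">/solr/bin/</str>
  <bool name="wait">true</bool>
</listener>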
- Enable snappuller on each slave server:
$ /solr/bin/snappuller-enable
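- Schedule a cron job on each slave server to pull and install the latest snapshot and clean up old ones:
0 3 * * * /solr/bin/snappuller; /solr/bin/snapinstaller; /solr/bin/snapcleaner -N 3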
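The crontab entry above separates the three scripts with semicolons, so snapinstaller and snapcleaner run even when the pull failed. If you would rather stop at the first failure, a wrapper along these lines may help. This is only a sketch: the function name pull_and_install and the SOLR_BIN variable are hypothetical, and it assumes the Solr scripts exit with a non-zero status when they fail.

```shell
#!/bin/sh
# Hypothetical wrapper for the nightly cron job on a slave server.
# SOLR_BIN defaults to the script directory used in this post.
SOLR_BIN=${SOLR_BIN:-/solr/bin}

pull_and_install() {
    # Each step runs only if the previous one exited with status 0
    # (assumes the Solr scripts report failure through their exit status).
    "$SOLR_BIN/snappuller" &&
    "$SOLR_BIN/snapinstaller" &&
    "$SOLR_BIN/snapcleaner" -N 3
}
```

To use it, save the block as a script, add a final line that calls pull_and_install, and point the crontab entry at that script instead of the three separate commands.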
- Optionally, configure Apache to balance queries across the slave servers:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
....
ProxyRequests Off
ProxyPreserveHost On
ProxyPass / balancer://tomcats/ stickysession=JSESSIONID lbmethod=byrequests
ProxyPassReverse / balancer://tomcats/
<Proxy balancer://tomcats>
BalancerMember ajp://slave1:8080 route=jvm1 loadfactor=20
BalancerMember ajp://slave2:8080 route=jvm2 loadfactor=20
</Proxy>
All indexing work should be done on your indexer. When you issue the optimize command, Solr will automatically generate a snapshot. Snapshots should be generated well ahead of the scheduled snapshot pulling time (3am in this case). Apache load balancing is optional if you only have one slave server or you have another load balancing solution.
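To confirm that a snapshot really exists before the 3am pull, you can look for the newest snapshot directory on the indexer. The sketch below assumes snapshots are directories named snapshot.<yyyymmddHHMMSS> inside the data directory, so that sorting by name also sorts by time. The function name latest_snapshot is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the newest snapshot directory.
# Assumes snapshots are named snapshot.<yyyymmddHHMMSS> inside the data
# directory, so name order equals chronological order.
latest_snapshot() {
    data_dir=${1:-/solr/data}
    newest=""
    for dir in "$data_dir"/snapshot.*; do
        # Skip the unexpanded pattern when no snapshot exists.
        [ -d "$dir" ] || continue
        # Globs expand in sorted order, so the last match is the newest.
        newest=$dir
    done
    # Return a non-zero status when no snapshot was found.
    [ -n "$newest" ] && basename "$newest"
}
```

Running latest_snapshot /solr/data shortly before the pull time and comparing the timestamp in the name with today's date is a cheap way to notice that an optimize never happened.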
Reference links:
http://wiki.apache.org/solr/CollectionDistribution
http://wiki.apache.org/solr/SolrOperationsTools