DataImportHandler Runs Out of Memory on Large Table

16 05 2009

One DataImportHandler (DIH) configuration option people may overlook is the batchSize attribute. If you start your JVM with enough memory to hold the entire table, you won’t need to set batchSize at all. batchSize tells DIH to call setFetchSize through JDBC, so the driver brings back a certain number of records at a time rather than the whole result set. If you use MySQL, you may still run out of memory even when you set batchSize. That’s due to a limitation in MySQL’s JDBC driver: it ignores the fetch size and reads the entire result set into memory. The workaround is setting batchSize to “-1”. DIH then passes Integer.MIN_VALUE to MySQL as the fetch size, which makes the driver stream results row by row instead of buffering the whole table, so it no longer runs out of memory.
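To make the mapping concrete, here is a minimal Java sketch of what the paragraph above describes. It is not DIH’s actual source. The class `BatchSizeSketch`, its methods, and the default of 500 used when batchSize is absent are hypothetical and illustrative. The sketch shows a batchSize value of -1 being turned into Integer.MIN_VALUE and that value being handed to `Statement.setFetchSize`. Note that the JDBC spec only promises non-negative fetch sizes, so the Integer.MIN_VALUE trick is specific to MySQL’s driver and may throw on other drivers.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of batchSize -> JDBC fetch size handling (not DIH's real code).
class BatchSizeSketch {

    // Assumed default when batchSize is not configured (illustrative value).
    static final int DEFAULT_FETCH_SIZE = 500;

    // Maps a batchSize attribute value to the fetch size passed to JDBC.
    static int toFetchSize(String batchSizeAttr) {
        if (batchSizeAttr == null) {
            return DEFAULT_FETCH_SIZE;
        }
        int batchSize = Integer.parseInt(batchSizeAttr.trim());
        // -1 is the special value: MySQL's driver only streams rows
        // when the fetch size is exactly Integer.MIN_VALUE.
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    // Runs a query with the given fetch size and counts rows without keeping them.
    static long countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        // MySQL streaming also requires a forward-only, read-only statement.
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // each row can be discarded once processed
                }
            }
            return rows;
        }
    }
}
```

With a MySQL connection, `countRows(conn, "SELECT * FROM big_table", BatchSizeSketch.toFetchSize("-1"))` would walk the (hypothetical) `big_table` one row at a time instead of loading it all into the heap first.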




One response to “DataImportHandler Runs Out of Memory on Large Table”

30 06 2009
  Glen Newton (15:27:51) :

If you are still having problems loading data with DIH because of RAM use or long running times, consider LuSql. As described in my talk at the code4lib 2009 conference (“LuSql: (Quickly and easily) Getting your data from your DBMS into Lucene”), it runs much faster than Solr DIH (often by an order of magnitude) and uses a fraction of the RAM that DIH does.
