DataImportHandler Runs Out of Memory on Large Table
16 05 2009

One DataImportHandler (DIH) setting people often overlook is the batchSize attribute. batchSize tells DIH to call setFetchSize through JDBC, so that the driver brings back a limited number of records at a time rather than the whole table. If you start your JVM with enough memory to hold the entire table, you won’t need to set batchSize at all. If you use MySQL, however, you may still run out of memory even when you set the batchSize attribute. That’s due to a limitation in MySQL’s JDBC driver: it ignores the fetch size and buffers the entire result set in memory. The workaround is to set batchSize to “-1”. DIH then passes Integer.MIN_VALUE to the MySQL driver as the fetch size, which the driver treats as a request to stream rows one at a time, and that keeps it from running out of memory.
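To make the workaround concrete, here is a minimal Java sketch of the mapping described above. It is not Solr's actual source: the class name FetchSizeSketch and the helpers toFetchSize and openStatement are hypothetical names chosen for illustration. Only the java.sql calls (createStatement, setFetchSize) are the real JDBC API. In a DIH data-config.xml the same effect comes from putting batchSize="-1" on the JdbcDataSource dataSource element.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration class; not part of Solr or the MySQL driver.
class FetchSizeSketch {

    // Maps a DIH-style batchSize to a JDBC fetch size.
    // -1 becomes Integer.MIN_VALUE, the special value MySQL's driver
    // reads as "stream rows one at a time"; other values pass through.
    static int toFetchSize(int batchSize) {
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    // Creates a forward-only, read-only statement and applies the fetch size.
    // MySQL's driver only streams for this statement type. Other drivers may
    // reject a negative fetch size, so this is MySQL-specific.
    static Statement openStatement(Connection conn, int batchSize) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(toFetchSize(batchSize));
        return stmt;
    }

    public static void main(String[] args) {
        // No database is needed to see the mapping itself.
        System.out.println(toFetchSize(-1) == Integer.MIN_VALUE); // prints true
        System.out.println(toFetchSize(500));                     // prints 500
    }
}
```

While a streaming result set is open, the MySQL driver expects you to read or close it before issuing another query on the same connection, so it is best to keep one import query per connection.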
If you are still having problems loading data with DIH because of RAM usage or long running times, you should consider LuSql. As described in my talk at the code4lib2009 conference (LuSql: (Quickly and easily) Getting your data from your DBMS into Lucene), it runs much faster than Solr DIH (often by an order of magnitude) and uses a fraction of the RAM.