07 Aug 2012

Alfresco CMS uses Lucene as its search engine behind the scenes. Most customers set up the search engine in either AUTO or FULL indexing mode.

Use Case

We have a client with a 1.2 TB content store, and FULL indexing may take 24 to 30 hours. Taking an active Alfresco server offline for that long is not acceptable.
 
Alternatively, when indexes become corrupt, you may want to run AUTO indexing but cannot be sure that all of the corrupt indexes will be recovered.

In such cases, we recommend either INCREMENTAL or PARALLEL indexing. Their key benefits are:

  • They take less time.
  • Incremental indexing lets you choose the start and end points of indexing. It is similar to “AUTO” indexing, but superior to it.
  • Parallel indexing performs complete indexing: it finds the number of transactions in Alfresco, divides them into multiple ranges and runs indexing for each range in a separate folder. Once indexing is over, those folders are merged. It is similar to “FULL” indexing, but works differently.
  • With both approaches you stay in control of how and when to index: you decide the number of transactions and when to run the indexing.

Differences from AUTO and FULL indexing

What is Auto Indexing and how does it differ from Incremental Indexing?

Auto indexing is time-based. It always checks the first 10 and the last 10 transactions. If the first 10 transactions are not indexed properly, it falls back to “FULL” indexing.

If the last 10 transactions are not indexed, it works out a transaction time to start indexing from. There is logic in place to determine this time: it keeps searching until there are no more transactions or it finds a transaction that is already in the index. Once that transaction time is retrieved, indexing is performed from that time up to the present.
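The AUTO decision described above can be sketched as a small function. This is a simplified toy model under stated assumptions, not Alfresco's actual recovery code: transactions are hypothetical `(txn_id, commit_time)` pairs ordered oldest first, the index is a hypothetical set of indexed transaction ids, and the names `auto_recovery_plan` and `txns_to_reindex` are made up for illustration. It assumes the backward search starts from the oldest unindexed transaction among the last 10.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy model: (txn_id, commit_time) pairs, oldest first.
Txn = Tuple[int, int]


def auto_recovery_plan(txns: List[Txn], indexed: Set[int],
                       check: int = 10) -> Tuple[str, Optional[int]]:
    """Return ("NONE", None), ("FULL", None) or ("FROM_TIME", start_time)."""
    if not txns:
        return ("NONE", None)
    # Any of the first `check` transactions missing from the index -> FULL.
    if any(tid not in indexed for tid, _ in txns[:check]):
        return ("FULL", None)
    tail = txns[-check:]
    # Last `check` transactions all indexed -> nothing to recover.
    if all(tid in indexed for tid, _ in tail):
        return ("NONE", None)
    # Start at the oldest unindexed transaction in the tail window ...
    offset = len(txns) - len(tail)
    pos = offset + next(i for i, (tid, _) in enumerate(tail) if tid not in indexed)
    # ... and keep stepping back until a transaction is found in the index
    # or there are no more transactions.
    while pos > 0 and txns[pos - 1][0] not in indexed:
        pos -= 1
    return ("FROM_TIME", txns[pos][1])


def txns_to_reindex(txns: List[Txn], start_time: int) -> List[Txn]:
    """Everything committed from start_time up to the present."""
    return [t for t in txns if t[1] >= start_time]


if __name__ == "__main__":
    txns = [(i, 1000 + i) for i in range(30)]  # toy data: ids 0..29
    print(auto_recovery_plan(txns, set(range(15))))  # ('FROM_TIME', 1015)
```

Note the contrast with incremental indexing: the starting point here is derived from time and the state of the index, not chosen by the administrator as a transaction range.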

Incremental indexing is based on transaction ranges. Find out the number of transactions in the Alfresco database, then choose start and end points for indexing in terms of transactions. For example, if you have 1,000,000 transactions in Alfresco, you can first run indexing for the range 0–100,000. Next time, you can index 100,000–200,000 into the same folder used for the previous run. If it is urgent, this can be done while the server is running, but best practice is to stop the server and then perform indexing.
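The range bookkeeping for incremental indexing can be sketched as follows. This is a minimal illustration, assuming half-open ranges `[start, end)` so that consecutive runs such as 0–100,000 and 100,000–200,000 do not overlap. `incremental_ranges`, `run_session`, the `index_range` callback and the folder path are all hypothetical; the callback stands in for whatever actually reindexes a transaction range.

```python
from typing import Callable, Iterator, Tuple


def incremental_ranges(total_txns: int, batch_size: int,
                       start: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, end) transaction ranges, end exclusive."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while start < total_txns:
        end = min(start + batch_size, total_txns)
        yield (start, end)
        start = end


def run_session(index_folder: str, total_txns: int, batch_size: int,
                resume_from: int, max_batches: int,
                index_range: Callable[[str, int, int], None]) -> int:
    """Index up to max_batches ranges into ONE folder; return where to resume."""
    next_start = resume_from
    for count, (lo, hi) in enumerate(
            incremental_ranges(total_txns, batch_size, resume_from)):
        if count >= max_batches:
            break
        index_range(index_folder, lo, hi)  # hypothetical indexing step
        next_start = hi
    return next_start


if __name__ == "__main__":
    log = []
    record = lambda folder, lo, hi: log.append((lo, hi))  # stand-in for real indexing
    nxt = run_session("/indexes/incremental", 1_000_000, 100_000, 0, 1, record)
    nxt = run_session("/indexes/incremental", 1_000_000, 100_000, nxt, 1, record)
    print(log)  # [(0, 100000), (100000, 200000)]
```

Returning the resume point from each session is what lets the next run pick up in the same folder where the previous one stopped.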

What is Full Indexing and how does it differ from Parallel Indexing?

Full indexing performs complete indexing regardless of the current index status (corrupt or already indexed).

Parallel indexing also performs complete indexing. It finds the number of transactions in the Alfresco database, divides them into multiple ranges and then runs indexing for each range in a separate folder. For example, if you have 1,000,000 transactions in Alfresco, they could be divided into five ranges of 200,000 each, and indexing would be performed in five folders, say “A1”, “A2”, “A3”, “A4” and “A5”. Once indexing is done, all these folders are merged back together.
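The split, index and merge flow above can be sketched like this. It is an illustration under assumptions, not the authors' tooling: `split_ranges`, `parallel_reindex` and the `index_range` and `merge` callbacks are hypothetical placeholders for the real per-folder indexing and index-merge steps. Threads are used only to show the fan-out; a real job would more likely run each range as a separate process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # (start, end), end exclusive


def split_ranges(total_txns: int, parts: int) -> List[Range]:
    """Divide [0, total_txns) into `parts` contiguous ranges of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total_txns, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges absorb one leftover transaction each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_reindex(total_txns: int, parts: int,
                     index_range: Callable[[str, int, int], None],
                     merge: Callable[[List[str]], None]) -> List[str]:
    """Index each range into its own folder (A1..An) concurrently, then merge."""
    folders = [f"A{i + 1}" for i in range(parts)]
    jobs = list(zip(folders, split_ranges(total_txns, parts)))
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [pool.submit(index_range, folder, lo, hi)
                   for folder, (lo, hi) in jobs]
        for fut in futures:
            fut.result()  # re-raise any indexing failure before merging
    merge(folders)  # hypothetical merge of A1..An into one index
    return folders


if __name__ == "__main__":
    print(split_ranges(1_000_000, 5))
    # [(0, 200000), (200000, 400000), (400000, 600000), (600000, 800000), (800000, 1000000)]
```

Merging only after every future has completed successfully ensures that a failed range is not silently folded into the final index.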

NOTE: Before starting the server, always point it to the new index folder location.
 
By,
Amita Bhandari, Vandana Pal
CMS Consultants, CIGNEX India Office