07 Aug 2012

Our client has been using Alfresco, the leading open source Enterprise Content Management system, to manage their documents. The main objective was to upgrade the system to the latest version, Alfresco 3.3.4.

The intranet site, built on Alfresco 2.1.1, holds more than 2.4 million documents and has 10,000 registered users. The database is 37 GB, the content store is more than 1.2 TB and the indexes take up 7 GB.

Migration/Upgrade Strategy

The migration was divided into two main steps:

Step 1: Repository Upgrade

We used two approaches to upgrade the repository:

Upgrade (First Approach) - This is the approach documented by Alfresco. We followed the steps on the wiki page wiki.alfresco.com/wiki/General_Upgrade_Process, but ran into a number of issues along the way.

Below are a few checkpoints, and solutions to the issues we hit:

  1. Always check which JDK version the target Alfresco release requires, and how much memory and disk space the upgrade needs.
  2. Delete the data in the archive space (deleted items) if it is not required.
  3. The SQL scripts that run during the upgrade may not be compatible with your database server. Adapt them to your database.
  4. Delete corrupt data where possible, including duplicate nodes. To check for duplicate node entries, run the following query on alf_node_status: select count(*), node_id from alf_node_status group by node_id having count(*) > 1. Delete the resulting records.
  5. The upgrade can fail with out-of-memory errors. If tables hold a large amount of data, split the heavy queries into three parts and run them one by one directly on the database server: first, create a test table from the query's select statement; second, create an index on the new table; third, optimize the remaining query and run it against that table.
  6. You may run into issues with some Alfresco patches. Make the patches run in the background; you can also disable some of them after consulting Alfresco support.
  7. Note that you can restart the upgrade from the point where it failed; you don't have to start again from scratch.
  8. You can monitor performance using JMeter and JConsole.
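
Checkpoint 4 is worth trying on a toy copy of the table before touching production. The sketch below uses Python's built-in sqlite3 module; the three-column alf_node_status layout is a simplified assumption, not the real Alfresco 2.1.1 schema, and the delete step assumes one row per node_id should be kept while only the extra rows are removed. Run any such delete against a backup first.

```python
import sqlite3

# Simplified stand-in for alf_node_status (assumed columns, not the real schema).
SCHEMA = """
CREATE TABLE alf_node_status (
    protocol   TEXT NOT NULL,
    identifier TEXT NOT NULL,
    node_id    INTEGER,
    PRIMARY KEY (protocol, identifier)
)
"""

# The duplicate check from checkpoint 4.
FIND_DUPLICATES = """
SELECT count(*), node_id
FROM alf_node_status
GROUP BY node_id
HAVING count(*) > 1
"""


def find_duplicate_nodes(conn):
    """Return {node_id: row_count} for node_ids referenced more than once."""
    return {node_id: n for n, node_id in conn.execute(FIND_DUPLICATES)}


def delete_extra_rows(conn):
    """Keep the lowest rowid per duplicated node_id and delete the others.

    Assumption: one status row per node should survive. Back up first.
    """
    cur = conn.execute(
        """
        DELETE FROM alf_node_status
        WHERE node_id IN (SELECT node_id FROM alf_node_status
                          GROUP BY node_id HAVING count(*) > 1)
          AND rowid NOT IN (SELECT min(rowid) FROM alf_node_status
                            GROUP BY node_id)
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO alf_node_status VALUES (?, ?, ?)",
        [("workspace", "a", 1), ("workspace", "b", 1), ("workspace", "c", 2)],
    )
    print(find_duplicate_nodes(conn))  # {1: 2}
    print(delete_extra_rows(conn))     # 1
    print(find_duplicate_nodes(conn))  # {}
```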

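The three-part split in checkpoint 5, combined with the idea of restarting from the point of failure, can be sketched generically. The tables nodes, props and node_names and the stage names are hypothetical stand-ins for whichever heavy upgrade statement runs out of memory; this is not Alfresco's upgrade mechanism. Each stage runs and commits on its own, and a failure reports which stages already finished so the run can be resumed.

```python
import sqlite3

# Hypothetical tables and columns; the real Alfresco upgrade scripts differ.
STAGES = [
    # 1. Materialise the SELECT part of the heavy statement as a test table.
    ("create_test_table",
     "CREATE TABLE tmp_upgrade_rows AS "
     "SELECT n.id AS node_id, p.string_value AS name "
     "FROM nodes n JOIN props p ON p.node_id = n.id "
     "WHERE p.qname = 'name'"),
    # 2. Index the new table so the final statement avoids full scans.
    ("create_index",
     "CREATE INDEX idx_tmp_upgrade_rows ON tmp_upgrade_rows (node_id)"),
    # 3. Run the remaining, simpler write against the indexed table.
    ("apply",
     "INSERT INTO node_names (node_id, name) "
     "SELECT node_id, name FROM tmp_upgrade_rows"),
]


class StageFailed(Exception):
    """Carries the failing stage and the stages that already completed."""

    def __init__(self, stage, completed):
        super().__init__(f"stage {stage!r} failed after {completed}")
        self.stage = stage
        self.completed = completed


def run_stages(conn, stages, done=()):
    """Run stages one by one, skipping any listed in `done`.

    On failure, StageFailed.completed can be passed back as `done`
    to resume from the failing stage instead of starting over.
    """
    completed = list(done)
    for name, sql in stages:
        if name in completed:
            continue
        try:
            with conn:  # commit after each stage
                conn.execute(sql)
        except sqlite3.Error as exc:
            raise StageFailed(name, list(completed)) from exc
        completed.append(name)
    return completed
```

Committing per stage keeps each unit of work small, which is the point of splitting the query in the first place.
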
Migration (Second Approach) - Alfresco can export the information held in its repository and later import it into the same or another repository: anything a repository can store, it can also export and import. We used Alfresco Content Packages (ACPs) to migrate from Alfresco 2.1.1 to Alfresco 3.3.4.

Below are the steps we followed:

  1. We carried out the migration using web scripts and the ACP functionality. Because the repository was so large, we could not use the out-of-the-box ACP functionality as is, so we built our own import and export utility on top of web scripts. To speed it up, we made the export multi-threaded. Note that the import has to be thread-safe.
  2. We exported and imported the content in parts. We had three years of data. After studying the relationships between the Alfresco tables, we developed an SQL query to fetch just the data we needed. The logic was simple: export the content/spaces for a given period, and repeat until all the spaces and content from the three years have been exported. For example, we first run a query against the database to get the data for 2009-2010. The resulting records are then divided among a specified number of threads; with 50 threads, 50 ACPs are created. These ACPs are then imported into Alfresco 3.3.4. Repeat the process until the whole repository has been exported and imported. The export code also has to be changed so that all the children of a space are exported along with it.
  3. We created a table in the database to log the details of every space and document exported. This log is essential when an ACP throws an exception or some content is not created, and it is also useful for verifying the exported and imported data. To implement this, we had to change the existing export/import ACP classes. We also modified the export/import functions to catch exceptions so that the program runs without interruption.
  4. We also changed the existing import class to restrict which data is imported and to avoid duplicate-name exceptions. For example, we did not import files with a .url extension, and we did not import items whose name duplicated an existing one. These are just a few examples; you can choose which data to import. This is useful if your implementation has changed over time or you want to clean up specific data.
  5. We studied the relationships in the Alfresco 3.3.4 database and wrote SQL queries to verify that the amount of data was the same after migration.
  6. We imported users using a separate web script. In our case, users are created according to a set of rules, so it was easy to write a program that applies the same rules to create the users and assign permissions on their respective spaces.
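
Steps 2 and 3 can be sketched as follows. The sketch assumes the rows for a period have already been fetched by your SQL query as dicts with hypothetical node_ref and created keys (created can be any comparable value, such as an ISO date string), and that export_chunk is a hypothetical callback that writes one ACP; it is not an Alfresco API. Each chunk becomes one ACP, failures are caught per chunk so the run continues, and every node is logged with its status.

```python
from concurrent.futures import ThreadPoolExecutor


def select_period(rows, start, end):
    """Keep rows whose created value falls in [start, end)."""
    return [r for r in rows if start <= r["created"] < end]


def split_into_chunks(items, n):
    """Split items into at most n nearly equal chunks (n >= 1), one per ACP."""
    size, extra = divmod(len(items), n)
    chunks, pos = [], 0
    for i in range(n):
        step = size + (1 if i < extra else 0)
        if step:
            chunks.append(items[pos:pos + step])
        pos += step
    return chunks


def export_period(rows, start, end, n_threads, export_chunk, log):
    """Export one period as up to n_threads ACPs in parallel and log each node.

    export_chunk(index, chunk) is a hypothetical callback that writes one ACP.
    log is a sqlite3 connection with an export_log table.
    """
    chunks = split_into_chunks(select_period(rows, start, end), n_threads)

    def run(args):
        index, chunk = args
        try:
            export_chunk(index, chunk)
            return index, chunk, "OK", None
        except Exception as exc:  # record the failure and keep going
            return index, chunk, "FAILED", str(exc)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(run, enumerate(chunks)))  # keeps chunk order

    # Write the log from the calling thread: by default a sqlite3
    # connection may only be used by the thread that created it.
    with log:
        for index, chunk, status, error in results:
            log.executemany(
                "INSERT INTO export_log (acp, node_ref, status, error) "
                "VALUES (?, ?, ?, ?)",
                [(index, r["node_ref"], status, error) for r in chunk],
            )
    return [status for _, _, status, _ in results]
```

Rows logged as FAILED identify exactly which ACP to re-run, and comparing the log against the target repository gives a first verification check.
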

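The import-side filtering in step 4 reduces to a rule check run before each item is imported. The (parent path, name) pairs and the function name are assumptions for illustration; in the real utility this logic lived inside the modified Alfresco import class. Names are compared exactly here; adjust that if your repository treats names case-insensitively.

```python
def filter_for_import(items, skip_extensions=(".url",), existing=()):
    """Split items into (to_import, skipped) before importing.

    items: iterable of (parent_path, name) pairs (hypothetical shape).
    existing: (parent_path, name) pairs already in the target repository.
    skipped entries are (parent_path, name, reason).
    """
    seen = set(existing)
    to_import, skipped = [], []
    for parent, name in items:
        if name.lower().endswith(tuple(skip_extensions)):
            skipped.append((parent, name, "extension"))
        elif (parent, name) in seen:
            skipped.append((parent, name, "duplicate name"))
        else:
            seen.add((parent, name))
            to_import.append((parent, name))
    return to_import, skipped
```

Returning the skipped items with a reason, rather than silently dropping them, lets them be written to the same log table used for the export.
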
Pros and Cons of Upgrade and Migration

  1. An upgrade converts the complete repository, whereas a migration lets you choose what to import and restrict the data accordingly.
  2. During the upgrade, two patches caused problems and were eventually skipped. The consequences of skipping them are not yet known.
  3. A migration requires more attention than an upgrade. Some data may not export or import properly, in which case you need to run the program again, so it is always safer to run the export/import in phases.
  4. An upgrade would be faster than a migration if you skip those patches.
  5. An upgrade requires a DBA to tune the SQL queries; a migration does not.

Step 2: Code Compatibility

We had to make our code compatible with Alfresco 3.3.4. The out-of-the-box Alfresco had been heavily customized over time. Below are some of the challenges we faced in making the code compatible.

  1. We had to start from the out-of-the-box JSPs, and merging our customizations into the Alfresco 3.3.4 JSPs was a huge task. We used several file comparison tools for this.
  2. For some features we had overridden existing Alfresco classes. These overrides had to be rewritten to incorporate the code changes in the new Alfresco version, in which some package structures and method names had changed. Updating all of this took a great deal of effort.
  3. We used external JARs, a few of which had to be updated because they did not support JDK 1.6.
  4. Some of our code was not supported by Alfresco 3.3.4, which adds a few restrictions: for example, the created, creator, modified and modifier properties cannot be edited, even through code, and properties with a different prefix cannot be used. We had to change our code to work within these restrictions.
  5. The security and authentication configuration has changed in newer versions, so we had to update that as well.
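
For point 4, customized code that copies properties from old nodes can separate out the protected properties before writing, instead of failing on them. A minimal sketch, assuming properties are held in a plain dict keyed by prefixed names such as cm:created; real Alfresco customizations work with QName-keyed maps in Java, and the helper name here is hypothetical.

```python
# The properties the article reports as not editable in Alfresco 3.3.4.
PROTECTED = frozenset({"cm:created", "cm:creator", "cm:modified", "cm:modifier"})


def split_protected(props):
    """Return (writable, dropped) so callers can log what was not applied."""
    writable = {k: v for k, v in props.items() if k not in PROTECTED}
    dropped = {k: v for k, v in props.items() if k in PROTECTED}
    return writable, dropped
```
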

Choose whichever approach best fits your requirements.

Amita Bhandari

Senior Consultant (CIGNEX India Office)

Author of 2 Alfresco ECM Books