Content across organizations (private, public, and government) continues to grow, and it is being produced at ever higher volume and velocity. Big Content can take the form of records, documents, blogs, media, files, web logs, social media feeds, and more. A Big Content approach is needed to manage this mass of unstructured data alongside structured data.

Consider government agencies that must manage databases covering millions of citizens, together with the associated services and benefits data, in various formats across various systems. Traditional technologies have not evolved enough to understand this data, let alone process it. Managing and organizing it for quicker search and retrieval can be of immense value to organizations. That is where Big Content should be, and is being, leveraged.

Industry thought leaders strive to keep pace with rapid technological change. This requires organizations to architect their systems securely for interoperability of big content, which is needed to make informed decisions and improve productivity.

Let’s look at one such government agency

Social security benefit systems are designed to drive government welfare programs. These programs promote the welfare of citizens through assistance measures that guarantee access to sufficient resources, such as food and shelter, while promoting the health and wellbeing of the population. Government organizations manage data such as medical diagnostics, infrastructure management, personalized benefits, and emergency services.

In a typical county, the government provides childcare benefits, housing, food stamps, insurance, employment services, and cash assistance to its citizens. The solution requirements were:

  1. Scalable Repository, High Ingestion Rate

    • 500 million objects, expected to grow to 1 billion
    • 60 million objects added per year
    • Document ingestion rate = 200,000/hour
  2. High Performance Search and Retrieval

    • Search (6 months date range) = 500/ 2 sec
  3. Secure access at “county level” (group)
  4. Compliance on storage (physical separation)
  5. Version Control, Workflow & Business Rules
  6. Web Services API for external access

    • PCL to PDF conversion = 25,000/ day
  7. Need for one-time migration of existing documents (10 years of archives)
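
To make the scale of these requirements concrete, the figures above can be turned into a few back-of-the-envelope numbers. The sketch below is only illustrative: it assumes growth stays constant at 60 million objects per year, and the last figure assumes (hypothetically, since the archive size is not stated) that all 500 million existing objects would be loaded at the stated ingestion rate.

```python
# Rough sizing arithmetic based on the stated requirements.
CURRENT_OBJECTS = 500_000_000    # objects in the repository today
TARGET_OBJECTS = 1_000_000_000   # expected growth target
ADDED_PER_YEAR = 60_000_000      # new objects per year (assumed constant)
INGEST_PER_HOUR = 200_000        # required document ingestion rate


def years_to_target(current, target, per_year):
    """Years until the repository reaches the target at a constant growth rate."""
    return (target - current) / per_year


def hours_to_ingest(count, per_hour):
    """Hours needed to ingest `count` documents at a sustained hourly rate."""
    return count / per_hour


years = years_to_target(CURRENT_OBJECTS, TARGET_OBJECTS, ADDED_PER_YEAR)
ingest_per_second = INGEST_PER_HOUR / 3600
yearly_load_hours = hours_to_ingest(ADDED_PER_YEAR, INGEST_PER_HOUR)
# Hypothetical: treats the full 500 million objects as the migration volume.
migration_days = hours_to_ingest(CURRENT_OBJECTS, INGEST_PER_HOUR) / 24

print(f"Years to reach 1 billion objects: {years:.1f}")
print(f"Sustained ingestion rate: {ingest_per_second:.1f} documents/second")
print(f"Hours to ingest one year of new objects: {yearly_load_hours:.0f}")
print(f"Days to load 500 million objects: {migration_days:.0f}")
```

Under these assumptions the repository would reach a billion objects in a little over eight years, and the required ingestion rate works out to roughly 56 documents per second.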

The answer to such data-intensive requirements was an innovative repository solution that can easily store, retrieve, organize, and manage various types of content and records. The architecture needed to scale and to process data quickly. Enterprise Content Management (ECM) addresses the need to manage this structured and unstructured big content.

Alfresco as an Open Source Content Management Platform

Alfresco's out-of-the-box capabilities allowed us to build a complete, working content management application quickly and easily. Alfresco played the role of a repository dedicated to storing and retrieving content. This capability was provided by the following two foundation services:

  1. Storage Service - stores content, the actual information being recorded, along with its metadata. Metadata and content may be structured according to the rules defined in a content model.
  2. Search Service - handles the indexing of information and allows metadata and content to be retrieved through many different lookup options.

Solution Enabling Management of Billion Objects

CIGNEX Datamatics developed a solution by integrating Alfresco with an existing system. Alfresco stores metadata in a database and content in the file system. Enterprise search was an important requirement for the solution, so Apache Solr, an open source enterprise search platform, was used for high-performance search and retrieval.
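
As an illustration of the kind of date-range search described in the requirements, the sketch below builds a request URL for Solr's standard select handler. It is not the query used in this solution: the host, the core name `documents`, the field names `created` and `county`, and the default of 50 rows are all hypothetical, and Alfresco's own Solr index uses its own field naming.

```python
from urllib.parse import urlencode


def build_search_url(base_url, core, text, county, months=6, rows=50):
    """Build a Solr select URL limited to a county and a recent date range.

    Field names ("created", "county") are hypothetical placeholders.
    """
    params = {
        "q": text,
        # Filter queries narrow the result set without affecting scoring.
        "fq": [
            f"created:[NOW-{months}MONTHS TO NOW]",  # Solr date math
            f'county:"{county}"',
        ],
        "rows": rows,
        "wt": "json",
    }
    # doseq=True emits one fq parameter per list entry.
    return f"{base_url}/solr/{core}/select?{urlencode(params, doseq=True)}"


url = build_search_url(
    "http://localhost:8983", "documents", "housing benefit", "Example County"
)
print(url)
```

Putting the date range and the county restriction in filter queries rather than in the main query lets Solr cache them separately, which tends to help when many searches share the same filters.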


Document ingestion into Alfresco, including pre-processing, splitting, and metadata extraction, was handled by the Data Processing Engine (DPE), which acts as a central controller. The DPE enables a high ingestion rate by receiving and queuing real-time content updates from the ECM.
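
The queue-based pattern described above can be sketched in a few lines. This is a minimal illustration of queued ingestion with worker threads, not the DPE's actual design: every function name is hypothetical, the form-feed page separator is an assumption, and the in-memory list stands in for the repository.

```python
import queue
import threading


def preprocess(raw):
    """Hypothetical pre-processing step: trim surrounding whitespace."""
    return raw.strip()


def split(doc):
    """Hypothetical splitting step: one part per form-feed-separated page."""
    return [part.strip() for part in doc.split("\f") if part.strip()]


def extract_metadata(part):
    """Hypothetical metadata extraction: first line as title, plus length."""
    return {"title": part.splitlines()[0], "length": len(part)}


def worker(inbox, store, lock):
    """Take documents off the queue until a None sentinel arrives."""
    while True:
        raw = inbox.get()
        try:
            if raw is None:
                return
            for part in split(preprocess(raw)):
                with lock:
                    store.append((extract_metadata(part), part))
        finally:
            inbox.task_done()


def run_pipeline(documents, workers=4):
    """Queue all documents, process them in parallel, return stored parts."""
    inbox, store, lock = queue.Queue(), [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(inbox, store, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for doc in documents:
        inbox.put(doc)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker to shut it down
    inbox.join()
    for thread in threads:
        thread.join()
    return store


results = run_pipeline(["Claim A\nbody\fClaim B\nbody", "  Claim C\n"])
print(sorted(meta["title"] for meta, _ in results))
```

The queue decouples the rate at which content arrives from the rate at which it is processed, so bursts of updates are absorbed while the workers drain the backlog.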

Benefits Client Realized

  • Scalable Platform - The solution architecture is designed for horizontal scaling, so it can grow to meet the future requirement of managing a billion objects
  • Performance - The solution meets the defined performance targets for concurrent users, total users, and search
  • Modular Architecture - Components or applications can be replaced or integrated as needs change

To learn more about the solution, please download the presentation.

Questions? Please feel free to contact us.