Overview

Enterprises are often flooded with siloed data marts, each of which contains various types of data (structured, semi-structured, unstructured) or is categorized by business unit. These silos often end up being harder to manage, more difficult to scale, and slower to implement.

In the 3rd century BC, the Library of Alexandria is said to have contained all the scrolls and books of the known world. It attracted learned men and women like moths to a flame. Data Lake systems, like the library, give enterprises a centralized data system with decentralized access points from which individual business users and units can derive valuable insights. Developed mostly using the Hadoop ecosystem, Data Lake platforms allow enterprises to store data of various formats in a single, accessible repository.

Once a Data Lake platform is built, enterprises can go on to build governance and business intelligence tools that work on top of it and deliver crisp analytics, correlations, quality assurance, and security. At CIGNEX Datamatics we have built enterprise-level Data Lakes using leading data management platforms such as the Hadoop ecosystem and MongoDB, helping our customers build a 360-degree view of the business, Sales Intelligence Platforms and more.

Solutions

Increase Marketing Campaigns ROI

We delivered targeted customer segmentation based on Products/Verticals/Industry for a leading Financial Services company by helping them develop a Data Architecture that is high-performing yet cost-effective. The platform integrated data from multiple external and internal sources and established a delivery mechanism and a single version of truth. The Hadoop ecosystem served as the scalable storage and processing repository, while ETL was performed using Talend. Machine learning platforms were used for intelligence learning, and IBM Cognos for dashboards, reporting and analysis.

The solution gave the customer the ability to correlate and compare customer behavior on the site with saved preferences and existing/past products, and to consolidate ad-hoc dashboards for their Account Executives.

Predicting Business Performance and Taking the Right Decisions

We built a Hadoop-based Data Lake platform for a subsea engineering company that ingests and processes real-time data sources, including news feeds, competitor analysis and market data. The platform establishes associations between internal and external data sets and processes and stores them for various workloads (streaming, batch, near real time). Analytical processing was carried out with tools matched to each requirement: Pig and Hive for batch processing, Apache Mahout and Apache Spark for machine learning, Spark for streaming, and Gazzang and Sentry for security and governance.

The platform was built as a ‘Single Version of Truth’ with the ability to predict outcomes based on past trends, revenue projections, opportunity outcomes and more.

Key Features

Ingestion Layer

Flexible connectors and integration touchpoints load data from various sources regardless of the data type (bulk, streaming, binary). We leverage ETL tools such as Talend and Alteryx, along with custom acquisition programs, to load and process data.
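As an illustration of the connector idea described above, the following is a minimal Python sketch of a custom acquisition program that routes each source to a loader by data type. It is not the Talend or Alteryx API; the names `Record`, `CONNECTORS`, `connector` and `ingest` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    source: str     # name of the originating system (hypothetical label)
    kind: str       # "bulk", "streaming" or "binary"
    payload: bytes  # raw content, kept as-is for later processing


# Hypothetical registry: data kind -> loader function.
CONNECTORS: Dict[str, Callable[[str, object], List[Record]]] = {}


def connector(kind):
    """Decorator that registers a loader for one data kind."""
    def register(func):
        CONNECTORS[kind] = func
        return func
    return register


@connector("bulk")
def load_bulk(source, data):
    # Bulk input is assumed to be newline-separated text; blank lines are skipped.
    return [Record(source, "bulk", line.encode("utf-8"))
            for line in data.splitlines() if line.strip()]


@connector("streaming")
def load_streaming(source, data):
    # Streaming input is assumed to be an iterable of text events.
    return [Record(source, "streaming", event.encode("utf-8")) for event in data]


@connector("binary")
def load_binary(source, data):
    # Binary input is stored as a single opaque record.
    return [Record(source, "binary", bytes(data))]


def ingest(source, kind, data):
    """Load data from a source using the connector registered for its kind."""
    if kind not in CONNECTORS:
        raise ValueError(f"no connector registered for kind {kind!r}")
    return CONNECTORS[kind](source, data)
```

A new source type would be supported by registering one more loader, for example `ingest("crm_export", "bulk", "a,1\nb,2")` for a bulk file or `ingest("news_feed", "streaming", events)` for an event stream, without changing the callers.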

Processing Layer

Enriches the data with metadata for better understanding: schema information, business types, the right attributes for text documents, and associations between the various datasets that make the data more relevant.
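To make the metadata described above more concrete, here is a small Python sketch of a catalog entry holding schema information, a business type, attributes and associations. The class and field names (`DatasetMetadata`, `Catalog`, `associate`) are assumptions made for illustration, and linking datasets by shared column names is just one simple way associations could be derived.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    name: str
    schema: Dict[str, str]   # column name -> type
    business_type: str       # e.g. "customer", "market"
    attributes: Dict[str, str] = field(default_factory=dict)  # free-form tags
    related: List[str] = field(default_factory=list)          # associated datasets


class Catalog:
    """Hypothetical in-memory metadata catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        self._entries[meta.name] = meta

    def get(self, name):
        return self._entries[name]

    def associate(self, left, right):
        """Link two registered datasets that share at least one column name."""
        a, b = self._entries[left], self._entries[right]
        shared = sorted(set(a.schema) & set(b.schema))
        if not shared:
            raise ValueError(f"{left!r} and {right!r} share no columns")
        if right not in a.related:
            a.related.append(right)
        if left not in b.related:
            b.related.append(left)
        return shared


catalog = Catalog()
catalog.register(DatasetMetadata(
    "customers", {"customer_id": "string", "segment": "string"}, "customer"))
catalog.register(DatasetMetadata(
    "web_clicks", {"customer_id": "string", "url": "string"}, "behavior",
    attributes={"language": "en"}))
shared_columns = catalog.associate("customers", "web_clicks")
```

In this sketch the association is recorded on both entries, so a consumer browsing either dataset can discover the other.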

Governance Layer

The governance layer enforces key guarantees about the ingested data sets, such as a designated level of trust (through data quality parameters), privacy and security, while maintaining an audit trail and a taxonomical chart of their usage.
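The sketch below shows one way a trust level and an audit trail could fit together in Python. It assumes, purely for illustration, that trust is measured by a single data quality parameter (completeness of required fields) and that reads are refused below a threshold; `completeness`, `GovernedDataset` and `min_trust` are hypothetical names, and a real governance layer would cover privacy and security as well.

```python
from datetime import datetime, timezone


def completeness(rows, required):
    """Fraction of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    ok = sum(1 for row in rows
             if all(row.get(name) not in (None, "") for name in required))
    return ok / len(rows)


class GovernedDataset:
    """Hypothetical wrapper that scores a dataset and records every access."""

    def __init__(self, name, rows, required, min_trust=0.9):
        self.name = name
        self.rows = rows
        self.min_trust = min_trust
        self.audit_trail = []
        self.trust = completeness(rows, required)
        self._log("system", "quality_check")

    def _log(self, user, action):
        # Each audit entry records who did what, to which dataset, and when (UTC).
        self.audit_trail.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "dataset": self.name,
        })

    def read(self, user):
        """Return the rows if the dataset meets its trust threshold."""
        if self.trust < self.min_trust:
            self._log(user, "read_denied")
            raise PermissionError(
                f"{self.name}: trust {self.trust:.2f} is below {self.min_trust:.2f}")
        self._log(user, "read")
        return list(self.rows)
```

Denied reads are logged as well as successful ones, so the audit trail reflects attempted usage and not only granted access.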

Storage Layer

The Data Lake needs to be designed to store multiple forms of data while maintaining elasticity, high availability, scalability, accessibility, and durability.

Monitoring Tools

The right set of data monitoring tools verifies that the data sets are in the right shape and comply with the stringent practices we have defined.

Presentation Layer

Provides consumers with a presentation layer for making the most of the data sets, and supports various workloads for different types of data analysis.

Benefits

  • Scalability – Can easily keep pace with the organization's data growth, provided the processing and governance layers are robust.
  • Affordability – Closely tied to the previous point. In the long run, the scalability and elasticity offered by Data Lake platforms are hard for other solutions to match.
  • Single Version of Truth – A centralized data system can be a good foundation for building a 360-degree customer/employee view for the organization.