Business Need:

  • Inability to extract content and metadata from images (GIF, PNG, JPG)
  • Unstructured input/output formats
  • Need to build rules and a training set, along with a self-learning capability

Key Features:

  • Apache Tika to extract complex image content and metadata
  • Content storage in MongoDB, a scalable NoSQL repository
  • Entity extraction / NLP through IBM Alchemy
  • Continuous self-learning and machine learning through Apache Mahout to improve accuracy
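
The first two features above amount to turning raw image bytes into a structured record that a document store can hold. The sketch below illustrates only that shape and is not the solution's actual code: it uses the Python standard library as a stand-in for Apache Tika, reads just the format and pixel dimensions of PNG and GIF files (JPG is left out for brevity), and returns a plain dictionary of the kind that could be saved as a MongoDB document. The function names `extract_image_metadata` and `to_record` and the record's field names are hypothetical.

```python
import struct


def extract_image_metadata(data: bytes) -> dict:
    """Return format and pixel dimensions for PNG or GIF bytes.

    Stand-in for a real extractor such as Apache Tika; it reads only
    the fixed-position header fields of each format.
    """
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length,
    # 4-byte type, then big-endian width and height).
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return {"format": "PNG", "width": width, "height": height}
    # GIF: 6-byte signature, then little-endian logical screen size.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return {"format": "GIF", "width": width, "height": height}
    raise ValueError("unsupported or unrecognized image format")


def to_record(name: str, data: bytes) -> dict:
    """Build a structured record (hypothetical field names) for storage."""
    record = {"source": name, "size_bytes": len(data)}
    record.update(extract_image_metadata(data))
    return record


# Usage: a GIF header describing a 3x2 pixel image.
sample_gif = b"GIF89a" + struct.pack("<HH", 3, 2)
print(to_record("sample.gif", sample_gif))
```

In a real pipeline the extractor would return far more fields (EXIF tags, embedded text, and so on), and the resulting record would be written to the repository rather than printed.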

Benefits:

  • Ability to extract data from 100+ disparate file formats, including GIF, PNG, and JPG
  • Accuracy of metadata extraction and output improved by 70% to 86%
  • Automated 95% of the review processes