Spring Cloud Data Flow is a cloud-native framework from Pivotal for deploying and managing data pipelines composed of standalone microservices. It facilitates the creation, orchestration, and refactoring of data pipelines with a single programming model for common use cases such as data ingest, real-time analytics, and data import/export.

Spring Cloud Data Flow is the cloud-native redesign of Spring XD – a project that aimed to simplify the development of Big Data applications. These applications are now autonomous deployment units and they can "natively" run in modern runtimes such as Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes.

Features of Spring Cloud Data Flow

Spring Cloud Data Flow provides an out-of-the-box experience for typical use cases. Some of its features are listed below:

  • Orchestrate applications across a variety of distributed runtime platforms including: Cloud Foundry, Lattice, and Apache YARN
  • Create, unit-test, troubleshoot and manage microservice applications in isolation
  • Build data pipelines rapidly using the out-of-the-box stream and task/batch applications
  • Separate runtime dependencies backed by Spring profiles
  • Consume microservice applications as Maven or Docker artifacts
  • Develop using the DSL, REST APIs, Dashboard, and the drag-and-drop GUI, Flo
  • Take advantage of metrics, health checks, and the remote management of each microservice application
  • Scale stream and batch pipelines without interrupting data flows

Our Spring Cloud Data Flow Experience

CIGNEX Datamatics has used the architecture below in existing customer engagements.

Spring Cloud Data Flow Local Server
Spring Cloud Data Flow Cloud Foundry Server
Spring Cloud Data Flow Kubernetes Server
Spring Cloud Data Flow Apache Yarn Server
Spring Cloud Data Flow Apache Mesos Server

REST-APIs / Shell / DSL
Spring Flo
Spring Cloud Data Flow Metrics Collector
Spring Cloud Data Flow - Core

↓     Uses     ↓

Spring Cloud Deployer - Service Provider Interface (SPI)

↑     Implements     ↑

Spring Cloud Deployer Local
Spring Cloud Deployer Cloud Foundry
Spring Cloud Deployer Kubernetes
Spring Cloud Deployer Yarn
Spring Cloud Deployer Mesos

↓     Deploys     ↓

Spring Cloud Stream App Starters
Spring Cloud Task App Starters
Spring Cloud Stream
Spring Cloud Task

↓     Uses     ↓

Spring Integration
Spring Boot
Spring Batch

The architecture for Spring Cloud Data Flow is divided into a number of distinct components.

  • The Core domain module defines the concept of a stream: a composition of spring-cloud-stream modules in a linear pipeline from a source to a sink, optionally with processor modules in between. The domain also includes the concept of a task, which may be any process that does not run indefinitely, including Spring Batch jobs.
  • The Module Registry maintains the set of available modules and their mappings to Maven coordinates.
  • The Module Deployer SPI provides the abstraction layer for deploying the modules of a given stream across a variety of runtime environments, including Local, Lattice, Cloud Foundry, and YARN.
  • The Admin provides a REST API and UI. It is an executable Spring Boot application that is profile aware, so that the proper implementation of the Module Deployer SPI will be instantiated based on the environment within which the Admin application itself is running.
  • The Shell connects to the Admin’s REST API and supports a DSL that simplifies the process of defining a stream and managing its lifecycle.
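The way these components fit together can be illustrated with a small model. The sketch below is not Spring Cloud Data Flow code: it is a simplified Python illustration, assuming a pipe-delimited definition such as "http | transform | log". The class and function names, the profile names, and the Maven coordinates are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class StreamDefinition:
    """A linear pipeline: one source, zero or more processors, one sink."""
    name: str
    source: str
    processors: List[str]
    sink: str


def parse_stream(name: str, definition: str) -> StreamDefinition:
    """Split a pipe-delimited definition into source, processors and sink."""
    modules = [part.strip() for part in definition.split("|")]
    if len(modules) < 2 or not all(modules):
        raise ValueError("a stream needs at least a source and a sink")
    return StreamDefinition(name, modules[0], modules[1:-1], modules[-1])


# Hypothetical registry: module name -> Maven coordinates (made-up values).
MODULE_REGISTRY: Dict[str, str] = {
    "http": "com.example:http-source:1.0.0",
    "transform": "com.example:transform-processor:1.0.0",
    "log": "com.example:log-sink:1.0.0",
}


class ModuleDeployer(Protocol):
    """Stand-in for the deployer SPI: one method, many implementations."""
    def deploy(self, module: str, coordinates: str) -> str: ...


class LocalDeployer:
    def deploy(self, module: str, coordinates: str) -> str:
        return f"local: {module} ({coordinates})"


class CloudFoundryDeployer:
    def deploy(self, module: str, coordinates: str) -> str:
        return f"cloudfoundry: {module} ({coordinates})"


# Hypothetical profile names; the active profile selects the implementation.
DEPLOYERS = {"local": LocalDeployer, "cloud": CloudFoundryDeployer}


def deploy_stream(stream: StreamDefinition, profile: str) -> List[str]:
    """Resolve each module in the registry and hand it to the deployer."""
    deployer: ModuleDeployer = DEPLOYERS[profile]()
    modules = [stream.source, *stream.processors, stream.sink]
    return [deployer.deploy(m, MODULE_REGISTRY[m]) for m in modules]


if __name__ == "__main__":
    stream = parse_stream("demo", "http | transform | log")
    for line in deploy_stream(stream, "local"):
        print(line)
```

The point of the sketch is the separation of concerns: the stream definition knows nothing about the runtime, the registry only maps names to artifacts, and switching the profile swaps the deployer without touching either.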

We would be happy to connect and learn about your business requirements. Contact us to discuss your Spring Cloud Data Flow project.