Today we are happy to announce the availability of Apache Spark 2.2.0 as part of the Databricks Runtime 3.0.
This release marks a major milestone for Structured Streaming: it is now production ready, and the experimental tag has been removed. The release also adds support for arbitrary stateful operations in a stream, along with Apache Kafka 0.10 support for both reading and writing through the streaming and batch APIs. Beyond extending SparkR, MLlib, and GraphX with new functionality, the release focuses on usability, stability, and refinement, resolving over 1100 tickets.
This blog post discusses some of the high-level changes, improvements and bug fixes:
Production ready Structured Streaming
Expanding SQL functionalities
New distributed machine learning algorithms in R
Additional algorithms in MLlib and GraphX
Introduced in Spark 2.0, Structured Streaming is a high-level API for building continuous applications. Our goal is to make it easier to build end-to-end streaming applications, which integrate with storage, serving systems,