New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi

Storing your data in Amazon S3 provides many benefits in terms of scale, reliability, and cost effectiveness. On top of that, you can use Amazon EMR to process and analyze your data with open source tools like Apache Spark, Hive, and Presto. As powerful as these tools are, it can still be challenging to handle use cases that require incremental data processing and record-level inserts, updates, and deletes.
Talking with customers, we found that there are use cases that need to handle incremental changes to individual records, for example:
Complying with data privacy regulations, where users choose to exercise their right to be forgotten or change their consent as to how their data can be used.
Working with streaming data, when you have to handle specific data insertion and update events.
Using change data capture (CDC) architectures to track and ingest database change logs from enterprise data
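
To make the record-level operations behind these use cases more concrete, here is a minimal sketch in plain Python. It is not the Apache Hudi or Amazon EMR API; the `ChangeEvent` type and the `apply_changes` function are hypothetical names introduced only for illustration, and the in-memory dictionary stands in for a dataset keyed by record ID.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """A hypothetical change record, similar in spirit to a CDC log entry."""
    op: str                       # "insert", "update", or "delete"
    key: str                      # record identifier
    value: Optional[dict] = None  # new or changed fields (unused for deletes)


def apply_changes(table: Dict[str, dict],
                  events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Apply change events in order and return a new table.

    Inserts and updates are both treated as upserts: the changed fields
    are merged into the existing record, or a new record is created.
    """
    result = {key: dict(record) for key, record in table.items()}
    for event in events:
        if event.op in ("insert", "update"):
            if event.value is None:
                raise ValueError(f"{event.op} for {event.key!r} needs a value")
            result[event.key] = {**result.get(event.key, {}), **event.value}
        elif event.op == "delete":
            # Deleting a missing key is a no-op, so replays are harmless.
            result.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return result


users = {"u1": {"email": "a@example.com", "consent": True}}
events = [
    ChangeEvent("insert", "u2", {"email": "b@example.com", "consent": True}),
    ChangeEvent("update", "u1", {"consent": False}),  # consent change
    ChangeEvent("delete", "u2"),                      # right to be forgotten
]
updated = apply_changes(users, events)
```

On a dictionary this is trivial. The difficulty described above comes from doing the same thing against large, immutable files in S3, where changing a single record can otherwise mean rewriting far more data than the record itself.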


Original URL: http://feedproxy.google.com/~r/AmazonWebServicesBlog/~3/u5A7k_BanFE/
