data lake

CDC with Delta Lake Streaming

Alexey Novakov published on

7 min, 1304 words



Change Data Capture (CDC) is a popular technique for replicating data from an OLTP data store to an OLAP data store. CDC tools usually integrate with the transaction logs of relational databases, so they are mainly dedicated to replicating every possible data change from those databases. NoSQL databases often come with built-in CDC for any data change (insert, update, delete), for example AWS DynamoDB Streams.

In this blog post, we will look at the Delta Lake table format, which supports a "merge" operation. This operation is useful when we need to update replicated data in a data lake.
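
To make the idea of a merge concrete, here is a minimal sketch in plain Python of what applying CDC events to a replicated table means. It is not Delta Lake's API: the `merge_changes` function, the event tuple layout, and the sample rows are all hypothetical names chosen for illustration, and the "table" is just a dictionary keyed by primary key.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change event: (operation, primary key, row payload).
ChangeEvent = Tuple[str, int, dict]


def merge_changes(
    target: Dict[int, dict], changes: Iterable[ChangeEvent]
) -> Dict[int, dict]:
    """Apply CDC events to a copy of the target table, in order."""
    result = dict(target)  # leave the original replica untouched
    for op, key, row in changes:
        if op == "delete":
            # Matched key is removed; deleting a missing key is a no-op.
            result.pop(key, None)
        elif op in ("insert", "update"):
            # Upsert: overwrite a matched key, add an unmatched one.
            result[key] = row
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Assumed sample data, for illustration only.
replica = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
events = [
    ("update", 1, {"name": "Alice Smith"}),
    ("delete", 2, {}),
    ("insert", 3, {"name": "Carol"}),
]

merged = merge_changes(replica, events)
print(merged)
```

The sketch treats inserts and updates as one upsert branch and deletes as another, which is the general shape of a merge: match incoming changes to existing rows by key, then decide per row whether to update, insert, or delete.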
