Databricks Lakehouse
Overview
This destination syncs data to Delta Lake on Databricks Lakehouse. Each stream is written to its own Delta table.
This connector requires a JDBC driver to connect to the Databricks cluster. By using the driver and the connector, you agree to the JDBC ODBC driver license. Under that license, you may use this connector only to connect third-party applications to Apache Spark SQL within a Databricks offering, using the ODBC and/or JDBC protocols.
Currently, this connector requires at least 30 MB of memory for each stream. When syncing multiple streams, it may fail with an out-of-memory error if the allocated memory is too small. This performance bottleneck is tracked in this issue. Once the issue is resolved, the connector should be able to sync a practically unlimited number of streams with less than 500 MB of memory.
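As a rough sizing aid, the per-stream figure above can be turned into a back-of-the-envelope estimate. The sketch below assumes memory grows linearly at the quoted lower bound of 30 MB per stream; the function names and the optional `overhead_mb` parameter are illustrative and not part of the connector, and real usage may well be higher.

```python
# Assumption: 30 MB is the per-stream lower bound quoted in this document.
# Actual consumption depends on the data and may exceed this figure.
PER_STREAM_MB = 30


def estimate_min_memory_mb(num_streams: int,
                           per_stream_mb: int = PER_STREAM_MB,
                           overhead_mb: int = 0) -> int:
    """Return a lower-bound memory estimate (in MB) for syncing num_streams."""
    if num_streams < 0:
        raise ValueError("num_streams must be non-negative")
    # Linear model: every stream needs its own buffer, plus fixed overhead.
    return num_streams * per_stream_mb + overhead_mb


def max_streams_for_budget(memory_mb: int,
                           per_stream_mb: int = PER_STREAM_MB,
                           overhead_mb: int = 0) -> int:
    """Return how many streams fit in memory_mb under the same linear model."""
    usable_mb = memory_mb - overhead_mb
    if usable_mb <= 0:
        return 0
    return usable_mb // per_stream_mb


if __name__ == "__main__":
    print(estimate_min_memory_mb(20))
    print(max_streams_for_budget(500))
```

Treat the result as a floor when choosing a memory allocation, and leave headroom above it.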
Getting started
Databricks AWS Setup
1. Create a Databricks Workspace
- Follow the Databricks guide "Create a workspace using the account console".
IMPORTANT: Don't forget to create a cross-account IAM role for the workspace.
TIP: Alternatively, use the Databricks quickstart to create a new workspace.