Skip to main content

Joining Datasources

Given the fact that Cluvio runs the analytical queries directly on your database or data warehouse, it is not possible to use data that live in different datasources directly within a single report.

Having said that, there are several alternatives that you may use to work around that limitation. Here is an overview of the most typical approaches:

1. External tables

BigQuery, Redshift, Athena, and Presto are examples of databases that can be configured with data that live outside of the database and can perform joins with the primary data. Details can be found at the following locations:

2. Use a data warehouse and ETL to pull data into a single place

This is the most common solution, that can be set up fast and operated much cheaper than only a few years ago. The most typical setups combine one of the cloud analytical databases (Redshift, BigQuery, Snowflake, Vertica or even Postgres or MySql) with an ETL tool like StitchData, Fivetran, ETLeap, Airbyte. The ETL tool can then pull data from databases or business services (like Zendesk, Salesforce, etc.) into the single data warehouse. With data living in a single DB (typically with per-source schema), it is then easy to create high-performance analytical results that combine data across the sources.

3. Query data directly where they live (federated or distributed queries)

This approach was pioneered by Presto DB (originally created by Facebook), AWS has its own version of Presto running under the name Athena. Starburst has its own distribution of Presto as a service with additional connectors. The set of connectors available differs between the products. For further information, please refer to the following links:

4. MySql Databases

For completeness, we mention the slightly confusingly named concept of "Database" in MySQL. MySQL Database is what most other DBs call "schema", and MySQL fully supports joining data between databases living in the same database server/instance, using the database name as a prefix.