Data Sources

SQL-Based
Cluvio is a SQL-based analytics tool. You can connect to a SQL-capable database or data warehouse to run the queries that produce the data for your dashboards, or upload CSV and Excel files directly as Static Tables and query them with SQL — no database required.
For database connections, the most common setup is either a read-only replica of a primary database, or a dedicated analytics database that contains data extracted from other sources. The connection configuration for a database is called a datasource.
Cluvio queries your database directly — there is no intermediate data layer or extraction step. This has several benefits:
- Get started in seconds. Connect to your database and run queries right away, with no data extraction or modeling required.
- Use the SQL you already know. There are only a few Cluvio-specific concepts to learn.
- You have the full expressiveness of SQL at hand, and where things get complicated or impossible in SQL, you can use R to post-process the SQL results.
Because Cluvio is SQL-based, it works best when users are comfortable writing queries. Building reports requires translating analytical questions into SQL — joining tables, grouping, and aggregating — which assumes familiarity with the relational data model. If you need drag-and-drop self-service BI for non-technical users, Cluvio may not be the right fit.
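As a rough illustration of that translation step, the sketch below turns the question "how much revenue does each country generate?" into a query that joins, groups, and aggregates. It runs against an in-memory SQLite database through Python's standard library only so that it is self-contained. The `customers` and `orders` tables and their columns are hypothetical and not part of Cluvio.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'US'), (3, 'US');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 35.0), (3, 3, 15.0), (4, 2, 10.0);
""")

# Question: "How much revenue does each country generate?"
# Join orders to customers, group by country, aggregate count and sum.
REVENUE_BY_COUNTRY = """
    SELECT c.country, COUNT(o.id) AS order_count, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.country
    ORDER BY revenue DESC
"""

revenue_rows = conn.execute(REVENUE_BY_COUNTRY).fetchall()
for country, order_count, revenue in revenue_rows:
    print(country, order_count, revenue)
```

The query text itself is the part that would carry over to a report; the surrounding Python only sets up sample data.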
Supported File Types
The following file types can be uploaded as Static Tables:
- CSV
- Excel
Support for additional file types, URLs, and Google Sheets as data sources is planned.
Supported SQL Databases
The following SQL databases are supported as data sources:
- PostgreSQL
- CrateDB
- Amazon Redshift
- MySQL / Amazon Aurora
- MariaDB
- MongoDB Atlas
- Google BigQuery
- Oracle
- Microsoft SQL Server
- Snowflake
- Vertica
- Amazon Athena
- Presto
- Trino
- Databricks
- ClickHouse
3 Connection Options
A datasource in Cluvio is used to configure access to your database, including the connection method, the database user credentials and any additional connection options. Cluvio supports the following 3 connection methods:
- Cluvio Agents, which you can install and run on your host or laptop/PC to provide easy and secure database connectivity, even to databases on your local computer, without requiring any inbound firewall rules for Cluvio. The agent only needs to be able to establish outbound connections both to Cluvio and to your internal database.
- SSH Tunnels, whereby you set up an SSH server with a publicly reachable IP address that Cluvio can connect to. Cluvio owns a secret key whose corresponding public key needs to be trusted by your SSH server. As with agents, the SSH tunnel host must have network access to the database you want to connect to. Additionally, the SSH server must allow inbound traffic from a set of static public IP addresses controlled by Cluvio.
- Direct connections, which can be used when your database is a "cloud-native" database such as BigQuery or has a publicly reachable IP address. In the latter case, analogous to a setup with an SSH server, the database server must allow inbound traffic from a set of static public IP addresses controlled by Cluvio.
Not all connection methods are available with all databases. The Cluvio UI will only offer those connection methods which are supported for the chosen database type.
Connection Security
If you use a Cluvio Agent or SSH tunnel to connect to a database that resides in a private network, your database connections are encrypted and secure by default. The same applies to direct connections to cloud databases like BigQuery, Athena, Snowflake and Databricks.
Using a direct connection to a self-hosted database is not recommended, as Cluvio does not offer full certificate verification with custom certificates used on these connections. As a result, your database connections, even if encrypted, may be subject to man-in-the-middle attacks. The recommended connection method for self-hosted databases, which includes services such as AWS RDS, is a Cluvio Agent running in a private network that has access to the database.
For added safety, unless you explicitly want your analysts to be able to change data via Cluvio, we also recommend restricting Cluvio to read-only access to avoid inadvertent changes to your data. Otherwise, admins and analysts in your organization can use Cluvio to execute any type of SQL query, not only SELECT statements.
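How read-only access is granted depends on the database. As a minimal sketch, assuming a PostgreSQL database and an already existing login role, the helper below generates the statements a database administrator might run to limit that role to SELECT. The function name, the role name `cluvio_readonly`, and the database name `analytics` are hypothetical; other databases use different syntax.

```python
import re
from typing import List

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def read_only_grants(role: str, database: str, schema: str = "public") -> List[str]:
    """Return PostgreSQL statements that limit an existing role to SELECT access."""
    for name in (role, database, schema):
        # Accept only plain identifiers so the names can be embedded safely.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return [
        f'GRANT CONNECT ON DATABASE "{database}" TO "{role}";',
        f'GRANT USAGE ON SCHEMA "{schema}" TO "{role}";',
        f'GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO "{role}";',
        # Also cover tables that the granting user creates in this schema later.
        f'ALTER DEFAULT PRIVILEGES IN SCHEMA "{schema}" '
        f'GRANT SELECT ON TABLES TO "{role}";',
    ]


# Hypothetical role and database names, for illustration only.
statements = read_only_grants("cluvio_readonly", "analytics")
for statement in statements:
    print(statement)
```

Because no INSERT, UPDATE, or DELETE privilege is granted, write statements issued with this role are rejected by the database itself, independently of what is typed into the report editor.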
Database Metadata
When you successfully configure a datasource, Cluvio queries the database's information schema to learn database metadata such as the tables and columns in the connected database. Cluvio also obtains table row counts. All of this information is made available in the almanac of the report editor to help you explore your data.
To keep the information in the almanac current, Cluvio refreshes the database metadata once per day. If the periodic queries for database metadata put unwanted load on your database, you can disable this behavior in the datasource configuration, at the cost of potentially working with stale schema information in Cluvio's report editor.
Even if you disable the periodic metadata updates, you can always trigger an explicit refresh of a datasource's database metadata from the context menu of the specific datasource on the datasource overview page.
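To give a feel for what such metadata queries look like, the sketch below collects table names, column names, and row counts from a SQLite database. This is an illustration only, not Cluvio's implementation: SQLite has no information schema, so `sqlite_master` and `PRAGMA table_info` stand in for it, and the `collect_metadata` function and the `events` table are hypothetical.

```python
import sqlite3


def collect_metadata(conn: sqlite3.Connection) -> dict:
    """Map each user table to its column names and its row count."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        # Column 1 of each table_info row is the column name.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        metadata[table] = {"columns": columns, "row_count": row_count}
    return metadata


# Hypothetical sample table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT);
    INSERT INTO events (kind) VALUES ('signup'), ('login');
""")
metadata = collect_metadata(conn)
print(metadata)
```

The `COUNT(*)` per table hints at why periodic metadata refreshes can add noticeable load on databases with many or very large tables.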
Query Result Data
The data in your database is only queried by Cluvio via the SQL queries that you use to define reports, dashboards, SQL alerts, and other Cluvio features.
How Cluvio treats your data and how it is kept secure is described in detail on our website: Security & Privacy.
Limits
When Cluvio runs a SQL query against your database, the result size is limited as follows:
- The number of rows is limited to 5,000 on the Free & Pro plans and to 10,000 on the Business plan.
- The result size is limited to 50 MB.
Note that these are constraints on query result data, not the size of your database tables. Analytical queries can run on tables with millions or billions of rows of data but should produce aggregated results that are practical to process and visualize in web browsers.
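The limits above can be expressed as a small check. The sketch below is not how Cluvio measures results: the JSON-encoded size is only a rough stand-in for the result size, 1 MB is assumed to mean 10^6 bytes, and the names `ROW_LIMITS`, `MAX_RESULT_BYTES`, and `fits_limits` are hypothetical.

```python
import json

# Row limits per plan, as listed above.
ROW_LIMITS = {"free": 5_000, "pro": 5_000, "business": 10_000}
# Assumption: 1 MB = 10^6 bytes.
MAX_RESULT_BYTES = 50 * 1000 * 1000


def fits_limits(rows, plan: str) -> bool:
    """Return True if a result stays within the row and size limits of a plan."""
    if len(rows) > ROW_LIMITS[plan]:
        return False
    # Assumption: use the JSON-encoded size as an approximation of result size.
    size_in_bytes = len(json.dumps(rows).encode("utf-8"))
    return size_in_bytes <= MAX_RESULT_BYTES


# 6,000 unaggregated rows exceed the Free and Pro row limit,
# but fit within the Business row limit.
raw_rows = [(i, i * 2) for i in range(6_000)]
print(fits_limits(raw_rows, "free"))
print(fits_limits(raw_rows, "business"))
```

In practice, a query that hits the row limit is usually a sign that the result should be grouped and aggregated further in SQL before it reaches the chart.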
There are some exceptions to these limits, such as when exporting results as "unlimited" CSV files, which is available on paid plans.
If you find the default plan limits to be insufficient, customer-specific limits can be offered. Please describe your requirements and use cases in an email to support@cluvio.com.
Smart Caching
When you create reports in Cluvio and your SQL queries are run, the results are cached by Cluvio and reused by all objects (reports & filters) with the same SQL query and parameters. On any dashboard, report or SQL-based filter, you can configure your requirement for "data freshness", i.e. the maximum age that is still considered "current" and can be served from the cache without re-running the SQL query. See Dashboard Data Caching.
This smart caching of results significantly reduces the load on your database or data warehouse, especially when many users are accessing your dashboards. It also makes the dashboards load much faster for your users. In this way, Cluvio can reduce the size and cost requirements on your own database.
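The caching idea described in this section can be sketched as a cache keyed by the SQL text and its parameters, where each caller states the maximum age it still accepts. This is a simplified illustration, not Cluvio's implementation; the `ResultCache` class, its `fetch` method, and the `fake_run_query` stand-in are hypothetical.

```python
import time


class ResultCache:
    """Cache query results by (SQL text, parameters) with a caller-defined max age."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query  # callable that actually hits the database
        self._clock = clock          # injectable clock, which simplifies testing
        self._entries = {}           # (sql, params) -> (stored_at, result)

    def fetch(self, sql: str, params: tuple, max_age_seconds: float):
        key = (sql, params)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= max_age_seconds:
            return entry[1]  # fresh enough: serve from the cache
        result = self._run_query(sql, params)  # stale or missing: re-run the query
        self._entries[key] = (now, result)
        return result


# Usage with a stand-in query function that counts how often it is called.
calls = []
fake_now = [0.0]


def fake_run_query(sql, params):
    calls.append((sql, params))
    return f"result #{len(calls)}"


cache = ResultCache(fake_run_query, clock=lambda: fake_now[0])
first = cache.fetch("SELECT 1", (), max_age_seconds=60)
fake_now[0] = 30.0
second = cache.fetch("SELECT 1", (), max_age_seconds=60)   # within 60 s: cached
fake_now[0] = 100.0
third = cache.fetch("SELECT 1", (), max_age_seconds=60)    # too old: re-run
other = cache.fetch("SELECT 1", (42,), max_age_seconds=60)  # different parameters
print(first, second, third, other, len(calls))
```

Two objects that share the same query and parameters hit the database once per freshness window, which is where the load reduction comes from.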