A data source in Cluvio is a configuration for accessing your database, including the connection method, the database user credentials and any additional options. Cluvio supports the following three connection methods:
- Cluvio Agents, which you can install and run on your host to provide easy and secure database connectivity without requiring any inbound firewall rules for Cluvio. The agent only needs to be able to establish outbound connections both to Cluvio and your internal database.
- SSH Tunnels, whereby you set up an SSH server with a publicly reachable IP address. Cluvio connects to it using a private key whose matching public key you add as a trusted key on your SSH server. As with agents, the SSH tunnel host must have network access to the database you want to connect to. Unlike agents, it must also allow inbound traffic from a set of public IP addresses controlled by Cluvio.
- Direct connections, which can be used when your database is a "cloud-native" database such as BigQuery or has a publicly reachable IP address.
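The trade-offs between the three methods can be sketched as a small decision helper. This is only an illustration of the reasoning above, not part of Cluvio: the `NetworkSetup` fields, the `suggest_connection_method` function and the order of preference (direct first, then agent, then SSH tunnel) are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectionMethod(Enum):
    AGENT = "agent"
    SSH_TUNNEL = "ssh_tunnel"
    DIRECT = "direct"


@dataclass
class NetworkSetup:
    # Hypothetical description of your environment, not a Cluvio setting.
    database_publicly_reachable: bool  # cloud-native database or public IP address
    inbound_ssh_allowed: bool          # an SSH server may accept traffic from Cluvio's IPs
    can_run_agent: bool                # a host can make outbound connections to Cluvio and the database


def suggest_connection_method(setup: NetworkSetup) -> ConnectionMethod:
    """Pick a method using an assumed preference order: direct, agent, SSH tunnel."""
    if setup.database_publicly_reachable:
        return ConnectionMethod.DIRECT
    if setup.can_run_agent:
        # Needs outbound connections only, so no inbound firewall rules.
        return ConnectionMethod.AGENT
    if setup.inbound_ssh_allowed:
        return ConnectionMethod.SSH_TUNNEL
    raise ValueError("no connection method fits this network setup")
```

For example, a private database on a network where you can run an extra process but cannot open inbound ports would map to `ConnectionMethod.AGENT` under these assumptions.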
The Security Whitepaper describes in detail how Cluvio treats your data and keeps it secure.
The following databases can be used as data sources:
- Amazon Redshift / Panoply
- PostgreSQL / Heroku Postgres / CrateDB
- MySQL / MariaDB / Amazon Aurora
- Google BigQuery
- Presto
- Snowflake
- Microsoft SQL Server
- Oracle Database
- MongoDB BI Connector
- Vertica
Not all connection methods are applicable to all database types. The UI only offers the connection methods that are available for the chosen database type.
When you successfully configure a data source, Cluvio queries some database metadata (e.g. version, timezone, maximum number of simultaneous connections), schema information (tables and columns of the selected database) and table row counts. This information gives you a convenient database schema overview in the Cluvio UI from which to begin exploring your data. The data source metadata is refreshed nightly, so the schema information stays up to date even when you change the remote database tables or columns outside of Cluvio (which is the common case, as Cluvio access to your database should be constrained to a read-only user and schema).
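To make the kind of metadata involved concrete, here is a minimal sketch that collects table names, column names and row counts. It uses Python's built-in `sqlite3` module as a stand-in database. The function names and the queries are assumptions for illustration and do not reflect how Cluvio itself gathers metadata.

```python
import sqlite3


def collect_schema_overview(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [...], "row_count": n}} for every user table."""
    overview = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names cannot break the statement.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        overview[table] = {"columns": columns, "row_count": row_count}
    return overview


def demo() -> dict:
    """Build a tiny in-memory database and return its schema overview."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,)])
    return collect_schema_overview(conn)
```

Only metadata and counts are read here; no row contents are returned, which mirrors the distinction between schema information and actual data.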
Cluvio only queries actual data through the SQL that you use to define reports, dashboards, SQL alerts, etc. Query result data is cached for approximately 24 hours. This improves load times and lets dashboards scale to many viewers without overloading your database with redundant queries.
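The effect of such a cache can be sketched with a simple time-to-live (TTL) cache keyed by the SQL text. This is a hypothetical illustration, not Cluvio's implementation: the `QueryResultCache` class, the exact 24-hour constant and the use of the SQL string as the cache key are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # assumed stand-in for "approximately 24 hours"


class QueryResultCache:
    """Cache query results per SQL string and re-run a query once its entry expires."""

    def __init__(
        self,
        run_query: Callable[[str], Any],
        ttl_seconds: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def fetch(self, sql: str) -> Any:
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: the database is not queried again
        result = self._run_query(sql)
        self._entries[sql] = (now, result)
        return result
```

With a cache like this, many viewers opening the same dashboard within the TTL cause a single database query per distinct SQL statement rather than one per viewer.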
For further information on configuring and securing database access with Cluvio, see also:
- Securing your database connection
- Connecting through an SSH tunnel
- Creating a read-only user in the database
- Connecting to a local or test database