Note: This document does not apply to Google BigQuery or AWS Athena.
Exposing a production database (or any database with real data) to the internet is rarely a good idea. Apart from direct TLS connections to your database, there are two main approaches to keeping your database access secure:
- Running a Cluvio Agent in front of your database. The agent establishes outbound connections to Cluvio, so you do not need to permit inbound traffic from Cluvio server IP addresses. It also provides an encrypted channel for all query executions and result data between the agent machine and Cluvio servers. See Using Cluvio Agents for details.
- Running an SSH server in front of your database and connecting your Cluvio data source to your database through an SSH tunnel. Restricting access on your SSH server to a specific user and Cluvio's SSH public key ensures that only Cluvio servers, which hold the matching private key, can access your database through the tunnel as that user. See Connecting through an SSH tunnel for details.
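As an illustration of the SSH restriction described above, the following minimal sketch builds a single `authorized_keys` line that permits only port forwarding to the database. It assumes an OpenSSH server (version 7.2 or later, for the `restrict` option) and that the tunnel uses local forwarding. The function name, host name, and key text are hypothetical placeholders, not values supplied by Cluvio.

```python
import re


def authorized_keys_entry(public_key: str, db_host: str, db_port: int) -> str:
    """Build one authorized_keys line that only allows forwarding to the database.

    Hypothetical helper: the options used are standard OpenSSH options,
    but whether they suit your setup is an assumption to verify.
    """
    if not 0 < db_port < 65536:
        raise ValueError(f"invalid port: {db_port}")
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", db_host):
        raise ValueError(f"invalid host: {db_host!r}")
    key = public_key.strip()
    if not key or "\n" in key:
        raise ValueError("public key must be a single non-empty line")
    # restrict        -> disable shell, PTY, agent and X11 forwarding, etc.
    # port-forwarding -> re-enable forwarding only
    # permitopen      -> limit forwarding to the database host and port
    options = f'restrict,port-forwarding,permitopen="{db_host}:{db_port}"'
    return f"{options} {key}"


if __name__ == "__main__":
    # Placeholder key text; use the public key shown in your Cluvio settings.
    print(authorized_keys_entry("ssh-ed25519 AAAAEXAMPLE cluvio", "db.internal", 5432))
```

The resulting line would go into the `~/.ssh/authorized_keys` file of the dedicated tunnel user on the SSH server.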
When using direct connections or connections through an SSH tunnel for a Cluvio data source, your firewall must permit inbound traffic from a particular set of Cluvio IP addresses. For any particular query execution, our query executors connect via one of the following IP addresses, depending on the location of your Cluvio account (found in the Admin Settings).
If you run on AWS, you can add the applicable three IPs to your Security Group inbound rules that guard access to the database (e.g. RDS, Redshift, or your own EC2 instance).
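If you manage Security Groups from a script, the inbound rule can be sketched as follows. This example only builds the rule structure and does not call AWS. The helper name is hypothetical, and the addresses are placeholders from a documentation-only range, not Cluvio's actual IPs; substitute the addresses that apply to your account location and the port your database listens on.

```python
import ipaddress


def build_ingress_permissions(ips, port, description="Cluvio query executor"):
    """Return an IpPermissions list allowing TCP traffic on `port` from each IP.

    Hypothetical helper. The returned structure follows the shape that
    boto3's EC2 client accepts, e.g.
    ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=...).
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    ranges = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
        if addr.version != 4:
            raise ValueError(f"expected an IPv4 address: {ip}")
        # /32 restricts the rule to exactly this one address
        ranges.append({"CidrIp": f"{addr}/32", "Description": description})
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": ranges,
    }]


if __name__ == "__main__":
    # Placeholder addresses (TEST-NET-3), NOT real Cluvio IPs.
    placeholder_ips = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
    print(build_ingress_permissions(placeholder_ips, 5432))
```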
Additionally, in most cases you should create a read-only database user for Cluvio to avoid inadvertent changes to your data.
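What a read-only user looks like depends on your database. As a rough sketch, assuming PostgreSQL, the helper below generates the statements an administrator might run. The function name and the role, database, and schema names are hypothetical; other databases use different syntax.

```python
import re

# Accept only simple lowercase identifiers so names can be embedded safely.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def readonly_grants(role: str, database: str, schema: str = "public") -> list:
    """Return PostgreSQL statements that set up a SELECT-only login role."""
    for name in (role, database, schema):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        # Set the password separately (e.g. with \password in psql).
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Covers tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
        f"GRANT SELECT ON TABLES TO {role};",
    ]


if __name__ == "__main__":
    for statement in readonly_grants("cluvio_readonly", "analytics"):
        print(statement)
```

Review the generated statements before running them, and repeat the schema-level grants for each schema Cluvio should be able to query.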