Skip to main content

Connect to Google BigQuery

Connection Modes: Direct

Connection

To connect to Google BigQuery, select Add Datasource on the datasources overview page to open the Create Datasource dialog and select Google BigQuery as the database type.

image-600 image-600

The Name is a required human-friendly identifier or description of the datasource in Cluvio. Datasource names need not be unique but we recommend to give each datasource a unique and meaningful name for ease of identification, especially if your organization uses multiple datasources.

The Project ID is the identifier for your Google Cloud project that uses BigQuery. See Locate the project ID in the Google Cloud documentation.

The optional Public Project IDs allows adding projects with public datasets to the datasource configuration. The public datasets from these projects can then be included in the almanac in addition to those available through the configured Project ID. For example, add bigquery-public-data to include one or more of the BigQuery public datasets in the almanac schema configuration.

Select Upload Service Key to upload a Service Account Key in JSON format that has permissions to access Google BigQuery in the Google Cloud project specified by Project ID. The service key JSON contains a private key that is securely stored by Cluvio and only used to authenticate your Cluvio connection to Google BigQuery. See Create Service Key for details on how to obtain a service key.

When you have entered all the required information, select Test Connection to check that Cluvio can connect to your database. The connection test will report errors if the connection fails. See Troubleshooting for common problems.

Create Service Key

A service key in the Google Cloud is associated with a service account. You can use an existing service account or create a new service account specifically for Cluvio.

image-200 image-200

From the Actions drop-down menu for the Cluvio service account, select Manage Keys.

image-200 image-200

Create a new service key for Cluvio by selecting Add key > Create new key.

image-200 image-200

Choose JSON as the key type and click Create.

image-500 image-500

The JSON file is automatically downloaded by your web browser. Copy the contents of this file into the Service Key (JSON) field of your datasource configuration.

Configuration

The Configuration tab of the datasource dialog shows settings that affect the datasource's behavior.

image-600 image-600

The Data Time Zone defaults to UTC and is the time zone that Cluvio assumes for any timestamps returned from queries that do not contain time zone information. See Data Time Zone for details.

The Maximum number of concurrent query executions control the maximum concurrency that Cluvio allows for the datasource. This setting can be used to control the maximum load on your database. The default is 20.

The Included Schemas configure the projects or individual datasets that are included in the almanac. The default selection of All includes all current and future datasets of all accessible projects in the almanac. A selection of Projects includes all current and future datasets of the selected projects, whereas a selection of Schemas includes only the selected datasets in the almanac.

Multiple Projects

To be able to include schemas for datasets of multiple projects in the Cluvio almanac, and thus query datasets across different projects, add the service key used on your Cluvio datasource as a principal with at least the BigQuery Data Viewer role to your other projects in the Google Cloud Admin Console IAM section of each project.

The toggle Update schema nightly controls whether Cluvio queries your database schema nightly to ensure that the almanac in the report editor has up-to-date information on your database schema. Together with fetching schema information, Cluvio also tries to retrieve approximate row counts in each table. If you disable nightly schema updates, the almanac is only updated when you manually trigger a schema refresh on the datasource from the Cluvio datasources overview. Nightly schema updates are enabled by default.

The toggle Update exact row counts is only available when Update schema nightly is enabled. This setting controls whether Cluvio will determine exact row counts for every table in your database schema. Row counts are shown in the report editor almanac. Determining exact row counts usually involves issuing a COUNT(*) query on each table, which may cause undesirable load on your database. You can disable this setting to avoid these nightly queries. When disabled, the tables in the report editor almanac may not show row count information if the database does not provide approximate row counts.

Troubleshooting

If you need help connecting to your Google BigQuery data, please contact support@cluvio.com.