Connect to Google BigQuery
Connection Modes: Direct
Connection
To connect to Google BigQuery, select Add Datasource
on the datasources
overview page to open the Create Datasource
dialog and select
Google BigQuery
as the database type.
The Name
is a required human-friendly identifier or description of the
datasource in Cluvio. Datasource names need not be unique but we recommend
to give each datasource a unique and meaningful name for ease of identification,
especially if your organization uses multiple datasources.
The Project ID
is the identifier for your Google Cloud project that uses
BigQuery. See Locate the project
ID in the Google
Cloud documentation.
The optional Public Project IDs
allows adding projects with public datasets to
the datasource configuration. The public datasets from these projects can then
be included in the almanac in addition to those available through the configured
Project ID
. For example, add bigquery-public-data
to include one
or more of the BigQuery public
datasets in the almanac schema
configuration.
Select Upload Service Key
to upload a Service Account
Key in JSON
format that has permissions to access Google BigQuery in the Google Cloud
project specified by Project ID
. The service key JSON contains a private key
that is securely stored by Cluvio and only used to authenticate your Cluvio
connection to Google BigQuery. See Create Service
Key for details on how to obtain a service key.
When you have entered all the required information, select Test Connection
to
check that Cluvio can connect to your database. The connection test will report
errors if the connection fails. See Troubleshooting for
common problems.
Create Service Key
A service key in the Google Cloud is associated with a service account. You can use an existing service account or create a new service account specifically for Cluvio.
From the Actions
drop-down menu for the Cluvio service account, select Manage Keys
.
Create a new service key for Cluvio by selecting Add key > Create new key
.
Choose JSON
as the key type and click Create
.
The JSON file is automatically downloaded by your web browser. Copy the contents
of this file into the Service Key (JSON)
field of your datasource
configuration.
Configuration
The Configuration
tab of the datasource dialog shows settings that
affect the datasource's behavior.
The Data Time Zone
defaults to UTC
and is the time zone that Cluvio assumes
for any timestamps returned from queries that do not contain time zone
information. See Data Time Zone for details.
The Maximum number of concurrent query executions
control the maximum
concurrency that Cluvio allows for the datasource. This setting can be used to
control the maximum load on your database. The default is 20
.
The Included Schemas
configure the projects or individual datasets that are included in the
almanac. The default selection of All
includes all
current and future datasets of all accessible projects in the almanac. A
selection of Projects
includes all current and future datasets of the selected
projects, whereas a selection of Schemas
includes only the selected datasets in the almanac.
To be able to include schemas for datasets of multiple projects in the Cluvio
almanac, and thus query datasets across different projects, add the service key
used on your Cluvio datasource as a principal with at least the BigQuery Data Viewer
role to your other projects in the Google Cloud Admin
Console IAM
section of each project.
The toggle Update schema nightly
controls whether Cluvio queries your database
schema nightly to ensure that the almanac in the
report editor has up-to-date information on your database schema. Together with
fetching schema information, Cluvio also tries to retrieve approximate row
counts in each table. If you disable nightly schema updates, the almanac is
only updated when you manually trigger a schema refresh on the datasource from
the Cluvio datasources overview. Nightly schema updates are enabled by default.
The toggle Update exact row counts
is only available when Update schema nightly
is enabled. This setting controls whether Cluvio will determine exact
row counts for every table in your database schema. Row counts are shown in the
report editor almanac. Determining exact row counts
usually involves issuing a COUNT(*)
query on each table, which may cause
undesirable load on your database. You can disable this setting to avoid these
nightly queries. When disabled, the tables in the report editor almanac may not
show row count information if the database does not provide approximate row
counts.
Troubleshooting
If you need help connecting to your Google BigQuery data, please contact support@cluvio.com
.