Time Zones
Overview
Filtering or aggregating data by time is one of the most common use cases of data analytics, but it is also one of the most complex tasks to do correctly. Most of this complexity is owed to the existence of time zones and different approaches to storing, processing and visualizing timestamps.
Date & time information always comes with an explicit or assumed time zone and that time zone needs to be taken into account
to get correct results - 1st of June 2023 at 13:00:00 in Europe/Berlin
is a different point in time than 1st
of June 2023 13:00:00 in America/Los_Angeles
.
Additional complexity comes from the concept of daylight saving time. Especially when working with historical data, if a time zone has daylight saving time, it needs to be taken into account to interpret the historical timestamps correctly.
In general, there are 3 key aspects to producing correct results:
- When filtering by a timestamp, is the filter input comparable to the timestamps in the database?
- When aggregating by a time unit (e.g. "number of orders by day"), is each timestamp attributed to the correct day?
- When displaying a timestamp (e.g.
11/7/2023, 2:42pm
), is it in the time zone that the user is expecting?
Example Problem
To illustrate the problem in more detail, consider an excerpt from a list of shop orders:
To aggregate orders by day, each order must be correctly attributed to a particular date (year, month and day). Since day boundaries are regional to a time zone, the order totals aggregated by day can be very different in different time zones, for the same underlying data set.
If we interpret the order timestamps to be in UTC and compute the order totals in
UTC
, we get the following for the above excerpt.
In contrast, if we assume the order timestamps to be in UTC and compute the
order totals for Europe/Berlin
, we get the following for the same excerpt.
Similar considerations apply to the filtering of data with relative date-time ranges (e.g. "Yesterday"), the sending of scheduled e-mails and more.
Writing SQL queries with correct time zone conversions can be complicated, not least because of database-specific SQL syntax for time zone conversions. Fortunately, Cluvio's second-generation time zone support comes with extensive and easy-to-use functionality for working with time zones, hiding most of the complexity behind a few configuration options.
Working with Time Zones
The good news is that with a bit of configuration, working with time zones can be easy and fun! The following steps cover the fundamentals of working with time zones in Cluvio.
-
Configure the data time zone for your datasources. Settings a data time zone allows Cluvio to correctly interpret values of type
TIMESTAMP WITHOUT TIME ZONE
,TIMESTAMP_NTZ
,DATETIME
or another database-specific type which has no time zone information. A data time zone must be set before any other of Cluvio's time zone functionality can be used. -
Configure a dashboard time zone for dashboards that should use a specific or user-selectable time zone for filtering, aggregation and visualization of timestamps.
-
Use the built-in aggregation or a custom date-time-aggregation transform to correctly aggregate timestamps in the dashboard time zone:
SELECT {created_at:aggregation}, COUNT(1)
FROM orders
GROUP BY 1 -
Use the built-in timerange or a custom date-time-range filter to compare timestamps in the dashboard time zone:
SELECT SUM(price)
FROM orders
WHERE {created_at=timerange}
Time Zone Configuration
Cluvio's support for time zones comes in three layers:
-
The organization time zone is used for the sending of alerts and e-mail schedules, as well as a dashboard's refresh anchor time. Importantly, the organization time zone does not, on its own, have any effect on the data shown in reports, but it can be configured as a dashboard's time zone (see below).
-
A data time zone, configured for a datasource, defines the time zone of data queried from that source and is applied when processing query results unless a specific time zone is set as the dashboard time zone (see below). You should always configure a data time zone for a datasource that matches the database time zone or the assumed time zone of the timestamp values in the database tables. For further details, see Data Time Zone.
-
A dashboard time zone is configured on a dashboard and defines the final time zone for all reports, filters and result data on that dashboard - SQL queries will use database-specific time zone conversions when aggregating and filtering timestamps. The dashboard time zone can be set as the organization time zone, a fixed time zone, or made user-selectable. For further details, see Dashboard Time Zone.
To use a Cluvio data time zone or dashboard time zone other than UTC with a MySQL datasource, the time-zone tables must be initialized. See Populating the Time Zone Tables in the MySQL reference documentation.
Common Configurations
The following sections describe Cluvio time zone configurations for common use-cases.
Data in a local time zone
With a small business that operates only in a single time zone, it is common that date & time information is stored in the database without a time zone. In such an environment, it is understood that the date & time information always refers to the "local" time zone of the business. In this situation, we recommend to only set the data time zone on the datasource to the business's local time zone.
Data in UTC
By far the most common setup for date-time values is to store them in UTC in the
database and display them in a specific "local" time zone, which is either fixed
or user selectable. This is ideal for companies operating globally, where users
expect to see dashboards in their local time zones. For this case, we recommend
to set the data time zone on the datasource to
UTC
and to configure the Dashboard Time Zone for your
dashboards, where you can decide separately for each dashboard whether to use a
specific fixed time zone, the organization time zone, or a user-selectable time
zone.