Overview
Histogram chart displays a distribution of the occurrence of values within the range between the minimum and maximum of the values. An example can be the population distribution by age or distribution of the population by height. The range between the minimum and the maximum value is split into a certain number of ranges (bins), and for each bin, the corresponding column (x-axis) displays the number of values (y-axis) that fall into that bin.
Chart modes
There are 2 basic modes determining how the Histogram chart gets data:
1. Raw values - The SQL query in this case produces the individual occurrences of the value and the chart displays the distribution of those. The total number of values occurring will be limited by the maximum number of rows the SQL query can produce in Cluvio (5.000 or 10.000 depending on the plan).
2. Pre-aggregated - The SQL query produces pre-aggregated counts of occurrences of values. The big benefit of this case is that you can work with a very large number of underlying values (millions or more) - e.g. imagine doing an age distribution of all Facebook users, where there is a very large number of the individual user values, but the magnitude of the age distribution is low (e.g. whole years of age being 0-116, i.e. max 116 unique values that would occur in the distribution).
And 2 modes the Histogram chart defines the bins / ranges
1. Number of bins - the bins are automatically determined as ranges in the values, based on the desired number of bins. Needs a single numeric column with the values.
2. Bin width - the bins are automatically determined as ranges in the values, based on the desired width of a bin. Needs a single numeric column with the values.
Data Requirements
For the pre-aggregated mode, the histogram needs 2 columns in the results, one (any displayable data type) that defines the bin names and the second (numerical) that contains the values.
For the raw data mode, the histogram needs a numerical column in the results (any numerical type produced by DB-specific SQL: typically INT, FLOAT or BIGINT). Any number of additional columns can be present, but the data will not be used in any way.
Example queries:
Raw:
-- Get user ages
SELECT age FROM users;
-- Get airport elevations:
SELECT elevation FROM airports;
Pre-aggregated:
-- Get users by age in years:
SELECT age, COUNT(1) FROM users
GROUP BY age;
-- Get airports by elevation with 1m resolution:
SELECT elevation, COUNT(1) FROM airports
GROUP BY elevation;
-- Get airports by elevation with 10m resolution:
SELECT 10 * (elevation / 10), COUNT(1) from airports
GROUP BY 1;
Number of bins and Bin width modes:
Distribution of numbers of airports in countries (how many countries in the world have a certain number of airports). The country value is ignored, but kept in the query so the CSV/Excel export has the underlying data when needed:
SELECT country, COUNT(1) FROM airports GROUP BY country;
Distribution of the number of users by their height (cm/feet), i.e. how many users would fall into a certain height range. Note that the maximum number of values the SQL produces may be easily hit. In case this is needed for a large number of rows, pre-binning in SQL may be a better approach.
SELECT height FROM users;
Data Mapping Options
Values
Frequencies
Core Options
Binning: Specify how the bins (ranges) in the chart are determined: With "Number of bins" you decide how many bins there should be, with "Bin Width" you specify the desired range width of the bins. The maximum number of bins in both cases is 100. If the selected value would cause a higher number of bins, it is recalculated to value corresponding to 100 bins.
Number of bins: Specify a value between 2 and 100 for the desired number of bins
Smart bins: When turned on, determines the number of bins dynamically, so the ranges are "human", i.e. multiples of 2,5,10. The resulting number of bins will be close to the desired number but may differ.
Axes Options: allows setting custom titles for X and Y axes
Labels: show the number on each bar, select position for the labels
Color: select a color for the bars: automatic, specific color from the current scheme or manually assigned color
Visual Options
S/M/L/XL: Font sizes for any texts displayed on the chart
Vertical / Horizontal: The orientation of the columns/bars
Tilt X-axis: Tilt font of the x-axis 45 degreed, which can help fitting lots of very long labels
Tooltip: The tooltip over the bin will display the bin range and the corresponding number of values and a percentage that each value represents from the total.