Cluvio allows to not just run SQL and visualise the results. You can also use the full power of R by running a custom R script on top of the SQL results.
This is for instance how easy it is to add running total column to the SQL results:
SQL:
SELECT {saletime:week}, SUM(pricepaid) as "revenue" FROM sales GROUP BY 1 ORDER BY 1
R:
sql.data %>% mutate(running_total=cumsum(revenue))
The SQL results are available to the R script in a 'sql.data' variable (as a data.frame).
At the moment, the following R libraries are available within the R script. Please let us know at hi@cluvio.com in case you'd like to use a package that is not currently supported.
- R (3.4.2) - most standard R functions
- reshape2 (1.4.2)
- plyr (1.8.4)
- dplyr (0.8.3)
- tidyr (1.0.0)
- purrr (0.2.4)
- tibble (2.1.3)
- broom (0.4.0)
- stringr (1.2.0)
- lubridate (1.6.0)
- hms (0.3)
- magrittr (1.5)
- glue (1.1.1)
- zoo (1.8-3)
- DataCombine (0.2.21)
- googlesheets4
System and I/O functions are not available, and generally you should only expect to be able to operate on the data available as an input to provide data as output.
The results of the R script are the last command, this is what gets passed to the visualisation as a data table. Supported R types are:
- primitive types, date, timestamp
- vectors of these (will appear as multiple rows of a single column)
- data.frame
You can name columns in the output as usual in R via the 'names' attribute, e.g. the following will produce a column named "Metric" with 2 rows of data:
res <- c(42, 84) names(res) <- "Metric" res
Important: Restrictions
The R script runs within a constrained environment with the following restrictions:
1. the script has to finish within max 1 minute (ping us if you need this limit raised)
2. the script can use max 100MB of RAM
3. most system functions and functions accessing local filesystem are not available - the script is meant to process the SQL output applying R functions on it, not to interact with other systems or user
4. at the moment the output of the R script has to be data (as stated above); there is not much use for generating visuals using R (e.g. using ggplot or shiny) in Cluvio, as the visualization step we apply is the same as with plain SQL results.