Microsoft bought Revolution Analytics for its enterprise-level R offering, and the technology was swiftly integrated into the latest SQL Server. By ‘swiftly’ I mean that this is version 1 and it shows. As with any acquisition, absorption takes time, and things will probably improve over future releases. In any case, the opportunities are intriguing and I was eager to give it a go.
The R setup is an optional component on top of the database engine, and although the install itself was flawless, the process could be more streamlined. Various packages have to be downloaded from Revolution Analytics, and since they have several offerings it takes a while to figure out which version you actually need. It’s also clear that the branding is not yet ‘Microsoft’ everywhere; there is still a lot of ‘Revolution Analytics’.
In case you get lost, here is a summary of the setup and the links you need:
- the easiest way to play with the whole lot is via the Azure VM image. The additional analytics setup requires the CD/setup files, which you can find under “C:\SQLServer_13.0_Full” in the supplied VM image.
- some details about the screen flow during setup can be found here
- do NOT install the open-source RRO, but rather the RRO for RRE version, which can be downloaded here
- install the Enterprise packages via this setup
- the SQL Server 2016 samples containing analytics-specific scripts
Execute the following couple of lines in SSMS (the reconfigure must be a separate statement):

exec sp_configure 'external scripts enabled', 1;
reconfigure;
Pro tip: install Google Chrome on the Azure VM rather than using the cursed Internet Explorer. It will save you time and tremendous frustration, trust me.
There are post-install steps, described in the article “Post-Installation Server Configuration (SQL Server R Services)”. In particular, it is important to run the commands it lists from an elevated CMD.
Once all this is in place you can use R inside and outside SQL Server. By ‘outside’ I mean that the RRO setup also installs a basic IDE for developing R scripts on their own.
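To check that the integration actually works, a minimal ‘hello world’ script can be run from SSMS; this is a sketch (the column name is arbitrary), but if it returns a one-row result set the R runtime is wired up correctly:

```sql
-- Minimal smoke test for the R integration: return a one-row data frame from R
execute sp_execute_external_script
    @language = N'R'
  , @script = N'OutputDataSet <- data.frame(Greeting = "hello from R");'
  , @input_data_1 = N''
with result sets (("Greeting" varchar(50)));
```

If this fails, the usual suspect is the Launchpad service described in the post-installation article above.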
If you want to develop anything beyond the hello-world level, you had better install RStudio or another IDE (the StatET Eclipse plugin from WalWare comes to mind).
Now, what can you do with this new runtime?
You can access data from R packages and combine it with SQL data. For example, the often-used iris data set in R can be fetched like so:
execute sp_execute_external_script
    @language = N'R'
  , @script = N'iris_data <- iris;'
  , @input_data_1 = N''
  , @output_data_1_name = N'iris_data'
with result sets ((
    "Sepal.Length" float not null,
    "Sepal.Width"  float not null,
    "Petal.Length" float not null,
    "Petal.Width"  float not null,
    "Species" varchar(100)
));
You can use some machine learning to predict things:
declare @model varchar(50) = 'naive bayes'; -- assumes a model was stored under this name
declare @nb_model varbinary(max) = (select model from iris_models where model_name = @model);

-- Predict species based on the specified model:
exec sp_execute_external_script
    @language = N'R'
  , @script = N'
      library("e1071");
      irismodel <- unserialize(nb_model);
      species <- predict(irismodel, iris_data[, 2:5]);
      OutputDataSet <- data.frame(id = iris_data$id,
                                  Species.Actual = as.character(species),
                                  Species.Expected = as.character(iris_data$Species));
      OutputDataSet <- subset(OutputDataSet, Species.Actual != Species.Expected);'
  , @input_data_1 = N'select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_data'
  , @input_data_1_name = N'iris_data'
  , @params = N'@nb_model varbinary(max)'
  , @nb_model = @nb_model
with result sets ((
    "id" int,
    "Species.Actual" varchar(max),
    "Species.Expected" varchar(max)
));
Note the peculiar parameter marshaling and the serialize/unserialize calls, which are specific to this integration. It’s clear that developing scripts within SSMS is probably not the way to go, and that even once you have developed things (in, say, RStudio) there are still some gluing steps to integrate them into SQL.
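For completeness, the counterpart step, i.e. training a model and serializing it into the iris_models table that the prediction script reads from, could look roughly like this. It is a sketch: the table layout and the ‘naive bayes’ model name are my assumptions, and it presumes the iris data has already been loaded into an iris_data table.

```sql
-- Hypothetical storage table for serialized R models
create table iris_models (model_name varchar(50) primary key, model varbinary(max));

declare @model varbinary(max);

-- Train a naive Bayes classifier in R and hand the serialized model back as an OUTPUT parameter
execute sp_execute_external_script
    @language = N'R'
  , @script = N'
      library("e1071");
      irismodel <- naiveBayes(Species ~ ., data = iris_data);
      model <- serialize(irismodel, NULL);'
  , @input_data_1 = N'select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_data'
  , @input_data_1_name = N'iris_data'
  , @params = N'@model varbinary(max) OUTPUT'
  , @model = @model OUTPUT;

insert into iris_models (model_name, model) values ('naive bayes', @model);
```

The serialize-to-varbinary round trip is the glue that lets a model trained in one session be reused by any later prediction call.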
You can use any package inside your scripts. This offers a particularly convenient way to wrap custom code and minimize the code inside the stored procedures. Creating packages is actually easy and fun to do. As a test I made a little package called ‘SwaR’ with a ‘gimme’ function returning some random numbers. This experiment also shows that generating plots/images as part of R sprocs is possible:
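The ‘gimme’ function itself is trivial; reconstructed from the description (this is not the actual package source), it would be something along these lines:

```r
# Hypothetical body of SwaR::gimme - returns n random numbers,
# suitable for plotting as a line with plot(gimme(100), t = "l")
gimme <- function(n) {
  rnorm(n)
}
```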
execute sp_execute_external_script
    @language = N'R'
  , @script = N'
      library("SwaR");
      image_file <- "c:/temp/swar.jpg";
      jpeg(filename = image_file, width = 600, height = 800);
      plot(gimme(100), t = "l");
      dev.off();
      OutputDataSet <- data.frame(data = readBin(image_file, what = raw(), n = 1e6));'
  , @input_data_1 = N''
with result sets ((plot varbinary(max)));
Data-type marshaling is somewhat constrained for now. For instance, you cannot return vectors from R to SQL; only data frames are allowed:
execute sp_execute_external_script
    @language = N'R'
  , @script = N'stuff <- data.frame(Numbers = rnorm(12));'
  , @input_data_1 = N''
  , @output_data_1_name = N'stuff'
with result sets (("Numbers" float not null));
What would be particularly interesting is to see how time series data/analysis can be integrated with this.