Microsoft bought Revolution Analytics for its enterprise-grade R portfolio, and the technology was swiftly integrated into the latest SQL Server. By ‘swiftly’ I mean that this is version 1 and that it shows. As with any acquisition, absorption takes time and things will probably improve over future releases. In any case, the opportunities are intriguing and I was eager to give it a go.

Setup

Installing R is an optional component on top of the database engine, and although the setup ran without errors, it could have been more streamlined. Various packages have to be downloaded from Revolution Analytics, and since they have several offerings it takes a while to figure out which version you actually need. It’s also clear that the rebranding isn’t finished: a lot of it still says ‘Revolution Analytics’ rather than ‘Microsoft’.

 

In case you get lost, here is a summary of the setup and the links you need:

Execute the following couple of lines in SSMS:

Exec sp_configure 'external scripts enabled', 1
RECONFIGURE
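Once that is done (the instance typically needs a restart before the setting takes effect), a minimal hello-world call is a quick way to verify the R runtime is wired up. This is just a sanity-check sketch using the same `sp_execute_external_script` pattern shown later in this post:

```sql
-- Sanity check: if this returns a single row, R integration is working.
-- (A restart of the SQL Server service may be needed after RECONFIGURE.)
execute sp_execute_external_script
      @language = N'R'
    , @script = N'OutputDataSet <- data.frame(greeting = "Hello from R");'
    , @input_data_1 = N''
    with result sets (("greeting" varchar(50)));
```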

Pro tip: save yourself tremendous frustration and install Google Chrome on the Azure VM rather than using the cursed Internet Explorer. It will save you time, trust me.

There are post-install steps, described in the article “Post-Installation Server Configuration (SQL Server R Services)”. In particular, it is important to execute the following in an elevated CMD:

%programfiles%\RRO\RRO-3.2.2-for-RRE-7.5.0\R-3.2.2\library\RevoScaleR\rxLibs\x64\RegisterRExt.exe /install

Once all this is in place you can use R inside and outside SQL Server. By ‘outside’ I mean that the RRO setup also installs a basic IDE to develop R scripts on their own.
If you want to develop anything beyond the hello-world level, you’d better install RStudio or another IDE (the StatET Eclipse plug-in from WalWare comes to mind).

Experimenting

Now, what can you do with this new runtime?

You can access data from R packages and combine it with SQL data. For example, the often-used iris dataset in R can be fetched like so:

execute sp_execute_external_script
@language = N'R'
, @script = N'iris_data <- iris;'
, @input_data_1 = N''
, @output_data_1_name = N'iris_data'
with result sets (("Sepal.Length" float not null, "Sepal.Width" float not null
, "Petal.Length" float not null, "Petal.Width" float not null, "Species" varchar(100)));

You can use some machine learning to predict things:

declare @nb_model varbinary(max) = (select model from iris_models where model_name = @model);
-- Predict species based on the specified model
-- (@model is assumed to be a parameter of the surrounding procedure):
exec sp_execute_external_script
      @language = N'R'
    , @script = N'
        library("e1071");
        irismodel <- unserialize(nb_model);
        species <- predict(irismodel, iris_data[, 2:5]);
        OutputDataSet <- cbind(iris_data[1], species, iris_data[6]);
        colnames(OutputDataSet) <- c("id", "Species.Actual", "Species.Expected");
        OutputDataSet <- subset(OutputDataSet, Species.Actual != Species.Expected);'
    , @input_data_1 = N'
        select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
        from iris_data'
    , @input_data_1_name = N'iris_data'
    , @params = N'@nb_model varbinary(max)'
    , @nb_model = @nb_model
    with result sets (("id" int, "Species.Actual" varchar(max), "Species.Expected" varchar(max)));

Note the peculiar parameter marshaling and the serialize/unserialize round-trip this integration requires to move a model through a varbinary(max). It’s clear that developing scripts within SSMS is probably not the way to go, and that even once you have developed things (in, say, RStudio) there are still some gluing steps to integrate them into SQL.
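The training side of that round-trip is not shown above. A sketch of how a model might land in the iris_models table (the table layout and names here are assumptions, adapt to your schema) could look like this, using an OUTPUT parameter to get the serialized bytes back out of R:

```sql
-- Hypothetical training step: fit a naive Bayes model in R, serialize it,
-- and store the bytes in the iris_models table used by the prediction sproc.
declare @model varbinary(max);
execute sp_execute_external_script
      @language = N'R'
    , @script = N'
        library("e1071");
        irismodel <- naiveBayes(Species ~ ., data = iris_data);
        trained_model <- as.raw(serialize(irismodel, connection = NULL));'
    , @input_data_1 = N'
        select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
        from iris_data'
    , @input_data_1_name = N'iris_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @model OUTPUT;
insert into iris_models (model_name, model) values (N'Naive Bayes', @model);
```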

You can use any package inside your scripts. This offers a particularly convenient way to wrap custom code and minimize the amount of code inside the stored procedures. Creating packages is actually easy and fun to do. As a test I made a little package called ‘SwaR’ with a ‘gimme’ function returning some random numbers. This experiment also shows that generating plots/images as part of R sprocs is possible:

execute sp_execute_external_script
      @language = N'R'
    , @script = N'
         library("SwaR");
         image_file ="c:/temp/swar.jpg";
         jpeg(filename = image_file, width=600, height = 800);
         print(plot(gimme(100), t="l"));
         dev.off();
         OutputDataSet = data.frame(data=readBin(file(image_file, "rb"), what=raw(), n=1e6));
        '
    , @input_data_1 = N''
    with result sets ((plot varbinary(max)));
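For reference, ‘gimme’ is just my toy function, so its body below is an assumption; any numeric-returning function would do, but a random walk produces data that plots nicely as a line:

```r
# Hypothetical body of SwaR::gimme -- returns n "random numbers"
# as a cumulative random walk, which looks good with t = "l".
gimme <- function(n = 10) {
  cumsum(rnorm(n))
}
```

To get started on a package of your own, `utils::package.skeleton("SwaR")` generates the directory layout for you; from there it’s mostly filling in functions and documentation.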

In essence, sp_execute_external_script calls out to the R runtime, and one wonders why calls to, say, JavaScript (V8) or Python couldn’t be integrated in the same fashion. SQL Server simply channels data in and out of the external runtime from the TSQL context. So, if you have created a custom package or wish to use a third-party one, you simply run ‘install.packages’ outside TSQL and you are done.
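For instance, to make e1071 (used in the prediction example above) available to the instance, run this once in the R console that ships with the RRO install, launched elevated so the package lands in the instance-wide library; the local-zip form works for home-grown packages like SwaR (the path is just an example):

```r
# Install a CRAN package into the library the SQL instance uses:
install.packages("e1071")

# Install a locally built custom package (path is hypothetical):
install.packages("C:/temp/SwaR_1.0.zip", repos = NULL)
```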

Data type marshaling is somewhat constrained for now. For instance, you cannot return an R vector to SQL; only data frames are allowed:

execute sp_execute_external_script
      @language = N'R'
    , @script = N'stuff <- data.frame(Numbers=rnorm(12));'
    , @input_data_1 = N''
    , @output_data_1_name = N'stuff'
    with result sets (("Numbers" float not null));

What would be particularly interesting is to see how time series data/analysis can be integrated with this.