In a related article we showed how to do predictive maintenance (PdM) in R. It is tedious. In this article we show how a platform (Dataiku) can simplify your life tremendously.
Business context: a car rental company wants to predict part failures and schedule maintenance accordingly.
What you need:
- as much data about the parts as possible
- knowledge about what failed and when
- anything you think might influence the predictions (does air humidity matter?). This is where domain expertise counts and where, as a data scientist, you need input from the business.
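To make the list above concrete, here is a hedged sketch of what such a dataset could look like. The column names and values are invented for illustration; they are not from the article or from any real fleet.

```python
import pandas as pd

# Hypothetical snapshot of the kind of data a car rental fleet might collect:
# part attributes, a candidate external factor (humidity), and the failure label.
records = pd.DataFrame({
    "part_id":      ["P-001", "P-002", "P-003", "P-004"],
    "part_type":    ["brake_pad", "battery", "brake_pad", "alternator"],
    "usage_hours":  [1200, 3400, 800, 2100],
    "air_humidity": [0.55, 0.72, 0.40, 0.63],  # a domain-expert suggestion
    "failed":       [1, 1, 0, 0],              # label: did the part fail?
})
print(records.shape)
```

The `failed` column is the "knowledge about what failed and when"; everything else is a candidate predictor.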
The process in Dataiku is no different from building a model by hand: upload the data, clean it, train a model, and make predictions. The picture above shows the flow with the process steps highlighted.
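For comparison, the same four steps done by hand in Python might look like the sketch below. All data and parameter choices are made up; the point is only to show the shape of the manual workflow that the visual flow replaces.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. "Upload": load historical part data (a synthetic stand-in here).
df = pd.DataFrame({
    "usage_hours":  [1200, 3400, 800, 2100, 2900, 500, 4100, 1700],
    "air_humidity": [0.55, 0.72, 0.40, 0.63, 0.80, 0.35, 0.77, 0.50],
    "failed":       [0, 1, 0, 1, 1, 0, 1, 0],
})

# 2. Clean: drop rows with missing values (a typical minimal prep step).
df = df.dropna()

# 3. Learn: fit a classifier on historic failures.
X, y = df[["usage_hours", "air_humidity"]], df["failed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 4. Predict: score the held-out parts.
preds = model.predict(X_test)
print(len(preds))
```

Every one of these steps corresponds to a node in the Dataiku flow; the difference is that here each step is code you must write and maintain yourself.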
Dataiku makes things easy in multiple ways:
- the process flow, once created, can be run repeatedly with the same or different data. That gives you reproducible data science.
- uploading data or connecting to external raw data sources is a breeze
- the tedious work of cleaning, transforming, and/or partitioning data is super-easy: visual tools handle what would otherwise take deep API knowledge in R or Python
- the resulting model can be turned into a web service immediately
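To give a feel for the "deep API knowledge" that the visual preparation tools replace, here is a hedged pandas sketch of typical hand-written cleaning. The column names and the messy values are invented for illustration.

```python
import pandas as pd

# The kind of hand-written prep that a visual data-preparation tool replaces.
raw = pd.DataFrame({
    "sensor_temp":  ["21.5", "19.0", "n/a", "23.1"],
    "last_service": ["2019-01-10", "2018-11-03", "2019-02-20", None],
})

# Coerce a messy string column to numeric; "n/a" becomes NaN.
raw["sensor_temp"] = pd.to_numeric(raw["sensor_temp"], errors="coerce")

# Parse dates and derive a feature: days since last service.
raw["last_service"] = pd.to_datetime(raw["last_service"])
raw["days_since_service"] = (
    pd.Timestamp("2019-03-01") - raw["last_service"]
).dt.days

# Impute the remaining missing temperatures with the column median.
raw["sensor_temp"] = raw["sensor_temp"].fillna(raw["sensor_temp"].median())
print(raw["sensor_temp"].isna().sum())
```

Each of these lines corresponds to a point-and-click step in a visual prep recipe, which is exactly where the time savings come from.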
Not always needed, but nice to have: the integrated dashboarding environment (below), built on HTML, d3.js, and the like.
What is the catch? You need to buy a license, but as far as I can see that is the only drawback. Just compare the amount of code you need in R or Python with the visual construction in Dataiku, and add to that the collaborative features and much more (Spark integration, for instance).