Big data refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. A number of concepts are associated with big data. Originally there were three: volume, variety, and velocity. Concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value.

Spam with Sparkling Water

Using Spark and H2O to detect spam.

Ensemble Learner

Ensemble learner using H2O.

Diverse Dataiku tricks

Assorted tricks I collected while developing solutions on top of Dataiku.
Markov Chain

Using Dataiku for digital marketing optimization

Describes in some detail how Dataiku makes it easy to develop and deploy a digital marketing solution.

Dataiku: a great data science platform

An overview of Dataiku as a data science platform.

Dummy DCOS Propensity Service based on NodeJS for DCOS on Azure Container Services

A straightforward implementation of a propensity scoring service on top of Azure Container Services.