Machine learning is the subfield of computer science that “gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959). Having evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Rather than following strictly static program instructions, such algorithms make data-driven predictions or decisions by building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible; example applications include spam filtering, optical character recognition (OCR), search engines and computer vision.

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining,[6] although the latter focuses more on exploratory data analysis and is sometimes described as unsupervised learning.

Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to “produce reliable, repeatable decisions and results” and uncover “hidden insights” through learning from historical relationships and trends in the data.


Named entities and random fields

A rather detailed overview of using NLP for Dutch named entity recognition.

Techniques in Digital marketing optimization

A compilation of sorts: ideas and techniques I use when helping customers apply data science to marketing optimization.

Basic dataviz with Apache Zeppelin

About Zeppelin and the fun, great and useful things you can do with it.

Unbalanced data (SMOTE)

Imbalanced data typically refers to classification problems where the classes are not represented equally. For example, you may have a 2-class (binary) classification problem with 100 instances (rows). A total of 80 instances…
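To give a feel for what SMOTE does with the under-represented class, here is a minimal sketch of its core interpolation step in plain Python. The function name `smote_like_samples`, the toy `minority` points and the choice of `k=2` are assumptions made for illustration; this is not the code from the post, and a real project would more likely rely on a maintained library.

```python
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class points (two features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(minority, n_new=6)
```

Every synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies instead of being exact duplicates.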

Filtering out noise with chi2

Using chi2 to find signals in noisy data.

R integration in SQL Server 2016

Microsoft bought Revolution Analytics for its enterprise-level R spectrum and it was swiftly integrated into the latest SQL Server.

Classification, dimensional reduction and chi2.

When talking about dimensional reduction in the context of machine learning you have many options: linear discriminant analysis (LDA), principal component analysis (PCA) and many others. Here I want to highlight a technique that I explored as part of a very large research project and which relies on chi2.
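As a rough illustration of why chi2 is useful for discarding features, the sketch below computes the textbook Pearson chi-squared statistic for a contingency table of feature value against class. The helper name `chi2_statistic` and the two toy tables are assumptions for illustration only; the post's own approach and the scoring used by specific libraries may differ in detail.

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if feature and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: feature present / absent. Columns: class A / class B. (Made-up counts.)
informative = [[30, 5], [10, 35]]
noisy = [[20, 20], [20, 20]]

scores = {
    "informative": chi2_statistic(informative),
    "noisy": chi2_statistic(noisy),
}
```

A feature whose counts match what independence would predict scores zero, while a feature that tracks the class scores high; keeping only the highest-scoring features is one simple way to reduce the number of dimensions.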

Pearson p-value

Numerical algorithms and statistical theory are quite robust and universal, but once you look into the various software implementations you discover that presumed standards are not so universal.