Big data refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Several concepts are associated with big data: originally there were three, namely volume, variety, and velocity. Concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value.

What is a graph database?

Graph databases for the clueless.

Classic spam classification using Spark MLlib

Using MLlib naive Bayes for spam classification.

Poor man’s blockchain

A five-minute Bitcoin-ish implementation to understand how a blockchain works.

Time series forecasting with H2O

By expanding a time series horizontally you can use H2O to forecast it.
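As a rough illustration of what "expanding horizontally" could mean, the sketch below turns a one-dimensional series into rows of lagged values plus a target, which is the tabular shape a supervised learner expects. This reading of the phrase is an assumption, and the function name `expand_horizontally` and its `n_lags` parameter are hypothetical; no H2O calls are shown.

```python
from typing import List, Tuple


def expand_horizontally(series: List[float], n_lags: int) -> List[Tuple[List[float], float]]:
    """Turn a 1-D series into (lag features, target) rows.

    Hypothetical helper: each row holds the n_lags values that precede
    time t (oldest first) and the value at time t as the target.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for t in range(n_lags, len(series)):
        lags = series[t - n_lags:t]  # the n_lags values just before time t
        rows.append((lags, series[t]))
    return rows


series = [10.0, 12.0, 13.0, 15.0, 18.0]
rows = expand_horizontally(series, n_lags=2)
for lags, target in rows:
    print(lags, "->", target)
```

Each row can then be treated as an ordinary training example, with the lag columns as predictors and the last column as the response, by whichever regression model is used.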

Graph analytics with Spark GraphFrames

Large-scale graph analytics with Spark and GraphFrames.

Statistics on Apache Hive

Basics of stats using Apache Hive.

Apache Spark Streaming

Spark Streaming on Hortonworks.

Getting started with Apache Zeppelin

Zeppelin as the Jupyter of big data.