Topological Data Analysis (TDA) employs modern mathematical concepts such as functors, and possesses desirable properties such as coordinate-freeness and robustness to noise. TDA makes some strong claims about its practical uses; it is, however, one of the most mathematically rigorous areas of statistical analysis.

In the video above, Ayasdi co-founder and long-time TDA researcher Gunnar Carlsson explains TDA succinctly.

This GitHub repo contains a curated list of resources for learning about TDA, including both gentle non-mathematical introductions and more rigorous mathematical treatments.

Ohio State University offers the course Computational Topology and Data Analysis, and many of the course’s notes and resources are available on the site.


Python Mapper

The Mapper algorithm is a method for topological data analysis invented by Gurjeet Singh, Facundo Mémoli and Gunnar Carlsson. See reference [R1] for the publication. The Mapper algorithm alone is not a complete data analysis tool; it is the key part of a processing chain that consists of, at a minimum, filter functions, the Mapper algorithm itself, and visualization of the results.
Python Mapper is an implementation of this toolchain, written by Daniel Müllner and Aravindakshan Babu. It is open source software released under the GNU GPLv3 license.

Proof of Concept Mapper by @mlwave for Digit Recognition (Python)


  1. MinMaxScaler on the train set.
  2. t-SNE on first 5k images from train set to 2 components.
  3. Create overlapping intervals on the first 2 dimensions and cluster the points that fall inside each interval.
  4. The clusters then become nodes in a graph.
  5. When two clusters share one or more members, draw an edge between them.
  6. Size the nodes by the number of points in that cluster.
  7. Color the nodes by their distance to the minimum of the first dimension.
  8. Show the images for every cluster member inside a tooltip.
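
The steps above can be sketched in a few lines of plain Python. This is an illustrative simplification, not @mlwave's code: it skips t-SNE and the tooltip, uses a single filter dimension instead of two, and clusters with a simple single-linkage rule. The function names and the parameters `n_intervals`, `overlap` and `eps` are assumptions made for this example.

```python
from itertools import combinations
import math

def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] (step 1, applied to one column)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def overlapping_intervals(n_intervals, overlap):
    """Cover [0, 1] with intervals, each padded by `overlap` times its width (step 3)."""
    width = 1.0 / n_intervals
    pad = width * overlap
    return [(i * width - pad, (i + 1) * width + pad) for i in range(n_intervals)]

def cluster(indices, points, eps):
    """Single-linkage clustering: points within eps of each other are merged."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(indices, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=0.3):
    """Build a Mapper-style graph from points and a 1-D filter (lens) value per point."""
    lens = min_max_scale(lens)
    nodes = []
    for lo, hi in overlapping_intervals(n_intervals, overlap):
        members = [i for i, v in enumerate(lens) if lo <= v <= hi]
        nodes.extend(cluster(members, points, eps))        # step 4: clusters become nodes
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]                       # step 5: shared members
    sizes = [len(n) for n in nodes]                        # step 6: node size
    colors = [sum(lens[i] for i in n) / len(n) for n in nodes]  # step 7: mean lens value
    return nodes, edges, sizes, colors

# Eleven evenly spaced points on a line, filtered by their x coordinate.
points = [(x / 10, 0.0) for x in range(11)]
nodes, edges, sizes, colors = mapper_graph(points, [p[0] for p in points], eps=0.15)
print(edges)
```

For points on a line, the resulting graph is a chain: each node is linked only to the nodes of the neighbouring intervals. On real data, the choice of filter, interval count and overlap strongly affects the shape of the graph.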

TDA: Statistical Tools for Topological Data Analysis (R)

Tools for the statistical analysis of persistent homology and for density clustering. To that end, the package provides an R interface to the efficient algorithms of the C++ libraries GUDHI, Dionysus, and PHAT (see vignette).
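
To give a feel for what persistent homology computes, here is a minimal sketch in plain Python of its simplest case: tracking connected components (dimension 0) as a distance threshold grows. It is not the API of the TDA package or of GUDHI, Dionysus, or PHAT; the function name `h0_persistence` is made up for this example, and the death scale is taken to be the pairwise distance at which two components merge.

```python
from itertools import combinations
import math

def h0_persistence(points):
    """Return (birth, death) pairs for the connected components of a point cloud.

    Every point is born at scale 0. Two components merge, and one of them
    dies, at the length of the shortest edge that connects them.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances from shortest to longest.
    edges = sorted((math.dist(points[a], points[b]), a, b)
                   for a, b in combinations(range(n), 2))
    pairs = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                 # this edge merges two components
            parent[ra] = rb
            pairs.append((0.0, d))
    pairs.append((0.0, math.inf))    # one component never dies
    return pairs

# Two tight pairs of points, far apart from each other.
print(h0_persistence([(0, 0), (1, 0), (10, 0), (11, 0)]))
```

In this example, two components die early at distance 1 and one survives until distance 9, which reflects the two well-separated groups in the data. Long-lived features like this one are usually read as signal and short-lived ones as noise.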

TDAmapper: Topological Data Analysis using Mapper (R)

An R package that applies discrete Morse theory to analyze a data set with the Mapper algorithm described in G. Singh, F. Mémoli, and G. Carlsson (2007).

Kohonen (Python)

This module contains some basic implementations of Kohonen-style vector quantizers: Self-Organizing Map (SOM), Neural Gas, and Growing Neural Gas. Kohonen-style vector quantizers use some sort of explicitly specified topology to encourage good separation among prototype “neurons”.
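
As a rough illustration of how an explicitly specified topology enters the picture, the sketch below trains a tiny Self-Organizing Map in plain Python. It does not use the Kohonen module's API: the function names, the 1-D chain of units, and the linear decay schedules for `lr0` and `sigma0` are all assumptions chosen to keep the example short.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the prototype closest to sample x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_units=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a SOM whose units are arranged on a 1-D chain (the topology)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each prototype from a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):      # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Units near the winner on the chain are pulled along with it.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
    return weights

data = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
for w in train_som(data):
    print([round(v, 3) for v in w])
```

The neighbourhood term `h` is where the topology matters: without it, each unit would move independently, as in plain online k-means. Neural Gas replaces the fixed chain with a ranking by distance to the sample, and Growing Neural Gas adds and removes units during training.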