Below is a compilation of techniques and ideas related to (digital) marketing optimization from a data science/mining point of view. Many of the techniques can be applied outside marketing and, in fact, the most interesting approaches are creative adaptations of ideas from elsewhere: for example, using modern portfolio theory for ad positioning, or using random walks on graphs to compute buying cycles.

Nowadays there are R and Python packages, Jupyter notebooks and ebooks for anything and everything. The best algorithms are probably well-protected corporate jewels, but you can achieve great results with existing defaults. I have tried to give useful links in the text. If you need more pointers or ideas, give us a call and we’ll help.


Customer buying cycle

What is it?

A buying process is the series of steps that a consumer will take to make a purchasing decision. A standard model of consumer purchase decision-making includes recognition of needs and wants, information search, evaluation of choices, purchase, and post-purchase evaluation.

The concept is related to customer retention and to increasing both purchase frequency and retention. Some methods are gamification, email campaigns and loyalty programs.

How does it work?

In its simplest form, the buying cycle is just the time interval between first contact and conversion.
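
As a minimal sketch of this simplest definition, the snippet below computes the interval per customer and its average. The `journeys` data and the helper name `cycle_days` are hypothetical and only serve as illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (first contact, conversion) timestamps per customer.
journeys = {
    "alice": (datetime(2024, 1, 1), datetime(2024, 1, 11)),
    "bob": (datetime(2024, 1, 5), datetime(2024, 1, 9)),
    "carol": (datetime(2024, 2, 1), datetime(2024, 2, 8)),
}


def cycle_days(first_contact, conversion):
    """Length of one buying cycle in days."""
    return (conversion - first_contact).total_seconds() / 86400


# Buying cycle per customer and the average over all customers.
cycles = {name: cycle_days(*stamps) for name, stamps in journeys.items()}
average_cycle = mean(cycles.values())
```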

One can also use Markov chains and the mean first-passage time of random walks on graphs. This approach gives more control over the calculation since one can arbitrarily choose the (sub)graph on which the simulation is done.


Pricing research and conjoint analysis

What is it?

It’s a modeling technique designed to show how product attributes affect purchasing decisions.

It answers the question of what drives buyers’ choices.

How does it work?

You create a survey and let users pick out what they prefer. For example, here we have combinations of color and size using the support.CEs package:

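library(support.CEs)
# NOTE: the design call was lost in the source; rotation.design() is an assumption
# (support.CEs also offers Lma.design(), which takes similar arguments).
opts <- rotation.design(
  attribute.names = list(color = c("red", "orange", "green"),
                         size = c("1", "2", "3", "4")),
  nalternatives = 2, nblocks = 1, seed = 9999)
questionnaire(opts)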

The result of this survey is a ranking of the choices. Now this ranking is used in a learning model which gives the importance of the various options. The fitted regression coefficients represent conjoint measures of utility called part-worths. The visualization of the part-worths is called a spine chart. It highlights what matters most to customers when considering purchasing a product/service.
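
To make the idea of part-worths concrete, here is a small sketch in plain Python. It assumes a balanced full-factorial design, in which case the effects-coded regression coefficients reduce to "level mean minus grand mean". The preference scores are hypothetical and generated additively so the result can be checked by eye; this is not what the support.CEs package does internally.

```python
from itertools import product
from statistics import mean

colors = ["red", "orange", "green"]
sizes = ["1", "2", "3", "4"]

# Hypothetical preference scores (higher = more preferred), one per profile.
true_color = {"red": 2.0, "orange": 0.0, "green": -2.0}
true_size = {"1": -1.5, "2": -0.5, "3": 0.5, "4": 1.5}
scores = {
    (c, s): 5.0 + true_color[c] + true_size[s] for c, s in product(colors, sizes)
}

grand_mean = mean(scores.values())


def part_worths(levels, position):
    """Level mean minus grand mean (valid for a balanced design)."""
    return {
        level: mean(v for k, v in scores.items() if k[position] == level)
        - grand_mean
        for level in levels
    }


def importance(pw):
    """Range of the part-worths of one attribute."""
    return max(pw.values()) - min(pw.values())


color_pw = part_worths(colors, 0)
size_pw = part_worths(sizes, 1)

# Relative importance of an attribute: its range divided by the sum of ranges.
total = importance(color_pw) + importance(size_pw)
color_importance = importance(color_pw) / total
```

With real survey data the design is rarely perfectly balanced, in which case one would fit an actual regression (or a choice model) instead of taking level means.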

[Figure: spine chart of the part-worths]

Note that a similar technique can be used to value a brand or a company.


Predicting customer choice

What is it?

Trying to predict what a customer will pick based on certain criteria.

How does it work?

The method is an extension of conjoint analysis and tells which factors contribute to the decision. One can also use decision trees, or some feature engineering, to determine the factors. In any case, it’s generally rather easy to find a classifier.

The thing is that one usually does not want to predict the outcome but rather influence it. Still, the method can be used to give a customer suggestions based on history or affinity.


Targeting customers (aka leads)

What is it?

Lead generation is the initiation of consumer interest or inquiry into products or services of a business. Leads can be created for purposes such as list building, e-newsletter list acquisition or for sales leads.

How does it work?

There are several ways:

  • propensity scoring: one scores non-customers with respect to existing customers by measuring the distance between them
  • explanatory variables and decision trees: if one needs dimensionality reduction (which happens rather often), interpretation becomes an issue since linear combinations of features seldom have a real-world meaning.

Customer segments

What is it?

Customer segmentation is the practice of dividing a company’s customers into groups that reflect similarity among customers in each group. The goal of segmenting customers is to decide how to relate to customers in each segment in order to maximize the value of each customer to the business.

How does it work?

The most common way is to use unsupervised methods like k-means to find clusters. Within this approach you can have geographical segmentation, lifestyle segmentation, segmentation based on purchase history, psychological profiling and the like.
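
A bare-bones k-means (Lloyd's algorithm) can be sketched as follows. The customer features, the starting centroids and the function name `kmeans` are hypothetical choices for illustration; in practice one would rather rely on an existing library implementation.

```python
from math import dist
from statistics import mean


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(mean(coord) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (orders per year, average basket value).
customers = [(1, 20), (2, 25), (1, 30), (12, 200), (10, 220), (11, 180)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
```

Here the two segments separate cleanly into occasional small-basket buyers and frequent big-basket buyers. With real data the features usually need to be scaled first, otherwise the dimension with the largest numbers dominates the distance.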


Longitudinal analysis

What is it?

Longitudinal analysis is the study of short series of observations obtained from many respondents over time and is also referred to as panel analysis (of a cross-section of time series), or repeated measures, or growth curve analysis (polynomials in time), or multilevel analysis (where one level is a sequence of observations from respondents).

How does it work?

The TraMineR toolkit is probably the ideal package for longitudinal analysis though it is not the most scalable approach.
On a high level this is a variation on segmentation and clustering which returns both temporal insights and sequential info on stages of a process (e.g. purchase, learning, approval process and so on).


Product positioning

What is it?

Product positioning is a form of marketing that presents the benefits of your product to a particular target audience. Through market research and focus groups, marketers can determine which audience to target based on favorable responses to the product.

  • Elaboration Likelihood Model: the model aims to explain different ways of processing stimuli, why they are used, and their outcomes on attitude change.
  • Cognitive Response Model: this approach identifies the most direct cause of persuasion in the self-talk of the persuasion target, rather than the content of the message.
  • Peripheral Response Model: related to the likelihood model and identifies the peripheral route to persuasion. This occurs when the listener/consumer decides whether to agree with the message based on other cues besides the strength of the arguments or ideas in the message. For example, a listener may decide to agree with a message because the source appears to be an expert, or is attractive.

How does it work?

One identifies differentiating attributes through which one calculates (dis)similarities, product replacements or product classes, competition distances and the like. The method is a combination of clustering and propensity at the product level rather than the customer level.


Product recommendation

What is it?

This is a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item. A highly rated product is then equivalent to a product/service a customer wishes to buy.

How does it work?

The standard way here is to use basket analysis and association rules. Sometimes this is augmented with geographic segmentation and other data dimensions.
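
The three usual association-rule measures (support, confidence and lift) can be sketched on a handful of baskets. The baskets and the helper name `rule` are hypothetical; only pairs of items are considered to keep the example short.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule(antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = pair_counts[frozenset((antecedent, consequent))]
    support = both / n
    confidence = both / item_counts[antecedent]
    lift = confidence / (item_counts[consequent] / n)
    return support, confidence, lift


support, confidence, lift = rule("bread", "butter")
```

A lift above 1 suggests that the two items are bought together more often than chance would predict; a lift below 1 suggests the opposite.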


Clickstream analysis

What is it?

Clickstream analysis/analytics is the process of collecting, analyzing and reporting aggregate data about which pages a website visitor visits, and in what order. The path the visitor takes through a website is called the clickstream.

How does it work?

The approach usually involves the creation of Markov chains, geographic differentiation (clustering) and analysis of browser capabilities. The analysis can complement longitudinal analytics or basket analysis.


Markov chains

What is it?

Markov chains are a mathematical topic in their own right, but they suit marketing optimization because they capture the probabilistic behavior of customers interacting with a channel. A Markov chain collects the essence of random discrete behavior across a finite set of states (web pages, touchpoints…) and through this one can model and predict the future (within well-defined constraints).

How does it work?

A Markov chain can be seen as a big sparse matrix with some constraints on the numbers (non-negative entries, each row summing to one). The construction of such a matrix is really just basic counting using existing customer data.
Markov chains can complement time series analysis and dimensionality reduction (feature engineering).
Often you will encounter Markov chains together with approximation techniques such as Monte Carlo sampling (as in MCMC samplers), mean-field approximation and so on.
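
The counting step can be sketched in a few lines. The `sessions` below are hypothetical page sequences; the transition matrix is stored as a dictionary of dictionaries, which is a natural sparse representation.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: the ordered pages visited by three visitors.
sessions = [
    ["home", "product", "cart", "checkout"],
    ["home", "product", "home", "exit"],
    ["home", "search", "product", "cart", "exit"],
]

# Count every observed transition current -> next.
counts = defaultdict(Counter)
for path in sessions:
    for current, nxt in zip(path, path[1:]):
        counts[current][nxt] += 1

# Normalise each row so that the outgoing probabilities sum to one.
transitions = {
    state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
    for state, row in counts.items()
}
```

In this toy data `transitions["home"]["product"]` is 0.5: two of the four observed moves away from the home page go to a product page.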


Propensity scoring

What is it?

The idea is that if a touchpoint consumer (visitor or to-be customer) behaves almost the same as an existing customer, they will supposedly also become a customer (soon). Similar behavior leads to a similar outcome. There are various ways to define ‘similar’ depending on the data and the aims (segmentation, targeting…).

How does it work?

One reduces to-be and existing customers to high-dimensional (feature) vectors and compares them to find some affinity or clusters. Sparsity of the data and high dimensionality often require feature engineering, approximations and dimensionality reduction.
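
One possible way to define ‘similar’ is the cosine similarity between a prospect and the average existing customer. The sketch below assumes this choice; the feature vectors and names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical features: (visits, pages per visit, newsletter opens).
customers = [(10, 5, 3), (8, 6, 2), (12, 4, 4)]
prospects = {"p1": (9, 5, 3), "p2": (1, 1, 0), "p3": (0, 2, 9)}

# Centroid of the existing customers, used as the reference profile.
centroid = tuple(sum(col) / len(customers) for col in zip(*customers))

# Score every prospect and rank from most to least customer-like.
scores = {name: cosine(vec, centroid) for name, vec in prospects.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Cosine similarity only looks at the direction of the vectors, not their magnitude, so a barely active prospect with the right mix of behavior can still score high. A Euclidean distance on standardized features is an alternative when the level of activity matters.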


Decision field theory

What is it?

This approach is somewhat different from the typical customer-centric approaches in the sense that it models the marketer rather than the customer: how and when should incentives be activated towards customers in order to increase conversion? So, one tries to define when and what has to be done from the side of the marketing division in order to tune/guide customers towards a goal (usually conversion).
This approach is widely used in the context of politics, behavioral sciences and public services.

How does it work?

The modeling is based on diffusion processes and has many links to game theory. A nice intro can be found here.


Forecasting

What is it?

Time series appear everywhere and one can apply forecasting on various levels: feature consumption, personas, complementing clickstream and longitudinal analysis. Forecasting is rooted in the parametrization of discrete time event series (linear and non-linear) and is an academic field on its own.
Forecasting often returns insights into how customers behave and what kind of marketing trends one has, but it seldom helps in deciding what to do to influence them. Forecasting does however complement campaign modeling and the like to see how certain changes (in a probabilistic fashion) influence the time-like behavior. What Markov chains are in feature dimensions, forecasting is in time.

How does it work?

There are many forecasting libraries in pretty much any programming language. While it’s fairly easy to get started, it’s an art to understand fully what to do (and what not). It takes a lot of experience to master forecasting.
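
To give a flavor of how easy it is to get started, here is simple exponential smoothing, one of the most basic forecasting methods. The series and the smoothing factor `alpha` are hypothetical; a real library would also estimate `alpha`, handle trend and seasonality, and provide prediction intervals.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level serves as the one-step-ahead forecast.
    """
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical weekly conversions.
weekly_conversions = [100, 110, 105, 120]
levels = exponential_smoothing(weekly_conversions, alpha=0.5)
forecast = levels[-1]
```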


Imbalanced data

What is it?

Imbalanced data is a direct consequence of a business with lots of visitors but few (or far fewer) buying customers. This is not uncommon if your website has plenty of info and desirable but, alas, expensive products. It means that when you apply typical machine learning methods you learn much more about things people do that are unrelated to a purchase. Hence, your optimization or influence mostly affects non-buying behavior, and this is precisely the opposite of what you actually want to do.

How does it work?

This article gives you a great overview of various techniques to combat imbalanced data. There is no simple answer to this problem and it often takes a detailed analysis to figure out how to proceed with imbalanced data.
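
One of the simplest techniques is random oversampling of the minority class, sketched below. The data and the function name `oversample` are hypothetical, and this is only one option among many (undersampling, class weights, synthetic samples and so on).

```python
import random


def oversample(rows, label_of, seed=0):
    """Duplicate random minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical visits: (pages viewed, bought?), with few buyers.
visits = [(3, 0), (5, 0), (2, 0), (8, 0), (4, 0), (1, 0), (9, 1), (7, 1)]
balanced = oversample(visits, label_of=lambda row: row[1])
```

Oversampling should be applied to the training split only, after splitting, otherwise copies of the same row end up in both the training and the test set.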


Hotpaths

What is it?

Hotpaths define the most used or most effective path between an initial intention to purchase a product/service and conversion. It defines the highways in a landscape of touchpoints. It also allows you to see which key-touchpoints are contributing to an effective campaign or marketing strategy.

How does it work?

As explained above, a Markov chain can be seen as a big (sparse) matrix. This matrix can at the same time be seen as the adjacency matrix of a graph. With some tweaks and assumptions (e.g. truncating low-probability transitions) one can use graph techniques to identify hotpaths (shortest paths, highest-weight paths).
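
One way to turn this into code is to run Dijkstra's shortest-path algorithm on edge weights of minus the logarithm of the transition probability, so that the shortest path is the most probable one. The transition probabilities, the `min_prob` threshold and the function name are hypothetical.

```python
import heapq
from math import exp, log


def most_probable_path(transitions, start, goal, min_prob=0.05):
    """Dijkstra on weights -log(p); transitions below min_prob are ignored."""
    queue = [(0.0, start, [start])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, exp(-cost)
        if node in done:
            continue
        done.add(node)
        for nxt, p in transitions.get(node, {}).items():
            if p >= min_prob and nxt not in done:
                heapq.heappush(queue, (cost - log(p), nxt, path + [nxt]))
    return None, 0.0


# Hypothetical touchpoint transition probabilities.
transitions = {
    "ad": {"landing": 0.7, "blog": 0.3},
    "landing": {"product": 0.5, "exit": 0.5},
    "blog": {"product": 0.9, "exit": 0.1},
    "product": {"checkout": 0.4, "exit": 0.6},
}
path, probability = most_probable_path(transitions, "ad", "checkout")
```

In this toy graph the route via the landing page wins (probability 0.7 × 0.5 × 0.4) even though the blog converts better once reached, because far fewer visitors go to the blog in the first place.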


Modern portfolio theory

What is it?

Markowitz theory can be seen as a way to collect the most appropriate items in order to maximize some function of the items. If one thinks of explanatory features as the items and of machine learning accuracy as the function, then portfolio theory can be used to select (engineer) features towards increased conversion rates.

How does it work?

The technique is fairly straightforward and one can rely on various R and Python packages to solve the underlying quadratic programming problem.
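
As a hedged illustration, the simplest Markowitz problem (minimize the variance w'Σw subject to the weights summing to one, with no other constraints) has a closed-form solution proportional to Σ⁻¹1, so no quadratic programming solver is needed. The covariance matrices below are hypothetical, and the small Gaussian elimination helper `solve` is only there to keep the sketch free of dependencies.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def min_variance_weights(cov):
    """Weights minimising w' cov w subject to sum(w) = 1 (short positions allowed)."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [v / total for v in x]


# Hypothetical covariance of returns for three items (e.g. ad positions).
cov = [
    [1.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
]
weights = min_variance_weights(cov)
```

The least volatile item gets the largest weight. As soon as one adds a target return or forbids negative weights, the closed form no longer applies and a proper quadratic programming package is the way to go.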


Mean first passage

What is it?

If you think of customers as a stochastic variable on a graph, then the random walk defines an expected first-hit time for each node. That is, if you look at the behavior of visitors on a website, you can statistically estimate when someone will first hit any particular page, a product or conversion page in particular.

How does it work?

This approach presumes a Markov chain on a graph and from this it’s quite easy to use graph centrality techniques to compute the mean first-hit time. This approach is also related to the well-known PageRank algorithm presumably used by Google.
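
For a small chain the mean first-passage times can also be computed directly. They satisfy t(target) = 0 and t(s) = 1 + Σ P(s, n) · t(n) for every other state s, which the sketch below solves by simple repeated substitution. It assumes that the target is reachable from every state and that every non-target state has a complete row of outgoing probabilities; the example chain is hypothetical.

```python
def mean_first_passage(transitions, target, sweeps=10_000, tol=1e-12):
    """Expected number of steps to first reach target from each state."""
    states = sorted(
        set(transitions) | {n for row in transitions.values() for n in row}
    )
    times = {s: 0.0 for s in states}
    for _ in range(sweeps):
        delta = 0.0
        for s in states:
            if s == target:
                continue  # the target itself stays at zero
            new = 1.0 + sum(p * times[n] for n, p in transitions.get(s, {}).items())
            delta = max(delta, abs(new - times[s]))
            times[s] = new
        if delta < tol:
            break
    return times


# Hypothetical chain: visitors hover between home and product before buying.
transitions = {
    "home": {"home": 0.5, "product": 0.5},
    "product": {"home": 0.5, "buy": 0.5},
}
times = mean_first_passage(transitions, target="buy")
```

Solving the two equations by hand gives 4 steps from the product page and 6 steps from the home page, which is what the iteration converges to.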


Churn rate

What is it?

Also called attrition rate, it’s the number of people per unit time (or some time interval) moving out of a group, usually customers.

How does it work?

The concept is hugely important in the context of subscriber-based business models, but from a technical point of view the approach is standard machine learning. You have various (historical) attributes and you attempt to classify an existing customer as ‘almost leaving’ or ‘still happy’. Details can be found e.g. here.

A/B testing and multivariate testing

What is it?

A/B testing, also known as split testing, is a method of testing through which marketing variables are compared to each other to identify the one that brings a better response rate. In this context, the existing element that serves as the baseline is called the “control” and the element that is argued to give a better result is called the “treatment”.

How does it work?

There are various approaches here, the Bayesian one being a popular choice. You can find a great explanation here.
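
A minimal sketch of the Bayesian approach: with a uniform Beta(1, 1) prior on each conversion rate, the posterior after observing the data is again a Beta distribution, and the probability that the treatment beats the control can be estimated by sampling. The counts, the prior and the function name are assumptions made for illustration.

```python
import random


def prob_treatment_better(conv_c, n_c, conv_t, n_t, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_treatment > rate_control), Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of a rate: Beta(1 + conversions, 1 + non-conversions).
        rate_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        rate_t = rng.betavariate(1 + conv_t, 1 + n_t - conv_t)
        wins += rate_t > rate_c
    return wins / draws


# Hypothetical experiment: 100/1000 conversions (control) vs 130/1000 (treatment).
p = prob_treatment_better(conv_c=100, n_c=1000, conv_t=130, n_t=1000)
```

The result reads directly as "the probability that the treatment is better", which is often easier to communicate than a p-value.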


Packages, modules, libraries, code and more