# Top 10 algorithms in data mining using R

In their book, Wu et al. describe the top 10 algorithms in data mining. This post shows how easily each of them can be run in R. The datasets used ship with R or with the packages loaded below, so there is no need to download anything; run data() to list the available datasets. Nothing here is original: everything was Googled, and no sources are referenced. The purpose is to show how quickly you can prototype most of these algorithms in R with minimal code.

#### 1. C4.5

require(rJava) # needed for printing strings out of Java objects
require(RWeka) # contains the J48() function that builds C4.5 decision trees
iris_c4.5 <- J48(Species ~ ., data=iris)
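
Fitting the tree is only half of the job; it helps to look at what was learned. The following is a minimal sketch, assuming RWeka and a working Java installation. The names `fit`, `pred` and `confusion` are chosen here for illustration. Predictions are made on the training rows, so the table gives an optimistic picture of accuracy.

```r
library(RWeka)  # provides J48(), Weka's implementation of C4.5

fit <- J48(Species ~ ., data = iris)
print(fit)  # text rendering of the learned tree

# Resubstitution check: predict the same rows the tree was trained on
pred <- predict(fit, newdata = iris)
confusion <- table(Actual = iris$Species, Predicted = pred)
print(confusion)
```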
#### 2. k-means

# Assumed setup (not in the original listing): fit k-means with 3 clusters
# and draw the scatter plot that points() adds the cluster centers to.
iris_km <- kmeans(iris[, 1:4], centers = 3)
plot(iris[, c("Sepal.Length", "Sepal.Width")], col = iris_km$cluster + 1)
points(iris_km$centers[, c("Sepal.Length", "Sepal.Width")], col=2:4, pch=17, cex=3)

#### 3. Support Vector Machines

require(e1071)
# type selects the SVM formulation in svm()
iris_svm <- svm(Species ~ ., data = iris, type = "C-classification", kernel = "radial", cost = 10, gamma = 0.1)
summary(iris_svm)
plot(iris_svm, iris, Petal.Width ~ Petal.Length, slice = list(Sepal.Width = 3, Sepal.Length = 4))

#### 4. The Apriori algorithm

require(arules)
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
inspect(head(rules))

#### 5. The EM algorithm

require(mixtools)
data('faithful')
wait1 <- normalmixEM(faithful$waiting, lambda = .5, mu = c(55, 80), sigma = 5)
plot(wait1, density=TRUE, cex.axis=1.4, cex.lab=1.4, cex.main=1.8, whichplots=2,
     main2="Time between Old Faithful eruptions", xlab2="Minutes", ask=FALSE)
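
Besides the density plot, the fitted mixture can be read off the returned object. This is a small sketch, assuming the mixtools package is installed; `fit` and `params` are names picked here for illustration. It refits the same two-component model and tabulates the estimated weight, mean and standard deviation of each component.

```r
library(mixtools)

# Same call as above: two normal components, starting means 55 and 80
fit <- normalmixEM(faithful$waiting, lambda = 0.5, mu = c(55, 80), sigma = 5)

# Estimated mixture, one row per component
params <- data.frame(weight = fit$lambda, mean = fit$mu, sd = fit$sigma)
print(params)
print(fit$loglik)  # log-likelihood at convergence
```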

#### 6. Page Rank

require(igraph)
# We cheat here a bit by starting directly with a graph instead
# of building one from some data like a set of web pages.
g <- random.graph.game(20, 5/20, directed=TRUE) # gets us a random directed graph
plot(g)
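
The listing above only builds and plots the graph. A minimal sketch of the ranking step itself follows, assuming a recent igraph where `sample_gnp()` and `page_rank()` are available. The seed, the damping factor of 0.85 and the names `pr` and `ranking` are choices made here, not taken from the original.

```r
library(igraph)

set.seed(42)  # make the random graph reproducible
g <- sample_gnp(20, 5/20, directed = TRUE)  # same G(n, p) model as random.graph.game()

pr <- page_rank(g, damping = 0.85)$vector  # one score per vertex
print(round(pr, 3))

# Vertex ids ordered from highest to lowest score
ranking <- order(pr, decreasing = TRUE)
print(head(ranking, 5))
```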

#### 8. k-nearest neighbor

require(class)
iris_knn <- knn(train=subset(iris, select = -Species),
                test=subset(iris, select = -Species),
                cl=iris$Species, k = 3, prob=TRUE)
table(Actual=iris$Species, Predicted=iris_knn)
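
The call above classifies the same rows it was trained on, so every point is among its own nearest neighbors and the table looks better than it should. A hedged sketch of a simple hold-out check follows; the 100/50 split, the seed and the variable names are assumptions made here for illustration.

```r
library(class)

set.seed(1)                           # reproducible split
train_idx <- sample(nrow(iris), 100)  # 100 rows for training, 50 held out
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]

pred <- knn(train = train_x, test = test_x, cl = iris$Species[train_idx], k = 3)

confusion <- table(Actual = iris$Species[-train_idx], Predicted = pred)
print(confusion)

# Share of held-out rows that were classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```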

#### 9. Naive Bayes

require(e1071)
iris_nb <- naiveBayes(subset(iris, select=-Species), iris$Species)
table(predict(iris_nb, subset(iris, select=-Species)), iris$Species, dnn=list('predicted','actual'))
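
Naive Bayes produces class probabilities, not just labels. The sketch below, assuming the e1071 package is installed, uses the formula interface of `naiveBayes()` and asks `predict()` for the raw posterior probabilities; `fit` and `posterior` are names chosen here.

```r
library(e1071)

fit <- naiveBayes(Species ~ ., data = iris)

# type = "raw" returns one posterior probability per class and row
posterior <- predict(fit, newdata = iris[, 1:4], type = "raw")
print(round(head(posterior, 3), 3))
```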

#### 10. CART

require(rpart)
iris_cart <- rpart(Species ~., data=iris)
plot(iris_cart)
text(iris_cart)
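
To go from the plotted tree to predictions, something like the following sketch should work with the rpart package that ships with R. The names `fit`, `pred` and `confusion` are illustrative, and the predictions are again made on the training data.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris)

# Predicted class labels for the training rows
pred <- predict(fit, newdata = iris, type = "class")
confusion <- table(Actual = iris$Species, Predicted = pred)
print(confusion)

printcp(fit)  # complexity parameter table, useful when pruning
```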