Scikit hyperparameter optimization

Scikit-learn shines when it comes to hyperparameter optimization. Here, the settings of an algorithm that are chosen before training are referred to as hyperparameters, whereas the coefficients found by the machine learning algorithm itself are referred to as parameters. There are two main search strategies: systematic and stochastic. A systematic search tries every combination of the candidate values you supply, while a random search samples candidate values at random and keeps the best one it finds.
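To make the distinction concrete, here is a minimal sketch using the same Ridge model and diabetes dataset as the recipes in this post. The value alpha=1.0 is just an illustrative choice, not a recommendation.

```python
# Hyperparameters vs. parameters, illustrated with Ridge regression
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

# alpha is a hyperparameter: we choose it before fitting
model = Ridge(alpha=1.0)
hyperparameters = model.get_params()

# coef_ and intercept_ are parameters: fit() learns them from the data
model.fit(X, y)

print("hyperparameter alpha:", hyperparameters["alpha"])
print("number of learned coefficients:", model.coef_.shape[0])
print("learned intercept:", model.intercept_)
```

Hyperparameter optimization is about choosing values like alpha; fitting the model takes care of the coefficients.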

 

Systematic search

Grid search is an approach to parameter tuning that will systematically build and evaluate a model for each combination of algorithm parameters specified in a grid.

The recipe below evaluates different alpha values for the Ridge Regression algorithm on the standard diabetes dataset. This is a one-dimensional grid search; a grid can have any number of dimensions, one per hyperparameter.

# Grid Search for Algorithm Tuning
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
# GridSearchCV lives in sklearn.model_selection (sklearn.grid_search was removed in 0.20)
from sklearn.model_selection import GridSearchCV
# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a range of alpha values to test
alphas = np.array([1,0.1,0.01,0.001,0.0001,0])
# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(dataset.data, dataset.target)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)

For more information, see the API for GridSearchCV and the Exhaustive Grid Search section in the user guide.
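As a sketch of a grid with more than one dimension, the example below adds a second hyperparameter, fit_intercept, next to alpha. The candidate values and cv=5 are assumptions picked for illustration; ParameterGrid is used only to count the combinations that will be evaluated.

```python
# Two-dimensional grid search: every alpha is combined with every fit_intercept
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid

X, y = load_diabetes(return_X_y=True)

# one dictionary key per grid dimension (values chosen for illustration)
param_grid = {
    "alpha": [1.0, 0.1, 0.01, 0.001],
    "fit_intercept": [True, False],
}

# the grid is the Cartesian product of the lists: 4 * 2 = 8 candidates
n_combinations = len(ParameterGrid(param_grid))

grid = GridSearchCV(estimator=Ridge(), param_grid=param_grid, cv=5)
grid.fit(X, y)

print(n_combinations)
print(grid.best_params_)
print(grid.best_score_)
```

The number of candidates is the product of the list lengths, so the cost of a grid search grows quickly with every dimension you add.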

Random search

Random search is an approach to parameter tuning that samples algorithm parameters from a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed and evaluated for each combination of parameters drawn.
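To see the sampling step on its own, here is a small sketch using scikit-learn's ParameterSampler, which draws candidates from a set of distributions. The choices n_iter=5 and random_state=0 are arbitrary and only there to keep the output short and repeatable.

```python
# Draw a fixed number of random candidates from a uniform distribution
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# uniform(loc=0, scale=1) samples alpha from the interval [0, 1]
param_distributions = {"alpha": uniform(loc=0, scale=1)}

# n_iter fixes how many candidates are drawn; random_state makes the draw repeatable
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))

for candidate in samples:
    print(candidate)
```

Each printed dictionary is one candidate setting; a random search fits and scores one model per candidate.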

The recipe below evaluates random alpha values between 0 and 1 for the Ridge Regression algorithm on the standard diabetes dataset.

# Randomized Search for Algorithm Tuning
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn import datasets
from sklearn.linear_model import Ridge
# RandomizedSearchCV lives in sklearn.model_selection (sklearn.grid_search was removed in 0.20)
from sklearn.model_selection import RandomizedSearchCV
# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a uniform distribution to sample for the alpha parameter
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(dataset.data, dataset.target)
print(rsearch)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)

For more information, see the API for RandomizedSearchCV and the Randomized Parameter Optimization section in the user guide. The user guide also explains in detail how to specify the settings of the tuning procedure itself.
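One such setting is the distribution itself. For a regularization strength like alpha, which is often explored across several orders of magnitude, a log-uniform distribution can be a reasonable alternative to the uniform one used above. The sketch below assumes SciPy 1.4 or later for scipy.stats.loguniform; the range 1e-4 to 1, n_iter=20, cv=5 and random_state=0 are illustrative choices.

```python
# Randomized search with a log-uniform distribution for alpha
from scipy.stats import loguniform
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# sample alpha between 1e-4 and 1, spread evenly across orders of magnitude
param_distributions = {"alpha": loguniform(1e-4, 1e0)}

rsearch = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions=param_distributions,
    n_iter=20,        # number of random candidates to evaluate
    cv=5,             # 5-fold cross-validation for each candidate
    random_state=0,   # fixed seed so the search is repeatable
)
rsearch.fit(X, y)

print(rsearch.best_score_)
print(rsearch.best_params_["alpha"])
```

Fixing random_state makes reruns comparable; drop it if you want a fresh set of candidates each time.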