# Hyperparameter Search

Models can be further refined with the scikit-learn cross-validation hyperparameter search libraries. We currently support the `grid` and `random` implementations.

## Visual Analysis

The optimal set of hyperparameters will be chosen for the model, and that combination is highlighted in the dashboard among all search candidates. The impact of each hyperparameter is measured against the key metric, as well as the training and test times. In this particular case, it's interesting that as `max_depth` increases, the "Test Score" on the key metric trends lower, so the smallest value of `max_depth` is chosen to maximize the "Test Score". Luckily, the smallest `max_depth` values also have the fastest "Fit Time", indicating that we pay less for training these higher quality models. It's a little less obvious how the different values of `n_estimators` and `learning_rate` impact the test score. We may want to rerun our search and zoom in or out in the search space to get more insight.

## API

The arguments to `pgml.train` that begin with `search` are used for hyperparameter tuning.

- `search` can be either `grid` or `random`.
- `search_params` is the set of hyperparameters to search for your algorithm.
- `search_args` are passed to the scikit-learn model selection algorithm for extra configuration.

| search | description |
|---|---|
| grid | Trains every permutation of `search_params` |
| random | Randomly samples `search_params` to train models |
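The difference between the two strategies can be sketched in plain Python. This only illustrates the idea and is not how `pgml.train` or scikit-learn implements the search; the helper names `grid_candidates` and `random_candidates`, the `n_iter` and `seed` parameters, and the sample values are all assumptions made for this example.

```python
import itertools
import random


def grid_candidates(search_params):
    """Yield every combination of the listed hyperparameter values."""
    names = list(search_params)
    for values in itertools.product(*(search_params[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(search_params, n_iter, seed=0):
    """Return n_iter candidates, each drawn independently from the listed values."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in search_params.items()}
        for _ in range(n_iter)
    ]


# Hypothetical search space, for illustration only.
search_params = {"max_depth": [1, 2, 3], "n_estimators": [20, 40]}

grid = list(grid_candidates(search_params))
sampled = random_candidates(search_params, n_iter=4)

print(len(grid))     # 3 * 2 = 6 candidates
print(len(sampled))  # 4 candidates
```

The grid grows multiplicatively with every hyperparameter added, while the random variant trains a fixed number of candidates regardless of the size of the space. This sketch samples with replacement, so duplicates are possible; scikit-learn's own sampler may behave differently.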

You may pass any of the arguments listed in the algorithms documentation as hyperparameters. See Algorithms for the complete list of algorithms and their associated documentation.

## Example

This grid search will train `len(max_depth) * len(n_estimators) * len(learning_rate) = 6 * 4 * 4 = 96` combinations to compare all possible permutations of the `search_params`. It takes a couple of minutes on my computer, but you can delete some values if you want to speed things up. I like to watch all cores operate at 100% utilization in a separate terminal with `htop`.
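The arithmetic above can be checked with a few lines of Python. Only the list lengths (6, 4 and 4) come from the text; the specific values below are placeholders assumed for illustration.

```python
import itertools
import math

# Placeholder values; only the lengths (6, 4, 4) are taken from the text above.
search_params = {
    "max_depth": [1, 2, 3, 4, 5, 6],
    "n_estimators": [20, 40, 80, 160],
    "learning_rate": [0.1, 0.2, 0.4, 0.8],
}

# One model is trained per combination, so the count is the product of the list lengths.
n_models = math.prod(len(values) for values in search_params.values())

# Enumerating the grid explicitly gives the same number.
combinations = list(itertools.product(*search_params.values()))

print(n_models)           # 96
print(len(combinations))  # 96
```

Removing a single value from any list shrinks the total by that list's share: dropping one `max_depth` value, for instance, leaves `5 * 4 * 4 = 80` models to train.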

As you can see from the output, a new model has been deployed with better performance. A new analysis of this model is also available in the dashboard.