Hyperparameter Search¶
Models can be further refined with the scikit-learn cross-validation hyperparameter search libraries. We currently support the grid and random implementations.
Visual Analysis¶
The optimal set of hyperparameters will be chosen for the model, and that combination is highlighted in the dashboard among all search candidates. The impact of each hyperparameter is measured against the key metric, as well as the training and test times. In this particular case, it's interesting that as `max_depth` increases, the "Test Score" on the key metric trends lower, so the smallest value of `max_depth` is chosen to maximize the "Test Score". Luckily, the smallest `max_depth` values also have the fastest "Fit Time", indicating that we pay less for training these higher quality models. It's a little less obvious how the different values of `n_estimators` and `learning_rate` impact the test score. We may want to rerun our search and zoom in or out in the search space to get more insight.
API¶
The arguments to `pgml.train` that begin with `search` are used for hyperparameter tuning.
- `search` can be either `grid` or `random`.
- `search_params` is the set of hyperparameters to search for your algorithm.
- `search_args` are passed to the scikit-learn model selection algorithm for extra configuration.
search | description |
---|---|
grid | Trains every combination of search_params |
random | Randomly samples search_params to train models |
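The difference between the two strategies can be sketched in plain Python. This is only an illustration of the idea, not how `pgml.train` or scikit-learn implement it: the helper names `grid_candidates` and `random_candidates` are hypothetical, the parameter values are made up, and `n_iter` is borrowed from scikit-learn's `RandomizedSearchCV` as the name for the number of samples.

```python
import itertools
import random


def grid_candidates(search_params):
    """Return every combination of the listed values, one dict per candidate."""
    names = list(search_params)
    value_lists = (search_params[name] for name in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_candidates(search_params, n_iter, seed=None):
    """Return a random subset of the grid, drawn without replacement."""
    grid = grid_candidates(search_params)
    rng = random.Random(seed)
    return rng.sample(grid, min(n_iter, len(grid)))


# Hypothetical values, chosen only to keep the example small.
search_params = {"max_depth": [1, 2, 3], "n_estimators": [10, 20]}

grid = grid_candidates(search_params)
sampled = random_candidates(search_params, n_iter=4, seed=0)

print(len(grid))     # 6 candidates: 3 * 2
print(len(sampled))  # 4 candidates, all taken from the grid
```

A grid search grows multiplicatively with every value added, while a random search trains a fixed number of candidates regardless of how large the search space is.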
You may pass any of the arguments listed in the algorithms documentation as hyperparameters. See Algorithms for the complete list of algorithms and their associated documentation.
Example¶
This grid search will train `len(max_depth) * len(n_estimators) * len(learning_rate) = 6 * 4 * 4 = 96` models, one for every combination of the values in `search_params`. It takes a couple of minutes on my computer, but you can delete some values if you want to speed things up. I like to watch all cores operate at 100% utilization in a separate terminal with `htop`.
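To estimate how long a grid search will run before starting it, the number of candidates can be counted up front. The sketch below is a rough illustration: only the counts (6, 4 and 4) come from the text, the candidate values themselves are hypothetical, and the JSON layout (one list of values per hyperparameter) is an assumption about how `search_params` is written.

```python
import json
import math

# Hypothetical candidate values; only the counts (6, 4 and 4) come from the text.
search_params = {
    "max_depth": [1, 2, 3, 4, 6, 8],
    "n_estimators": [20, 40, 80, 160],
    "learning_rate": [0.1, 0.2, 0.4, 0.8],
}

# A grid search trains one model per combination of values.
combinations = math.prod(len(values) for values in search_params.values())
print(combinations)  # 96

# Deleting values shrinks the search multiplicatively.
smaller = {name: values[:2] for name, values in search_params.items()}
smaller_combinations = math.prod(len(values) for values in smaller.values())
print(smaller_combinations)  # 8

# Assumed format: a JSON object mapping each hyperparameter to its candidates.
print(json.dumps(smaller))
```

Keeping two values per hyperparameter cuts the search from 96 models to 8, which is usually the quickest way to get a first result before widening the grid again.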
As you can see from the output, a new model has been deployed with better performance. A new analysis of this model is also available in the dashboard.