# Algorithm Selection
We currently support regression and classification algorithms from scikit-learn, XGBoost, and LightGBM.
## Algorithms

### Gradient Boosting
| Algorithm | Regression | Classification |
|---|---|---|
| `xgboost` | `XGBRegressor` | `XGBClassifier` |
| `xgboost_random_forest` | `XGBRFRegressor` | `XGBRFClassifier` |
| `lightgbm` | `LGBMRegressor` | `LGBMClassifier` |
### Scikit Ensembles
| Algorithm | Regression | Classification |
|---|---|---|
| `ada_boost` | `AdaBoostRegressor` | `AdaBoostClassifier` |
| `bagging` | `BaggingRegressor` | `BaggingClassifier` |
| `extra_trees` | `ExtraTreesRegressor` | `ExtraTreesClassifier` |
| `gradient_boosting_trees` | `GradientBoostingRegressor` | `GradientBoostingClassifier` |
| `random_forest` | `RandomForestRegressor` | `RandomForestClassifier` |
| `hist_gradient_boosting` | `HistGradientBoostingRegressor` | `HistGradientBoostingClassifier` |
### Support Vector Machines
| Algorithm | Regression | Classification |
|---|---|---|
| `svm` | `SVR` | `SVC` |
| `nu_svm` | `NuSVR` | `NuSVC` |
| `linear_svm` | `LinearSVR` | `LinearSVC` |
### Linear Models
| Algorithm | Regression | Classification |
|---|---|---|
| `linear` | `LinearRegression` | `LogisticRegression` |
| `ridge` | `Ridge` | `RidgeClassifier` |
| `lasso` | `Lasso` | - |
| `elastic_net` | `ElasticNet` | - |
| `least_angle` | `Lars` | - |
| `lasso_least_angle` | `LassoLars` | - |
| `orthogonal_matching_pursuit` | `OrthogonalMatchingPursuit` | - |
| `bayesian_ridge` | `BayesianRidge` | - |
| `automatic_relevance_determination` | `ARDRegression` | - |
| `stochastic_gradient_descent` | `SGDRegressor` | `SGDClassifier` |
| `perceptron` | - | `Perceptron` |
| `passive_aggressive` | `PassiveAggressiveRegressor` | `PassiveAggressiveClassifier` |
| `ransac` | `RANSACRegressor` | - |
| `theil_sen` | `TheilSenRegressor` | - |
| `huber` | `HuberRegressor` | - |
| `quantile` | `QuantileRegressor` | - |
### Other
| Algorithm | Regression | Classification |
|---|---|---|
| `kernel_ridge` | `KernelRidge` | - |
| `gaussian_process` | `GaussianProcessRegressor` | `GaussianProcessClassifier` |
## Comparing Algorithms
Any of the above algorithms can be passed to our `pgml.train()` function using the `algorithm` parameter. If the parameter is omitted, the default `linear` algorithm is used.
```sql
SELECT * FROM pgml.train(
    'My First PostgresML Project',
    task => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm => 'xgboost'
);
```
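For comparison, here is a minimal sketch of the same call with `algorithm` omitted; per the default noted above, this falls back to `linear`, which maps to `LogisticRegression` for a classification task (see the Linear Models table):

```sql
-- No algorithm specified: PostgresML falls back to the default 'linear'
-- algorithm, i.e. LogisticRegression for this classification task.
SELECT * FROM pgml.train(
    'My First PostgresML Project',
    task => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target'
);
```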
The `hyperparams` argument passes hyperparameters through to the algorithm. Take a look at the associated documentation for the valid hyperparameters of each algorithm. Our interface uses the scikit-learn notation for all parameters.
```sql
SELECT * FROM pgml.train(
    'My First PostgresML Project',
    algorithm => 'xgboost',
    hyperparams => '{
        "n_estimators": 25
    }'
);
```
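Multiple hyperparameters can be combined in the same JSON object. Here is a sketch with a few common XGBoost parameters in scikit-learn notation; the values are illustrative, not tuned recommendations:

```sql
-- Illustrative values only; see the XGBoost documentation for tuning guidance.
SELECT * FROM pgml.train(
    'My First PostgresML Project',
    algorithm => 'xgboost',
    hyperparams => '{
        "n_estimators": 100,
        "max_depth": 6,
        "learning_rate": 0.1
    }'
);
```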
Once prepared, the training data can be efficiently reused by other PostgresML algorithms for training and predictions. Every time the `pgml.train()` function receives the `relation_name` and `y_column_name` arguments, it will create a new snapshot of the relation (table) and save it in the `pgml` schema.
To train another algorithm on the same dataset, omit the two arguments. PostgresML will reuse the latest snapshot with the new algorithm.
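For example, assuming the XGBoost model above was trained first, a minimal sketch of training additional algorithms on the same snapshot:

```sql
-- relation_name and y_column_name are omitted, so each call reuses the
-- latest snapshot of pgml.digits instead of snapshotting the table again.
SELECT * FROM pgml.train('My First PostgresML Project', algorithm => 'random_forest');
SELECT * FROM pgml.train('My First PostgresML Project', algorithm => 'lightgbm');
```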
Try experimenting with multiple algorithms to explore their performance characteristics on your dataset. It's often hard to know in advance which algorithm will perform best.
## Dashboard
The PostgresML dashboard makes it easy to compare various algorithms on your dataset. You can explore individual metrics & compare algorithms to each other, all trained on the same dataset for a fair benchmark.
## Have Questions?
Join our Discord and ask us anything! We're friendly and would love to talk about PostgresML.
## Try It Out
Try PostgresML using our free serverless cloud. It comes with GPUs, 5 GiB of space, and plenty of datasets to get you started.