Quick Start w/ Docker
We've prepared a Docker image that lets you quickly spin up a new PostgreSQL database with PostgresML already installed. It also includes some scikit-learn toy datasets, so you can experiment with PostgresML workflows without having to import your own data.
You can skip to Installation for production installation instructions.
Install Docker for Linux. Some distributions (e.g. Ubuntu/Debian) additionally require the docker-compose package to be installed separately.
Install Docker for Windows. Use the Linux instructions if you're installing in Windows Subsystem for Linux.
- Clone the repo.
- Start the Dockerized services. PostgresML will run on port 5433, in case you already have Postgres running on its default port.
- Connect to Postgres in the container with PostgresML installed.
- Validate your installation.
- Browse the dashboard at http://localhost:8000/.
Note
If you'd like to preserve your database over multiple Docker sessions, use docker-compose stop or Ctrl+C when you shut down the containers. docker-compose down will remove the Docker volumes and completely reset the database.
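Once the containers are up, it can help to confirm that something is listening on the two ports mentioned above (5433 for Postgres, 8000 for the dashboard). The following is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical, and a successful TCP connection only shows that a service is listening, not that it is PostgresML.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the steps above: 5433 for Postgres, 8000 for the dashboard.
    for name, port in [("PostgresML", 5433), ("Dashboard", 8000)]:
        status = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {status}")
```

If either port is reported as not reachable, the containers may still be starting, or another process may be occupying the port.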
Basic Workflow
Here is a simple PostgresML example to get you started. We'll import a scikit-learn dataset, train a couple of models on it, and make real-time predictions, all using only SQL.
- Import the digits dataset.
- Train an XGBoost model:
    pgml=# SELECT * FROM pgml.train(
        'My First PostgresML Project',
        task => 'classification',
        relation_name => 'pgml.digits',
        y_column_name => 'target',
        algorithm => 'xgboost',
        hyperparams => '{ "n_estimators": 25 }'
    );
    INFO: Snapshotting table "pgml.digits", this may take a little while...
    INFO: Snapshot of table "pgml.digits" created and saved in "pgml"."snapshot_1"
    INFO: Dataset { num_features: 64, num_labels: 1, num_rows: 1797, num_train_rows: 1348, num_test_rows: 449 }
    INFO: Training Model { id: 15, algorithm: xgboost, runtime: rust }
    INFO: Hyperparameter searches: 1, cross validation folds: 1
    INFO: Hyperparams: { "n_estimators": 25 }
    INFO: Metrics: { "f1": 0.88522536, "precision": 0.8835865, "recall": 0.88687027, "accuracy": 0.8841871, "mcc": 0.87189955, "fit_time": 0.44059604, "score_time": 0.005983766 }
               project           |      task      | algorithm | deployed
    -----------------------------+----------------+-----------+----------
     My first PostgresML project | classification | xgboost   | t
    (1 row)
- Train a LightGBM model:
    pgml=# SELECT * FROM pgml.train(
        'My First PostgresML Project',
        task => 'classification',
        relation_name => 'pgml.digits',
        y_column_name => 'target',
        algorithm => 'lightgbm'
    );
    INFO: Snapshotting table "pgml.digits", this may take a little while...
    INFO: Snapshot of table "pgml.digits" created and saved in "pgml"."snapshot_18"
    INFO: Dataset { num_features: 64, num_labels: 1, num_rows: 1797, num_train_rows: 1348, num_test_rows: 449 }
    INFO: Training Model { id: 16, algorithm: lightgbm, runtime: rust }
    INFO: Hyperparameter searches: 1, cross validation folds: 1
    INFO: Hyperparams: {}
    INFO: Metrics: { "f1": 0.91579026, "precision": 0.915012, "recall": 0.9165698, "accuracy": 0.9153675, "mcc": 0.9063865, "fit_time": 0.27111048, "score_time": 0.004169579 }
               project           |      task      | algorithm | deployed
    -----------------------------+----------------+-----------+----------
     My first PostgresML project | classification | lightgbm  | t
    (1 row)
Looks like LightGBM did better with default hyperparameters. It's automatically deployed and will be used for inference.
- Infer a few data points in real time.
The following common machine learning tasks are performed automatically by PostgresML:
- Snapshot the data so the experiment is reproducible
- Split the dataset into train and test sets
- Train and validate the model
- Save it into the model store (a Postgres table)
- Load it and cache it during inference
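As an illustration of the split step, the row counts in the training logs (1797 rows, 1348 for training, 449 for testing) are consistent with a 25% holdout rounded down. The sketch below reproduces that arithmetic; the 0.25 fraction is an assumption inferred from the logged counts, not a setting stated in this section, and `split_sizes` is a hypothetical helper, not part of PostgresML.

```python
def split_sizes(num_rows: int, test_fraction: float = 0.25) -> tuple[int, int]:
    """Return (num_train_rows, num_test_rows) for a simple holdout split.

    The 0.25 default is an assumption inferred from the logged row counts.
    The test set size is rounded down; the remaining rows go to training.
    """
    num_test = int(num_rows * test_fraction)
    return num_rows - num_test, num_test


# 1797 is the num_rows value reported for pgml.digits in the training logs.
print(split_sizes(1797))  # (1348, 449)
```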
Check out our Training and Predictions documentation for more details. Documentation for more advanced topics, such as hyperparameter search and GPU acceleration, is available as well.
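To make the comparison between the two training runs concrete, the sketch below copies the metrics from the logs above and picks the run with the higher score. Ranking by F1 is an assumption made for illustration; this section does not state which metric PostgresML uses when deciding what to deploy, and `best_run` is a hypothetical helper.

```python
# Metrics copied from the two training logs above.
runs = {
    "xgboost": {"f1": 0.88522536, "precision": 0.8835865,
                "recall": 0.88687027, "accuracy": 0.8841871},
    "lightgbm": {"f1": 0.91579026, "precision": 0.915012,
                 "recall": 0.9165698, "accuracy": 0.9153675},
}


def best_run(runs: dict, metric: str = "f1") -> str:
    """Return the name of the run with the highest value for the given metric."""
    return max(runs, key=lambda name: runs[name][metric])


best = best_run(runs)
print(best)  # lightgbm

# Show how much LightGBM gains over XGBoost on each reported metric.
for metric in ("f1", "precision", "recall", "accuracy"):
    gain = runs["lightgbm"][metric] - runs["xgboost"][metric]
    print(f"{metric}: {gain:+.4f}")
```

On this dataset LightGBM is ahead on all four metrics, so the choice of ranking metric does not change the outcome here.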
Dashboard
The Dashboard app is available at http://localhost:8000/. You can use it to write experiments in Jupyter-style notebooks, manage projects, and visualize datasets used by PostgresML.