PostgresML is a Postgres extension, so running it is very similar to running a self-hosted PostgreSQL database server. A typical architecture consists of a primary database that will serve reads and writes, optional replicas to scale reads horizontally, and a pooler to load balance connections.
At PostgresML, we prefer running Postgres on Ubuntu, mainly because of its extensive support for hardware architectures, packages, and drivers. The rest of this guide assumes Ubuntu 22.04, the current long-term support (LTS) release of Ubuntu, but you can run PostgresML pretty easily on any other flavor of Linux.
PostgresML for Ubuntu 22.04 can be downloaded directly from our APT repository. There is no need to install any additional dependencies or to compile from source.
To add our APT repository to your APT sources, run:
echo "deb [trusted=yes] https://apt.postgresml.org jammy main" | \
sudo tee -a /etc/apt/sources.list
The [trusted=yes] option tells APT to accept the repository without a signature. We don't sign our Debian packages because we rely on HTTPS to guarantee the authenticity of our binaries.
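Because tee -a appends unconditionally, running the command above a second time adds a duplicate entry, and APT will then warn that the target is configured multiple times. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper, not part of PostgresML, and the SUDO variable exists only so the helper can be exercised without root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is not already present.
# SUDO defaults to "sudo"; set SUDO="" when the file is writable without root (e.g. in tests).
append_once() {
  local line="$1" file="$2"
  # -q quiet, -x whole-line match, -F fixed string (so "[trusted=yes]" is not treated as a regex)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" | ${SUDO-sudo} tee -a "$file" > /dev/null
  fi
}

# Usage:
# append_once "deb [trusted=yes] https://apt.postgresml.org jammy main" /etc/apt/sources.list
```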
Once you've added the repository, make sure to update APT:
sudo apt update
Finally, you can install PostgresML:
sudo apt install -y postgresml-14
Ubuntu 22.04 ships with PostgreSQL 14, but if you have a different version installed on your system, just change 14 in the package name to your Postgres version. We currently support all versions supported by the community: Postgres 12 through 15.
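To avoid hard-coding the version, you can ask the running server for server_version_num, which since Postgres 10 encodes the version as major * 10000 + minor (14.9 is 140009). The sketch below assumes Postgres is already installed and running; pgml_package_for is a hypothetical helper, and the 12 through 15 range simply mirrors the support statement above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map server_version_num (e.g. 140009 for 14.9) to a postgresml package name.
# Since Postgres 10, server_version_num is major * 10000 + minor.
pgml_package_for() {
  local major=$(( $1 / 10000 ))
  # Only Postgres 12 through 15 are supported, per this guide
  if (( major < 12 || major > 15 )); then
    echo "Postgres ${major} is not supported by the postgresml packages" >&2
    return 1
  fi
  echo "postgresml-${major}"
}

# Usage (-t: tuples only, -A: unaligned, so psql prints just the number):
# num="$(sudo -u postgres psql -tAc 'SHOW server_version_num')"
# pkg="$(pgml_package_for "$num")" && sudo apt install -y "$pkg"
```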
You should be able to connect to Postgres and install the extension into the database of your choice:
sudo -u postgres psql
postgres=# CREATE EXTENSION pgml;
INFO: Python version: 3.10.6 (main, Nov 2 2022, 18:53:38) [GCC 11.3.0]
INFO: Scikit-learn 1.1.3, XGBoost 1.7.1, LightGBM 3.3.3, NumPy 1.23.5
CREATE EXTENSION
postgres=#
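In scripts, it's more reliable to confirm the installation by querying the pg_extension catalog than by reading psql's INFO messages. A sketch, assuming the postgres OS user connects through local peer authentication (the Ubuntu default); extension_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed version of an extension in a database,
# or fail if the extension is not installed there.
extension_version() {
  local db="$1" ext="$2" version
  # -t: tuples only, -A: unaligned output, so psql prints just the value.
  # The extension name is interpolated directly into SQL; only pass trusted names.
  version="$(sudo -u postgres psql -d "$db" -tAc \
    "SELECT extversion FROM pg_extension WHERE extname = '${ext}'")"
  [ -n "$version" ] || return 1
  echo "$version"
}

# Usage:
# extension_version postgres pgml || echo "pgml is not installed"
```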
If you have access to NVIDIA GPUs and would like to use them to accelerate LLMs or XGBoost/LightGBM/CatBoost, you'll need to install CUDA and the matching drivers.
NVIDIA has an APT repository that can be added to your system pretty easily:
curl -LsSf \
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb \
-o /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
sudo apt update
sudo apt install -y cuda
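The NVIDIA kernel module creates /proc/driver/nvidia/version when it loads, so checking for that file is a quick way to tell whether the freshly installed driver is active or whether a reboot is still needed. A minimal sketch; nvidia_driver_loaded is a hypothetical name, and its optional path argument exists only so the check can be tested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed, and print the first line of the driver's version file,
# if the NVIDIA kernel module is loaded.
nvidia_driver_loaded() {
  local version_file="${1:-/proc/driver/nvidia/version}"
  [ -r "$version_file" ] || return 1
  # The first line names the loaded kernel module and its version
  head -n 1 "$version_file"
}

# Usage:
# nvidia_driver_loaded || echo "NVIDIA kernel module not loaded yet; try rebooting"
```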
Once installed, you should check your installation by running nvidia-smi:
$ nvidia-smi
Fri Oct  6 09:38:19 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.04              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti     On  | 00000000:08:00.0  On |                  N/A |
|  0%   41C    P8              28W / 290W |   1268MiB /  8192MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
It's important that the CUDA version and the NVIDIA driver version are compatible. When installing CUDA for the first time, it's common to have to reboot the system before both are detected successfully.
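The CUDA Version that nvidia-smi prints is the newest CUDA release the installed driver supports, so a conservative compatibility check is that the toolkit reported by nvcc is not newer than it. This is a sketch under that rule of thumb (CUDA's minor-version compatibility can relax it); the helper names are hypothetical, and nvcc is assumed to live in /usr/local/cuda/bin, which is not on the PATH by default.

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing the driver's supported CUDA version with the installed toolkit.

# Reads nvidia-smi output on stdin; prints e.g. "12.2" from "CUDA Version: 12.2".
driver_cuda_version() {
  sed -nE 's/.*CUDA Version: ([0-9]+(\.[0-9]+)*).*/\1/p' | head -n 1
}

# Reads `nvcc --version` output on stdin; prints e.g. "12.2" from "release 12.2, V12.2.140".
toolkit_cuda_version() {
  sed -nE 's/.*release ([0-9]+(\.[0-9]+)*),.*/\1/p' | head -n 1
}

# Succeeds if toolkit version $2 is not newer than driver-supported version $1.
cuda_compatible() {
  # sort -V compares version strings numerically, so the older version sorts first
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
# driver="$(nvidia-smi | driver_cuda_version)"
# toolkit="$(/usr/local/cuda/bin/nvcc --version | toolkit_cuda_version)"
# cuda_compatible "$driver" "$toolkit" || echo "CUDA $toolkit needs a newer driver (supports $driver)"
```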
pgvector is optimized for the CPU architecture of your machine, so it's best to compile it from source directly on the machine that will be using it.
pgvector has very few dependencies beyond the standard build toolchain. You can install all of them with this command:
sudo apt install -y \
build-essential \
postgresql-server-dev-14
Replace 14 in postgresql-server-dev-14 with your Postgres version.
You can install pgvector directly from GitHub by just running:
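git clone https://github.com/pgvector/pgvector /tmp/pgvector
# Pin a tagged release instead of building the development branch
git -C /tmp/pgvector checkout v0.5.0
# Mark as trusted (Postgres 13+) so non-superusers with CREATE privilege can install it
echo "trusted = true" >> "/tmp/pgvector/vector.control"
# Compile on this machine, then install into the Postgres that pg_config points to
make -C /tmp/pgvector
sudo make install -C /tmp/pgvector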
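If more than one Postgres version is installed, the pg_config on your PATH may not match the server you're targeting. pgvector's Makefile reads the PG_CONFIG variable, so you can point the build at a specific version. This sketch assumes Ubuntu's layout, where postgresql-server-dev-<version> installs pg_config under /usr/lib/postgresql/<version>/bin; build_pgvector and pg_config_for are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes Ubuntu's per-version layout for postgresql-server-dev-<major>.
pg_config_for() {
  printf '/usr/lib/postgresql/%s/bin/pg_config\n' "$1"
}

# Build and install pgvector from an already checked-out source tree for one Postgres major version.
build_pgvector() {
  local major="$1" src="${2:-/tmp/pgvector}" pgc
  pgc="$(pg_config_for "$major")"
  if [ ! -x "$pgc" ]; then
    echo "no pg_config at ${pgc}; is postgresql-server-dev-${major} installed?" >&2
    return 1
  fi
  # A variable given on the make command line overrides the Makefile's default
  make -C "$src" PG_CONFIG="$pgc"
  # sudo resets the environment, so pass PG_CONFIG as a make argument again
  sudo make -C "$src" install PG_CONFIG="$pgc"
}

# Usage:
# build_pgvector 14
```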
Once installed, you can create the extension in the database of your choice:
postgresml=# CREATE EXTENSION vector;
CREATE EXTENSION
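To check that the vector type and its operators work, a short smoke test can create a temporary table, insert two vectors, and order them by Euclidean (L2) distance with pgvector's <-> operator; the temporary table disappears when the session ends. A sketch, assuming the vector extension has already been created in a database named postgresml as in the prompt above, with local peer authentication for the postgres user; vector_smoke_test is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a throwaway pgvector query in a single psql session.
vector_smoke_test() {
  local db="${1:-postgresml}"
  # ON_ERROR_STOP makes psql exit non-zero on the first SQL error
  sudo -u postgres psql -d "$db" -v ON_ERROR_STOP=1 <<'SQL'
CREATE TEMP TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- <-> is L2 distance; [1,2,3] (id 1) is the nearest neighbor of [3,1,2]
SELECT id, embedding <-> '[3,1,2]' AS distance FROM items ORDER BY distance LIMIT 1;
SQL
}

# Usage:
# vector_smoke_test postgresml
```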