Vector Operations
PostgresML adds optimized vector operations that can be used inside SQL queries. They are particularly useful for working with embeddings generated by other machine learning models, for example to run nearest neighbor searches using various distance functions.
Embeddings can be a relatively efficient way to leverage deep learning without paying its runtime inference costs. These functions are fast: even the most expensive distance functions compute roughly 100k or more distances per second on a memory-resident dataset on modern hardware.
On larger datasets, the PostgreSQL planner can also parallelize evaluation automatically across multiple CPU cores, if it is configured to use them.
Vector operations are implemented in Rust using ndarray and BLAS for maximum performance.
Element-wise Arithmetic with Constants
Addition
pgml.add(a REAL[], b REAL) -> REAL[]
SELECT pgml.add(ARRAY[1.0, 2.0, 3.0], 3);
pgml=# SELECT pgml.add(ARRAY[1.0, 2.0, 3.0], 3);
add
---------
{4,5,6}
(1 row)
Subtraction
pgml.subtract(minuend REAL[], subtrahend REAL) -> REAL[]
Multiplication
pgml.multiply(multiplicand REAL[], multiplier REAL) -> REAL[]
Division
pgml.divide(dividend REAL[], divisor REAL) -> REAL[]
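To make the semantics concrete, here is a plain-Python sketch of what the constant-operand functions compute: the scalar is applied to every element of the array. The Python functions are hypothetical stand-ins named after the pgml functions, not the PostgresML implementation. PostgresML works on 32-bit REAL values, so its results may differ from these 64-bit Python floats in the last digits.

```python
from typing import List

# Hypothetical stand-ins for pgml.add/subtract/multiply/divide with a scalar operand.
def add(a: List[float], b: float) -> List[float]:
    # Add the constant to every element.
    return [x + b for x in a]

def subtract(minuend: List[float], subtrahend: float) -> List[float]:
    # Subtract the constant from every element.
    return [x - subtrahend for x in minuend]

def multiply(multiplicand: List[float], multiplier: float) -> List[float]:
    # Scale every element by the constant.
    return [x * multiplier for x in multiplicand]

def divide(dividend: List[float], divisor: float) -> List[float]:
    # Divide every element by the constant; a zero divisor raises
    # ZeroDivisionError here (PostgresML's behavior may differ).
    return [x / divisor for x in dividend]

print(add([1.0, 2.0, 3.0], 3))  # [4.0, 5.0, 6.0]
```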
Pairwise Arithmetic with Vectors
Addition
pgml.add(a REAL[], b REAL[]) -> REAL[]
Subtraction
pgml.subtract(minuend REAL[], subtrahend REAL[]) -> REAL[]
Multiplication
pgml.multiply(multiplicand REAL[], multiplier REAL[]) -> REAL[]
Division
pgml.divide(dividend REAL[], divisor REAL[]) -> REAL[]
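With two arrays, the operation pairs up elements at the same position. The sketch below uses hypothetical Python stand-ins named after the pgml functions; it assumes both arrays must have the same length and raises an error otherwise, which is an assumption rather than documented PostgresML behavior.

```python
from typing import List

def _check_lengths(a: List[float], b: List[float]) -> None:
    # Assumption: mismatched lengths are an error.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

def add(a: List[float], b: List[float]) -> List[float]:
    _check_lengths(a, b)
    return [x + y for x, y in zip(a, b)]

def subtract(minuend: List[float], subtrahend: List[float]) -> List[float]:
    _check_lengths(minuend, subtrahend)
    return [x - y for x, y in zip(minuend, subtrahend)]

def multiply(multiplicand: List[float], multiplier: List[float]) -> List[float]:
    _check_lengths(multiplicand, multiplier)
    return [x * y for x, y in zip(multiplicand, multiplier)]

def divide(dividend: List[float], divisor: List[float]) -> List[float]:
    _check_lengths(dividend, divisor)
    return [x / y for x, y in zip(dividend, divisor)]

print(add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [5.0, 7.0, 9.0]
```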
Norms
Number of non-zero elements
pgml.norm_l0(vector REAL[]) -> REAL
Manhattan distance from origin
pgml.norm_l1(vector REAL[]) -> REAL
Euclidean distance from origin
pgml.norm_l2(vector REAL[]) -> REAL
Largest absolute value of any element
pgml.norm_max(vector REAL[]) -> REAL
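The four norms can be checked by hand against a small vector. This plain-Python sketch uses hypothetical stand-ins named after the pgml functions; it is not the PostgresML implementation. Note that the "L0 norm" simply counts non-zero elements and is not a true norm.

```python
import math
from typing import List

def norm_l0(vector: List[float]) -> float:
    # Count the elements that are not zero.
    return float(sum(1 for x in vector if x != 0))

def norm_l1(vector: List[float]) -> float:
    # Manhattan length: sum of absolute values.
    return sum(abs(x) for x in vector)

def norm_l2(vector: List[float]) -> float:
    # Euclidean length: square root of the sum of squares.
    return math.sqrt(sum(x * x for x in vector))

def norm_max(vector: List[float]) -> float:
    # Largest absolute value of any element.
    return max(abs(x) for x in vector)

v = [3.0, -4.0, 0.0]
print(norm_l0(v), norm_l1(v), norm_l2(v), norm_max(v))  # 2.0 7.0 5.0 4.0
```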
Normalization
Absolute values sum to 1 (L1 normalization)
pgml.normalize_l1(vector REAL[]) -> REAL[]
Unit vector (L2 normalization)
pgml.normalize_l2(vector REAL[]) -> REAL[]
Values scaled to the range [-1, 1]
pgml.normalize_max(vector REAL[]) -> REAL[]
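A minimal sketch, assuming each normalization divides the vector by the matching norm (L1, L2 or max). The Python functions are hypothetical stand-ins named after the pgml functions. A zero vector raises ZeroDivisionError here; how PostgresML handles that case is not covered by this sketch.

```python
import math
from typing import List

def normalize_l1(vector: List[float]) -> List[float]:
    # Divide by the L1 norm so the absolute values sum to 1.
    n = sum(abs(x) for x in vector)
    return [x / n for x in vector]

def normalize_l2(vector: List[float]) -> List[float]:
    # Divide by the L2 norm so the result has Euclidean length 1.
    n = math.sqrt(sum(x * x for x in vector))
    return [x / n for x in vector]

def normalize_max(vector: List[float]) -> List[float]:
    # Divide by the largest absolute value so every element lies in [-1, 1].
    n = max(abs(x) for x in vector)
    return [x / n for x in vector]

v = [3.0, -4.0]
print(normalize_l2(v))   # [0.6, -0.8]
print(normalize_max(v))  # [0.75, -1.0]
```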
Distances
Manhattan
pgml.distance_l1(a REAL[], b REAL[]) -> REAL
Euclidean
pgml.distance_l2(a REAL[], b REAL[]) -> REAL
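Both distances compare the two vectors element by element. In this plain-Python sketch, the functions are hypothetical stand-ins named after the pgml functions, and requiring equal lengths is an assumption.

```python
import math
from typing import List

def _check_lengths(a: List[float], b: List[float]) -> None:
    # Assumption: mismatched lengths are an error.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

def distance_l1(a: List[float], b: List[float]) -> float:
    # Manhattan: sum of absolute per-element differences.
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_l2(a: List[float], b: List[float]) -> float:
    # Euclidean: square root of the sum of squared differences.
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0]
b = [4.0, 6.0]
print(distance_l1(a, b), distance_l2(a, b))  # 7.0 5.0
```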
Projection
pgml.dot_product(a REAL[], b REAL[]) -> REAL
Direction
pgml.cosine_similarity(a REAL[], b REAL[]) -> REAL
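Cosine similarity is the dot product divided by the product of the two Euclidean lengths. It depends only on direction: 1 means the same direction, 0 orthogonal, -1 opposite. The sketch below uses hypothetical Python stand-ins named after the pgml functions and is not the PostgresML implementation.

```python
import math
from typing import List

def dot_product(a: List[float], b: List[float]) -> float:
    # Sum of per-element products.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: List[float], b: List[float]) -> float:
    # dot(a, b) / (|a| * |b|); undefined (ZeroDivisionError) for a zero vector.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(cosine_similarity([1.0, 0.0], [3.0, 0.0]))      # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))      # 0.0
print(cosine_similarity([1.0, 0.0], [-2.0, 0.0]))     # -1.0
```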
Nearest Neighbor Example
If we had precalculated the embeddings for a set of user and product data, we could find the 100 best products for a user with a similarity search.
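SELECT
    products.id,
    pgml.cosine_similarity(
        users.embedding,
        products.embedding
    ) AS similarity
FROM users
CROSS JOIN products  -- compare the user against every product
WHERE users.id = 123
ORDER BY similarity DESC  -- higher cosine similarity means a closer match
LIMIT 100;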
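The similarity search described above is a brute-force top-k ranking: score every product against the user's embedding and keep the k products with the highest cosine similarity. This plain-Python sketch illustrates that ranking outside the database; `top_k_products` and the toy embeddings are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_products(user_embedding: List[float],
                   products: Dict[int, List[float]],
                   k: int) -> List[Tuple[int, float]]:
    # Score every product (a full scan), then keep the k highest similarities.
    scored = ((cosine_similarity(user_embedding, emb), pid) for pid, emb in products.items())
    return [(pid, sim) for sim, pid in heapq.nlargest(k, scored)]

# Hypothetical toy embeddings keyed by product id.
products = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
print(top_k_products([1.0, 0.0], products, 2))  # product 1 first, then product 3
```

A full scan like this costs one similarity computation per product, which is why the per-second throughput of the distance functions matters for this kind of query.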
Have Questions?
Join our Discord and ask us anything! We're friendly and would love to talk about PostgresML.
Try It Out
Try PostgresML using our free serverless cloud. It comes with GPUs, 5 GiB of space and plenty of datasets to get you started.