Vector Operations

PostgresML adds native vector operations that can be used in SQL queries. They are particularly useful for working with embeddings generated by other machine learning algorithms, and they support tasks such as nearest neighbor search using the distance functions.

Embeddings can be a relatively efficient mechanism to leverage the power of deep learning without the runtime inference costs. These functions are relatively fast: even the more expensive distance functions can perform ~100k computations per second for a memory-resident dataset on modern hardware.

The PostgreSQL planner will also automatically parallelize evaluation on larger datasets when it is configured to take advantage of multiple available CPU cores.
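As a rough sketch of how parallel evaluation might be inspected, the standard PostgreSQL setting `max_parallel_workers_per_gather` caps the number of workers per plan node, and `EXPLAIN` shows which plan the planner chose. The `products` table and its `embedding` column are hypothetical here, and whether a parallel plan is actually selected depends on table size and the planner's cost settings.

```sql
-- Allow up to 4 parallel workers per Gather node (a standard PostgreSQL setting).
SET max_parallel_workers_per_gather = 4;

-- Hypothetical table: products(id BIGINT, embedding REAL[]).
-- A "Gather" node over a "Parallel Seq Scan" in the output indicates a parallel plan.
EXPLAIN
SELECT id, pgml.norm_l2(embedding) AS magnitude
FROM products
ORDER BY magnitude DESC
LIMIT 10;
```

On a small table the planner will usually prefer a serial plan, so the parallel nodes may only appear once the dataset is large enough.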

Nearest neighbor example

If we had precalculated the embeddings for a set of user and product data, we could find the 100 best products for a user with a similarity search.

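SELECT
    products.id,
    pgml.cosine_similarity(users.embedding, products.embedding) AS similarity
FROM users
-- Compare the selected user against every product.
CROSS JOIN products
WHERE users.id = 123
-- Higher cosine similarity means a closer match, so sort descending.
ORDER BY similarity DESC
LIMIT 100;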
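The query above assumes that each table stores one embedding per row. A minimal sketch of such a schema follows, with the column names and types chosen here purely for illustration, together with a variant of the search that ranks by Euclidean distance instead of cosine similarity.

```sql
-- Hypothetical schema: one embedding per row, stored as a REAL[] array.
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    embedding REAL[]
);

CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    embedding REAL[]
);

-- Nearest neighbors by Euclidean distance: smaller is closer, so sort ascending.
SELECT
    products.id,
    pgml.distance_l2(users.embedding, products.embedding) AS distance
FROM users
CROSS JOIN products
WHERE users.id = 123
ORDER BY distance ASC
LIMIT 100;
```

Note the sort direction: a distance is ordered ascending to find the closest items, while a similarity score is ordered descending.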

Elementwise arithmetic w/ constants

Addition

pgml.add(a REAL[], b REAL) -> REAL[]

Subtraction

pgml.subtract(minuend REAL[], subtrahend REAL) -> REAL[]

Multiplication

pgml.multiply(multiplicand REAL[], multiplier REAL) -> REAL[]

Division

pgml.divide(dividend REAL[], divisor REAL) -> REAL[]
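A short sketch of the constant forms, using made-up literal values. The explicit casts are there so that PostgreSQL resolves the calls to the `REAL[]` / `REAL` signatures listed above rather than treating the literals as `NUMERIC`.

```sql
-- Apply the same constant to every element of the vector.
SELECT
    pgml.add(ARRAY[1.0, 2.0, 3.0]::REAL[], 3.0::REAL)      AS added,
    pgml.subtract(ARRAY[1.0, 2.0, 3.0]::REAL[], 3.0::REAL) AS subtracted,
    pgml.multiply(ARRAY[1.0, 2.0, 3.0]::REAL[], 3.0::REAL) AS multiplied,
    pgml.divide(ARRAY[1.0, 2.0, 3.0]::REAL[], 3.0::REAL)   AS divided;
```

Since the constant is applied to each element, `added` should come out as {4,5,6} and `multiplied` as {3,6,9}.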

Pairwise arithmetic w/ vectors

Addition

pgml.add(a REAL[], b REAL[]) -> REAL[]

Subtraction

pgml.subtract(minuend REAL[], subtrahend REAL[]) -> REAL[]

Multiplication

pgml.multiply(multiplicand REAL[], multiplier REAL[]) -> REAL[]

Division

pgml.divide(dividend REAL[], divisor REAL[]) -> REAL[]
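The pairwise forms combine two vectors element by element. The sketch below uses made-up literal values and assumes both arrays have the same length.

```sql
-- Combine the vectors position by position: a[1] with b[1], a[2] with b[2], and so on.
SELECT
    pgml.add(ARRAY[1.0, 2.0, 3.0]::REAL[], ARRAY[4.0, 5.0, 6.0]::REAL[])      AS added,
    pgml.subtract(ARRAY[1.0, 2.0, 3.0]::REAL[], ARRAY[4.0, 5.0, 6.0]::REAL[]) AS subtracted,
    pgml.multiply(ARRAY[1.0, 2.0, 3.0]::REAL[], ARRAY[4.0, 5.0, 6.0]::REAL[]) AS multiplied,
    pgml.divide(ARRAY[1.0, 2.0, 3.0]::REAL[], ARRAY[4.0, 5.0, 6.0]::REAL[])   AS divided;
```

With pairwise evaluation, `added` should come out as {5,7,9} and `multiplied` as {4,10,18}.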

Norms

Dimensions not at origin

pgml.norm_l0(vector REAL[]) -> REAL

Manhattan distance from origin

pgml.norm_l1(vector REAL[]) -> REAL 

Euclidean distance from origin

pgml.norm_l2(vector REAL[]) -> REAL 

Absolute value of largest element

pgml.norm_max(vector REAL[]) -> REAL 
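To make the four norms concrete, the following sketch evaluates each of them on the illustrative vector (3, 0, -4), chosen here because the results are easy to check by hand.

```sql
-- One vector, four norms. The CTE just avoids repeating the literal.
WITH v AS (
    SELECT ARRAY[3.0, 0.0, -4.0]::REAL[] AS vec
)
SELECT
    pgml.norm_l0(vec)  AS nonzero_dimensions,  -- count of non-zero elements
    pgml.norm_l1(vec)  AS manhattan,           -- sum of absolute values
    pgml.norm_l2(vec)  AS euclidean,           -- square root of the sum of squares
    pgml.norm_max(vec) AS largest_magnitude    -- largest absolute value
FROM v;
```

By the definitions above, these should evaluate to 2, 7, 5, and 4 respectively.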

Normalization

Unit vector under the L1 norm (absolute values sum to 1)

pgml.normalize_l1(vector REAL[]) -> REAL[]

Unit vector under the L2 norm (squared values sum to 1)

pgml.normalize_l2(vector REAL[]) -> REAL[]

Values scaled to the range -1 to 1

pgml.normalize_max(vector REAL[]) -> REAL[]
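One way to see what each normalization does is to normalize a vector and then measure the result with the corresponding norm. This is a sketch under the assumption that each `normalize_*` function divides the vector by the norm of the same name; if that holds, every value in the second row of columns should be close to 1.

```sql
-- Normalize an illustrative vector three ways, then check each result with its matching norm.
WITH v AS (
    SELECT ARRAY[3.0, 0.0, -4.0]::REAL[] AS vec
)
SELECT
    pgml.normalize_l1(vec)                 AS l1_normalized,
    pgml.normalize_l2(vec)                 AS l2_normalized,
    pgml.normalize_max(vec)                AS max_normalized,
    pgml.norm_l1(pgml.normalize_l1(vec))   AS l1_check,
    pgml.norm_l2(pgml.normalize_l2(vec))   AS l2_check,
    pgml.norm_max(pgml.normalize_max(vec)) AS max_check
FROM v;
```

Small deviations from 1 in the check columns are expected because `REAL` is single-precision floating point.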

Distances

Manhattan

pgml.distance_l1(a REAL[], b REAL[]) -> REAL

Euclidean

pgml.distance_l2(a REAL[], b REAL[]) -> REAL

Projection

pgml.dot_product(a REAL[], b REAL[]) -> REAL

Direction

pgml.cosine_similarity(a REAL[], b REAL[]) -> REAL
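As a small sketch of how the four measures differ, the query below compares two perpendicular unit vectors, (1, 0) and (0, 1). The values are illustrative only.

```sql
-- Two perpendicular unit vectors: far apart by distance, unrelated by direction.
WITH pair AS (
    SELECT
        ARRAY[1.0, 0.0]::REAL[] AS a,
        ARRAY[0.0, 1.0]::REAL[] AS b
)
SELECT
    pgml.distance_l1(a, b)       AS manhattan,    -- |1 - 0| + |0 - 1|
    pgml.distance_l2(a, b)       AS euclidean,    -- sqrt((1 - 0)^2 + (0 - 1)^2)
    pgml.dot_product(a, b)       AS projection,   -- 1*0 + 0*1
    pgml.cosine_similarity(a, b) AS direction     -- dot product divided by the product of the L2 norms
FROM pair;
```

For this pair the Manhattan distance is 2, the Euclidean distance is the square root of 2 (about 1.414), and both the dot product and the cosine similarity are 0, since the vectors are perpendicular.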