Collections

Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines.

Creating Collections

By default, collections read from and write to the database specified by the PGML_DATABASE_URL environment variable.

Default PGML_DATABASE_URL

const collection = pgml.newCollection("test_collection")

collection = Collection("test_collection")

Custom PGML_DATABASE_URL

To use a database other than the one set by the PGML_DATABASE_URL environment variable, pass its connection URL as the second argument when creating the Collection.

const collection = pgml.newCollection("test_collection", CUSTOM_DATABASE_URL)

collection = Collection("test_collection", CUSTOM_DATABASE_URL)
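
The choice between the default and a custom database can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: resolve_database_url is a hypothetical function, the URLs are placeholders, and the helper only decides which URL string would be handed to Collection.

```python
import os
from typing import Mapping, Optional


def resolve_database_url(
    custom_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the custom URL if one is given, else PGML_DATABASE_URL from env.

    Hypothetical helper for illustration; the SDK does this lookup itself
    when no URL is passed.
    """
    if custom_url:
        return custom_url
    return env.get("PGML_DATABASE_URL")


# An explicit mapping stands in for the real environment so the result is predictable.
fake_env = {"PGML_DATABASE_URL": "postgres://user:pass@localhost:5432/default_db"}

default_url = resolve_database_url(env=fake_env)
custom_url = resolve_database_url(
    "postgres://user:pass@localhost:5432/other_db", env=fake_env
)
```

Here default_url falls back to the environment value, while custom_url ignores it.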

Upserting Documents

Documents are dictionaries with one required key: id. All other key/value pairs are stored and can be chunked, embedded, broken into tsvectors, and searched over as specified by a Pipeline.

const documents = [
  {
    id: "document_one",
    title: "Document One",
    text: "document one contents...",
    random_key: "here is some random data",
  },
  {
    id: "document_two",
    title: "Document Two",
    text: "document two contents...",
    random_key: "here is some random data",
  },
];
await collection.upsert_documents(documents);

documents = [
    {
        "id": "document_one",
        "title": "Document One",
        "text": "Here are the contents of Document 1",
        "random_key": "here is some random data",
    },
    {
        "id": "document_two",
        "title": "Document Two",
        "text": "Here are the contents of Document 2",
        "random_key": "here is some random data",
    },
]
await collection.upsert_documents(documents)
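
Because id is the only required key, it can be worth checking a batch before sending it. The sketch below is an illustration only: find_documents_missing_id is a hypothetical helper, not an SDK function, and it makes no claim about how the SDK itself reports a missing id.

```python
from typing import Any, Dict, List


def find_documents_missing_id(documents: List[Dict[str, Any]]) -> List[int]:
    """Return the list positions of documents that have no "id" key."""
    return [
        index
        for index, document in enumerate(documents)
        if "id" not in document
    ]


batch = [
    {"id": "document_one", "title": "Document One"},
    {"title": "This document has no id"},
]

# Positions of documents that should be fixed before upserting.
missing = find_documents_missing_id(batch)
```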

Documents can be replaced by upserting documents with the same id.

const documents = [
  {
    id: "document_one",
    title: "Document One New Title",
    text: "Here is some new text for document one",
    random_key: "here is some new random data",
  },
  {
    id: "document_two",
    title: "Document Two New Title",
    text: "Here is some new text for document two",
    random_key: "here is some new random data",
  },
];
await collection.upsert_documents(documents);

documents = [
    {
        "id": "document_one",
        "title": "Document One",
        "text": "Here is some new text for document one",
        "random_key": "here is some random data",
    },
    {
        "id": "document_two",
        "title": "Document Two",
        "text": "Here is some new text for document two",
        "random_key": "here is some random data",
    },
]
await collection.upsert_documents(documents)

Documents can be merged by setting the merge option. On conflict, new document keys will override old document keys.

const documents = [
  {
    id: "document_one",
    new_key: "this will be a new key in document one",
    random_key: "this will replace old random_key",
  },
  {
    id: "document_two",
    new_key: "this will be a new key in document two",
    random_key: "this will replace old random_key",
  },
];
await collection.upsert_documents(documents, {
  merge: true,
});

documents = [
    {
        "id": "document_one",
        "new_key": "this will be a new key in document one",
        "random_key": "this will replace old random_key",
    },
    {
        "id": "document_two",
        "new_key": "this will be a new key in document two",
        "random_key": "this will replace old random_key",
    },
]
await collection.upsert_documents(documents, {"merge": True})
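
The difference between replacing and merging can be modeled with plain dictionaries. This is a sketch of the semantics described above, not the SDK's implementation: upsert_model and the in-memory store are hypothetical, and the merge is assumed to be shallow (top-level keys only).

```python
from typing import Any, Dict

Document = Dict[str, Any]


def upsert_model(store: Dict[str, Document], new_doc: Document, merge: bool = False) -> None:
    """Model an upsert into an in-memory store keyed by document id."""
    doc_id = new_doc["id"]
    if merge and doc_id in store:
        # Old keys are kept; new keys win on conflict (shallow merge, an assumption).
        store[doc_id] = {**store[doc_id], **new_doc}
    else:
        # Without merge, the stored document is replaced wholesale.
        store[doc_id] = dict(new_doc)


store: Dict[str, Document] = {}
upsert_model(store, {"id": "document_one", "title": "Document One", "random_key": "old"})

# Merge: "title" survives, "new_key" is added, "random_key" is overridden.
upsert_model(store, {"id": "document_one", "new_key": "added", "random_key": "new"}, merge=True)
merged = dict(store["document_one"])

# Replace: only the keys of the new document remain.
upsert_model(store, {"id": "document_one", "text": "replacement"})
replaced = dict(store["document_one"])
```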

Getting Documents

Documents can be retrieved using the get_documents method on the collection object.

const documents = await collection.get_documents({ limit: 100 })

documents = await collection.get_documents({ "limit": 100 })

Paginating Documents

The SDK supports limit-offset pagination and keyset pagination.

Limit-Offset Pagination

const documents = await collection.get_documents({ limit: 100, offset: 10 })

documents = await collection.get_documents({ "limit": 100, "offset": 10 })
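
To walk through a whole collection with limit-offset pagination, the offset is advanced by the page size until a short page comes back. The sketch below assumes only the limit and offset keys shown above; FakeCollection is an in-memory stand-in written for this example, not the SDK class, so the loop can run without a database.

```python
import asyncio
from typing import Any, Dict, List


class FakeCollection:
    """In-memory stand-in for a collection (hypothetical, for illustration)."""

    def __init__(self, total: int) -> None:
        self._rows = [{"row_id": i, "id": f"document_{i}"} for i in range(1, total + 1)]

    async def get_documents(self, args: Dict[str, Any]) -> List[Dict[str, Any]]:
        offset = args.get("offset", 0)
        limit = args["limit"]
        return self._rows[offset : offset + limit]


async def fetch_all_by_offset(collection: Any, page_size: int) -> List[Dict[str, Any]]:
    """Request pages until one comes back shorter than page_size."""
    results: List[Dict[str, Any]] = []
    offset = 0
    while True:
        page = await collection.get_documents({"limit": page_size, "offset": offset})
        results.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return results


all_documents = asyncio.run(fetch_all_by_offset(FakeCollection(total=25), page_size=10))
```

With a real collection, the same loop would be awaited inside the application's own async code rather than started with asyncio.run at module level.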

Keyset Pagination

const documents = await collection.get_documents({ limit: 100, last_row_id: 10 })

documents = await collection.get_documents({ "limit": 100, "last_row_id": 10 })

The last_row_id can be taken from the row_id field in the returned document's dictionary. Keyset pagination does not currently work when specifying the order_by key.
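
A keyset loop feeds the row_id of the last document on each page back in as last_row_id. The following sketch assumes that get_documents returns documents with a row_id greater than last_row_id, in ascending order; FakeCollection is a hypothetical in-memory stand-in, not the SDK class.

```python
import asyncio
from typing import Any, Dict, List


class FakeCollection:
    """In-memory stand-in for a collection (hypothetical, for illustration)."""

    def __init__(self, total: int) -> None:
        self._rows = [{"row_id": i, "id": f"document_{i}"} for i in range(1, total + 1)]

    async def get_documents(self, args: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Assumed semantics: return rows strictly after last_row_id.
        last_row_id = args.get("last_row_id", 0)
        after = [row for row in self._rows if row["row_id"] > last_row_id]
        return after[: args["limit"]]


async def fetch_all_by_keyset(collection: Any, page_size: int) -> List[Dict[str, Any]]:
    """Page through the collection using the last seen row_id as the cursor."""
    results: List[Dict[str, Any]] = []
    args: Dict[str, Any] = {"limit": page_size}
    while True:
        page = await collection.get_documents(args)
        if not page:
            break
        results.extend(page)
        # The cursor for the next page is the row_id of the last document seen.
        args = {"limit": page_size, "last_row_id": page[-1]["row_id"]}
    return results


keyset_documents = asyncio.run(fetch_all_by_keyset(FakeCollection(total=25), page_size=10))
```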

Filtering Documents

Documents can be filtered by passing in the filter key.

const documents = await collection.get_documents({
  limit: 10,
  filter: {
    id: {
      $eq: "document_one"
    }
  }
})

documents = await collection.get_documents(
    {
        "limit": 100,
        "filter": {
            "id": {"$eq": "document_one"},
        },
    }
)
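
The shape of the filter above can be read as "for each key, the document's value must equal the $eq operand". The sketch below models only that one operator on plain dictionaries; matches_eq_filter is a hypothetical helper, and it says nothing about other operators or about how the SDK evaluates filters in the database.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def matches_eq_filter(document: Document, filter_spec: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if every {key: {"$eq": value}} condition holds for the document."""
    for key, condition in filter_spec.items():
        if set(condition) != {"$eq"}:
            raise ValueError(f"this sketch only models $eq, got {sorted(condition)}")
        if document.get(key) != condition["$eq"]:
            return False
    return True


sample_documents: List[Document] = [
    {"id": "document_one", "title": "Document One"},
    {"id": "document_two", "title": "Document Two"},
]
filter_spec = {"id": {"$eq": "document_one"}}

# Keep only the documents that satisfy the filter.
matched = [doc for doc in sample_documents if matches_eq_filter(doc, filter_spec)]
```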

Sorting Documents

Documents can be sorted on any key. Sorting does not currently work well with keyset pagination; when paginating and sorting together, use limit-offset pagination.

const documents = await collection.get_documents({
  limit: 100,
  offset: 10,
  order_by: {
    id: "desc"
  }
})

documents = await collection.get_documents({
    "limit": 100,
    "offset": 10,
    "order_by": {
        "id": "desc"
    }
})

Deleting Documents

Documents can be deleted with the delete_documents method on the collection object.

const documents = await collection.delete_documents({
  id: {
    $eq: 1
  }
})

documents = await collection.delete_documents(
    {
        "id": {"$eq": 1},
    }
)