Semantic Search

This tutorial demonstrates using the pgml SDK to create a collection, add documents, build a pipeline for vector search, make a sample query, and archive the collection when finished. It loads sample data, indexes questions, times a semantic search query, and prints formatted results.

Imports and Setup

The SDK is imported and environment variables are loaded.

const pgml = require("pgml");

from pgml import Collection, Model, Splitter, Pipeline
from datasets import load_dataset
from dotenv import load_dotenv
import asyncio
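As a minimal sketch of the environment setup, the check below confirms that a connection string is present before any SDK call is made. It assumes the connection string is stored in a `DATABASE_URL` environment variable, which is not stated above, and `require_env` is a hypothetical helper that is not part of pgml.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the pgml SDK.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return value


if __name__ == "__main__":
    # Assumption: the connection string lives in DATABASE_URL.
    # With python-dotenv, load_dotenv() would populate os.environ from .env first.
    try:
        url = require_env("DATABASE_URL")
        print("Connection string found, length:", len(url))
    except RuntimeError as err:
        print(err)
```

Failing early with a named variable tends to be easier to debug than a connection error raised later inside an async call.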

Initialize Collection

A collection object is created to represent the search collection.

const main = async () => {
  const collection = pgml.newCollection("my_javascript_collection");

async def main():
    collection = Collection("my_collection")

Create Pipeline

A pipeline encapsulating a model and splitter is created and added to the collection.

  const model = pgml.newModel();
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline("my_javascript_pipeline", model, splitter);
  await collection.add_pipeline(pipeline);

    model = Model()
    splitter = Splitter()
    pipeline = Pipeline("my_pipeline", model, splitter)
    await collection.add_pipeline(pipeline)
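To give a rough idea of what a splitter contributes to a pipeline, the sketch below cuts text into fixed-size, overlapping chunks that could each be embedded separately. It is an illustration only: `split_text`, its parameters, and the fixed-size strategy are assumptions, and the SDK's default splitter may work differently.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears intact in one of the two neighbouring chunks.
    Hypothetical helper; not the pgml Splitter implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks usually give more precise matches at the cost of more embeddings to store and search.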

Upsert Documents

Documents are upserted into the collection and indexed by the pipeline.

  const documents = [
    { id: "Document One", text: "..." },
    { id: "Document Two", text: "..." },
  ];
  await collection.upsert_documents(documents);

    documents = [
        {"id": "doc1", "text": "..."},
        {"id": "doc2", "text": "..."},
    ]
    await collection.upsert_documents(documents)
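The term "upsert" means insert when the id is new and replace when it already exists. The sketch below shows that behaviour with a plain dictionary; `upsert` and the in-memory `store` are hypothetical stand-ins and say nothing about how the SDK persists or re-indexes documents.

```python
def upsert(store: dict[str, dict], documents: list[dict]) -> None:
    """Insert each document, replacing any existing document with the same id.

    Hypothetical in-memory illustration of upsert semantics.
    """
    for doc in documents:
        store[doc["id"]] = doc


if __name__ == "__main__":
    store: dict[str, dict] = {}
    upsert(store, [{"id": "doc1", "text": "first version"}])
    # The second call replaces doc1 and adds doc2.
    upsert(store, [
        {"id": "doc1", "text": "second version"},
        {"id": "doc2", "text": "another document"},
    ])
    for doc_id, doc in store.items():
        print(doc_id, "->", doc["text"])
```

Because the id is the key, running the same upsert twice leaves the store unchanged, which makes reloading a dataset safe.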


Query

A vector similarity search query is made on the collection.

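  // Build the query, then run it with fetch_all().
  const queryResults = await collection
    .query()
    .vector_recall("query", pipeline)
    .fetch_all();

    # Build the query, then run it with fetch_all().
    results = await collection.query().vector_recall("query", pipeline).fetch_all()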
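Conceptually, a vector similarity search embeds the query, scores it against each stored embedding, and returns the closest matches. The sketch below does this with cosine similarity over hand-written two-dimensional vectors. Cosine similarity is chosen here as a common metric, not because the text above confirms it, and `cosine_similarity` and `rank` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]],
         limit: int = 2) -> list[tuple[float, str]]:
    """Return the `limit` most similar documents as (similarity, id) pairs."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:limit]


if __name__ == "__main__":
    # Toy embeddings; a real model produces vectors with hundreds of dimensions.
    vectors = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [1.0, 1.0]}
    for score, doc_id in rank([1.0, 0.0], vectors):
        print(f"{doc_id}: {score:.3f}")
```

A production system would typically use an approximate nearest-neighbour index instead of scoring every document.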

Archive Collection

The collection is archived when finished.

  await collection.archive();
  return queryResults;
};

    await collection.archive()


Main

Boilerplate to call the async main() function.

main().then((results) => {
  console.log("Vector search Results: \n", results);
});

if __name__ == "__main__":
    asyncio.run(main())
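The introduction mentions timing the query and printing formatted results. One way this could look is sketched below, with a stand-in coroutine in place of the real SDK call. `fake_search`, `timed_search`, `format_results`, and the `(similarity, text)` result shape are all assumptions made for illustration.

```python
import asyncio
import time


async def fake_search(query: str) -> list[tuple[float, str]]:
    """Stand-in for the SDK query; returns hypothetical (similarity, text) pairs."""
    await asyncio.sleep(0.01)  # simulate network and database latency
    return [(0.91, "first matching passage"), (0.78, "second matching passage")]


async def timed_search(query: str) -> tuple[list[tuple[float, str]], float]:
    """Run the search and return its results with the elapsed wall-clock time."""
    start = time.perf_counter()
    results = await fake_search(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


def format_results(results: list[tuple[float, str]]) -> str:
    """Render results as a numbered list with two-decimal similarity scores."""
    return "\n".join(
        f"{rank}. [{score:.2f}] {text}"
        for rank, (score, text) in enumerate(results, start=1)
    )


if __name__ == "__main__":
    results, elapsed = asyncio.run(timed_search("query"))
    print(f"Query completed in {elapsed:.3f} seconds")
    print(format_results(results))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.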