Semantic Search

This tutorial demonstrates using the pgml SDK to create a collection, add documents, build a pipeline for vector search, make a sample query, and archive the collection when finished. It loads sample data, indexes questions, times a semantic search query, and prints formatted results.

Imports and Setup

The SDK is imported and environment variables are loaded.

JavaScript
const pgml = require("pgml");
require("dotenv").config();

Python
from pgml import Collection, Model, Splitter, Pipeline
from datasets import load_dataset
from dotenv import load_dotenv
import asyncio
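
The datasets import pulls in the sample data mentioned in the introduction. A minimal sketch of loading it, assuming the Hugging Face "squad" dataset as an illustrative stand-in for your own sample data:

Python

# Illustrative only: "squad" stands in for whatever sample dataset you want to index.
data = load_dataset("squad", split="train")
print(data[0])  # inspect the fields available on each record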

Initialize Collection

A collection object is created to hold the documents that will be indexed and searched.

JavaScript

const main = async () => {
  const collection = pgml.newCollection("my_javascript_collection");
}

Python

async def main():
    load_dotenv()
    collection = Collection("my_collection")
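
load_dotenv() reads a local .env file so the SDK can pick up its Postgres connection string from the environment. A quick sanity-check sketch, assuming the variable is named DATABASE_URL (the exact name depends on your SDK version):

Python

import os
from dotenv import load_dotenv

load_dotenv()
# Assumption: the PostgresML connection string is exposed as DATABASE_URL.
assert os.environ.get("DATABASE_URL"), "Set the connection string in .env before running"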

Create Pipeline

A pipeline encapsulating a model and splitter is created and added to the collection.

JavaScript
const model = pgml.newModel();
const splitter = pgml.newSplitter();
const pipeline = pgml.newPipeline("my_javascript_pipeline", model, splitter);
await collection.add_pipeline(pipeline);

Python
model = Model()
splitter = Splitter()
pipeline = Pipeline("my_pipeline", model, splitter)
await collection.add_pipeline(pipeline)
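
By default the pipeline uses the SDK's default embedding model and splitter. A hedged sketch of pinning them explicitly; the model name, splitter name, and parameter keys below are illustrative assumptions, so check the SDK reference for the exact signatures your version supports:

Python

# Assumed signatures: Model(name) and Splitter(name, parameters) are illustrative.
model = Model("intfloat/e5-small")
splitter = Splitter("recursive_character", {"chunk_size": 1500, "chunk_overlap": 40})
pipeline = Pipeline("my_e5_pipeline", model, splitter)
await collection.add_pipeline(pipeline)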

Upsert Documents

Documents are upserted into the collection and indexed by the pipeline.

JavaScript

const documents = [
  {
    id: "Document One",
    text: "...",
  },
  {
    id: "Document Two",
    text: "...",
  },
];
await collection.upsert_documents(documents);

Python

documents = [
    {"id": "doc1", "text": "..."},
    {"id": "doc2", "text": "..."},
]
await collection.upsert_documents(documents)
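
To index the sample data instead of placeholder text, map each record to a document with an id and the text to embed. A sketch assuming the "squad" dataset loaded earlier; the "id" and "question" field names belong to that dataset and are illustrative:

Python

# Assumes data = load_dataset("squad", split="train") from the setup step.
documents = [
    {"id": row["id"], "text": row["question"]}
    for row in data.select(range(1000))  # keep the demo small
]
await collection.upsert_documents(documents)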

Query

A vector similarity search query is made on the collection.

JavaScript

const queryResults = await collection
  .query()
  .vector_recall("query", pipeline)  // "query" is a placeholder for the search text
  .fetch_all();

Python

results = await (
    collection.query()
    .vector_recall("query", pipeline)  # "query" is a placeholder for the search text
    .fetch_all()
)
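
The introduction also mentions timing the query and printing formatted results. A sketch of both, assuming the query builder supports .limit() and that each returned item bundles a similarity score with the matched chunk; print one raw result first if your SDK version returns a different shape:

Python

import time

start = time.perf_counter()
results = await (
    collection.query()
    .vector_recall("What is a vector database?", pipeline)  # illustrative search text
    .limit(5)  # assumption: .limit() caps the number of results returned
    .fetch_all()
)
elapsed = time.perf_counter() - start

print(f"Query took {elapsed:.3f}s")
for result in results:
    # Result shape is an assumption; print raw items to see what your version returns.
    print(result)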

Archive Collection

The collection is archived when finished.

JavaScript
await collection.archive();

Python
await collection.archive()

Main

Boilerplate to invoke the async main() function. Note that for the JavaScript .then() callback to print anything, main() needs to return the query results; the Python version simply runs main() with asyncio.

JavaScript

main().then((results) => {
  console.log("Vector search Results: \n", results);
});

Python

if __name__ == "__main__":
    asyncio.run(main())