Summarizing Question Answering

The Python and JavaScript examples below use the pgml SDK to retrieve a passage relevant to a question and then summarize it.

Imports and Setup

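The pgml SDK is imported in both languages. The Python example also imports datasets to load the data and dotenv to load environment variables. Builtins provides the transform call used later for summarization.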

const pgml = require("pgml");

from pgml import Collection, Model, Splitter, Pipeline, Builtins
from datasets import load_dataset
from dotenv import load_dotenv

Initialize Collection

A collection is created to hold text passages.

const collection = pgml.newCollection("my_javascript_sqa_collection");

collection = Collection("squad_collection")

Create Pipeline

A pipeline is created and added to the collection.

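const pipeline = pgml.newPipeline(
  "my_javascript_sqa_pipeline", // placeholder name; arguments mirror the Python example
  pgml.newModel(),
  pgml.newSplitter()
);
await collection.add_pipeline(pipeline);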

model = Model()
splitter = Splitter()
pipeline = Pipeline("squadv1", model, splitter)
await collection.add_pipeline(pipeline)
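
The Python snippets call await at the top level, which works in a notebook or an async REPL but not in a plain script. A minimal sketch of wrapping the calls in a coroutine and running it with asyncio.run follows. StubCollection is a hypothetical stand-in so the example runs without a database; it is not part of pgml.

```python
import asyncio


class StubCollection:
    """Hypothetical stand-in for pgml's Collection (no database needed)."""

    def __init__(self, name):
        self.name = name
        self.pipelines = []

    async def add_pipeline(self, pipeline):
        # The real call would talk to the database; here we only record it.
        self.pipelines.append(pipeline)


async def main():
    collection = StubCollection("squad_collection")
    await collection.add_pipeline("squadv1")
    return collection


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
collection = asyncio.run(main())
print(collection.pipelines)
```

In a real script, the body of main would hold the Collection, Pipeline, upsert, query, and transform calls from this page in order.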

Upsert Documents

Text passages are upserted into the collection.

const documents = [
  {
    id: "...",
    text: "...",
  },
];
await collection.upsert_documents(documents);

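data = load_dataset("squad")
# Iterate over the rows of one split; iterating the returned DatasetDict yields split names.
documents = [
    {"id": ..., "text": ...}
    for r in data["train"]
]
await collection.upsert_documents(documents)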
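
SQuAD pairs many questions with the same passage, so building one document per row would upsert the same text repeatedly. The sketch below shows one way to prepare the document list, assuming each row exposes a context field as SQuAD rows do. The sample rows, the build_documents helper, and the hash-derived id are all illustrative choices, not something the SDK prescribes.

```python
import hashlib


def build_documents(rows):
    """Return one {"id", "text"} document per unique passage."""
    seen = set()
    documents = []
    for row in rows:
        text = row["context"]
        # Hash-derived id: an illustrative choice so identical passages collapse.
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if doc_id in seen:
            continue
        seen.add(doc_id)
        documents.append({"id": doc_id, "text": text})
    return documents


# Hypothetical rows shaped like SQuAD records; two questions share a passage.
sample_rows = [
    {"question": "What is the capital of France?", "context": "Paris is the capital of France."},
    {"question": "Which country has Paris as its capital?", "context": "Paris is the capital of France."},
    {"question": "What is Python?", "context": "Python is a programming language."},
]

documents = build_documents(sample_rows)
print(len(documents))
```

The resulting list has the same id and text keys as the documents upserted above.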

Query for Context

A vector search retrieves the passage most relevant to the query. The text of the top result is kept as the context.

// query holds the question text
const queryResults = await collection
  .query()
  .vector_recall(query, pipeline)
  .fetch_all();
const context = queryResults[0][1];

# query holds the question text
results = await (
    collection.query()
    .vector_recall(query, pipeline)
    .fetch_all()
)
context = results[0][1]
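
To make the indexing in results[0][1] concrete, here is a toy in-memory version of vector recall. It is only a sketch: the word-count embedding stands in for a real embedding model, the (score, text) result shape is an assumption inferred from the indexing above, and toy_vector_recall is a hypothetical helper, not a pgml function.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_vector_recall(query, passages, limit=1):
    """Return (score, text) tuples, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


passages = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
results = toy_vector_recall("Where is the Eiffel Tower?", passages)
context = results[0][1]  # text of the top-scoring passage
print(context)
```

Index 0 selects the best match and index 1 selects its text, which is the same pattern the SDK examples use to obtain the context.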

Summarize Text

The retrieved context is summarized with a pretrained model, here sshleifer/distilbart-cnn-12-6.
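
The return value of the transform call in this section is not shown. Hugging Face summarization pipelines return a list of dictionaries with a summary_text key, and the sketch below assumes the pgml result has the same shape; check this against your SDK version. The extract_summaries helper and the sample output are hypothetical.

```python
def extract_summaries(output):
    """Pull summary strings out of a transformers-style result list."""
    return [item["summary_text"] for item in output if "summary_text" in item]


# Hypothetical output in the assumed shape, one entry per input text.
sample_output = [{"summary_text": "A short summary."}]
print(extract_summaries(sample_output)[0])
```

If the assumption holds, passing the summary value from the calls in this section to the helper would yield one string per input text.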

const builtins = pgml.newBuiltins();
const summary = await builtins.transform(
  {task: "summarization",
   model: "sshleifer/distilbart-cnn-12-6"},
  [context]
);

builtins = Builtins()
summary = await builtins.transform(
    {"task": "summarization",
     "model": "sshleifer/distilbart-cnn-12-6"},
    [context]
)