Summarizing Question Answering

Here are Python and JavaScript examples of text summarization with the pgml SDK. A vector search first retrieves a relevant passage, and a pretrained model then summarizes it.

Imports and Setup

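Both versions import the pgml SDK and the dotenv package, which loads environment variables, such as the database connection string, from a .env file. The Python version also imports load_dataset from Hugging Face datasets to fetch SQuAD, and Builtins, which runs built-in transformations such as summarization. The snippets use await, so run them inside an async function.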

JavaScript:
const pgml = require("pgml");
require("dotenv").config();

Python:
from pgml import Collection, Model, Splitter, Pipeline, Builtins
from datasets import load_dataset
from dotenv import load_dotenv

# Load environment variables (such as the database connection string) from .env
load_dotenv()
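load_dotenv() reads KEY=VALUE lines from a .env file into the process environment, so settings such as the connection string stay out of the code. The sketch below is a deliberately simplified, hypothetical stand-in (parse_env_lines and load_env_lines are not part of python-dotenv). It skips blank lines and comments, strips one pair of surrounding quotes, and, like load_dotenv() with its default override=False, leaves variables that are already set untouched.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, # comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "="
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_lines(lines: Iterable[str], override: bool = False) -> None:
    """Copy parsed values into os.environ; keep existing ones unless override=True."""
    for key, value in parse_env_lines(lines).items():
        if override or key not in os.environ:
            os.environ[key] = value


print(parse_env_lines(["# settings", 'EXAMPLE_SETTING="hello"']))  # {'EXAMPLE_SETTING': 'hello'}
```

The real library handles more syntax (for example export prefixes and variable expansion), so use load_dotenv() itself in practice.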

Initialize Collection

A collection is created to hold text passages.

JavaScript:
const collection = pgml.newCollection("my_javascript_sqa_collection");

Python:
collection = Collection("squad_collection")

Create Pipeline

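A pipeline pairs an embedding model with a text splitter; it is created and added to the collection. Called without arguments, the model and splitter constructors (Model() and Splitter() in Python, pgml.newModel() and pgml.newSplitter() in JavaScript) use the SDK defaults.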

JavaScript:
const pipeline = pgml.newPipeline(
  "my_javascript_sqa_pipeline",
  pgml.newModel(),
  pgml.newSplitter(),
);
await collection.add_pipeline(pipeline);

Python:
model = Model()
splitter = Splitter()
pipeline = Pipeline("squadv1", model, splitter)
await collection.add_pipeline(pipeline)
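The splitter breaks each document into chunks before the model embeds them, so that search matches passage-sized pieces of text rather than whole documents. The SDK's default splitter is not reproduced here. As a hypothetical illustration only, split_text below cuts text into fixed-size character windows that overlap, so text near a boundary appears in both neighbouring chunks.

```python
from typing import List


def split_text(text: str, chunk_size: int = 20, overlap: int = 5) -> List[str]:
    """Split text into windows of chunk_size characters overlapping by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Overlap trades storage for recall: each extra shared character is embedded twice, but a fact straddling a boundary is less likely to be split across two chunks that each match poorly.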

Upsert Documents

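Text passages are upserted into the collection: inserted, or updated if a document with the same id already exists. The collection's pipeline then splits and embeds them so they can be searched.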

JavaScript:
const documents = [
  {
    id: "...",
    text: "...",
  },
];
await collection.upsert_documents(documents);

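Python:
# Load one split; iterating the whole DatasetDict would yield split names, not rows
data = load_dataset("squad", split="train")
documents = [
    {"id": ..., "text": ...}  # replace ... with values from each SQuAD row r
    for r in data
]
await collection.upsert_documents(documents)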
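In SQuAD each row is a question, and one context passage is shared by several questions, so one document per row would store the same passage many times. A minimal sketch, assuming each row exposes SQuAD's id and context fields, keeps one document per distinct passage. build_documents and upsert are hypothetical helpers, the rows are made-up toy data, and the dict keyed by id only models the insert-or-update behaviour that "upsert" names, not how the SDK stores documents.

```python
from typing import Dict, Iterable, List


def build_documents(rows: Iterable[dict]) -> List[dict]:
    """Keep the first row for each distinct context, mapped to {"id", "text"}."""
    seen = set()
    documents = []
    for r in rows:
        if r["context"] in seen:
            continue  # this passage is already represented
        seen.add(r["context"])
        documents.append({"id": r["id"], "text": r["context"]})
    return documents


def upsert(store: Dict[str, dict], documents: Iterable[dict]) -> None:
    """Insert documents with new ids and overwrite those whose id already exists."""
    for doc in documents:
        store[doc["id"]] = doc


rows = [
    {"id": "q1", "context": "Passage A.", "question": "First question?"},
    {"id": "q2", "context": "Passage A.", "question": "Second question?"},
    {"id": "q3", "context": "Passage B.", "question": "Third question?"},
]
store: Dict[str, dict] = {}
upsert(store, build_documents(rows))
print(sorted(store))  # ['q1', 'q3']
upsert(store, [{"id": "q1", "text": "Passage A, revised."}])
print(len(store), store["q1"]["text"])  # 2 Passage A, revised.
```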

Query for Context

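A vector search compares the query with the embedded chunks and returns the most similar passages; the text of the top result (element 1 of the first row) becomes the context to summarize.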

JavaScript:
// `query` holds the question text to search for
const queryResults = await collection
  .query()
  .vector_recall(query, pipeline)
  .fetch_all();
const context = queryResults[0][1]; // text of the top-ranked result

Python:
# `query` holds the question text; parentheses let the chain span lines
results = await (
    collection.query()
    .vector_recall(query, pipeline)
    .fetch_all()
)
context = results[0][1]  # text of the top-ranked result
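Vector recall embeds the query with the pipeline's model and ranks stored chunks by how close their embeddings are to the query's. The sketch below shows only the ranking step, using cosine similarity over tiny hand-written vectors. The embeddings, the vector_recall function, and the choice of cosine similarity here are illustrative assumptions, not the SDK's internals. Each result is a (score, text) pair sorted best-first, so results[0][1] is the text of the best match, mirroring the indexing used in the snippets.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_recall(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int = 3,
) -> List[Tuple[float, str]]:
    """Return up to `limit` (score, text) pairs, highest similarity first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy 2-D "embeddings"; a real embedding model produces much longer vectors
chunks = [
    ("Passage about music awards.", [0.9, 0.1]),
    ("Passage about football.", [0.1, 0.9]),
]
results = vector_recall([1.0, 0.0], chunks)
context = results[0][1]  # text of the best match
print(context)  # Passage about music awards.
```

A database-backed search would use an index rather than scoring every chunk, but the ordering it returns has the same meaning.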

Summarize Text

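The retrieved context is summarized with the pretrained sshleifer/distilbart-cnn-12-6 model through the summarization task of the built-in transform function.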

JavaScript:
const builtins = pgml.newBuiltins();
const summary = await builtins.transform(
  { task: "summarization", model: "sshleifer/distilbart-cnn-12-6" },
  [context],
);

Python:
builtins = Builtins()
summary = await builtins.transform(
    {"task": "summarization", "model": "sshleifer/distilbart-cnn-12-6"},
    [context],
)
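distilbart-cnn-12-6 is an abstractive model: it generates new sentences rather than copying them from the context, and running it needs the model weights. As a dependency-free illustration of the step's input and output only, the sketch below swaps in a different and much simpler technique, extractive summarization by word frequency. It scores each sentence by the average frequency of its words across the whole text and keeps the top sentences in their original order. extractive_summary is hypothetical and will not reproduce the model's output.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 1) -> str:
    """Keep the highest-scoring sentences, in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not automatically favoured
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Cats chase mice. Cats chase birds and mice. Dogs sleep."
print(extractive_summary(text))  # Cats chase mice.
```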