pgml.transform()

The pgml.transform() function is the most powerful feature of PostgresML. It integrates open-source large language models, like Llama, Mixtral, and many more, which allows to perform complex tasks on your data.

The models are downloaded from 🤗 Hugging Face which hosts tens of thousands of pre-trained and fine-tuned models for various tasks like text generation, question answering, summarization, text classification, and more.

API

The pgml.transform() function comes in two flavors, task-based and model-based.

Task-based API

The task-based API automatically chooses a model based on the task:

content_copy link edit
pgml.transform(
task TEXT,
args JSONB,
inputs TEXT[]
)
Argument Description Example Required
task The name of a natural language processing task. 'text-generation' Required
args Additional kwargs to pass to the pipeline. '{"max_new_tokens": 50}'::JSONB Optional
inputs Array of prompts to pass to the model for inference. Each prompt is evaluated independently and a separate result is returned. ARRAY['Once upon a time...'] Required

Examples

content_copy link edit
SELECT *
FROM pgml.transform(
task => 'text-generation',
inputs => ARRAY['In a galaxy far far away']
);

content_copy link edit
SELECT *
FROM pgml.transform(
task => 'translation_en_to_fr',
inputs => ARRAY['How do I say hello in French?']
);

Model-based API

The model-based API requires the name of the model and the task, passed as a JSON object. This allows it to be more generic and support more models:

content_copy link edit
pgml.transform(
model JSONB,
args JSONB,
inputs TEXT[]
)
Argument Description Example
model Model configuration, including name and task.
'{
  "task": "text-generation",
  "model": "mistralai/Mixtral-8x7B-v0.1"
}'::JSONB
args Additional kwargs to pass to the pipeline. '{"max_new_tokens": 50}'::JSONB
inputs Array of prompts to pass to the model for inference. Each prompt is evaluated independently. ARRAY['Once upon a time...']

Example

content_copy link edit
SELECT pgml.transform(
task => '{
"task": "text-generation",
"model": "TheBloke/zephyr-7B-beta-GPTQ",
"model_type": "mistral",
"revision": "main",
"device_map": "auto"
}'::JSONB,
inputs => ARRAY['AI is going to'],
args => '{
"max_new_tokens": 100
}'::JSONB
);

content_copy link edit
import transformers
def transform(task, call, inputs):
return transformers.pipeline(**task)(inputs, **call)
transform(
{
"task": "text-generation",
"model": "TheBloke/zephyr-7B-beta-GPTQ",
"model_type": "mistral",
"revision": "main",
},
{"max_new_tokens": 100},
['AI is going to change the world in the following ways:']
)

Supported tasks

PostgresML currently supports most NLP tasks available on Hugging Face:

Task Name Description
Fill mask key-mask Fill in the blank in a sentence.
Question answering question-answering Answer a question based on a context.
Summarization summarization Summarize a long text.
Text classification text-classification Classify a text as positive or negative.
Text generation text-generation Generate text based on a prompt.
Text-to-text generation text-to-text-generation Generate text based on an instruction in the prompt.
Token classification token-classification Classify tokens in a text.
Translation translation Translate text from one language to another.
Zero-shot classification zero-shot-classification Classify a text without training data.
Conversational conversational Engage in a conversation with the model, e.g. chatbot.

Structured inputs

Both versions of the pgml.transform() function also support structured inputs, formatted with JSON. Structured inputs are used with the conversational task, e.g. to differentiate between the system and user prompts. Simply replace the text array argument with an array of JSONB objects.

Additional resources