pgml.transform()
The pgml.transform()
function is the most powerful feature of PostgresML. It integrates open-source large language models, like Llama, Mixtral, and many more, which allows to perform complex tasks on your data.
The models are downloaded from 🤗 Hugging Face which hosts tens of thousands of pre-trained and fine-tuned models for various tasks like text generation, question answering, summarization, text classification, and more.
API
The pgml.transform()
function comes in two flavors, task-based and model-based.
Task-based API
The task-based API automatically chooses a model based on the task:
pgml.transform(
task TEXT,
args JSONB,
inputs TEXT[]
)
Argument | Description | Example | Required |
---|---|---|---|
task | The name of a natural language processing task. | 'text-generation' |
Required |
args | Additional kwargs to pass to the pipeline. | '{"max_new_tokens": 50}'::JSONB |
Optional |
inputs | Array of prompts to pass to the model for inference. Each prompt is evaluated independently and a separate result is returned. | ARRAY['Once upon a time...'] |
Required |
Examples
SELECT *
FROM pgml.transform(
task => 'text-generation',
inputs => ARRAY['In a galaxy far far away']
);
SELECT *
FROM pgml.transform(
task => 'translation_en_to_fr',
inputs => ARRAY['How do I say hello in French?']
);
Model-based API
The model-based API requires the name of the model and the task, passed as a JSON object. This allows it to be more generic and support more models:
pgml.transform(
model JSONB,
args JSONB,
inputs TEXT[]
)
Argument | Description | Example |
---|---|---|
model | Model configuration, including name and task. |
'{
"task": "text-generation", "model": "mistralai/Mixtral-8x7B-v0.1" }'::JSONB |
args | Additional kwargs to pass to the pipeline. | '{"max_new_tokens": 50}'::JSONB |
inputs | Array of prompts to pass to the model for inference. Each prompt is evaluated independently. | ARRAY['Once upon a time...'] |
Example
SELECT pgml.transform(
task => '{
"task": "text-generation",
"model": "TheBloke/zephyr-7B-beta-GPTQ",
"model_type": "mistral",
"revision": "main",
"device_map": "auto"
}'::JSONB,
inputs => ARRAY['AI is going to'],
args => '{
"max_new_tokens": 100
}'::JSONB
);
import transformers
def transform(task, call, inputs):
return transformers.pipeline(**task)(inputs, **call)
transform(
{
"task": "text-generation",
"model": "TheBloke/zephyr-7B-beta-GPTQ",
"model_type": "mistral",
"revision": "main",
},
{"max_new_tokens": 100},
['AI is going to change the world in the following ways:']
)
Supported tasks
PostgresML currently supports most NLP tasks available on Hugging Face:
Task | Name | Description |
---|---|---|
Fill mask | key-mask |
Fill in the blank in a sentence. |
Question answering | question-answering |
Answer a question based on a context. |
Summarization | summarization |
Summarize a long text. |
Text classification | text-classification |
Classify a text as positive or negative. |
Text generation | text-generation |
Generate text based on a prompt. |
Text-to-text generation | text-to-text-generation |
Generate text based on an instruction in the prompt. |
Token classification | token-classification |
Classify tokens in a text. |
Translation | translation |
Translate text from one language to another. |
Zero-shot classification | zero-shot-classification |
Classify a text without training data. |
Conversational | conversational |
Engage in a conversation with the model, e.g. chatbot. |
Structured inputs
Both versions of the pgml.transform()
function also support structured inputs, formatted with JSON. Structured inputs are used with the conversational task, e.g. to differentiate between the system and user prompts. Simply replace the text array argument with an array of JSONB objects.