The pgml.transform()
function is the most powerful feature of PostgresML. It integrates open-source large language models, like Llama, Mixtral, and many more, which allows to perform complex tasks on your data.
The models are downloaded from 🤗 Hugging Face which hosts tens of thousands of pre-trained and fine-tuned models for various tasks like text generation, question answering, summarization, text classification, and more.
The pgml.transform()
function comes in two flavors, task-based and model-based.
The task-based API automatically chooses a model based on the task:
content_copy
pgml.transform(
task TEXT,
args JSONB,
inputs TEXT[]
)
Argument |
Description |
Example |
Required |
task |
The name of a natural language processing task. |
'text-generation' |
Required |
args |
Additional kwargs to pass to the pipeline. |
'{"max_new_tokens": 50}'::JSONB |
Optional |
inputs |
Array of prompts to pass to the model for inference. Each prompt is evaluated independently and a separate result is returned. |
ARRAY['Once upon a time...'] |
Required |
-
-
content_copy
SELECT *
FROM pgml.transform(
task => 'text-generation',
inputs => ARRAY['In a galaxy far far away']
);
content_copy
SELECT *
FROM pgml.transform(
task => 'translation_en_to_fr',
inputs => ARRAY['How do I say hello in French?']
);
The model-based API requires the name of the model and the task, passed as a JSON object. This allows it to be more generic and support more models:
content_copy
pgml.transform(
model JSONB,
args JSONB,
inputs TEXT[]
)
Argument |
Description |
Example |
model |
Model configuration, including name and task. |
'{
"task": "text-generation",
"model": "mistralai/Mixtral-8x7B-v0.1"
}'::JSONB
|
args |
Additional kwargs to pass to the pipeline. |
'{"max_new_tokens": 50}'::JSONB |
inputs |
Array of prompts to pass to the model for inference. Each prompt is evaluated independently. |
ARRAY['Once upon a time...'] |
-
-
content_copy
SELECT pgml.transform(
task => '{
"task": "text-generation",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"model_type": "mistral",
"revision": "main",
"device_map": "auto"
}'::JSONB,
inputs => ARRAY['AI is going to'],
args => '{
"max_new_tokens": 100
}'::JSONB
);
content_copy
import transformers
def transform(task, call, inputs):
return transformers.pipeline(**task)(inputs, **call)
transform(
{
"task": "text-generation",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"model_type": "mistral",
"revision": "main",
},
{"max_new_tokens": 100},
['AI is going to change the world in the following ways:']
)
PostgresML currently supports most NLP tasks available on Hugging Face:
Both versions of the pgml.transform()
function also support structured inputs, formatted with JSON. Structured inputs are used with the conversational task, e.g. to differentiate between the system and user prompts. Simply replace the text array argument with an array of JSONB objects.