
OpenSourceAI is a drop-in replacement for OpenAI's chat completion endpoint.


Follow the installation section in getting-started.

When done, set the environment variable DATABASE_URL to your PostgresML database URL:

export DATABASE_URL=postgres://

Alternatively, pass the URL directly to the OpenSourceAI constructor instead of setting the environment variable:

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI(YOUR_DATABASE_URL);

Python:
import pgml
client = pgml.OpenSourceAI(YOUR_DATABASE_URL)
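The choice between the two configuration styles can be wrapped in a small helper. The following is a minimal sketch and not part of the pgml SDK: resolve_database_url is a hypothetical name, and giving an explicit URL precedence over DATABASE_URL is an assumption made for illustration.

```python
import os


def resolve_database_url(explicit_url=None):
    """Return the explicit URL if given, otherwise fall back to DATABASE_URL."""
    if explicit_url:
        return explicit_url
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("Set DATABASE_URL or pass a database URL explicitly")
    return url


# Usage (needs the pgml package and a reachable PostgresML database):
# import pgml
# client = pgml.OpenSourceAI(resolve_database_url())
```

Failing early with a clear error avoids a less obvious connection failure later on.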


Our OpenSourceAI class provides 4 functions:

  • chat_completions_create
  • chat_completions_create_async
  • chat_completions_create_stream
  • chat_completions_create_stream_async

They all take the same arguments:

  • model: a String or Object
  • messages: an Array/List of Objects
  • max_tokens: the maximum number of new tokens to produce. Default: none
  • temperature: the temperature of the model. Default: 0.8
  • n: the number of choices to create. Default: 1
  • chat_template: a Jinja template applied to the messages before tokenizing

The return types of the stream and non-stream variations match OpenAI's return types.
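As an illustration of the argument shapes listed above, the sketch below builds a messages list and a set of optional arguments. build_messages is a hypothetical helper, and passing the optional arguments by keyword is an assumption about the Python client.

```python
def build_messages(system_prompt, user_prompt):
    """Build the list of message objects expected by the `messages` argument."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "You are a friendly chatbot who always responds in the style of a pirate",
    "How many helicopters can a human eat in one sitting?",
)

# Optional arguments, named as in the list above (keyword passing is assumed).
options = {"max_tokens": 100, "temperature": 0.8, "n": 1}

# Usage (needs the pgml package and a reachable PostgresML database):
# results = client.chat_completions_create(
#     "HuggingFaceH4/zephyr-7b-beta", messages, **options
# )
```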

The following examples run through some common use cases.

Synchronous Overview

Here is a simple example using zephyr-7b-beta, one of the best 7 billion parameter models at the time of writing.

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const results = client.chat_completions_create(
  "HuggingFaceH4/zephyr-7b-beta",
  [
    { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
    { role: "user", content: "How many helicopters can a human eat in one sitting?" },
  ],
);
console.log(results);

Python:
import pgml
client = pgml.OpenSourceAI()
results = client.chat_completions_create(
    "HuggingFaceH4/zephyr-7b-beta",
    [
        {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
)
print(results)

Output:
{
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "Ahoy, me hearty! As your friendly chatbot, I'd like to inform ye that a human cannot eat a helicopter in one sitting. Helicopters are not edible, as they are not food items. They are flying machines used for transportation, search and rescue operations, and other purposes. A human can only eat food items, such as fruits, vegetables, meat, and other edible items. I hope this helps, me hearties!",
        "role": "assistant"
      }
    }
  ],
  "created": 1701291672,
  "id": "abf042d2-9159-49cb-9fd3-eef16feb246c",
  "model": "HuggingFaceH4/zephyr-7b-beta",
  "object": "chat.completion",
  "system_fingerprint": "eecec9d4-c28b-5a27-f90b-66c3fb6cee46",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}

We don't charge per token, so OpenAI “usage” metrics are not particularly relevant and are reported as 0 in the example above. We'll be extending this data with more direct CPU/GPU resource utilization measurements for users who are interested or who need to pass real usage-based pricing on to their own customers.

Notice there is a near one-to-one relationship between the parameters and return type of OpenAI’s chat.completions.create and our chat_completions_create.
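Because the return value mirrors OpenAI's, the reply text can be read the same way. This is a minimal sketch, assuming the Python client returns the response as a plain dict shaped like the output above. first_reply is a hypothetical helper and the sample data is abridged from that output.

```python
def first_reply(response):
    """Return the assistant text of the first choice in a chat completion."""
    return response["choices"][0]["message"]["content"]


# Abridged sample in the shape of the response shown above.
sample = {
    "choices": [
        {"index": 0, "message": {"content": "Ahoy, me hearty!", "role": "assistant"}}
    ],
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "object": "chat.completion",
}

print(first_reply(sample))  # prints "Ahoy, me hearty!"
```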

The best part of using open-source AI is the flexibility with models. Unlike OpenAI, we are not restricted to using a few censored models, but have access to almost any model out there.

Here is an example of streaming with the popular Mythalion model, an uncensored MythoMax variant designed for chatting.

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const it = client.chat_completions_create_stream(
  "PygmalionAI/mythalion-13b",
  [
    { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
    { role: "user", content: "How many helicopters can a human eat in one sitting?" },
  ],
);
let result = it.next();
while (!result.done) {
  console.log(result.value);
  result = it.next();
}

Python:
import pgml
client = pgml.OpenSourceAI()
results = client.chat_completions_create_stream(
    "PygmalionAI/mythalion-13b",
    [
        {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
)
for c in results:
    print(c)

Output:
{
  "choices": [
    {
      "delta": {
        "content": "Y",
        "role": "assistant"
      },
      "index": 0
    }
  ],
  "created": 1701296792,
  "id": "62a817f5-549b-43e0-8f0c-a7cb204ab897",
  "model": "PygmalionAI/mythalion-13b",
  "object": "chat.completion.chunk",
  "system_fingerprint": "f366d657-75f9-9c33-8e57-1e6be2cf62f3"
}
{
  "choices": [
    {
      "delta": {
        "content": "e",
        "role": "assistant"
      },
      "index": 0
    }
  ],
  "created": 1701296792,
  "id": "62a817f5-549b-43e0-8f0c-a7cb204ab897",
  "model": "PygmalionAI/mythalion-13b",
  "object": "chat.completion.chunk",
  "system_fingerprint": "f366d657-75f9-9c33-8e57-1e6be2cf62f3"
}

We have truncated the output to two items.

Once again, notice there is a near one-to-one relationship between the parameters and return type of OpenAI’s chat.completions.create with the stream argument set to true and our chat_completions_create_stream.
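Each streamed chunk carries only a fragment of the reply, so callers usually stitch the fragments back together. This is a minimal sketch, assuming each chunk is a plain dict shaped like the output above. collect_stream is a hypothetical helper and the sample chunks are abridged from that output.

```python
def collect_stream(chunks):
    """Concatenate the delta content of the first choice across all chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk["choices"]:
            if choice["index"] == 0:
                # A delta may omit content, so fall back to an empty string.
                parts.append(choice["delta"].get("content") or "")
    return "".join(parts)


# Abridged sample in the shape of the two chunks shown above.
sample_chunks = [
    {"choices": [{"delta": {"content": "Y", "role": "assistant"}, "index": 0}]},
    {"choices": [{"delta": {"content": "e", "role": "assistant"}, "index": 0}]},
]

print(collect_stream(sample_chunks))  # prints "Ye"
```

Filtering on index 0 keeps the fragments of one choice together when n is greater than 1.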

Asynchronous Variations

We also have asynchronous versions of chat_completions_create and chat_completions_create_stream:

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
// await is only valid inside an async function in CommonJS modules
const main = async () => {
  const results = await client.chat_completions_create_async(
    "HuggingFaceH4/zephyr-7b-beta",
    [
      { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
      { role: "user", content: "How many helicopters can a human eat in one sitting?" },
    ],
  );
  console.log(results);
};
main();

Python:
import asyncio
import pgml
client = pgml.OpenSourceAI()
async def main():
    results = await client.chat_completions_create_async(
        "HuggingFaceH4/zephyr-7b-beta",
        [
            {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
            {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
        ],
    )
    print(results)
asyncio.run(main())

Output:
{
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "Ahoy, me hearty! As your friendly chatbot, I'd like to inform ye that a human cannot eat a helicopter in one sitting. Helicopters are not edible, as they are not food items. They are flying machines used for transportation, search and rescue operations, and other purposes. A human can only eat food items, such as fruits, vegetables, meat, and other edible items. I hope this helps, me hearties!",
        "role": "assistant"
      }
    }
  ],
  "created": 1701291672,
  "id": "abf042d2-9159-49cb-9fd3-eef16feb246c",
  "model": "HuggingFaceH4/zephyr-7b-beta",
  "object": "chat.completion",
  "system_fingerprint": "eecec9d4-c28b-5a27-f90b-66c3fb6cee46",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}

Notice the return types for the sync and async variations are the same.

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const main = async () => {
  const it = await client.chat_completions_create_stream_async(
    "PygmalionAI/mythalion-13b",
    [
      { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
      { role: "user", content: "How many helicopters can a human eat in one sitting?" },
    ],
  );
  let result = await it.next();
  while (!result.done) {
    console.log(result.value);
    result = await it.next();
  }
};
main();

Python:
import asyncio
import pgml
client = pgml.OpenSourceAI()
async def main():
    results = await client.chat_completions_create_stream_async(
        "PygmalionAI/mythalion-13b",
        [
            {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
            {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
        ],
    )
    async for c in results:
        print(c)
asyncio.run(main())

Output:
{
  "choices": [
    {
      "delta": {
        "content": "Y",
        "role": "assistant"
      },
      "index": 0
    }
  ],
  "created": 1701296792,
  "id": "62a817f5-549b-43e0-8f0c-a7cb204ab897",
  "model": "PygmalionAI/mythalion-13b",
  "object": "chat.completion.chunk",
  "system_fingerprint": "f366d657-75f9-9c33-8e57-1e6be2cf62f3"
}
{
  "choices": [
    {
      "delta": {
        "content": "e",
        "role": "assistant"
      },
      "index": 0
    }
  ],
  "created": 1701296792,
  "id": "62a817f5-549b-43e0-8f0c-a7cb204ab897",
  "model": "PygmalionAI/mythalion-13b",
  "object": "chat.completion.chunk",
  "system_fingerprint": "f366d657-75f9-9c33-8e57-1e6be2cf62f3"
}

We have truncated the output to two items.
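One reason to reach for the asynchronous variants is to issue several requests concurrently. The sketch below shows the pattern with asyncio.gather. fake_create_async is a stand-in for client.chat_completions_create_async so the example runs without a database, ask_all is a hypothetical helper, and whether a single client serves concurrent requests efficiently is an assumption that has not been verified here.

```python
import asyncio


async def ask_all(create_async, model, prompts):
    """Send one chat completion per prompt concurrently; results keep prompt order."""
    tasks = [
        create_async(model, [{"role": "user", "content": prompt}])
        for prompt in prompts
    ]
    return await asyncio.gather(*tasks)


async def fake_create_async(model, messages):
    """Stand-in for client.chat_completions_create_async; upper-cases the prompt."""
    await asyncio.sleep(0)
    reply = messages[-1]["content"].upper()
    return {
        "model": model,
        "choices": [{"index": 0, "message": {"role": "assistant", "content": reply}}],
    }


results = asyncio.run(
    ask_all(fake_create_async, "HuggingFaceH4/zephyr-7b-beta", ["ahoy", "arr"])
)
replies = [r["choices"][0]["message"]["content"] for r in results]
print(replies)  # prints ['AHOY', 'ARR']
```

With a real client, client.chat_completions_create_async would be passed in place of fake_create_async.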

Specifying Unique Models

We have tested the following models and verified they work with OpenSourceAI:

  • Phind/Phind-CodeLlama-34B-v2
  • HuggingFaceH4/zephyr-7b-beta
  • deepseek-ai/deepseek-llm-7b-chat
  • PygmalionAI/mythalion-13b
  • Gryphe/MythoMax-L2-13b
  • Undi95/ReMM-SLERP-L2-13B
  • Undi95/Toppy-M-7B
  • Open-Orca/Mistral-7B-OpenOrca
  • teknium/OpenHermes-2.5-Mistral-7B
  • mistralai/Mistral-7B-Instruct-v0.1

Any model on Hugging Face should work with OpenSourceAI. Here is an example using one of the more popular quantized models from TheBloke.

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const main = async () => {
  const results = await client.chat_completions_create_async(
    {
      model: "TheBloke/vicuna-13B-v1.5-16K-GPTQ",
      device_map: "auto",
      revision: "main",
    },
    [
      { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
      { role: "user", content: "How many helicopters can a human eat in one sitting?" },
    ],
  );
  console.log(results);
};
main();

Python:
import pgml
client = pgml.OpenSourceAI()
results = client.chat_completions_create(
    {
        "model": "TheBloke/vicuna-13B-v1.5-16K-GPTQ",
        "device_map": "auto",
        "revision": "main",
    },
    [
        {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
)
print(results)

Notice that this time we pass a model JSON object instead of a model name. The JSON keys in the model argument roughly follow the task argument when using our text-generation SQL API.
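In other words, the model argument accepts either a plain name or an object whose model key holds the name. This is a minimal sketch of building such an object. model_spec is a hypothetical helper, and the option keys are the ones used in the example above.

```python
def model_spec(name, **options):
    """Return a model object: the repository name plus any extra loading options."""
    return {"model": name, **options}


spec = model_spec(
    "TheBloke/vicuna-13B-v1.5-16K-GPTQ",
    device_map="auto",
    revision="main",
)
print(spec)

# Usage (needs the pgml package and a reachable PostgresML database):
# results = client.chat_completions_create(spec, messages)
```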

To access a gated repo like meta-llama/Llama-2-7b-chat-hf, simply provide the necessary Hugging Face token.

JavaScript:
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const main = async () => {
  const results = await client.chat_completions_create_async(
    {
      model: "meta-llama/Llama-2-7b-chat-hf",
      torch_dtype: "bfloat16",
      device_map: "auto",
      token: "YOUR_HUGGING_FACE_TOKEN",
    },
    [
      { role: "system", content: "You are a friendly chatbot who always responds in the style of a pirate" },
      { role: "user", content: "How many helicopters can a human eat in one sitting?" },
    ],
  );
  console.log(results);
};
main();

Python:
import pgml
client = pgml.OpenSourceAI()
results = client.chat_completions_create(
    {
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "torch_dtype": "bfloat16",
        "device_map": "auto",
        "token": "YOUR_HUGGING_FACE_TOKEN",
    },
    [
        {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
)
print(results)