Plans and Pricing

Start small, scale instantly.
Get $100 Free Usage Credits


Serverless

The easiest way to start and scale your RAG app.

    Burst GPU capacity
    Access curated models
    Support on Discord


Dedicated

Dedicated hardware for teams with established workloads.

    Committed use discounts
    Dedicated hardware
    Use any model on HuggingFace
    Deploy on major cloud providers in any region
    Dedicated support on private Slack or MS Teams


Enterprise

Dedicated hardware for at-scale teams with advanced security needs.

    Pay as you go or committed use pricing
    VPC deployments on major cloud providers in any region
    Multiple GPUs
    Custom SLAs
    Premium support and onboarding
    Dedicated support on Slack or MS Teams
    Priority feature requests

How does PostgresML pricing work?


Only use what you need, and pay as you go with no up-front costs. 

Committed use discounts

Commit to certain levels of usage for a fixed monthly cost and get a discounted rate. Scale your configuration up or down at any time with the click of a button.

Serverless pricing

Storage is charged per GB per month, and requests are charged by the CPU or GPU milliseconds of compute required to perform them.

Vector & Relational Database

Name | Pricing
Tables & index storage | $0.25/GB per month
Retrieval, filtering, ranking & other queries | $7.50 per hour
Embeddings | Included with queries
LLMs | Included with queries
Fine tuning | Included with queries
Machine learning | Included with queries
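As a rough illustration of how the two serverless rates above combine, here is a minimal sketch. Only the $0.25/GB-month and $7.50/hour figures come from the table; the workload numbers in the example are hypothetical, not benchmarks.

```python
# Sketch of a monthly serverless cost estimate using the published rates.
STORAGE_RATE_PER_GB_MONTH = 0.25  # tables & index storage
COMPUTE_RATE_PER_HOUR = 7.50      # retrieval, filtering, ranking & other queries

def monthly_cost(storage_gb: float, compute_hours: float) -> float:
    """Storage is billed per GB per month; queries are billed by compute time."""
    return storage_gb * STORAGE_RATE_PER_GB_MONTH + compute_hours * COMPUTE_RATE_PER_HOUR

# Hypothetical example: 10 GB of vectors + metadata, 2 hours of total
# query compute in a month.
print(round(monthly_cost(10, 2), 2))  # 10*0.25 + 2*7.50 = 17.5
```

Embeddings, LLM inference, fine tuning, and other machine learning all run as queries, so they fold into the same compute-time term rather than adding separate line items.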

Serverless models

Serverless AI engines come with predefined models and a flexible pricing structure.

Embedding Models

Name | Parameters (M) | Max input tokens | Dimensions | Strengths
intfloat/e5-small-v2 | 33.4 | 512 | 384 | Good quality, low latency
mixedbread-ai/mxbai-embed-large-v1 | 335 | 512 | 1024 | High quality, higher latency
Alibaba-NLP/gte-base-en-v1.5 | 137 | 8192 | 768 | Supports up to 8,000 input tokens
Alibaba-NLP/gte-large-en-v1.5 | 434 | 8192 | 1024 | Highest quality, 8,000 input tokens

Instruct Models

Name | Parameters (M) | Active Parameters (M) | Context size | Strengths
meta-llama/Meta-Llama-3-70B-Instruct | 70,000 | 70,000 | 8,000 | Highest quality
meta-llama/Meta-Llama-3-8B-Instruct | 8,000 | 8,000 | 8,000 | High quality, low latency
microsoft/Phi-3-mini-128k-instruct | 3,820 | 3,820 | 128,000 | Lowest latency
mistralai/Mixtral-8x7B-Instruct-v0.1 | 56,000 | 12,900 | 32,768 | MOE high quality
mistralai/Mistral-7B-Instruct-v0.2 | 7,000 | 7,000 | 32,768 | High quality, low latency

Summarization Models

Name | Parameters (M) | Context size | Strengths
google/pegasus-xsum | 568 | 512 | 8,000

Cost estimator

Interactive estimator: enter your metadata bytes per record, vector dimensions, embedding tokens per record, read queries and write vectors per month, and text generation input/output tokens per request to compare the monthly cost of an all-in RAG deployment on PostgresML against Pinecone + OpenAI (gpt-3.5-turbo-0125 vs. mixtral-8x7B), with a detailed breakdown of import, storage, read, and write costs.

Get Started



What does serverless mean on PostgresML?

On PostgresML you can build and scale an AI engine without having to manage servers, GPUs, or a database. Your AI engine will respond to your application’s demand automatically, and scale up or down as needed. Your charges will be based purely on your usage, and measured down to the millisecond.

Does PostgresML charge per token?

PostgresML does not charge per token. We charge by the amount of time a query runs. Queries that generate or process more tokens will often run longer, but queries that use smaller models will run more quickly. You’re only charged for the resources you use.
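At the $7.50/hour compute rate listed above, time-based billing works out to fractions of a cent per query. A quick back-of-the-envelope conversion (the query durations in the example are hypothetical):

```python
# Convert the hourly compute rate into a per-query cost.
COMPUTE_RATE_PER_HOUR = 7.50
COST_PER_MS = COMPUTE_RATE_PER_HOUR / 3_600_000  # 3.6M milliseconds per hour

def query_cost(duration_ms: float) -> float:
    """Cost of a single query billed by its compute time in milliseconds."""
    return duration_ms * COST_PER_MS

# Hypothetical timings: a 50 ms retrieval query vs. a 2-second generation query.
print(f"{query_cost(50):.6f}")    # ~$0.000104
print(f"{query_cost(2000):.6f}")  # ~$0.004167
```

This is why token counts only matter indirectly: a query that processes more tokens tends to run longer, but the bill is determined by the measured milliseconds, not the tokens.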

Does PostgresML charge for storage?

PostgresML charges $0.25 per gigabyte per month for storage. This includes fault tolerant RAID configurations for high availability as well as backups for disaster recovery.

How is PostgresML so inexpensive?

Our approach to GPU memory management is inherently more efficient: PostgresML moves full AI capability into the database rather than moving your data out to the models.

How does the cost estimator work?

PostgresML estimates costs based on typical workloads and real-world benchmarks. Workload prediction is difficult, which can make future cost estimation even harder. Please contact our team if you would like help estimating the size of your workload and the associated costs. We're happy to help if you have any questions.

What can I do with my free credits?

Anything you want with PostgresML. We’ll send you an email when your free credits expire as a reminder that you may start incurring charges in the future.

How does billing work?

By default, you will be billed monthly based on your usage. You will receive an invoice with total charges three days before your elected payment method is automatically billed. If you incur significantly increased utilization before your normal billing cycle, we will notify you with an off-cycle invoice to help you control costs and maintain service.

Does PostgresML provide technical support?

Serverless plans have access to our community Discord. Dedicated plans offer a private Slack or MS Teams channel for direct communication with our team. PostgresML provides custom SLAs for enterprise plans. Contact us for details.

Still have questions?

Contact us for more details about PostgresML plans and pricing.

Contact Us

Get started
with $100 in
free credits

Sign up and complete your profile to get $100
in free usage credits towards your first AI engine.

Start building with PostgresML