Logical replication allows your PostgresML database to copy data from your primary database to PostgresML in real time. As soon as your customers make changes to their data on your website, those changes will become available in PostgresML.
Setting up & maintaining logical replication requires a few steps, but once you're done, you'll be able to generate embeddings, train models & generate text using LLMs directly using your production data.
First things first, make sure your primary database is configured to support logical replication. To do so, make sure the following settings are set:
Setting |
Value |
wal_level |
logical |
wal_senders |
Greater than 0 |
max_replication_slots |
Greater than 0 |
rds.logical_replication (only on AWS RDS) |
1 |
Make sure to restart your database after changing any of these settings.
All PostgresML databases are allowed to connect to any other database through the Internet by default. You can test connectivity to your database from PostgresML by using the dblink
extension:
content_copy
SELECT
dblink(
'postgres://user:password@your-production-db.amazonaws.com:5432/production_db',
'SELECT 1 AS one'
) AS t1(one integer);
Logical replication works like a pub/sub system: your primary database decides which tables it would like to publish, and PostgresML subscribes to those changes and downloads them as they are made.
A publication is a set of tables that your primary database would like to share with your PostgresML database. To create a publication, connect to your primary database as a superuser and create the publication for your tables of choice:
content_copy
CREATE PUBLICATION postgresml
FOR TABLE your_list_of_tables;
where your_list_of_tables
are the tables you'd like to replicate. For example, if you have two tables, users and blog_posts, you can create a publication for those two tables using this command:
content_copy
CREATE PUBLICATION postgresml_users
FOR TABLE users, blog_posts;
content_copy
CREATE PUBLICATION
Now that we have a list of tables we want to replicate, we need to make sure those tables exist in your PostgresML database. Logical replication only sends over the data, without knowing anything else about your databases. Therefore, we need to make sure both the tables in your primary database and in your PostgresML databases match.
You can get the schema for your tables either by using a PostgreSQL client like pgAdmin or, more easily, by using pg_dump and then importing it into PostgresML using psql:
-
-
content_copy
pg_dump \
postgres://user:password@yyour-production-db.amazonaws.com:5432/prodution_db \
--schema-only \
--no-owner \
--no-privileges \
-t users \
-t blog_posts \
> schema.sql
content_copy
psql \
postgres://user:password@db.cloud.postgresml.org:6432/your_postgresml_database \
-f schema.sql
Once you have the tables created, we can start replicating data:
content_copy
CREATE SUBSCRIPTION postgresml
CONNECTION 'postgres://user:password@your-production-db.amazonaws.com:5432/prodution_db'
PUBLICATION postgresml;
content_copy
CREATE SUBSCRIPTION
As soon you run this command, the PostgresML database will create a connection to your production database and copy the data from your tables into your PostgresML tables. Once that's done, the replication will start in real time and individual changes will be sent one row at a time.