Introducing PostgresML Python SDK: Build End-to-End Vector Search Applications without OpenAI and Pinecone

Santi Adavani

June 01, 2023

We are excited to introduce a Python SDK for PostgresML that streamlines the development of scalable vector search applications on PostgreSQL databases. Traditionally, building a vector search application requires spinning up an application database, connecting to external OpenAI or HuggingFace REST API services to generate embeddings, and integrating with a vector database like Pinecone for indexing and search. This approach increases the infrastructure footprint, maintenance effort, and query latency.

With the PostgresML Python SDK, developers now have a unified solution: a single application database that handles document management, embedding generation, indexing, and search. This eliminates the need for multiple infrastructure components, simplifies maintenance, and reduces query latency. The SDK offers a set of tools for managing the database tables related to documents, text chunks, text splitters, LLM models, and embeddings, so you can add advanced search functionality without wiring these pieces together yourself.

Key Features

Automated Database Management

The Python SDK automates the management of various database tables, eliminating the complexity of setting up and maintaining the data structure required for vector search applications. With this automated system, you can focus on building robust search functionalities while the SDK handles the underlying database management.

Embedding Generation from Open Source Models

Leveraging the Python SDK, you gain access to a vast collection of open source models. These models have been trained on extensive datasets and capture the semantic meaning of text. With just a few lines of code, you can generate embeddings using these models, enabling powerful analysis and search capabilities in your application.

Flexible and Scalable Vector Search

The Python SDK integrates with pgvector, a PostgreSQL extension designed for efficient vector-based indexing and querying. With pgvector, you can run similarity searches, rank results by relevance, and retrieve meaningful information from your database. Because the embeddings and their indexes live in PostgreSQL, your vector search application can grow with your data without adding a separate vector store.

How the Python SDK Works

The Python SDK simplifies the development of vector search applications by abstracting away the complexities of database management and indexing. Here's an overview of how it works:

Document and Text Chunk Management

The SDK offers a user-friendly interface for upserting documents and generating text chunks. You can add and configure text splitters to produce chunks of different sizes and overlaps, including splitters that are aware of specific formats such as Python and Markdown.
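To make chunk size and overlap concrete, here is a minimal sketch of a character-based splitter. It is not the SDK's splitter: the function name `split_text` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; not part of the PostgresML SDK.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A larger overlap repeats more context between neighboring chunks, at the cost of storing and embedding more text.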

Open Source Model Integration

With the SDK, you can seamlessly incorporate a wide range of open source models from HuggingFace into your application. These models capture the semantic meaning of text and enable powerful analysis and search capabilities. Generating high-quality embeddings from these models is a breeze with the Python SDK.

Embedding Indexing

The Python SDK uses the pgvector extension to index the embeddings generated by the open source models. Indexing speeds up search and keeps retrieval of relevant results fast, even with large volumes of data.
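The SDK creates its indexes for you, so the following is only a sketch of what a pgvector index definition can look like. It builds the statement as a string and does not connect to a database. The helper name `ivfflat_index_sql`, the generated index name, and the default column name `embedding` are assumptions for illustration; `ivfflat` and `vector_cosine_ops` are pgvector's index method and cosine operator class.

```python
import re

# Only allow plain identifiers, since names are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ivfflat_index_sql(table: str, column: str = "embedding", lists: int = 100) -> str:
    """Return pgvector DDL for an approximate cosine-distance index.

    Hypothetical helper; the table and column names are placeholders.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    if lists < 1:
        raise ValueError("lists must be >= 1")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx "
        f"ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists})"
    )


if __name__ == "__main__":
    print(ivfflat_index_sql("chunks"))
```

The `lists` setting controls how many partitions the index uses, which trades recall against query speed.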

Querying and Search

Once the embeddings are indexed, the SDK provides intuitive methods for executing vector-based searches on the documents and text chunks stored in the PostgreSQL database. You can easily execute queries and retrieve search results with precise and relevant information.
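For a sense of what a vector search looks like at the SQL level, here is a sketch of a top-k query using pgvector's cosine distance operator `<=>`. The table `chunks` and the columns `id`, `chunk`, and `embedding` are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver. The SDK's own queries and schema may differ.

```python
from typing import Dict, Sequence, Union

# Cosine distance is 0 for identical directions, so 1 - distance gives a similarity score.
TOP_K_SQL = """
SELECT id, chunk, 1 - (embedding <=> %(query)s::vector) AS score
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def vector_literal(values: Sequence[float]) -> str:
    """Format numbers in pgvector's text form, for example [1.0,0.5]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_embedding: Sequence[float], k: int = 5) -> Dict[str, Union[str, int]]:
    """Build the parameter mapping to pass alongside TOP_K_SQL."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return {"query": vector_literal(query_embedding), "k": k}


if __name__ == "__main__":
    print(TOP_K_SQL.strip())
    print(search_params([0.1, 0.2, 0.3], k=3))
```

With a driver such as psycopg, the statement and parameters would be passed together to `cursor.execute`, which keeps the query embedding out of the SQL text itself.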

Use Cases

The Python SDK's embedding capabilities find applications in various scenarios, including:

Search

By comparing embeddings of query strings and documents, you can retrieve search results ranked by their relevance or similarity to the query. This allows users to find the most relevant information quickly and effectively.
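The ranking step can be sketched in a few lines of plain Python. This is an in-memory illustration with toy two-dimensional vectors, not how the SDK or pgvector executes a search; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query: Sequence[float],
    documents: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for doc_id, score in rank_documents([1.0, 0.0], docs, top_k=2):
        print(doc_id, round(score, 3))
```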

Clustering

With embeddings, you can group text strings by similarity. Measuring the similarity between embeddings lets you identify clusters of text strings that share common characteristics, which is useful for exploring and analyzing a dataset.
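As a rough illustration, the sketch below groups vectors with a simple greedy "leader" scheme: each vector joins the first cluster whose first member is similar enough, otherwise it starts a new cluster. This is one of many possible approaches (k-means is a common alternative), and the function names and the 0.8 threshold are assumptions for the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def leader_clusters(vectors: Sequence[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Greedy clustering: returns lists of indices into vectors, one list per cluster."""
    clusters: List[List[int]] = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            leader = vectors[members[0]]  # the first member represents the cluster
            if cosine_similarity(vector, leader) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])  # no cluster was close enough
    return clusters


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(leader_clusters(data))
```

The result depends on input order and on the threshold, which is the price of the scheme's simplicity.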

Recommendations

Embeddings play a crucial role in recommendation systems. By identifying items with related text strings based on their embeddings, you can deliver personalized recommendations to users, enhancing user experience and engagement.

Anomaly Detection

Anomaly detection identifies outliers in data. By quantifying the similarity between text strings using embeddings, you can flag the strings that have little relatedness to the rest of the data.
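One simple way to express "little relatedness to the rest of the data" is the mean similarity of each item to all the others. The sketch below uses that score to find the least related item. It is an illustrative heuristic with hypothetical function names, not a method the SDK prescribes.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_similarity_to_others(vectors: Sequence[Sequence[float]]) -> List[float]:
    """For each vector, the average cosine similarity to every other vector."""
    count = len(vectors)
    if count < 2:
        raise ValueError("need at least two vectors")
    scores: List[float] = []
    for i, vector in enumerate(vectors):
        total = sum(
            cosine_similarity(vector, other)
            for j, other in enumerate(vectors)
            if j != i
        )
        scores.append(total / (count - 1))
    return scores


def least_related(vectors: Sequence[Sequence[float]]) -> int:
    """Index of the vector with the lowest mean similarity to the rest."""
    scores = mean_similarity_to_others(vectors)
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
    print(least_related(data))
```

This compares every pair of items, so it suits small datasets; larger ones would typically rely on nearest-neighbor queries instead.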

Classification

Embeddings are valuable in classification tasks, where each text string is assigned the label it is most similar to. By comparing the embedding of a text string with the embeddings of the labels, you can classify new text strings into predefined categories.
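A minimal sketch of this idea: embed each label, then pick the label whose embedding is closest to the text's embedding. The toy vectors and the function name `classify` are assumptions for illustration; in practice both sets of embeddings would come from the same model.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text_embedding: Sequence[float], label_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the label whose embedding is most similar to text_embedding."""
    if not label_embeddings:
        raise ValueError("at least one label is required")
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(text_embedding, label_embeddings[label]),
    )


if __name__ == "__main__":
    labels = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
    print(classify([0.8, 0.2], labels))
```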

Get Started with the Python SDK

To get started with the Python SDK for scalable vector search on PostgreSQL, visit our GitHub repository. You'll find comprehensive documentation, code examples, and installation instructions to help you integrate the SDK into your projects seamlessly.

We're excited to see how the Python SDK transforms your vector search applications, enabling fast, accurate, and scalable search. If you have any questions or need assistance, please do not hesitate to reach out to us on Discord or send us an email.

Happy coding and happy searching!
