Introducing ParadeDB
By Ming Ying on August 31, 2023
We’re excited to announce ParadeDB: a PostgreSQL database optimized for search. ParadeDB is the first Postgres database built to be an Elasticsearch alternative, engineered for lightning-fast full text, semantic, and hybrid search over Postgres tables.
Why We Built ParadeDB
For many organizations, search remains an unsolved problem. Despite the existence of giants like Elasticsearch, most developers who have worked with Elasticsearch know how incredibly painful it is to run, tune, and manage. While alternative search engines exist, gluing these external services on top of an existing database introduces the headache and costs of reindexing and duplicating data.
Many developers seeking a unified source of truth and search engine turn towards Postgres, which offers basic full text search via tsvector and semantic search via pgvector. These tools may work for simple use cases and medium-sized datasets, but break down when tables get large or queries become complex:
- Slow ranking and phrase search over large tables
- No support for BM25 calculations
- No support for hybrid search, a technique that combines vector search with full-text search
- No real-time search — data must manually be re-indexed or re-embedded
- Limited support for complex queries like faceting or relevance tuning
By now, we’ve witnessed dozens of engineering teams who have begrudgingly stitched Elasticsearch on top of Postgres, only to ditch it later because it was too bloated, expensive, or convoluted. We asked ourselves: what if Postgres itself were built for Elastic-quality search? What if developers didn’t need to choose between one unified Postgres database with limited search capabilities and two separate services, one as the source of truth and one as the search engine?
Who ParadeDB is For
Elasticsearch has many use cases, and we aren’t trying to tackle all of them — at least not yet. Instead, we’re focused on nailing a core set of use cases for Postgres users that want to search over their database. ParadeDB is a good fit for you if:
- You want a single, Postgres-based source of truth and hate duplicating data across multiple services
- You want to perform full-text search over large volumes of documents stored in Postgres without compromising on performance or scalability
- You want to combine ANN/similarity search with full text search for improved semantic matching
The Product
ParadeDB gives Postgres the capabilities of a dedicated search engine:
- BM25 scoring: Full text search with support for boolean, fuzzy, boosted, and keyword queries. Results are scored using the BM25 algorithm.
- Faceted search: Easy bucketing and collection of metrics over full text search results.
- Hybrid search: Results are scored with a combination of semantic relevance (i.e. vector search) and full text relevance (i.e. BM25).
- Real time search: Text indexes and vector columns are automatically kept in sync with the underlying data.
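To make the hybrid search idea above concrete, here is a minimal Python sketch of one common way to blend the two signals: normalize the full text scores and the vector similarities to the same range, then take a weighted sum. This post does not specify how ParadeDB combines them, so the function names, the min-max normalization, and the `weight` parameter are all illustrative assumptions rather than ParadeDB's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def min_max(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(bm25, query_vec, doc_vecs, weight=0.5):
    """Blend full text and semantic relevance per document.

    `bm25` holds precomputed full text scores, one per document.
    `weight` (hypothetical knob) is the share given to full text relevance.
    """
    semantic = [cosine_similarity(query_vec, v) for v in doc_vecs]
    text_norm = min_max(bm25)
    sem_norm = min_max(semantic)
    return [weight * t + (1 - weight) * s for t, s in zip(text_norm, sem_norm)]


# Example: three documents with made-up scores and 2-d embeddings.
scores = hybrid_scores(
    bm25=[2.0, 0.0, 1.0],
    query_vec=[1.0, 0.0],
    doc_vecs=[[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this example the third document ranks first: it is only second-best on each individual signal, but it is the only one that does reasonably well on both, which is the behavior hybrid search is meant to reward.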
How ParadeDB Is Built
The core of ParadeDB is a vanilla Postgres database with custom extensions, written in Rust, that introduce enhanced search capabilities.
ParadeDB’s search engine is built on Tantivy, an open-source, Rust-based search library heavily inspired by Apache Lucene. Search indexes are stored in Postgres as native Postgres indexes. This guarantees transaction safety and obviates the need to pipe data out of Postgres and duplicate it in foreign services.
ParadeDB introduces a new extension to the Postgres ecosystem: pg_search. pg_search implements Rust-based full text search in Postgres using the BM25 scoring algorithm. This extension comes pre-installed with ParadeDB.
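For readers unfamiliar with BM25, the following Python sketch shows the textbook Okapi BM25 formula over pre-tokenized documents. It is meant only to illustrate the scoring idea, not how pg_search or Tantivy implement it. The function name, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the `query` terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight; this variant is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["postgres", "search"],
    ["elasticsearch", "cluster", "tuning"],
    ["postgres", "postgres", "index"],
]
scores = bm25_scores(["postgres"], docs)
```

Here the second document scores zero because it never mentions the query term, and the third outranks the first because the term appears twice, even after the length penalty for the longer document.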
What's Next
We are currently building a cloud version of ParadeDB, and already offer a commercial self-hosted version with support and enterprise features. If you would like to request access to the ParadeDB commercial offerings, we invite you to join our waitlist.
The focus of the core team is on developing the open-source version of ParadeDB, which we will be launching in winter 2023.
We’re building in public and are excited to share ParadeDB with the community. Stay tuned for updates — in future blog posts, we'll be covering some of the interesting technical challenges behind ParadeDB.