A demo trilogy: Vectorizer and Hybrid search

SereneDB Team

May 19, 2026 · 9 minutes read

A demo trilogy: Vectorizer and Hybrid search

How to use similarity search and manage embeddings in SereneDB?

There is a lot of interest in processing vectors lately, and the usual answer is to set up a separate vector database next to the one that already holds your data. We think it's the wrong abstraction, and we are not alone in this — see Vector Databases Are the Wrong Abstraction from TigerData for the same argument. Embeddings are not independent data, they are derived from the source text, very much like a regular index. The moment you put them in a separate system, you have to manage a second database and a sync pipeline between the two, which is quite a hard SRE and DevOps task on its own. The cleaner approach is to keep embeddings next to the data they describe, and let the database manage them as it already manages indexes. That is exactly what this post is about.

The previous post explored feature-rich access patterns in SereneDB. I highly recommend reading it before this post, so you can understand the full potential of remote indexing.

After reading about remote text indexing in SereneDB, you might have had a desire to use not only full-text indexes, but also similarity search. Well, lucky for you, we've created a demo that covers search over text embeddings with full-text filters. Moreover, since you may need to transform text into embeddings yourself, we've added a special function which allows you to trigger your favourite API (OpenAI, Voyage AI, Gemini, Ollama) and transform text into a vector. The second demo describes these cases more deeply.

Demo1: ANN and Hybrid search

For the first demo, we took a part of the Wikipedia dataset that has already been preprocessed with openai-3-small-1536. There are two scenarios of querying: from a table and from a view. In the first one we insert data from Parquet files into the table and then build an index over the table. By storing all the data on disk, the user is able to minimize the latency of lookups. In the second scenario the view is created over a Parquet file and the index is built over the view object. This allows the user to save disk space at the cost of additional latency from a remote request on each read.

To start, we create a hybrid index:

CREATE INDEX dbpedia_idx ON dbpedia USING inverted(
  text       dbpedia_en,
  embedding  ivf (metric = 'cosine')
);

Q1

SELECT title
FROM   dbpedia_idx d
ORDER BY d.embedding <=> $1::FLOAT[1536]
LIMIT 5;

It is a basic nearest-neighbors query that finds the 5 closest vectors to the given one ($1). Worth mentioning, the distance metric in the query should match the metric in the index definition.

Q2

SELECT d.title
FROM   dbpedia_idx d
WHERE  d.embedding <=> $1::FLOAT[1536] < 0.3
LIMIT 10;

In the next query we find all the fields that are inside a circle of radius 0.3 around the query vector $1. This is useful when you don't really care about the top-K, but rather about every item that is close enough. The LIMIT 10 is just a safety cap on the result size: the radius does the actual filtering.

Q3

SELECT title,
       left(text, 80) AS snippet
FROM   dbpedia_idx d
WHERE  text @@ (ts_phrase('physicist')
                && !!ts_phrase('philosophy')
                && (ts_phrase('quantum mechanics') || ts_phrase('general relativity')))
ORDER BY d.embedding <=> $1::FLOAT[1536]
LIMIT 5;

Next, we add a full-text filter to the Q1 query. The abstract must mention "physicist", must not mention "philosophy", and must contain "quantum mechanics" or "general relativity". This BM25 filter and the ANN traversal are applied on the fly against the same inverted index, instead of running as two separate stages. Since the text field is part of the index, the filter is served by the index itself, not by a table lookup, and the table is touched only for the final five rows.

Q4

SELECT title,
       left(text, 80) AS snippet
FROM   dbpedia_idx d
WHERE  text @@ (('quantum' ## 'mechanics')
                || ts_levenshtein('Schrodinger', 2)
                || ts_regexp('heisen[bu]+rg'))
ORDER BY d.embedding <=> $1::FLOAT[1536]
LIMIT 5;

The full-text side also supports proximity, fuzzy and regex matching. ## matches "quantum" immediately next to "mechanics", ts_levenshtein catches "Schrodinger" / "Schroedinger" / "Schrödinger" with edit distance 2, and ts_regexp restricts to "Heisenberg" / "Heisenburg". All of these compose with the ANN search in the same query plan.

Demo2: Generate embeddings yourself!

In case you want your own embeddings, we created the ai_embed function! ai_embed is the first step towards AI and RAG workloads in SereneDB. It accesses your custom API endpoint and retrieves embeddings from there:

-- OpenAI embeddings
CREATE SECRET openai (
  TYPE    openai,
  api_key 'API_KEY'
);
SELECT ai_embed('hello world', 'text-embedding-3-small', 'openai');

-- Gemini embeddings through the OpenAI-compatible API
CREATE SECRET gemini (
  TYPE             openai,
  api_key          'API_KEY',
  base_url         'https://generativelanguage.googleapis.com',
  embeddings_path  '/v1beta/openai/embeddings'
);
SELECT ai_embed('hello world', 'gemini-embedding-001', 'gemini');

-- local Ollama
CREATE SECRET ollama (
  TYPE             openai,
  base_url         'http://<host>:<port>',
  embeddings_path  '/v1/embeddings'
);
SELECT ai_embed('hello world', 'nomic-embed-text', 'ollama');

This way, you are able to take control over text transformation right in SereneDB with just 2 statements:

Create a secret object with an API key and a custom URL, if needed.
Transform text with any model using the ai_embed function.

There are no third-party ETL services nor preprocessed embeddings datasets, just the SereneDB server. And the second demo will show you how easy it is.

Self-embedding semantic search over arXiv abstracts

For the second demo we use a slice of neuralwork/arxiver and compute the embeddings on the fly via ai_embed(...) against Google's Gemini API through its OpenAI-compatible endpoint. The exact same function is reused at query time, so a natural-language prompt becomes a vector in-line inside the ORDER BY:

CREATE SECRET gemini (
  TYPE             openai,
  api_key          'API_KEY',
  base_url         'https://generativelanguage.googleapis.com',
  embeddings_path  '/v1beta/openai/embeddings'
);

CREATE TABLE arxiv (
  id              VARCHAR,
  title           VARCHAR,
  abstract        VARCHAR,
  authors         VARCHAR,
  published_date  TIMESTAMP,
  embedding       FLOAT[3072]
);

INSERT INTO arxiv
SELECT id,
       title,
       abstract,
       authors,
       published_date,
       ai_embed(abstract, 'gemini-embedding-001', 'gemini')::FLOAT[3072]
FROM (
  SELECT id,
         title,
         abstract,
         authors,
         strptime(published_date, '%Y-%m-%dT%H:%M:%SZ') AS published_date
  FROM   read_parquet(
           'https://huggingface.co/datasets/neuralwork/arxiver/resolve/main/data/train.parquet')
) src;

CREATE INDEX arxiv_idx ON arxiv USING inverted(
  abstract   arxiv_en,
  embedding  ivf (metric = 'cosine')
);

As before we created one hybrid index over two fields: abstract (VARCHAR) and embedding (FLOAT[3072]).

Q1 — pure semantic search

SELECT title
FROM   arxiv_idx a
ORDER BY a.embedding <=> ai_embed(
           'Compaction in LLM',
           'gemini-embedding-001',
           'gemini')::FLOAT[3072]
LIMIT 5;

Often a user is after the meaning of a query rather than the exact words it contains. Here is an example of such a query, where the result contains information about papers somehow linked to "Compaction in LLM". Firstly, the text is transformed using Gemini into a vector, and then (like in Q1 of demo 1) the search is run on arxiv_idx.

Q2 — hybrid: lexical scope + semantic rerank

SELECT title
FROM   arxiv_idx a
WHERE  abstract @@ (ts_phrase('OpenAI')
                    && !!ts_phrase('survey')
                    && (ts_starts_with('gpt') || ts_starts_with('gemini')))
ORDER BY a.embedding <=> ai_embed(
           'evaluating frontier model limits',
           'gemini-embedding-001',
           'gemini')::FLOAT[3072]
LIMIT 5;

The next enhancement is the unification of full-text and ANN searches. The boolean filter (must mention "OpenAI", must not mention "survey", and must mention GPT or Gemini) and the ANN scoring against "evaluating frontier model limits" run together in the same index pass, so only abstracts that satisfy the filter ever reach the top-5. This pattern is handy for things like company, author or model names, words that can't be rephrased and have to be present in the text.

Q3 — Hybrid search with generic filter

SELECT published_date::DATE AS published,
       title
FROM   arxiv_idx a
WHERE  published_date < TIMESTAMP '2024-01-01'
  AND  abstract @@ ((ts_starts_with('agent') || ts_starts_with('tool'))
                    && !!ts_phrase('robotic')
                    && ts_phrase('chain of thought'))
ORDER BY a.embedding <=> ai_embed(
           'LLM agents using tools',
           'gemini-embedding-001',
           'gemini')::FLOAT[3072]
LIMIT 5;

The filter does not have to be FTS, any SQL predicate works. Here a date predicate keeps only pre-2024 papers, while the boolean BM25 filter (agent/tool terminology, no robotics, and a chain-of-thought hint) and the ANN scoring against "LLM agents using tools" run together over the same index, so the agent/tool filter and the semantic ordering are applied in one pass instead of two.

Try it yourself

Everything in this article lives in the examples/ folder of the SereneDB repo on GitHub:

examples/demo4: hybrid vector + full-text search over DBpedia abstracts (the first demo, both native-table and remote-Parquet-view variants)
examples/demo5: self-embedding semantic search over arXiv abstracts via ai_embed (the second demo)

Each demo has a runnable demo.sql and a bootstrap.sql file, which sets up the environment (downloads dataset or links view to some endpoint). To run any of them locally, grab a build from the downloads page.

What's next?

We're actively developing vector search features in SereneDB, so there will be a lot more in the future. If you don't want to miss it, star us on GitHub. It genuinely helps us reach more people.