
Product Search

Product search follows the same hybrid search architecture as the Content Lake search pipeline, but operates at the product level rather than the block level.

Indexing

Each product is indexed as a single vector, stored with the following attributes:

Attribute      Type      Description
id             string    Product ID
vector         float[]   Product embedding
title          string    Full-text indexed product title
description    string    Full-text indexed description
authors        string[]  Author names
product_type   string    Book, Video Course, etc.
product_model  string    legacy or content_lake
state          string    Current lifecycle state
tags           string[]  Product tags
language       string    ISO 639-1 language code
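As an illustration of the index record shape, the attribute table can be mirrored as a typed record on the client side. This is a minimal sketch, not the service's schema definition: the class name ProductIndexRecord and the validation rules are assumptions derived only from the table above.

```python
from dataclasses import dataclass
from typing import List

# Values allowed for product_model, per the attribute table.
PRODUCT_MODELS = {"legacy", "content_lake"}


@dataclass
class ProductIndexRecord:
    """Hypothetical client-side mirror of one indexed product."""
    id: str
    vector: List[float]        # product embedding
    title: str                 # full-text indexed
    description: str           # full-text indexed
    authors: List[str]
    product_type: str          # e.g. "Book", "Video Course"
    product_model: str         # "legacy" or "content_lake"
    state: str                 # current lifecycle state
    tags: List[str]
    language: str              # ISO 639-1 code, e.g. "en"

    def __post_init__(self) -> None:
        # Reject product_model values the table does not list.
        if self.product_model not in PRODUCT_MODELS:
            raise ValueError(f"unknown product_model: {self.product_model!r}")
        # ISO 639-1 codes are exactly two lowercase letters.
        if len(self.language) != 2 or not (self.language.isalpha() and self.language.islower()):
            raise ValueError(f"not an ISO 639-1 code: {self.language!r}")
```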

Search Modes

The search_mode parameter controls retrieval strategy:

  • hybrid (default) — weighted combination of vector similarity and BM25 full-text matching. Best for general product discovery.
  • keyword — BM25 full-text ranking only. Best for known-item retrieval (searching by exact title or ISBN).
  • vector — approximate nearest neighbour only. Best for recommendation-style queries ("products similar to X").
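To make the three modes concrete, the sketch below ranks a small candidate set under each search_mode value. It assumes BM25 scores have already been computed per product, and it uses min-max normalisation plus a single weight alpha to combine the two signals. The actual weighting and normalisation used by the service are not specified here, so alpha=0.5 and min-max are illustrative assumptions, and brute-force cosine stands in for approximate nearest-neighbour search.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so unbounded BM25 can be mixed with cosine."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 0.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def rank(query_vec: List[float], vectors: Dict[str, List[float]],
         bm25_scores: Dict[str, float], search_mode: str = "hybrid",
         alpha: float = 0.5) -> List[str]:
    """Order product IDs by score; products absent from bm25_scores count as 0."""
    vec = {pid: cosine(query_vec, v) for pid, v in vectors.items()}
    kw = min_max({pid: bm25_scores.get(pid, 0.0) for pid in vectors})
    if search_mode == "vector":
        scores = vec
    elif search_mode == "keyword":
        scores = kw
    elif search_mode == "hybrid":
        # Weighted combination: alpha on vector similarity, (1 - alpha) on BM25.
        scores = {pid: alpha * vec[pid] + (1 - alpha) * kw[pid] for pid in vectors}
    else:
        raise ValueError(f"unknown search_mode: {search_mode!r}")
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)
```

Normalising BM25 first matters because raw BM25 scores are unbounded while cosine similarity lies in [-1, 1]; without rescaling, the keyword term would dominate the mix.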

Search supports tag filtering, pagination via search sessions, and re-ranking — all following the patterns documented in the Content Lake search specification.

Performance

  • Tag-filtered queries: ~10–100ms
  • Full hybrid search: 1–5s; latency grows with index size, so use tag filters to narrow the candidate set.
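The gap between the two figures comes from how many products must be scored. The sketch below shows the order of operations under stated assumptions: tags are applied first as a cheap set test, and only the surviving candidates are scored. The match-all tag semantics and the function names are hypothetical, and a production index would use approximate nearest-neighbour structures rather than this brute-force loop.

```python
import heapq
import math
from typing import Dict, Iterable, List, Set


def filter_by_tags(tags_by_id: Dict[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return IDs of products carrying every required tag (match-all is an assumption)."""
    req = set(required)
    return [pid for pid, tags in tags_by_id.items() if req <= tags]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: List[float], vectors: Dict[str, List[float]],
          candidates: List[str], k: int = 10) -> List[str]:
    """Score only the filtered candidates; cost is linear in len(candidates)."""
    return heapq.nlargest(k, candidates, key=lambda pid: cosine(query, vectors[pid]))


tags = {"a": {"python", "beginner"}, "b": {"python"}, "c": {"go"}}
vectors = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [1.0, 0.1]}
candidates = filter_by_tags(tags, ["python"])    # ["a", "b"]; "c" is never scored
print(top_k([1.0, 0.0], vectors, candidates, k=1))  # ["a"]
```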

Vector Embeddings and Client Taxonomy

Product embeddings are exposed via the API so that client applications can overlay their own taxonomy and categorisation. This is a deliberate design choice — different storefronts have different category schemes, and the product catalogue should not need to know about any of them.

The Problem

Packtpub.com might categorise products by programming language and skill level. Gamedevassembly.com might categorise by game engine and discipline. A subscription platform might organise by learning path. These taxonomies are specific to each client application and change independently of the product catalogue.

The Approach

The Product Management service provides raw product embedding vectors. Client applications retrieve these vectors and use them for their own classification, clustering, and nearest-neighbour operations:

  1. Retrieve individual embeddings — fetch the embedding for a specific product for real-time classification or similarity lookups.
  2. Batch export embeddings — export embeddings for all products in a catalogue or matching a filter, useful for building or retraining a client-side taxonomy model.
  3. Nearest-neighbour queries — use the search API with vector mode to find products similar to a given vector for category pages or recommendation panels.
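Once embeddings have been batch-exported, a client can also run its own similarity lookups locally instead of calling the search API. The sketch below assumes a hypothetical JSON Lines export format with one {"id": ..., "vector": [...]} object per line; the real export format is not specified here. It performs exact cosine k-NN, which is adequate for modest catalogues but not a substitute for the service's indexed vector search.

```python
import heapq
import json
import math
from typing import Dict, Iterable, List, Optional


def load_export(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse a hypothetical JSON Lines export: one {"id": ..., "vector": [...]} per line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            vectors[record["id"]] = [float(x) for x in record["vector"]]
    return vectors


def normalise(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class NeighbourIndex:
    """In-memory exact k-NN; vectors are unit-normalised once so cosine reduces to a dot product."""

    def __init__(self, vectors: Dict[str, List[float]]) -> None:
        self._ids = list(vectors)
        self._unit = [normalise(vectors[pid]) for pid in self._ids]

    def similar(self, query: List[float], k: int = 5,
                exclude: Optional[str] = None) -> List[str]:
        q = normalise(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), pid)
            for pid, v in zip(self._ids, self._unit)
            if pid != exclude
        )
        return [pid for _, pid in heapq.nlargest(k, scored)]
```

For a "products similar to X" panel, pass X's own embedding as the query and exclude X, for example index.similar(vectors["p1"], k=4, exclude="p1").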

Taxonomy is a Client Concern

The platform does not store, manage, or enforce any taxonomy. It provides the vector space and search infrastructure. How a client maps products into categories — whether through k-means clustering, manual curation, or a trained classifier — is entirely up to the client application. This separation means:

  • Adding a new storefront does not require changes to the product catalogue
  • Changing a taxonomy does not require re-indexing products
  • Multiple storefronts can classify the same products differently
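As one example of the client-side options listed above, a storefront could combine manual curation with a nearest-centroid rule: curators pick a few seed products per category, each category is represented by the mean of its seed embeddings, and every other product goes to the category with the most similar centroid. This is a hypothetical sketch (class name, category names and vectors are invented for illustration), not a platform feature. It shows two storefronts classifying the same product embedding differently without any change to the catalogue.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("a category needs at least one seed vector")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class CentroidTaxonomy:
    """Client-owned taxonomy: each category is the mean embedding of curated seed products."""

    def __init__(self, seeds: Dict[str, List[List[float]]]) -> None:
        self.centroids = {cat: centroid(vecs) for cat, vecs in seeds.items()}

    def classify(self, vector: List[float]) -> str:
        # Assign to the category whose centroid is most similar by cosine.
        return max(self.centroids, key=lambda cat: cosine(vector, self.centroids[cat]))


# Two hypothetical storefronts with unrelated category schemes.
by_language = CentroidTaxonomy({"Python": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], "Go": [[0.0, 1.0, 0.0]]})
by_level = CentroidTaxonomy({"Beginner": [[0.5, 0.0, 0.5]], "Advanced": [[0.0, 0.2, 1.0]]})

product = [0.8, 0.1, 0.3]
print(by_language.classify(product), by_level.classify(product))  # Python Beginner
```

Retraining here means changing the seed lists and rebuilding the centroids; the exported embeddings themselves stay the same, which is why a taxonomy change does not require re-indexing.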