Product Search
Product search follows the same hybrid search architecture as the Content Lake search pipeline, but operates at the product level rather than the block level.
Indexing
Products are indexed as single vectors with the following attributes:
| Attribute | Type | Description |
|---|---|---|
| id | string | Product ID |
| vector | float[] | Product embedding |
| title | string | Full-text indexed product title |
| description | string | Full-text indexed description |
| authors | string[] | Author names |
| product_type | string | Book, Video Course, etc. |
| product_model | string | legacy or content_lake |
| state | string | Current lifecycle state |
| tags | string[] | Product tags |
| language | string | ISO 639-1 language code |
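To make the shape of an indexed product concrete, the attributes above can be sketched as a typed record. This is an illustration only: the class name `ProductRecord`, the `validate` helper, and every value in the sample record (including the `published` state) are assumptions, not part of the service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProductRecord:
    """One indexed product. Field names mirror the attribute table."""
    id: str
    vector: List[float]        # product embedding
    title: str                 # full-text indexed
    description: str           # full-text indexed
    authors: List[str]
    product_type: str          # e.g. "Book", "Video Course"
    product_model: str         # "legacy" or "content_lake"
    state: str                 # current lifecycle state
    tags: List[str]
    language: str              # ISO 639-1 code, e.g. "en"

    def validate(self) -> None:
        """Hypothetical client-side sanity checks; raises ValueError on failure."""
        if self.product_model not in ("legacy", "content_lake"):
            raise ValueError(f"unknown product_model: {self.product_model!r}")
        # ISO 639-1 codes are two lowercase letters.
        if not (len(self.language) == 2 and self.language.isalpha()
                and self.language.islower()):
            raise ValueError(f"not an ISO 639-1 code: {self.language!r}")
        if not self.vector:
            raise ValueError("vector must not be empty")


# Sample record with made-up values, for illustration only.
example = ProductRecord(
    id="prod-001",
    vector=[0.12, -0.40, 0.88],
    title="Example Title",
    description="Example description.",
    authors=["A. Author"],
    product_type="Book",
    product_model="content_lake",
    state="published",          # hypothetical lifecycle value
    tags=["python"],
    language="en",
)
example.validate()
```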
Search Modes
The search_mode parameter controls retrieval strategy:
- hybrid (default): weighted combination of vector similarity and BM25 full-text matching. Best for general product discovery.
- keyword: BM25 full-text ranking only. Best for known-item retrieval (searching by exact title or ISBN).
- vector: approximate nearest neighbour only. Best for recommendation-style queries ("products similar to X").
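The three modes differ only in which scores feed the final ranking. The sketch below shows one plausible way to combine them, assuming min-max normalisation of each score set followed by a weighted sum. The function names, the `vector_weight` parameter, and the normalisation step are all assumptions; the actual weighting used by the service is not specified here.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def combine(vector_scores: Dict[str, float],
            bm25_scores: Dict[str, float],
            search_mode: str = "hybrid",
            vector_weight: float = 0.5) -> Dict[str, float]:
    """Return one score per product ID for the requested search_mode."""
    if search_mode == "keyword":
        return min_max(bm25_scores)
    if search_mode == "vector":
        return min_max(vector_scores)
    if search_mode != "hybrid":
        raise ValueError(f"unknown search_mode: {search_mode!r}")
    v, b = min_max(vector_scores), min_max(bm25_scores)
    # A product missing from one result set contributes 0 for that signal.
    return {
        pid: vector_weight * v.get(pid, 0.0) + (1.0 - vector_weight) * b.get(pid, 0.0)
        for pid in set(v) | set(b)
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Product IDs by descending score, ties broken by ID."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Toy scores for four products (made-up numbers).
vector_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
bm25_scores = {"b": 12.0, "c": 3.0, "d": 8.0}

hybrid_ranking = rank(combine(vector_scores, bm25_scores, "hybrid"))
keyword_ranking = rank(combine(vector_scores, bm25_scores, "keyword"))
vector_ranking = rank(combine(vector_scores, bm25_scores, "vector"))
```

With these toy scores, product `b` ranks first in hybrid mode because it scores well on both signals, while `a` leads in vector mode and is absent from the keyword results.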
Search supports tag filtering, pagination via search sessions, and re-ranking — all following the patterns documented in the Content Lake search specification.
Performance
- Tag-filtered queries: ~10–100ms
- Full hybrid search: 1–5s, proportional to index size. Use tag filters to narrow candidates.
Vector Embeddings and Client Taxonomy
Product embeddings are exposed via the API so that client applications can overlay their own taxonomy and categorisation. This is a deliberate design choice — different storefronts have different category schemes, and the product catalogue should not need to know about any of them.
The Problem
Packtpub.com might categorise products by programming language and skill level. Gamedevassembly.com might categorise by game engine and discipline. A subscription platform might organise by learning path. These taxonomies are specific to each client application and change independently of the product catalogue.
The Approach
The Product Management service provides raw product embedding vectors. Client applications retrieve these vectors and use them for their own classification, clustering, and nearest-neighbour operations:
- Retrieve individual embeddings — fetch the embedding for a specific product for real-time classification or similarity lookups.
- Batch export embeddings — export embeddings for all products in a catalogue or matching a filter, useful for building or retraining a client-side taxonomy model.
- Nearest-neighbour queries: use the search API in vector mode to find products similar to a given vector for category pages or recommendation panels.
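Once embeddings have been exported, a client can also run similarity lookups locally. The following is a minimal sketch using exact brute-force cosine similarity over a small in-memory export. It is not the service's approximate nearest-neighbour search, and the product IDs, vectors, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_products(query: Sequence[float],
                     embeddings: Dict[str, List[float]],
                     k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most similar (product_id, similarity) pairs, best first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in embeddings.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Stand-in for a batch export: product ID -> embedding (made-up values).
exported = {
    "python-basics": [1.0, 0.0, 0.0],
    "python-advanced": [0.9, 0.1, 0.0],
    "unity-intro": [0.0, 1.0, 0.0],
}

similar = nearest_products(exported["python-basics"], exported, k=2)
```

A brute-force scan like this is only practical for small catalogues; for larger ones the search API's vector mode is the intended route.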
Taxonomy is a Client Concern
The platform does not store, manage, or enforce any taxonomy. It provides the vector space and search infrastructure. How a client maps products into categories — whether through k-means clustering, manual curation, or a trained classifier — is entirely up to the client application. This separation means:
- Adding a new storefront does not require changes to the product catalogue
- Changing a taxonomy does not require re-indexing products
- Multiple storefronts can classify the same products differently
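The last point can be illustrated with a nearest-centroid classifier, one of several ways a client might map embeddings to categories. In this sketch each storefront defines its taxonomy as a set of category centroids in the shared vector space. The category names, centroid values, and the `classify` function are hypothetical, and a real client might use clustering or a trained model instead.

```python
import math
from typing import Dict, List, Sequence

# A client-side taxonomy: category name -> centroid in the embedding space.
Taxonomy = Dict[str, List[float]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(embedding: Sequence[float], taxonomy: Taxonomy) -> str:
    """Assign the category whose centroid is most similar to the embedding."""
    if not taxonomy:
        raise ValueError("taxonomy must define at least one category")
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(taxonomy), key=lambda name: _cosine(embedding, taxonomy[name]))


# Two storefronts, two independent taxonomies over the same vector space.
by_language: Taxonomy = {"Python": [1.0, 0.0, 0.0], "C#": [0.0, 1.0, 0.0]}
by_discipline: Taxonomy = {"Programming": [1.0, 1.0, 0.0], "Art": [0.0, 0.0, 1.0]}

product_embedding = [0.2, 0.9, 0.1]  # made-up embedding for one product

language_category = classify(product_embedding, by_language)
discipline_category = classify(product_embedding, by_discipline)
```

The same embedding lands in `C#` under one taxonomy and `Programming` under the other, and neither result requires any change to the indexed product.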