Creating Products

Products on the Packt Platform are lightweight wrappers around Content Lake content. This page covers the format generation pipeline for Content Lake-backed products, product assets, and how products are organised into series, bundles, and collections.

Format Generation

Format generation is what sets Content Lake-backed products apart. The design principle is that no generated artifacts are stored: every format request runs the generation pipeline in real time and streams the result.

Request
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  CL Document │──▶│    Asset     │──▶│   Format     │
│  Retrieval   │   │  Injection   │   │  Rendering   │
│  (streaming) │   │              │   │  (streaming) │
└──────────────┘   └──────────────┘   └──────────────┘
                                      Response Stream

1. Content Lake Document Retrieval

Documents are fetched from the Content Lake using its JSONL streaming endpoint, which delivers each block as one line of newline-delimited JSON with sub-100ms time-to-first-byte. For pinned documents, the pinned version is requested. For latest-tracking documents, the latest version at request time is fetched.
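
As a minimal sketch of consuming such a stream, the generator below parses newline-delimited JSON from byte chunks that may split a line at any point. The block fields (`type`, `text`) and the sample data are illustrative assumptions, not the Content Lake's documented schema, and the network call itself is left out.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl_blocks(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed block per newline-delimited JSON line.

    A chunk can end in the middle of a line, so incomplete data is
    buffered until its terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():  # skip blank lines
                yield json.loads(line)
    if buffer.strip():  # final line without a trailing newline
        yield json.loads(buffer)


# Hypothetical block shape: these field names are assumptions.
sample_chunks = [
    b'{"type": "h1", "text": "Intro"}\n{"type": "para", ',
    b'"text": "Hello"}\n',
]
blocks = list(iter_jsonl_blocks(sample_chunks))
```

Because the function is a generator, each block is available to later stages as soon as its line is complete, without waiting for the rest of the document.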

2. Asset Injection

Product assets (covers, imprint) are injected at the appropriate positions. The table of contents is generated from document heading structure (H1, H2, H3 blocks). The index is generated from entity and keyword data in the Content Lake's knowledge graph.
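
One way to build the table of contents without buffering the document is to tap the block stream as it passes through. The sketch below assumes each block is a dict with hypothetical `type` and `text` fields, and that heading blocks use the type names `h1`, `h2`, and `h3`; the real block schema may differ.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed mapping from block type to heading depth.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}


def tap_headings(
    blocks: Iterable[Dict[str, str]],
    toc: List[Tuple[int, str]],
) -> Iterator[Dict[str, str]]:
    """Pass every block through unchanged, recording headings in `toc`."""
    for block in blocks:
        level = HEADING_LEVELS.get(block.get("type", ""))
        if level is not None:
            toc.append((level, block.get("text", "")))
        yield block


toc_entries: List[Tuple[int, str]] = []
document = [
    {"type": "h1", "text": "Getting Started"},
    {"type": "para", "text": "Some body text."},
    {"type": "h2", "text": "Installation"},
]
passed_through = list(tap_headings(document, toc_entries))
```

The entries are only complete once the stream has been fully consumed, so a renderer that places the table of contents at the front needs some strategy for that; the source does not say which one the platform uses.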

3. Format Rendering

The assembled content stream is rendered into the target format (PDF, ePub, InDesign) and streamed back to the client. The client receives bytes as they are rendered; the complete output is never buffered.
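
The three stages can be sketched as chained generators, so that output bytes are produced before the source document has been read in full. This is an illustration only: `render_plain_text` is a stand-in for a real PDF, ePub, or InDesign renderer, and all function names and block fields are assumptions.

```python
from typing import Dict, Iterable, Iterator, List

Block = Dict[str, str]


def inject_assets(
    blocks: Iterable[Block],
    front_matter: Iterable[Block],
    back_matter: Iterable[Block],
) -> Iterator[Block]:
    """Place product assets before and after the document blocks."""
    yield from front_matter
    yield from blocks
    yield from back_matter


def render_plain_text(blocks: Iterable[Block]) -> Iterator[bytes]:
    """Stand-in renderer: emit one UTF-8 line per block as it arrives."""
    for block in blocks:
        yield (block["text"] + "\n").encode("utf-8")


pulled: List[str] = []  # records which document blocks have been read


def document_blocks() -> Iterator[Block]:
    for text in ("Chapter 1", "Body text"):
        pulled.append(text)
        yield {"type": "para", "text": text}


stream = render_plain_text(
    inject_assets(
        document_blocks(),
        front_matter=[{"type": "cover", "text": "[front cover]"}],
        back_matter=[{"type": "cover", "text": "[back cover]"}],
    )
)
first_chunk = next(stream)           # available before any document block is read
pulled_at_first_chunk = list(pulled)
remaining = b"".join(stream)
```

Each stage pulls from the previous one only when the client asks for more bytes, which is what keeps memory use independent of document size in this sketch.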

Determinism

For products where all documents are pinned to specific versions, the generated output is deterministic — the same request always produces the same file. This is important for downstream consumers that need reproducible artifacts.

For products with latest-tracking documents, the output reflects the Content Lake state at generation time, so two requests can produce different files if a document changed in between.
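
A downstream consumer that depends on reproducible output could check whether every document in a product is pinned before relying on that guarantee. The sketch below uses a hypothetical `DocumentRef` type in which a missing version means latest-tracking; the platform's actual product model is not shown in this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DocumentRef:
    """Hypothetical reference from a product to a Content Lake document."""
    document_id: str
    pinned_version: Optional[str] = None  # None means latest-tracking


def is_reproducible(refs: List[DocumentRef]) -> bool:
    """True only when every document is pinned to a specific version."""
    return all(ref.pinned_version is not None for ref in refs)


fully_pinned = [DocumentRef("doc-a", "v3"), DocumentRef("doc-b", "v7")]
mixed = [DocumentRef("doc-a", "v3"), DocumentRef("doc-c")]

fully_pinned_ok = is_reproducible(fully_pinned)
mixed_ok = is_reproducible(mixed)
```

A single latest-tracking document is enough to make the whole product's output depend on when it was generated.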

Performance

The pipeline is designed for low latency:

  • Content Lake streaming: sub-100ms time-to-first-byte
  • Asset injection and ToC generation operate on the stream as it flows through
  • End-to-end time-to-first-byte targets sub-second for typical book products

Product Assets

Asset              Source                                     Stored
Front Cover        Uploaded image                             Yes
Back Cover         Uploaded image                             Yes
Imprint            Generated from product/publisher metadata  No
Table of Contents  Generated from CL document headings        No
Index              Generated from CL entity/keyword data      No

For Content Lake-backed products, cover images are the only binary assets stored. The imprint, table of contents, and index are generated dynamically during format generation and never persisted.

Series, Bundles, and Collections

Series

A series is an ordered collection of products that share a series identifier and metadata. Each product retains its own lifecycle, pricing, and metadata. Series membership is an association, not a structural dependency.

Bundles

A bundle packages multiple products for sale as a unit with bundle-specific pricing. Bundles have their own lifecycle and can be published and retired independently of their constituent products.

Collections

Collections are curated groupings for merchandising and editorial purposes. Unlike series (which have a fixed order and shared identity) and bundles (which have pricing), collections are lightweight — simply named lists of product references.
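
The differences between the three groupings can be summarised in a small data model. The classes and field names below are hypothetical and only mirror the descriptions above: a series carries shared identity and order, a bundle carries pricing and its own lifecycle, and a collection is a named list of references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Series:
    series_id: str                 # shared series identifier
    title: str                     # shared series metadata
    product_ids: List[str] = field(default_factory=list)  # order matters


@dataclass
class Bundle:
    bundle_id: str
    price_cents: int               # bundle-specific pricing
    product_ids: List[str] = field(default_factory=list)
    status: str = "draft"          # the bundle's own lifecycle state

    def retire(self) -> None:
        """Retire the bundle; constituent products are not touched."""
        self.status = "retired"


@dataclass
class Collection:
    name: str
    product_ids: List[str] = field(default_factory=list)  # plain references


series = Series("s-1", "Example Series", ["p-1", "p-2", "p-3"])
bundle = Bundle("b-1", 4999, ["p-1", "p-3"], status="published")
collection = Collection("Staff picks", ["p-3", "p-1"])
bundle.retire()
```

All three hold product identifiers rather than product objects, which reflects the point that membership is an association and products keep their own lifecycle, pricing, and metadata.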