Tagging

Tags are the main way of organising Content within the Content Lake. In most cases we should be building Product experiences that are composed by Tags and Filters rather than explicit Content IDs. This lets Content evolve over time rather than being a static resource.

Flat vs Key Value pairs

You'll notice that Tags within the Content Lake are flat values rather than key value pairs. For example, to filter Content by Contract ID you would query:

{
    "filter": {
        "tags": [
            "system/contract/734b40cb-89b5-4119-a52f-f1265a6b4930"
        ]
    }
}

Rather than:

{
    "filter": {
        "tags": [
            "system/contract" == "734b40cb-89b5-4119-a52f-f1265a6b4930"
        ]
    }
}

With the exception of certain System tags (created, updated, last_seen), all tags are treated as string literals and matched exactly. Wildcard matching and date comparisons aren't available in Content Lake filtering for any other tags, though you are free to build your own more complex filtering client side.
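As a minimal sketch of both halves of this rule, the Python below builds an exact-match filter body in the shape shown above and then does a prefix match client side. The helper names (build_tag_filter, filter_by_prefix) and the document shape (a dict with a "tags" list) are assumptions made for illustration; they are not part of the Content Lake API.

```python
from __future__ import annotations

import json


def build_tag_filter(*tags: str) -> dict:
    """Build a filter body holding a flat list of tag strings."""
    return {"filter": {"tags": list(tags)}}


def filter_by_prefix(documents: list[dict], prefix: str) -> list[dict]:
    """Client-side prefix match over already-fetched documents.

    Assumes each document is a dict with a "tags" list of strings
    (a hypothetical shape, used only for this example).
    """
    return [
        doc
        for doc in documents
        if any(tag.startswith(prefix) for tag in doc.get("tags", []))
    ]


if __name__ == "__main__":
    body = build_tag_filter("system/contract/734b40cb-89b5-4119-a52f-f1265a6b4930")
    print(json.dumps(body, indent=4))
```

The exact match happens in the Content Lake; anything looser, such as "every document with any system/lang/ tag", is done on the results after they come back.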

Within the Content Lake, tags are lexicographically sorted, so most API responses will return tags in that order. However, we do not guarantee any specific sorting in the API responses, and you shouldn't rely on any specific sorting order.

Levels

There are four types of tags within the Content Lake:

| Level | Prefix | Description |
| --- | --- | --- |
| System | system/ | Added by the Content Lake. Universal across use cases |
| Source | source/ | Added during ingestion. Vary by connector |
| Transcoder | transcoder/ | Added by formatters during ingestion |
| Application | (custom) | Private to the application that set them |

  • System tags: Added by the Packt Content Lake and its internal systems. Prefixed by system/ in API responses. There are relatively few system tags but they are designed to be universally significant across use cases. Examples include system/lang/en, system/contract/<id>.
  • Source tags: Added during ingestion by the source connectors. The type of information within these tags will vary widely: a YouTube importer might carry view counts and like counts, whereas a SharePoint connector might use tags for Microsoft tenant information.
  • Transcoder tags: Added by formatters during the ingestion process. These identify which formatter and version processed the content (transcoder/<name>, transcoder/version/<version>), making it possible to find and reprocess content when improved formatters are released.
  • Application (Client) tags: Private and only returned to the application which set them. You can specify up to 50 tags, with key names up to 60 characters long. Keys and values are stored as strings and can contain any characters except square brackets ([ and ]) or slashes (/ and \) in keys.
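
The levels and the Application tag limits can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Content Lake's own validation: the function names are hypothetical, Application tags are modelled as a dict of string keys to string values, and the level is inferred purely from the prefixes listed above.

```python
from __future__ import annotations

# Prefixes taken from the levels listed above.
LEVEL_PREFIXES = {
    "system/": "system",
    "source/": "source",
    "transcoder/": "transcoder",
}

# Limits quoted in the Application tags description.
MAX_APPLICATION_TAGS = 50
MAX_KEY_LENGTH = 60
FORBIDDEN_KEY_CHARS = set("[]/\\")


def tag_level(tag: str) -> str:
    """Return the level implied by the tag's prefix, else "application"."""
    for prefix, level in LEVEL_PREFIXES.items():
        if tag.startswith(prefix):
            return level
    return "application"


def validate_application_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if len(tags) > MAX_APPLICATION_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_APPLICATION_TAGS}")
    for key in tags:
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key too long: {key!r}")
        bad = sorted(FORBIDDEN_KEY_CHARS.intersection(key))
        if bad:
            problems.append(f"key {key!r} contains forbidden characters: {bad}")
    return problems
```

Checking the limits before sending tags gives an application a clear error message of its own, whatever the service does with an invalid request.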

Tags are the primary mechanism for finding and matching Content. Search is still useful for human operators, but most tools, processes and automations should rely on Tags.

Examples

Packt Subscription Reader

Packt has a Subscription service whereby Customers can access Packt's library (or subsets of it depending on the tier).

When someone views a Product on the Subscription site, we look up the Product in the Product Management System (PMS). The PMS holds all of the Chapters along with their Content Lake IDs. The Subscription service then queries the Content Lake for all Documents with those IDs and filters on the status tag, which could be approved, reviewed or draft.

For users who have opted into early access, the tag filter would be reviewed, meaning that the Chapter has been reviewed by Packt staff. For users who haven't opted into early access, or who come from a channel with stricter publishing requirements, the tag filter would be approved, which means that the Chapter has been through the full review process and the Author has given final sign-off.
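
The choice between the two filters could look like the Python sketch below. The spelling of the status tag ("status/<value>") is an assumption, since this page only names the values, and the function names and parameters are hypothetical.

```python
from __future__ import annotations


def status_tag_for(early_access: bool, strict_channel: bool = False) -> str:
    """Pick the status tag to filter on for a given reader.

    Assumption: status tags are spelled "status/<value>". Only the values
    (approved, reviewed, draft) come from the description above.
    """
    if early_access and not strict_channel:
        return "status/reviewed"
    return "status/approved"


def reader_filter(early_access: bool, strict_channel: bool = False) -> dict:
    """Build a filter body in the flat-tag shape used elsewhere on this page."""
    return {"filter": {"tags": [status_tag_for(early_access, strict_channel)]}}
```

The Product decision lives in one small function, so offering an earlier stage to a new audience is a change to that function rather than to the content itself.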

We could build other Product experiences which let Customers, Reviewers and Channels opt into earlier stages like draft or in_progress too.

This is a basic starting example which shows that the Content Lake moves more of the decision making to the Product layer rather than the technical one. With a simple tag change we go from a human-reviewed product to a product that takes Customers as close to the Author's live thinking as possible. We can then layer in things like "Email author" to provide a real-time feedback function.

Author copilot

We could create an Author copiloting experience where Authors get real-time suggestions on what to write and help with redrafting based on their previous work. We know that Packt customers particularly value the real-world practical experience of our Authors but it can be hard to remember every instance that someone has used a technique or solved a problem.

To build an LLM-powered copilot, we would first use the author system tag, limited to the currently logged-in user. This means that when the copilot looks for prior art, it only considers content written by the current Author.

Tags:
    system/author/<id>

Secondly, we would use the contract system tags to filter out any Content which doesn't have clear terms allowing us to use it for future publishing.

Tags:
    system/contract/<id>
    system/contract/allow/publishing
    system/contract/disallow/ai

For example, the Author may have connected their GitHub account which includes personal projects and company projects. The company projects would be filtered out from consideration.
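
Put together, the two steps might be sketched in Python as below. The tag strings are copied from the lists above. Everything else is an assumption made for illustration: the function names, the idea that multiple tags in one filter must all match, and the choice to apply the exclusion client side.

```python
from __future__ import annotations

# Tag strings as written in the example above.
ALLOW_PUBLISHING = "system/contract/allow/publishing"
DISALLOW_AI = "system/contract/disallow/ai"


def copilot_filter(author_id: str) -> dict:
    """Filter body limiting results to one Author's publishable Content.

    Assumes that listing several tags means all of them must match.
    """
    return {
        "filter": {
            "tags": [f"system/author/{author_id}", ALLOW_PUBLISHING],
        }
    }


def usable_for_copilot(doc_tags: set[str]) -> bool:
    """Client-side check dropping Content that carries the AI exclusion tag."""
    return DISALLOW_AI not in doc_tags
```

Because only System tags are involved, newly imported Content from the same Author would match the same filter without any change to the copilot.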

We can now build a simple LLM-powered assistant which uses the Content Lake to look up knowledge from the Author's prior work. The Author Copilot app hasn't needed to use any tags other than the System ones that are added automatically. As more Content from the Author is imported, it will all be surfaced in the Author Copilot automatically.

Author assistant

Book product management

Packt LLM Chatbot

Tags

Below are the System and Source tags.

System

| Tag | Example | Description |
| --- | --- | --- |
| system/<id> | system/5946206b-... | The unique ID of the content |
| system/lang/<lang> | system/lang/en | The language of the content |
| system/contract/<id> | system/contract/734b40cb-... | The unique ID of the contract |
| contract/[allow,deny]/derive-content | contract/allow/derive-content | Allow or deny derived content |
| contract/[allow,deny]/derive-content/translation[/<lang>] | contract/allow/derive-content/translation/fr | Translation rights |
| contract/[allow,deny]/derive-content/ai/training | contract/deny/derive-content/ai/training | AI training rights |
| contract/[allow,deny]/derive-content/ai/inference | contract/allow/derive-content/ai/inference | AI inference (RAG/MCP) |
| contract/[allow,deny]/product/[manual,automation] | contract/deny/product/automation | Product automation rights |
| contract/[allow,deny]/product/format[/<format>] | contract/allow/product/format/book | Format-specific rights |
| contract/[allow,deny]/distribution/channel/<channel> | contract/allow/distribution/channel/amazon | Distribution channel |
| contract/[allow,deny]/distribution/country/<country> | contract/deny/distribution/country/CN | Distribution country |

See Contracts for full details on the permission tag model.
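
Since the permission tags are flat strings, an application that wants to reason about them has to split them itself. The Python below is a small sketch of that step only: the ContractPermission type and parse_contract_permission function are hypothetical names, and how allow and deny tags combine is left to the Contracts documentation.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ContractPermission(NamedTuple):
    effect: str             # "allow" or "deny"
    scope: tuple[str, ...]  # e.g. ("derive-content", "translation", "fr")


def parse_contract_permission(tag: str) -> Optional[ContractPermission]:
    """Split a contract permission tag into its effect and scope.

    Returns None for tags that are not permission tags, including the
    contract ID tag (system/contract/<id>). A leading "system/" is
    tolerated because both spellings appear on this page.
    """
    parts = tag.split("/")
    if parts and parts[0] == "system":
        parts = parts[1:]
    if len(parts) < 3 or parts[0] != "contract":
        return None
    if parts[1] not in ("allow", "deny"):
        return None
    return ContractPermission(effect=parts[1], scope=tuple(parts[2:]))
```

For example, contract/allow/derive-content/translation/fr parses to the effect "allow" with the scope ("derive-content", "translation", "fr").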

Source (Ingestion)

| Tag | Example | Description |
| --- | --- | --- |
| source/created | source/created/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was created. |
| source/updated | source/updated/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was last updated. |
| source/last_seen | source/last_seen/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was last seen in the source. Will usually match the Ingestor |
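
If you need these datetimes client side, the value can be read straight out of the tag string. The Python sketch below assumes the tag layout shown in the examples (source/<field>/<ISO8601 datetime>); the function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def parse_source_datetime(tag: str, field: str) -> datetime | None:
    """Extract the datetime from a source/<field>/<datetime> tag.

    Returns None if the tag is for a different field. Raises ValueError
    if the remainder is not a datetime that fromisoformat can parse.
    """
    prefix = f"source/{field}/"
    if not tag.startswith(prefix):
        return None
    return datetime.fromisoformat(tag[len(prefix):])


if __name__ == "__main__":
    created = parse_source_datetime(
        "source/created/2026-01-01T00:00:01+00:00", "created"
    )
    print(created.isoformat())
```

The parsed values keep their UTC offset, so they can be compared with other timezone-aware datetimes.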

Transcoder

All formatters add tags to identify what work they did during the conversion process. If the document did not go through a formatter then there will not be any transcoder tags.

| Tag | Example | Description |
| --- | --- | --- |
| transcoder/<name> | transcoder/pdf2markdown | The name of the formatter used |
| transcoder/version/<version> | transcoder/version/0.0.1 | The version of the formatter |

These let us quickly identify content which may benefit from being reprocessed if new versions or formatters with more capabilities are released.
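
A reprocessing check along these lines could be sketched in Python as below. It assumes versions are purely numeric and dot-separated, as in the 0.0.1 example, and that a document carries at most one formatter name and one version tag; the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

VERSION_PREFIX = "transcoder/version/"
NAME_PREFIX = "transcoder/"


def transcoder_info(tags: Iterable[str]) -> tuple[str | None, str | None]:
    """Return (formatter name, version) read from a document's tags."""
    name = version = None
    for tag in tags:
        # Check the longer prefix first: version tags also start with "transcoder/".
        if tag.startswith(VERSION_PREFIX):
            version = tag[len(VERSION_PREFIX):]
        elif tag.startswith(NAME_PREFIX):
            name = tag[len(NAME_PREFIX):]
    return name, version


def version_key(version: str) -> tuple[int, ...]:
    """Turn "0.0.10" into (0, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_reprocessing(tags: Iterable[str], formatter: str, latest: str) -> bool:
    """True if the document was processed by an older version of `formatter`."""
    name, version = transcoder_info(tags)
    if name != formatter or version is None:
        return False
    return version_key(version) < version_key(latest)
```

Comparing versions as integer tuples rather than strings avoids treating 0.0.10 as older than 0.0.9.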

SharePoint

These tags are applied to Content which comes from the SharePoint source.

| Tag | Example | Description |
| --- | --- | --- |
| source/sharepoint/id | source/sharepoint/abc123 | The Microsoft Unique ID for the SharePoint document |

Product

product/abc123/perf/best_selling

product/abc123/attr/front_list

Best selling Front list Back list

Product search -> Content Lake IDs -> Content Lake search

Content Editing

Live Draft

Performance

Completions

Good Tag Design

When considering new Tags, whether in your own application scope or as a feature request to the Packt team, think about good tag design.

For example, one decision when considering Contracting tags was whether we should name Technical partners individually. We may work with OpenAI in a Zero Data Retention mode which we find acceptable, but other LLM providers may not operate in that way. We could therefore allow OpenAI, but disallow Google, Claude or others by a Tag such as system/contract/allow/open