Tagging
Tags are the main way of organising Content within the Content Lake. In most cases we should be building Product experiences that are composed from Tags and Filters rather than explicit Content IDs. This lets Content evolve over time rather than being a static resource.
Flat vs Key Value pairs
You'll notice that Tags within the Content Lake are flat values rather than key-value pairs. For example, to filter Content by Contract ID you would query for a single flat tag such as system/contract/<id>, rather than for a contract key with a separate <id> value.
Except for certain System tags (created, updated, last_seen), all tags are treated as string literals for exact matching. Wildcard matching and date comparisons aren't possible in Content Lake filtering for any other tags, though you are free to build your own more complex filtering client side.
Within the Content Lake, tags are lexicographically sorted, so most API responses will return tags in that order. However, we do not guarantee any specific sorting in API responses and you shouldn't rely on it.
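As a rough illustration of the exact-match behaviour described above, the sketch below treats each tag as an opaque string and uses set membership, which also avoids depending on the order in which tags are returned. The helper names (matches_all, has_tag_with_prefix) and the sample tag list are assumptions for illustration only and are not part of the Content Lake API. The prefix helper shows the kind of wildcard-style filtering that would have to happen client side.

```python
from typing import Iterable


def matches_all(doc_tags: Iterable[str], required: Iterable[str]) -> bool:
    """Exact, order-independent match: every required tag must appear verbatim."""
    return set(required) <= set(doc_tags)


def has_tag_with_prefix(doc_tags: Iterable[str], prefix: str) -> bool:
    """Client-side, wildcard-style check that exact matching alone cannot express."""
    return any(tag.startswith(prefix) for tag in doc_tags)


# Hypothetical tags for a single document, in no particular order.
doc_tags = ["system/contract/734b40cb", "system/lang/en"]

exact_hit = matches_all(doc_tags, ["system/contract/734b40cb"])
partial_miss = matches_all(doc_tags, ["system/contract/734b"])  # no partial matching
prefix_hit = has_tag_with_prefix(doc_tags, "system/contract/")
```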
Levels
There are four types of tags within the Content Lake:
| Level | Prefix | Description |
|---|---|---|
| System | system/ | Added by the Content Lake. Universal across use cases |
| Source | source/ | Added during ingestion. Vary by connector |
| Transcoder | transcoder/ | Added by formatters during ingestion |
| Application | (custom) | Private to the application that set them |
- System tags: Added by the Packt Content Lake and its internal systems. Prefixed by system/ in API responses. There are relatively few system tags but they are designed to be universally significant across use cases. Examples include system/lang/en, system/contract/<id>.
- Source tags: Added during ingestion by the source connectors. The type of information within these tags will vary wildly: a YouTube importer might carry view counts and like counts, whereas a SharePoint connector might use tags for Microsoft tenant information.
- Transcoder tags: Added by formatters during the ingestion process. These identify which formatter and version processed the content (transcoder/<name>, transcoder/version/<version>), making it possible to find and reprocess content when improved formatters are released.
- Application (Client) tags: Private and only returned to the application which set them. You can specify up to 50 tags, with key names up to 60 characters long. Keys and values are stored as strings and can contain any characters except square brackets ([ and ]) or slashes (/ and \) in keys.
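The Application tag limits above could be checked client side before tags are sent. The following is a minimal sketch, assuming tags are held as a plain dictionary of string keys to string values and that the character restriction applies to keys only. The function name validate_application_tags and the error messages are hypothetical and do not come from the Content Lake API.

```python
from typing import Dict, List

MAX_TAGS = 50            # "up to 50 tags"
MAX_KEY_LENGTH = 60      # "key names up to 60 characters long"
FORBIDDEN_KEY_CHARS = set("[]/\\")  # square brackets and both slashes


def validate_application_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the tags pass these checks."""
    problems: List[str] = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"{key!r}: key longer than {MAX_KEY_LENGTH} characters")
        bad_chars = sorted(FORBIDDEN_KEY_CHARS & set(key))
        if bad_chars:
            problems.append(f"{key!r}: forbidden characters {bad_chars}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.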
Tags are the primary mechanism for finding and matching Content. Search is still useful for human operators, but most tools, processes and automations should rely on Tags.
Examples
Packt Subscription Reader
Packt has a Subscription service whereby Customers can access Packt's library (or subsets of it depending on the tier).
When someone views a Product on the Subscription site we look up the Product in the Product Management System (PMS). The PMS has all of the Chapters along with their Content Lake IDs. The Subscription service then queries the Content Lake to return all Documents with those IDs and filters based on the status tag, which could be approved, reviewed or draft.
For users who have opted into early access, the tag filter would be reviewed, meaning that the Chapter has been reviewed by Packt staff.
For users who haven't opted into early access, or who come from a channel with stricter publishing requirements, the tag filter would be approved, which means that the Chapter has been through the full review process and the Author has given final sign-off.
We could build other Product experiences which let Customers, Reviewers or Channels opt into earlier stages like draft or in_progress too.
This is a basic starting example which shows that the Content Lake moves more of the decision making to the Product layer rather than a technical one. With a simple tag change we go from a human-reviewed product to a product taking Customers as close to the Author's live thinking as possible. We can then layer in things like "Email author" to provide a real-time feedback function.
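The status selection described in this example can be sketched in a few lines. This is an illustration under stated assumptions: the Reader type, the function names and the bare status strings (reviewed, approved) standing in for the full status tag are all hypothetical, and the real Subscription service would apply the filter in its Content Lake query rather than in memory.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reader:
    early_access: bool
    strict_channel: bool = False  # channel with stricter publishing requirements


def status_tag_for(reader: Reader) -> str:
    """Pick the status filter: reviewed for early access, approved otherwise."""
    if reader.early_access and not reader.strict_channel:
        return "reviewed"
    return "approved"


def visible_chapters(chapter_tags: Dict[str, List[str]], reader: Reader) -> List[str]:
    """Keep only the Chapter IDs whose tags contain the reader's status exactly."""
    wanted = status_tag_for(reader)
    return [doc_id for doc_id, tags in chapter_tags.items() if wanted in tags]


# Hypothetical Chapters, keyed by Content Lake ID.
chapters = {"ch1": ["approved"], "ch2": ["reviewed"], "ch3": ["draft"]}
early = visible_chapters(chapters, Reader(early_access=True))
standard = visible_chapters(chapters, Reader(early_access=False))
```

Because matching is exact, this sketch shows an early-access reader only reviewed Chapters. Whether they should also see approved ones is a Product decision that would be expressed by filtering on both statuses.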
Author copilot
We could create an Author copiloting experience where Authors get real-time suggestions on what to write and help with redrafting based on their previous work. We know that Packt customers particularly value the real-world practical experience of our Authors but it can be hard to remember every instance that someone has used a technique or solved a problem.
To build an LLM-powered copilot, we would first filter on the author system tag, limited to the currently logged-in user. This means that when the copilot is looking for prior art, it only considers content written by the current Author.
Secondly, we would also use the contract system tags to filter out any Content which doesn't have clear terms that allow us to use the Content for future publishing.
For example, the Author may have connected their GitHub account which includes personal projects and company projects. The company projects would be filtered out from consideration.
We can now build a simple LLM-powered assistant which uses the Content Lake to look up knowledge from the Author's prior work. The Author Copilot app hasn't needed to use any tags other than the System ones that are added automatically. As more Content from the Author is imported, it will automatically be surfaced in the Author Copilot.
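The two filters above combine into a single membership check per document. The sketch below is illustrative only: the author tag format system/author/<id> is a hypothetical stand-in (this page does not specify it), and using contract/allow/derive-content as the permission that gates reuse is an assumption about which contract tag applies.

```python
from typing import Dict, List

# Assumption: this is the contract tag that permits reuse for future publishing.
REUSE_ALLOWED_TAG = "contract/allow/derive-content"


def copilot_corpus(documents: Dict[str, List[str]], author_id: str) -> List[str]:
    """Return IDs of documents written by the author and cleared for reuse."""
    author_tag = f"system/author/{author_id}"  # hypothetical tag format
    return [
        doc_id
        for doc_id, tags in documents.items()
        if author_tag in tags and REUSE_ALLOWED_TAG in tags
    ]


# Hypothetical documents: one cleared, one company project, one by someone else.
documents = {
    "book-notes": ["system/author/a1", "contract/allow/derive-content"],
    "company-repo": ["system/author/a1", "contract/deny/derive-content"],
    "other-book": ["system/author/b2", "contract/allow/derive-content"],
}
corpus = copilot_corpus(documents, "a1")
```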
Author assistant
Book product management
Packt LLM Chatbot
Tags
Below are the System and Source tags.
System
| Tag | Example | Description |
|---|---|---|
| system/<id> | system/5946206b-... | The unique ID of the content |
| system/lang/<lang> | system/lang/en | The language of the content |
| system/contract/<id> | system/contract/734b40cb-... | The unique ID of the contract |
| contract/[allow,deny]/derive-content | contract/allow/derive-content | Allow or deny derived content |
| contract/[allow,deny]/derive-content/translation[/<lang>] | contract/allow/derive-content/translation/fr | Translation rights |
| contract/[allow,deny]/derive-content/ai/training | contract/deny/derive-content/ai/training | AI training rights |
| contract/[allow,deny]/derive-content/ai/inference | contract/allow/derive-content/ai/inference | AI inference (RAG/MCP) |
| contract/[allow,deny]/product/[manual,automation] | contract/deny/product/automation | Product automation rights |
| contract/[allow,deny]/product/format[/<format>] | contract/allow/product/format/book | Format-specific rights |
| contract/[allow,deny]/distribution/channel/<channel> | contract/allow/distribution/channel/amazon | Distribution channel |
| contract/[allow,deny]/distribution/country/<country> | contract/deny/distribution/country/CN | Distribution country |
See Contracts for full details on the permission tag model.
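To make the contract/[allow,deny]/... pattern in the table concrete, the sketch below splits a permission tag into its effect and the permission it refers to. It only illustrates the tag shape. The ContractPermission type and parse_contract_tag function are hypothetical, and how allow and deny tags interact is left to the Contracts documentation.

```python
from typing import NamedTuple, Optional


class ContractPermission(NamedTuple):
    effect: str      # "allow" or "deny"
    permission: str  # e.g. "distribution/country/CN"


def parse_contract_tag(tag: str) -> Optional[ContractPermission]:
    """Split contract/<allow|deny>/<permission>; return None for any other tag."""
    parts = tag.split("/", 2)
    if len(parts) != 3 or parts[0] != "contract":
        return None
    if parts[1] not in ("allow", "deny") or not parts[2]:
        return None
    return ContractPermission(effect=parts[1], permission=parts[2])


parsed = parse_contract_tag("contract/deny/distribution/country/CN")
```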
Source (Ingestion)
| Tag | Example | Description |
|---|---|---|
| source/created | source/created/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was created. |
| source/updated | source/updated/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was last updated. |
| source/last_seen | source/last_seen/2026-01-01T00:00:01+00:00 | An ISO8601 datetime of when the Content was last seen in the source. Will usually match the Ingestor |
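If you need to work with these datetime tags client side, the value after the prefix can be parsed with the Python standard library. This is a minimal sketch, assuming the tag value always follows the format shown in the examples above. The tag_datetime helper and the cutoff value are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def tag_datetime(tags: Iterable[str], prefix: str) -> Optional[datetime]:
    """Return the datetime carried by the first tag starting with prefix, if any."""
    for tag in tags:
        if tag.startswith(prefix):
            # Everything after the prefix is the ISO8601 value.
            return datetime.fromisoformat(tag[len(prefix):])
    return None


tags = ["source/updated/2026-01-01T00:00:01+00:00"]
updated = tag_datetime(tags, "source/updated/")

# Compare against a timezone-aware cutoff; mixing naive and aware datetimes raises.
cutoff = datetime(2026, 1, 1, tzinfo=timezone.utc)
is_recent = updated is not None and updated >= cutoff
```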
Transcoder
All formatters add tags to identify what work they did during the conversion process. If the document did not go through a formatter then there will not be any transcoder tags.
| Tag | Example | Description |
|---|---|---|
| transcoder/<name> | transcoder/pdf2markdown | The name of the formatter used |
| transcoder/version/<version> | transcoder/version/0.0.1 | The version of the formatter |
These let us quickly identify content which may benefit from being reprocessed if new versions or formatters with more capabilities are released.
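A reprocessing check along these lines could compare the version tag against the latest formatter release. The sketch below assumes versions are dotted integers like the 0.0.1 example. The function names are hypothetical, and a real check would need to handle other version schemes.

```python
from typing import List, Tuple

VERSION_PREFIX = "transcoder/version/"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.0.1' into (0, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_reprocessing(tags: List[str], formatter: str, latest: str) -> bool:
    """True if the content was processed by formatter at a version older than latest."""
    if f"transcoder/{formatter}" not in tags:
        return False  # never went through this formatter
    versions = [t[len(VERSION_PREFIX):] for t in tags if t.startswith(VERSION_PREFIX)]
    return any(parse_version(v) < parse_version(latest) for v in versions)


tags = ["transcoder/pdf2markdown", "transcoder/version/0.0.1"]
stale = needs_reprocessing(tags, "pdf2markdown", "0.1.0")
```

Parsing into integer tuples avoids the string-comparison pitfall where 0.10.0 would sort before 0.9.0.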
SharePoint
These tags are applied to Content which comes from the SharePoint source.
| Tag | Example | Description |
|---|---|---|
| source/sharepoint/id | source/sharepoint/abc123 | The Microsoft Unique ID for the SharePoint document |
Product
product/abc123/perf/best_selling
product/abc123/attr/front_list
Best selling Front list Back list
Product search -> Content Lake IDs -> Content Lake search
Content Editing
Live Draft
Performance
Completions
Good Tag Design
When thinking about using new Tags either in your own application scope, or when making feature requests to the Packt team, consider good tag design.
For example, one decision when considering Contracting tags was whether we should name Technical partners individually. We may work with OpenAI in a Zero Data Retention mode which we find acceptable, but other LLM providers may not operate in that way. We could therefore allow OpenAI, but disallow Google, Claude or others by a Tag such as system/contract/allow/open