Derived Content

Derived Content is new Content generated from source Content in the Content Lake. Each piece of derived Content is a first-class Document with its own contract, Tags, and version history.

Sources

There are three ways to create derived Content:

System
- The Content Lake applies system-provided derivation processes to all ingested Content. These include:
  - Image description and annotations.
  - Table analysis and descriptions.
  - Code block analysis and descriptions.
- If the system-provided derived Content doesn't meet your needs (for example, the standard Image descriptions aren't detailed enough for a niche use case), you can use Workflows or Product Applications to build more bespoke descriptions, perhaps written in a "Meme tone" for a social media summariser.

Workflows (Managed)
- Many Customers want to automate repetitive tasks whenever Content is ingested or updated. To support these use cases with minimal effort, the Content Lake plugs into our Workflow Engine, Zapier, which gives Customers a point-and-click interface for processing Content whenever it's created or updated. Common use cases include:
  - Translation of Content between languages.
  - Automatic generation of Marketing assets or PI.
- The Zapier Platform has integrations with over 8,000 endpoints, including LLMs for AI-powered workflows. If you need more control or custom execution, Product Applications are likely the better fit.
- For Packt Employees, our Zapier environment has integrations with our CRM, CDP and Product Management tooling, so you can automate most tasks.

Product Applications
- When you want complete control over, and customisation of, how derived Content is created, building your own Product Application is the best fit.
- This is particularly helpful when you need more than one source document to create your derived piece of work, for example, you may want to create a Newsletter section every Sunday based on interesting Content ingested in the last week.
- You can curate exactly what Content is brought in and how you use it with your own custom Product Application.
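
As an illustration of the weekly Newsletter example, the following Python sketch selects Content ingested in the seven days before a given date and assembles a derived document that records the ID of every source it used. It assumes a hypothetical `ContentItem` record (`id`, `title`, `ingested_at`) for Content already fetched from the Content Lake; the fetching step, the function names, and the exact payload shape (including the `derived` metadata key) are assumptions, not the Content Lake API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ContentItem:
    # Hypothetical record for Content already fetched from the Content Lake.
    id: str
    title: str
    ingested_at: datetime


def select_last_week(items: list[ContentItem], now: datetime) -> list[ContentItem]:
    """Return items ingested in the seven days up to `now`, oldest first."""
    cutoff = now - timedelta(days=7)
    recent = [item for item in items if cutoff <= item.ingested_at <= now]
    return sorted(recent, key=lambda item: item.ingested_at)


def build_newsletter_section(items: list[ContentItem], now: datetime) -> dict:
    """Assemble a derived document that lists every source ID it used."""
    sources = select_last_week(items, now)
    body = "\n".join(f"- {item.title}" for item in sources)
    return {
        # Assumed payload shape: source IDs recorded for lineage tracking.
        "metadata": {"derived": [item.id for item in sources]},
        "content": {
            "title": f"This week in the Content Lake ({now:%Y-%m-%d})",
            "body": body,
        },
    }


if __name__ == "__main__":
    sunday = datetime(2024, 6, 9, 9, 0, tzinfo=timezone.utc)
    library = [
        ContentItem("a1", "Intro to Rust", sunday - timedelta(days=2)),
        ContentItem("b2", "Kubernetes patterns", sunday - timedelta(days=10)),
        ContentItem("c3", "Prompt engineering tips", sunday - timedelta(days=6)),
    ]
    section = build_newsletter_section(library, sunday)
    print(section["metadata"]["derived"])  # ['c3', 'a1']
```

The key design point is that the list of source IDs is built from exactly the items used in the body, so the lineage can never drift from what was actually curated.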

All types of derived Content should use the common/derived-from metadata object to list the Content Lake ID(s) of any source material used to create them.

If you use common/derived-from, the Content Lake tracks the data lineage of all source Content, so applications can tell where the Content they use came from, even if it was AI-generated or otherwise derived. At Packt, accurate and consistent Attribution matters to our Authors: it ensures they get the credit they deserve whenever their knowledge or expertise is used, and it feeds into considerations such as Royalties.
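
Because each derived Document lists only its immediate sources, an application that needs the full lineage (for example, to attribute every Author whose work fed into a summary of other summaries) has to follow those links transitively. The Python sketch below assumes the source IDs of each Document have already been read into a dictionary; the dictionary, the example IDs, and the `all_sources` function are illustrative, not part of the Content Lake API.

```python
from collections import deque


def all_sources(doc_id: str, sources_of: dict[str, list[str]]) -> set[str]:
    """Return every Document that `doc_id` was directly or indirectly derived from.

    `sources_of` maps a Document ID to the IDs listed in its derived-from
    metadata; IDs with no entry are treated as original source Content.
    """
    found: set[str] = set()
    queue = deque(sources_of.get(doc_id, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already visited; also guards against accidental cycles
        found.add(current)
        queue.extend(sources_of.get(current, []))
    return found


lineage = {
    "newsletter": ["summary-1", "summary-2"],
    "summary-1": ["chapter-a"],
    "summary-2": ["chapter-b", "chapter-a"],
}
print(sorted(all_sources("newsletter", lineage)))
# ['chapter-a', 'chapter-b', 'summary-1', 'summary-2']
```

A breadth-first walk with a visited set means shared sources (such as `chapter-a` above) are reported once, which is what you want when attributing credit.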

All Packt systems and tools preserve the common/derived-from information, and we strongly recommend that other Customers use it as well.

Tracking Lineage

Each Document returned by the Content Lake API has a metadata block that lists the IDs of all its source Documents:

{
    "metadata": {
        "derived": [
            "9d73b576-e08a-470e-89a6-7f710b251b65",
            "2fe2f7f3-9fd0-4a4a-a78d-0903d8203ac1"
        ]
    },
    "content": {
        // document content omitted
    }
}
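
A client can read the source IDs straight out of this metadata block. The Python sketch below uses only the standard `json` module and the field names from the example response; `source_ids` is a hypothetical helper, and treating a missing `metadata` or `derived` field as "no sources" is an assumption about how original (non-derived) Content is returned.

```python
import json


def source_ids(response_body: str) -> list[str]:
    """Return the source Document IDs listed in a Content Lake API response.

    Assumes the response shape of the example above; a missing metadata
    block or `derived` list is assumed to mean the Content has no sources.
    """
    document = json.loads(response_body)
    return list(document.get("metadata", {}).get("derived", []))


example = """
{
    "metadata": {
        "derived": [
            "9d73b576-e08a-470e-89a6-7f710b251b65",
            "2fe2f7f3-9fd0-4a4a-a78d-0903d8203ac1"
        ]
    },
    "content": {}
}
"""
print(source_ids(example))
```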

You can find all Content derived from a given source (e.g. ad5a61de-54bf-4619-be02-f45d0a34c1aa) by filtering on its common/derived-from Tag. For example:

{
    "filter": {
        "tags": [
            "common/derived-from/ad5a61de-54bf-4619-be02-f45d0a34c1aa"
        ]
    }
}
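
Because the Tag is `common/derived-from/` followed by the source's Content Lake ID, the filter can be generated programmatically. In this Python sketch, `derived_from_filter` is a hypothetical helper; validating the ID with the standard `uuid` module assumes that all Content Lake IDs are UUIDs (as they are in the examples here), and the API endpoint you send the filter to is not shown.

```python
import json
import uuid

DERIVED_FROM_PREFIX = "common/derived-from/"


def derived_from_filter(source_id: str) -> dict:
    """Build a filter matching all Content derived from `source_id`."""
    # uuid.UUID raises ValueError for malformed IDs (assumes IDs are UUIDs);
    # str() normalises to the lowercase, hyphenated form used in the examples.
    canonical = str(uuid.UUID(source_id))
    return {"filter": {"tags": [DERIVED_FROM_PREFIX + canonical]}}


print(json.dumps(derived_from_filter("ad5a61de-54bf-4619-be02-f45d0a34c1aa"), indent=4))
```

Validating the ID before building the Tag catches typos locally instead of silently returning an empty result set.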