# Structured Image Metadata for Markdown-Sourced Nostr Long-Form Content **Status:** Working draft — a practice convention, not (yet) a NIP. **Scope:** Authors who maintain Markdown long-form posts (`kind:30023`, NIP-23) in a git repository and publish them to Nostr via a build pipeline. The convention defines how image metadata (author, license, source, alt text, caption) lives in the repository, how it becomes `imeta` tags (NIP-92) in the event, and how to round-trip between the two. **Goal:** Zero data loss between repository and event. Human-readable in raw Markdown. Machine-readable in the published event. Safe defaults against accidental misattribution. --- ## Why this exists Markdown's native image syntax — `![alt](file.png)` — only carries two fields: the target and an alt text. Everything else a properly attributed image needs (author, license, license link, source, modifications — the "TULLU-BA" rule in German copyright practice) has nowhere to go. Authors have three unsatisfying options today: 1. **Stuff everything into a visible caption line** under each image. Good for human readers, bad for machine parsing, risky because easily forgotten or inconsistent. 2. **Inline HTML `
` blocks** with `
`. Breaks Markdown lint tooling, hard to re-edit. 3. **Lose the metadata entirely.** Silent misattribution risk when the post is re-published without provenance. NIP-92's `imeta` tag fixes the event-side machine-readability problem (url, mime, sha256, alt, etc. per image). But it doesn't answer where the data lives *before* the event exists. This convention proposes: **structured YAML frontmatter as source of truth, free-form Markdown body for prose, deterministic bidirectional mapping between them.** --- ## The convention in one example ```yaml --- title: "Schoko-Zimt-Schnecken" slug: "schoko-schnecken" date: 2023-02-26 # Text license (the post body). Image licenses are set per image. license: "https://creativecommons.org/publicdomain/zero/1.0/" # Post text authors (array, even for single author). authors: - name: "Jane Doe" url: "https://jane.example/" images: - file: cover.jpg role: cover alt: "Golden baked yeast buns in a round pan, fresh from the oven" license: "https://creativecommons.org/publicdomain/zero/1.0/" authors: - name: "Jane Doe" - file: dough-filling.jpg alt: "Rolled-out yeast dough, spread with cocoa-cinnamon-sugar filling" license: "https://creativecommons.org/publicdomain/zero/1.0/" authors: - name: "Jane Doe" # Foreign image with full TULLU-BA attribution: - file: flickr-buns.jpg alt: "Basket of freshly baked cinnamon rolls" caption: "On a market stall in Lyon" authors: - name: "Max Mustermann" source_url: "https://www.flickr.com/photos/mustermann/12345/" license: "https://creativecommons.org/licenses/by-sa/4.0/" modifications: "cropped" --- Roll out the dough and spread the filling evenly: ![](dough-filling.jpg) Slice into 16 pieces and arrange in the pan... ``` The Markdown body stays clean. The YAML carries the truth. --- ## Field reference ### Post-level (applies to the post text, not images) | Field | Required | Type | Semantics | |---|---|---|---| | `license` | yes | URL | License of the post **text**. Does **not** cascade to images. | | `authors` | yes | Array of `{name, url?, orcid?, ...}` | Authors of the post text. Array even with one author. | Pipeline implementations may provide env-level defaults (`DEFAULT_LICENSE`, `DEFAULT_AUTHORS`) so single-author blogs don't repeat the same block on every post. ### Per-image (under the `images:` list) | Field | Required | Type | Semantics | |---|---|---|---| | `file` | yes | String | Filename relative to the post directory. Must exist on disk. | | `role` | no | `cover` | At most one image per post may carry `role: cover`. Its URL becomes the event's `image` tag. | | `alt` | yes | String | Accessibility description. Empty string is allowed (decorative image); missing field is a validation error. | | `caption` | no | String | Optional human context beyond the alt text. | | `license` | yes | URL or `UNKNOWN` | Full schema.org-style license URL, or the literal `UNKNOWN`. No cascading from post-level. | | `authors` | yes | Array or `UNKNOWN` | Author list, or the literal `UNKNOWN`. No cascading from post-level. | | `source_url` | no | URL | Where the image was originally sourced (Flickr, Sketchfab, self-reference, etc.). | | `modifications` | no | String | Free-text description of any derivative work ("cropped", "color-adjusted", "AI-generated with prompt: ..."). The "BA" in TULLU-BA. | ### Why no cascading Cascading license/author from post to images was rejected after early prototypes: it makes **silent misattribution** the easy default. If a post is tagged `license: CC0` and a contributor adds a foreign image without noticing, the image inherits CC0 implicitly and ships to Nostr with a false attribution. Explicit per-image fields cost a few extra lines of YAML and prevent an entire class of attribution bugs. ### `UNKNOWN` as an explicit value For legacy content where provenance has been lost: ```yaml - file: old-screenshot.png alt: "Screenshot of a now-defunct learning portal's homepage" license: UNKNOWN authors: UNKNOWN ``` Pipeline behavior: - Fields set to `UNKNOWN` are **not** written into the `imeta` tag (they are simply absent, not wrongly stated). - A warning is logged per `UNKNOWN` field with post slug + filename — this becomes a research backlog. - A strict mode can block publication when `UNKNOWN` values are present (opt-in). --- ## Mapping to the Nostr event A post with this frontmatter produces a `kind:30023` event containing: ### Standard NIP-23 tags - `["d", ""]` - `["title", ""]` - `["published_at", "<unix-seconds>"]` - `["summary", "<description>"]` if present - `["image", "<cover-blossom-url>"]` — from the image marked `role: cover` - `["t", "<tag>"]` per entry in `tags:` ### Text license - `["license", "<url>"]` — once per event, from post-level `license` ### Per-image `imeta` (NIP-92 + extensions) Each uploaded image yields one `imeta` tag: ``` ["imeta", "url <blossom-url>", "m <mime-type>", "x <sha256>", "alt <alt>", if non-empty "caption <caption>", if present "license <url>", if set (not UNKNOWN) "author <name>", one entry per author, if set (not UNKNOWN) "source_url <url>", if present "modifications <text>" if present ] ``` NIP-92 explicitly allows implementers to add fields beyond its core set; clients ignore unknown fields. `license`, `author`, `source_url`, `modifications` are extensions this convention uses to carry TULLU-BA data inline with the image reference. ### Markdown body transformation The Markdown body is traversed: each `![alt](filename.png)` is replaced with `![alt](<blossom-url>)` after the image has been uploaded. Size hints (`![alt](file.png =300x200)`) are stripped. Absolute URLs in the source are preserved. --- ## Round-trip: YAML ↔ Markdown The convention is designed so authors can work in **either direction**: ### Forward: YAML → published event 1. Pipeline parses frontmatter. 2. For each `images[]` entry, uploads `file` to Blossom, receives `{url, sha256}`. 3. Builds mapping `filename → blossom-url`. 4. Rewrites Markdown body image references. 5. Assembles `imeta` tags from the structured fields + upload results. 6. Signs and publishes. ### Reverse: "flat" Markdown → YAML Some authors write Markdown with visible attribution lines underneath images, like: ```markdown ![Yeast dough with filling](dough-filling.jpg) *Photo: Jane Doe, [CC0](https://creativecommons.org/publicdomain/zero/1.0/)* ![Cinnamon rolls at the market](flickr-buns.jpg) *Photo: Max Mustermann via [Flickr](https://www.flickr.com/photos/mustermann/12345/), [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/), cropped* ``` A round-trip parser can reconstruct the `images[]` YAML from this pattern because it follows a predictable shape: ``` ![<alt>](<file>) *Photo: <name>{, <name2>}{ via [<source-label>](<source-url>)}, [<license-label>](<license-url>){, <modifications>}.* ``` **Recognizable tokens** for the reverse parser: - Image reference: standard Markdown `![alt](file)` on its own line. - Attribution line: starts on the next line, wrapped in `*...*`, begins with a role word (`Photo`, `Foto`, `Image`, `Abb.`, etc.), ends with a period. - **Authors**: comma-separated names between the role word and either `via` or the license bracket. - **Source**: `via [<label>](<url>)`. The label is derived from the hostname if generated forward; on reverse, it's discarded and only the URL is kept. - **License**: `[<short>](<url>)`. On reverse, only the URL is kept. - **Modifications**: a trailing fragment after the license link, before the final period. ### Canonical caption format Forward generation (YAML → caption string) uses a deterministic template: ``` {caption + ". "}Photo: {authors joined by " / "}{ via [<source-host>](<source_url>)}, [<license-short>]({license_url}){, <modifications>}. ``` With a license short-form catalog: | License URL prefix | Short form | |---|---| | `https://creativecommons.org/publicdomain/zero/1.0/` | `CC0` | | `https://creativecommons.org/licenses/by/4.0/` | `CC BY 4.0` | | `https://creativecommons.org/licenses/by-sa/4.0/` | `CC BY-SA 4.0` | | `https://creativecommons.org/licenses/by-nd/4.0/` | `CC BY-ND 4.0` | | `https://creativecommons.org/licenses/by-nc/4.0/` | `CC BY-NC 4.0` | | `https://creativecommons.org/licenses/by-nc-sa/4.0/` | `CC BY-NC-SA 4.0` | | `https://creativecommons.org/licenses/by-nc-nd/4.0/` | `CC BY-NC-ND 4.0` | | *anything else* | hostname of the URL | Locale suffixes (`/deed.de`, `/deed.en`) are collapsed to the base URL for short-form lookup. --- ## Why this is forward-safe Three properties make the convention robust over time: 1. **Events are replaceable.** A post re-published with improved metadata (better alt text, filled-in `UNKNOWN` fields) simply overrides the previous event via NIP-23's `d`-tag identity. 2. **`imeta` extensions degrade gracefully.** Clients that don't read `license`/`author`/`source_url` in `imeta` ignore them; they still get the standard `url`/`m`/`x`/`alt` fields. 3. **Reverse parsing is optional.** A pipeline can publish without ever supporting the reverse direction; the YAML is always the source of truth. --- ## What this convention does **not** do - **Does not inject captions into the event body.** Early drafts did; it turned into a fragile regex workout across Markdown variants (link-wrapped images, list-embedded images, block quotes). Recommended approach: let clients render attribution from `imeta` fields. Inject body captions only if a concrete client gap makes it necessary. - **Does not define new Nostr kinds.** It uses `kind:30023` (NIP-23), `kind:10063` (Blossom user server list, BUD-03), and `kind:10002` (NIP-65 relay list) as-is. - **Does not mandate Blossom.** The convention maps cleanly to any content-addressed image host. Blossom is just the most interoperable option in the Nostr ecosystem today. --- ## Open questions for the community 1. **License in `imeta` — convention or its own tag?** Should per-image license info live in `imeta` as a non-standard field, or should there be a companion `license` tag per image with an `x <sha256>` back-reference? The `imeta` approach keeps everything per-image in one tag. A separate tag decouples concerns but duplicates the binding. 2. **Multiple licenses per image.** CC dual-licensing exists (e.g. "CC BY-SA or GFDL"). Should the spec allow `license` as an array, or repeat the `license` field multiple times in `imeta`? 3. **Canonical short-form catalog.** The table above is practical but not authoritative. Should a registry of license-URL-to-short-form mappings live somewhere reference-able? 4. **Attribution in languages other than English.** The reverse-parser pattern uses role words like `Photo`, `Foto`, `Image`. A language-agnostic marker (e.g. a leading emoji or a structured sigil like `⸻ credit ⸻`) would sidestep i18n, at the cost of readability. 5. **Machine-readable attribution in client rendering.** Long-form clients (Habla, Flycat, etc.) vary in how (and whether) they surface `imeta.license` / `imeta.author`. Adoption of this convention is only valuable if clients pick it up — a reference renderer implementation would lower the bar. --- ## References - [NIP-23 — Long-form Content](https://github.com/nostr-protocol/nips/blob/master/23.md) - [NIP-92 — Media Attachments (`imeta`)](https://github.com/nostr-protocol/nips/blob/master/92.md) - [NIP-65 — Relay List Metadata (`kind:10002`)](https://github.com/nostr-protocol/nips/blob/master/65.md) - [Blossom BUD-01 — Server Requirements](https://github.com/hzrd149/blossom/blob/master/buds/01.md) - [Blossom BUD-03 — User Server List (`kind:10063`)](https://github.com/hzrd149/blossom/blob/master/buds/03.md) - [TULLU / TULLU-BA attribution rule (German, Wikimedia practice)](https://commons.wikimedia.org/wiki/Commons:Lizenzhinweisgenerator) - [schema.org/CreativeWork — `license` field convention](https://schema.org/license)