joerglohrerde/docs/wiki-draft-nostr-image-meta...

13 KiB

Structured Image Metadata for Markdown-Sourced Nostr Long-Form Content

Status: Working draft — a practice convention, not (yet) a NIP. Scope: Authors who maintain Markdown long-form posts (kind:30023, NIP-23) in a git repository and publish them to Nostr via a build pipeline. The convention defines how image metadata (author, license, source, alt text, caption) lives in the repository, how it becomes imeta tags (NIP-92) in the event, and how to round-trip between the two. Goal: Zero data loss between repository and event. Human-readable in raw Markdown. Machine-readable in the published event. Safe defaults against accidental misattribution.


Why this exists

Markdown's native image syntax — ![alt](file.png) — only carries two fields: the target and an alt text. Everything else a properly attributed image needs (author, license, license link, source, modifications — the "TULLU-BA" rule in German copyright practice) has nowhere to go.

Authors have three unsatisfying options today:

  1. Stuff everything into a visible caption line under each image. Good for human readers, bad for machine parsing, risky because easily forgotten or inconsistent.
  2. Inline HTML <figure> blocks with <figcaption>. Breaks Markdown lint tooling, hard to re-edit.
  3. Lose the metadata entirely. Silent misattribution risk when the post is re-published without provenance.

NIP-92's imeta tag fixes the event-side machine-readability problem (url, mime, sha256, alt, etc. per image). But it doesn't answer where the data lives before the event exists.

This convention proposes: structured YAML frontmatter as source of truth, free-form Markdown body for prose, deterministic bidirectional mapping between them.


The convention in one example

---
title: "Schoko-Zimt-Schnecken"
slug: "schoko-schnecken"
date: 2023-02-26

# Text license (the post body). Image licenses are set per image.
license: "https://creativecommons.org/publicdomain/zero/1.0/"

# Post text authors (array, even for single author).
authors:
  - name: "Jane Doe"
    url: "https://jane.example/"

images:
  - file: cover.jpg
    role: cover
    alt: "Golden baked yeast buns in a round pan, fresh from the oven"
    license: "https://creativecommons.org/publicdomain/zero/1.0/"
    authors:
      - name: "Jane Doe"

  - file: dough-filling.jpg
    alt: "Rolled-out yeast dough, spread with cocoa-cinnamon-sugar filling"
    license: "https://creativecommons.org/publicdomain/zero/1.0/"
    authors:
      - name: "Jane Doe"

  # Foreign image with full TULLU-BA attribution:
  - file: flickr-buns.jpg
    alt: "Basket of freshly baked cinnamon rolls"
    caption: "On a market stall in Lyon"
    authors:
      - name: "Max Mustermann"
    source_url: "https://www.flickr.com/photos/mustermann/12345/"
    license: "https://creativecommons.org/licenses/by-sa/4.0/"
    modifications: "cropped"
---

Roll out the dough and spread the filling evenly:
![](dough-filling.jpg)

Slice into 16 pieces and arrange in the pan...

The Markdown body stays clean. The YAML carries the truth.


Field reference

Post-level (applies to the post text, not images)

Field Required Type Semantics
license yes URL License of the post text. Does not cascade to images.
authors yes Array of {name, url?, orcid?, ...} Authors of the post text. Array even with one author.

Pipeline implementations may provide env-level defaults (DEFAULT_LICENSE, DEFAULT_AUTHORS) so single-author blogs don't repeat the same block on every post.

Per-image (under the images: list)

Field Required Type Semantics
file yes String Filename relative to the post directory. Must exist on disk.
role no cover At most one image per post may carry role: cover. Its URL becomes the event's image tag.
alt yes String Accessibility description. Empty string is allowed (decorative image); missing field is a validation error.
caption no String Optional human context beyond the alt text.
license yes URL or UNKNOWN Full schema.org-style license URL, or the literal UNKNOWN. No cascading from post-level.
authors yes Array or UNKNOWN Author list, or the literal UNKNOWN. No cascading from post-level.
source_url no URL Where the image was originally sourced (Flickr, Sketchfab, self-reference, etc.).
modifications no String Free-text description of any derivative work ("cropped", "color-adjusted", "AI-generated with prompt: ..."). The "BA" in TULLU-BA.

Why no cascading

Cascading license/author from post to images was rejected after early prototypes: it makes silent misattribution the easy default. If a post is tagged license: CC0 and a contributor adds a foreign image without noticing, the image inherits CC0 implicitly and ships to Nostr with a false attribution.

Explicit per-image fields cost a few extra lines of YAML and prevent an entire class of attribution bugs.

UNKNOWN as an explicit value

For legacy content where provenance has been lost:

- file: old-screenshot.png
  alt: "Screenshot of a now-defunct learning portal's homepage"
  license: UNKNOWN
  authors: UNKNOWN

Pipeline behavior:

  • Fields set to UNKNOWN are not written into the imeta tag (they are simply absent, not wrongly stated).
  • A warning is logged per UNKNOWN field with post slug + filename — this becomes a research backlog.
  • A strict mode can block publication when UNKNOWN values are present (opt-in).

Mapping to the Nostr event

A post with this frontmatter produces a kind:30023 event containing:

Standard NIP-23 tags

  • ["d", "<slug>"]
  • ["title", "<title>"]
  • ["published_at", "<unix-seconds>"]
  • ["summary", "<description>"] if present
  • ["image", "<cover-blossom-url>"] — from the image marked role: cover
  • ["t", "<tag>"] per entry in tags:

Text license

  • ["license", "<url>"] — once per event, from post-level license

Per-image imeta (NIP-92 + extensions)

Each uploaded image yields one imeta tag:

["imeta",
  "url <blossom-url>",
  "m <mime-type>",
  "x <sha256>",
  "alt <alt>",              if non-empty
  "caption <caption>",      if present
  "license <url>",          if set (not UNKNOWN)
  "author <name>",          one entry per author, if set (not UNKNOWN)
  "source_url <url>",       if present
  "modifications <text>"    if present
]

NIP-92 explicitly allows implementers to add fields beyond its core set; clients ignore unknown fields. license, author, source_url, modifications are extensions this convention uses to carry TULLU-BA data inline with the image reference.

Markdown body transformation

The Markdown body is traversed: each ![alt](filename.png) is replaced with ![alt](<blossom-url>) after the image has been uploaded. Size hints (![alt](file.png =300x200)) are stripped. Absolute URLs in the source are preserved.


Round-trip: YAML ↔ Markdown

The convention is designed so authors can work in either direction:

Forward: YAML → published event

  1. Pipeline parses frontmatter.
  2. For each images[] entry, uploads file to Blossom, receives {url, sha256}.
  3. Builds mapping filename → blossom-url.
  4. Rewrites Markdown body image references.
  5. Assembles imeta tags from the structured fields + upload results.
  6. Signs and publishes.

Reverse: "flat" Markdown → YAML

Some authors write Markdown with visible attribution lines underneath images, like:

![Yeast dough with filling](dough-filling.jpg)
*Photo: Jane Doe, [CC0](https://creativecommons.org/publicdomain/zero/1.0/)*

![Cinnamon rolls at the market](flickr-buns.jpg)
*Photo: Max Mustermann via [Flickr](https://www.flickr.com/photos/mustermann/12345/), [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/), cropped*

A round-trip parser can reconstruct the images[] YAML from this pattern because it follows a predictable shape:

![<alt>](<file>)
*Photo: <name>{, <name2>}{ via [<source-label>](<source-url>)}, [<license-label>](<license-url>){, <modifications>}.*

Recognizable tokens for the reverse parser:

  • Image reference: standard Markdown ![alt](file) on its own line.
  • Attribution line: starts on the next line, wrapped in *...*, begins with a role word (Photo, Foto, Image, Abb., etc.), ends with a period.
  • Authors: comma-separated names between the role word and either via or the license bracket.
  • Source: via [<label>](<url>). The label is derived from the hostname if generated forward; on reverse, it's discarded and only the URL is kept.
  • License: [<short>](<url>). On reverse, only the URL is kept.
  • Modifications: a trailing fragment after the license link, before the final period.

Canonical caption format

Forward generation (YAML → caption string) uses a deterministic template:

{caption + ". "}Photo: {authors joined by " / "}{ via [<source-host>](<source_url>)}, [<license-short>]({license_url}){, <modifications>}.

With a license short-form catalog:

License URL prefix Short form
https://creativecommons.org/publicdomain/zero/1.0/ CC0
https://creativecommons.org/licenses/by/4.0/ CC BY 4.0
https://creativecommons.org/licenses/by-sa/4.0/ CC BY-SA 4.0
https://creativecommons.org/licenses/by-nd/4.0/ CC BY-ND 4.0
https://creativecommons.org/licenses/by-nc/4.0/ CC BY-NC 4.0
https://creativecommons.org/licenses/by-nc-sa/4.0/ CC BY-NC-SA 4.0
https://creativecommons.org/licenses/by-nc-nd/4.0/ CC BY-NC-ND 4.0
anything else hostname of the URL

Locale suffixes (/deed.de, /deed.en) are collapsed to the base URL for short-form lookup.


Why this is forward-safe

Three properties make the convention robust over time:

  1. Events are replaceable. A post re-published with improved metadata (better alt text, filled-in UNKNOWN fields) simply overrides the previous event via NIP-23's d-tag identity.
  2. imeta extensions degrade gracefully. Clients that don't read license/author/source_url in imeta ignore them; they still get the standard url/m/x/alt fields.
  3. Reverse parsing is optional. A pipeline can publish without ever supporting the reverse direction; the YAML is always the source of truth.

What this convention does not do

  • Does not inject captions into the event body. Early drafts did; it turned into a fragile regex workout across Markdown variants (link-wrapped images, list-embedded images, block quotes). Recommended approach: let clients render attribution from imeta fields. Inject body captions only if a concrete client gap makes it necessary.
  • Does not define new Nostr kinds. It uses kind:30023 (NIP-23), kind:10063 (Blossom user server list, BUD-03), and kind:10002 (NIP-65 relay list) as-is.
  • Does not mandate Blossom. The convention maps cleanly to any content-addressed image host. Blossom is just the most interoperable option in the Nostr ecosystem today.

Open questions for the community

  1. License in imeta — convention or its own tag? Should per-image license info live in imeta as a non-standard field, or should there be a companion license tag per image with an x <sha256> back-reference? The imeta approach keeps everything per-image in one tag. A separate tag decouples concerns but duplicates the binding.

  2. Multiple licenses per image. CC dual-licensing exists (e.g. "CC BY-SA or GFDL"). Should the spec allow license as an array, or repeat the license field multiple times in imeta?

  3. Canonical short-form catalog. The table above is practical but not authoritative. Should a registry of license-URL-to-short-form mappings live somewhere reference-able?

  4. Attribution in languages other than English. The reverse-parser pattern uses role words like Photo, Foto, Image. A language-agnostic marker (e.g. a leading emoji or a structured sigil like ⸻ credit ⸻) would sidestep i18n, at the cost of readability.

  5. Machine-readable attribution in client rendering. Long-form clients (Habla, Flycat, etc.) vary in how (and whether) they surface imeta.license / imeta.author. Adoption of this convention is only valuable if clients pick it up — a reference renderer implementation would lower the bar.


References