feat(generator): slide markdown image refs as labeled multimodal inputs #25

Closed

opened 2026-04-16 13:44:29 +00:00 by casper-stevens · 0 comments

## Context

Slide authors need to embed images directly in slide markdown and attach intent to each one. Parsing `![alt](…)` references from the slide file, assigning stable labels, and mapping surrounding prose to per-image instructions lets the generator act precisely on each image rather than treating all visual context as equivalent. Depends on hero_lib#128 (`ImageRef` abstraction).

## Goals

- Parse image refs from slide `.md`: `![alt](url_or_path)`, `[![alt](img)](link)`, bare image URLs on their own line
- Assign each ref a stable label (`[IMAGE 1: "alt text"]`) that survives prompt assembly
- Map prose immediately surrounding each ref to per-image instructions
- Resolve refs via `ImageRef` and attach to `ImageGenerationRequest` (hero_lib#128)
- Assembled prompt separates per-image intent clearly when multiple images are present
- Integrate into the existing multimodal path in `generator.rs`
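
The first three goals can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `ParsedImageRef`, `parse_image_refs`, and `surrounding_prose` are hypothetical names, and the sketch assumes that "immediately surrounding" means the contiguous non-blank, non-image lines directly above and below the ref. It does not handle image titles (`![alt](path "title")`), parentheses inside URLs, or refs inside code fences.

```rust
/// One image reference found in slide markdown (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedImageRef {
    pub index: usize,   // 1-based, in document order
    pub alt: String,    // empty for bare URLs
    pub target: String, // URL or path, exactly as written
    pub line: usize,    // 0-based line number in the slide file
}

impl ParsedImageRef {
    /// Label in the `[IMAGE 1: "alt text"]` form named in the goals.
    pub fn label(&self) -> String {
        format!("[IMAGE {}: \"{}\"]", self.index, self.alt)
    }
}

// Assumed extension list; the real set of supported formats may differ.
const IMAGE_EXTS: [&str; 5] = [".png", ".jpg", ".jpeg", ".gif", ".webp"];

/// True when the whole line is a single http(s) URL ending in an image extension.
fn is_bare_image_url(line: &str) -> bool {
    let t = line.trim();
    let lower = t.to_ascii_lowercase();
    (lower.starts_with("http://") || lower.starts_with("https://"))
        && !t.contains(char::is_whitespace)
        && IMAGE_EXTS.iter().any(|ext| lower.ends_with(ext))
}

/// Collects `![alt](target)` refs (including the inner image of
/// `[![alt](img)](link)`) and bare image URLs, numbered in document order.
pub fn parse_image_refs(markdown: &str) -> Vec<ParsedImageRef> {
    let mut refs = Vec::new();
    for (line_no, line) in markdown.lines().enumerate() {
        if is_bare_image_url(line) {
            let index = refs.len() + 1;
            refs.push(ParsedImageRef {
                index,
                alt: String::new(),
                target: line.trim().to_string(),
                line: line_no,
            });
            continue;
        }
        let mut rest = line;
        while let Some(start) = rest.find("![") {
            let after_bang = &rest[start + 2..];
            let Some(alt_end) = after_bang.find("](") else { break };
            let after_alt = &after_bang[alt_end + 2..];
            let Some(target_end) = after_alt.find(')') else { break };
            let index = refs.len() + 1;
            refs.push(ParsedImageRef {
                index,
                alt: after_bang[..alt_end].to_string(),
                target: after_alt[..target_end].trim().to_string(),
                line: line_no,
            });
            // Continue scanning after this image; a wrapping link target is skipped.
            rest = &after_alt[target_end + 1..];
        }
    }
    refs
}

/// Joins the contiguous prose lines directly above and below `image_line`.
pub fn surrounding_prose(markdown: &str, image_line: usize) -> String {
    let lines: Vec<&str> = markdown.lines().collect();
    if image_line >= lines.len() {
        return String::new();
    }
    let is_prose =
        |l: &str| !l.trim().is_empty() && !l.contains("![") && !is_bare_image_url(l);
    let mut start = image_line;
    while start > 0 && is_prose(lines[start - 1]) {
        start -= 1;
    }
    let mut end = image_line + 1;
    while end < lines.len() && is_prose(lines[end]) {
        end += 1;
    }
    lines[start..end]
        .iter()
        .enumerate()
        .filter(|(i, _)| start + i != image_line)
        .map(|(_, l)| l.trim())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because indices are assigned in document order during a single pass, the same slide text always yields the same labels, which is one simple way to keep them stable through prompt assembly.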

## Feasibility

Medium complexity, with a significant cleanup component that must ship as part of the implementation rather than be deferred.

The base64 round-trip in hero_slides exists purely as a transport mechanism: `discovery.rs` reads image files, resizes and PNG-encodes them, then base64-encodes the result into `Vec<(String, String)>`. `generator.rs:133-138` immediately decodes that base64 back to bytes to pass to `add_image_data()`. With `ImageRef::Path`, the discovery functions return file paths directly, so no encode/decode step is needed.
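
A rough sketch of what path-based discovery could look like follows. The `ImageRef` enum here is a stand-in with only the `Path` variant this issue names; the real type comes from hero_lib#128 and may carry other variants or fields. `image_refs_from_paths` and the extension list are hypothetical, chosen only to show that discovery can hand back paths without touching image bytes.

```rust
use std::path::{Path, PathBuf};

/// Stand-in for the hero_lib type (assumption: only `Path` is modeled here).
#[derive(Debug, Clone, PartialEq)]
pub enum ImageRef {
    Path(PathBuf),
}

/// Assumed extension filter; the real discovery rules may differ.
fn has_image_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| matches!(e.to_ascii_lowercase().as_str(), "png" | "jpg" | "jpeg" | "webp"))
        .unwrap_or(false)
}

/// Keeps image files, sorts them for a deterministic order, and wraps each
/// path in `ImageRef::Path`. No file is read, resized, or base64-encoded.
pub fn image_refs_from_paths(candidates: &[PathBuf]) -> Vec<ImageRef> {
    let mut paths: Vec<PathBuf> = candidates
        .iter()
        .filter(|p| has_image_extension(p))
        .cloned()
        .collect();
    paths.sort();
    paths.into_iter().map(ImageRef::Path).collect()
}
```

In this shape, reading and encoding the bytes becomes the responsibility of whatever resolves the `ImageRef`, which is what makes the intermediate base64 strings unnecessary.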

Code to remove as part of this issue:

- `discovery::image_file_to_png_base64()`: fully superseded by `image_io` in hero_lib
- The base64 decode block in `generator.rs:133-138`
- The `base64` crate dep from `hero_slides_lib/Cargo.toml`

Signature cascade: the following signatures move from `(String, String)` pairs to `ImageRef` (return types change from `Vec<(String, String)>` to `Vec<ImageRef>`), requiring updates at every call site:

- `collect_background_images()`, `collect_selected_background_images()`, `collect_selected_background_images_with_meta()` in `discovery.rs`
- `generate_slide()` parameter `background_images: &[(String, String)]`
- Callers: `agent.rs:1004`, `deck_module.rs` (3 sites)

The markdown image-ref parsing work is self-contained and testable in isolation. The main risk remains prompt labeling, so plan for iteration on the multi-image prompt format.
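
As a starting point for that iteration, one possible multi-image prompt layout is sketched below. The format itself (an "Attached images:" section with one line per label) is a placeholder assumption, not a decided design, and `LabeledImage` and `assemble_prompt` are hypothetical names.

```rust
/// A label such as `[IMAGE 1: "Chart"]` paired with the instruction derived
/// from the prose around that image (hypothetical type for this sketch).
pub struct LabeledImage {
    pub label: String,
    pub instruction: String,
}

/// Appends one line per image to the base prompt so that each image's intent
/// stays attached to its own label.
pub fn assemble_prompt(base: &str, images: &[LabeledImage]) -> String {
    let mut out = String::from(base.trim_end());
    if images.is_empty() {
        return out;
    }
    out.push_str("\n\nAttached images:\n");
    for img in images {
        // Images without nearby prose still get an explicit entry.
        let instruction = if img.instruction.trim().is_empty() {
            "(no specific instruction)"
        } else {
            img.instruction.trim()
        };
        out.push_str(&format!("- {}: {}\n", img.label, instruction));
    }
    out
}
```

Keeping the layout in a single function like this makes it cheap to swap formats while evaluating which one the model follows most reliably.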
