ai client ready for image generation #124

Open
opened 2026-04-15 10:16:55 +00:00 by despiegk · 4 comments
Owner

see crates/ai/src

we need to improve the client for
https://openrouter.ai/google/gemini-3.1-flash-image-preview/api

model = Nano Banana 2

we need to see how to add images to the context as well, so generation can be based on one or more reference images

for images see https://openrouter.ai/docs/guides/overview/multimodal/image-generation#image-aspect-ratio-configuration

we need a clean API extension that allows generating images starting from one or more input images

make an example as well

Below is a practical spec for using OpenRouter with Nano Banana 2 for image generation and image-conditioned generation.

OpenRouter image generation spec for Nano Banana 2

Model
Use:

google/gemini-3.1-flash-image-preview

This is the OpenRouter model page for Nano Banana 2. OpenRouter lists it as released on February 26, 2026, with a 65,536-token context window, and states that aspect ratios are controlled through the image_config parameter. (OpenRouter)

Endpoint
Use the normal OpenRouter chat endpoint:

POST https://openrouter.ai/api/v1/chat/completions

OpenRouter’s API is designed to be close to the OpenAI Chat Completions format, and the full schema is published in OpenAPI form. (OpenRouter)


1. Core rules

For image generation with Gemini image-capable models on OpenRouter, send:

"modalities": ["image", "text"]

OpenRouter’s image generation guide explicitly says Gemini-style models that output both text and images should use ["image", "text"]. (OpenRouter)

To pass one or more input images, use message content arrays with items of type:

{
  "type": "image_url",
  "image_url": {
    "url": "..."
  }
}

OpenRouter supports both public image URLs and base64 data URLs for image inputs, and multiple images can be sent as separate content items in the same message. OpenRouter also recommends putting the text prompt first, then the images. (OpenRouter)


2. Supported input formats

For input images, OpenRouter supports:

  • public URL, such as https://.../image.jpg
  • base64 data URL, such as data:image/jpeg;base64,...

Supported image MIME types in the image input guide are:

  • image/png
  • image/jpeg
  • image/webp
  • image/gif (OpenRouter)
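A client-side loader can sniff these MIME types from the file's magic bytes instead of trusting the extension. Below is a minimal std-only Rust sketch; the function name mirrors the detect_mime_from_bytes helper proposed later in this issue, and a real implementation might simply delegate to image::guess_format:

```rust
/// Best-effort MIME sniffing from magic bytes for the four supported formats.
/// Returns None for anything unrecognized.
fn detect_mime_from_bytes(data: &[u8]) -> Option<&'static str> {
    if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if data.starts_with(b"GIF87a") || data.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if data.len() >= 12 && &data[0..4] == b"RIFF" && &data[8..12] == b"WEBP" {
        // WebP is a RIFF container whose form type at offset 8 is "WEBP".
        Some("image/webp")
    } else {
        None
    }
}
```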

3. Image output format

Generated images come back in the assistant message under an images field. The docs show this structure:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "I've generated a beautiful sunset image for you.",
        "images": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,..."
            }
          }
        ]
      }
    }
  ]
}

OpenRouter says the generated images are returned as base64-encoded data URLs, typically PNG, and some models can return multiple images in one response. (OpenRouter)


4. Image configuration for Nano Banana 2

OpenRouter supports image_config for image-capable models. For this model you can control:

Aspect ratio

Standard supported aspect ratios include:

  • 1:1
  • 2:3
  • 3:2
  • 3:4
  • 4:3
  • 4:5
  • 5:4
  • 9:16
  • 16:9
  • 21:9

For google/gemini-3.1-flash-image-preview specifically, OpenRouter also lists extended ratios:

  • 1:4
  • 4:1
  • 1:8
  • 8:1 (OpenRouter)

Image size

Supported values:

  • 0.5K — listed as supported by this Gemini model only
  • 1K
  • 2K
  • 4K (OpenRouter)

Example

"image_config": {
  "aspect_ratio": "16:9",
  "image_size": "2K"
}

OpenRouter’s image generation guide shows aspect_ratio and image_size being used together in the same request. (OpenRouter)


5. Minimal spec: text-to-image

Request

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "HTTP-Referer: https://your-app.example" \
  -H "X-Title: Your App Name" \
  -d '{
    "model": "google/gemini-3.1-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate a clean product hero shot of a futuristic white drone on a seamless studio background, ultra realistic, soft daylight, premium photography."
      }
    ],
    "modalities": ["image", "text"],
    "image_config": {
      "aspect_ratio": "16:9",
      "image_size": "2K"
    },
    "stream": false
  }'

The HTTP-Referer and X-Title headers are optional OpenRouter-specific headers used for app attribution. (OpenRouter)

Response handling

Read:

choices[0].message.images[0].image_url.url

That value will be a base64 data URL. (OpenRouter)
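Before base64-decoding the result, a client has to separate the MIME type from the payload. A minimal sketch (split_data_url is an illustrative helper name, not an OpenRouter API; a plain https URL correctly falls through to None):

```rust
/// Split a `data:` URL into (mime, base64 payload) before decoding.
/// Returns None when the value is not a base64 data URL.
fn split_data_url(url: &str) -> Option<(&str, &str)> {
    let rest = url.strip_prefix("data:")?;
    // Everything before ";base64," is the MIME type, everything after is payload.
    let (mime, payload) = rest.split_once(";base64,")?;
    Some((mime, payload))
}
```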


6. Spec: image-to-image or image-conditioned generation with one image

Use a content array in the user message. Put the instruction first, then the image.

Request

{
  "model": "google/gemini-3.1-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Use this reference image as the base composition. Keep the subject identity and pose similar, but restyle it as a premium cinematic product campaign with softer lighting, cleaner background, and more polished materials."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/reference.jpg"
          }
        }
      ]
    }
  ],
  "modalities": ["image", "text"],
  "image_config": {
    "aspect_ratio": "4:5",
    "image_size": "2K"
  },
  "stream": false
}

The OpenRouter docs say images are sent in the messages array as image_url content parts, and multiple image parts may be included in one request. (OpenRouter)


7. Spec: image-conditioned generation with multiple images

You can provide multiple images as separate image_url parts in the same content array.

Request

{
  "model": "google/gemini-3.1-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Create a final image that combines these references: use image 1 for character identity, image 2 for outfit materials, and image 3 for background mood. Produce a coherent single photorealistic editorial shot."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/identity.jpg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/outfit.jpg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/mood.jpg"
          }
        }
      ]
    }
  ],
  "modalities": ["image", "text"],
  "image_config": {
    "aspect_ratio": "3:2",
    "image_size": "2K"
  }
}

OpenRouter does not give a single universal maximum image count in the general docs; it says the number of images allowed varies by provider and model. So treat multi-image support as supported in structure, but keep your app ready for provider/model-specific limits. (OpenRouter)


8. Base64 local file variant

For private or local images, send a data URL.

Example content part

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..."
  }
}

OpenRouter explicitly documents base64 image inputs for local/private images. (OpenRouter)
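Producing that data URL from raw file bytes only requires base64 encoding plus the MIME prefix. The sketch below implements a tiny RFC 4648 encoder with std only so it is self-contained; a real client would likely use the base64 crate instead, and to_data_url is an illustrative name:

```rust
// Minimal std-only base64 encoder (RFC 4648, with "=" padding).
fn base64_encode(data: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 input bytes into a 24-bit group.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32;
        let idx = [(n >> 18) & 63, (n >> 12) & 63, (n >> 6) & 63, n & 63];
        for (i, &x) in idx.iter().enumerate() {
            if i <= chunk.len() {
                out.push(TABLE[x as usize] as char);
            } else {
                out.push('='); // pad short final group
            }
        }
    }
    out
}

/// Wrap raw image bytes in a data URL suitable for an `image_url` content part.
fn to_data_url(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}
```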


9. Streaming behavior

You can stream image generation responses. In streaming mode, image chunks appear under:

choices[0].delta.images

The image generation guide shows streamed image events arriving in the delta.images field. (OpenRouter)

Streaming note

For most apps, non-streaming is simpler for image generation unless you specifically want incremental progress handling.


10. Recommended prompt structure for reference-image workflows

For better control, use this instruction pattern:

Task:
Generate one final image.

Use references:
- Image 1: subject identity / face / pose
- Image 2: clothing or material reference
- Image 3: environment / mood / color language

Keep:
- [list exact things to preserve]

Change:
- [list exact things to change]

Output style:
- photorealistic / product photography / editorial / cinematic / minimal / etc.

Camera & composition:
- lens feel, shot type, framing, perspective

Lighting:
- daylight / studio softbox / dramatic rim light / etc.

Aspect ratio:
- 16:9 / 4:5 / etc.

This part is my recommendation based on the input format OpenRouter supports and how multimodal prompts are parsed.


11. Important implementation detail: raw JSON vs SDK field names

OpenRouter’s general API schema shows raw request content parts using snake_case:

{
  "type": "image_url",
  "image_url": { "url": "..." }
}

But the SDK examples on the docs page show imageUrl in code examples. That means:

  • raw HTTP JSON: use image_url
  • SDK wrapper: may use imageUrl depending on the SDK language/binding (OpenRouter)

For a direct REST client, I would standardize on this raw JSON shape:

{
  "type": "image_url",
  "image_url": {
    "url": "..."
  }
}

12. Production guidance

Use these defaults:

{
  "model": "google/gemini-3.1-flash-image-preview",
  "modalities": ["image", "text"],
  "stream": false,
  "image_config": {
    "aspect_ratio": "1:1",
    "image_size": "1K"
  }
}

Then expose these app-level controls:

  • prompt
  • reference_images[]
  • aspect_ratio
  • image_size
  • stream
  • return_text toggle if you want to keep or ignore message.content

Also validate that your parser checks for message.images before assuming output exists, which OpenRouter recommends in its best practices. (OpenRouter)


13. Full reusable spec block

# OpenRouter Nano Banana 2 Image Generation Spec

## Endpoint
POST /api/v1/chat/completions

## Model
google/gemini-3.1-flash-image-preview

## Required fields
- model
- messages
- modalities: ["image", "text"]

## Optional fields
- image_config.aspect_ratio
- image_config.image_size
- stream
- temperature
- max_tokens

## User message formats

### Text only
content: "Generate ..."

### Text + one or more reference images
content: [
  { "type": "text", "text": "Generate ... using these references ..." },
  { "type": "image_url", "image_url": { "url": "<public-url-or-data-url>" } },
  { "type": "image_url", "image_url": { "url": "<public-url-or-data-url>" } }
]

## Input image sources
- Public HTTPS URLs
- Base64 data URLs

## Output parsing
choices[0].message.images[*].image_url.url

## Streaming parsing
choices[0].delta.images[*].image_url.url

## Supported image_config values
aspect_ratio:
- 1:1
- 2:3
- 3:2
- 3:4
- 4:3
- 4:5
- 5:4
- 9:16
- 16:9
- 21:9
- 1:4
- 4:1
- 1:8
- 8:1

image_size:
- 0.5K
- 1K
- 2K
- 4K


REMARK: make sure images given as local paths can be reformatted to the appropriate format, e.g. from JPEG to PNG or the other way around, and to the right size (number of bytes)

Author
Owner

Implementation Spec for Issue #124

Objective

Extend the herolib_ai crate so the existing Gemini 3.1 Flash Image Preview integration supports image-conditioned generation (one or more reference images, including from local files), keeps the current text-to-image path working, and exposes a clean, idiomatic Rust builder API. Add a local-image loader that detects MIME, optionally re-encodes (PNG/JPEG/WebP) and resizes to fit a max-byte budget. Ship a runnable example covering both flows.

Requirements

  • Reuse the existing OpenRouter plumbing in crates/ai/src/client/mod.rs (send_image_request, parse_image_response) — no new Provider variant.
  • Public builder ImageGenerationRequest with: .prompt(..), .add_image_url(..), .add_image_data(mime, bytes), .add_image_path(path), .add_image_path_with(path, ImageLoadOptions), .aspect_ratio(..), .image_size(..), .model(..), .execute(&AiClient) -> AiResult<ImageGenerationResult>.
  • Local image loader (image_io submodule) supporting PNG / JPEG / WebP / GIF; functions to: detect MIME from extension + magic bytes, re-encode between formats, resize to fit a max-bytes budget by iteratively shrinking dimensions and/or lowering JPEG/WebP quality.
  • Result type returns Vec<GeneratedImage> { bytes: Vec<u8>, mime: String, data_url: String } plus optional accompanying text and the model used. Keep the legacy ImageGenerationResponse type as a re-export/alias for back-compat with existing examples/tests.
  • Message construction: text part FIRST, then image parts, in a single user message using existing ContentPart::ImageUrl / ImageUrlInput.
  • Modalities ["image","text"] already wired; reuse.
  • Add image crate (0.25, default-features off, features png/jpeg/webp/gif) behind a default-on image-io cargo feature so consumers who don't need local-file support keep their build slim.
  • Example in crates/ai/examples/ doing both pure text-to-image and image-conditioned editing from a local PNG/JPEG path.
  • Rust 1.92, edition 2024, thiserror for new errors, doc-comments on every public item. Sync (no tokio) — match existing ureq pattern.

Files to Modify/Create

  • crates/ai/Cargo.toml — add optional image dep + image-io feature (default-on).
  • crates/ai/src/image_generation/mod.rs — declare new submodules; re-exports.
  • crates/ai/src/image_generation/image_io.rs — NEW. ImageLoadOptions, LoadedImage, load_image_from_path, reencode, fit_to_byte_budget, MIME helpers.
  • crates/ai/src/image_generation/request.rs — NEW. ImageGenerationRequest builder, ImageInput, GeneratedImage, ImageGenerationResult, .execute(&AiClient).
  • crates/ai/src/error.rs — add AiError::ImageIo(String) with From<image::ImageError> under feature.
  • crates/ai/src/client/mod.rs — expose image_request() entry point; refactor response parsing to Vec<GeneratedImage>; keep legacy generate_image* methods.
  • crates/ai/src/lib.rs — re-export new public types.
  • crates/ai/examples/image_generation_builder.rs — NEW. Text-to-image + image-conditioned from local file.
  • crates/ai/examples/README.md — one-line entry.

Implementation Plan

Step 1: Add image crate dependency behind feature

Files: crates/ai/Cargo.toml

  • image = { version = "0.25", default-features = false, features = ["png","jpeg","webp","gif"], optional = true }
  • image-io feature; include in default.
    Dependencies: none

Step 2: Implement image I/O helpers

Files: crates/ai/src/image_generation/image_io.rs

  • ImageFormatHint { Png, Jpeg, Webp, Gif } with as_mime() / from_mime().
  • ImageLoadOptions { target_format, max_bytes, max_dimension, jpeg_quality } builder. Defaults: keep format, 4 MiB, no dim cap, quality 85.
  • LoadedImage { bytes, mime } with to_data_url().
  • detect_mime_from_path(&Path) (extension) + detect_mime_from_bytes (magic bytes / image::guess_format).
  • load_image_from_path(&Path, &ImageLoadOptions) -> AiResult<LoadedImage> — decodes, optionally converts, shrinks dimensions / drops quality until under max_bytes (max 6 iterations).
  • Unit tests with a synthetic 2000x2000 PNG shrunk to 50 KiB budget.
  • Gate module behind #[cfg(feature = "image-io")]; stub errors otherwise.
    Dependencies: Step 1
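The shrink loop in load_image_from_path can stay free of encoder details by injecting the encode step. Below is a hedged sketch of one possible schedule, quality steps first, then dimension halving; the real Step 2 code would call the image crate, and the exact quality ladder and iteration cap (the spec suggests max 6) are open design choices:

```rust
/// Re-encode at decreasing quality, then at halved dimensions, until the
/// output fits under `max_bytes`. `encode` stands in for the real encoder
/// (e.g. the `image` crate); it receives (width, height, quality).
fn fit_to_byte_budget(
    mut w: u32,
    mut h: u32,
    max_bytes: usize,
    encode: &dyn Fn(u32, u32, u8) -> Vec<u8>,
) -> Option<(Vec<u8>, u32, u32)> {
    // First try stepping quality down at full resolution (JPEG/WebP style).
    for &q in &[85u8, 75, 65, 55, 45] {
        let bytes = encode(w, h, q);
        if bytes.len() <= max_bytes {
            return Some((bytes, w, h));
        }
    }
    // Then halve dimensions at the lowest quality, a bounded number of times.
    for _ in 0..6 {
        w = (w / 2).max(1);
        h = (h / 2).max(1);
        let bytes = encode(w, h, 45);
        if bytes.len() <= max_bytes {
            return Some((bytes, w, h));
        }
    }
    None // budget unreachable; caller maps this to an AiError
}
```

For PNG (lossless) the quality ladder would be skipped and only the dimension halving applied, as the Notes section describes.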

Step 3: Add ImageIo variant to AiError

Files: crates/ai/src/error.rs

  • #[error("image io error: {0}")] ImageIo(String) + From<image::ImageError> under feature.
    Dependencies: Step 1

Step 4: Build ImageGenerationRequest builder & result types

Files: crates/ai/src/image_generation/request.rs

  • ImageInput { Url(String), DataUrl(String), Bytes { mime, bytes }, Path(PathBuf, ImageLoadOptions) } with into_data_url().
  • ImageGenerationRequest { model, prompt, images, config } with builder methods.
  • GeneratedImage { bytes, mime, data_url } / ImageGenerationResult { images, text, model }.
  • execute(self, client: &AiClient) -> AiResult<ImageGenerationResult> — resolves inputs, builds multipart user message (text first, images after), reuses client plumbing.
    Dependencies: Steps 2, 3
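Assuming the shapes above, the execute path reduces to rendering a body with the text part first and image parts after. The sketch below hand-rolls the JSON purely for illustration; the real implementation would serialize with serde as the crate already does, the struct and field names are placeholders from this spec, and the escaping here is deliberately minimal:

```rust
/// Illustrative request shape after all image inputs have been resolved
/// to public URLs or data URLs.
struct ImageGenerationRequest {
    model: String,
    prompt: String,
    image_urls: Vec<String>,
    aspect_ratio: String,
    image_size: String,
}

impl ImageGenerationRequest {
    /// Render the chat-completions body: text content part first, images after.
    fn to_body_json(&self) -> String {
        let esc = |s: &str| s.replace('\\', "\\\\").replace('"', "\\\"");
        let mut parts = vec![format!(
            r#"{{"type": "text", "text": "{}"}}"#,
            esc(&self.prompt)
        )];
        for url in &self.image_urls {
            parts.push(format!(
                r#"{{"type": "image_url", "image_url": {{"url": "{}"}}}}"#,
                esc(url)
            ));
        }
        format!(
            r#"{{"model": "{}", "messages": [{{"role": "user", "content": [{}]}}], "modalities": ["image", "text"], "image_config": {{"aspect_ratio": "{}", "image_size": "{}"}}}}"#,
            esc(&self.model),
            parts.join(", "),
            esc(&self.aspect_ratio),
            esc(&self.image_size)
        )
    }
}
```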

Step 5: Wire builder into the client; multi-image response parsing

Files: crates/ai/src/client/mod.rs, crates/ai/src/image_generation/mod.rs, crates/ai/src/lib.rs

  • Refactor parse_image_response → parse_all_images (Vec<GeneratedImage> + text). Legacy generate_image_with_options keeps wrapping the first image.
  • Add AiClient::image_request() -> ImageGenerationRequest.
  • Re-exports wired up.
    Dependencies: Step 4

Step 6: Example program

Files: crates/ai/examples/image_generation_builder.rs, crates/ai/examples/README.md

  • Example 1: text-to-image via AiClient::from_env().image_request().prompt(..).aspect_ratio(..).execute()? → save.
  • Example 2: image-conditioned from a local JPEG with ImageLoadOptions::new().with_target_format(Png).with_max_bytes(4*1024*1024) → save result.
  • Print byte counts and elapsed times.
    Dependencies: Step 5

Step 7: Tests

Files: crates/ai/src/image_generation/image_io.rs, crates/ai/src/image_generation/request.rs

  • MIME round-trip; byte-budget shrink; builder serializes to JSON with modalities, prompt first, images after, valid data: URLs.
  • cargo test -p herolib_ai + cargo build -p herolib_ai --examples.
    Dependencies: Steps 2, 4, 5

Acceptance Criteria

  • cargo build -p herolib_ai and cargo build -p herolib_ai --examples succeed on Rust 1.92 / edition 2024.
  • cargo test -p herolib_ai passes; new tests cover MIME detect, byte-budget shrink, builder JSON shape.
  • ImageGenerationRequest exposes the full builder surface and is documented.
  • Local PNG/JPEG/WebP/GIF paths load; conversions PNG↔JPEG↔WebP work; max_bytes budget enforced.
  • Outgoing request sets modalities=["image","text"], text first, images after, valid data: URLs.
  • Response parsing returns Vec<GeneratedImage>.
  • Existing generate_image* and image_generation_test.rs still compile and behave as before.
  • All public items have /// doc-comments.

Notes

  • Reuse Message::user_with_images(text, &[(mime, b64), ...]) already in types.rs.
  • image 0.25 with minimal features keeps binary size reasonable.
  • Byte-budget loop: PNG halves dimensions (lossless); JPEG/WebP steps quality (85→75→65→55→45) then halves.
  • No tokio; everything sync via ureq.
  • Provider enum unchanged; OpenRouter remains the only mapping for Model::Gemini3_1FlashImagePreview.
## Implementation Spec for Issue #124 ### Objective Extend the `herolib_ai` crate so the existing Gemini 3.1 Flash Image Preview integration supports image-conditioned generation (one or more reference images, including from local files), keeps the current text-to-image path working, and exposes a clean, idiomatic Rust builder API. Add a local-image loader that detects MIME, optionally re-encodes (PNG/JPEG/WebP) and resizes to fit a max-byte budget. Ship a runnable example covering both flows. ### Requirements - Reuse the existing OpenRouter plumbing in `crates/ai/src/client/mod.rs` (`send_image_request`, `parse_image_response`) — no new `Provider` variant. - Public builder `ImageGenerationRequest` with: `.prompt(..)`, `.add_image_url(..)`, `.add_image_data(mime, bytes)`, `.add_image_path(path)`, `.add_image_path_with(path, ImageLoadOptions)`, `.aspect_ratio(..)`, `.image_size(..)`, `.model(..)`, `.execute(&AiClient) -> AiResult<ImageGenerationResult>`. - Local image loader (`image_io` submodule) supporting PNG / JPEG / WebP / GIF; functions to: detect MIME from extension + magic bytes, re-encode between formats, resize to fit a max-bytes budget by iteratively shrinking dimensions and/or lowering JPEG/WebP quality. - Result type returns `Vec<GeneratedImage> { bytes: Vec<u8>, mime: String, data_url: String }` plus optional accompanying text and the model used. Keep the legacy `ImageGenerationResponse` type as a re-export/alias for back-compat with existing examples/tests. - Message construction: text part FIRST, then image parts, in a single `user` message using existing `ContentPart::ImageUrl` / `ImageUrlInput`. - Modalities `["image","text"]` already wired; reuse. - Add `image` crate (0.25, default-features off, features png/jpeg/webp/gif) behind a default-on `image-io` cargo feature so consumers who don't need local-file support keep their build slim. 
- Example in `crates/ai/examples/` doing both pure text-to-image and image-conditioned editing from a local PNG/JPEG path. - Rust 1.92, edition 2024, `thiserror` for new errors, doc-comments on every public item. Sync (no tokio) — match existing `ureq` pattern. ### Files to Modify/Create - `crates/ai/Cargo.toml` — add optional `image` dep + `image-io` feature (default-on). - `crates/ai/src/image_generation/mod.rs` — declare new submodules; re-exports. - `crates/ai/src/image_generation/image_io.rs` — NEW. `ImageLoadOptions`, `LoadedImage`, `load_image_from_path`, `reencode`, `fit_to_byte_budget`, MIME helpers. - `crates/ai/src/image_generation/request.rs` — NEW. `ImageGenerationRequest` builder, `ImageInput`, `GeneratedImage`, `ImageGenerationResult`, `.execute(&AiClient)`. - `crates/ai/src/error.rs` — add `AiError::ImageIo(String)` with `From<image::ImageError>` under feature. - `crates/ai/src/client/mod.rs` — expose `image_request()` entry point; refactor response parsing to `Vec<GeneratedImage>`; keep legacy `generate_image*` methods. - `crates/ai/src/lib.rs` — re-export new public types. - `crates/ai/examples/image_generation_builder.rs` — NEW. Text-to-image + image-conditioned from local file. - `crates/ai/examples/README.md` — one-line entry. ### Implementation Plan #### Step 1: Add `image` crate dependency behind feature Files: `crates/ai/Cargo.toml` - `image = { version = "0.25", default-features = false, features = ["png","jpeg","webp","gif"], optional = true }` - `image-io` feature; include in `default`. Dependencies: none #### Step 2: Implement image I/O helpers Files: `crates/ai/src/image_generation/image_io.rs` - `ImageFormatHint { Png, Jpeg, Webp, Gif }` with `as_mime()` / `from_mime()`. - `ImageLoadOptions { target_format, max_bytes, max_dimension, jpeg_quality }` builder. Defaults: keep format, 4 MiB, no dim cap, quality 85. - `LoadedImage { bytes, mime }` with `to_data_url()`. 
- `detect_mime_from_path(&Path)` (extension) + `detect_mime_from_bytes` (magic bytes / `image::guess_format`).
- `load_image_from_path(&Path, &ImageLoadOptions) -> AiResult<LoadedImage>` — decodes, optionally converts, shrinks dimensions / drops quality until under `max_bytes` (max 6 iterations).
- Unit tests with a synthetic 2000x2000 PNG shrunk to 50 KiB budget.
- Gate module behind `#[cfg(feature = "image-io")]`; stub errors otherwise.

Dependencies: Step 1

#### Step 3: Add `ImageIo` variant to `AiError`

Files: `crates/ai/src/error.rs`

- `#[error("image io error: {0}")] ImageIo(String)` + `From<image::ImageError>` under feature.

Dependencies: Step 1

#### Step 4: Build `ImageGenerationRequest` builder & result types

Files: `crates/ai/src/image_generation/request.rs`

- `ImageInput { Url(String), DataUrl(String), Bytes { mime, bytes }, Path(PathBuf, ImageLoadOptions) }` with `into_data_url()`.
- `ImageGenerationRequest { model, prompt, images, config }` with builder methods.
- `GeneratedImage { bytes, mime, data_url }` / `ImageGenerationResult { images, text, model }`.
- `execute(self, client: &AiClient) -> AiResult<ImageGenerationResult>` — resolves inputs, builds multipart user message (text first, images after), reuses client plumbing.

Dependencies: Steps 2, 3

#### Step 5: Wire builder into the client; multi-image response parsing

Files: `crates/ai/src/client/mod.rs`, `crates/ai/src/image_generation/mod.rs`, `crates/ai/src/lib.rs`

- Refactor `parse_image_response` → `parse_all_images` (`Vec<GeneratedImage>` + text). Legacy `generate_image_with_options` keeps wrapping first image.
- Add `AiClient::image_request() -> ImageGenerationRequest`.
- Re-exports wired up.

Dependencies: Step 4

#### Step 6: Example program

Files: `crates/ai/examples/image_generation_builder.rs`, `crates/ai/examples/README.md`

- Example 1: text-to-image via `AiClient::from_env().image_request().prompt(..).aspect_ratio(..).execute()?` → save.
- Example 2: image-conditioned from a local JPEG with `ImageLoadOptions::new().with_target_format(Png).with_max_bytes(4*1024*1024)` → save result.
- Print byte counts and elapsed times.

Dependencies: Step 5

#### Step 7: Tests

Files: `crates/ai/src/image_generation/image_io.rs`, `crates/ai/src/image_generation/request.rs`

- MIME round-trip; byte-budget shrink; builder serializes to JSON with `modalities`, prompt first, images after, valid `data:` URLs.
- `cargo test -p herolib_ai` + `cargo build -p herolib_ai --examples`.

Dependencies: Steps 2, 4, 5

### Acceptance Criteria

- [ ] `cargo build -p herolib_ai` and `cargo build -p herolib_ai --examples` succeed on Rust 1.92 / edition 2024.
- [ ] `cargo test -p herolib_ai` passes; new tests cover MIME detect, byte-budget shrink, builder JSON shape.
- [ ] `ImageGenerationRequest` exposes the full builder surface and is documented.
- [ ] Local PNG/JPEG/WebP/GIF paths load; conversions PNG↔JPEG↔WebP work; `max_bytes` budget enforced.
- [ ] Outgoing request sets `modalities=["image","text"]`, text first, images after, valid `data:` URLs.
- [ ] Response parsing returns `Vec<GeneratedImage>`.
- [ ] Existing `generate_image*` and `image_generation_test.rs` still compile and behave as before.
- [ ] All public items have `///` doc-comments.

### Notes

- Reuse `Message::user_with_images(text, &[(mime, b64), ...])` already in `types.rs`.
- `image` 0.25 with minimal features keeps binary size reasonable.
- Byte-budget loop: PNG halves dimensions (lossless); JPEG/WebP steps quality (85→75→65→55→45) then halves.
- No tokio; everything sync via `ureq`.
- `Provider` enum unchanged; OpenRouter remains the only mapping for `Model::Gemini3_1FlashImagePreview`.
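The byte-budget note above (step JPEG/WebP quality 85→75→65→55→45, then halve dimensions, give up after 6 iterations) can be sketched as a small driver loop. This is a self-contained illustration, not the crate's code: `encode` is a fake stand-in that only models output size as a function of pixels and quality, whereas the real implementation would call the `image` crate's encoders.

```rust
const JPEG_QUALITY_STEPS: [u8; 5] = [85, 75, 65, 55, 45];

/// Fake encoder for the sketch: output size grows with pixel count and quality.
/// A real JPEG/WebP encoder would replace this.
fn encode(width: u32, height: u32, quality: u8) -> Vec<u8> {
    let bytes = (width as u64 * height as u64 * quality as u64) / 2500;
    vec![0u8; bytes as usize]
}

/// Re-encode until the output fits `max_bytes`: lower quality first,
/// then halve dimensions, giving up after 6 iterations.
fn fit_to_byte_budget(mut w: u32, mut h: u32, max_bytes: usize) -> Option<(Vec<u8>, u32, u32)> {
    let mut quality_idx = 0;
    for _ in 0..6 {
        let out = encode(w, h, JPEG_QUALITY_STEPS[quality_idx]);
        if out.len() <= max_bytes {
            return Some((out, w, h));
        }
        if quality_idx + 1 < JPEG_QUALITY_STEPS.len() {
            quality_idx += 1; // lossy format: step quality down first
        } else {
            w = (w / 2).max(1); // quality exhausted: halve dimensions
            h = (h / 2).max(1);
        }
    }
    None
}

fn main() {
    // Mirrors the planned unit test: 2000x2000 into a 50 KiB budget.
    let (out, w, h) = fit_to_byte_budget(2000, 2000, 50 * 1024).expect("budget reachable");
    println!("fit: {} bytes at {}x{}", out.len(), w, h);
}
```

For a PNG input the real loop would skip the quality steps (lossless) and halve dimensions immediately, as the note says.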
Author
Owner

## Test Results

- cargo check -p herolib_ai: OK
- cargo build -p herolib_ai --examples: OK
- cargo test -p herolib_ai: 61 passed, 0 failed
- Integration tests (nemotron_super_integration): 4 passed, 2 ignored (require network), 0 failed
- Doc-tests: 7 passed, 0 failed

Total: 72 passed, 0 failed.

New tests in `image_generation::image_io`:

- `test_detect_mime_from_path`
- `test_format_roundtrip`
- `test_load_small_image_no_resize`
- `test_byte_budget_shrinks_large_image` (2000x2000 PNG shrunk into a 50 KiB JPEG budget)
- `test_format_conversion_png_to_jpeg`
- `test_max_dimension`
- `test_data_url`

New tests in `image_generation::request`:

- `test_builder_basic`
- `test_builder_with_url`
- `test_image_input_url_passthrough`
- `test_image_input_data_url_passthrough`
- `test_image_input_bytes`
- `test_decode_data_url`
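The MIME-detection tests above exercise logic along these lines. A minimal, dependency-free sketch of magic-byte sniffing for the four supported formats; the function name and signature are illustrative, not the crate's actual API (which also consults the file extension and `image::guess_format`).

```rust
/// Sniff the image MIME type from leading magic bytes.
/// Returns None for anything that is not PNG/JPEG/GIF/WebP.
fn detect_mime_from_bytes(bytes: &[u8]) -> Option<&'static str> {
    match bytes {
        // PNG: \x89 P N G \r \n \x1a \n
        [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A, ..] => Some("image/png"),
        // JPEG: FF D8 FF
        [0xFF, 0xD8, 0xFF, ..] => Some("image/jpeg"),
        // GIF: "GIF87a" or "GIF89a"
        [b'G', b'I', b'F', b'8', b'7' | b'9', b'a', ..] => Some("image/gif"),
        // WebP: RIFF container with "WEBP" at offset 8
        [b'R', b'I', b'F', b'F', _, _, _, _, b'W', b'E', b'B', b'P', ..] => Some("image/webp"),
        _ => None,
    }
}

fn main() {
    let png_header = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A, 0, 0];
    println!("{:?}", detect_mime_from_bytes(&png_header));
}
```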
Author
Owner

## Implementation Summary

### Files Added

- `crates/ai/src/image_generation/image_io.rs` — image loader with MIME detection, format conversion (PNG/JPEG/WebP/GIF), and byte-budget resizing (quality stepping for lossy formats, dimension halving for lossless, max 6 iterations).
- `crates/ai/src/image_generation/request.rs` — `ImageGenerationRequest` builder; `ImageInput` enum (`Url` / `DataUrl` / `Bytes` / `Path`); `GeneratedImage` and `ImageGenerationResult` types; multi-image response parsing via `parse_all_images`.
- `crates/ai/examples/image_generation_builder.rs` — runnable example covering text-to-image and image-conditioned generation from a local path.
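The `request.rs` entry above describes a builder that assembles one user message with the text part first and the image parts after. A minimal std-only sketch of that ordering rule; `RequestSketch` and `Part` are simplified stand-ins, not the crate's actual `ImageGenerationRequest` / `ContentPart` types.

```rust
/// Simplified stand-in for the content parts of one chat message.
#[derive(Debug, PartialEq)]
enum Part {
    Text(String),
    ImageUrl(String),
}

/// Simplified stand-in for the builder: prompt plus resolved image URLs
/// (public URLs or data: URLs).
#[derive(Default)]
struct RequestSketch {
    prompt: String,
    images: Vec<String>,
}

impl RequestSketch {
    fn prompt(mut self, p: &str) -> Self {
        self.prompt = p.into();
        self
    }

    fn add_image_url(mut self, url: &str) -> Self {
        self.images.push(url.into());
        self
    }

    /// Build the content array: text part FIRST, then one image part per input.
    fn into_parts(self) -> Vec<Part> {
        let mut parts = vec![Part::Text(self.prompt)];
        parts.extend(self.images.into_iter().map(Part::ImageUrl));
        parts
    }
}

fn main() {
    let parts = RequestSketch::default()
        .prompt("Blend these two photos")
        .add_image_url("data:image/png;base64,AAAA")
        .add_image_url("https://example.com/b.jpg")
        .into_parts();
    println!("{parts:?}");
}
```

Keeping the ordering in one place (`into_parts`) is what lets the tests assert "prompt first, images after" on the serialized JSON.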

### Files Modified

- `crates/ai/Cargo.toml` — added optional `image = "0.25"` (png/jpeg/webp/gif features only) and `base64 = "0.22"` behind a default-on `image-io` feature.
- `crates/ai/src/error.rs` — added `AiError::ImageIo(String)` variant.
- `crates/ai/src/image_generation/mod.rs` — declared `pub mod image_io;` (feature-gated) and `pub mod request;`.
- `crates/ai/src/client/mod.rs` — added `AiClient::image_request()` builder entry point; exposed `provider_config`, `send_image_request_for`, `record_usage_pub` as `pub(crate)` helpers for the builder; promoted `base64_decode` to `pub(crate)`.
- `crates/ai/src/lib.rs` — re-exports `ImageGenerationRequest`, `ImageGenerationResult`, `GeneratedImage`, `ImageInput`, and (behind `image-io`) `ImageFormat`, `ImageLoadOptions`, `LoadedImage`.
- `crates/ai/examples/README.md` — entry for the new builder example.

### Design Notes

- `Provider` enum untouched; OpenRouter remains the only mapping for `Model::Gemini3_1FlashImagePreview`.
- Existing `generate_image*` methods are unchanged and keep working for back-compat — the example `image_generation_test.rs` still compiles and runs.
- Message construction follows the OpenRouter requirement: text part FIRST, image parts after; `modalities=["image","text"]`; reference images sent as `data:<mime>;base64,<data>` URLs.
- Sync-only (`ureq`), no tokio.

### Test Results

- cargo check: OK
- cargo build --examples: OK
- cargo test: 61 unit + 4 integration + 7 doc-tests, 0 failures.
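The design note about reference images as `data:<mime>;base64,<data>` URLs can be illustrated with a small std-only sketch. The crate itself uses the `base64 = "0.22"` dependency; the hand-rolled encoder below (standard alphabet, with padding) exists only to keep the example self-contained, and `to_data_url` is an illustrative name.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding, 3 input bytes -> 4 output characters.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group (missing bytes are zero).
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = u32::from(b[0]) << 16 | u32::from(b[1]) << 8 | u32::from(b[2]);
        let sextets = [(n >> 18) & 63, (n >> 12) & 63, (n >> 6) & 63, n & 63];
        for (i, &s) in sextets.iter().enumerate() {
            if i <= chunk.len() {
                out.push(B64[s as usize] as char);
            } else {
                out.push('='); // pad short final chunks
            }
        }
    }
    out
}

/// Build the data URL shape the client sends: data:<mime>;base64,<data>.
fn to_data_url(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}

fn main() {
    println!("{}", to_data_url("image/png", b"Man"));
}
```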
Author
Owner

Pull request opened: https://forge.ourworld.tf/lhumina_code/hero_lib/pulls/125

This PR implements the changes discussed in this issue.