AI pipeline: incoherent scene generation — prompts generated blind, hardcoded count, no narrative arc #2
Problem
The current scene generation pipeline produces scenes that do not form a coherent story. Three distinct root causes combine to produce this:
1. Scene count is hardcoded to 5
In `workers/mod.rs`, `generate_scenes_work` calls `generate_scenes` with a hardcoded count of 5. The `generate_scenes` RPC method takes no count parameter, and the UI shows no control for it.
Users have no way to say "I want a 3-scene short" vs "I want a 12-scene video". All projects start with exactly 5 scenes regardless of intent length, complexity, or target duration.
Fix needed: accept a `scene_count` parameter from the client (reasonable range: 3–20); expose it as a number input in the New Project modal and in the Planning step.
2. Image prompt and video prompt are generated in a single AI call, before any image exists
In `providers/ai.rs`, `generate_scenes()` sends one request to the LLM that returns both `image_prompt` and `video_prompt` in the same JSON array. This means the video prompt (camera movement, subject action, atmosphere) is written before any image exists, so it cannot account for what the selected image actually shows.
When a video model receives a prompt that doesn't match the reference image, the result is visually incoherent motion or an ignored prompt.
Fix needed: split generation into two separate AI calls:
- Image prompts first, for all scenes at once.
- Video prompts per scene, only after an image is selected.
This also means the `video_prompt` field on `Scene` should not be populated during scene planning; it should be derived later.
3. No narrative coherence: scenes are generated as independent parallel items
The system prompt asks the LLM to "generate N scenes" in a single JSON array. The LLM treats each array entry as an independent creative unit with no enforced relationship to the others. There is no instruction to follow a narrative arc or to keep the visual style and protagonist consistent from one scene to the next.
The result is a set of scenes that look like they come from different videos.
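Fix needed: the scene generation prompt needs explicit narrative structure: a story arc that runs setup → development → resolution, and a consistent visual style across all scenes.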
Alternatively: generate a narrative outline first (one short sentence per scene describing its role in the story), then generate image prompts grounded in that outline. This two-pass approach produces dramatically more coherent results.
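The two-pass approach can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the prompt wording, and the line-per-scene response format are all assumptions, and the model is abstracted as any text-in/text-out closure.

```rust
/// Pass 1 prompt: ask for a narrative outline, one sentence per scene.
/// The wording is illustrative only.
fn outline_prompt(intent: &str, count: usize) -> String {
    format!(
        "Write a {count}-scene story outline for: {intent}\n\
         Return exactly {count} lines, one sentence per scene, \
         moving from setup through development to resolution."
    )
}

/// Pass 2 prompt: ask for one image prompt per outline sentence.
fn image_prompts_prompt(outline: &[String]) -> String {
    let mut prompt = String::from(
        "Write one image prompt per scene, in order, one per line. \
         Keep lighting, color palette, and protagonist consistent.\n",
    );
    for (i, sentence) in outline.iter().enumerate() {
        prompt.push_str(&format!("{}. {}\n", i + 1, sentence));
    }
    prompt
}

/// Splits a model response into trimmed, non-empty lines.
fn non_empty_lines(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Runs both passes and returns (outline sentence, image prompt) pairs.
/// Fails if either pass returns the wrong number of lines.
fn two_pass_scenes<F>(
    intent: &str,
    count: usize,
    mut llm: F,
) -> Result<Vec<(String, String)>, String>
where
    F: FnMut(&str) -> String,
{
    let outline = non_empty_lines(&llm(&outline_prompt(intent, count)));
    if outline.len() != count {
        return Err(format!("expected {count} outline lines, got {}", outline.len()));
    }
    let prompts = non_empty_lines(&llm(&image_prompts_prompt(&outline)));
    if prompts.len() != count {
        return Err(format!("expected {count} image prompts, got {}", prompts.len()));
    }
    Ok(outline.into_iter().zip(prompts).collect())
}
```

Because the second prompt embeds the outline from the first, each image prompt is written with the whole story in view rather than as an isolated item.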
Summary of required changes
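- Scene count: replace the hardcoded `5` in the worker with a client-supplied `scene_count` (3–20).
- Prompt generation: split into two AI calls, with video prompts generated only after an image is selected.
- Narrative coherence: add explicit narrative structure to the scene generation prompt.
Related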
See issue #1 for the user-facing side of this (field naming, structured brief input). The improvements here are backend/pipeline; they will also require a UI change to remove the video prompt textarea from the Planning step.
Implementation Spec — Issue #2
Objective
Fix three interconnected problems in the AI scene generation pipeline:
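- Hardcoded scene count: let the client choose the number of scenes instead of always generating 5.
- No narrative arc: generate scenes from an explicit story arc rather than as independent items.
- `image_prompt` and `video_prompt` generated together: split into two separate AI calls, image prompts first (all scenes at once), then video prompts per scene only after an image is selected.
Requirements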
- Users can specify a `scene_count` between 3 and 20 (inclusive) when creating a project or when triggering `generate_scenes` from the Planning step.
- The `generate_scenes` RPC method accepts an optional `scene_count: u32` parameter (defaults to 5 when omitted, for backward compatibility).
- `video_prompt` is removed from the initial scene generation entirely; it starts as the arc sentence (a human-readable one-liner of that scene's narrative role) in the `Scene` struct.
- `generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description)` is called per scene at the point the user selects a candidate image. It uses the image URL/description as context.
- `video_prompt` on a scene is editable and saveable, same as `image_prompt`.
- The OpenRPC spec (`openrpc.json`) is updated to document the new parameter and method.
- Consumers of `scene.video_prompt` (the clip generation worker) continue to work unchanged: they already read `scene.video_prompt` from storage, so as long as it is populated before `generate_clip` is called, no changes are needed there.
Files to Modify
- `crates/hero_videos_server/src/providers/ai.rs`: replace `generate_scenes` with `generate_image_prompts` (returns arc sentence + image prompt per scene) and add `generate_video_prompt` (single scene, post-image-selection)
- `crates/hero_videos_server/src/workers/mod.rs`: accept `scene_count: u32`, call `generate_image_prompts`, store arc sentence as initial `video_prompt`; add `generate_video_prompt_work`
- `crates/hero_videos_server/src/rpc/mod.rs`: add `scene_count: Option<u32>` to `generate_scenes`; add `generate_video_prompt` RPC handler
- `crates/hero_videos_server/src/main.rs`: add `--scene-count` CLI arg to `WorkerTask::Scenes`; add `WorkerTask::VideoPrompt` variant
- `crates/hero_videos_server/openrpc.json`: add `scene_count` param; add `generate_video_prompt` method
- `crates/hero_videos_web/templates/app.html`: trigger `generate_video_prompt` after image selection
Implementation Plan
Step 1: Split AI provider methods in `providers/ai.rs`
File: `crates/hero_videos_server/src/providers/ai.rs`
Replace `generate_scenes` with two methods:
`generate_image_prompts(intent, count) -> Vec<(String, String)>`, returning `(arc_sentence, image_prompt)` pairs
Single chain-of-thought AI call. The prompt asks the model to:
- First produce `story_arc`: an array of `count` one-sentence scene descriptions forming a narrative arc (setup → development → resolution).
- Then, for each scene, write an `image_prompt` grounded in that scene (photorealistic, 16:9, vivid, consistent visual style across all scenes).
The response is JSON containing `story_arc` and one `image_prompt` per scene.
System prompt includes: consistent lighting, color palette, and protagonist appearance across all scenes.
`generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description) -> String`
Single AI call. Prompt: "You are a video director. Given a selected still image and scene context, write a single video prompt (2–3 sentences) for AI video generation. Describe camera movement, subject motion, and atmosphere."
Dependencies: none
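As an illustration of how the four inputs to `generate_video_prompt` might be assembled into a request, here is a minimal sketch. The system prompt is quoted from the spec above; the `VideoPromptContext` struct, the `build_video_prompt_user_message` helper, and the labelled line layout are hypothetical and not taken from the codebase.

```rust
/// System prompt text as quoted in the spec.
const VIDEO_DIRECTOR_SYSTEM_PROMPT: &str = "You are a video director. \
Given a selected still image and scene context, write a single video prompt \
(2–3 sentences) for AI video generation. Describe camera movement, subject \
motion, and atmosphere.";

/// The four inputs named in the spec's signature, grouped for convenience.
/// (Hypothetical struct; the real method takes them as separate arguments.)
struct VideoPromptContext<'a> {
    intent: &'a str,
    arc_sentence: &'a str,
    image_prompt: &'a str,
    selected_image_description: &'a str,
}

/// Builds the user message sent alongside the system prompt.
/// The labels and ordering are an assumption for illustration.
fn build_video_prompt_user_message(ctx: &VideoPromptContext) -> String {
    format!(
        "Project intent: {}\nScene role: {}\nImage prompt: {}\nSelected image: {}",
        ctx.intent, ctx.arc_sentence, ctx.image_prompt, ctx.selected_image_description
    )
}
```

Passing the selected image's description as its own labelled field keeps the video prompt anchored to what the image shows, rather than to what the image prompt asked for.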
Step 2: Update workers in `workers/mod.rs`
File: `crates/hero_videos_server/src/workers/mod.rs`
- Update `generate_scenes_work` to accept `scene_count: u32`; replace the hardcoded `5` and the `generate_scenes` call with `generate_image_prompts(&intent, scene_count as usize)`.
- Build `Scene` objects: `image_prompt` = the returned image prompt, `video_prompt` = the returned arc sentence.
- Add `generate_video_prompt_work(project_id, scene_id, osis, providers)`: loads project + scene, calls `providers.ai.generate_video_prompt(...)`, writes result back to `scene.video_prompt`.
Dependencies: Step 1
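The mapping from `generate_image_prompts` output to scenes could look roughly like this. The `Scene` struct here is a hypothetical stand-in with only the fields needed for the example (the real type lives in the project's schema), and `build_scenes` is an illustrative helper name.

```rust
/// Hypothetical, reduced stand-in for the project's Scene type.
#[derive(Debug, Clone, PartialEq)]
struct Scene {
    index: usize,
    image_prompt: String,
    video_prompt: String,
}

/// Turns (arc_sentence, image_prompt) pairs into scenes.
/// The arc sentence becomes the initial video_prompt, to be replaced
/// later once an image has been selected.
fn build_scenes(pairs: Vec<(String, String)>) -> Vec<Scene> {
    pairs
        .into_iter()
        .enumerate()
        .map(|(index, (arc_sentence, image_prompt))| Scene {
            index,
            image_prompt,
            video_prompt: arc_sentence,
        })
        .collect()
}
```

Seeding `video_prompt` with the arc sentence means the field is never empty, which is what lets the clip generation worker keep reading it without changes.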
Step 3: Update RPC handlers in `rpc/mod.rs`
File: `crates/hero_videos_server/src/rpc/mod.rs`
- Add `scene_count: Option<u32>` to `generate_scenes`. Validate range 3–20; default to 5.
- Thread `scene_count` through to the worker CLI args and the tokio::spawn fallback.
- Add `generate_video_prompt(project_id, scene_id)` handler: validate scene exists and has a selected candidate, launch `WorkerTask::VideoPrompt` job.
Dependencies: Step 2
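The validation rule for `scene_count` (optional, range 3–20, default 5) is small enough to sketch in full. The function and constant names below are illustrative assumptions, and the error is a plain `String` rather than whatever RPC error type the server actually uses.

```rust
const MIN_SCENES: u32 = 3;
const MAX_SCENES: u32 = 20;
const DEFAULT_SCENES: u32 = 5;

/// Resolves the optional client-supplied scene count.
/// - None       -> default of 5 (backward compatible)
/// - 3..=20     -> accepted as-is
/// - otherwise  -> rejected with a descriptive error
fn resolve_scene_count(requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(DEFAULT_SCENES),
        Some(n) if (MIN_SCENES..=MAX_SCENES).contains(&n) => Ok(n),
        Some(n) => Err(format!(
            "scene_count must be between {} and {}, got {}",
            MIN_SCENES, MAX_SCENES, n
        )),
    }
}
```

Rejecting out-of-range values instead of clamping them matches the acceptance criterion that bad input produces a clear error.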
Step 4: Update CLI in `main.rs`
File: `crates/hero_videos_server/src/main.rs`
- Add `scene_count: u32` (default 5) field to `WorkerTask::Scenes`.
- Pass it through to `generate_scenes_work`.
- Add `WorkerTask::VideoPrompt { project_id, scene_id }` variant dispatching to `generate_video_prompt_work`.
Dependencies: Step 2
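A rough sketch of the extended task enum and its argument handling is shown below, using only the standard library. The variant shapes and the `--scene-count` flag come from the spec; the subcommand names (`scenes`, `video-prompt`), the `--project-id` and `--scene-id` flags, the `String` ID types, and the hand-rolled parser are assumptions for illustration, since the actual CLI may be built with an argument-parsing crate.

```rust
#[derive(Debug, PartialEq)]
enum WorkerTask {
    Scenes { project_id: String, scene_count: u32 },
    VideoPrompt { project_id: String, scene_id: String },
}

/// Returns the value following `flag`, if both are present.
fn flag_value<'a>(args: &[&'a str], flag: &str) -> Option<&'a str> {
    args.iter()
        .position(|arg| *arg == flag)
        .and_then(|i| args.get(i + 1))
        .copied()
}

/// Parses `<task> [flags...]` into a WorkerTask.
/// `--scene-count` is optional and defaults to 5.
fn parse_worker_task(args: &[&str]) -> Result<WorkerTask, String> {
    let (task, rest) = args.split_first().ok_or("missing task name")?;
    let project_id = flag_value(rest, "--project-id")
        .ok_or("missing --project-id")?
        .to_string();
    match *task {
        "scenes" => {
            let scene_count = match flag_value(rest, "--scene-count") {
                Some(raw) => raw
                    .parse::<u32>()
                    .map_err(|e| format!("invalid --scene-count: {e}"))?,
                None => 5,
            };
            Ok(WorkerTask::Scenes { project_id, scene_count })
        }
        "video-prompt" => {
            let scene_id = flag_value(rest, "--scene-id")
                .ok_or("missing --scene-id")?
                .to_string();
            Ok(WorkerTask::VideoPrompt { project_id, scene_id })
        }
        other => Err(format!("unknown task: {other}")),
    }
}
```

Keeping the default inside the parser means existing invocations that omit `--scene-count` behave exactly as before.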
Step 5: Update `openrpc.json`
File: `crates/hero_videos_server/openrpc.json`
- Add `scene_count` (optional integer 3–20) to `generate_scenes`.
- Add `generate_video_prompt` method with `project_id` and `scene_id` params.
Dependencies: none (can run in parallel with Steps 1–4)
Step 6: Update Planning step UI in `app.html`
File: `crates/hero_videos_web/templates/app.html`
- Update `generateScenes()` to read and pass `scene_count`.
- After `select_image` succeeds in the Imaging step, fire `rpc('generate_video_prompt', { project_id, scene_id })` in a try/catch (non-blocking).
- Add `videoPromptGenStartedAt` tracking and extend `schedulePoll` to poll when video prompt generation is active.
Dependencies: Step 3
Acceptance Criteria
- `generate_scenes` RPC accepts `scene_count` (3–20); rejects out-of-range values with a clear error
- `generate_scenes` with no `scene_count` defaults to 5 (backward compatible)
- Scenes produced by `generate_image_prompts` have coherent `image_prompt` values grounded in a story arc
- The initial `video_prompt` for each scene is the arc sentence (human-readable)
- After image selection, `generate_video_prompt` is triggered automatically; `scene.video_prompt` is updated once complete
- Clip generation (`generate_clip`) works unchanged
- `hero_proc` worker CLI (`WorkerTask::Scenes`) accepts `--scene-count`
Notes
- `scene.video_prompt` type stays `str`, so no OSchema regeneration is needed.
- `crates/hero_videos_admin/static/openrpc.json` is a separate copy; check `build.rs` to see if it is auto-synced, otherwise update it manually alongside the server copy.
- `generate_video_prompt` is a fast single-inference call; a 60s timeout is sufficient.
- Do not make `video_prompt` optional in the OpenRPC schema; it is always present.
Test Results
Failures
Doc-test: `crates/hero_videos_sdk/src/lib.rs - (line 6)`
The doc-test example references `RunWorkflowInput` and `client.run_workflow()`, which no longer exist in `hero_videos_sdk`. The SDK does not export `RunWorkflowInput`, and `HeroVideosClient` has no `run_workflow` method.
The 3 unit tests in `hero_videos_server` all passed:
- `videos::tests::test_collection_crud`
- `videos::tests::test_project_crud`
- `videos::tests::test_videos_all_objects`
Implementation Complete
Changes Made
`crates/hero_videos_server/src/providers/ai.rs`
- Removed the `generate_scenes` method.
- Added `generate_image_prompts(intent, count)`: single chain-of-thought AI call that first produces a `story_arc` (one sentence per scene forming a narrative arc), then derives an `image_prompt` grounded in each arc sentence. System prompt enforces consistent lighting, color palette, and protagonist appearance across all scenes. Returns `Vec<(arc_sentence, image_prompt)>`.
- Added `generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description)`: called after image selection; prompts the AI as a video director to write a 2–3 sentence camera/motion/atmosphere prompt using the selected image as context.
`crates/hero_videos_server/src/workers/mod.rs`
- `generate_scenes_work` now accepts `scene_count: u32`; calls `generate_image_prompts` instead of the old single-call method. Each scene's `video_prompt` is initialized to the arc sentence (human-readable narrative one-liner).
- Added `generate_video_prompt_work(project_id, scene_id, osis, providers)`: loads scene, calls `ai.generate_video_prompt`, writes the result back to `scene.video_prompt`.
`crates/hero_videos_server/src/rpc/mod.rs`
- `generate_scenes` now accepts optional `scene_count: u32` (range 3–20, default 5); validates and threads the value through to the worker CLI and tokio::spawn fallback.
- Added `generate_video_prompt(project_id, scene_id)` RPC handler: validates scene exists and has a selected image, then launches `generate_video_prompt_work`.
`crates/hero_videos_server/src/sockets/mod.rs`
- Wired `generate_video_prompt` into the dispatch table and `rpc.discover` methods list.
`crates/hero_videos_server/src/main.rs`
- `WorkerTask::Scenes` now accepts `--scene-count` CLI arg (default 5).
- Added `WorkerTask::VideoPrompt { project_id, scene_id }` variant dispatching to `generate_video_prompt_work`.
`crates/hero_videos_server/openrpc.json` and `crates/hero_videos_admin/static/openrpc.json`
- Added `scene_count` parameter (integer 3–20) to `generate_scenes`.
- Added `generate_video_prompt` method.
`crates/hero_videos_web/templates/app.html`
- `generateScenes()` now reads and passes `scene_count` in the RPC call.
- After `select_image` succeeds, `generate_video_prompt` is fired automatically (fire-and-forget).
- Added `videoPromptGenStartedAt` tracking; polling continues while video prompt generation is active for any scene.
`crates/hero_videos_sdk/src/lib.rs`
- Fixed the stale doc-test reference to `run_workflow` / `RunWorkflowInput`.
Test Results