services not shown properly #57
In the Services tab (e.g. for hero_db), old jobs that are no longer relevant should not count toward whether a service shows as healthy. If the two current jobs are OK and running, the service is OK; the old jobs should not be shown at all. We need to track this through the run: when we see a run, we should be able to click on its jobs and go to the job panel.
Implementation Spec — Issue #57: Services tab not shown properly
Objective
Fix the `Services` tab so a service's listed jobs and its derived health status reflect only the current run of that service (the most recent `run` row whose `service_id` matches). Stale jobs from prior runs must not appear in the per-service jobs view and must not influence the badge color or status. Each job row in the service detail panel must be clickable and deep-link to the Jobs panel for that job.

Background — current state of the code
What already works:
- A `Run` is created per `service.start` cycle, and every spawned `Job` is tagged with `run_id` (see `handle_start` in `crates/hero_proc_server/src/rpc/service.rs`).
- The `runs` table has `service_id`, and there is a `run_jobs` join table.
- `service.status` already returns `current_run_id` (the most recent run for the service).
- `JobFilter` (OpenRPC schema and Rust) already supports both `service_id` and `run_id` for `job.list`.
- `viewService` in `dashboard.js` already correctly uses `current_run_id` to filter jobs.

What is broken:
- `loadServices` in `dashboard.js` calls `rpc('job.list', { filter: { service_id: name, limit: 200 } })` — i.e., it counts all historical jobs for the service, regardless of run. The badge then turns red whenever any historical job has `phase === 'failed'`, even if the current run is fully healthy (see the sketch below).
- Server side (`service.status`): `service_running_jobs` and `service_last_terminal_state` walk all jobs for the service via `db.jobs.list_by_service_id(name)` instead of restricting to the current run. As a result, if a stale `Retrying` row from a previous run lingers, or a previous run's last terminal job was `Failed`, the derived `state` is wrong.
- The Jobs badge click goes through `navigateToServiceJobs`, which only filters by service name, not by run. We need to also pass the `current_run_id` and have the Jobs tab filter by it.
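For concreteness, a minimal sketch of the difference. The buggy `rpc('job.list', …)` call is quoted from `dashboard.js` as described above; the `service.status` request shape and the `job.list` return shape are assumptions for illustration:

```js
// Current behaviour (from loadServices): counts ALL historical jobs for the
// service, so one failed job from an old run keeps the badge red forever.
const allJobs = await rpc('job.list', { filter: { service_id: name, limit: 200 } });

// Intended behaviour: resolve the service's current run first, then count only
// that run's jobs (what Steps 1-3 below implement).
const status = await rpc('service.status', { name });   // request shape assumed
const currentJobs = status.current_run_id != null
  ? await rpc('job.list', { filter: { run_id: status.current_run_id, limit: 200 } })
  : [];
```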
Requirements
- Each job row in the service detail panel must be clickable and deep-link to the job (`#jobs/<id>`).
- Navigation from the Jobs badge must use the `run_id` filter (not just `service_id`).
- When a service has no current run (`current_run_id == null`), the Jobs column shows `-` and the status badge shows `inactive`.
- The `service.status` `state` must derive from the current run only; old runs' jobs are ignored.

Files to modify / create
- `crates/hero_proc_server/src/rpc/service.rs` — scope `service_running_jobs`, `service_last_terminal_state`, and `count_restarts` (used by `handle_status` and `handle_status_full`) to the current run when one exists.
- `crates/hero_proc_ui/static/js/dashboard.js` — change `loadServices` to fetch jobs per service using the current run's `run_id`; change the `serviceJobsBadge` click handler / `navigateToServiceJobs` to honour run scoping; ensure click-to-job rows in `viewService` continue to deep-link.
- `crates/hero_proc_ui/templates/index.html` — minor copy update (tooltip "Current run jobs"); optional, low priority.
- `crates/hero_proc_server/src/rpc/service.rs` (`#[cfg(test)]`) — add cases for current-run scoping.

No new files are strictly required. No OpenRPC schema changes are required.
Implementation Plan
Step 1 — Server: scope service status to the current run
Files:
`crates/hero_proc_server/src/rpc/service.rs`
- Add a helper `current_run_id(db, name) -> Option<u32>` using `db.runs.get_for_service(name)`.
- Add a helper `current_run_job_ids(db, name) -> Option<Vec<u32>>` using `db.runs.get_run_job_ids(rid)`.
- Change `service_running_jobs`, `service_last_terminal_state`, and `count_restarts` to iterate only over current-run job IDs when a current run exists. When no current run exists, return empty / `inactive` / `0` respectively.
- `handle_status` and `handle_status_full` automatically pick up the fix.

Why: makes `service.status` `state` reflect only the current cycle. Authoritative for all consumers (UI, CLI, scripts).

Dependencies: none.
Step 2 — UI: scope the Jobs badge counts to the current run
Files:
`crates/hero_proc_ui/static/js/dashboard.js` (`loadServices`, `serviceJobsBadge`)
- In `loadServices`, sequence per service: first call `service.status` to get `current_run_id`, then, if it is non-null, call `job.list` with `filter.run_id = current_run_id`. If it is null, store zero counts and `runId: null` (see the sketch after this step).
- Keep the `Promise.all` over services so different services still fetch in parallel.
- Store `runId` in `cachedServiceJobCounts[name]`.
- Update the `serviceJobsBadge` tooltip ("Jobs in current run # — N total, F failed, R retrying").
- Change the `onclick` to call `navigateToServiceJobs(name, runId)`.

Why: stops the Jobs column badge from being misled by stale old-run failures.
Dependencies: none (independent of Step 1).
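A minimal sketch of the Step 2 sequencing, assuming the `rpc` helper and the field names used elsewhere in this spec (`current_run_id`, `phase`); the helper name `fetchServiceJobCounts` and the exact payload shapes are illustrative, not the actual API:

```js
// Hypothetical helper: resolve the current run, then count only its jobs.
async function fetchServiceJobCounts(name) {
  const status = await rpc('service.status', { name });   // request shape assumed
  const runId = status.current_run_id ?? null;
  if (runId == null) {
    // No current run: zero counts, badge renders '-' / 'inactive'.
    return { runId: null, total: 0, failed: 0, retrying: 0 };
  }
  const jobs = await rpc('job.list', { filter: { run_id: runId, limit: 200 } });
  return {
    runId,
    total: jobs.length,
    failed: jobs.filter(j => j.phase === 'failed').length,
    retrying: jobs.filter(j => j.phase === 'retrying').length,
  };
}

// loadServices keeps fetching per-service counts in parallel.
const counts = await Promise.all(services.map(s => fetchServiceJobCounts(s.name)));
services.forEach((s, i) => { cachedServiceJobCounts[s.name] = counts[i]; });
```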
Step 3 — UI: add run-scoped navigation to the Jobs tab
Files:
`crates/hero_proc_ui/static/js/dashboard.js` (`navigateToServiceJobs`, `loadJobs`, route restoration)
- Extend `navigateToServiceJobs(serviceName, runId)` to set `#jobs?service=<name>&run=<id>` and persist `runId` in a `_pendingRunFilter` (see the sketch after this step).
- Extend `loadJobs()` to honour the `runId` filter (`filter.run_id = runId`); add a "Current run only" pill the user can dismiss to clear it (and the URL param).
- Route restoration must also parse `&run=<id>`.

Why: clicking the badge in Services takes the user to a Jobs view that only shows current-run jobs.
Dependencies: Step 2.
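A sketch of the run-scoped navigation and hash restoration; `navigateToServiceJobs` and `_pendingRunFilter` are named in this step, while `parseJobsRoute` and the exact parsing are illustrative:

```js
let _pendingRunFilter = null;

function navigateToServiceJobs(serviceName, runId) {
  _pendingRunFilter = runId ?? null;
  const params = new URLSearchParams({ service: serviceName });
  if (runId != null) params.set('run', String(runId));
  window.location.hash = `#jobs?${params}`;   // e.g. #jobs?service=hero_db&run=7
}

// Hypothetical route-restoration helper: recover service/run from the hash so a
// reload or shared URL restores the same run-scoped Jobs view.
function parseJobsRoute(hash) {
  const params = new URLSearchParams(hash.split('?')[1] ?? '');
  const run = params.get('run');
  return { service: params.get('service'), runId: run != null ? Number(run) : null };
}

// Inside loadJobs(): apply the filter when present (JobFilter supports run_id).
// if (runId != null) filter.run_id = runId;
```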
Step 4 — UI: confirm click-to-job deep link in service detail panel
Files:
`crates/hero_proc_ui/static/js/dashboard.js` (`viewService` jobs table render)
- Verify that `onclick="navigateTo('jobs', j.id)"` correctly opens the Jobs tab and `viewJob(j.id)`.
- Add `aria-label="Open job #<id>"` and `class="cursor-pointer"` to the row (see the sketch after this step).

Why: requirement: "we should be able to click on the jobs and go to job panel". Mostly already true — this hardens the UX.
Dependencies: none.
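A sketch of the hardened row markup in the `viewService` jobs table; the `navigateTo('jobs', j.id)` handler is the existing one, the surrounding cells are placeholders:

```js
// One clickable job row: pointer cursor, accessible label, deep link to #jobs/<id>.
const row = `
  <tr class="cursor-pointer"
      aria-label="Open job #${j.id}"
      title="Click to open job #${j.id}"
      onclick="navigateTo('jobs', ${j.id})">
    <td>#${j.id}</td>
    <td>${j.phase}</td>
  </tr>`;
```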
Step 5 — Tests
Files:
`crates/hero_proc_server/src/rpc/service.rs` (`#[cfg(test)]`)
- `service_status_ignores_jobs_from_previous_runs`: create a service, start it (run #1) → mark its job Failed, start again (run #2) → leave the job Running; assert `state == "running"` and `current_run_id == run #2`.
- `service_status_inactive_when_no_run`: never start; assert `state == "inactive"` and `current_run_id == null`.

Why: locks in the fix.
Dependencies: Steps 1, 2, 3, 4.
Acceptance criteria
- A service with no current run shows `-` in the Jobs column and the status badge shows `inactive`.
- `service.status` (RPC) returns `state == "running"` when the current run's jobs are active, regardless of stale failed jobs from earlier runs.
- Clicking the Jobs badge opens the Jobs tab filtered by `run_id = current_run_id` (URL `#jobs?service=<name>&run=<id>`).
- Clicking a job row in the service detail panel deep-links to `#jobs/<id>` and opens the Jobs detail panel for that job.
- No OpenRPC schema changes (`ServiceStatus.current_run_id` already exists).
- The `delete_inactive_by_service` cleanup on `service.start` (with the `replace_existing_jobs=true` default) means stale jobs are often already gone. The visible bug appears when `replace_existing_jobs=false` is used (rare in the UI, possible via SDK/CLI), or when `retrying` rows pile up and mislead the badge.
- By scoping to `current_run_id`, the UI is correct regardless of which edge case produced the stale jobs.
- `get_run_for_service` returns the most recent run regardless of status — including `halted`. That's correct: the current run is whichever is most recent. If the run is `halted`, current-run jobs are typically terminal, and `state` falls back to the run's last terminal job.
- The server-side fix benefits every consumer of `service.status`, not just the dashboard.

Test Results
`cargo test --workspace --no-fail-fast`

Failures
- `hero_proc_integration_tests` (lib unit test): `harness::tests::test_harness_starts_and_stops` (see "Workspace test results" below).
Implementation summary
All five steps from the spec are implemented.
Server (`crates/hero_proc_server/src/rpc/service.rs`)
- Added helpers `current_run_id(db, name)` and `current_run_job_ids(db, name)`.
- `service_running_jobs` now returns active jobs from the current run only; returns empty when there is no current run.
- `service_last_terminal_state` now considers only current-run terminal jobs; returns `"inactive"` when there is no current run.
- `count_restarts` now counts restarts only for current-run jobs; returns `0` when there is no current run.
- `handle_status` and `handle_status_full` automatically pick up the fix.
UI — Services tab badge (`crates/hero_proc_ui/static/js/dashboard.js`)
- `loadServices` now calls `service.status` first per service to get `current_run_id`, then calls `job.list` filtered by `run_id` (skipping the call entirely when no current run exists).
- `cachedServiceJobCounts[name]` now also stores `runId`.
- `serviceJobsBadge(name)` reads `runId`, renders a dash plus a "service inactive" tooltip when `runId == null`, and otherwise shows the badge with `Jobs in current run #<id> — N total, F failed, R retrying — click to view`.
- `onclick` now calls `navigateToServiceJobs(name, runId)`.
UI — Run-scoped Jobs navigation (`crates/hero_proc_ui/static/js/dashboard.js` and `templates/index.html`)
- Added `_pendingRunFilter` / `_activeRunFilter`.
- `navigateToServiceJobs(serviceName, runId)` accepts the optional `runId` and writes it to `#jobs?service=<name>&run=<id>`.
- `loadJobs()` honours `_activeRunFilter` (or the pending one) by adding `filter.run_id` to the `job.list` RPC call, and renders the run-filter pill.
- Added `renderJobsRunPill()` and `clearJobsRunFilter()` — a Bootstrap dismissible badge `Run #N` with a `btn-close` that strips `&run=` from the URL hash and refreshes the list.
- Route restoration and the `hashchange` handler now read `&run=<id>` and clear it on a plain `#jobs`.
- `templates/index.html`: added a `<span id="jobs-run-pill">` host element in the Jobs toolbar.
UI — Service detail row UX (`crates/hero_proc_ui/static/js/dashboard.js` and `static/css/dashboard.css`)
- `<tr>` rows in the service detail jobs table now have `class="cursor-pointer"`, `aria-label="Open job #<id>"`, and an explicit "Click to open job #" tooltip.
- `onclick="navigateTo('jobs', j.id)"` is preserved (it deep-links to `#jobs/<id>` and opens the job detail panel).
- Added `.cursor-pointer { cursor: pointer; }` to `dashboard.css` (the utility was not present).

Tests
Added two unit tests in `crates/hero_proc_server/src/rpc/service.rs`:
- `service_status_ignores_jobs_from_previous_runs` — proves a `Failed` job from run #1 does not affect status when run #2 has a `Running` job. Asserts `current_run_id == Some(run #2)`, `state == "running"`, and `running_jobs.len() == 1`.
- `service_status_inactive_when_no_run` — proves an unstarted service reports `current_run_id == None`, `state == "inactive"`, no running jobs, and `count_restarts == 0`.

Both new tests pass.
`cargo test -p hero_proc_server --lib` is green (67 passed, 0 failed).

Workspace test results
`cargo test --workspace --no-fail-fast`: 408 passed, 1 failed, 53 ignored (462 total).

The single failure was `hero_proc_integration_tests::harness::tests::test_harness_starts_and_stops` — a server-startup-timeout flake that occurs only when the harness self-test runs concurrently with the rest of the workspace. Re-running `cargo test -p hero_proc_integration_tests --lib` in isolation passes cleanly (4/4). The failure is unrelated to the issue #57 changes and reproduces on the unchanged baseline under parallel load.

Acceptance criteria — status
- A service with no current run shows `-` and the status badge shows `inactive`.
- `service.status` RPC returns `state == "running"` when the current run is active, regardless of stale failures.
- Clicking the Jobs badge navigates to `#jobs?service=<name>&run=<id>` and the Jobs tab applies the `run_id` filter.
- Clicking a job row deep-links to `#jobs/<id>` and opens the Jobs detail panel.