[ci] Fix integration tests so tests.yaml passes on every push #12

Open
opened 2026-04-26 00:54:58 +00:00 by mik-tf · 0 comments
Owner

Symptom

.forgejo/workflows/tests.yaml has been failing on every push to development for many commits (run #36 latest, several before that). This makes the repo's CI permanently red even when lint and the actual library code are fine.

Root cause (as best understood — needs CI log inspection)

make test runs bash ci_rhai.sh --category tests --verbose which executes the Rhai integration test scripts under crates/*/tests/rhai/. One of them fails locally too on a stock dev box:

Total scripts: 24
✅ Passed: 8
❌ Failed: 1
⚪ Skipped: 15

❌ Failed scripts:
  - crates/virt_rhai/tests/rhai/podman/01_container_operations.rhai [1.9sec]

The 15 skipped scripts are auto-skipped because their backing services (postgres, hero_proc, etc.) aren't running. The one failing script needs a working podman/container runtime.

The CI workflow installs a heavy stack (btrfs-progs, containerd, nerdctl, buildkit, cloud-hypervisor, iperf3, redis, postgresql) to support these tests, but the runner image apparently still can't get the podman test green.
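Before digging into the run log, a quick probe of the runner can narrow things down. The following is only a sketch of such a probe, not something that exists in the repo: the function name `probe_podman` and the key=value output format are made up for illustration, and it checks only three things (is a `podman` binary on PATH, does `podman info` succeed, what is the user-namespace limit).

```shell
#!/usr/bin/env bash
# Sketch: report what the runner offers for the podman test.
# Prints one key=value line per check and always returns 0, so it is safe
# to add as an informational CI step.

probe_podman() {
  if command -v podman >/dev/null 2>&1; then
    echo "podman_binary=$(command -v podman)"
    # `podman info` exits non-zero when the runtime cannot initialise,
    # for example because of a broken storage configuration.
    if podman info >/dev/null 2>&1; then
      echo "podman_info=ok"
      # Storage driver in use (overlay, vfs, btrfs, ...).
      echo "storage_driver=$(podman info --format '{{.Store.GraphDriverName}}' 2>/dev/null)"
    else
      echo "podman_info=failed"
    fi
  else
    echo "podman_binary=missing"
  fi

  # Kernel limit on user namespaces; 0 means rootless containers cannot start.
  if [ -r /proc/sys/user/max_user_namespaces ]; then
    echo "max_user_namespaces=$(cat /proc/sys/user/max_user_namespaces)"
  else
    echo "max_user_namespaces=unknown"
  fi
  return 0
}

probe_podman
```

Running this as a step right before `make test` would show in the log whether the failure is a missing binary, a runtime that cannot start, or something later in the test itself.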

What needs to happen

  • Pull the actual log of run #36 (or trigger a fresh one) to see what step in tests.yaml is failing — is it apt-install, the storage config, containerd startup, or the actual make test step?
  • If make test reaches the podman test and fails: investigate what the runner's container setup needs (storage driver? userns? specific kernel cap?)
  • If apt-install / setup fails earlier: which dep, why?
  • Either fix the test infra so the runner consistently provides a working podman setup, OR rework 01_container_operations.rhai to mock the container runtime, OR add an explicit PODMAN=available env-gated skip with a comment.
  • Audit the 15 skipped tests: confirm they are skipping for the right reason (backing service unavailable) rather than masking a bug.
  • Verify the green CI then sticks across multiple pushes (no flakiness).
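If the env-gated skip option is chosen, the gate itself can be very small. The sketch below assumes the semantics "run the podman scripts only when `PODMAN=available` is set, otherwise skip and say why". The function name `podman_gate` is hypothetical, and how it would be wired into `ci_rhai.sh` or the Rhai script is not known from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical gate for podman-backed test scripts.
# Prints "run" only when the environment explicitly says PODMAN=available;
# otherwise prints a skip reason so the skip is visible in the CI log.

podman_gate() {
  if [ "${PODMAN:-}" = "available" ]; then
    echo "run"
  else
    echo "skip: PODMAN is not set to 'available'"
  fi
}

decision="$(podman_gate)"
echo "podman tests: ${decision}"
```

The point of making the gate explicit is that a skip becomes a documented decision in the workflow file (the runner sets `PODMAN=available` only after its container setup succeeded) rather than a silent pass.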

Acceptance criteria

  • Fresh push to development → tests.yaml run completes with status success
  • Fresh PR targeting development → tests.yaml run completes with status success
  • Number of passing tests reported matches the number of tests that have working backing infrastructure (skipped count documented if not zero)
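The last criterion can be checked mechanically from the test summary. This is a rough sketch that assumes the summary lines keep the exact shape quoted earlier in this issue ("Total scripts: N", "Passed: N", "Failed: N", "Skipped: N"); the helper names `summary_field` and `check_summary` are invented here.

```shell
#!/usr/bin/env bash
# Sketch: pull the counters out of a test summary and check that they add up.
# Assumes the summary lines look like the ones quoted in this issue.

summary_field() {
  # $1 = label (e.g. "Passed"), stdin = summary text.
  # Prints the number after "<label>:" on the first matching line.
  awk -F': *' -v label="$1" 'index($1, label) && $2 ~ /^[0-9]+/ { print $2 + 0; exit }'
}

check_summary() {
  local text total passed failed skipped
  text="$(cat)"
  total="$(printf '%s\n' "$text" | summary_field 'Total scripts')"
  passed="$(printf '%s\n' "$text" | summary_field 'Passed')"
  failed="$(printf '%s\n' "$text" | summary_field 'Failed')"
  skipped="$(printf '%s\n' "$text" | summary_field 'Skipped')"
  # Refuse to judge a summary with a missing counter.
  if [ -z "$total" ] || [ -z "$passed" ] || [ -z "$failed" ] || [ -z "$skipped" ]; then
    echo "summary incomplete"
    return 2
  fi
  echo "total=${total} passed=${passed} failed=${failed} skipped=${skipped}"
  # Green only when nothing failed and every script is accounted for.
  [ "$failed" -eq 0 ] && [ $((passed + failed + skipped)) -eq "$total" ]
}

# Demo with the numbers quoted in this issue (expected to report "not green").
if printf 'Total scripts: 24\nPassed: 8\nFailed: 1\nSkipped: 15\n' | check_summary; then
  echo "summary is green"
else
  echo "summary is not green"
fi
```

The printed `skipped=` value is the number that the criterion asks to have documented when it is not zero.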

Why this issue exists

During the home#188 CI sweep, an earlier draft of PR #11 restricted tests.yaml triggers to workflow_dispatch + tags only — bypassing the failure to make dev push CI "green". That was rolled back as not honest enough: a green badge that doesn't run the tests on every push misrepresents what's actually being checked.

The repo's overall CI on development will stay red on the tests.yaml workflow until this is fixed. Lint is green; tests are TODO. This issue is the entry point for whoever picks up the integration-test-infrastructure work.

Related

  • Parent CI sweep tracker: https://forge.ourworld.tf/lhumina_code/home/issues/188
  • Cross-repo tracker: https://forge.ourworld.tf/lhumina_code/home/issues/189
  • The lint side is being addressed in https://forge.ourworld.tf/lhumina_code/hero_lib_rhai/pulls/11 (clippy + fmt fixes that keep lint.yaml green)

Signed-off-by: mik-tf
Reference
lhumina_code/hero_lib_rhai#12