Generator: OSchema marker parser uses substring matching, miscategorizes adjacent markers #67

Closed
opened 2026-05-19 15:30:37 +00:00 by timur · 1 comment
Owner

Context

Discovered by the #261 agent while bootstrapping the hero_service template repo.

The OSchema marker parser identifies section markers via substring matching, which means markers that share a prefix can be matched against the wrong section (e.g., an enum block being read as a fragment of a longer marker, or two adjacent block markers colliding when one is a prefix of the other).

What to do

  • Locate the marker-parsing logic in crates/oschema/ (or wherever section markers are tokenised).
  • Replace substring contains / starts_with checks with whole-line / delimiter-aware matching.
  • Add regression tests covering: adjacent marker collision, prefix-collision (e.g. @root vs @rootobject), markers inside string literals (must not match).
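
To illustrate the prefix-collision item above, here is a minimal sketch of token-boundary matching. The helper names `is_ident_char` and `starts_with_marker` are hypothetical and not taken from the repository, and the assumption that a marker name ends at the first non-identifier character is mine; the actual OSchema tokeniser may draw the boundary differently.

```rust
/// Characters assumed to be allowed inside a marker name (an assumption
/// for this sketch, not the documented OSchema grammar).
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Returns true when `line` (ignoring leading whitespace) begins with
/// `marker` as a whole token: the marker must be followed by the end of
/// the line or by a character that cannot continue a marker name.
fn starts_with_marker(line: &str, marker: &str) -> bool {
    match line.trim_start().strip_prefix(marker) {
        // The prefix matched; reject it if the name keeps going.
        Some(rest) => rest.chars().next().map_or(true, |c| !is_ident_char(c)),
        None => false,
    }
}
```

With a plain `starts_with`, the line `@rootobject Foo` matches `@root`; with the boundary check it only matches `@rootobject`.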

Acceptance

  • Existing schemas in example/, hero_compute/, hero_osis/, hero_service all parse identically post-fix.
  • New regression tests cover the failure modes the #261 agent hit.

Related

  • hero_skills#261 (https://forge.ourworld.tf/lhumina_code/hero_skills/issues/261): source of the discovery; the agent's report is at https://forge.ourworld.tf/lhumina_code/hero_skills/issues/261#issuecomment-34264.
  • Parent META: hero_skills#262 (https://forge.ourworld.tf/lhumina_code/hero_skills/issues/262).
Author
Owner

PR #78 opened against development.

Root cause: c.text.contains("[rootobject]") (in oschema/parser.rs and generator/schemas/oschema.rs) and source.find("# ===SCHEMA===") (in oschema/parser.rs) are raw substring lookups. Adjacent markers, prefix-colliding names, markers mentioned inside string literals, and any longer comment line that happens to contain the header delimiter could all be miscategorised.
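
The two lookups named above can be reproduced in isolation. The functions below are stand-ins written for this illustration (the names `naive_has_rootobject` and `naive_header_end` are hypothetical), and the sample comment lines are made up rather than copied from a real schema; only the `contains` / `find` calls mirror what the comment describes.

```rust
/// Stand-in for the `[rootobject]` check: a raw substring test on the
/// comment text, with no notion of quoting or token boundaries.
fn naive_has_rootobject(comment: &str) -> bool {
    comment.contains("[rootobject]")
}

/// Stand-in for the header split: byte offset of the first place the
/// delimiter text occurs anywhere in the source, even mid-line.
fn naive_header_end(source: &str) -> Option<usize> {
    source.find("# ===SCHEMA===")
}
```

A comment that merely mentions the marker inside a quoted string, such as `# doc = "add [rootobject] to mark a root"`, is reported as a root object, and a header line like `# note: the # ===SCHEMA=== line ends the header` makes the split land in the middle of that note instead of at the real delimiter line.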

Fix: new crates/oschema/src/markers.rs with a delimiter- and string-literal-aware parse_markers / has_marker / strip_markers / is_header_delimiter_line. The oschema parser uses it for [rootobject] detection and a new split_header helper that requires the # ===SCHEMA=== delimiter to be a whole comment line. The generator (schemas/oschema.rs, generator.rs, domain.rs) is routed through the same helpers so the two crates cannot diverge.
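
As a rough sketch of what delimiter- and string-literal-aware matching could look like: the function names below are the ones listed in the comment, but their signatures, the double-quote and backslash-escape rules, and the lowercase normalisation are assumptions made for this example. The real `markers.rs` may differ in all of these.

```rust
/// Collect bracketed markers such as `[rootobject]` from one line of
/// comment text, ignoring anything inside double-quoted string literals.
/// Assumed signature; names are returned lowercased.
fn parse_markers(text: &str) -> Vec<String> {
    let mut markers = Vec::new();
    let mut in_string = false; // inside a "..." literal
    let mut escaped = false;   // previous char was a backslash in a literal
    let mut in_marker = false; // between '[' and ']'
    let mut buf = String::new();

    for ch in text.chars() {
        if in_string {
            // Inside a literal only escapes and the closing quote matter.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
        } else if in_marker {
            if ch == ']' {
                // Closing delimiter reached: emit the complete name.
                let name = buf.trim().to_ascii_lowercase();
                if !name.is_empty() {
                    markers.push(name);
                }
                buf.clear();
                in_marker = false;
            } else {
                buf.push(ch);
            }
        } else if ch == '"' {
            in_string = true;
        } else if ch == '[' {
            in_marker = true;
        }
    }
    markers
}

/// True when `text` carries the marker `name` as a complete bracketed token.
fn has_marker(text: &str, name: &str) -> bool {
    let wanted = name.to_ascii_lowercase();
    parse_markers(text).iter().any(|m| m == &wanted)
}

/// True when the whole line, ignoring surrounding whitespace, is the delimiter.
fn is_header_delimiter_line(line: &str) -> bool {
    line.trim() == "# ===SCHEMA==="
}
```

Because names are compared only after the closing `]` is seen, `[rootobjects]` no longer satisfies a lookup for `rootobject`, and two adjacent markers on one line are returned as two separate entries.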

Regression coverage

  • 11 unit tests in markers.rs — adjacent markers, prefix-collision, string-literal escapes, case-insensitivity, whole-line delimiter.
  • 4 parser-level tests in parser.rs exercising the failure modes hit by the #261 agent: marker_regression_adjacent_markers, marker_regression_prefix_collision, marker_regression_string_literal_mention_does_not_match, marker_regression_header_delimiter_must_be_whole_line.
  • 7 integration tests in oschema/tests/real_schemas.rs parsing every .oschema shipped in the repo (basic OSIS example, recipe_server schemas/ + src/recipes/ + src/recipes/core/, example/recipe_server) and asserting the set of root objects is unchanged.
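
The behaviour that `marker_regression_header_delimiter_must_be_whole_line` pins down can be pictured with a small whole-line splitter. This is a sketch only: `split_header` is the helper name mentioned in the fix description, but its signature and return type here are assumptions, and the sample input is invented.

```rust
/// Split `source` into (header, body) at the first line that consists of
/// exactly the `# ===SCHEMA===` delimiter (surrounding whitespace ignored).
/// Returns None when no such line exists. Assumed signature.
fn split_header(source: &str) -> Option<(&str, &str)> {
    let mut offset = 0; // byte offset of the current line's start
    for line in source.split_inclusive('\n') {
        if line.trim() == "# ===SCHEMA===" {
            let header = &source[..offset];
            let body = &source[offset + line.len()..];
            return Some((header, body));
        }
        offset += line.len();
    }
    None
}
```

A header comment that only mentions the delimiter text mid-line stays part of the header, and a source with no standalone delimiter line yields `None` rather than a split at an arbitrary offset.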

Acceptance: 79 oschema lib + 7 integration tests pass; 125 generator tests pass; all real schemas parse identically (same root-object sets).

timur closed this issue 2026-05-20 05:55:04 +00:00
Reference
lhumina_code/hero_blueprint#67