Really Advanced – Custom Parsers ====================================== {{#include ../links.md}} Sometimes it is desirable to have multiple [custom syntax] starting with the same symbol. This is especially common for _command-style_ syntax where the second symbol calls a particular command: ```rust // The following simulates a command-style syntax, all starting with 'perform'. perform hello world; // A fixed sequence of symbols perform action 42; // Perform a system action with a parameter perform update system; // Update the system perform check all; // Check all system settings perform cleanup; // Clean up the system perform add something; // Add something to the system perform remove something; // Delete something from the system ``` Alternatively, a [custom syntax] may have variable length, with a termination symbol: ```rust // The following is a variable-length list terminated by '>' tags < "foo", "bar", 123, ... , x+y, true > ``` For even more flexibility in order to handle these advanced use cases, there is a _low level_ API for [custom syntax] that allows the registration of an entire mini-parser. Use `Engine::register_custom_syntax_with_state_raw` to register a [custom syntax] _parser_ together with an implementation function, both of which accept a custom user-defined _state_ value. How Custom Parsers Work ----------------------- ### Leading Symbol Under this API, the leading symbol for a custom parser is no longer restricted to be valid identifiers. It can either be: * an identifier that isn't a normal [keyword] unless [disabled][disable keywords and operators], or * a valid symbol (see [list]({{rootUrl}}/appendix/operators.md)) which is not a normal [operator] unless [disabled][disable keywords and operators]. ### Parser Function Signature The [custom syntax] parser has the following signature. > ```rust > Fn(symbols: &[ImmutableString], look_ahead: &str, state: &mut Dynamic) -> Result, ParseError> > ``` where: | Parameter | Type | Description | | ------------ | :---------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `symbols` | [`&[ImmutableString]`][`ImmutableString`] | a slice of symbols that have been parsed so far, possibly containing `$expr$` and/or `$block$`; `$ident$` and other literal markers are replaced by the actual text | | `look_ahead` | `&str` | a string slice containing the next symbol that is about to be read | | `state` | [`&mut Dynamic`][`Dynamic`] | mutable reference to a user-defined _state_ | Most strings are [`ImmutableString`]'s so it is usually more efficient to just `clone` the appropriate one (if any matches, or keep an internal cache for commonly-used symbols) as the return value. ### Parameter #1 – Symbols Parsed So Far The symbols parsed so far are provided as a slice of [`ImmutableString`]s. The custom parser can inspect this symbols stream to determine the next symbol to parse. | Argument type | Value | | :-----------: | ----------------- | | text [string] | text value | | `$ident$` | identifier name | | `$symbol$` | symbol literal | | `$expr$` | `$expr$` | | `$block$` | `$block$` | | `$func$` | `$func$` | | `$bool$` | `true` or `false` | | `$int$` | value of number | | `$float$` | value of number | | `$string$` | [string] text | ### Parameter #2 – Look-Ahead Symbol The _look-ahead_ symbol is the symbol that will be parsed _next_. If the look-ahead is an expected symbol, the customer parser just returns it to continue parsing, or it can return `$ident$` to parse it as an identifier, or even `$expr$` to start parsing an expression. ```admonish tip.side.wide "Tip: Strings vs identifiers" The look-ahead of an identifier (e.g. [variable] name) is its text name. That of a [string] literal is its content wrapped in _quotes_ (`"`), e.g. `"this is a string"`. ``` If the look-ahead is `{`, then the custom parser may also return `$block$` to start parsing a statements block. If the look-ahead is unexpected, the custom parser should then return the symbol expected and Rhai will fail with a parse error containing information about the expected symbol. ### Parameter #3 – User-Defined Custom _State_ The _state's_ value starts off as [`()`]. Its type is [`Dynamic`], possible to hold any value. Usually it is set to an [object map] that contains information on the state of parsing. ### Return value The return value is `Result, ParseError>` where: | Value | Description | | :----------------: | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `Ok(None)` | parsing is complete and there is no more symbol to match | | `Ok(Some(symbol))` | the next `symbol` to match, which can also be `$expr$`, `$ident$`, `$block$` etc. | | `Err(error)` | `error` that is reflected back to the [`Engine`] – normally `ParseError( ParseErrorType::BadInput( LexError::ImproperSymbol(message) ), Position::NONE)` to indicate that there is a syntax error, but it can be any `ParseError`. | A custom parser always returns `Some` with the _next_ symbol expected (which can be `$ident$`, `$expr$`, `$block$` etc.) or `None` if parsing should terminate (_without_ reading the look-ahead symbol). #### The `$$` return symbol short-cut A return symbol starting with `$$` is treated specially. Like `None`, it also terminates parsing, but at the same time it adds this symbol as text into the _inputs_ stream at the end. This is typically used to inform the implementation function which [custom syntax] variant was actually parsed. ```rust fn implementation_fn(context: &mut EvalContext, inputs: &[Expression], state: &Dynamic) -> Result> { // Get the last symbol let key = inputs.last().unwrap().get_string_value().unwrap(); // Make sure it starts with '$$' assert!(key.starts_with("$$")); // Execute the custom syntax expression match key { "$$hello" => { ... } "$$world" => { ... } "$$foo" => { ... } "$$bar" => { ... } _ => Err(...) } } ``` `$$` is a convenient _short-cut_. An alternative method is to pass such information in the user-defined custom _state_. ### Implementation Function Signature The signature of an implementation function for `Engine::register_custom_syntax_with_state_raw` is as follows, which is slightly different from the function for `Engine::register_custom_syntax`. > ```rust > Fn(context: &mut EvalContext, inputs: &[Expression], state: &Dynamic) -> Result> > ``` where: | Parameter | Type | Description | | --------- | :---------------------------------: | ----------------------------------------------------- | | `context` | [`&mut EvalContext`][`EvalContext`] | mutable reference to the current _evaluation context_ | | `inputs` | `&[Expression]` | a list of input expression trees | | `state` | [`&Dynamic`][`Dynamic`] | reference to the user-defined state | Custom Parser Example --------------------- ```rust engine.register_custom_syntax_with_state_raw( // The leading symbol - which needs not be an identifier. "perform", // The custom parser implementation - always returns the next symbol expected // 'look_ahead' is the next symbol about to be read // // Return symbols starting with '$$' also terminate parsing but allows us // to determine which syntax variant was actually parsed so we can perform the // appropriate action. This is a convenient short-cut to keeping the value // inside the state. // // The return type is 'Option' to allow common text strings // to be interned and shared easily, reducing allocations during parsing. |symbols, look_ahead, state| match symbols.len() { // perform ... 1 => Ok(Some("$ident$".into())), // perform command ... 2 => match symbols[1].as_str() { "action" => Ok(Some("$expr$".into())), "hello" => Ok(Some("world".into())), "update" | "check" | "add" | "remove" => Ok(Some("$ident$".into())), "cleanup" => Ok(Some("$$cleanup".into())), cmd => Err(LexError::ImproperSymbol(format!("Improper command: {cmd}")) .into_err(Position::NONE)), }, // perform command arg ... 3 => match (symbols[1].as_str(), symbols[2].as_str()) { ("action", _) => Ok(Some("$$action".into())), ("hello", "world") => Ok(Some("$$hello-world".into())), ("update", arg) => match arg { "system" => Ok(Some("$$update-system".into())), "client" => Ok(Some("$$update-client".into())), _ => Err(LexError::ImproperSymbol(format!("Cannot update {arg}")) .into_err(Position::NONE)) }, ("check", arg) => Ok(Some("$$check".into())), ("add", arg) => Ok(Some("$$add".into())), ("remove", arg) => Ok(Some("$$remove".into())), (cmd, arg) => Err(LexError::ImproperSymbol( format!("Invalid argument for command {cmd}: {arg}") ).into_err(Position::NONE)), }, _ => unreachable!(), }, // No variables declared/removed by this custom syntax false, // Implementation function |context, inputs, state| { let cmd = inputs.last().unwrap().get_string_value().unwrap(); match cmd { "$$cleanup" => { ... } "$$action" => { ... } "$$update-system" => { ... } "$$update-client" => { ... } "$$check" => { ... } "$$add" => { ... } "$$remove" => { ... } _ => Err(format!("Invalid command: {cmd}")) } } ); ```