Strings and Characters
======================
{{#include ../links.md}}
```admonish tip.side "Safety"
Always limit the [maximum length of strings].
```
Strings in Rhai contain any text sequence of valid Unicode characters.
Internally strings are stored in UTF-8 encoding.
Calling [`type_of()`] on a string returns `"string"`.
String and Character Literals
-----------------------------
String and character literals follow JavaScript-style syntax.
| Type | Quotes | Escapes? | Continuation? | Interpolation? |
| ------------------------- | :-------------: | :------: | :-----------: | :------------: |
| Normal string | `"..."` | yes | with `\` | **no** |
| Raw string | `#..#"..."#..#` | **no** | **no** | **no** |
| Multi-line literal string | `` `...` `` | **no** | **no** | with `${...}` |
| Character | `'...'` | yes | **no** | **no** |
```admonish tip.small "Tip: Building strings"
Strings can be built up from other strings and types via the `+` operator
(provided by the [`MoreStringPackage`][built-in packages] but excluded when using a [raw `Engine`]).
This is particularly useful when printing output.
```
Standard Escape Sequences
-------------------------
~~~admonish tip.side "Tip: Character `to_int()`"
Use the `to_int` method to convert a Unicode character into its 32-bit Unicode encoding.
~~~
There is built-in support for Unicode (`\u`_xxxx_ or `\U`_xxxxxxxx_) and hex (`\x`_xx_) escape
sequences for normal strings and characters.
Hex sequences map to ASCII characters, while `\u` maps to 16-bit common Unicode code points and `\U`
maps to the full, 32-bit extended Unicode code points.
Escape sequences are not supported for multi-line literal strings wrapped by back-ticks (`` ` ``).
| Escape sequence | Meaning |
| --------------- | -------------------------------- |
| `\\` | back-slash (`\`) |
| `\t` | tab |
| `\r` | carriage-return (`CR`) |
| `\n` | line-feed (`LF`) |
| `\"` or `""` | double-quote (`"`) |
| `\'` | single-quote (`'`) |
| `\x`_xx_ | ASCII character in 2-digit hex |
| `\u`_xxxx_ | Unicode character in 4-digit hex |
| `\U`_xxxxxxxx_ | Unicode character in 8-digit hex |
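To make the three hex forms concrete, the following is a minimal sketch in plain Rust (host-side code, not a Rhai script, and not Rhai's actual lexer) of what the digits after `\x`, `\u` and `\U` denote: a hexadecimal code point that is converted into a character. The helper name `decode_hex_escape` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: interpret the hex digits that follow `\x`, `\u` or `\U`
/// as a Unicode code point. Illustration only; not Rhai's lexer.
fn decode_hex_escape(digits: &str) -> Option<char> {
    // Parse the digits as a base-16 number, e.g. "58" -> 0x58.
    let code_point = u32::from_str_radix(digits, 16).ok()?;
    // `char::from_u32` returns `None` for values that are not valid Unicode scalar values.
    char::from_u32(code_point)
}
```

Under these assumptions, `decode_hex_escape("58")` yields `Some('X')` and `decode_hex_escape("2764")` yields the heart character `'\u{2764}'` (written in Rust's own escape syntax).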
Line Continuation
-----------------
For a normal string wrapped by double-quotes (`"`), a back-slash (`\`) character at the end of a
line indicates that the string continues onto the next line _without any line-break_.
Whitespace up to the indentation of the opening double-quote is ignored in order to enable lining up
blocks of text.
Spaces are _not_ added, so to separate one line from the next with a space, put a space before the
ending back-slash (`\`) character.
```rust
let x = "hello, world!\
        hello world again! \
        this is the ""last"" time!!!";
//^^^^^^ these whitespaces are ignored
// The above is the same as:
let x = "hello, world!hello world again! this is the \"last\" time!!!";
```
A string with continuation does not open up a new line. To do so, a new-line character must be
manually inserted at the appropriate position.
```rust
let x = "hello, world!\n\
        hello world again!\n\
        this is the last time!!!";
// The above is the same as:
let x = "hello, world!\nhello world again!\nthis is the last time!!!";
```
~~~admonish warning.small "No ending quote before the line ends is a syntax error"
If the ending double-quote is omitted, it is a syntax error.
```rust
let x = "hello
# ";
//            ^ syntax error: unterminated string literal
```
~~~
```admonish question.small "Why not go multi-line?"
Technically speaking, there is no difficulty in allowing strings to run for multiple lines
_without_ the continuation back-slash.
Rhai forces you to manually mark a continuation with a back-slash because the ending quote is easy to omit.
Once it happens, the entire remainder of the script would become one giant, multi-line string.
This behavior is different from Rust, where string literals can run for multiple lines.
```
Raw Strings
-----------
A _raw string_ is any text enclosed by a pair of double-quotes (`"`), wrapped by hash (`#`) characters.
The number of hash (`#`) characters on each side must be the same.
Any text inside the double-quotes, as long as it is not a double-quote (`"`) followed by the same
number of hash (`#`) characters, is simply copied verbatim, _including control codes and/or
line-breaks_.
Raw strings are very useful for embedding regular expressions, file paths, program code, etc.
```rust
let x = #"Hello, I am a raw string! which means that I can contain
line-breaks, \ slashes (not escapes), "quotes" and even # characters!"#;
// Use more than one '#' if you happen to have '"###...' inside the string...
let x = ###"In Rhai, you can write ##"hello"## as a raw string."###;
//                                         ^^^ this is not the end of the raw string
```
Indexing
--------
Strings can be _indexed_ to access any individual character.
This is similar to many modern languages but different from Rust.
### From beginning
Individual characters within a string can be accessed with zero-based, non-negative integer indices:
> _string_ `[` _index from 0 to (total number of characters − 1)_ `]`
### From end
A _negative_ index accesses a character in the string counting from the _end_, with −1 being the
_last_ character.
> _string_ `[` _index from −1 to −(total number of characters)_ `]`
```admonish warning.small "Character indexing can be SLOOOOOOOOW"
Internally, a Rhai string is still stored compactly as a Rust UTF-8 string in order to save memory.
Therefore, getting the character at a particular index involves walking through the UTF-8
encoded byte stream to extract individual Unicode characters, counting them on the way.
Because of this, indexing can be a _slow_ procedure, especially for long strings.
Along the same lines, getting the _length_ of a string (which returns the number of characters, not
bytes) can also be slow.
```
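The cost described in the warning above can be seen in plain Rust. The following is a minimal sketch (host-side Rust, not a Rhai script, and not Rhai's actual implementation) of character lookup on a UTF-8 string that accepts both non-negative and negative indices. The helper names `char_at` and `char_len` are hypothetical.

```rust
/// Hypothetical helper: fetch the character at a Rhai-style index
/// (0-based from the start, or negative counting back from the end).
/// Illustration only; not Rhai's actual implementation.
fn char_at(s: &str, index: i64) -> Option<char> {
    if index >= 0 {
        // Walks the UTF-8 bytes from the start, decoding one character at a time.
        s.chars().nth(index as usize)
    } else {
        // -1 is the last character, so walk backwards from the end instead.
        let from_end = index.unsigned_abs() as usize - 1;
        s.chars().rev().nth(from_end)
    }
}

/// Number of characters: this also has to decode the whole string.
fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

Both functions are linear in the length of the string because UTF-8 characters occupy between one and four bytes. In Rust, `"h\u{e9}llo".len()` is 6 (bytes), whereas `char_len("h\u{e9}llo")` is 5 (characters).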
Sub-Strings
-----------
Sub-strings, or _slices_ in some programming languages, are parts of strings.
In Rhai, a sub-string can be specified by indexing with a [range] of characters:
> _string_ `[` _first character (starting from zero)_ `..` _last character (exclusive)_ `]`
>
> _string_ `[` _first character (starting from zero)_ `..=` _last character (inclusive)_ `]`
Sub-string [ranges] always start from zero counting towards the end of the string.
Negative [ranges] are not supported.
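The range semantics above count characters, not bytes. As a minimal sketch in plain Rust (host-side code, not a Rhai script, and not Rhai's actual implementation), an exclusive character range can be modelled as follows. The helper name `substring` is hypothetical, and clamping an out-of-bounds end to the end of the string is a choice made for this sketch only.

```rust
/// Hypothetical helper: take the characters in `start..end` (end exclusive),
/// counting characters rather than bytes. Illustration only.
fn substring(s: &str, start: usize, end: usize) -> String {
    s.chars()
        .skip(start) // drop the first `start` characters
        .take(end.saturating_sub(start)) // keep `end - start` characters
        .collect()
}
```

For example, `substring("hello, world!", 7, 12)` returns `"world"`. An inclusive range `a..=b` selects the same characters as the exclusive range `a..(b + 1)`.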
Examples
--------
```js
let name = "Bob";
let middle_initial = 'C';
let last = "Davis";
let full_name = `${name} ${middle_initial}. ${last}`;
full_name == "Bob C. Davis";
// String building with different types
let age = 42;
let record = `${full_name}: age ${age}`;
record == "Bob C. Davis: age 42";
// Unlike Rust, Rhai strings can be indexed to get a character
// (disabled with 'no_index')
let c = record[4];
c == 'C'; // single character
let slice = record[4..8]; // sub-string slice
slice == "C. D";
ts.s = record; // custom type properties can take strings
let c = ts.s[4];
c == 'C';
let c = ts.s[-4]; // negative index counts from the end
c == 'e';
let c = "foo"[0]; // indexing also works on string literals...
c == 'f';
let c = ("foo" + "bar")[5]; // ... and expressions returning strings
c == 'r';
let text = "hello, world!";
text[0] = 'H'; // modify a single character
text == "Hello, world!";
text[7..=11] = "Earth"; // modify a sub-string slice
text == "Hello, Earth!";
// Escape sequences in strings
record += " \u2764\n"; // escape sequence of '❤' in Unicode
record == "Bob C. Davis: age 42 ❤\n"; // '\n' = new-line
// Unlike Rust, Rhai strings can be directly modified character-by-character
// (disabled with 'no_index')
record[4] = '\x58'; // 0x58 = 'X'
record == "Bob X. Davis: age 42 ❤\n";
// Use 'in' to test if a substring (or character) exists in a string
"Davis" in record == true;
'X' in record == true;
'C' in record == false;
// Strings can be iterated with a 'for' statement, yielding characters
for ch in record {
    print(ch);
}
```