reorganize module

This commit is contained in:
Timur Gordon
2025-04-04 08:28:07 +02:00
parent 1ea37e2e7f
commit 939b6b4e57
375 changed files with 7580 additions and 191 deletions

View File

@@ -0,0 +1,348 @@
Strings and Characters
======================
String in Rhai contain any text sequence of valid Unicode characters.
[`type_of()`](type-of.md) a string returns `"string"`.
String and Character Literals
-----------------------------
String and character literals follow JavaScript-style syntax.
| Type | Quotes | Escapes? | Continuation? | Interpolation? |
| ------------------------- | :-------------: | :------: | :-----------: | :------------: |
| Normal string | `"..."` | yes | with `\` | **no** |
| Raw string | `#..#"..."#..#` | **no** | **no** | **no** |
| Multi-line literal string | `` `...` `` | **no** | **no** | with `${...}` |
| Character | `'...'` | yes | **no** | **no** |
```admonish tip.small "Tip: Building strings"
Strings can be built up from other strings and types via the `+` or `+=` operators.
```
Standard Escape Sequences
-------------------------
~~~admonish tip.side "Tip: Character `to_int()`"
Use the `to_int` method to convert a Unicode character into its 32-bit Unicode encoding.
~~~
There is built-in support for Unicode (`\u`_xxxx_ or `\U`_xxxxxxxx_) and hex (`\x`_xx_) escape
sequences for normal strings and characters.
Hex sequences map to ASCII characters, while `\u` maps to 16-bit common Unicode code points and `\U`
maps the full, 32-bit extended Unicode code points.
Escape sequences are not supported for multi-line literal strings wrapped by back-ticks (`` ` ``).
| Escape sequence | Meaning |
| --------------- | -------------------------------- |
| `\\` | back-slash (`\`) |
| `\t` | tab |
| `\r` | carriage-return (`CR`) |
| `\n` | line-feed (`LF`) |
| `\"` or `""` | double-quote (`"`) |
| `\'` | single-quote (`'`) |
| `\x`_xx_ | ASCII character in 2-digit hex |
| `\u`_xxxx_ | Unicode character in 4-digit hex |
| `\U`_xxxxxxxx_ | Unicode character in 8-digit hex |
Line Continuation
-----------------
For a normal string wrapped by double-quotes (`"`), a back-slash (`\`) character at the end of a
line indicates that the string continues onto the next line _without any line-break_.
Whitespace up to the indentation of the opening double-quote is ignored in order to enable lining up
blocks of text.
Spaces are _not_ added, so to separate one line with the next with a space, put a space before the
ending back-slash (`\`) character.
```rust
let x = "hello, world!\
hello world again! \
this is the ""last"" time!!!";
// ^^^^^^ these whitespaces are ignored
// The above is the same as:
let x = "hello, world!hello world again! this is the \"last\" time!!!";
```
A string with continuation does not open up a new line. To do so, a new-line character must be
manually inserted at the appropriate position.
```rust
let x = "hello, world!\n\
hello world again!\n\
this is the last time!!!";
// The above is the same as:
let x = "hello, world!\nhello world again!\nthis is the last time!!!";
```
~~~admonish warning.small "No ending quote before the line ends is a syntax error"
If the ending double-quote is omitted, it is a syntax error.
```rust
let x = "hello
# ";
// ^ syntax error: unterminated string literal
```
~~~
```admonish question.small "Why not go multi-line?"
Technically speaking, there is no difficulty in allowing strings to run for multiple lines
_without_ the continuation back-slash.
Rhai forces you to manually mark a continuation with a back-slash because the ending quote is easy to omit.
Once it happens, the entire remainder of the script would become one giant, multi-line string.
This behavior is different from Rust, where string literals can run for multiple lines.
```
Raw Strings
-----------
A _raw string_ is any text enclosed by a pair of double-quotes (`"`), wrapped by hash (`#`) characters.
The number of hash (`#`) on each side must be the same.
Any text inside the double-quotes, as long as it is not a double-quote (`"`) followed by the same
number of hash (`#`) characters, is simply copied verbatim, _including control codes and/or
line-breaks_.
Raw strings are very useful for embedded regular expressions, file paths, and program code etc.
```rust
let x = #"Hello, I am a raw string! which means that I can contain
line-breaks, \ slashes (not escapes), "quotes" and even # characters!"#
// Use more than one '#' if you happen to have '"###...' inside the string...
let x = ###"In Rhai, you can write ##"hello"## as a raw string."###;
// ^^^ this is not the end of the raw string
```
Multi-Line Literal Strings
--------------------------
A string wrapped by a pair of back-tick (`` ` ``) characters is interpreted _literally_,
meaning that every single character that lies between the two back-ticks is taken verbatim.
This include new-lines, whitespaces, escape characters etc.
```js
let x = `hello, world! "\t\x42"
hello world again! 'x'
this is the last time!!! `;
// The above is the same as:
let x = "hello, world! \"\\t\\x42\"\n hello world again! 'x'\n this is the last time!!! ";
```
If a back-tick (`` ` ``) appears at the _end_ of a line, then it is understood that the entire text
block starts from the _next_ line; the starting new-line character is stripped.
```js
let x = `
hello, world! "\t\x42"
hello world again! 'x'
this is the last time!!!
`;
// The above is the same as:
let x = " hello, world! \"\\t\\x42\"\n hello world again! 'x'\n this is the last time!!!\n";
```
To actually put a back-tick (`` ` ``) character inside a multi-line literal string, use two
back-ticks together (i.e. ` `` `).
```js
let x = `I have a quote " as well as a back-tick `` here.`;
// The above is the same as:
let x = "I have a quote \" as well as a back-tick ` here.";
```
String Interpolation
--------------------
~~~admonish question.side.wide "What if I want `${` inside?"
🤦 Well, you just _have_ to ask for the impossible, don't you?
Currently there is no way to escape `${`. Build the string in three pieces:
```js
`Interpolations start with "`
+ "${"
+ `" and end with }.`
```
~~~
Multi-line literal strings support _string interpolation_ wrapped in `${` ... `}`.
Interpolation is not supported for normal string or character literals.
`${` ... `}` acts as a statements _block_ and can contain anything that is allowed within a
statements block, including another interpolated string!
The last result of the block is taken as the value for interpolation.
Rhai uses [`to_string`](convert.md) to convert any value into a string, then physically joins all
the sub-strings together.
For convenience, if any interpolated value is a [BLOB](blobs.md), however, it is automatically treated as a
UTF-8 encoded string. That is because it is rarely useful to interpolate a [BLOB](blobs.md) into a string,
but extremely useful to be able to directly manipulate UTF-8 encoded text.
```js
let x = 42;
let y = 123;
let s = `x = ${x} and y = ${y}.`; // <- interpolated string
let s = ("x = " + {x} + " and y = " + {y} + "."); // <- de-sugars to this
s == "x = 42 and y = 123.";
let s = `
Undeniable logic:
1) Hello, ${let w = `${x} world`; if x > 1 { w += "s" } w}!
2) If ${y} > ${x} then it is ${y > x}!
`;
s == "Undeniable logic:\n1) Hello, 42 worlds!\n2) If 123 > 42 then it is true!\n";
let blob = blob(3, 0x21);
print(blob); // prints [212121]
print(`Data: ${blob}`); // prints "Data: !!!"
// BLOB is treated as UTF-8 encoded string
print(`Data: ${blob.to_string()}`); // prints "Data: [212121]"
```
Indexing
--------
Strings can be _indexed_ into to get access to any individual character.
This is similar to many modern languages but different from Rust.
### From beginning
Individual characters within a string can be accessed with zero-based, non-negative integer indices:
> _string_ `[` _index from 0 to (total number of characters 1)_ `]`
### From end
A _negative_ index accesses a character in the string counting from the _end_, with 1 being the
_last_ character.
> _string_ `[` _index from 1 to (total number of characters)_ `]`
```admonish warning.small "Character indexing can be SLOOOOOOOOW"
Internally, a Rhai string is still stored compactly as a Rust UTF-8 string in order to save memory.
Therefore, getting the character at a particular index involves walking through the entire UTF-8
encoded bytes stream to extract individual Unicode characters, counting them on the way.
Because of this, indexing can be a _slow_ procedure, especially for long strings.
Along the same lines, getting the _length_ of a string (which returns the number of characters, not
bytes) can also be slow.
```
Sub-Strings
-----------
Sub-strings, or _slices_ in some programming languages, are parts of strings.
In Rhai, a sub-string can be specified by indexing with a [range](ranges.md) of characters:
> _string_ `[` _first character (starting from zero)_ `..` _last character (exclusive)_ `]`
>
> _string_ `[` _first character (starting from zero)_ `..=` _last character (inclusive)_ `]`
Sub-string [ranges](ranges.md) always start from zero counting towards the end of the string.
Negative [ranges](ranges.md) are not supported.
Examples
--------
```js
let name = "Bob";
let middle_initial = 'C';
let last = "Davis";
let full_name = `${name} ${middle_initial}. ${last}`;
full_name == "Bob C. Davis";
// String building with different types
let age = 42;
let record = `${full_name}: age ${age}`;
record == "Bob C. Davis: age 42";
// Unlike Rust, Rhai strings can be indexed to get a character
// (disabled with 'no_index')
let c = record[4];
c == 'C'; // single character
let slice = record[4..8]; // sub-string slice
slice == " C. D";
ts.s = record; // custom type properties can take strings
let c = ts.s[4];
c == 'C';
let c = ts.s[-4]; // negative index counts from the end
c == 'e';
let c = "foo"[0]; // indexing also works on string literals...
c == 'f';
let c = ("foo" + "bar")[5]; // ... and expressions returning strings
c == 'r';
let text = "hello, world!";
text[0] = 'H'; // modify a single character
text == "Hello, world!";
text[7..=11] = "Earth"; // modify a sub-string slice
text == "Hello, Earth!";
// Escape sequences in strings
record += " \u2764\n"; // escape sequence of '❤' in Unicode
record == "Bob C. Davis: age 42 ❤\n"; // '\n' = new-line
// Unlike Rust, Rhai strings can be directly modified character-by-character
// (disabled with 'no_index')
record[4] = '\x58'; // 0x58 = 'X'
record == "Bob X. Davis: age 42 ❤\n";
// Use 'in' to test if a substring (or character) exists in a string
"Davis" in record == true;
'X' in record == true;
'C' in record == false;
// Strings can be iterated with a 'for' statement, yielding characters
for ch in record {
print(ch);
}
```