260 lines
8.4 KiB
Markdown
260 lines
8.4 KiB
Markdown
Strings and Characters
|
||
======================
|
||
|
||
{{#include ../links.md}}
|
||
|
||
```admonish tip.side "Safety"
|
||
|
||
Always limit the [maximum length of strings].
|
||
```
|
||
|
||
String in Rhai contain any text sequence of valid Unicode characters.
|
||
|
||
Internally strings are stored in UTF-8 encoding.
|
||
|
||
[`type_of()`] a string returns `"string"`.
|
||
|
||
|
||
String and Character Literals
|
||
-----------------------------
|
||
|
||
String and character literals follow JavaScript-style syntax.
|
||
|
||
| Type | Quotes | Escapes? | Continuation? | Interpolation? |
|
||
| ------------------------- | :-------------: | :------: | :-----------: | :------------: |
|
||
| Normal string | `"..."` | yes | with `\` | **no** |
|
||
| Raw string | `#..#"..."#..#` | **no** | **no** | **no** |
|
||
| Multi-line literal string | `` `...` `` | **no** | **no** | with `${...}` |
|
||
| Character | `'...'` | yes | **no** | **no** |
|
||
|
||
```admonish tip.small "Tip: Building strings"
|
||
|
||
Strings can be built up from other strings and types via the `+` operator
|
||
(provided by the [`MoreStringPackage`][built-in packages] but excluded when using a [raw `Engine`]).
|
||
|
||
This is particularly useful when printing output.
|
||
```
|
||
|
||
|
||
Standard Escape Sequences
|
||
-------------------------
|
||
|
||
~~~admonish tip.side "Tip: Character `to_int()`"
|
||
|
||
Use the `to_int` method to convert a Unicode character into its 32-bit Unicode encoding.
|
||
~~~
|
||
|
||
There is built-in support for Unicode (`\u`_xxxx_ or `\U`_xxxxxxxx_) and hex (`\x`_xx_) escape
|
||
sequences for normal strings and characters.
|
||
|
||
Hex sequences map to ASCII characters, while `\u` maps to 16-bit common Unicode code points and `\U`
|
||
maps the full, 32-bit extended Unicode code points.
|
||
|
||
Escape sequences are not supported for multi-line literal strings wrapped by back-ticks (`` ` ``).
|
||
|
||
| Escape sequence | Meaning |
|
||
| --------------- | -------------------------------- |
|
||
| `\\` | back-slash (`\`) |
|
||
| `\t` | tab |
|
||
| `\r` | carriage-return (`CR`) |
|
||
| `\n` | line-feed (`LF`) |
|
||
| `\"` or `""` | double-quote (`"`) |
|
||
| `\'` | single-quote (`'`) |
|
||
| `\x`_xx_ | ASCII character in 2-digit hex |
|
||
| `\u`_xxxx_ | Unicode character in 4-digit hex |
|
||
| `\U`_xxxxxxxx_ | Unicode character in 8-digit hex |
|
||
|
||
|
||
Line Continuation
|
||
-----------------
|
||
|
||
For a normal string wrapped by double-quotes (`"`), a back-slash (`\`) character at the end of a
|
||
line indicates that the string continues onto the next line _without any line-break_.
|
||
|
||
Whitespace up to the indentation of the opening double-quote is ignored in order to enable lining up
|
||
blocks of text.
|
||
|
||
Spaces are _not_ added, so to separate one line with the next with a space, put a space before the
|
||
ending back-slash (`\`) character.
|
||
|
||
```rust
|
||
let x = "hello, world!\
|
||
hello world again! \
|
||
this is the ""last"" time!!!";
|
||
// ^^^^^^ these whitespaces are ignored
|
||
|
||
// The above is the same as:
|
||
let x = "hello, world!hello world again! this is the \"last\" time!!!";
|
||
```
|
||
|
||
A string with continuation does not open up a new line. To do so, a new-line character must be
|
||
manually inserted at the appropriate position.
|
||
|
||
```rust
|
||
let x = "hello, world!\n\
|
||
hello world again!\n\
|
||
this is the last time!!!";
|
||
|
||
// The above is the same as:
|
||
let x = "hello, world!\nhello world again!\nthis is the last time!!!";
|
||
```
|
||
|
||
~~~admonish warning.small "No ending quote before the line ends is a syntax error"
|
||
|
||
If the ending double-quote is omitted, it is a syntax error.
|
||
|
||
```rust
|
||
let x = "hello
|
||
# ";
|
||
// ^ syntax error: unterminated string literal
|
||
```
|
||
~~~
|
||
|
||
```admonish question.small "Why not go multi-line?"
|
||
|
||
Technically speaking, there is no difficulty in allowing strings to run for multiple lines
|
||
_without_ the continuation back-slash.
|
||
|
||
Rhai forces you to manually mark a continuation with a back-slash because the ending quote is easy to omit.
|
||
Once it happens, the entire remainder of the script would become one giant, multi-line string.
|
||
|
||
This behavior is different from Rust, where string literals can run for multiple lines.
|
||
```
|
||
|
||
|
||
Raw Strings
|
||
-----------
|
||
|
||
A _raw string_ is any text enclosed by a pair of double-quotes (`"`), wrapped by hash (`#`) characters.
|
||
|
||
The number of hash (`#`) on each side must be the same.
|
||
|
||
Any text inside the double-quotes, as long as it is not a double-quote (`"`) followed by the same
|
||
number of hash (`#`) characters, is simply copied verbatim, _including control codes and/or
|
||
line-breaks_.
|
||
|
||
Raw strings are very useful for embedded regular expressions, file paths, and program code etc.
|
||
|
||
```rust
|
||
let x = #"Hello, I am a raw string! which means that I can contain
|
||
line-breaks, \ slashes (not escapes), "quotes" and even # characters!"#
|
||
|
||
// Use more than one '#' if you happen to have '"###...' inside the string...
|
||
|
||
let x = ###"In Rhai, you can write ##"hello"## as a raw string."###;
|
||
// ^^^ this is not the end of the raw string
|
||
```
|
||
|
||
|
||
Indexing
|
||
--------
|
||
|
||
Strings can be _indexed_ into to get access to any individual character.
|
||
This is similar to many modern languages but different from Rust.
|
||
|
||
### From beginning
|
||
|
||
Individual characters within a string can be accessed with zero-based, non-negative integer indices:
|
||
|
||
> _string_ `[` _index from 0 to (total number of characters − 1)_ `]`
|
||
|
||
### From end
|
||
|
||
A _negative_ index accesses a character in the string counting from the _end_, with −1 being the
|
||
_last_ character.
|
||
|
||
> _string_ `[` _index from −1 to −(total number of characters)_ `]`
|
||
|
||
```admonish warning.small "Character indexing can be SLOOOOOOOOW"
|
||
|
||
Internally, a Rhai string is still stored compactly as a Rust UTF-8 string in order to save memory.
|
||
|
||
Therefore, getting the character at a particular index involves walking through the entire UTF-8
|
||
encoded bytes stream to extract individual Unicode characters, counting them on the way.
|
||
|
||
Because of this, indexing can be a _slow_ procedure, especially for long strings.
|
||
Along the same lines, getting the _length_ of a string (which returns the number of characters, not
|
||
bytes) can also be slow.
|
||
```
|
||
|
||
|
||
Sub-Strings
|
||
-----------
|
||
|
||
Sub-strings, or _slices_ in some programming languages, are parts of strings.
|
||
|
||
In Rhai, a sub-string can be specified by indexing with a [range] of characters:
|
||
|
||
> _string_ `[` _first character (starting from zero)_ `..` _last character (exclusive)_ `]`
|
||
>
|
||
> _string_ `[` _first character (starting from zero)_ `..=` _last character (inclusive)_ `]`
|
||
|
||
Sub-string [ranges] always start from zero counting towards the end of the string.
|
||
Negative [ranges] are not supported.
|
||
|
||
|
||
Examples
|
||
--------
|
||
|
||
```js
|
||
let name = "Bob";
|
||
let middle_initial = 'C';
|
||
let last = "Davis";
|
||
|
||
let full_name = `${name} ${middle_initial}. ${last}`;
|
||
full_name == "Bob C. Davis";
|
||
|
||
// String building with different types
|
||
let age = 42;
|
||
let record = `${full_name}: age ${age}`;
|
||
record == "Bob C. Davis: age 42";
|
||
|
||
// Unlike Rust, Rhai strings can be indexed to get a character
|
||
// (disabled with 'no_index')
|
||
let c = record[4];
|
||
c == 'C'; // single character
|
||
|
||
let slice = record[4..8]; // sub-string slice
|
||
slice == " C. D";
|
||
|
||
ts.s = record; // custom type properties can take strings
|
||
|
||
let c = ts.s[4];
|
||
c == 'C';
|
||
|
||
let c = ts.s[-4]; // negative index counts from the end
|
||
c == 'e';
|
||
|
||
let c = "foo"[0]; // indexing also works on string literals...
|
||
c == 'f';
|
||
|
||
let c = ("foo" + "bar")[5]; // ... and expressions returning strings
|
||
c == 'r';
|
||
|
||
let text = "hello, world!";
|
||
text[0] = 'H'; // modify a single character
|
||
text == "Hello, world!";
|
||
|
||
text[7..=11] = "Earth"; // modify a sub-string slice
|
||
text == "Hello, Earth!";
|
||
|
||
// Escape sequences in strings
|
||
record += " \u2764\n"; // escape sequence of '❤' in Unicode
|
||
record == "Bob C. Davis: age 42 ❤\n"; // '\n' = new-line
|
||
|
||
// Unlike Rust, Rhai strings can be directly modified character-by-character
|
||
// (disabled with 'no_index')
|
||
record[4] = '\x58'; // 0x58 = 'X'
|
||
record == "Bob X. Davis: age 42 ❤\n";
|
||
|
||
// Use 'in' to test if a substring (or character) exists in a string
|
||
"Davis" in record == true;
|
||
'X' in record == true;
|
||
'C' in record == false;
|
||
|
||
// Strings can be iterated with a 'for' statement, yielding characters
|
||
for ch in record {
|
||
print(ch);
|
||
}
|
||
```
|