reorganize module
This commit is contained in:
348
_archive/rhai_engine/rhaibook/ref/strings-chars.md
Normal file
348
_archive/rhai_engine/rhaibook/ref/strings-chars.md
Normal file
@@ -0,0 +1,348 @@
|
||||
Strings and Characters
|
||||
======================
|
||||
|
||||
String in Rhai contain any text sequence of valid Unicode characters.
|
||||
|
||||
[`type_of()`](type-of.md) a string returns `"string"`.
|
||||
|
||||
|
||||
String and Character Literals
|
||||
-----------------------------
|
||||
|
||||
String and character literals follow JavaScript-style syntax.
|
||||
|
||||
| Type | Quotes | Escapes? | Continuation? | Interpolation? |
|
||||
| ------------------------- | :-------------: | :------: | :-----------: | :------------: |
|
||||
| Normal string | `"..."` | yes | with `\` | **no** |
|
||||
| Raw string | `#..#"..."#..#` | **no** | **no** | **no** |
|
||||
| Multi-line literal string | `` `...` `` | **no** | **no** | with `${...}` |
|
||||
| Character | `'...'` | yes | **no** | **no** |
|
||||
|
||||
```admonish tip.small "Tip: Building strings"
|
||||
|
||||
Strings can be built up from other strings and types via the `+` or `+=` operators.
|
||||
```
|
||||
|
||||
|
||||
Standard Escape Sequences
|
||||
-------------------------
|
||||
|
||||
~~~admonish tip.side "Tip: Character `to_int()`"
|
||||
|
||||
Use the `to_int` method to convert a Unicode character into its 32-bit Unicode encoding.
|
||||
~~~
|
||||
|
||||
There is built-in support for Unicode (`\u`_xxxx_ or `\U`_xxxxxxxx_) and hex (`\x`_xx_) escape
|
||||
sequences for normal strings and characters.
|
||||
|
||||
Hex sequences map to ASCII characters, while `\u` maps to 16-bit common Unicode code points and `\U`
|
||||
maps the full, 32-bit extended Unicode code points.
|
||||
|
||||
Escape sequences are not supported for multi-line literal strings wrapped by back-ticks (`` ` ``).
|
||||
|
||||
| Escape sequence | Meaning |
|
||||
| --------------- | -------------------------------- |
|
||||
| `\\` | back-slash (`\`) |
|
||||
| `\t` | tab |
|
||||
| `\r` | carriage-return (`CR`) |
|
||||
| `\n` | line-feed (`LF`) |
|
||||
| `\"` or `""` | double-quote (`"`) |
|
||||
| `\'` | single-quote (`'`) |
|
||||
| `\x`_xx_ | ASCII character in 2-digit hex |
|
||||
| `\u`_xxxx_ | Unicode character in 4-digit hex |
|
||||
| `\U`_xxxxxxxx_ | Unicode character in 8-digit hex |
|
||||
|
||||
|
||||
Line Continuation
|
||||
-----------------
|
||||
|
||||
For a normal string wrapped by double-quotes (`"`), a back-slash (`\`) character at the end of a
|
||||
line indicates that the string continues onto the next line _without any line-break_.
|
||||
|
||||
Whitespace up to the indentation of the opening double-quote is ignored in order to enable lining up
|
||||
blocks of text.
|
||||
|
||||
Spaces are _not_ added, so to separate one line with the next with a space, put a space before the
|
||||
ending back-slash (`\`) character.
|
||||
|
||||
```rust
|
||||
let x = "hello, world!\
|
||||
hello world again! \
|
||||
this is the ""last"" time!!!";
|
||||
// ^^^^^^ these whitespaces are ignored
|
||||
|
||||
// The above is the same as:
|
||||
let x = "hello, world!hello world again! this is the \"last\" time!!!";
|
||||
```
|
||||
|
||||
A string with continuation does not open up a new line. To do so, a new-line character must be
|
||||
manually inserted at the appropriate position.
|
||||
|
||||
```rust
|
||||
let x = "hello, world!\n\
|
||||
hello world again!\n\
|
||||
this is the last time!!!";
|
||||
|
||||
// The above is the same as:
|
||||
let x = "hello, world!\nhello world again!\nthis is the last time!!!";
|
||||
```
|
||||
|
||||
~~~admonish warning.small "No ending quote before the line ends is a syntax error"
|
||||
|
||||
If the ending double-quote is omitted, it is a syntax error.
|
||||
|
||||
```rust
|
||||
let x = "hello
|
||||
# ";
|
||||
// ^ syntax error: unterminated string literal
|
||||
```
|
||||
~~~
|
||||
|
||||
```admonish question.small "Why not go multi-line?"
|
||||
|
||||
Technically speaking, there is no difficulty in allowing strings to run for multiple lines
|
||||
_without_ the continuation back-slash.
|
||||
|
||||
Rhai forces you to manually mark a continuation with a back-slash because the ending quote is easy to omit.
|
||||
Once it happens, the entire remainder of the script would become one giant, multi-line string.
|
||||
|
||||
This behavior is different from Rust, where string literals can run for multiple lines.
|
||||
```
|
||||
|
||||
|
||||
Raw Strings
|
||||
-----------
|
||||
|
||||
A _raw string_ is any text enclosed by a pair of double-quotes (`"`), wrapped by hash (`#`) characters.
|
||||
|
||||
The number of hash (`#`) on each side must be the same.
|
||||
|
||||
Any text inside the double-quotes, as long as it is not a double-quote (`"`) followed by the same
|
||||
number of hash (`#`) characters, is simply copied verbatim, _including control codes and/or
|
||||
line-breaks_.
|
||||
|
||||
Raw strings are very useful for embedded regular expressions, file paths, and program code etc.
|
||||
|
||||
```rust
|
||||
let x = #"Hello, I am a raw string! which means that I can contain
|
||||
line-breaks, \ slashes (not escapes), "quotes" and even # characters!"#
|
||||
|
||||
// Use more than one '#' if you happen to have '"###...' inside the string...
|
||||
|
||||
let x = ###"In Rhai, you can write ##"hello"## as a raw string."###;
|
||||
// ^^^ this is not the end of the raw string
|
||||
```
|
||||
|
||||
|
||||
Multi-Line Literal Strings
|
||||
--------------------------
|
||||
|
||||
A string wrapped by a pair of back-tick (`` ` ``) characters is interpreted _literally_,
|
||||
meaning that every single character that lies between the two back-ticks is taken verbatim.
|
||||
This include new-lines, whitespaces, escape characters etc.
|
||||
|
||||
```js
|
||||
let x = `hello, world! "\t\x42"
|
||||
hello world again! 'x'
|
||||
this is the last time!!! `;
|
||||
|
||||
// The above is the same as:
|
||||
let x = "hello, world! \"\\t\\x42\"\n hello world again! 'x'\n this is the last time!!! ";
|
||||
```
|
||||
|
||||
If a back-tick (`` ` ``) appears at the _end_ of a line, then it is understood that the entire text
|
||||
block starts from the _next_ line; the starting new-line character is stripped.
|
||||
|
||||
```js
|
||||
let x = `
|
||||
hello, world! "\t\x42"
|
||||
hello world again! 'x'
|
||||
this is the last time!!!
|
||||
`;
|
||||
|
||||
// The above is the same as:
|
||||
let x = " hello, world! \"\\t\\x42\"\n hello world again! 'x'\n this is the last time!!!\n";
|
||||
```
|
||||
|
||||
To actually put a back-tick (`` ` ``) character inside a multi-line literal string, use two
|
||||
back-ticks together (i.e. ` `` `).
|
||||
|
||||
```js
|
||||
let x = `I have a quote " as well as a back-tick `` here.`;
|
||||
|
||||
// The above is the same as:
|
||||
let x = "I have a quote \" as well as a back-tick ` here.";
|
||||
```
|
||||
|
||||
|
||||
String Interpolation
|
||||
--------------------
|
||||
|
||||
~~~admonish question.side.wide "What if I want `${` inside?"
|
||||
|
||||
🤦 Well, you just _have_ to ask for the impossible, don't you?
|
||||
|
||||
Currently there is no way to escape `${`. Build the string in three pieces:
|
||||
|
||||
```js
|
||||
`Interpolations start with "`
|
||||
+ "${"
|
||||
+ `" and end with }.`
|
||||
```
|
||||
~~~
|
||||
|
||||
Multi-line literal strings support _string interpolation_ wrapped in `${` ... `}`.
|
||||
|
||||
Interpolation is not supported for normal string or character literals.
|
||||
|
||||
`${` ... `}` acts as a statements _block_ and can contain anything that is allowed within a
|
||||
statements block, including another interpolated string!
|
||||
The last result of the block is taken as the value for interpolation.
|
||||
|
||||
Rhai uses [`to_string`](convert.md) to convert any value into a string, then physically joins all
|
||||
the sub-strings together.
|
||||
|
||||
For convenience, if any interpolated value is a [BLOB](blobs.md), however, it is automatically treated as a
|
||||
UTF-8 encoded string. That is because it is rarely useful to interpolate a [BLOB](blobs.md) into a string,
|
||||
but extremely useful to be able to directly manipulate UTF-8 encoded text.
|
||||
|
||||
```js
|
||||
let x = 42;
|
||||
let y = 123;
|
||||
|
||||
let s = `x = ${x} and y = ${y}.`; // <- interpolated string
|
||||
|
||||
let s = ("x = " + {x} + " and y = " + {y} + "."); // <- de-sugars to this
|
||||
|
||||
s == "x = 42 and y = 123.";
|
||||
|
||||
let s = `
|
||||
Undeniable logic:
|
||||
1) Hello, ${let w = `${x} world`; if x > 1 { w += "s" } w}!
|
||||
2) If ${y} > ${x} then it is ${y > x}!
|
||||
`;
|
||||
|
||||
s == "Undeniable logic:\n1) Hello, 42 worlds!\n2) If 123 > 42 then it is true!\n";
|
||||
|
||||
let blob = blob(3, 0x21);
|
||||
|
||||
print(blob); // prints [212121]
|
||||
|
||||
print(`Data: ${blob}`); // prints "Data: !!!"
|
||||
// BLOB is treated as UTF-8 encoded string
|
||||
|
||||
print(`Data: ${blob.to_string()}`); // prints "Data: [212121]"
|
||||
```
|
||||
|
||||
|
||||
Indexing
|
||||
--------
|
||||
|
||||
Strings can be _indexed_ into to get access to any individual character.
|
||||
This is similar to many modern languages but different from Rust.
|
||||
|
||||
### From beginning
|
||||
|
||||
Individual characters within a string can be accessed with zero-based, non-negative integer indices:
|
||||
|
||||
> _string_ `[` _index from 0 to (total number of characters − 1)_ `]`
|
||||
|
||||
### From end
|
||||
|
||||
A _negative_ index accesses a character in the string counting from the _end_, with −1 being the
|
||||
_last_ character.
|
||||
|
||||
> _string_ `[` _index from −1 to −(total number of characters)_ `]`
|
||||
|
||||
```admonish warning.small "Character indexing can be SLOOOOOOOOW"
|
||||
|
||||
Internally, a Rhai string is still stored compactly as a Rust UTF-8 string in order to save memory.
|
||||
|
||||
Therefore, getting the character at a particular index involves walking through the entire UTF-8
|
||||
encoded bytes stream to extract individual Unicode characters, counting them on the way.
|
||||
|
||||
Because of this, indexing can be a _slow_ procedure, especially for long strings.
|
||||
Along the same lines, getting the _length_ of a string (which returns the number of characters, not
|
||||
bytes) can also be slow.
|
||||
```
|
||||
|
||||
|
||||
Sub-Strings
|
||||
-----------
|
||||
|
||||
Sub-strings, or _slices_ in some programming languages, are parts of strings.
|
||||
|
||||
In Rhai, a sub-string can be specified by indexing with a [range](ranges.md) of characters:
|
||||
|
||||
> _string_ `[` _first character (starting from zero)_ `..` _last character (exclusive)_ `]`
|
||||
>
|
||||
> _string_ `[` _first character (starting from zero)_ `..=` _last character (inclusive)_ `]`
|
||||
|
||||
Sub-string [ranges](ranges.md) always start from zero counting towards the end of the string.
|
||||
Negative [ranges](ranges.md) are not supported.
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
```js
|
||||
let name = "Bob";
|
||||
let middle_initial = 'C';
|
||||
let last = "Davis";
|
||||
|
||||
let full_name = `${name} ${middle_initial}. ${last}`;
|
||||
full_name == "Bob C. Davis";
|
||||
|
||||
// String building with different types
|
||||
let age = 42;
|
||||
let record = `${full_name}: age ${age}`;
|
||||
record == "Bob C. Davis: age 42";
|
||||
|
||||
// Unlike Rust, Rhai strings can be indexed to get a character
|
||||
// (disabled with 'no_index')
|
||||
let c = record[4];
|
||||
c == 'C'; // single character
|
||||
|
||||
let slice = record[4..8]; // sub-string slice
|
||||
slice == " C. D";
|
||||
|
||||
ts.s = record; // custom type properties can take strings
|
||||
|
||||
let c = ts.s[4];
|
||||
c == 'C';
|
||||
|
||||
let c = ts.s[-4]; // negative index counts from the end
|
||||
c == 'e';
|
||||
|
||||
let c = "foo"[0]; // indexing also works on string literals...
|
||||
c == 'f';
|
||||
|
||||
let c = ("foo" + "bar")[5]; // ... and expressions returning strings
|
||||
c == 'r';
|
||||
|
||||
let text = "hello, world!";
|
||||
text[0] = 'H'; // modify a single character
|
||||
text == "Hello, world!";
|
||||
|
||||
text[7..=11] = "Earth"; // modify a sub-string slice
|
||||
text == "Hello, Earth!";
|
||||
|
||||
// Escape sequences in strings
|
||||
record += " \u2764\n"; // escape sequence of '❤' in Unicode
|
||||
record == "Bob C. Davis: age 42 ❤\n"; // '\n' = new-line
|
||||
|
||||
// Unlike Rust, Rhai strings can be directly modified character-by-character
|
||||
// (disabled with 'no_index')
|
||||
record[4] = '\x58'; // 0x58 = 'X'
|
||||
record == "Bob X. Davis: age 42 ❤\n";
|
||||
|
||||
// Use 'in' to test if a substring (or character) exists in a string
|
||||
"Davis" in record == true;
|
||||
'X' in record == true;
|
||||
'C' in record == false;
|
||||
|
||||
// Strings can be iterated with a 'for' statement, yielding characters
|
||||
for ch in record {
|
||||
print(ch);
|
||||
}
|
||||
```
|
Reference in New Issue
Block a user