Julia Strings
A string is a finite sequence of zero or more characters. It is the data type that programming languages use to represent text.
In Julia, single quotes ' create single characters, while double quotes " or triple double quotes """ create strings. For example:
c = 'x'
str = "tutorialpro"
tutorialpro = """tutorialpro.org "tutorialpro",contains double quotes"""
Characteristics of Julia String Types:
The built-in concrete type for strings (and string literals) in Julia is String.
All string types in Julia are subtypes of the abstract type AbstractString.
Julia has a first-class type for representing single characters, AbstractChar. Char is the built-in subtype of AbstractChar: a 32-bit primitive type that can represent any Unicode character (its internal representation is based on the UTF-8 encoding).
Julia strings are immutable—the value of any AbstractString object cannot be changed.
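The characteristics listed above can be checked directly. The following is a minimal sketch that uses only built-in functions; the variable names s, t and mutation_failed are illustrative.

```julia
# Inspect the type hierarchy described above.
is_string_sub = String <: AbstractString    # true
is_char_sub   = Char <: AbstractChar        # true
char_bytes    = sizeof(Char)                # 4 bytes, i.e. 32 bits

# Strings are immutable: assigning to an index raises an error.
s = "tutorialpro"
mutation_failed = try
    s[1] = 'T'
    false
catch
    true
end

# "Changing" a string therefore means building a new one.
t = uppercasefirst(s)    # s itself is left untouched
```

Because no operation can modify s in place, functions such as uppercasefirst always return a new string.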
Characters
Individual characters are represented by Char values.
Char is a 32-bit primitive type that can be converted to its corresponding integer value, the Unicode code point:
Example
julia> c = 'x'
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
julia> typeof(c)
Char
julia> c = Int('x')
120
julia> typeof(c)
Int64
We can also convert an integer value to a Char:
Example
julia> Char(97)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> Char(120)
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
We can perform comparisons and limited arithmetic operations on Char values:
Example
julia> 'A' < 'a'
true
julia> 'A' <= 'a' <= 'Z'
false
julia> 'A' <= 'X' <= 'Z'
true
julia> 'x' - 'a'
23
julia> 'A' + 1
'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
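As a small illustration of Char comparison and arithmetic, the sketch below shifts lowercase ASCII letters by a fixed amount. The helper shift_letter is hypothetical and not part of Julia's standard library.

```julia
# Shift a lowercase ASCII letter by k positions, wrapping from 'z' back to 'a'.
# shift_letter is a hypothetical helper written for this example.
function shift_letter(c::Char, k::Integer)
    'a' <= c <= 'z' || return c          # leave all other characters unchanged
    offset = mod((c - 'a') + k, 26)      # Char - Char gives an Int
    return 'a' + offset                  # Char + Int gives a Char
end

# map over a string applies the function to each character and returns a String.
shifted = map(c -> shift_letter(c, 3), "xyz abc")   # "abc def"
```

The range check uses the same chained comparison as the examples above, and the arithmetic relies only on Char - Char and Char + Int.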
Strings
Strings in Julia are declared with double quotes " or triple double quotes """. To use double quotes inside a string without escaping them, use triple quotes, as shown below:
Example
julia> str = "tutorialpro"
"tutorialpro"
julia> tutorialpro = """tutorialpro.org "tutorialpro",contains double quotes"""
"tutorialpro.org \"tutorialpro\",contains double quotes"
If a string literal is too long, a backslash \ at the end of a line splits it across lines without inserting a newline (supported since Julia 1.7):
Example
julia> "This is a long \
line"
"This is a long line"
We can use indexing to access specific characters in a string. The first index is 1 (or begin), and the last index is end:
Example
julia> str = "tutorialpro"
"tutorialpro"
julia> str[begin]
't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
julia> str[1]
't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
julia> str[2]
'u': ASCII/Unicode U+0075 (category Ll: Letter, lowercase)
julia> str[end]
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
julia> str[end-1]
'r': ASCII/Unicode U+0072 (category Ll: Letter, lowercase)
We can use range indexing to extract substrings:
Example
julia> str = "tutorialpro"
"tutorialpro"
julia> str[begin:end]
"tutorialpro"
julia> str[begin:end-1]
"tutorialpr"
julia> str[2:5]
"utor"
Additionally, the expressions str[k] and str[k:k] yield different results. The former retrieves the character at the given index, and its type is Char; the latter reads the substring in the given range, which happens to be a String containing only one character:
Example
julia> str[6]
'i': ASCII/Unicode U+0069 (category Ll: Letter, lowercase)
julia> str[6:6]
"i"
Range slicing can also be done with the SubString constructor, which returns a view into the original string instead of a copy, for example:
Example
julia> str = "long string"
"long string"
julia> substr = SubString(str, 1, 4)
"long"
julia> typeof(substr)
SubString{String}
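The difference between range indexing and SubString shows up in the result type. A brief sketch, with illustrative variable names:

```julia
str = "long string"

copy_slice = str[1:4]                # range indexing allocates a new String
view_slice = SubString(str, 1, 4)    # SubString references the contents of str
tail_view  = SubString(str, 6)       # from index 6 to the end of the string

# Both forms compare equal by content, even though their types differ.
same_content = copy_slice == view_slice

# Convert a view to an independent String when one is required.
owned = String(view_slice)
```

Views can be useful when many slices of a large string are needed and copying each one would be wasteful.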
String Concatenation
The string function concatenates its arguments into a single string:
Example
julia> greet = "Hello"
"Hello"
julia> whom = "world"
"world"
julia> string(greet, ", ", whom, ".\n")
"Hello, world.\n"
Interpolation
Concatenating strings can sometimes be cumbersome. To reduce redundant calls to string or repeated multiplications (the * operator concatenates strings), Julia allows interpolation into string literals using $, similar to Perl:
Example
julia> "$greet, $whom.\n"
"Hello, world.\n"
This is more readable and convenient, and it is equivalent to the string concatenation above: the system rewrites this apparently single string literal into the call string(greet, ", ", whom, ".\n").
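The equivalence between the three ways of building the same string can be sketched as follows. The variable names are illustrative.

```julia
greet = "Hello"
whom = "world"

by_call   = string(greet, ", ", whom, ".\n")   # explicit call to string
by_star   = greet * ", " * whom * ".\n"        # * concatenates strings
by_interp = "$greet, $whom.\n"                 # interpolation

# Parentheses delimit the interpolated expression when text follows directly:
# "$whoms" would look for a variable named whoms.
plural = "$(whom)s"                            # "worlds"
```

All three forms produce the same String; interpolation is usually the easiest to read.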
The shortest complete expression after $ is taken as the expression whose value is inserted into the string. Therefore, you can interpolate any expression into a string by wrapping it in parentheses:
Example
julia> "1 + 2 = $(1 + 2)"
"1 + 2 = 3"
Both concatenation and interpolation call string to convert objects to string form. However, string simply returns the output of print, so new types should add print or show methods rather than a string method.
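As a sketch of that advice, the example below defines a show method for a small type and lets string and interpolation pick it up. The type Point is hypothetical and exists only for illustration.

```julia
# Point is a hypothetical type used only for this example.
struct Point
    x::Int
    y::Int
end

# Define how a Point is written to an IO stream.
# print falls back to show, and string returns what print writes.
Base.show(io::IO, p::Point) = print(io, "Point(", p.x, ", ", p.y, ")")

p = Point(1, 2)
as_string = string(p)      # "Point(1, 2)"
in_literal = "p is $p"     # "p is Point(1, 2)"
```

No string(::Point) method is defined here; both results come from the single show method.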
Most non-AbstractString objects are converted to strings closely corresponding to how they are entered as text:
Example
julia> v = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> "v: $v"
"v: [1, 2, 3]"
string is the identity for AbstractString and AbstractChar values, so they are inserted into strings as themselves, without quotation marks or escaping:
Example
julia> c = 'x'
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
julia> "hi, $c"
"hi, x"
To include a literal $ in a string literal, escape it with a backslash:
Example
julia> print("I have \$100 in my account.\n")
I have $100 in my account.
Unicode and UTF-8
Julia fully supports Unicode characters and strings.
In character literals, Unicode code points can be written with the \u and \U escape sequences, as well as with all the standard C escape sequences. These can also be used in string literals:
Example
julia> s = "\u2200 x \u2203 y"
"∀ x ∃ y"
Whether these Unicode characters are displayed as escapes or as special characters depends on your terminal's locale settings and its support for Unicode. String literals are encoded in UTF-8. UTF-8 is a variable-width encoding, meaning that not all characters are encoded with the same number of bytes. In UTF-8, ASCII characters (code points less than 0x80 (128)) are encoded with a single byte, as in ASCII; code points 0x80 and above are encoded with multiple bytes, up to four per character. As a result, string indices refer to bytes, and not every byte index is the start of a character.
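The effect of the variable-width encoding on lengths and indices can be seen with the string from the example above. This is a minimal sketch using built-in functions only; the variable names are illustrative.

```julia
s = "\u2200 x \u2203 y"        # "∀ x ∃ y"

n_chars = length(s)            # 7 characters
n_bytes = ncodeunits(s)        # 11 bytes: ∀ and ∃ take 3 bytes each

first_char = s[1]              # '∀' starts at byte index 1
second_idx = nextind(s, 1)     # 4: the next character starts 3 bytes later

# Byte indices 2 and 3 fall inside '∀', so they are not valid character indices.
valid_2 = isvalid(s, 2)        # false

# eachindex yields only the valid character indices.
idxs = collect(eachindex(s))   # [1, 4, 5, 6, 7, 10, 11]
```

Indexing with an invalid index such as s[2] raises an error, so code that walks a string should use eachindex, nextind, or plain iteration rather than assuming consecutive integers.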
Triple-Quoted String Literals
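Triple-quoted """...""" strings are convenient for creating longer and more complex strings, allowing newlines, quotes, and indentation to be used without special handling. The newline directly after the opening """ is dropped, and the indentation common to all lines, including the line with the closing """, is stripped from the result.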
Example
julia> str = """
           Hello,
           world.
         """
"  Hello,\n  world.\n"
String Comparison
We can compare strings lexicographically using comparison operators:
Example
julia> "abracadabra" < "xylophone"
true
julia> "abracadabra" == "xylophone"
false
julia> "Hello, world." != "Goodbye, world."
true
julia> "1 + 2 = 3" == "1 + 2 = $(1 + 2)"
true
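Lexicographic comparison works character by character on code points, which has a few consequences worth seeing. A brief sketch with illustrative variable names:

```julia
# Uppercase letters sort before lowercase ones ('Z' is U+005A, 'a' is U+0061).
upper_first = "Zebra" < "apple"                          # true

# cmp returns -1, 0, or 1.
order = cmp("abc", "abd")                                # -1

# A prefix sorts before the longer string.
prefix_first = "abc" < "abcd"                            # true

# For a case-insensitive comparison, normalize the case first.
same_ignoring_case = lowercase("Hello") == lowercase("HELLO")   # true

# sort uses the same ordering.
words = sort(["pear", "Apple", "banana"])                # ["Apple", "banana", "pear"]
```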
You can use the findfirst and findlast functions to search for the index of a specific character. Both return nothing when there is no match, in which case the REPL prints no output:
Example
julia> findfirst(isequal('o'), "xylophone")
4
julia> findlast(isequal('o'), "xylophone")
7
julia> findfirst(isequal('z'), "xylophone")
You can also use the findnext and findprev functions, which take a third argument, to search for characters starting from a given offset:
Example
julia> findnext(isequal('o'), "xylophone", 1)
4
julia> findnext(isequal('o'), "xylophone", 5)
7
julia> findprev(isequal('o'), "xylophone", 5)
4
julia> findnext(isequal('o'), "xylophone", 8)
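Since the search functions return nothing when there is no match, calling code usually has to check for it. The sketch below shows that check along with two related calls; the variable names are illustrative.

```julia
word = "xylophone"

# No match: the result is nothing, so test for it before using the index.
idx = findfirst(isequal('z'), word)
found = idx !== nothing                 # false

# A string pattern returns the range of indices covered by the match.
r = findfirst("lo", word)               # 3:4

# findall collects every matching index.
all_o = findall(isequal('o'), word)     # [4, 7]
```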
You can use the occursin function to check whether a substring occurs within a string:
Example
julia> occursin("world", "Hello, world.")
true
julia> occursin("o", "Xylophon")
true
julia> occursin("a", "Xylophon")
false
julia> occursin('o', "Xylophon")
true
The last example shows that occursin can also search for a single character.
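Note that occursin takes the pattern first and the string to search second. A short sketch alongside the related startswith and endswith functions, which take the string first; the variable names are illustrative.

```julia
text = "Hello, world."

has_world = occursin("world", text)        # true: pattern first, string second
has_comma = occursin(',', text)            # true
starts    = startswith(text, "Hello")      # true: string first, prefix second
ends      = endswith(text, "world.")       # true

# occursin works well as a filter predicate.
words = ["apple", "banana", "cherry"]
with_an = filter(w -> occursin("an", w), words)   # ["banana"]
```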
There are also two convenient string functions, repeat and join:
Example
julia> repeat(".:Z:.", 10)
".:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:."
julia> join(["apples", "bananas", "pineapples"], ", ", " and ")
"apples, bananas and pineapples"
Other useful functions include:
firstindex(str) - Gives the smallest (byte) index that can be used to index into str (this is always 1 for strings, but not necessarily for other containers).
lastindex(str) - Gives the largest (byte) index that can be used to index into str.
length(str) - The number of characters in str.
length(str, i, j) - The number of valid character indices in str from i to j.
ncodeunits(str) - The number of code units in the string (bytes for a UTF-8 String; not the same as the number of code points).
codeunit(str, i) - Gives the value of the code unit at index i in string str.
thisind(str, i) - Given any index into a string, finds the first index of the character it falls within.
nextind(str, i, n=1) - Finds the start of the nth character after the given index i.
prevind(str, i, n=1) - Finds the start of the nth character before the given index i.
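The functions listed above matter most for strings containing multi-byte characters. The following sketch applies them to a short example string; char_indices is a hypothetical helper written for this illustration.

```julia
s = "αβγ xyz"    # each Greek letter takes 2 bytes in UTF-8

first_i = firstindex(s)      # 1
last_i  = lastindex(s)       # 10: the byte index where 'z' starts
n_chars = length(s)          # 7
n_units = ncodeunits(s)      # 10
in_range = length(s, 1, 6)   # 3 characters start at indices 1 through 6

start_of = thisind(s, 2)     # 1: byte 2 belongs to 'α', which starts at 1
after_1  = nextind(s, 1)     # 3
after_2  = nextind(s, 1, 2)  # 5
before_5 = prevind(s, 5)     # 3

# Walk a string with nextind instead of adding 1 to the index.
# char_indices is a hypothetical helper, not a Base function.
function char_indices(str::AbstractString)
    idxs = Int[]
    i = firstindex(str)
    while i <= lastindex(str)
        push!(idxs, i)
        i = nextind(str, i)
    end
    return idxs
end

idxs = char_indices(s)       # [1, 3, 5, 7, 8, 9, 10]
```

In everyday code, eachindex(s) or plain iteration over s gives the same traversal; the explicit loop is shown only to make the role of nextind visible.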
Raw String Literals
Raw strings, with no interpolation and no unescaping, are written with the non-standard string literal form raw"...". Raw string literals produce ordinary String objects that contain the enclosed contents exactly as entered. This is useful for strings containing code or markup in other languages that use $ or \ as special characters.
The exception is that quotes still need to be escaped: for example, raw"\"" is equivalent to "\"". To make every string expressible, backslashes must then also be escaped, but only when they immediately precede a quote character.
julia> println(raw"\\ \\\"")
\\ \"
Note that the first two backslashes are displayed verbatim in the output because they are not preceding a quote. However, the next backslash escapes the following backslash; and since these backslashes are preceding a quote, the last backslash escapes a quote.
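A few typical uses of raw strings, compared with their ordinary-literal equivalents, are sketched below. The paths and variable names are made up for illustration.

```julia
# Backslashes are kept verbatim, which is handy for Windows-style paths.
path_raw = raw"C:\Users\name\docs"
path_esc = "C:\\Users\\name\\docs"
same_path = path_raw == path_esc           # true

# $ is not interpolated inside a raw string.
tmpl = raw"Total: $amount"
same_tmpl = tmpl == "Total: \$amount"      # true

# Quotes still need a backslash.
quoted = raw"say \"hi\""
same_quoted = quoted == "say \"hi\""       # true

# Backslashes directly before the closing quote must be doubled,
# so this literal ends in a single backslash.
trailing = raw"C:\dir\\"
same_trailing = trailing == "C:\\dir\\"    # true
```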