Rust Ownership
Computer programs must manage the memory resources they use at runtime.
Most programming languages have features for managing memory:
Languages like C and C++ manage memory mostly by hand: developers must explicitly allocate and deallocate memory. In practice, memory that is never released often goes unnoticed as long as the program still behaves correctly, so manual memory management frequently leads to leaks and wasted resources.
Programs written in Java run on a virtual machine (the JVM), which reclaims unused memory automatically through garbage collection. Collection costs runtime, however, so the JVM tends to postpone it, and programs may end up occupying more memory than they actually need.
Ownership is a new concept for most developers. It is a language mechanism that Rust designed for efficient memory use: it lets the compiler work out at compile time when a memory resource is no longer needed, and manage memory on that basis.
Ownership Rules
Ownership has the following three rules:
- Each value in Rust has a variable that is its owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value will be dropped.
These three rules are the foundation of the ownership concept.
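As a minimal sketch of how the three rules play out, the following program uses a hypothetical `Tracked` type whose `Drop` implementation bumps a counter, so the moment the value is dropped can be observed. The names `Tracked`, `DROPS`, and `observe_drops` are illustrative assumptions, not part of any library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Tracked values have been dropped so far (illustrative only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose only job is to make drops observable.
struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns (drops counted inside the block, drops counted after the block).
fn observe_drops() -> (usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    let inside;
    {
        let a = Tracked; // rule 1: a is the owner of this value
        let b = a;       // rule 2: ownership moves to b, a can no longer be used
        inside = DROPS.load(Ordering::SeqCst) - before;
        let _ = &b;      // b is still the single owner here
    } // rule 3: b goes out of scope and the value is dropped exactly once
    let after = DROPS.load(Ordering::SeqCst) - before;
    (inside, after)
}

fn main() {
    let (inside, after) = observe_drops();
    println!("drops inside the block: {}, after the block: {}", inside, after);
}
```

Moving the value from `a` to `b` does not drop anything; the single drop happens only when the final owner leaves its scope.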
Next, we will introduce concepts related to ownership.
Variable Scope
We describe the concept of variable scope with the following program:
{
    // Variable s is invalid before declaration
    let s = "tutorialpro";
    // This is the available scope of variable s
}
// The scope of variable s has ended, and it is now invalid
The scope of a variable is the region of the program in which the variable is valid: it starts at the variable's declaration and ends at the closing brace of the enclosing block.
Memory and Allocation
When we define a variable and assign it a value, the value of the variable exists in memory. This is common. However, if the length of the data we need to store is uncertain (such as a string of user input), we cannot define the data length at the time of declaration, and thus we cannot allocate a fixed-length memory space for data storage during the compilation phase. (Some suggest allocating as large a space as possible, but this method is not elegant). This requires a mechanism for the program to request memory usage at runtime—the heap. All the "memory resources" discussed in this chapter refer to the memory space occupied by the heap.
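To make the idea of runtime-sized data concrete, here is a small sketch that builds a `String` whose final length depends on its input. The function name `join_words` is a hypothetical helper chosen for illustration.

```rust
// Joins words with single spaces. The result length is only known at runtime,
// so the String keeps its contents in a heap buffer that grows as needed.
fn join_words(words: &[&str]) -> String {
    let mut out = String::new(); // starts empty, no fixed length declared
    for word in words {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(word); // may reallocate the heap buffer to make room
    }
    out
}

fn main() {
    let sentence = join_words(&["memory", "on", "the", "heap"]);
    println!("{} ({} bytes)", sentence, sentence.len());
}
```

Nothing in the declaration of `out` fixes its size; the space it needs is requested from the heap while the program runs.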
Whatever is allocated must eventually be deallocated; a program cannot hold on to a memory resource forever. Whether a resource is wasted therefore comes down to whether it is released in a timely manner.
We rewrite the string example program in C:
{
    char *s = strdup("tutorialpro");
    free(s); // Release the resource of s
}
Clearly, the Rust version contains no call to a free function to release the string s. (Strictly speaking, the literal "tutorialpro" in the Rust example is not stored on the heap; for the sake of the comparison, assume that it is, as the strdup copy is in the C version.) Rust has no explicit release step because the compiler automatically adds a call to the resource release function at the point where the variable's scope ends.
This mechanism seems simple: it merely adds a resource release call in the appropriate place on the programmer's behalf. Yet this simple mechanism effectively solves one of the most troublesome problems in programming, namely forgetting to release memory or releasing it at the wrong time.
Ways Variables and Data Interact
There are mainly two ways for variables to interact with data: Move and Clone.
Move
Multiple variables can interact with the same data in different ways in Rust:
let x = 5;
let y = x;
This program binds the value 5 to the variable x, then copies the value of x and assigns it to the variable y, so there are now two values of 5 on the stack. The data here is of a "primitive data" type, which does not need to be stored on the heap. For data that lives only on the stack, "moving" is simply copying, which costs no extra time or storage space. The "primitive data" types include:
- All integer types, such as i32, u32, i64, etc.
- The boolean type bool, with values true or false.
- All floating-point types, f32 and f64.
- The character type char.
- Tuples containing only the above types of data.
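The following sketch illustrates that values of the types listed above remain usable after being assigned to another variable. The function name `copies_stay_valid` is hypothetical.

```rust
// Returns values read from the original variables after each assignment.
fn copies_stay_valid() -> (i32, i32, bool) {
    let x = 5;
    let y = x; // the i32 is copied, so x remains valid

    let pair = (true, 'a');
    let pair2 = pair; // a tuple of copyable types is copied as a whole

    (x, y, pair == pair2) // x and pair are both still usable here
}

fn main() {
    let (x, y, same) = copies_stay_valid();
    println!("x = {}, y = {}, tuples equal: {}", x, y, same);
}
```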
But if the interacting data is on the heap, it is a different situation:
let s1 = String::from("hello");
let s2 = s1;
The first step creates a String object with the value "hello". The "hello" can be considered as data with an uncertain length, which needs to be stored on the heap.
The second step is slightly different (the following is a simplification, for reference only): after the assignment there are two String objects on the stack, each holding a pointer to the same "hello" string on the heap. Assigning to s2 copies only the data on the stack; the string on the heap is not duplicated.
As mentioned earlier, when a variable goes out of scope, Rust automatically calls the resource release function and cleans up that variable's heap memory. If both s1 and s2 were released, however, the "hello" on the heap would be freed twice (a double free), which is a memory error. To ensure safety, s1 becomes invalid as soon as it is assigned to s2. That is right: after s1 is assigned to s2, s1 can no longer be used. The following program is incorrect:
let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1); // Error! s1 is no longer valid
So the reality is that only s2 still points to the "hello" string on the heap; s1 is now invalid.
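One way to observe that a move copies only the stack part of a String is to compare the address of the heap buffer before and after the assignment. This is a sketch for illustration; the function name `move_keeps_heap_buffer` is hypothetical.

```rust
// Returns true if the heap buffer has the same address before and after the move.
fn move_keeps_heap_buffer() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // address of the heap data owned by s1

    let s2 = s1; // move: only the stack part (pointer, length, capacity) is copied

    let after = s2.as_ptr(); // address of the heap data now owned by s2
    before == after
}

fn main() {
    println!("same heap buffer after move: {}", move_keeps_heap_buffer());
}
```

The raw address stored in `before` is just a number and does not borrow s1, which is why the move on the next line is still allowed.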
---
## Cloning
Rust aims to minimize the runtime cost of a program, so data on the heap is moved by default rather than copied. If you simply need a copy of the data for another use, you can use the second way of interacting with data: cloning.
## Example
fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();
    println!("s1 = {}, s2 = {}", s1, s2);
}
Running result:
s1 = hello, s2 = hello
Here, the "hello" in the heap is truly copied, so both s1 and s2 are bound to a value separately, and they will be treated as two resources when released.
Of course, cloning should only be used when copying is necessary, as copying data incurs more time cost.
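As a sketch of what "truly copied" means, the following code checks that a clone has its own heap buffer and that changing the clone leaves the original untouched. The function name `clone_is_independent` is hypothetical.

```rust
// Returns the original, the modified clone, and whether their buffers differ.
fn clone_is_independent() -> (String, String, bool) {
    let s1 = String::from("hello");
    let mut s2 = s1.clone(); // deep copy: a second heap buffer is allocated

    let separate_buffers = s1.as_ptr() != s2.as_ptr();
    s2.push_str(", world"); // changing the clone leaves s1 untouched

    (s1, s2, separate_buffers)
}

fn main() {
    let (s1, s2, separate_buffers) = clone_is_independent();
    println!("s1 = {}, s2 = {}, separate buffers: {}", s1, s2, separate_buffers);
}
```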
---
## Ownership Mechanism Involving Functions
This is the most complex scenario for variables.
If a variable is passed as a function argument to another function, how is ownership safely handled?
The following program describes the principle of ownership in this situation:
## Example
fn main() {
    let s = String::from("hello"); // s is declared valid
    takes_ownership(s);
    // s's value is passed into the function
    // so s can be considered moved and is now invalid
    let x = 5;
    // x is declared valid
    makes_copy(x);
    // x's value is passed into the function
    // but x is a primitive type and remains valid
    // x can still be used here but not s
} // function ends, x is invalid, then s. But s is moved, so it doesn't need to be freed

fn takes_ownership(some_string: String) {
    // a String parameter some_string is passed in, valid
    println!("{}", some_string);
} // function ends, parameter some_string is freed here

fn makes_copy(some_integer: i32) {
    // an i32 parameter some_integer is passed in, valid
    println!("{}", some_integer);
} // function ends, parameter some_integer is a primitive type, no need to free
If a variable is passed as a function argument, the effect is the same as moving.
### Ownership Mechanism for Function Return Values
## Example
fn main() {
    let s1 = gives_ownership();
    // gives_ownership moves its return value to s1
    let s2 = String::from("hello");
    // s2 is declared valid
    let s3 = takes_and_gives_back(s2);
    // s2 is moved as an argument, s3 gets the return value ownership
} // s3 is invalid and freed, s2 is moved, s1 is invalid and freed.

fn gives_ownership() -> String {
    let some_string = String::from("hello");
    // some_string is declared valid
    return some_string;
    // some_string is moved as the return value out of the function
}

fn takes_and_gives_back(a_string: String) -> String {
    // a_string is declared valid
    a_string // a_string is moved as the return value out of the function
}
A value returned from a function has its ownership moved out of the function to the call site, so it is not invalidated and freed when the function ends.
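Because passing an argument moves it and returning a value moves it back, a function can hand ownership back to the caller together with a computed result. The sketch below returns both in a tuple; the function name `length_and_back` is hypothetical.

```rust
// Takes ownership of s, computes its length, and gives s back to the caller.
fn length_and_back(s: String) -> (String, usize) {
    let len = s.len();
    (s, len) // ownership of s moves out to the caller along with the length
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = length_and_back(s1); // s1 is moved in, s2 receives it back
    println!("The length of '{}' is {}.", s2, len);
}
```

This works, but threading ownership through every call is cumbersome when the function only needs to read the value.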
---
## References and Borrowing
References are a concept familiar to C++ developers.
If you are familiar with pointers, you can think of them as a kind of pointer.
In essence, "references" are an indirect way to access variables.
## Example
fn main() {
    let s1 = String::from("hello");
    let s2 = &s1;
    println!("s1 is {}, s2 is {}", s1, s2);
}
Running result:
s1 is hello, s2 is hello
The `&` operator can take a "reference" to a variable.
When a variable's value is referenced, the variable itself does not become invalid, because taking a reference neither copies nor moves the variable's value; the reference only points to it.
The same principle applies to function parameter passing:
## Example
```rust
fn main() {
    let s1 = String::from("hello");
    let len = calculate_length(&s1);
    println!("The length of '{}' is {}.", s1, len);
}

fn calculate_length(s: &String) -> usize {
    s.len()
}
```
Execution result:
The length of 'hello' is 5.
References do not take ownership of the value.
References can only borrow the value from its owner.
A reference itself is a type with a value, which records the location of another value, but the reference does not own the value it points to:
Example
fn main() {
let s1 = String::from("hello");
let s2 = &s1;
let s3 = s1;
println!("{}", s2);
}
This program is incorrect: s1 is still borrowed by s2 when its ownership is moved to s3, and s2 is used afterwards. Once s1 has given up ownership, a reference to s1 can no longer be used. If s2 needs to use the value, it must borrow again from the new owner:
Example
fn main() {
let s1 = String::from("hello");
let mut s2 = &s1;
let s3 = s1;
s2 = &s3; // Re-borrow ownership from s3
println!("{}", s2);
}
This program is correct.
Because a reference has no ownership, borrowing gives it only the right to use the value (much like renting a house).
Attempting to modify the data through an ordinary (immutable) borrow is rejected by the compiler:
Example
fn main() {
let s1 = String::from("run");
let s2 = &s1;
println!("{}", s2);
s2.push_str("oob"); // Error, forbidden to modify borrowed value
println!("{}", s2);
}
In this program, s2 attempts to modify the value of s1 and is rejected at compile time, because an immutable reference cannot be used to modify the owner's value.
Of course, there is also mutable borrowing. In the house-rental analogy, the landlord is allowed to alter the structure of the house and grants you the same right in the contract:
Example
fn main() {
let mut s1 = String::from("run");
// s1 is mutable
let s2 = &mut s1;
// s2 is a mutable reference
s2.push_str("oob");
println!("{}", s2);
}
This program compiles and runs without problems. `&mut` marks a mutable reference type.
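Mutable borrowing applies to function parameters in the same way. As a minimal sketch, the hypothetical helper `append_suffix` below modifies the caller's string through a `&mut String` parameter without taking ownership of it.

```rust
// Hypothetical helper: appends a suffix through a mutable reference.
fn append_suffix(target: &mut String, suffix: &str) {
    target.push_str(suffix);
}

fn main() {
    let mut s1 = String::from("run");
    append_suffix(&mut s1, "oob"); // lend s1 mutably for the duration of the call
    println!("{}", s1); // s1 still owns the string and sees the change
}
```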
Mutable references differ from immutable ones in more than permissions: a value may have any number of immutable references at the same time, but only one mutable reference:
Example
let mut s = String::from("hello");
let r1 = &mut s;
let r2 = &mut s;
println!("{}, {}", r1, r2);
This program is incorrect because there are multiple mutable references to s.
Rust restricts mutable references in this way mainly to prevent data races in concurrent code, and it does so at compile time.
A data race requires that the data is written by at least one party while being read or written by at least one other, so a value cannot be referenced again while it is mutably referenced.
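The restriction applies only to borrows that are alive at the same time. The following sketch, with the hypothetical function name `sequential_borrows`, shows that mutable borrows may follow one another, and that several immutable references may coexist once no mutable borrow is in use.

```rust
// Returns the final string and the combined length read through two references.
fn sequential_borrows() -> (String, usize) {
    let mut s = String::from("hello");
    {
        let r1 = &mut s;
        r1.push_str(", world");
    } // r1 goes out of scope, ending the first mutable borrow

    let r2 = &mut s; // a second mutable borrow is fine now that r1 is gone
    r2.push('!');

    let a = &s;
    let b = &s; // several immutable references may coexist
    let total = a.len() + b.len();

    (s, total)
}

fn main() {
    let (s, total) = sequential_borrows();
    println!("{} (combined length read twice: {})", s, total);
}
```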
Dangling References
This concept goes by other names elsewhere. In a programming language with pointers, it corresponds to a pointer that does not point to data that can actually be accessed (not necessarily a null pointer; it may also point to a resource that has already been released). Such a pointer is like a rope with nothing hanging from it, hence the term "dangling reference."
"Dangling references" are not allowed in Rust, and if they occur, the compiler will detect them.
Here is a typical example of a dangling reference:
Example
fn main() {
let reference_to_nothing = dangle();
}
fn dangle() -> &String {
let s = String::from("hello");
&s
}
Clearly, when the dangle function ends, the value of its local variable s is not returned and is therefore released. A reference to it is returned, however, and that reference would no longer point to a value that is guaranteed to exist, so the compiler rejects the program.
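One common way to avoid the problem is to return the String itself rather than a reference to it, so that ownership moves out to the caller. A minimal sketch, using the hypothetical function name `no_dangle`:

```rust
// Returns the String by value, so nothing is freed when the function ends.
fn no_dangle() -> String {
    let s = String::from("hello");
    s // ownership of s moves out to the caller
}

fn main() {
    let owned = no_dangle();
    println!("{}", owned); // owned is valid: it is the new owner of the value
}
```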