Skip to content

Commit

Permalink
Merge pull request #1562 from chorman0773/spec-add-identifiers-behavi…
Browse files Browse the repository at this point in the history
…our-considered-undefined

Add spec identifiers to behaviour-considered-undefined.md
  • Loading branch information
traviscross authored Oct 21, 2024
2 parents cf88463 + a8b8c49 commit 369f949
Showing 1 changed file with 92 additions and 12 deletions.
104 changes: 92 additions & 12 deletions src/behavior-considered-undefined.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,49 @@
## Behavior considered undefined

r[undefined]

r[undefined.general]
Rust code is incorrect if it exhibits any of the behaviors in the following
list. This includes code within `unsafe` blocks and `unsafe` functions.
`unsafe` only means that avoiding undefined behavior is on the programmer; it
does not change anything about the fact that Rust programs must never cause
undefined behavior.

r[undefined.soundness]
It is the programmer's responsibility when writing `unsafe` code to ensure that
any safe code interacting with the `unsafe` code cannot trigger these
behaviors. `unsafe` code that satisfies this property for any safe client is
called *sound*; if `unsafe` code can be misused by safe code to exhibit
undefined behavior, it is *unsound*.

> [!WARNING]
> The following list is not exhaustive; it may grow or shrink. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. We also reserve the right to make some of the behavior in that list defined in the future. In other words, this list does not say that anything will *definitely* always be undefined in all future Rust version (but we might make such commitments for some list items in the future).
>
> Please read the [Rustonomicon] before writing unsafe code.
<div class="warning">

***Warning:*** The following list is not exhaustive; it may grow or shrink.
There is no formal model of Rust's semantics for what is and is not allowed in
unsafe code, so there may be more behavior considered unsafe. We also reserve
the right to make some of the behavior in that list defined in the future. In
other words, this list does not say that anything will *definitely* always be
undefined in all future Rust version (but we might make such commitments for
some list items in the future).

Please read the [Rustonomicon] before writing unsafe code.

</div>

r[undefined.race]
* Data races.

r[undefined.pointer-access]
* Accessing (loading from or storing to) a place that is [dangling] or [based on
a misaligned pointer].

r[undefined.place-projection]
* Performing a place projection that violates the requirements of [in-bounds
pointer arithmetic](pointer#method.offset). A place projection is a [field
expression][project-field], a [tuple index expression][project-tuple], or an
[array/slice index expression][project-slice].

r[undefined.alias]
* Breaking the [pointer aliasing rules]. `Box<T>`, `&mut T` and `&T` follow
LLVM’s scoped [noalias] model, except if the `&T` contains an
[`UnsafeCell<U>`]. References and boxes must not be [dangling] while they are
Expand All @@ -40,23 +60,37 @@ undefined behavior, it is *unsound*.
All this also applies when values of these
types are passed in a (nested) field of a compound type, but not behind
pointer indirections.

r[undefined.immutable]
* Mutating immutable bytes.
All bytes reachable through a [const-promoted] expression are immutable, as well as bytes reachable through borrows in `static` and `const` initializers that have been [lifetime-extended] to `'static`.
The bytes owned by an immutable binding or immutable `static` are immutable, unless those bytes are part of an [`UnsafeCell<U>`].

Moreover, the bytes [pointed to] by a shared reference, including transitively through other references (both shared and mutable) and `Box`es, are immutable; transitivity includes those references stored in fields of compound types.

A mutation is any write of more than 0 bytes which overlaps with any of the relevant bytes (even if that write does not change the memory contents).

r[undefined.intrinsic]
* Invoking undefined behavior via compiler intrinsics.

r[undefined.target-feature]
* Executing code compiled with platform features that the current platform
does not support (see [`target_feature`]), *except* if the platform explicitly documents this to be safe.

r[undefined.call]
* Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.

r[undefined.invalid]
* Producing an [invalid value][invalid-values]. "Producing" a
value happens any time a value is assigned to or read from a place, passed to
a function/primitive operation or returned from a function/primitive
operation.

r[undefined.asm]
* Incorrect use of inline assembly. For more details, refer to the [rules] to
follow when writing code that uses inline assembly.

r[undefined.const-transmute-ptr2int]
* **In [const context](const_eval.md#const-context)**: transmuting or otherwise
reinterpreting a pointer (reference, raw pointer, or function pointer) into
some allocated object as a non-pointer type (such as integers).
Expand All @@ -71,11 +105,15 @@ undefined behavior, it is *unsound*.
### Pointed-to bytes

r[undefined.pointed-to]
The span of bytes a pointer or reference "points to" is determined by the pointer value and the size of the pointee type (using `size_of_val`).

### Places based on misaligned pointers
[based on a misaligned pointer]: #places-based-on-misaligned-pointers

r[undefined.misaligned]

r[undefined.misaligned.general]
A place is said to be "based on a misaligned pointer" if the last `*` projection
during place computation was performed on a pointer that was not aligned for its
type. (If there is no `*` projection in the place expression, then this is
Expand All @@ -93,75 +131,117 @@ alignment 1). In other words, the alignment requirement derives from the type of
the pointer that was dereferenced, *not* the type of the field that is being
accessed.

Note that a place based on a misaligned pointer only leads to undefined behavior
when it is loaded from or stored to. `&raw const`/`&raw mut` on such a place
is allowed. `&`/`&mut` on a place requires the alignment of the field type (or
r[undefined.misaligned.load-store]
Note that a place based on a misaligned pointer only leads to Undefined Behavior
when it is loaded from or stored to.

r[undefined.misaligned.raw]
`&raw const`/`&raw mut` on such a place is allowed.

r[undefined.misaligned.reference]
`&`/`&mut` on a place requires the alignment of the field type (or
else the program would be "producing an invalid value"), which generally is a
less restrictive requirement than being based on an aligned pointer. Taking a
reference will lead to a compiler error in cases where the field type might be
less restrictive requirement than being based on an aligned pointer.

r[undefined.misaligned.packed]
Taking a reference will lead to a compiler error in cases where the field type might be
more aligned than the type that contains it, i.e., `repr(packed)`. This means
that being based on an aligned pointer is always sufficient to ensure that the
new reference is aligned, but it is not always necessary.

### Dangling pointers
[dangling]: #dangling-pointers

r[undefined.dangling]

r[undefined.dangling.general]
A reference/pointer is "dangling" if not all of the bytes it
[points to] are part of the same live allocation (so in particular they all have to be
part of *some* allocation).

r[undefined.dangling.zero-size]
If the size is 0, then the pointer is trivially never "dangling"
(even if it is a null pointer).

r[undefined.dangling.dynamic-size]
Note that dynamically sized types (such as slices and strings) point to their
entire range, so it is important that the length metadata is never too large. In
particular, the dynamic size of a Rust value (as determined by `size_of_val`)
entire range, so it is important that the length metadata is never too large.

r[undefined.dangling.alloc-limit]
In particular, the dynamic size of a Rust value (as determined by `size_of_val`)
must never exceed `isize::MAX`, since it is impossible for a single allocation
to be larger than `isize::MAX`.

### Invalid values
[invalid-values]: #invalid-values

r[undefined.validity]

r[undefined.validity.general]
The Rust compiler assumes that all values produced during program execution are
"valid", and producing an invalid value is hence immediate UB.

Whether a value is valid depends on the type:

r[undefined.validity.bool]
* A [`bool`] value must be `false` (`0`) or `true` (`1`).

r[undefined.validity.fn-pointer]
* A `fn` pointer value must be non-null.

r[undefined.validity.char]
* A `char` value must not be a surrogate (i.e., must not be in the range `0xD800..=0xDFFF`) and must be equal to or less than `char::MAX`.

r[undefined.validity.never]
* A `!` value must never exist.

r[undefined.validity.scalar]
* An integer (`i*`/`u*`), floating point value (`f*`), or raw pointer must be
initialized, i.e., must not be obtained from [uninitialized memory][undef].

r[undefined.validity.str]
* A `str` value is treated like `[u8]`, i.e. it must be initialized.

r[undefined.validity.enum]
* An `enum` must have a valid discriminant, and all fields of the variant indicated by that discriminant must be valid at their respective type.

r[undefined.validity.struct]
* A `struct`, tuple, and array requires all fields/elements to be valid at their respective type.

r[undefined.validity.union]
* For a `union`, the exact validity requirements are not decided yet.
Obviously, all values that can be created entirely in safe code are valid.
If the union has a zero-sized field, then every possible value is valid.
Further details are [still being debated](https://github.com/rust-lang/unsafe-code-guidelines/issues/438).

r[undefined.validity.reference-box]
* A reference or [`Box<T>`] must be aligned, it cannot be [dangling], and it must point to a valid value
(in case of dynamically sized types, using the actual dynamic type of the
pointee as determined by the metadata).
Note that the last point (about pointing to a valid value) remains a subject of some debate.

r[undefined.validity.wide]
* The metadata of a wide reference, [`Box<T>`], or raw pointer must match
the type of the unsized tail:
* `dyn Trait` metadata must be a pointer to a compiler-generated vtable for `Trait`.
(For raw pointers, this requirement remains a subject of some debate.)
* Slice (`[T]`) metadata must be a valid `usize`.
Furthermore, for wide references and [`Box<T>`], slice metadata is invalid
if it makes the total size of the pointed-to value bigger than `isize::MAX`.

r[undefined.validity.valid-range]
* If a type has a custom range of a valid values, then a valid value must be in that range.
In the standard library, this affects [`NonNull<T>`] and [`NonZero<T>`].

> **Note**: `rustc` achieves this with the unstable
> `rustc_layout_scalar_valid_range_*` attributes.
r[undefined.validity.undef]
**Note:** Uninitialized memory is also implicitly invalid for any type that has
a restricted set of valid values. In other words, the only cases in which
reading uninitialized memory is permitted are inside `union`s and in "padding"
(the gaps between the fields of a type).


[`bool`]: types/boolean.md
[`const`]: items/constant-items.md
[noalias]: http://llvm.org/docs/LangRef.html#noalias
Expand Down

0 comments on commit 369f949

Please sign in to comment.