Invalid expression discarding when using semicolons #317

CohenArthur · 2021-03-28T10:17:16Z

rustc does not produce any warnings for the following code:

fn main() {
    8;
}

(which is very different from this code, which obviously does not compile with rustc)

fn main() {
    8
}

What's happening is that, despite 8 being an "expression", the trailing semicolon allows its value to go unused. This is similar to something like

fn not_void() -> i32 {
    8
}

fn main() {
    not_void();
}

where discarding the return value of a function is fine, unless said function or type is marked with #[must_use].
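As a small sketch of the discarding behavior described above: `#[must_use]` is a real Rust attribute, while `important` and `discard_examples` are hypothetical names introduced only for this illustration.

```rust
// Plain function: its i32 result may be silently discarded.
fn not_void() -> i32 {
    8
}

// Hypothetical function marked #[must_use]: discarding its result
// with a bare `important();` triggers the `unused_must_use` lint.
#[must_use]
fn important() -> i32 {
    8
}

fn discard_examples() -> i32 {
    // Fine: the semicolon turns the call into a statement and drops the i32.
    not_void();

    // Binding to `_` discards a #[must_use] value explicitly, without a warning.
    let _ = important();

    // No semicolon on the last line, so this is the value of the block.
    not_void() + important()
}
```

Calling `discard_examples()` evaluates both discarded calls and then yields the sum from the unterminated last line.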

However, gccrs does not seem to be able to discard the value of an expression (thus turning said expression into a "statement"). Therefore, the snippets of code above fail with the following error:

/tmp/main.rs:1:1: error: expected [()] got [<integer>] # or `got [i32]` if we use the `not_void()` function
    1 | fn main() {
      | ^

I haven't found any existing issues on the subject, but I might not have looked hard enough! If so, I apologize.


github-actions · 2021-03-28T10:17:54Z

Thanks for your contribution fellow Rustacean

YizhePKU · 2021-03-30T01:34:52Z

I checked with a debugger. The syntax is parsed correctly (8 is parsed as ExprStmtWithoutBlock, not as the final expression of the block). The error happens during type resolution.

YizhePKU · 2021-03-30T03:55:15Z

The offending code is at rust-hir-type-check.cc:82. The implemented logic is that, if no trailing expression exists, the type of the last statement will be used as the type of the block, which is incorrect. The correct behavior is to use () as the type of the block whenever there's no trailing expression.

Excerpt of the offending code:

void
TypeCheckExpr::visit (HIR::BlockExpr &expr)
{
  TyTy::BaseType *block_tyty
    = new TyTy::TupleType (expr.get_mappings ().get_hirid ());

  expr.iterate_stmts ([&] (HIR::Stmt *s) mutable -> bool {
    bool is_final_stmt = expr.is_final_stmt (s);
    bool has_final_expr = expr.has_expr () && expr.tail_expr_reachable ();
    bool stmt_is_final_expr = is_final_stmt && !has_final_expr;

    auto resolved = TypeCheckStmt::Resolve (s, inside_loop);
    if (resolved == nullptr)
      {
	rust_error_at (s->get_locus_slow (), "failure to resolve type");
	return false;
      }

    if (stmt_is_final_expr)
      {
	delete block_tyty;
	block_tyty = resolved;
      }
    else if (!resolved->is_unit ())
      {
	rust_error_at (s->get_locus_slow (), "expected () got %s",
		       resolved->as_string ().c_str ());
      }

    return true;
  });

  if (expr.has_expr ())
    {
      delete block_tyty;

      block_tyty
	= TypeCheckExpr::Resolve (expr.get_final_expr ().get (), inside_loop);
    }

  infered = block_tyty->clone ();
}
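As a rough illustration of the rule described above (the block has type () whenever there is no trailing expression), here is a toy model in Rust. It is not gccrs code: `Ty`, `Stmt`, `Block` and `block_type` are all hypothetical names, and the model ignores diverging statements such as `return`.

```rust
// Toy model only; none of these types exist in gccrs.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Unit,
    I32,
}

enum Stmt {
    // An expression followed by `;`: its value is discarded.
    ExprWithSemi(Ty),
    // A `let` binding never contributes a value.
    Let,
}

struct Block {
    stmts: Vec<Stmt>,
    // The trailing expression, present only if the block does not end in `;`.
    tail: Option<Ty>,
}

fn block_type(block: &Block) -> Ty {
    for stmt in &block.stmts {
        match stmt {
            // The statement's own type is dropped here; it must never
            // leak into the type of the enclosing block.
            Stmt::ExprWithSemi(_discarded) => {}
            Stmt::Let => {}
        }
    }
    // Only the trailing expression decides the block's type.
    block.tail.clone().unwrap_or(Ty::Unit)
}
```

In this model, a block holding only `8;` has an empty `tail` and therefore type `Ty::Unit`, regardless of the type of its last statement.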

@CohenArthur Maybe you'll be interested in implementing a fix? It should help you get comfortable with the codebase too.

CohenArthur · 2021-03-30T06:18:54Z

Wow, thanks for the head start @YizhePKU! I hadn't even started to research the issue yet. I'd love to implement that fix.

philberty · 2021-03-30T08:54:46Z

I've known about this bug but haven't decided whether it should be fixed in the parser, by making the ExprStmtWithoutBlock the final expression, or handled as part of HIR lowering. The other side of this problem is that an ExprStmtWithBlock can itself be the final expression, such as:

fn main() -> i32 {
    let a = 1;
    if a > 10 {
        123
    } else {
        456
    }
}

philberty · 2021-03-30T08:55:00Z

@SimplyTheOther do you have any opinions on this?

CohenArthur · 2021-03-30T08:57:40Z

> I've known about this bug but haven't decided whether it should be fixed in the parser, by making the ExprStmtWithoutBlock the final expression, or handled as part of HIR lowering. The other side of this problem is that an ExprStmtWithBlock can itself be the final expression, such as:
>
> fn main() -> i32 {
>     let a = 1;
>     if a > 10 {
>         123
>     } else {
>         456
>     }
> }

In that case, wouldn't the type of the ExprStmtWithBlock resolve as an integer?

philberty · 2021-03-30T09:00:03Z

Yes, I am just showing that the final expr can be an ExprStmtWithBlock. A more awkward example is this one, from the testcase deadcode1.rs:

fn test2(x: i32) -> i32 {
    if x > 1 {
        return 5;
    } else {
        return 0;
    }
    return 1;
}

Usually, any statement that is not the final expr should be checked to have UnitType, but because the final return here is unreachable, that is not required.
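For reference, a sketch of how this behaves under rustc. `test2` is the testcase above with the dead-code lint silenced; `test3` is a hypothetical variant, added only for illustration, in which the diverging `if` is itself the final expression.

```rust
// The testcase from above: rustc accepts it and only warns that the
// trailing `return 1;` is unreachable (the warning is silenced here).
#[allow(unreachable_code)]
fn test2(x: i32) -> i32 {
    if x > 1 {
        return 5;
    } else {
        return 0;
    }
    return 1;
}

// Hypothetical variant: both branches end in `return ...;`, so the `if`
// diverges and is accepted as the final expression of an i32 function.
fn test3(x: i32) -> i32 {
    if x > 1 {
        return 5;
    } else {
        return 0;
    }
}
```

In both functions the branch blocks have no trailing expression, yet neither is required to have type () because control never falls out of them.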

YizhePKU · 2021-03-30T10:52:37Z

Related discussion from rustc: rust-lang/rust#61733

YizhePKU · 2021-03-30T11:42:14Z

After reading the reference and various discussions, here's my understanding:

- There are two types of expressions: ExprWithoutBlock and ExprWithBlock.
- Every expression can be made into a statement by appending a semicolon.
- However, in the case of ExprWithBlock, the semicolon can be omitted for convenience if:
  - Its result has type ()
  - It is not the last expression of a block (in this case, if you omit the semicolon, it becomes the trailing expression)
- If an expression (either ExprWithoutBlock or ExprWithBlock) appears at the end of a block, it is treated as the trailing expression.
  - If you terminate it with a semicolon, it's no longer an expression, so this rule doesn't apply.
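The rules above can be seen side by side in a short Rust function. This is only an illustrative sketch; `classify` and its variable names are hypothetical.

```rust
fn classify(n: i32) -> &'static str {
    let mut seen = 0;

    // ExprWithoutBlock: needs the semicolon to become a statement.
    seen += 1;

    // ExprWithBlock of type (): used as a statement, semicolon omitted.
    if n < 0 {
        seen += 1;
    }

    // ExprWithBlock terminated by a semicolon: its &str value is discarded.
    if seen > 1 { "negative" } else { "non-negative" };

    // Last in the block and not terminated: this is the trailing
    // expression, so its value becomes the value of the function body.
    if n > 10 { "big" } else { "small" }
}
```

Only the last `if` contributes to the result, so `classify(20)` evaluates to "big" even though the earlier `if` expressions also ran.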

rustc currently handles this by distinguishing AST statements with and without semicolons (doc). This information is then removed during the AST-to-HIR lowering process. We could follow rustc, but our current approach of handling this in the parser seems cleaner.

SimplyTheOther · 2021-04-01T07:50:04Z

My understanding of it is that code like this:

fn main() {
    8;
}

should be parsed as ExprStmtWithoutBlock, and that the return type of the function should be ().

For code like this:

fn main() -> i32 {
    let a = 1;
    if a > 10 {
        123
    } else {
        456
    }
}

My understanding was that this was technically not allowed. According to the Rust reference, only ExprWithoutBlock is allowed as the "final expression" - if expressions should be parsed as ExprStmtWithBlock in this context, and hence would not contribute to implicit returns or the function return type.

On the other hand, I tested that code in Rust Playground, and apparently it works fine. So this actually seems to me to be a type resolution error as a result of a parser error as a result of an error in the Rust reference.

EDIT: there is an open issue on the Rust reference regarding this. The issue implies that poor wording may have caused a misinterpretation on my part.

CohenArthur · 2021-04-02T10:21:45Z

It seems that rustc does not represent the unit type directly, but rather considers it as a tuple with 0 elements.
Here is the TyKind enum in the AST, with the is_unit method, which basically checks whether the type is a Tuple and is empty. This makes sense, considering that a tuple type such as (i32, u8, T) becomes the unit type () when it has no elements. However, I believe that it would be clearer to have an explicitly defined unit type in the TypeKind enum. Which approach would you prefer: checking the TUPLE type for emptiness, or adding a new UNIT type?
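A minimal sketch of the "unit is an empty tuple" representation, written in Rust. This is a simplified, hypothetical enum for illustration; it is not the actual rustc TyKind or the gccrs TypeKind definition.

```rust
// Simplified stand-in for a type-kind enum; variant names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum TyKind {
    I32,
    U8,
    // A tuple of element types; the empty tuple doubles as the unit type.
    Tuple(Vec<TyKind>),
}

impl TyKind {
    // Builds the unit type `()` as a tuple with no elements.
    fn unit() -> TyKind {
        TyKind::Tuple(Vec::new())
    }

    // A type is the unit type exactly when it is a tuple with no elements.
    fn is_unit(&self) -> bool {
        matches!(self, TyKind::Tuple(elems) if elems.is_empty())
    }
}
```

With this representation no separate UNIT variant is needed, at the cost of every unit check going through the tuple case.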

dkm · 2021-04-02T10:28:36Z

#286 :D
I did this more or less blindly to discover the code so have no opinion on this...

CohenArthur · 2021-04-02T10:45:55Z

> #286 :D
> I did this more or less blindly to discover the code so have no opinion on this...

Haha what a coincidence! Well, in the end I used an empty Tuple as well. I have a fix ready and I'm cleaning it up and adding tests. This way it's closer to the rustc implementation, but I think we should also add documentation to clear that up.

CohenArthur · 2021-04-02T22:39:15Z

To add to the complexity of the issue:

Both

fn function() -> i32 {
    return 15;
    return 1.2;
}

and

fn function() -> i32 {
    return 1.2;
    return 15;
}

produce an error, meaning that the typechecker probably knows about the "expected" return type of a function block when typechecking all the statements inside said block. This is a bit different from the approach currently taken by gccrs, and needs a bit more modification than I thought.

The dilemma I'm currently facing is the following. Consider the following blocks:

{ // 1
    8;
    8
} // -> implicit return of an i32
{ // 2
    8;
    8;
} // no last expression, therefore i32 type is "discarded" and () is returned
{ // 3
    8;
    return 8;
} // no last expression, but last statement is a return, so returns an i32

You can't simply discard the types of every statement in the block, and you can't simply use the type of the last statement, because sometimes it needs to be discarded!
I wonder what the best approach to fixing this would be. Checking whether the statement is a Return statement would be enough to circumvent the issue exposed by the 3 blocks mentioned above, but wouldn't be enough to exactly reproduce the behavior of the rustc compiler in the first two snippets of code.
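To make the target behavior concrete, the three blocks can be wrapped in functions and compiled with rustc. The function names are hypothetical; the bodies are the blocks from above.

```rust
// Block 1: the trailing `8` is the block's value, so the body has type i32.
fn block1() -> i32 {
    8;
    8
}

// Block 2: every `8` is terminated, so the body has type ().
fn block2() {
    8;
    8;
}

// Block 3: no trailing expression, but the `return` supplies the i32.
fn block3() -> i32 {
    8;
    return 8;
}
```

All three are accepted by rustc with the declared return types, which is the behavior a fix in gccrs would need to match.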

philberty · 2021-04-26T10:34:31Z

@lrh2000 is this fixed now?

lrh2000 · 2021-04-26T23:12:25Z

> @lrh2000 is this fixed now?

Yes, we should close this issue. (Fixed in #380.)

philberty added the bug label Mar 30, 2021

philberty added this to the Data Structures 2 - Generics milestone Mar 30, 2021

philberty added the question label Mar 30, 2021

philberty mentioned this issue Apr 11, 2021

Fix issues about block expression evaluation #364 (Closed)

philberty linked a pull request Apr 12, 2021 that will close this issue

Fix issues about block expression evaluation #364 (Closed)

philberty removed a link to a pull request Apr 14, 2021

Fix issues about block expression evaluation #364 (Closed)

philberty removed this from the Data Structures 2 - Generics milestone Apr 14, 2021

philberty mentioned this issue Apr 17, 2021

Fix issues about block expression evaluation #380 (Merged)

philberty closed this as completed Apr 29, 2021
