Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid expression discarding when using semicolons #317

Closed
CohenArthur opened this issue Mar 28, 2021 · 17 comments
Closed

Invalid expression discarding when using semicolons #317

CohenArthur opened this issue Mar 28, 2021 · 17 comments

Comments

@CohenArthur
Copy link
Member

rustc does not produce any warnings for the following code:

fn main() {
    8;
}

(which is very different from this code, which does not compile with rustc either, obviously)

fn main() {
    8
}

What's happening is that, despite 8 being an "expression", the use of a semicolon afterwards allows not using it. This would be similar to something like

fn not_void() -> i32 {
    8
}

fn main() {
    not_void();
}

where discarding the return value of a function is fine, unless said function or type is marked with #[must_use].

However, gccrs does not seem to be able to discard the value of an expression (and thus transforming said expression into a "statement"). Therefore, the snippets of code above fail with the following error:

/tmp/main.rs:1:1: error: expected [()] got [<integer>] # or `got [i32]` if we use the `not_void()` function
    1 | fn main() {
      | ^

I haven't found any issues about the subject, but might not have looked hard enough! If so, I apologize

@github-actions
Copy link

Thanks for your contribution fellow Rustacean

@YizhePKU
Copy link
Contributor

YizhePKU commented Mar 30, 2021

I checked with a debugger. The syntax is parsed correctly(8 is parsed as ExprStmtWithoutBlock, not as the final expression of the block). The error happened in the stage of type resolution.

@YizhePKU
Copy link
Contributor

YizhePKU commented Mar 30, 2021

The offending code is at rust-hir-type-check.cc:82. The implemented logic is that, if no trailing expression exists, the type of the last statement will be used as the type of the block, which is incorrect. The correct behavior is to use () as the type of the block whenever there's no trailing expression.

Excerpt of the offending code:

void
TypeCheckExpr::visit (HIR::BlockExpr &expr)
{
  TyTy::BaseType *block_tyty
    = new TyTy::TupleType (expr.get_mappings ().get_hirid ());

  expr.iterate_stmts ([&] (HIR::Stmt *s) mutable -> bool {
    bool is_final_stmt = expr.is_final_stmt (s);
    bool has_final_expr = expr.has_expr () && expr.tail_expr_reachable ();
    bool stmt_is_final_expr = is_final_stmt && !has_final_expr;

    auto resolved = TypeCheckStmt::Resolve (s, inside_loop);
    if (resolved == nullptr)
      {
	rust_error_at (s->get_locus_slow (), "failure to resolve type");
	return false;
      }

    if (stmt_is_final_expr)
      {
	delete block_tyty;
	block_tyty = resolved;
      }
    else if (!resolved->is_unit ())
      {
	rust_error_at (s->get_locus_slow (), "expected () got %s",
		       resolved->as_string ().c_str ());
      }

    return true;
  });

  if (expr.has_expr ())
    {
      delete block_tyty;

      block_tyty
	= TypeCheckExpr::Resolve (expr.get_final_expr ().get (), inside_loop);
    }

  infered = block_tyty->clone ();
}

@CohenArthur Maybe you'll be interested in implementing a fix? It should help you get comfortable with the codebase too.

@CohenArthur
Copy link
Member Author

Wow thanks for the headstart @YizhePKU ! I hadn't even started to research the issue yet. I'd love to implement that fix

@philberty
Copy link
Member

I've known about this bug but haven't decided if the bug is in the parser to make the ExprStmtWithoutBlock the final expression or if this can be handled as part of HIR lowering. The other side of this problem is you can have an ExprStmtWithBlock but this is the final expression such as:

fn main() -> i32 {
    let a = 1;
    if a > 10 {
        123
    } else {
       456
    }
}

@philberty
Copy link
Member

@SimplyTheOther do you have any opinions on this?

@CohenArthur
Copy link
Member Author

I've known about this bug but haven't decided if the bug is in the parser to make the ExprStmtWithoutBlock the final expression or if this can be handled as part of HIR lowering. The other side of this problem is you can have an ExprStmtWithBlock but this is the final expression such as:

fn main() -> i32 {
    let a = 1;
    if a > 10 {
        123
    } else {
       456
    }
}

In that case, wouldn't the type of the ExprStmtWithBlock resolve as an integer?

@philberty
Copy link
Member

Yes I am just showing that the final expr can be an ExprStmtWithBlock. Another more awkward example is this one that is difficult:

This is a testcase deadcode1.rs

fn test2(x: i32) -> i32 {
    if x > 1 {
        return 5;
    } else {
        return 0;
    }
    return 1;
}

Usually anything that is not the final expr should ensure is UnitType but because the final return is unreachable it is not required.

@YizhePKU
Copy link
Contributor

Related discussion from rustc: rust-lang/rust#61733

@YizhePKU
Copy link
Contributor

After reading the reference and various discussions, here's my understanding:

  • There're two types of expressions: ExprWithoutBlock and ExprWithBlock
  • Every expression can be made into a statement by appending a semicolon
  • However, in case of ExprWithBlock, the semicolon can be omitted for convenience if
    • Its result has type ()
    • It is not the last expression of a block(in this case if you omit the semicolon it becomes the trailing expression)
  • If an expression(either ExprWithoutBlock or ExprWithBlock) appears at the end of a block, it is treated as the trailing expression
    • If you terminate it with a semicolon it's no longer an expression, so this rule doesn't apply

rustc currently handles this by distinguishing AST statements with and without semicolons(doc). This information is then removed during the AST-to-HIR lowering process. We could follow rustc, but our current approaching of handling this in the parser seems cleaner.

@SimplyTheOther
Copy link
Member

SimplyTheOther commented Apr 1, 2021

My understanding of it is that code like this:

fn main() {
    8;
}

should be parsed as ExprStmtWithoutBlock, and that the return type of the function should be ().

For code like this:

fn main() -> i32 {
    let a = 1;
    if a > 10 {
        123
    } else {
       456
    }
}

My understanding was that this was technically not allowed. According to the Rust reference, only ExprWithoutBlock is allowed as the "final expression" - if expressions should be parsed as ExprStmtWithBlock in this context, and hence would not contribute to implicit returns or the function return type.

On the other hand, I tested that code in Rust Playground, and apparently it works fine. So this actually seems to me to be a type resolution error as a result of a parser error as a result of an error in the Rust reference.

EDIT: there is an open issue on the Rust reference regarding this. The issue implies that poor wording may have caused a misinterpretation on my part.

@CohenArthur
Copy link
Member Author

It seems that rustc does not represent the unit type directly, but rather considers it as a tuple with 0 elements.
Here is the TyKind enum in the AST, with the is_unit method which basically checks if the type is a Tuple and is empty. This makes sense, considering that a tuple (i32, u8, T) is indeed the unit type () when there are no elements. However, I believe that it would be clearer to have a clearly defined unit type in the TypeKind enum. What would you rather the approach be? Check against the emptiness of the TUPLE type or add a new UNIT type?

@dkm
Copy link
Member

dkm commented Apr 2, 2021

#286 :D
I did this more or less blindly to discover the code so have no opinion on this...

@CohenArthur
Copy link
Member Author

CohenArthur commented Apr 2, 2021

#286 :D
I did this more or less blindly to discover the code so have no opinion on this...

Haha what a coincidence! Well in the end I used an empty Tuple as well, I have a fix ready and I'm cleaning it up and adding tests. This way it's closer to the rustc implementation, but I think we should also add documentation to clear that up

@CohenArthur
Copy link
Member Author

To add to the complexity of the issue:

Both

fn function() -> i32 {
    return 15;
    return 1.2;
}

and

fn function() -> i32 {
    return 1.2;
    return 15;
}

Produce an error, meaning that the typechecker probably knows about the "expected" return type of a function block when typechecking all the statements inside said block. This is a bit different to the approach actually taken from gccrs, and needs a bit more modification than I thought.

The dilemma I'm currently facing is the following. Consider the following blocks:

{ // 1
    8;
    8
} // -> implicit return of an i32
{ // 2
    8;
    8;
} // no last expression, therefore i32 type is "discarded" and () is returned
{ // 3
    8;
    return 8;
} // no last expression, but last statement is a return, so returns an i32

You can't simply discard the types of every statement in the block, and you can't simply use the type of the last statement, because sometimes it needs to be discarded!
I wonder what the best approach to fixing this would be. Checking if the statement is a Return statement would be enough to circumvent the issue exposed by the 3 blocks mentionned before-hand, but wouldn't be enough to exactly reproduce the behavior of the rustc compiler in the first two snippets of code.

@philberty
Copy link
Member

@lrh2000 is this fixed now?

@lrh2000
Copy link
Contributor

lrh2000 commented Apr 26, 2021

@lrh2000 is this fixed now?

Yes, we should close this issue. (Fixed in #380.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants