Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double-free related to arith.select and reference counters #911

Open
pdamme opened this issue Nov 12, 2024 · 0 comments
Open

Double-free related to arith.select and reference counters #911

pdamme opened this issue Nov 12, 2024 · 0 comments
Assignees
Labels
bug A mistake in the code.

Comments

@pdamme
Copy link
Collaborator

pdamme commented Nov 12, 2024

The following DaphneDSL script crashes with a segfault:

test.daphne:

# Update a matrix inside then-branch of if-statement with value computed outside if-statement
# (scf.if gets rewritten to arith.select).
X = [0];
Y = X + 1;
# r must be unknown at compile-time (otherwise, the entire "if" gets optimized away)
r = as.scalar(rand(1, 1, 0, 1, 1, -1));
if(r)
    X = Y;
print(X);

output of bin/daphne test.daphne:

bin/daphne(+0x172dea2)[0x5a36c5d44ea2]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7774f506f090]
[error]: Got an abort signal from the execution engine. Most likely an exception in a shared library. Check logs!
Execution error: Returning from signal 11

The reason becomes clear when looking at the IR after managing object reference counters:

output of bin/daphne --explain obj_ref_mgnt test.daphne:

IR after managing object references:
module {
  func.func @main() {
    %0 = "daphne.constant"() {value = 0 : si64} : () -> si64
    %1 = "daphne.constant"() {value = 94015972571104 : ui64} : () -> ui64
    %2 = "daphne.constant"() {value = 1 : si64} : () -> si64
    %3 = "daphne.constant"() {value = true} : () -> i1
    %4 = "daphne.constant"() {value = false} : () -> i1
    %5 = "daphne.constant"() {value = -1 : si64} : () -> si64
    %6 = "daphne.constant"() {value = 1 : index} : () -> index
    %7 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
    %8 = "daphne.constant"() {value = 94015940900480 : ui64} : () -> ui64
    %9 = "daphne.constant"() {value = 94015940898880 : ui64} : () -> ui64
    %10 = "daphne.constant"() {value = 94015940898752 : ui64} : () -> ui64
    %11 = "daphne.constant"() {value = 140723175402456 : ui64} : () -> ui64
    %12 = "daphne.createDaphneContext"(%11, %10, %9, %8) : (ui64, ui64, ui64, ui64) -> !daphne.DaphneContext
    %13 = "daphne.matrixConstant"(%1) : (ui64) -> !daphne.Matrix<1x1xsi64>
    %14 = "daphne.reshape"(%13, %6, %6) : (!daphne.Matrix<1x1xsi64>, index, index) -> !daphne.Matrix<1x1xsi64>
    "daphne.decRef"(%13) : (!daphne.Matrix<1x1xsi64>) -> ()
    %15 = "daphne.ewAdd"(%14, %2) : (!daphne.Matrix<1x1xsi64>, si64) -> !daphne.Matrix<1x1xsi64>
    %16 = "daphne.randMatrix"(%6, %6, %0, %2, %7, %5) : (index, index, si64, si64, f64, si64) -> !daphne.Matrix<1x1xsi64:sp[1.000000e+00]>
    "daphne.decRef"(%16) : (!daphne.Matrix<1x1xsi64:sp[1.000000e+00]>) -> ()
    %17 = "daphne.randMatrix"(%6, %6, %0, %2, %7, %5) : (index, index, si64, si64, f64, si64) -> !daphne.Matrix<1x1xsi64:sp[1.000000e+00]>
    %18 = "daphne.cast"(%17) : (!daphne.Matrix<1x1xsi64:sp[1.000000e+00]>) -> si64
    "daphne.decRef"(%17) : (!daphne.Matrix<1x1xsi64:sp[1.000000e+00]>) -> ()
    %19 = "daphne.cast"(%18) : (si64) -> i1
    %20 = arith.select %19, %15, %14 : !daphne.Matrix<1x1xsi64>
    "daphne.decRef"(%15) : (!daphne.Matrix<1x1xsi64>) -> ()
    "daphne.decRef"(%14) : (!daphne.Matrix<1x1xsi64>) -> ()
    "daphne.print"(%20, %3, %4) : (!daphne.Matrix<1x1xsi64>, i1, i1) -> ()
    "daphne.decRef"(%20) : (!daphne.Matrix<1x1xsi64>) -> ()
    "daphne.destroyDaphneContext"() : () -> ()
    "daphne.return"() : () -> ()
  }
}
...

The reference counters of both inputs to arith.select (%15, %14) are decreased right after that operation. However, the result (%20) is one of them at run-time. Thus, a double-free happens when decRef is applied to %20 later.

@pdamme pdamme added the bug A mistake in the code. label Nov 12, 2024
@pdamme pdamme self-assigned this Nov 12, 2024
pdamme added a commit that referenced this issue Nov 12, 2024
- The MLIR operation arith.select is frequently created during the canonicalization of scf.if (which, in turn, is created by DaphneDSL if-statements).
- arith.select takes a boolean argument (condition) plus two additional arguments, one of which it selects/returns based on the value of the condition.
- arith.select needs to be taken into account in DAPHNE's object reference counter management, because:
  - We decrease the reference counter of SSA values after their last use as operands (the arith.select could be the last user).
  - We have no clue which of the two last operands of arith.select is returned at run-time.
  - However, the return value is the same runtime object as one of the two latter operands.
  - To ensure the correct reference counter for the result after the arith.select, we need to increase the counter of the result before we decrease the counters of the operands.
- We had already implemented this fix, but applied it only for string operands/results.
- However, we must apply the fix for all operand/result types that have a reference counter at run-time.
- Fixed ManageObjRefPass.
- Added several script-level test cases testing the behavior for matrices, frames, lists, and strings and different settings of the if-then-else in DaphneDSL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A mistake in the code.
Projects
None yet
Development

No branches or pull requests

1 participant