gnmf crashes with --vec #897

Open
philipportner opened this issue Nov 5, 2024 · 1 comment
Labels
bug A mistake in the code.

Comments

@philipportner
Collaborator

Running the scripts/algorithms/gnmf.daph script with --vec crashes.

Command to reproduce

bin/daphne --vec scripts/algorithms/gnmf.daph rank=2 n=100 e=500 W=\"outW.csv\" H=\"outH.csv\"

Stack trace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff75cb859 in __GI_abort () at abort.c:79
#2  0x00007ffff79b3ee6 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff79c5f8c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff79c5ff7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff79c6258 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffcd4c9e02 in CompiledPipelineTask<DenseMatrix<double> >::accumulateOutputs (this=this@entry=0x555559d0c6b0, localResults=std::vector of length 2, capacity 2 = {...},
    localAddRes=std::vector of length 2, capacity 2 = {...}, rowStart=rowStart@entry=0, rowEnd=rowEnd@entry=1) at /usr/include/c++/11/bits/allocator.h:174
#7  0x00007fffcd4cbb4e in CompiledPipelineTask<DenseMatrix<double> >::execute (this=<optimized out>, fid=<optimized out>, batchSize=2595) at /home/philipportner/daphne/src/runtime/local/vectorized/Tasks.cpp:35
#8  0x00007fffcd4ae320 in WorkerCPU::run (this=0x55555a223e80) at /home/philipportner/daphne/src/runtime/local/vectorized/WorkerCPU.h:84
#9  0x00007ffff79f5793 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7f99609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff76c8353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
philipportner added the "bug" label (A mistake in the code.) on Nov 5, 2024
@sweetpellegrino

sweetpellegrino commented Nov 5, 2024

Hey Philipp,

I think I can provide some direction for this issue.
I strongly believe that the current implementation of the vectorization pass produces broken pipelines.

Here is one of the pipelines generated for gnmf.daph:

%21:2 = "daphne.vectorizedPipeline"(%20, %18, %5, %8, %19, %16, %18, %4, %0, %0, %2, %0) ({
    ^bb0(%arg0: !daphne.Matrix<?x2xf64:sp[1.000000e+00]>, %arg1: !daphne.Matrix<2x100xf64:sp[1.000000e+00]>, %arg2: i1, %arg3: f64, %arg4: !daphne.Matrix<?x200xf64:sp[1.000000e+00]>, %arg5: !daphne.Matrix<200x100xf64>, %arg6: !daphne.Matrix<?x100xf64:sp[1.000000e+00]>, %arg7: i1):
      %24 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg2) : (!daphne.Matrix<?x2xf64:sp[1.000000e+00]>, !daphne.Matrix<2x100xf64:sp[1.000000e+00]>, i1, i1) -> !daphne.Matrix<?x?xf64:sp[1.000000e+00]>
      %25 = "daphne.ewAdd"(%24, %arg3) : (!daphne.Matrix<?x?xf64:sp[1.000000e+00]>, f64) -> !daphne.Matrix<?x?xf64>
      %26 = "daphne.matMul"(%arg4, %arg5, %arg2, %arg2) : (!daphne.Matrix<?x200xf64:sp[1.000000e+00]>, !daphne.Matrix<200x100xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
      %27 = "daphne.ewDiv"(%26, %25) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> !daphne.Matrix<?x?xf64>
      %28 = "daphne.ewMul"(%arg6, %27) : (!daphne.Matrix<?x100xf64:sp[1.000000e+00]>, !daphne.Matrix<?x?xf64>) -> !daphne.Matrix<?x?xf64:sp[1.000000e+00]>
      %29 = "daphne.matMul"(%28, %28, %arg2, %arg7) : (!daphne.Matrix<?x?xf64:sp[1.000000e+00]>, !daphne.Matrix<?x?xf64:sp[1.000000e+00]>, i1, i1) -> !daphne.Matrix<?x?xf64:sp[1.000000e+00]>
      "daphne.return"(%28, %29) : (!daphne.Matrix<?x?xf64:sp[1.000000e+00]>, !daphne.Matrix<?x?xf64:sp[1.000000e+00]>) -> ()
   } ,

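We see that the matMul producing %29 is in the same pipeline as the ewMul producing %28. However, the current implementation of matMul only supports a row-wise split of the lhs and a broadcast of the rhs. This does not work when lhs and rhs are both %28, because inside the pipeline %28 only exists as a row-wise split and is never available as a whole.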
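To illustrate the problem outside of DAPHNE, here is a minimal sketch in plain Python. It is not DAPHNE code: the helper names (`matmul`, `transpose`, `row_chunks`) and the 4x2 example matrix are made up for illustration, and it assumes that the second matMul flag (%arg7) means "transpose rhs", which is what the shapes in the pipeline suggest. The sketch compares a row-split lhs with a broadcast rhs against the case where each task only sees its own chunk for both operands.

```python
# Illustrative only: hypothetical helpers, not part of DAPHNE.

def matmul(a, b):
    """Naive dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Transpose a list of rows."""
    return [list(col) for col in zip(*m)]

def row_chunks(m, size):
    """Split a matrix row-wise into chunks of at most `size` rows."""
    return [m[i:i + size] for i in range(0, len(m), size)]

# Stand-in for %28: 4 rows, 2 columns.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

# Reference result computed on the full matrix: X @ t(X), a 4x4 matrix.
expected = matmul(X, transpose(X))

# Supported scheme: lhs is split by rows, rhs is the full (broadcast) matrix.
broadcast_rows = []
for chunk in row_chunks(X, 2):
    broadcast_rows.extend(matmul(chunk, transpose(X)))

# Problematic scheme: the same row chunk is used as lhs and rhs,
# because the full matrix never exists inside the pipeline.
local_rows = []
for chunk in row_chunks(X, 2):
    local_rows.extend(matmul(chunk, transpose(chunk)))

print("broadcast rhs matches reference:", broadcast_rows == expected)
print("columns per row with chunk-local rhs:", len(local_rows[0]))
print("columns per row in reference:", len(expected[0]))
```

With a broadcast rhs, concatenating the per-chunk results reproduces the full product. With a chunk-local rhs, each task only yields a small diagonal block, so the per-task results no longer have the shape that the combined output expects. Whether this is exactly what makes accumulateOutputs throw is an assumption on my side, but it shows why %28 cannot serve as both operands of a matMul in the same pipeline.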

The approach that we took in the thesis does not produce such pipelines; it splits them instead.
