Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

Open
grayfall opened this issue Jun 26, 2020 · 14 comments
Open
Labels

Comments

@grayfall
Copy link

Greetings. I'm experiencing what seems to be a size-dependent bug in propr. Here is a reproducible example

library(dplyr)
library(propr)
n.samples <- 80
n.features <- 47000
feature.counts <- rpois(n.features*n.samples, lambda=10000) %>% matrix(nrow=n.samples, ncol=n.features)

phs <- propr(feature.counts, metric='phs')

The traceback doesn't show anything particularly useful:

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed

2. labRcpp(ncol(lr))
1. propr(feature.counts, metric = "phs")

Reducing the number of features to 46000 (and anything below that) "solves" the issue.

@tpq
Copy link
Owner

tpq commented Jun 26, 2020

Hey! Thanks for your interest in propr!

Hm, my first guess is that you have run out of RAM. Rcpp will try to make an object size n.features * n.features, which takes up a lot of space.

Usually this gives an error like "Error: cannot allocate vector of size 3.8 Gb". When I call "labRcpp(47000)" directly... I get the same error as you.

Perhaps see here -- alyssafrazee/polyester#41 (comment) -- indeed, this may be a generic way that R says "no more memory!"

@tpq
Copy link
Owner

tpq commented Jun 26, 2020

If you have some programming skills, I got a very basic script that divides up the propr analysis into a few chunks, then glues those chunks back together at the end. It was intended to be used for parallelization, but could be used to reduce RAM overhead too. I think you'd just need to replace the foreach() call with a basic for-loop.

propr.dopar.zip

@grayfall
Copy link
Author

@tpq Thanks for replying. I doubt this is a memory issue. I've been running this on a machine with 0.5TB of RAM, and memory consumption never went past 100GB.

@tpq
Copy link
Owner

tpq commented Jun 27, 2020

Hm. labRcpp is a very simple function, written in C++ via Rcpp. Perhaps something about C++ not having enough bits for the integer. Interestingly, log(46000, base = 2) is about 46,000 -- so it looks like C++ might use 32 bits for the integer by default.

I'm not sure if I can fix this quickly, but I will have a deeper think after the weekend.

// Function to label a half matrix
// [[Rcpp::export]]
List labRcpp(int nfeats){

  int llt = nfeats * (nfeats - 1) / 2;
  Rcpp::IntegerVector partner(llt);
  Rcpp::IntegerVector pair(llt);
  int counter = 0;

  for(int i = 1; i < nfeats; i++){
    for(int j = 0; j < i; j++){
      partner(counter) = i + 1;
      pair(counter) = j + 1;
      counter += 1;
    }
  }

  return List::create(
    _["Partner"] = partner,
    _["Pair"] = pair
  );
}

@tpq tpq added the bug label Jun 27, 2020
@grayfall
Copy link
Author

grayfall commented Jun 27, 2020

@tpq Yes, I thought this might be an overflow somewhere in the cpp code, though 47k aren't enough to overflow a 32 bit signed integer, if we are looking at

int llt = nfeats * (nfeats - 1) / 2;. 

Thanks for taking care of this issue.

Update I was wrong. While the result should fit into a 32-bit int, intermediate values can overflow making the end result negative.

@grayfall
Copy link
Author

grayfall commented Jun 27, 2020

@tpq seems like one can get the same error message whilst trying to create a vector with length exceeding a 31-bit number https://stackoverflow.com/a/48676389/3846213 and https://stackoverflow.com/a/5234293/3846213.

@grayfall
Copy link
Author

grayfall commented Jun 27, 2020

@tpq Sorry for spamming so hard. Here I've tested the out-of-memory assumption by writing a function that should've worked with ncol = 47000. The function does nothing but allocating an integer vector. Assuming 64-bit integers, we are dealing with mere ~8GB of RAM. Yet, we still get the same error. I am not exactly sure, why this is happening, though. Given ncol = 47000, ncol * (ncol - 1) / 2 is two times smaller than the 31 uint vector-size limit.

> library(Rcpp)
>
> cppFunction("
+ IntegerVector test(int size) {
+     int veclen = size * (size - 1) / 2;
+     IntegerVector vec(veclen);
+     return vec;
+ }
+ ")
> vec <- test(47000)
Error in test(47000) : negative length vectors are not allowed

@tpq
Copy link
Owner

tpq commented Jun 27, 2020

No need to apologize -- thanks heaps for the reproducible example!! I'm actually a bit of a C++ newb, but I'm thinking that if we could replace with long int, it might work. It looks like it helps with the test function. I can try something similar with labRcpp

library(Rcpp)
cppFunction("IntegerVector test(long int size) {
int veclen = size * (size - 1) / 2;
IntegerVector vec(veclen);
return vec;
}
")

vec <- test(47000)

@grayfall
Copy link
Author

grayfall commented Jun 27, 2020

@tpq I went over the codebase and refactored all integer arithmetic operations to avoid overflows in intermediate results. This allows us to avoid 64-bit integers. You should still add a warning in propr (and other related functions) to let users know there is an upper-limit on the number of features, because due to R's limitation, indexing doesn't work on vanilla vectors and matrices with more than 2^31 - 1 entries.

@grayfall
Copy link
Author

I have retracted my PR, because the overflow fix brought out another problem with integer division (#15). At this point the only sensible thing I can think of is to switch the argument type in labRcpp (and other related functions) to a 64-bit int.

@grayfall
Copy link
Author

grayfall commented Jun 29, 2020

@tpq I am under the impression, that making the package compatible with the number of features I'm interested in will be a lot more complicated than changing a couple of 32-bit arguments to 64-bits. I've found other places that can overflow. For example,

std::vector<int> indexPairs(NumericMatrix & X,
                            const String op = "==",
                            const double ref = 0){

  // Index pairs by operator and reference
  std::vector<int> index;
  int nfeats = X.nrow();
  for(int i = 1; i < nfeats; i++){
    for(int j = 0; j < i; j++){

      if(op == "==" || op == "="){
        if(X(i, j) == ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == ">"){
        if(X(i, j) > ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == ">="){
        if(X(i, j) >= ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "<"){
        if(X(i, j) < ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "<="){
        if(X(i, j) <= ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "!="){
        if(X(i, j) != ref){
          index.push_back(j * nfeats + i + 1);
        }

      }else if(op == "all"){

        index.push_back(j * nfeats + i + 1);

      }else{

        stop("Operator not found.");
      }
    }
  }

Here j * nfeats can easily overflow a 32-bit int with large enough nfeats.

@tpq
Copy link
Owner

tpq commented Jul 1, 2020

I assume it should be possible to define index as a long int too?

std::vector<long int> index;

I'll try to have a closer look at this soon -- sorry, it's been a busy week!

@grayfall
Copy link
Author

grayfall commented Jul 5, 2020

@tpq I guess there are many other places like this one. And I'm not sure how this will play out in the end. I've already tried changing all ints to int64_t, but the package doesn't compile that way.

@grayfall
Copy link
Author

grayfall commented Jul 6, 2020

@tpq my understanding is that R has a very complicated relationship with 64-bit integers. There is no native 64-bit integer type, and R integer vectors are indexed by 32-bit signed ints, which limits their effective size (you can still have larger vectors, but there are indexing problems). I believe that's the reason, why a simple swap of integer type doesn't work on a package level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants