Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

grayfall · 2020-06-26T14:28:07Z

Greetings. I'm experiencing what seems to be a size-dependent bug in propr. Here is a reproducible example

library(dplyr)
library(propr)
n.samples <- 80
n.features <- 47000
feature.counts <- rpois(n.features*n.samples, lambda=10000) %>% matrix(nrow=n.samples, ncol=n.features)

phs <- propr(feature.counts, metric='phs')

The traceback doesn't show anything particularly useful:

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed

2. labRcpp(ncol(lr))
1. propr(feature.counts, metric = "phs")

Reducing the number of features to 46000 (and anything below that) "solves" the issue.

The text was updated successfully, but these errors were encountered:

tpq · 2020-06-26T21:24:55Z

Hey! Thanks for your interest in propr!

Hm, my first guess is that you have run out of RAM. Rcpp will try to make an object size n.features * n.features, which takes up a lot of space.

Usually this gives an error like "Error: cannot allocate vector of size 3.8 Gb". When I call "labRcpp(47000)" directly... I get the same error as you.

Perhaps see here -- alyssafrazee/polyester#41 (comment) -- indeed, this may be a generic way that R says "no more memory!"

tpq · 2020-06-26T21:27:51Z

If you have some programming skills, I got a very basic script that divides up the propr analysis into a few chunks, then glues those chunks back together at the end. It was intended to be used for parallelization, but could be used to reduce RAM overhead too. I think you'd just need to replace the foreach() call with a basic for-loop.

propr.dopar.zip

grayfall · 2020-06-27T07:36:57Z

@tpq Thanks for replying. I doubt this is a memory issue. I've been running this on a machine with 0.5TB of RAM, and memory consumption never went past 100GB.

tpq · 2020-06-27T08:32:18Z

Hm. labRcpp is a very simple function, written in C++ via Rcpp. Perhaps something about C++ not having enough bits for the integer. Interestingly, log(46000, base = 2) is about 46,000 -- so it looks like C++ might use 32 bits for the integer by default.

I'm not sure if I can fix this quickly, but I will have a deeper think after the weekend.

// Function to label a half matrix
// [[Rcpp::export]]
List labRcpp(int nfeats){

  int llt = nfeats * (nfeats - 1) / 2;
  Rcpp::IntegerVector partner(llt);
  Rcpp::IntegerVector pair(llt);
  int counter = 0;

  for(int i = 1; i < nfeats; i++){
    for(int j = 0; j < i; j++){
      partner(counter) = i + 1;
      pair(counter) = j + 1;
      counter += 1;
    }
  }

  return List::create(
    _["Partner"] = partner,
    _["Pair"] = pair
  );
}

grayfall · 2020-06-27T08:37:16Z

@tpq Yes, I thought this might be an overflow somewhere in the cpp code, though 47k aren't enough to overflow a 32 bit signed integer, if we are looking at

int llt = nfeats * (nfeats - 1) / 2;.

Thanks for taking care of this issue.

Update I was wrong. While the result should fit into a 32-bit int, intermediate values can overflow making the end result negative.

grayfall · 2020-06-27T10:56:11Z

@tpq seems like one can get the same error message whilst trying to create a vector with length exceeding a 31-bit number https://stackoverflow.com/a/48676389/3846213 and https://stackoverflow.com/a/5234293/3846213.

grayfall · 2020-06-27T11:36:52Z

@tpq Sorry for spamming so hard. Here I've tested the out-of-memory assumption by writing a function that should've worked with ncol = 47000. The function does nothing but allocating an integer vector. Assuming 64-bit integers, we are dealing with mere ~8GB of RAM. Yet, we still get the same error. I am not exactly sure, why this is happening, though. Given ncol = 47000, ncol * (ncol - 1) / 2 is two times smaller than the 31 uint vector-size limit.

> library(Rcpp)
>
> cppFunction("
+ IntegerVector test(int size) {
+     int veclen = size * (size - 1) / 2;
+     IntegerVector vec(veclen);
+     return vec;
+ }
+ ")
> vec <- test(47000)
Error in test(47000) : negative length vectors are not allowed

tpq · 2020-06-27T12:20:31Z

No need to apologize -- thanks heaps for the reproducible example!! I'm actually a bit of a C++ newb, but I'm thinking that if we could replace with long int, it might work. It looks like it helps with the test function. I can try something similar with labRcpp

library(Rcpp)
cppFunction("IntegerVector test(long int size) {
int veclen = size * (size - 1) / 2;
IntegerVector vec(veclen);
return vec;
}
")

vec <- test(47000)

grayfall · 2020-06-27T20:31:35Z

@tpq I went over the codebase and refactored all integer arithmetic operations to avoid overflows in intermediate results. This allows us to avoid 64-bit integers. You should still add a warning in propr (and other related functions) to let users know there is an upper-limit on the number of features, because due to R's limitation, indexing doesn't work on vanilla vectors and matrices with more than 2^31 - 1 entries.

grayfall · 2020-06-28T20:51:54Z

I have retracted my PR, because the overflow fix brought out another problem with integer division (#15). At this point the only sensible thing I can think of is to switch the argument type in labRcpp (and other related functions) to a 64-bit int.

grayfall · 2020-06-29T16:32:54Z

@tpq I am under the impression, that making the package compatible with the number of features I'm interested in will be a lot more complicated than changing a couple of 32-bit arguments to 64-bits. I've found other places that can overflow. For example,

std::vector<int> indexPairs(NumericMatrix & X,
                            const String op = "==",
                            const double ref = 0){

  // Index pairs by operator and reference
  std::vector<int> index;
  int nfeats = X.nrow();
  for(int i = 1; i < nfeats; i++){
    for(int j = 0; j < i; j++){

      if(op == "==" || op == "="){
        if(X(i, j) == ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == ">"){
        if(X(i, j) > ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == ">="){
        if(X(i, j) >= ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "<"){
        if(X(i, j) < ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "<="){
        if(X(i, j) <= ref){
          index.push_back(j * nfeats + i + 1);
        }
      }else if(op == "!="){
        if(X(i, j) != ref){
          index.push_back(j * nfeats + i + 1);
        }

      }else if(op == "all"){

        index.push_back(j * nfeats + i + 1);

      }else{

        stop("Operator not found.");
      }
    }
  }

Here j * nfeats can easily overflow a 32-bit int with large enough nfeats.

tpq · 2020-07-01T21:23:02Z

I assume it should be possible to define index as a long int too?

std::vector<long int> index;

I'll try to have a closer look at this soon -- sorry, it's been a busy week!

grayfall · 2020-07-05T18:35:04Z

@tpq I guess there are many other places like this one. And I'm not sure how this will play out in the end. I've already tried changing all ints to int64_t, but the package doesn't compile that way.

grayfall · 2020-07-06T16:31:17Z

@tpq my understanding is that R has a very complicated relationship with 64-bit integers. There is no native 64-bit integer type, and R integer vectors are indexed by 32-bit signed ints, which limits their effective size (you can still have larger vectors, but there are indexing problems). I believe that's the reason, why a simple swap of integer type doesn't work on a package level.

tpq added the bug label Jun 27, 2020

grayfall mentioned this issue Jun 27, 2020

Fixing integer overflows #14

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

grayfall commented Jun 26, 2020

tpq commented Jun 26, 2020

tpq commented Jun 26, 2020

grayfall commented Jun 27, 2020

tpq commented Jun 27, 2020

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

tpq commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 28, 2020

grayfall commented Jun 29, 2020 •

edited

Loading

tpq commented Jul 1, 2020

grayfall commented Jul 5, 2020

grayfall commented Jul 6, 2020 •

edited

Loading

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

Error in labRcpp(ncol(lr)) : negative length vectors are not allowed #13

Comments

grayfall commented Jun 26, 2020

tpq commented Jun 26, 2020

tpq commented Jun 26, 2020

grayfall commented Jun 27, 2020

tpq commented Jun 27, 2020

grayfall commented Jun 27, 2020 • edited Loading

grayfall commented Jun 27, 2020 • edited Loading

grayfall commented Jun 27, 2020 • edited Loading

tpq commented Jun 27, 2020 • edited Loading

grayfall commented Jun 27, 2020 • edited Loading

grayfall commented Jun 28, 2020

grayfall commented Jun 29, 2020 • edited Loading

tpq commented Jul 1, 2020

grayfall commented Jul 5, 2020

grayfall commented Jul 6, 2020 • edited Loading

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

tpq commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 27, 2020 •

edited

Loading

grayfall commented Jun 29, 2020 •

edited

Loading

grayfall commented Jul 6, 2020 •

edited

Loading