Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in glm.fit: object fit not found #17

Open
skanwal opened this issue Mar 5, 2024 · 14 comments
Open

Error in glm.fit: object fit not found #17

skanwal opened this issue Mar 5, 2024 · 14 comments

Comments

@skanwal
Copy link

skanwal commented Mar 5, 2024

Hi @reimand0 ,

I am getting an error, when trying to run ActiveDriveWGS using following command:

ActiveDriverWGS.res[[i]] = ActiveDriverWGS(mutations = ActiveDriverWGSInfo[[i]], elements = elements, sites = NULL, window_size = 50000, recovery.dir = paste(outDir, "ActiveDriverWGS_recovery", sep="/"), mc.cores = 4, ref_genome = paste0("hg", params$ucsc_genome_assembly)) 

The error is:

0 remove hypermut, n= 0 ,  0 %
hypermuted samples:   

reversing 0 positions
Removing  0  invalid SNVs & indels

Number of Elements with 0 Mutations:  14 
Tests to do:  1195 
Tests recovered:  928 
100  elements completed
200  elements completed
300  elements completed
400  elements completed
500  elements completed
600  elements completed
700  elements completed
800  elements completed
900  elements completed
.Error in glm.fit(x = numeric(0), y = numeric(0), weights = NULL, start = NULL,  : 
  object 'fit' not found

Data

>  head(ActiveDriverWGSInfo)
$pdac
         chr      pos1      pos2 ref alt                            patient
     1: chr3 120002137 120002137   T   C p010_tumor-52fccd-somatic.pcgr.vcf
     2: chr4 125450687 125450687   G   T p010_tumor-52fccd-somatic.pcgr.vcf
     3: chr5  38502681  38502681   C   A p010_tumor-52fccd-somatic.pcgr.vcf
     4: chr6  89951675  89951675   C   T p010_tumor-52fccd-somatic.pcgr.vcf
     5: chr7  82822603  82822603   G   T p010_tumor-52fccd-somatic.pcgr.vcf
    ---                                                                    
196698: chrX  15795073  15795073   A   C                  SA533811_SP125786
196699: chrX  15800132  15800132   T   A                  SA569276_SP133702
196700: chrX  15803607  15803607   G   T                  SA558660_SP125807
196701: chr9  14398633  14398633   C   G                            CGPA229
196702: chr1 186680291 186680291   C   T                            CGPA234

> str(ActiveDriverWGSInfo)
List of 1
 $ pdac:Classes 'data.table' and 'data.frame':	196702 obs. of  6 variables:
  ..$ chr    : chr [1:196702] "chr3" "chr4" "chr5" "chr6" ...
  ..$ pos1   : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
  ..$ pos2   : num [1:196702] 1.20e+08 1.25e+08 3.85e+07 9.00e+07 8.28e+07 ...
  ..$ ref    : chr [1:196702] "T" "G" "C" "C" ...
  ..$ alt    : chr [1:196702] "C" "T" "A" "T" ...
  ..$ patient: chr [1:196702] "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" "p010_tumor-52fccd-somatic.pcgr.vcf" ...
  ..- attr(*, ".internal.selfref")=<externalptr>
  
  > head(elements)
     chr    start      end       id          GENEID
20 chr12   912077   990053    RAD52 ENSG00000002016
30 chr17 38869859 38921770    LASP1 ENSG00000002834
56 chr12 21468911 21501669    RECQL ENSG00000004700
66  chr7 96120220 96322147 SLC25A13 ENSG00000004864
74 chr19 18831938 18868236     UPF1 ENSG00000005007
78  chr7 27181510 27185223   HOXA11 ENSG00000005073

I have also done some sanity checking on both mutation and elements data and found no issues with having NA's or empty values.

Can you please help figure out why this might be producing an error?
Many thanks.

@reimand0
Copy link
Contributor

reimand0 commented Mar 7, 2024 via email

@skanwal
Copy link
Author

skanwal commented Mar 12, 2024

@reimand0 - I have emailed the data to your utoronto.ca id. Hopefully you have received it?

@reimand0
Copy link
Contributor

reimand0 commented Mar 13, 2024 via email

@skanwal
Copy link
Author

skanwal commented Mar 13, 2024 via email

@skanwal
Copy link
Author

skanwal commented Apr 9, 2024 via email

@THT-sleepy
Copy link

Hello, Skanwal

I've met exactly the same issue, is there any progression later?

Many thanks.

@skanwal
Copy link
Author

skanwal commented Oct 6, 2024

Hello - @THT-sleepy

Unfortunately, no. I had shared the test data with the author as well but did't get a response.

@reimand0
Copy link
Contributor

reimand0 commented Oct 7, 2024 via email

@reimand0
Copy link
Contributor

reimand0 commented Oct 7, 2024 via email

@THT-sleepy
Copy link

Dear Sehrish & THT, I am very sorry that this has been delayed. We are still on it with a number of other tasks for a bigger update of the algorithm. In the meantime, does your code work without specifying the recovery.dir? all the best, Jüri

On Sun, Oct 6, 2024 at 04:13 Sehrish Kanwal @.> wrote: Hello - @THT-sleepy https://github.com/THT-sleepy Unfortunately, no. I had shared the test data with the author as well but did't get a response. — Reply to this email directly, view it on GitHub <#17 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETAF7PR6ZXFQESY3HZSASLZ2DWI3AVCNFSM6AAAAABPNC5YKGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOJVGM2DENJYHE . You are receiving this because you were mentioned.Message ID: @.>

Dear Juri,

I didn't specify the recovery.dir.

best
Huatao

@skanwal
Copy link
Author

skanwal commented Oct 8, 2024

Thanks, Juri. I had also tried with and without recovery.dir. Both errored out with the same issue.
Looking forward to the algorithm update. Please keep us posted.

@THT-sleepy
Copy link

For what it's worth, ActiveDriverWGS works well for us in a number of applications and collaborations. I will let you know once we have a solution for this issue.

On Mon, Oct 7, 2024 at 06:09 Juri Reimand @.> wrote: Dear Sehrish & THT, I am very sorry that this has been delayed. We are still on it with a number of other tasks for a bigger update of the algorithm. In the meantime, does your code work without specifying the recovery.dir? all the best, Jüri On Sun, Oct 6, 2024 at 04:13 Sehrish Kanwal @.> wrote: > Hello - @THT-sleepy https://github.com/THT-sleepy > > Unfortunately, no. I had shared the test data with the author as well but > did't get a response. > > — > Reply to this email directly, view it on GitHub > <#17 (comment)>, > or unsubscribe > https://github.com/notifications/unsubscribe-auth/AETAF7PR6ZXFQESY3HZSASLZ2DWI3AVCNFSM6AAAAABPNC5YKGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOJVGM2DENJYHE > . > You are receiving this because you were mentioned.Message ID: > @.***> >

Dear Juri,

I found a post(https://stackoverflow.com/questions/53476056/how-to-fix-error-in-glim-fit-fitnot-found-in-gamlss-regression) discussing a similar error, and based on my understanding of the post, it is possible that the version of stats we are using is different from yours(It seems that the error is caused by the use of the stats::glm function at lines 283-284 in ADWGS_test.R), which may have caused this error (someone mentioned that it fails on one computer but not on another). So, I would like to know if you can provide a conda environment.yml file, so that I can check whether this is the cause of the error. Thank you.

all the best,
Huatao

@THT-sleepy
Copy link

Hello Juri and Skanwal, here are some updates:

I found that the bug is caused by the ADWGS_test function (the specific reason is detailed at the end). So I wrote a temporary workaround function to fix the bug. Make sure you are using version 1.2.1 of ADWGS, and then execute this code before running your own code:
ADWGS_test_mod <-function (id, gr_element_coords, gr_site_coords, gr_maf, win_size,
this_genome, detect_depleted_mutations = FALSE)
{
null_res = data.frame(id, pp_element = NA, element_muts_obs = NA,
element_muts_exp = NA, element_enriched = NA, pp_site = NA,
site_muts_obs = NA, site_muts_exp = NA, site_enriched = NA,
stringsAsFactors = F)
gr_elements = gr_element_coords[GenomicRanges::mcols(gr_element_coords)[,
1] == id]
if (length(gr_site_coords) == 0) {
gr_sites = GenomicRanges::GRanges()
}
else {
gr_sites = gr_site_coords[GenomicRanges::mcols(gr_site_coords)[,
1] == id]
}
if (length(GenomicRanges::findOverlaps(gr_elements, gr_maf)) ==
0) {
return(null_res)
}
gr_background = .create_background(gr_elements, win_size,
this_genome)
gr_mutations = gr_maf[S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_maf,
gr_background))]
gr_indel = gr_mutations[GenomicRanges::mcols(gr_mutations)[,
2] == "indel>X"]
gr_snv = gr_mutations[GenomicRanges::mcols(gr_mutations)[,
2] != "indel>X"]
gr_background = GenomicRanges::setdiff(gr_background, gr_elements)
gr_sites = GenomicRanges::intersect(gr_elements, gr_sites)
gr_elements = GenomicRanges::setdiff(gr_elements, gr_sites)
signt_template = .make_mut_signatures()
trinuc_elements = .seq2signt(gr_elements, this_genome, signt_template)
trinuc_sites = .seq2signt(gr_sites, this_genome, signt_template)
trinuc_background = .seq2signt(gr_background, this_genome,
signt_template)
dfr_snv = NULL
patients_with_SNV_in_element = NULL
if (length(gr_snv) > 0) {
snv_elements = .mut2signt(gr_elements, gr_snv, signt_template,
remove_dup_mut_per_patient = TRUE)
snv_background = .mut2signt(gr_background, gr_snv, signt_template)
snv_elements = merge(snv_elements, trinuc_elements,
by = "signt")
snv_background = merge(snv_background, trinuc_background,
by = "signt")
snv_elements$region = "elements"
snv_background$region = "background"
snv_sites = NULL
if (length(gr_sites) > 0) {
snv_sites = .mut2signt(gr_sites, gr_snv, signt_template,
remove_dup_mut_per_patient = TRUE)
snv_sites = merge(snv_sites, trinuc_sites, by = "signt")
snv_sites$region = "sites"
}
gr_snv_el_site = gr_snv[unique(S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_snv,
c(gr_elements, gr_sites))))]
patients_with_SNV_in_element = unique(GenomicRanges::mcols(gr_snv_el_site)[,
"mcols.patient"])
dfr_snv = rbind(snv_sites, snv_elements, snv_background)
}
dfr_indel = NULL
if (length(gr_indel) > 0) {
gr_indel_fg = gr_indel[S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_indel,
c(gr_sites, gr_elements)))]
gr_indel_fg = gr_indel_fg[!duplicated(GenomicRanges::mcols(gr_indel_fg)[,
"mcols.patient"])]
gr_indel_bg = gr_indel[S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_indel,
gr_background))]
gr_indel = c(gr_indel_fg, gr_indel_bg)
indel_tag = apply(data.frame(gr_indel)[, c("seqnames",
"start", "end", "mcols.patient")], 1, paste, collapse = "::")
gr_indel = gr_indel[!duplicated(indel_tag)]
indel_index_sites = unique(S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_indel,
gr_sites)))
indel_index_elements = unique(S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_indel,
gr_elements)))
indel_index_background = unique(S4Vectors::queryHits(GenomicRanges::findOverlaps(gr_indel,
gr_background)))
indel_index_elements = setdiff(indel_index_elements,
indel_index_sites)
indel_index_background = setdiff(indel_index_background,
c(indel_index_elements, indel_index_sites))
which_indel_index_element_dup = which(GenomicRanges::mcols(gr_indel[indel_index_elements])[,
"mcols.patient"] %in% patients_with_SNV_in_element)
if (length(which_indel_index_element_dup) > 0) {
indel_index_elements = indel_index_elements[-which_indel_index_element_dup]
}
which_indel_index_sites_dup = which(GenomicRanges::mcols(gr_indel[indel_index_sites])[,
"mcols.patient"] %in% patients_with_SNV_in_element)
if (length(which_indel_index_sites_dup) > 0) {
indel_index_sites = indel_index_sites[-which_indel_index_sites_dup]
}
indel_sites = indel_elements = indel_background = data.frame(signt = "indel>X",
n_mut = NA, tri_nucleotide = "indel", n_pos = NA,
region = NA, stringsAsFactors = FALSE)
indel_elements$n_mut = length(indel_index_elements)
indel_background$n_mut = length(indel_index_background)
indel_elements$n_pos = sum(GenomicRanges::width(gr_elements))
indel_background$n_pos = sum(GenomicRanges::width(gr_background))
indel_elements$region = "elements"
indel_background$region = "background"
if (length(gr_sites) > 0) {
indel_sites$n_mut = length(indel_index_sites)
indel_sites$n_pos = sum(GenomicRanges::width(gr_sites))
indel_sites$region = "sites"
}
else {
indel_sites = NULL
}
dfr_indel = rbind(indel_sites, indel_elements, indel_background)
}
dfr_mut = rbind(dfr_indel, dfr_snv)
dfr_mut$is_site = 0 + (dfr_mut$region == "sites")
dfr_mut$is_element = 0 + dfr_mut$region %in% c("sites",
"elements")
signt_with_muts = names(which(c(by(dfr_mut$n_mut, dfr_mut$signt,
sum)) > 0))
dfr_mut = dfr_mut[dfr_mut$signt %in% signt_with_muts, ,
drop = FALSE]
dfr_mut = dfr_mut[dfr_mut$n_pos > 0, , drop = FALSE]
if(nrow(dfr_mut) > 0){
formula_h0 = ifelse(length(signt_with_muts) > 1, "n_mut ~ signt",
"n_mut ~ 1")
h0 = stats::glm(stats::as.formula(formula_h0), offset = log(dfr_mut$n_pos),
family = stats::poisson, data = dfr_mut)
h1 = stats::update(h0, . ~ . + is_element)
pp_element = pp_element_2way = stats::anova(h0, h1, test = "Chisq")[2,
5]
coef_element = stats::coef(h1)[["is_element"]]
element_enriched = coef_element > 0
if (!detect_depleted_mutations & !element_enriched & !is.na(pp_element) &
pp_element < 0.5) {
pp_element = 1 - pp_element_2way
}
if (detect_depleted_mutations & element_enriched & !is.na(pp_element) &
pp_element < 0.5) {
pp_element = 1 - pp_element_2way
}
element_stats = .get_obs_exp(h0, dfr_mut$is_element == 1,
dfr_mut, "n_mut")
element_muts_obs = element_stats[[1]]
element_muts_exp = element_stats[[2]]
pp_site = site_muts_obs = site_muts_exp = site_enriched = site_depleted = NA
if (length(gr_sites) > 0) {
h2 = stats::update(h1, . ~ . + is_site)
pp_site = pp_site_2way = stats::anova(h1, h2, test = "Chisq")[2,
5]
coef_site = stats::coef(h2)[["is_site"]]
site_enriched = coef_site > 0
if (!detect_depleted_mutations & !site_enriched & !is.na(pp_site) &
pp_site < 0.5) {
pp_site = 1 - pp_site_2way
}
if (detect_depleted_mutations & site_enriched & !is.na(pp_site) &
pp_site < 0.5) {
pp_site = 1 - pp_site_2way
}
site_stats = .get_obs_exp(h1, dfr_mut$is_site == 1,
dfr_mut, "n_mut")
site_muts_obs = site_stats[[1]]
site_muts_exp = site_stats[[2]]
}
data.frame(id, pp_element, element_muts_obs, element_muts_exp,
element_enriched, pp_site, site_muts_obs, site_muts_exp,
site_enriched, stringsAsFactors = F)}
else{
return(null_res)
}
}
environment(ADWGS_test_mod) <- asNamespace('ActiveDriverWGS')
assignInNamespace("ADWGS_test", ADWGS_test_mod, ns = "ActiveDriverWGS")
However,this will assign NA to the elements causing the error in the result file, so make sure these elements are not that important.

The specific reason of the bug:
I found that the cause of this error is that when certain components, such as 'chr1 143655400 143655600 ctcfbs::ensemble::chr1:143655401-143655600::NA (hg19)', are passed to the ADWGS_test function, it results in dfr_mut being an empty data frame. This causes an error in the line of code h0 = stats::glm(stats::as.formula(formula_h0), offset = log(dfr_mut$n_pos), family = stats::poisson, data = dfr_mut) because this line does not allow dfr_mut to be empty.

Best
Huatao

@reimand0
Copy link
Contributor

reimand0 commented Nov 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants