Validator error in Splash #248
Comments
Honestly, I don't know. Could you please drop the MSBNK-BAFG-CSL23102611413.txt file here for me? |
Strange that it's in the first block, I also don't recall seeing this case before... |
Thank you. I checked your file. It contains: I expect you got the output shown in your first comment from a validator run over multiple files. The software runs multithreaded, and its output sometimes gets interleaved, so I expect that the output line you found belongs to a different record. Also, in the output the explanation comes first and then the filename; see the single-file validation below. Let's focus instead on the output of validating a single file. You are right: there is a mismatch between the SPLASH calculated by RMassBank and the one from the Validator.
I need to dig a little bit deeper. |
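The interleaving described above can be avoided in general by assembling each record's report as one string before anything is emitted. The following is a minimal Java sketch of that idea and not the Validator's actual code; the class `PerFileReport`, its methods, and the placeholder message `checked` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not taken from the MassBank Validator.
class PerFileReport {

    // Build the complete report for one file as a single string, so the
    // filename and its message can never be separated by another thread.
    static String report(String filename, String message) {
        return filename + ": " + message;
    }

    // Check all files in parallel, then collect the finished reports in
    // submission order. Nothing is printed from the worker threads.
    static List<String> validateAll(List<String> filenames) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : filenames) {
                // "checked" stands in for a real validation result.
                Callable<String> task = () -> report(name, "checked");
                pending.add(pool.submit(task));
            }
            List<String> lines = new ArrayList<>();
            for (Future<String> f : pending) {
                lines.add(f.get());
            }
            return lines;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("validation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each line is built before it is returned, a caller can print the list from a single thread and every message stays next to the filename it belongs to.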
Alright, seems you will solve it soon. Just as a heads-up, I used splashR to compute the Splash. |
Interesting, https://splash.fiehnlab.ucdavis.edu/ gives ...and it only worked on those numbers, returned a format error on the middle column only. |
We recently had a similar issue MassBank/MassBank-web#384 and it was related to zeros somehow. What happens in your R Object if you remove the 0 in the first row? |
I thought of that issue too, but this is affecting the first block this time, not the third one, which is really strange. Is it related to the middle column somehow (all entries are below 1)? Tagging in @berlinguyinca and @ssmehta again ;-) |
We need to solve that issue on the R side.
The REST endpoint agrees with the Java implementation, and the 44.9980 gives the same. I will read the old issue again very carefully. |
I can't find a way in R to skip the first 0 in 44.9980 but leave the others unchanged. If I round everything to 3 decimal places, I also get the incorrect splash |
Please don't round to 3 dp! That will for sure change the splash (but also the final hash block too, right?). |
From the article ... the second block (wrapped bin) is the one that's changing: |
Looking at the failing file, I note that your absolute intensities are all <1. Is this how Sciex reports them? Does that have anything to do with the issue? |
This is how Sciex converts them to mzXML. I believe in the native Sciex format, the numbers are higher. |
@meowcat great finding. This means this issue should go to the R implementation at https://github.com/berlinguyinca/spectra-hash? |
I can just change the intensities temporarily to create the splash, no? |
You don't need to bother about the SPLASH issue, because I can easily fix that in the txt files. If you think your files are fine and only some SPLASHes are broken, please reopen your PR. I expect that a fix to the SPLASH library is required to solve that issue on the RMassBank side. |
@ksjewell Since you import the records in MsBackendMassbank and then export them again (right?), you could in fact recalculate the splash there, yes.
yep; though best would be to get the fix in the original SPLASH lib and port it identically, so we don't have two different implementations of the fix. I hope multiplying by 1k will not break a few other SPLASHes because of rounding issues |
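A minimal sketch of the temporary workaround discussed here, assuming the peaks are available as text in the space-separated `mz:intensity` form. It shifts each intensity by three decimal places with exact decimal arithmetic, so the scaling step itself introduces no binary floating-point rounding. The class `IntensityScaler` and the peak values in `main` are made up for illustration; whether the resulting SPLASH then matches the Validator is not something this sketch can show.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of RMassBank, splashR, or the SPLASH library.
class IntensityScaler {

    // Multiply an intensity given as decimal text by 1000 by moving the
    // decimal point, which is exact for decimal input.
    static String scaleBy1000(String intensity) {
        return new BigDecimal(intensity)
                .movePointRight(3)
                .stripTrailingZeros()
                .toPlainString();
    }

    // Scale every intensity in an "mz:intensity mz:intensity ..." string.
    // The m/z text is passed through untouched.
    static String scaleSpectrum(String spectrum) {
        List<String> scaled = new ArrayList<>();
        for (String pair : spectrum.trim().split("\\s+")) {
            String[] p = pair.split(":");
            scaled.add(p[0] + ":" + scaleBy1000(p[1]));
        }
        return String.join(" ", scaled);
    }

    public static void main(String[] args) {
        // Made-up peaks with intensities below 1, as in the failing file.
        System.out.println(scaleSpectrum("44.9980:0.0035 59.0130:0.120"));
        // prints 44.9980:3.5 59.0130:120
    }
}
```

Working on the text form rather than on `double` values keeps the m/z column byte-for-byte identical, which matters because any change there would also change the hash block.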
I agree, that's why I opened an issue at the splash package repo. |
I think I am making progress but there is still one single Validator error left (this is after multiplying intensity by 1000)
Here is the file:
|
So does that mean the i variant is correct in this case and the Validator is incorrect? |
Not sure, need @meier-rene 's opinion on this... it's strange that it doesn't work at all with the decimals... |
Hi, I can confirm that the online calculator https://splash.fiehnlab.ucdavis.edu/ is unhappy about decimals for intensities. Decimals in m/z are fine there. IIRC the online calculator uses the Scala implementation. Yours, Steffen |
Good afternoon,
I can confirm that the website requires the intensity to be provided as an integer. But frankly, I can't for the life of me remember why we decided to go with integers over doubles/floats for intensity values on the website. The actual code to generate the splash accepts doubles just fine, based on the input string:
public static Spectrum convertStringToSpectrum(String spectra, SpectraType type, String origin) {
    String[] pairs = spectra.split(" ");
    List<Ion> ionList = new ArrayList(200);
    String[] var5 = pairs;
    int var6 = pairs.length;
    for(int var7 = 0; var7 < var6; ++var7) {
        String pair = var5[var7];
        String[] p = pair.split(":");
        Double m = Double.parseDouble(p[0]);
        Double intensity = Double.parseDouble(p[1]);
        ionList.add(new Ion(m, intensity));
    }
    SpectrumImpl impl = new SpectrumImpl(ionList, type);
    impl.setOrigin(origin);
    return impl;
}
So there is no reason why the website complains about it, except someone writing a wrong regular expression to validate the input.
g.
Gert Wohlgemuth, Lead Developer, Fiehnlab, UC Davis
|
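To illustrate the point made in the comment above, here is a small sketch contrasting an input check that is too strict with the parsing step itself. The website's real validation pattern is not shown anywhere in this thread, so both regular expressions below, the class `SpectrumInputCheck`, and the sample peak are assumptions made purely for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the actual website validation is unknown.
class SpectrumInputCheck {

    // Assumed over-strict pattern: decimals allowed in m/z, but the
    // intensity must consist of digits only.
    static final Pattern STRICT = Pattern.compile("\\d+(\\.\\d+)?:\\d+");

    // Assumed relaxed pattern: decimals allowed in both columns.
    static final Pattern RELAXED = Pattern.compile("\\d+(\\.\\d+)?:\\d+(\\.\\d+)?");

    // True if every space-separated "mz:intensity" pair matches the pattern.
    static boolean allPairsMatch(Pattern pattern, String spectrum) {
        for (String pair : spectrum.trim().split("\\s+")) {
            if (!pattern.matcher(pair).matches()) {
                return false;
            }
        }
        return true;
    }

    // Same parsing calls as in the quoted method: both columns go through
    // Double.parseDouble, so a decimal intensity is accepted.
    static double[] parsePair(String pair) {
        String[] p = pair.split(":");
        return new double[] { Double.parseDouble(p[0]), Double.parseDouble(p[1]) };
    }

    public static void main(String[] args) {
        String peak = "44.9980:0.0035"; // made-up intensity below 1
        System.out.println(allPairsMatch(STRICT, peak));   // prints false
        System.out.println(allPairsMatch(RELAXED, peak));  // prints true
        System.out.println(parsePair(peak)[1]);            // prints 0.0035
    }
}
```

Under these assumptions the same peak is rejected by the front-end check yet parsed without complaint, which would match the behaviour reported for the online calculator.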
Ok, digging a bit further ... so far we used the online splash calculator, which takes the peak list as kinda CSV and complains about non-integer intensities due to its input validation. Using the REST call we get for the spectrum in #248 (comment):
which is the same value as the MassBank validator. I also checked that both splashR and the splash code we copy&pasted into RMassBank give identical results:
So I get the feeling RMassBank passes something weird to
That'd be highly appreciated, please ping me if you need help. |
Hi all, Regarding the initial problem in this issue relating to inconsistent histograms, I believe this was due to a missing binning correction factor in splashR. I submitted a PR which should fix this: berlinguyinca/spectra-hash#52. For the second spectrum, I agree with @sneumann that it doesn't seem to be an issue with SPLASH. I tried some variations of intensities and could only produce Best, |
Hi René,
I am getting the following Validator error:
I checked the file and the actual splash in the file is:
´splash10-0006-9300000000-5cd70311703e2423a1c5´
I ran the code separately and indeed this is the splash I get when I run:
So not only do I not understand where it is getting the splash ´splash10-0gx3-9000000000-fdf8d511e2f88d17c82e´ from, I also do not understand why it is computing ´splash10-0w3u-9000000000-fdf8d511e2f88d17c82e´, which differs from the one I get.