-
Notifications
You must be signed in to change notification settings - Fork 291
conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38
base: master
Are you sure you want to change the base?
Conversation
Thanks for this. I agree that the presence of these "null codes" can be confusing when eyeballing the decoded fingerprint codes. However, they occur only during the last one or two seconds of each subband. There are 8 of these spurious null codes at the end of each of these 8 subbands, giving a total of 64, which is relatively a very small amount. I think the effect on matching accuracy is negligible, but I agree that it would be good to fix this. Do you have any examples showing how this significantly degrades matching performance? It would probably be appropriate to address this in a different way, whereby instead of explicitly checking for |
…tion. The only optimization left that I can think of is to unroll the first iteration (k==0) before and out of the loop, but that would bloat the code somewhat, making it redundant and less maintainable.
You are right about optimizing by changing the loop range to avoid checking the condition on every iteration, however the first one, as in the commit comment, must be dealt with. Thinking about a more elegant solution, such as prior the the loop setting nhashes to zero... get back to you soon. |
I'm not sure what you mean. Why do you still need |
The reason was that k=0 iteration could still generate an "empty" FPCode (nhashes was never set to zero, just 6,3 or 1). |
How can this happen? I ran some tests with my fix and didn't see any empty codes but it would be good to see a counterexample if you have one.
Why do you think it is possible that p[0][0] and p[1][0] could both be less than 8? The deadtime between onsets should ensure that no two onsets are closer to each other than 128 points. Any contradiction to this indicates that something strange is going on, and that would need to be investigated. |
If you have any bug reports or examples of instances where the codegen is not working as expected, then they would be very welcome too. |
I could not provide an example for which p[*][0] < 8 holds nor any bug report. You must be right about such impossibility, it is just that it was not obvious to me because it is code outside the function that ensures that and the only technical paper I know about Echoprint does not go into that level of detail about the algorithms. Regarding performance, I would point out that although 64 FPCodes are not much for each song, in the inverted index there will be an entry with the whole database of millions of songs. Regarding accuracy, I have not done formal tests, but I would guess the harm would probably be minor. Well, so I guess this is it: just changing the loop range solves and closes the issue. Right? |
This fixes an issue with "short" codes. When total content length is short (15-30 seconds) with a controlled non-music database and with near real-time performance requirements, you need around 5 seconds to reliably match without this patch, but 3 seconds matches quite well with it. |
It's nice to hear that! I just hope that noew the pull request will be finally accepted. |
Hi jhaygood86 & adrianomitre, You're both right in that these extraneous codes should be omitted from the fingerprint. Up until now we have been very cautious about not changing the fingerprinting code but I agree that this ought to be included so I'm going to look into merging these patches (or something very similar) into the codebase. Thanks again, Andrew |
It has been a year since last message, so I wonder if this pull request will ever be accepted? |
Sorry for the delay in this, and thanks for the reminder. It's a good patch and a good point about the extraneous hash codes. The issue is whether this should go into the current stable (version 4.xx) codegen format, or the next major (verison 5.x) codegen format, depending on the level of incompatibility that this change introduces, which we don't know. Do you have any examples of significant performance increases by applying this change? |
On one hand, I do not have large scale comparative test performance results and, as previously stated, the impact should be minor in typical cases. On the other hand, I would assume that the incompatibility that this change introduces is negligible: fingerprints computed with the patched algorithm will match fingerprints computed with the unpatched one. So the question seems to me to be: why not? |
Hi. I have just added an 'if' to skip FPCodes with null "inter-onset-interval" data (i.e., dt0=dt1=0). In addition to waste DB space and processing, these empty FPCodes can also slighly hurt matching accuracy, given that they will be contained in practically every song/document -- often multiple times -- and adding to everyone actually reduces the score gap between high and low ranking documents in the first phase of matching.