conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38

adrianomitre · 2012-04-13T07:37:16Z

Hi. I have just added an 'if' to skip FPCodes with null "inter-onset-interval" data (i.e., dt0=dt1=0). In addition to waste DB space and processing, these empty FPCodes can also slighly hurt matching accuracy, given that they will be contained in practically every song/document -- often multiple times -- and adding to everyone actually reduces the score gap between high and low ranking documents in the first phase of matching.

alnesbit · 2012-04-13T10:30:06Z

Thanks for this. I agree that the presence of these "null codes" can be confusing when eyeballing the decoded fingerprint codes. However, they occur only during the last one or two seconds of each subband. There are 8 of these spurious null codes at the end of each of these 8 subbands, giving a total of 64, which is relatively a very small amount.

I think the effect on matching accuracy is negligible, but I agree that it would be good to fix this. Do you have any examples showing how this significantly degrades matching performance?

It would probably be appropriate to address this in a different way, whereby instead of explicitly checking for if (time_delta0 != 0 || time_delta1 != 0) { ... } in every iteration of the enclosing for-loop, the entire for-loop should be written as for(uint k=0;k<nhashes;k++) { ... } instead of the current for(uint k=0;k<6;k++) { ... }.

…tion. The only optimization left that I can think of is to unroll the first iteration (k==0) before and out of the loop, but that would bloat the code somewhat, making it redundant and less maintainable.

adrianomitre · 2012-04-13T10:42:51Z

You are right about optimizing by changing the loop range to avoid checking the condition on every iteration, however the first one, as in the commit comment, must be dealt with. Thinking about a more elegant solution, such as prior the the loop setting nhashes to zero... get back to you soon.

alnesbit · 2012-04-13T10:49:20Z

I'm not sure what you mean. Why do you still need if (k == 0 && (time_delta0 != 0 || time_delta1 != 0)) { ... }?

adrianomitre · 2012-04-13T11:08:18Z

The reason was that k=0 iteration could still generate an "empty" FPCode (nhashes was never set to zero, just 6,3 or 1).
I think it does not get any better than my last commit: I took care to work out the "< 8" to avoid two extra quantized_time_for_frame_delta calls and "n < 8" can be optimized by the compiler as "!(n & 0xFFFFFFF8)". What do you think?

alnesbit · 2012-04-13T12:19:59Z

The reason was that k=0 iteration could still generate an "empty" FPCode (nhashes was never set to zero, just 6,3 or 1).

How can this happen? I ran some tests with my fix and didn't see any empty codes but it would be good to see a counterexample if you have one.

I think it does not get any better than my last commit: I took care to work out the "< 8" to avoid two extra quantized_time_for_frame_delta calls and "n < 8" can be optimized by the compiler as "!(n & 0xFFFFFFF8)". What do you think?

Why do you think it is possible that p[0][0] and p[1][0] could both be less than 8? The deadtime between onsets should ensure that no two onsets are closer to each other than 128 points. Any contradiction to this indicates that something strange is going on, and that would need to be investigated.

alnesbit · 2012-04-13T14:46:30Z

If you have any bug reports or examples of instances where the codegen is not working as expected, then they would be very welcome too.

…essions"

adrianomitre · 2012-04-13T19:26:44Z

I could not provide an example for which p[*][0] < 8 holds nor any bug report. You must be right about such impossibility, it is just that it was not obvious to me because it is code outside the function that ensures that and the only technical paper I know about Echoprint does not go into that level of detail about the algorithms.

Regarding performance, I would point out that although 64 FPCodes are not much for each song, in the inverted index there will be an entry with the whole database of millions of songs. Regarding accuracy, I have not done formal tests, but I would guess the harm would probably be minor.

Well, so I guess this is it: just changing the loop range solves and closes the issue. Right?

jhaygood86 · 2012-07-10T22:08:26Z

This fixes an issue with "short" codes. When total content length is short (15-30 seconds) with a controlled non-music database and with near real-time performance requirements, you need around 5 seconds to reliably match without this patch, but 3 seconds matches quite well with it.

adrianomitre · 2012-07-10T22:50:57Z

It's nice to hear that! I just hope that noew the pull request will be finally accepted.

alnesbit · 2012-07-20T14:00:15Z

Hi jhaygood86 & adrianomitre,

You're both right in that these extraneous codes should be omitted from the fingerprint. Up until now we have been very cautious about not changing the fingerprinting code but I agree that this ought to be included so I'm going to look into merging these patches (or something very similar) into the codebase.

Thanks again,

Andrew

adrianomitre · 2013-10-26T01:10:31Z

It has been a year since last message, so I wonder if this pull request will ever be accepted?

alnesbit · 2013-10-30T03:02:20Z

Sorry for the delay in this, and thanks for the reminder. It's a good patch and a good point about the extraneous hash codes. The issue is whether this should go into the current stable (version 4.xx) codegen format, or the next major (verison 5.x) codegen format, depending on the level of incompatibility that this change introduces, which we don't know. Do you have any examples of significant performance increases by applying this change?

adrianomitre · 2013-10-30T03:20:49Z

On one hand, I do not have large scale comparative test performance results and, as previously stated, the impact should be minor in typical cases.

On the other hand, I would assume that the incompatibility that this change introduces is negligible: fingerprints computed with the patched algorithm will match fingerprints computed with the unpatched one. So the question seems to me to be: why not?

Zlib urlsafe base64 encoded by default, human readable with -h flag Squashed commit of the following: commit bbdd156 commit 4c5c903 commit 13756eb commit 434f442 commit e99df11 commit 005be9e commit afce593 commit f0bc228 commit bd67d66

…ble)

spotify#38 spotify#34

skip wasteful null hashes (dt0==0 && dt1==0)

1814b96

ghost assigned alnesbit Apr 13, 2012

Changed the loop range to avoid checking the condition on every itera…

406576d

…tion. The only optimization left that I can think of is to unroll the first iteration (k==0) before and out of the loop, but that would bloat the code somewhat, making it redundant and less maintainable.

faster and more elegant solution/optimization

90fd49a

adrianomitre added 2 commits April 13, 2012 16:20

removed clause that shall never be true

55c8ade

no more "warning: comparison between signed and unsigned integer expr…

5057370

…essions"

Merge branch 'master' of https://github.com/echonest/echoprint-codegen

0559ce1

adrianomitre force-pushed the master branch from 4c89cba to f986a9e Compare November 21, 2015 02:40

Implemented unhashed codes

92e6d98

Zlib urlsafe base64 encoded by default, human readable with -h flag Squashed commit of the following: commit bbdd156 commit 4c5c903 commit 13756eb commit 434f442 commit e99df11 commit 005be9e commit afce593 commit f0bc228 commit bd67d66

adrianomitre force-pushed the master branch from f986a9e to 92e6d98 Compare November 21, 2015 02:41

adrianomitre added 3 commits November 30, 2015 04:01

createCodeStringJSON: bugfix for very small number of codes (< 3)

bb0ed0e

discard even more wasteful null codes (TODO: verify if actually possi…

921cd04

…ble)

refactoring to make as equal as possible to playax repo

64ebf06

adrianomitre unassigned alnesbit Apr 11, 2016

fluential pushed a commit to fluential/echoprint-codegen that referenced this pull request Oct 12, 2021

Add feature Pull Requests. spotify#38 spotify#34.

cb3f289

spotify#38 spotify#34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38

conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

jhaygood86 commented Jul 10, 2012

adrianomitre commented Jul 10, 2012

alnesbit commented Jul 20, 2012

adrianomitre commented Oct 26, 2013

alnesbit commented Oct 30, 2013

adrianomitre commented Oct 30, 2013

conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38

Are you sure you want to change the base?

conditional to skip wasteful (possibly harmful) FPCodes with no IOI data #38

Conversation

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

alnesbit commented Apr 13, 2012

alnesbit commented Apr 13, 2012

adrianomitre commented Apr 13, 2012

jhaygood86 commented Jul 10, 2012

adrianomitre commented Jul 10, 2012

alnesbit commented Jul 20, 2012

adrianomitre commented Oct 26, 2013

alnesbit commented Oct 30, 2013

adrianomitre commented Oct 30, 2013