
Finite differences converge to a different value than automatic differentiation #26

Open
andrevitorelli opened this issue Jul 22, 2021 · 20 comments

@andrevitorelli (Contributor)

As it is right now, when doing finite differences, it's clear that the derivative converges as we go to smaller step sizes. However, when using an odd number of pixels (which at the moment is the correct choice, as shear is applied around an integer central pixel, and we cannot properly shift the image without better interpolation), the value it converges to is consistently lower than the one given by automatic differentiation.

I'll post more here illustrating what we know about this bug, but in general it seems that, when generating the metacalibration image, the current code creates an image with less shear than we ask of it.
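For concreteness, the comparison looks roughly like the sketch below. measure_e1 is a hypothetical placeholder for the actual shear + ellipticity measurement step (the real pipeline differs); the point is only how the two derivatives are obtained.

    import tensorflow as tf

    def measure_e1(stamp, g1):
        # Hypothetical placeholder for "apply shear g1, measure e1";
        # it is only here so the sketch runs end to end.
        return g1 + 0.01 * tf.reduce_mean(stamp)

    def response_fd(stamp, step=1e-3):
        # Central finite difference of e1 with respect to g1, around g1 = 0.
        e_plus = measure_e1(stamp, tf.constant(step))
        e_minus = measure_e1(stamp, tf.constant(-step))
        return (e_plus - e_minus) / (2.0 * step)

    def response_auto(stamp):
        # Same derivative via automatic differentiation.
        g1 = tf.constant(0.0)
        with tf.GradientTape() as tape:
            tape.watch(g1)
            e1 = measure_e1(stamp, g1)
        return tape.gradient(e1, g1)

    stamp = tf.random.normal([51, 51], seed=0)
    print(float(response_fd(stamp)), float(response_auto(stamp)))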

andrevitorelli added the bug label and self-assigned this issue on Jul 22, 2021
@EiffL (Member) commented Jul 23, 2021

Can you illustrate this with a plot? :-) Better yet, a unit test?

@EiffL (Member) commented Jul 23, 2021

But these are two different questions: whether we get the correct gradients, and whether we apply the correct shearing.

@andrevitorelli (Contributor, author) commented Jul 23, 2021

But these are two different questions: whether we get the correct gradients, and whether we apply the correct shearing.

Yes, indeed they are. I'm still collecting all the behaviours I observe (I wanted to post earlier today, but had a lot to do here). What I know so far is:

  • The finite-difference gradients always converge (i.e., with smaller steps they settle on a precise value).
  • When using an even stamp_size (which we know to be incorrect), they always converge to the same value given by automatic differentiation.
  • When using an odd stamp_size, they converge to different values, and not at a fixed ratio with respect to the automatic differentiation value.
  • Neither the median nor the mean over all galaxies of the finite-difference values converges to the automatic differentiation result.
  • The differences between the finite-difference and automatic derivatives are small for the diagonal elements and much larger for the off-diagonal ones.
  • In general, in "normal" situations (i.e., a reasonable level of noise), both methods give smaller R_{11} values than the correct one. (Unlike what we said before, we CAN know the correct value: our simple measurement method correlates very tightly with the known ellipticities in our simulated images, so we can get R_{11} as 1 + m1 by linear regression; see the sketch after this list.)
  • Also in general, both methods reach the same results, and both over-correct ellipticities (because they both report response values that are about 5% smaller than they should be).
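The regression for the reference value is as simple as this sketch (with synthetic numbers standing in for our measured catalogues): the slope of e_meas vs e_true is 1 + m1, which is what we take as the reference R_{11}.

    import numpy as np

    # Synthetic stand-in data: e_true are the known input ellipticities,
    # e_meas the measured ones (here with an artificial 5% multiplicative bias).
    rng = np.random.default_rng(0)
    e_true = rng.uniform(-0.3, 0.3, size=1000)
    e_meas = 0.95 * e_true + rng.normal(scale=0.01, size=e_true.size)

    slope, intercept = np.polyfit(e_true, e_meas, 1)
    R11_ref = slope  # slope = 1 + m1, taken as the reference R_11
    print(R11_ref)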

@EiffL (Member) commented Jul 25, 2021

@andrevitorelli great investigative work :-) But can you either upload your notebook or upload the plots that support your conclusions? It's very useful for other contributors (i.e. me) to be able to look into problems quickly, from a starting point ;-)

Also, the most constructive thing would be to build unit tests that can fail, highlighting the potential issues with the code. Then we can work on fixes that make these unit tests pass.

@andrevitorelli (Contributor, author) commented Jul 25, 2021

@EiffL Thanks, and yes, I am in the process of creating a unit test that covers the problem precisely. It will test just two things: 1) does the finite-difference derivative converge? (it always does, actually, but still) 2) when it does, does it differ from the autodiff value by a large amount (i.e. > 1e-3)?

I was checking what to expect from different examples.
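Roughly, the test will look something like this (a sketch only; response_fd and response_auto stand in for the actual autometacal calls, as in the sketch in the opening comment):

    import numpy as np
    import tensorflow as tf

    def test_fd_matches_autodiff():
        stamp = tf.random.normal([51, 51], seed=0)
        r_auto = float(response_auto(stamp))
        # 1) the finite-difference derivative converges as the step shrinks...
        fd = [float(response_fd(stamp, step)) for step in (1e-2, 1e-3, 1e-4)]
        assert abs(fd[-1] - fd[-2]) < 1e-4
        # 2) ...and the converged value agrees with the autodiff one.
        np.testing.assert_allclose(fd[-1], r_auto, atol=1e-3)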

@EiffL (Member) commented Jul 25, 2021

OK, so a tiny bit of experimenting reveals the following: the TensorFlow Addons resampler seems to struggle to get derivatives around integer resampling points.

I've added a test notebook here: https://github.com/CosmoStat/autometacal/blob/u/EiffL/test_grads/notebooks/MetacalGradients.ipynb
(Look at the last section, which should highlight the problem)

When the resampling points are at half pixels, the gradients of the resampling step are perfect:
[image]

But when they are at integer pixel values:
[image]

(These are residual plots.)

Ok, so, several steps from here:

  • Double check on a clean example that the gradients of tfa_image.resampler fail when the warp is at integer values.
  • If so, open an issue on tensorflow_addons with an MWE reproducing the problem.
  • Then there are 2 options to solve the problem:
    - Option 1: dig into the code of tfa, find why the gradient kernel is not working, and fix it.
    - Option 2: if we used even-size images it would work, BUT in that case we need to resample the image in the FFT step so that we obtain the correct behaviour in the Fourier transform (see the sketch below).
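To make Option 2 a bit more concrete (just a sketch of the idea, not the actual code): with an even stamp the resampling points fall at half pixels, but the centre then sits half a pixel off, and that offset can be absorbed by a standard phase-ramp shift in Fourier space:

    import numpy as np

    def fourier_shift(image, dx, dy):
        # Shift a 2D image by (dx, dy) pixels with a phase ramp in Fourier space.
        ny, nx = image.shape
        kx = np.fft.fftfreq(nx)[None, :]
        ky = np.fft.fftfreq(ny)[:, None]
        phase = np.exp(-2j * np.pi * (kx * dx + ky * dy))
        return np.fft.ifft2(np.fft.fft2(image) * phase).real

    stamp = np.random.rand(52, 52)               # even-size stamp
    recentred = fourier_shift(stamp, 0.5, 0.5)   # half-pixel recentring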

Please let me know which option you want to pursue, @andrevitorelli. I won't be very available this week, but don't hesitate to ask if you have any questions on how to proceed :-)

@EiffL (Member) commented Jul 25, 2021

Made a small demo showing the difference in gradients when resampling points are at integer vs half-integer positions:
https://gist.github.com/EiffL/9f29a407b91c4de207803b1769189dfa
It confirms that gradients are wonky if the sampling points are at integer values.

You can use something similar to create a minimal working example to open an issue on tensorflow_addons.
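The core of such an MWE is essentially this (a rough sketch of the same idea, not a copy of the gist): take the gradient of tfa.image.resampler with respect to the warp points at half-integer vs integer positions.

    import tensorflow as tf
    import tensorflow_addons as tfa

    # A smooth test image, so the expected spatial gradient is well behaved.
    x = tf.linspace(0.0, 1.0, 32)
    image = (tf.sin(4.0 * x)[None, :, None] * tf.cos(4.0 * x)[None, None, :])[..., None]

    def warp_gradient(point):
        # d(resampled value) / d(warp) at a single sampling point (x, y).
        warp = tf.constant([[point]], dtype=tf.float32)  # shape [1, 1, 2]
        with tf.GradientTape() as tape:
            tape.watch(warp)
            out = tfa.image.resampler(image, warp)
        return tape.gradient(out, warp)

    print(warp_gradient([15.5, 15.5]))  # half-integer point: fine
    print(warp_gradient([15.0, 15.0]))  # integer point: the problematic case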

@EiffL (Member) commented Jul 25, 2021

Sorry ^^' one last comment before I go offline in the countryside. This is also closely related to what @Dr-Zero was working on: we are going to replace the interpolation code in tfa anyway. Are we close to a working implementation? It would be good to mention in a TFA issue as well that we will have a fix in the form of a revamp of the interpolation implementation in resampler.

@andrevitorelli (Contributor, author)

@EiffL I'll talk to him this week and work closely with him. I want to close this whole issue ASAP, preferably this week, but in the worst case by the next one.

@andrevitorelli (Contributor, author)

Related to: tensorflow/addons#2535

@EiffL (Member) commented Nov 6, 2021

How are we doing with a PR for TF addons ^^' ?

@Dr-Zero commented Nov 8, 2021 via email

@EiffL (Member) commented Nov 9, 2021

Thanks for the update Thiago, and I hope you are recovering well!
Sure, a quintic kernel would be good, but much more importantly we want to merge our modified code into TensorFlow Addons. For one, it means the code will be part of the default library that users pip install, so they won't have to compile it on their own; it also greatly reduces the maintenance cost.

I took a look at your copy of the code, but it's not a fork of tensorflow/addons? I'm not sure you can open a PR if you didn't fork the repo on GitHub.

@andrevitorelli (Contributor, author) commented Jan 5, 2022

@Dr-Zero @EiffL I have set up a fork at my own account here: addons, and checked the diff results locally:

diff -qr github/addons/tensorflow_addons/ github/addons_drzero/tensorflow_addons/

Files github/addons/tensorflow_addons/custom_ops/image/BUILD and github/addons_drzero/tensorflow_addons/custom_ops/image/BUILD differ
Only in github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels: register_kernel_factory.h
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.cc differ
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops_gpu.cu.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops_gpu.cu.cc differ
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.h and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.h differ
Only in github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels: sampling_functions.h
Files github/addons/tensorflow_addons/custom_ops/image/cc/ops/resampler_ops.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/ops/resampler_ops.cc differ
Files github/addons/tensorflow_addons/image/dense_image_warp.py and github/addons_drzero/tensorflow_addons/image/dense_image_warp.py differ
Files github/addons/tensorflow_addons/image/resampler_ops.py and github/addons_drzero/tensorflow_addons/image/resampler_ops.py differ
Files github/addons/tensorflow_addons/image/tests/dense_image_warp_test.py and github/addons_drzero/tensorflow_addons/image/tests/dense_image_warp_test.py differ
Files github/addons/tensorflow_addons/layers/__init__.py and github/addons_drzero/tensorflow_addons/layers/__init__.py differ
Only in github/addons/tensorflow_addons/layers: max_unpooling_2d_v2.py
Only in github/addons/tensorflow_addons/layers/tests: max_unpooling_2d_v2_test.py
Files github/addons/tensorflow_addons/optimizers/lamb.py and github/addons_drzero/tensorflow_addons/optimizers/lamb.py differ
Files github/addons/tensorflow_addons/optimizers/tests/lamb_test.py and github/addons_drzero/tensorflow_addons/optimizers/tests/lamb_test.py differ
Files github/addons/tensorflow_addons/optimizers/tests/weight_decay_optimizers_test.py and github/addons_drzero/tensorflow_addons/optimizers/tests/weight_decay_optimizers_test.py differ
Files github/addons/tensorflow_addons/optimizers/utils.py and github/addons_drzero/tensorflow_addons/optimizers/utils.py differ
Files github/addons/tensorflow_addons/optimizers/weight_decay_optimizers.py and github/addons_drzero/tensorflow_addons/optimizers/weight_decay_optimizers.py differ
Files github/addons/tensorflow_addons/seq2seq/tests/decoder_test.py and github/addons_drzero/tensorflow_addons/seq2seq/tests/decoder_test.py differ
Files github/addons/tensorflow_addons/version.py and github/addons_drzero/tensorflow_addons/version.py differ

Some of these are updates from upstream, and others are our work. I'm going through them to see which is which so I can restructure the repo. I'll use git commit --author to fix the authorship of the changes.

@andrevitorelli (Contributor, author)

Interestingly, Mike Jarvis had already found what @Dr-Zero commented on in one of our meetings:
.../GalSim/.../src/Interpolant.cpp#L392

Quote:

        // However, it turns out that the second derivative continuity equations (marked * above)
        // are not actually satisfied.  I (MJ) tried to derive a version that does satisfy all
        // the constraints and discovered that the system is over-constrained.  If I keep the
        // second derivative constraints and drop F''''(j) = 0, I get the following:
(...)

        // This is Gary's original version with F''''(j) = 0, but f''(x) is not continuous at
        // x = 1,2,3.

This is not, however, the source of our troubles, as we only use the first derivative.
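Side note: this kind of derivative-continuity check is easy to run numerically. A quick sketch, using the well-known Keys cubic kernel (a = -0.5) purely as a stand-in, since I am not reproducing the GalSim quintic here:

    import numpy as np

    def keys_cubic(x, a=-0.5):
        # Keys cubic interpolation kernel; a stand-in, NOT GalSim's quintic.
        x = np.abs(x)
        return np.where(x <= 1, (a + 2) * x**3 - (a + 3) * x**2 + 1,
                        np.where(x < 2, a * (x**3 - 5 * x**2 + 8 * x - 4), 0.0))

    eps = 1e-6
    for knot in (1.0, 2.0):
        left = (keys_cubic(knot) - keys_cubic(knot - eps)) / eps
        right = (keys_cubic(knot + eps) - keys_cubic(knot)) / eps
        print(knot, left, right)  # matching one-sided slopes => f' continuous at the knot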

@EiffL (Member) commented Jan 5, 2022

Ok this is great! Here is what I recommend you do:

  • Step I: Create a patch with only the modifications added by Thiago. You can do that by diffing the last commit on Thiago's branch against the last commit not made by Thiago, which appears to be this one: Dr-Zero/addons@f86b502

  • Step II: On a clean fork of tf addons, up to date with upstream, apply the patch you have created.

This way you won't have to deal with differences that are due to new modifications in tf addons.

@EiffL (Member) commented Jan 5, 2022

...or, maybe easier, you just update your local clone of Thiago's repo to bring it up to date with upstream.

@andrevitorelli (Contributor, author)

...or, maybe easier, you just update your local clone of Thiago's repo to bring it up to date with upstream.

Ok, I think this worked, and it kept @Dr-Zero's signed edits. Can you check here, @EiffL: addons/new_kernels?

@andrevitorelli (Contributor, author)

@Dr-Zero we need to come up with a PR for the addons repo ASAP for the code to be useful.

@andrevitorelli (Contributor, author)

Just to note that the feature request was posted as an issue in the tfa repo here: addons#2779. They ask people to file a feature request before submitting new features.
