
Issue 949 complex step derivative #950

Conversation


@yizhang-yiz yizhang-yiz commented Jul 24, 2018

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

Scalar version of #949: calculates derivatives using the complex-step derivative approximation.

Intended Effect:

Given a function f that supports complex arithmetic,

complex<double> f(complex<double>& x);

calculate f(x) and df/dx through

complex_step_derivative(f, x, x_r, x_i, msg)

The static const perturbation size h is set to 1.e-32.
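For illustration, the core of the complex-step rule can be sketched in a few self-contained lines (the function name and defaulted h below are illustrative, not the PR's actual signature):

```cpp
#include <cmath>
#include <complex>

// Complex-step derivative: f'(x) ~= Im(f(x + i*h)) / h.
// No subtraction is involved, so h can be tiny (e.g. 1e-32)
// without catastrophic cancellation.
template <typename F>
double complex_step_deriv(const F& f, double x, double h = 1e-32) {
  std::complex<double> y = f(std::complex<double>(x, h));
  return y.imag() / h;
}
```

For example, applying it to std::exp at x = 1 recovers exp(1) to full double precision.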

How to Verify:

Unit test. The f in the unit test has a large derivative near x = 0. The current implementation gives accuracy comparable to autodiff up to cout.precision(16) with step size h = 1.e-32:

complex_step: 74753.81286589925
auto_diff   : 74753.812865899265

Side Effects:

n/a

Documentation:

doxygen

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Metrum Research Group, LLC

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

std::ostream* msgs) {
using stan::math::var;
using std::complex;
static double h = 1.e-32;
Contributor

The precision is O(h^2) so why does h need to be 1e-32? Usually people do 1e-8.

Contributor Author

You are right; below ~1e-10 the result becomes insensitive to h. I was merely showing that we can go far smaller than finite differences allow.

static double h = 1.e-32;
const double theta_d = theta.val();
const double res = complex_step_derivative(f, theta_d, x_r, x_i, msgs);
const double g
Contributor

It seems as if you should be able to do this in one function call that yields a complex<double> that has both the real and imaginary parts.

Contributor Author

Not sure what you mean. We need both f(x) and f(x + ih), don't we?

@bgoodri (Contributor) commented Jul 24, 2018 via email

@yizhang-yiz (Contributor Author)

Since we need f(x), are you saying f(x) \approx Re(f(x + ih))? This may not hold if f''(x) is large enough. Say f(x) = x^(-3); then when x = 0.001

(0.001)^(-3) - Re((0.001 + 0.00000001I)^(-3)) = 0.6

@bbbales2 (Member) commented Aug 2, 2018

@yizhang-cae Isn't that a bit of an unfair comparison though? Playing in Mathematica I get:

NumberForm[0.001^-3 - Re[(0.001 + (1*^-8) I)^(-3)], 16]
0.5999999046325684

But then:

NumberForm[0.001^-3 - Re[(0.001 + (1*^-32) I)^(-3)], 16]
0.

The error on the real part is going to be O(h^2), which is super tiny if we can get away with using h = 1e-32. That function has a singularity at zero as well, so it's gonna be unusually difficult.

edit: precision was actually 1e-8 in first example and I had a NumberForm on it
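For what it's worth, the same two checks can be reproduced in plain C++ (a sketch; 1.0/(z*z*z) stands in for the power, and the tolerance on the h = 1e-8 case is deliberately loose):

```cpp
#include <cmath>
#include <complex>

// Error of Re(f(x + i*h)) as an estimate of f(x), for f(x) = x^-3 at x = 0.001.
// The error is O(h^2 * f''(x)): visible (~0.6) at h = 1e-8, and completely
// negligible in double precision at h = 1e-32.
double real_part_error(double h) {
  std::complex<double> z(0.001, h);
  std::complex<double> y = 1.0 / (z * z * z);  // f(z) = z^-3
  return std::pow(0.001, -3.0) - y.real();
}
```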

@bbbales2 (Member) commented Aug 2, 2018

Also could we make liberal use of auto to allow for functions that have lots of output?

Like if the function we're getting the step derivative of outputs a std::vector, then can we have:

var build_appropriate_output(const std::complex<double>& g, vari* theta, double h) {
  return var(new stan::math::precomp_v_vari(std::real(g), theta, std::imag(g) / h));
}

std::vector<var> build_appropriate_output(const std::vector<std::complex<double>>& g, vari* theta, double h) {
  std::vector<var> out;
  out.reserve(g.size());
  for (size_t i = 0; i < g.size(); ++i) {
    out.push_back(var(new stan::math::precomp_v_vari(std::real(g[i]), theta, std::imag(g[i]) / h)));
  }
  return out;
}

// etc...

template <typename F>
auto complex_step_derivative(const F& f,
                        const stan::math::var& theta,
                        const std::vector<double>& x_r,
                        const std::vector<int>& x_i,
                        std::ostream* msgs) {
   ...
   auto g = f(complex<double>(theta_d, h), x_r, x_i, msgs);
   return build_appropriate_output(g, theta.vi_, h);
}

And we could extend this to all the types that f might return pretty easily.

edit: fixed incorrect output type on std::vector build_appropriate_output

* @return a var with value f(theta.val()) and derivative at theta.
*/
template <typename F>
double complex_step_derivative(const F& f, const double& theta,
Member

The prim version should be fully templated (return a T, and theta is const T&). No reason this wouldn't work with the higher order stuff.

double complex_step_derivative(const F& f, const double& theta,
const std::vector<double>& x_r,
const std::vector<int>& x_i,
std::ostream* msgs) {
Member

Can we pass h through as an argument and have its default value be 1e-32 or whatever? That seems like a handy thing to control.

@bbbales2 (Member) commented Aug 2, 2018

This is a cool idea @yizhang-cae. I didn't realize how simple it'd be to move this up to the interface level. I want to play around with it some more. I'd really like to use it in the reverse mode test code I write. Why I haven't been finite differencing up to this point in my tests is probably a failure on my part, but this looks better anyway.

Also I wonder about making theta a std::vector? I'm thinking Jacobians with N outputs and M inputs, where N >> M. It'd have the same advantages/disadvantages as any forward mode autodiff. But like in the diffusion example I mentioned in the other thread N is like 100+ and M is like 2-3, so M std::complex evals are going to be faster than 100+ reverse mode sweeps.

What is the difference between using this and an fvar? We could make the same kind of function with an fvar. Is there a precision/speed advantage to one or the other?

@bbbales2 (Member) commented Aug 2, 2018

Oh just noticed the error on the derivative is also O(h^2). Definitely just need one complex evaluation like @bgoodri said.
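In other words, a single evaluation of f at x + ih yields both the value and the derivative, each with O(h^2) error (a minimal sketch; the struct and function names are illustrative):

```cpp
#include <cmath>
#include <complex>

// One complex evaluation gives both pieces:
//   Re(f(x + i*h))     = f(x)  + O(h^2)
//   Im(f(x + i*h)) / h = f'(x) + O(h^2)
struct ValueAndGrad {
  double val;
  double grad;
};

template <typename F>
ValueAndGrad value_and_grad(const F& f, double x, double h = 1e-32) {
  std::complex<double> y = f(std::complex<double>(x, h));
  return {y.real(), y.imag() / h};
}
```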

@yizhang-yiz (Contributor Author)

Let's put h in args then.

Also I wonder about making theta a std::vector?

I submitted this PR to see if anyone is interested; if not, I can move on to other priorities. Can I consider your comments as formal reviews? If so, can we get the scalar version in first?

@bbbales2 (Member) left a comment

  1. Add h as an argument

  2. Just use one std::complex evaluation

  3. Pick a few of your favorite scalar functions and compare the results of their vanilla autodiff against this autodiff.

When @bgoodri did this for the integrate_1d stuff he used these C++14 templated lambdas which made it really easy: https://github.com/stan-dev/math/blob/develop/test/unit/math/rev/arr/functor/integrate_1d_test.cpp#L408

@bbbales2 (Member) commented Aug 2, 2018

Well dangit, if we do this with only one argument first, then the signature changes for the std::vector version. Can we just do this all in one whammo? It should be pretty easy. The tests are gonna be the most annoying thing. I can help with those if you want.

@bbbales2 (Member) commented Aug 3, 2018

Alright I had a change of opinion on this. I still think it's super cool and I want something like this in the math library for testing, but it seems to me like the dual number formulation of fvars is better. It's based on the same sorta Taylor expansion idea, but you're working with an algebra where the equivalent of the h^2 terms really are zero.
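The dual-number idea can be sketched in a few lines (a toy illustration only, not Stan's actual fvar):

```cpp
#include <cmath>

// Toy dual number: val carries f(x) and d carries f'(x) exactly, because
// the algebra is defined so the "h^2" cross terms are identically zero.
struct Dual {
  double val;
  double d;
};

// Product rule; note there is no a.d * b.d ("h^2") term at all.
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.val * b.d + a.d * b.val};
}

// Chain rule for exp.
Dual exp(Dual a) { return {std::exp(a.val), std::exp(a.val) * a.d}; }
```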

So unless there is some numerical advantage to the complex algorithms (which there very easily could be, and I'm not the expert here), I think from an exposing-this-to-Stan perspective, fvars should be better.

The Nomad manual is here: https://github.com/stan-dev/nomad in the manual folder. I used this LaTeX makefile to build it: https://github.com/JasonHiebel/latex.makefile

@bgoodri @betanalpha Chime in -- I'm pretty outta my expertise-zone making recommendations here

@betanalpha (Contributor) commented Aug 3, 2018 via email

@bbbales2 (Member) commented Aug 3, 2018

@betanalpha Yup, that was the current question exactly. Thanks.

@yizhang-yiz (Contributor Author)

My motivation is mainly that some PDE solvers I'd like to use (#931) cannot easily incorporate AD but do support complex numbers. I can always do CSDA inside the solvers, but I figured it might be useful to expose the process to Stan.

@betanalpha (Contributor) commented Aug 3, 2018 via email

@bbbales2 (Member) commented Aug 3, 2018

@yizhang-cae I think to convince myself this is good to expose in Stan itself, I'd have to be convinced it's better than doing the same thing with fvars. I could definitely be convinced of the usefulness of that -- fvars are faster than vars in some situations. Even if this just embeds them back in vars and it's all a little perverse.

I would like something like this in the math library itself. With @ChrisChiasson's complex numbers with vars, we'd have another way to get 2nd derivatives too, I think. Which means we'd have a ton of different ways to test derivatives, which'd be really nice. Just having this in Math has the added benefit of getting rid of the interface cruft (x_r, x_i, msgs, etc.)

I'll defer to you on the usefulness of this sorta thing in PDE solvers, but this would be a small part of that.

@betanalpha At least this doesn't have a subtraction? It would avoid the loss of precision from the subtraction in finite differences. Of course, maybe that'd go away by using some sort of high-precision software floating point on the diff itself, which probably wouldn't be hard?
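To make the cancellation point concrete (a sketch with illustrative step sizes, using std::exp as the test function): forward differences subtract two nearly equal numbers, so their rounding error grows like eps/h, while the complex step has nothing to cancel:

```cpp
#include <cmath>
#include <complex>

// Forward finite difference: rounding error ~ eps * |f(x)| / h,
// so h cannot be made arbitrarily small.
double fd_deriv_exp(double x, double h) {
  return (std::exp(x + h) - std::exp(x)) / h;
}

// Complex step: no subtraction, so h = 1e-32 costs no precision.
double cs_deriv_exp(double x, double h = 1e-32) {
  return std::imag(std::exp(std::complex<double>(x, h))) / h;
}
```

At x = 1 the complex-step result matches exp(1) to machine precision, while the finite difference at h = 1e-12 typically loses several digits.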

@betanalpha (Contributor) commented Aug 4, 2018 via email

@bbbales2 (Member) commented Aug 4, 2018

@betanalpha The first place I heard about this was this little newsletter thing my advisor fwd'ed around to our group: https://sinews.siam.org/Details-Page/differentiation-without-a-difference . Considering it comes with a friendly cartoon, I'm led to believe it is the Absolute Authority on the subject.

I was wrong in saying you could just up the precision on the difference itself -- you'd have to actually have it all the way down the calculation. I think complex step should be blanket better numerically than FD in places where it's applicable.

@bob-carpenter (Contributor) commented Aug 4, 2018 via email

@yizhang-yiz (Contributor Author)

If there were only one book to read for math students, I'd recommend Higham's "Handbook of Writing for the Mathematical Sciences".

@yizhang-yiz (Contributor Author)

@bbbales2, please go ahead and review this. I've added h to the args. For now it's the scalar version only. After your adj_jac_apply is in, I'll use it for the vector version.

@bbbales2 (Member)

Can we pause this pull request? If the use case is the PDE stuff, let's talk about it in the context of the PDE stuff.

This is neat. It should work, and could be useful, but I'm just hesitant to expose it at the .stan level if it's really part of something else.

@yizhang-yiz (Contributor Author)

I'll put it off then. It's already in Torsten. I'm closing this.
