Issue 949 complex step derivative #950
Conversation
std::ostream* msgs) {
  using stan::math::var;
  using std::complex;
  static double h = 1.e-32;
The precision is O(h^2) so why does h need to be 1e-32? Usually people do 1e-8.
You are right; below 1e-10 it becomes insensitive. I was merely showing that we can take h far smaller than finite differences allow.
static double h = 1.e-32;
const double theta_d = theta.val();
const double res = complex_step_derivative(f, theta_d, x_r, x_i, msgs);
const double g
It seems as if you should be able to do this in one function call that yields a `complex<double>` that has both the real and imaginary parts.
Not sure what you mean. We need both `f(x)` and `f(x + ih)`, don't we?
No, just `y = f(x + i * h)` with `real(y)` and `imag(y) / h`.
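A minimal sketch of that one-call version (illustrative, not the PR's code; `complex_step` and its signature are made up here, and `f` is assumed to take and return `std::complex<double>`):

```cpp
#include <complex>

// One complex evaluation yields both the value and the derivative:
// real(y) is f(x) and imag(y)/h is f'(x), each with O(h^2) error.
template <typename F>
double complex_step(const F& f, double x, double& fx, double h = 1e-32) {
  const std::complex<double> y = f(std::complex<double>(x, h));
  fx = y.real();         // f(x)
  return y.imag() / h;   // f'(x)
}
```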
Since we need (0.001)^(-3) - Re((0.001 + 0.00000001I)^(-3)) = 0.6
@yizhang-cae Isn't that a bit of an unfair comparison though? Playing in Mathematica I get: [Mathematica output screenshot] But then: [Mathematica output screenshot]
The error on the real part is going to be O(h^2), which is super tiny if we can get away with using h = 1e-32. That function has a discontinuity at zero as well, so it's going to be unusually difficult. Edit: the precision was actually 1e-8 in the first example and I had a NumberForm on it.
Also could we make liberal use of auto to allow for functions that have lots of output? Like if the function we're getting the step derivative of outputs a std::vector, then can we have:
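The code block from this comment was not captured; the following is a hypothetical reconstruction of the idea, where `build_appropriate_output` and `complex_step_derivative_of` are sketched names, not Stan Math code:

```cpp
#include <complex>
#include <utility>
#include <vector>

// Pair each output's real part (value) with its imaginary part / h
// (derivative), for a function returning a std::vector.
inline std::vector<std::pair<double, double>> build_appropriate_output(
    const std::vector<std::complex<double>>& y, double h) {
  std::vector<std::pair<double, double>> out;
  out.reserve(y.size());
  for (const auto& yi : y)
    out.emplace_back(yi.real(), yi.imag() / h);  // {f_k(x), df_k/dx}
  return out;
}

template <typename F>
auto complex_step_derivative_of(const F& f, double theta, double h = 1e-32) {
  return build_appropriate_output(f(std::complex<double>(theta, h)), h);
}
```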
And we could extend this to all the types that f might return pretty easily. Edit: fixed incorrect output type on the std::vector build_appropriate_output.
 * @return a var with value f(theta.val()) and derivative at theta.
 */
template <typename F>
double complex_step_derivative(const F& f, const double& theta,
The prim version should be fully templated (return a T, and theta is const T&). No reason this wouldn't work with the higher order stuff.
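A sketch of what that could look like (signature only; illustrative, not the merged code):

```cpp
// Fully templated prim version (sketch): T replaces double so the same
// code also serves higher-order autodiff types.
template <typename F, typename T>
T complex_step_derivative(const F& f, const T& theta,
                          const std::vector<double>& x_r,
                          const std::vector<int>& x_i,
                          std::ostream* msgs);
```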
double complex_step_derivative(const F& f, const double& theta,
                               const std::vector<double>& x_r,
                               const std::vector<int>& x_i,
                               std::ostream* msgs) {
Can we pass h through as an argument and have its default value be 1e-32 or whatever? That seems like a handy thing to control.
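Something like this (a sketch; the trailing-parameter placement is illustrative):

```cpp
// Step size exposed as an argument with a default (sketch):
template <typename F>
double complex_step_derivative(const F& f, const double& theta,
                               const std::vector<double>& x_r,
                               const std::vector<int>& x_i,
                               std::ostream* msgs,
                               const double h = 1e-32);

// Callers who need a different step can then write, e.g.:
// complex_step_derivative(f, theta, x_r, x_i, msgs, 1e-20);
```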
This is a cool idea @yizhang-cae. I didn't realize how simple it'd be to move this up to the interface level. I want to play around with it some more. I'd really like to use it in the reverse mode test code I write. Why I haven't been finite differencing up to this point in my tests is probably a failure on my part, but this looks better anyway. Also I wonder about making theta a […]. What is the difference between using this and an fvar? We could make the same kind of function with an fvar, so is there a precision/speed advantage to one or the other?
Oh, just noticed the error on the derivative is also O(h^2). Definitely just need one complex evaluation like @bgoodri said.
Let's put […]
I submitted this PR to see if anyone is interested; if not, I can move on to other priorities. Can I consider your comments formal reviews, and if so, can we get the scalar version in first?
- Add h as an argument
- Just use one std::complex evaluation
- Pick a few of your favorite scalar functions and compare the results of their vanilla autodiff against this autodiff.
When @bgoodri did this for the integrate_1d stuff he used these C++14 templated lambdas which made it really easy: https://github.com/stan-dev/math/blob/develop/test/unit/math/rev/arr/functor/integrate_1d_test.cpp#L408
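In that spirit, a generic lambda sketch (illustrative, not the code at that link):

```cpp
#include <cmath>
#include <complex>
#include <ostream>
#include <vector>

// C++14 generic lambda: the same functor can be called with double,
// var, or std::complex<double>, so one definition serves both autodiff
// and complex-step tests.
auto f = [](const auto& x, const std::vector<double>& x_r,
            const std::vector<int>& x_i, std::ostream* msgs) {
  using std::exp;  // ADL finds the right exp overload for each type
  return exp(x) / (1.0 + x * x);
};
```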
Well dangit, if we do this with only one argument first, then the signature changes for the std::vector version. Can we just do this all in one whammo? It should be pretty easy. The tests are gonna be the most annoying thing. I can help with those if you want.
Alright, I had a change of opinion on this. I still think it's super cool and I want something like this in the math library for testing, but it seems to me like the dual number formulation of fvars is better. It's based on the same sorta Taylor expansion idea, but you're working with an algebra where the equivalent of the h^2 terms really are zero. So unless there is some numerical advantage to the complex algorithms (which there very easily could be, and I'm not the expert here), I think from an exposing-this-to-Stan perspective, fvars should be better.
The Nomad manual is here: https://github.com/stan-dev/nomad in the manual folder. I used this LaTeX makefile to build it: https://github.com/JasonHiebel/latex.makefile
@bgoodri @betanalpha Chime in -- I'm pretty outta my expertise-zone making recommendations here
Sorry, Ben, what’s the current question?
The complex step method takes advantage of the fact that off the real axis the terms in a Taylor expansion of an analytic function oscillate between imaginary and real, so if the series is quickly decaying then the imaginary part is dominated by the first odd term, which is the gradient. This method is comparable to symmetric finite differences (with all of the same accuracy issues, especially in high dimensions) with the benefit of being a little bit cleaner and requiring only a single function evaluation. The downside is that you need to support all of the complex operations and you get the overhead of those more expensive complex operations (so the single function evaluation costs about the same as the two function evaluations you would use in symmetric finite differences).
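Spelled out, the expansion behind this (standard background, not from the thread):

$$
f(x + ih) = f(x) + ih\,f'(x) - \frac{h^2}{2}\,f''(x) - \frac{ih^3}{6}\,f'''(x) + \cdots
$$

$$
\operatorname{Im} f(x + ih) = h\,f'(x) - \frac{h^3}{6}\,f'''(x) + \cdots
\quad\Longrightarrow\quad
f'(x) = \frac{\operatorname{Im} f(x + ih)}{h} + O(h^2).
$$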
Forward mode autodiff implemented with fvars is equivalent to doing a complex step method with a "dual unit" instead of an imaginary unit. A dual unit is nilpotent (its square vanishes), so the only thing that survives in the dual component is the exact first-order directional derivative, hence no error (ignoring floating point).
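In symbols (again standard background, not from the thread):

$$
f(x + \varepsilon) = f(x) + \varepsilon\,f'(x), \qquad \varepsilon^2 = 0,
$$

so the $\varepsilon$-component carries $f'(x)$ exactly; there is no higher-order truncation term to cut off.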
Personally I don't think complex step would be worth the UX pain: how does the interface clearly inform users if all of the operations in their function don't support complex numbers? Especially when floating point gives nearly the same result without requiring any complex number support at all.
@betanalpha Yup, that was the current question exactly. Thanks.
My motivation is mainly that some PDE solvers I'd like to use (#931) cannot easily incorporate AD but do support complex numbers. I can always do CSDA inside the solvers, but I figured it might be useful to expose the process to Stan.
The question is not CSDA versus AD; rather, the questions are "why is CSDA beneficial over symmetric finite differences (if at all)?" and "is that benefit worth the maintenance burden?"
@yizhang-cae I think to convince myself this was good to expose in Stan itself, I'd have to be convinced it was better than doing the same thing with fvars. I could definitely be convinced of the usefulness of that -- fvars are faster than vars in some situations. Even if this just embeds them back in vars and it's all a little perverse.
I would like something like this in the math library itself. With @ChrisChiasson's complex numbers with vars, we'd have another way to get 2nd derivatives too, I think. Which means we'd have a ton of different ways to test derivatives, which'd be really nice. Just having this in Math has the added benefit of getting rid of the interface cruft (x_r, x_i, msgs, etc.). I'll defer to you on the usefulness of this sort of thing in PDE solvers, but this would be a small part of that.
@betanalpha At least this doesn't have a subtraction? It would avoid the loss of precision when doing the subtraction in finite diff. Ofc. maybe that'd go away by using some sorta high-precision software floating point on the diff itself. Which probably wouldn't be hard?
To reiterate a bit: CSDA does not do the same thing as forward mode with fvars. The former is an approximation that requires complex implementations of functions but no analytic partials, while the latter is an exact method that requires no complex implementations but does require analytic partials. The correct question is whether or not CSDA is better than finite differences.
Oh, and to be clear, in the multivariate case both CSDA and finite differences yield _directional derivatives_, not full gradients. To build up the full gradient you'd need to apply these approximations N times, where N is the size of the gradient.
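A sketch of that N-evaluation loop (names are illustrative; assumes f maps a complex vector to a complex scalar):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Build a full gradient from N directional complex-step evaluations,
// one per coordinate.
template <typename F>
std::vector<double> csda_gradient(const F& f, const std::vector<double>& x,
                                  double h = 1e-32) {
  std::vector<double> grad(x.size());
  for (std::size_t n = 0; n < x.size(); ++n) {
    std::vector<std::complex<double>> xc(x.begin(), x.end());
    xc[n] += std::complex<double>(0.0, h);  // perturb coordinate n only
    grad[n] = f(xc).imag() / h;             // nth partial derivative
  }
  return grad;
}
```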
With symmetric finite differences you use a difference of size epsilon to get accuracy O(epsilon^2). CSDA adds an imaginary component of size epsilon and gets accuracy O(epsilon^2). Whether CSDA is more accurate than finite differences depends on how large epsilon is relative to the other terms that propagate into the imaginary part during the complex function evaluation, and this won't be entirely disparate from the relevant comparisons in the finite difference case. There may be some circumstances where the two are significantly different, but then we have to ask how often those circumstances arise and how important they are.
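For reference, the two estimators being compared (standard formulas):

$$
f'(x) \approx \frac{f(x+\epsilon) - f(x-\epsilon)}{2\epsilon}
\qquad\text{vs.}\qquad
f'(x) \approx \frac{\operatorname{Im} f(x + i\epsilon)}{\epsilon},
$$

both with $O(\epsilon^2)$ truncation error; only the first contains an explicit subtraction of nearly equal quantities.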
@betanalpha The first place I heard about this was this little newsletter thing my advisor fwd'ed around to our group: https://sinews.siam.org/Details-Page/differentiation-without-a-difference . Considering it comes with a friendly cartoon, I'm led to believe it is the Absolute Authority on the subject. I was wrong in saying you could just up the precision on the difference itself -- you'd have to actually have it all the way down the calculation. I think complex step should be blanket better numerically than FD in places where it's applicable.
Ben Goodrich's primary advice for learning numerical methods was to read whatever Nicholas Higham wrote. Looks like Higham worked with Al-Mohy on this, as well as on the matrix exponential algorithm I think we're using.
If there were only one book to read for math students, I'd recommend Higham's "Handbook of Writing for the Mathematical Sciences".
@bbbales2, please go ahead and review this. I've added […]
Can we pause this pull request? If the use case is the PDE stuff, let's talk about it in the context of the PDE stuff. This is neat. It should work, and could be useful, but I'm just hesitant to expose it at the .stan level if it's really part of something else.
I'll put it off then. It's already in Torsten. I'm closing this.
Submission Checklist
./runTests.py test/unit
make cpplint
Summary:
Scalar version of #949, to calculate the derivative using the complex step derivative approximation.
Intended Effect:
Given a function `f` that supports complex number calculation, calculate `f(x)` and `df/dx` through `complex_step_derivative(f, x, x_r, x_i, msg)`. A `static const` perturbation size `h` is set to `1.e-32`.
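A hypothetical call site matching that description (the functor and values are illustrative, not the PR's test code):

```cpp
#include <iostream>
#include <vector>

// Illustrative functor: it must accept std::complex<double> as well as
// double for the complex step to work.
struct inv_cube {
  template <typename T>
  T operator()(const T& x, const std::vector<double>& x_r,
               const std::vector<int>& x_i, std::ostream* msgs) const {
    return 1.0 / (x * x * x);  // the kind of f with a large derivative near 0
  }
};

// std::vector<double> x_r; std::vector<int> x_i;
// stan::math::var theta = 0.001;
// stan::math::var y
//     = complex_step_derivative(inv_cube(), theta, x_r, x_i, &std::cout);
// y carries f(0.001) as its value and df/dx at 0.001 on the AD tape.
```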
How to Verify:
Unit test. The `f` in the unit test has a large derivative near `x=0`. The current implementation gives accuracy comparable to autodiff up to `cout.precision(16)` with step size `h=1.e-32`:

complex_step: 74753.81286589925
auto_diff:    74753.812865899265
Side Effects:
n/a
Documentation:
doxygen
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Metrum Research Group, LLC
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: