Issue 949 complex step derivative #950
Conversation
std::ostream* msgs) {
  using stan::math::var;
  using std::complex;
  static double h = 1.e-32;
The precision is O(h^2) so why does h need to be 1e-32? Usually people do 1e-8.
You are right; below 1e-10 it becomes insensitive. I was merely showing that we can take h far smaller than finite differences allow.
static double h = 1.e-32;
const double theta_d = theta.val();
const double res = complex_step_derivative(f, theta_d, x_r, x_i, msgs);
const double g
It seems as if you should be able to do this in one function call that yields a `complex<double>` that has both the real and imaginary parts.
Not sure what you mean. We need both `f(x)` and `f(x + ih)`, don't we?
No, just `y = f(x + i * h)` with `real(y)` and `imag(y) / h`.
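A minimal sketch of that one-call version (illustrative, not the PR's code; `complex_step` and its signature are made up here, and `f` is assumed to take and return `std::complex<double>`):

```cpp
#include <complex>

// One complex evaluation yields both the value and the derivative:
// real(y) is f(x) and imag(y)/h is f'(x), each with O(h^2) error.
template <typename F>
double complex_step(const F& f, double x, double& fx, double h = 1e-32) {
  const std::complex<double> y = f(std::complex<double>(x, h));
  fx = y.real();         // f(x)
  return y.imag() / h;   // f'(x)
}
```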
Since we need (0.001)^(-3) - Re((0.001 + 0.00000001I)^(-3)) = 0.6
@yizhang-cae Isn't that a bit of an unfair comparison though? Playing in Mathematica I get: [Mathematica output screenshot] But then: [Mathematica output screenshot]
The error on the real part is going to be O(h^2), which is super tiny if we can get away with using h = 1e-32. That function has a discontinuity at zero as well, so it's going to be unusually difficult. Edit: the precision was actually 1e-8 in the first example and I had a NumberForm on it.
Also could we make liberal use of auto to allow for functions that have lots of output? Like if the function we're getting the step derivative of outputs a std::vector, then can we have:
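The code block from this comment was not captured; the following is a hypothetical reconstruction of the idea, where `build_appropriate_output` and `complex_step_derivative_of` are sketched names, not Stan Math code:

```cpp
#include <complex>
#include <utility>
#include <vector>

// Pair each output's real part (value) with its imaginary part / h
// (derivative), for a function returning a std::vector.
inline std::vector<std::pair<double, double>> build_appropriate_output(
    const std::vector<std::complex<double>>& y, double h) {
  std::vector<std::pair<double, double>> out;
  out.reserve(y.size());
  for (const auto& yi : y)
    out.emplace_back(yi.real(), yi.imag() / h);  // {f_k(x), df_k/dx}
  return out;
}

template <typename F>
auto complex_step_derivative_of(const F& f, double theta, double h = 1e-32) {
  return build_appropriate_output(f(std::complex<double>(theta, h)), h);
}
```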
And we could extend this to all the types that f might return pretty easily. Edit: fixed incorrect output type on the std::vector build_appropriate_output.
 * @return a var with value f(theta.val()) and derivative at theta.
 */
template <typename F>
double complex_step_derivative(const F& f, const double& theta,
The prim version should be fully templated (return a T, and theta is const T&). No reason this wouldn't work with the higher order stuff.
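A sketch of what that could look like (signature only; illustrative, not the merged code):

```cpp
// Fully templated prim version (sketch): T replaces double so the same
// code also serves higher-order autodiff types.
template <typename F, typename T>
T complex_step_derivative(const F& f, const T& theta,
                          const std::vector<double>& x_r,
                          const std::vector<int>& x_i,
                          std::ostream* msgs);
```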
double complex_step_derivative(const F& f, const double& theta,
                               const std::vector<double>& x_r,
                               const std::vector<int>& x_i,
                               std::ostream* msgs) {
Can we pass h through as an argument and have its default value be 1e-32 or whatever? That seems like a handy thing to control.
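Something like this (a sketch; the trailing-parameter placement is illustrative):

```cpp
// Step size exposed as an argument with a default (sketch):
template <typename F>
double complex_step_derivative(const F& f, const double& theta,
                               const std::vector<double>& x_r,
                               const std::vector<int>& x_i,
                               std::ostream* msgs,
                               const double h = 1e-32);

// Callers who need a different step can then write, e.g.:
// complex_step_derivative(f, theta, x_r, x_i, msgs, 1e-20);
```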
This is a cool idea @yizhang-cae. I didn't realize how simple it'd be to move this up to the interface level. I want to play around with it some more. I'd really like to use it in the reverse mode test code I write. Why I haven't been finite differencing up to this point in my tests is probably a failure on my part, but this looks better anyway. Also I wonder about making theta a […]. What is the difference between using this and an fvar? We could make the same kind of function with an fvar, so is there a precision/speed advantage to one or the other?
Oh, just noticed the error on the derivative is also O(h^2). Definitely just need one complex evaluation like @bgoodri said.
Let's put […]
I submitted this PR to see if anyone is interested; if not, I can move on to other priorities. Can I consider your comments formal reviews, and if so, can we get the scalar version in first?
- Add h as an argument
- Just use one std::complex evaluation
- Pick a few of your favorite scalar functions and compare the results of their vanilla autodiff against this autodiff.
When @bgoodri did this for the integrate_1d stuff he used these C++14 templated lambdas which made it really easy: https://github.com/stan-dev/math/blob/develop/test/unit/math/rev/arr/functor/integrate_1d_test.cpp#L408
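In that spirit, a generic lambda sketch (illustrative, not the code at that link):

```cpp
#include <cmath>
#include <complex>
#include <ostream>
#include <vector>

// C++14 generic lambda: the same functor can be called with double,
// var, or std::complex<double>, so one definition serves both autodiff
// and complex-step tests.
auto f = [](const auto& x, const std::vector<double>& x_r,
            const std::vector<int>& x_i, std::ostream* msgs) {
  using std::exp;  // ADL finds the right exp overload for each type
  return exp(x) / (1.0 + x * x);
};
```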
Well dangit, if we do this with only one argument first, then the signature changes for the std::vector version. Can we just do this all in one whammo? It should be pretty easy. The tests are gonna be the most annoying thing. I can help with those if you want.
Alright, I had a change of opinion on this. I still think it's super cool and I want something like this in the math library for testing, but it seems to me like the dual number formulation of fvars is better. It's based on the same sorta Taylor expansion idea, but you're working with an algebra where the equivalent of the h^2 terms really are zero. So unless there is some numerical advantage to the complex algorithms (which there very easily could be, and I'm not the expert here), I think from an exposing-this-to-Stan perspective, fvars should be better.
The Nomad manual is here: https://github.com/stan-dev/nomad in the manual folder. I used this LaTeX makefile to build it: https://github.com/JasonHiebel/latex.makefile
@bgoodri @betanalpha Chime in -- I'm pretty outta my expertise-zone making recommendations here
Sorry, Ben, what’s the current question?
The complex step method takes advantage of the fact that off the real axis the terms in a Taylor expansion of an analytic function oscillate between imaginary and real, so if the series is quickly decaying then the imaginary part is dominated by the first odd term, which is the gradient. This method is comparable to symmetric finite differences (with all of the same accuracy issues, especially in high dimensions) with the benefit of being a little bit cleaner and requiring only a single function evaluation. The downside is that you need to support all of the complex operations and you get the overhead of those more expensive complex operations (so the single function evaluation costs about the same as the two function evaluations you would use in symmetric finite differences).
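Spelled out, the expansion behind this (standard background, not from the thread):

$$
f(x + ih) = f(x) + ih\,f'(x) - \frac{h^2}{2}\,f''(x) - \frac{ih^3}{6}\,f'''(x) + \cdots
$$

$$
\operatorname{Im} f(x + ih) = h\,f'(x) - \frac{h^3}{6}\,f'''(x) + \cdots
\quad\Longrightarrow\quad
f'(x) = \frac{\operatorname{Im} f(x + ih)}{h} + O(h^2).
$$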
Forward mode autodiff implemented with fvars is equivalent to doing a complex step method with a "dual unit" instead of an imaginary unit. A dual unit is nilpotent (its square vanishes), so the only thing that survives in the dual component is the exact first-order directional derivative, hence no error (ignoring floating point).
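In symbols (again standard background, not from the thread):

$$
f(x + \varepsilon) = f(x) + \varepsilon\,f'(x), \qquad \varepsilon^2 = 0,
$$

so the $\varepsilon$-component carries $f'(x)$ exactly; there is no higher-order truncation term to cut off.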
Personally I don't think complex step would be worth the UX pain: how does the interface clearly inform users if all of the operations in their function don't support complex numbers? Especially when floating point gives nearly the same result without requiring any complex number support at all.
@betanalpha Yup, that was the current question exactly. Thanks.
My motivation is mainly that some PDE solvers I'd like to use (#931) cannot easily incorporate AD but do support complex numbers. I can always do CSDA inside the solvers, but I figured it might be useful to expose the process to Stan.
The question is not CSDA versus AD; rather, the questions are "why is CSDA beneficial over symmetric finite differences (if at all)?" and "is that benefit worth the maintenance burden?"
@yizhang-cae I think to convince myself this was good to expose in Stan itself, I'd have to be convinced it was better than doing the same thing with fvars. I could definitely be convinced of the usefulness of that -- fvars are faster than vars in some situations. Even if this just embeds them back in vars and it's all a little perverse.
I would like something like this in the math library itself. With @ChrisChiasson's complex numbers with vars, we'd have another way to get 2nd derivatives too, I think. Which means we'd have a ton of different ways to test derivatives, which'd be really nice. Just having this in Math has the added benefit of getting rid of the interface cruft (x_r, x_i, msgs, etc.). I'll defer to you on the usefulness of this sort of thing in PDE solvers, but this would be a small part of that.
@betanalpha At least this doesn't have a subtraction? It would avoid the loss of precision when doing the subtraction in finite diff. Ofc. maybe that'd go away by using some sorta high-precision software floating point on the diff itself. Which probably wouldn't be hard?
To reiterate a bit: CSDA does not do the same thing as forward mode with fvars. The former is an approximation that requires complex implementations of functions but no analytic partials, while the latter is an exact method that requires no complex implementations but does require analytic partials. The correct question is whether or not CSDA is better than finite differences.
Oh, and to be clear, in the multivariate case both CSDA and finite differences yield _directional derivatives_, not full gradients. To build up the full gradient you'd need to apply these approximations N times, where N is the size of the gradient.
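A sketch of that N-evaluation loop (names are illustrative; assumes f maps a complex vector to a complex scalar):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Build a full gradient from N directional complex-step evaluations,
// one per coordinate.
template <typename F>
std::vector<double> csda_gradient(const F& f, const std::vector<double>& x,
                                  double h = 1e-32) {
  std::vector<double> grad(x.size());
  for (std::size_t n = 0; n < x.size(); ++n) {
    std::vector<std::complex<double>> xc(x.begin(), x.end());
    xc[n] += std::complex<double>(0.0, h);  // perturb coordinate n only
    grad[n] = f(xc).imag() / h;             // nth partial derivative
  }
  return grad;
}
```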
With symmetric finite differences you use a difference of size epsilon to get accuracy O(epsilon^2). CSDA adds an imaginary component of size epsilon and gets accuracy O(epsilon^2). Whether CSDA is more accurate than finite differences depends on how large epsilon is relative to the other terms that propagate into the imaginary part during the complex function evaluation, and this won't be entirely disparate from the relevant comparisons in the finite difference case. There may be some circumstances where the two are significantly different, but then we have to ask how often those circumstances arise and how important they are.
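For reference, the two estimators being compared (standard formulas):

$$
f'(x) \approx \frac{f(x+\epsilon) - f(x-\epsilon)}{2\epsilon}
\qquad\text{vs.}\qquad
f'(x) \approx \frac{\operatorname{Im} f(x + i\epsilon)}{\epsilon},
$$

both with $O(\epsilon^2)$ truncation error; only the first contains an explicit subtraction of nearly equal quantities.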
@betanalpha The first place I heard about this was this little newsletter thing my advisor fwd'ed around to our group: https://sinews.siam.org/Details-Page/differentiation-without-a-difference . Considering it comes with a friendly cartoon, I'm led to believe it is the Absolute Authority on the subject. I was wrong in saying you could just up the precision on the difference itself -- you'd have to actually have it all the way down the calculation. I think complex step should be blanket better numerically than FD in places where it's applicable.
Ben Goodrich's primary advice for learning numerical methods was to read whatever Nicholas Higham wrote. Looks like Higham worked with Al-Mohy on this, as well as on the matrix exponential algorithm I think we're using.
If there were only one book to read for math students, I'd recommend Higham's "Handbook of Writing for the Mathematical Sciences".
@bbbales2, please go ahead and review this. I've added […]
Can we pause this pull request? If the use case is the PDE stuff, let's talk about it in the context of the PDE stuff. This is neat. It should work, and could be useful, but I'm just hesitant to expose it at the .stan level if it's really part of something else.
I'll put it off then. It's already in Torsten. I'm closing this.
Submission Checklist
./runTests.py test/unit
make cpplint
Summary:
Scalar version of #949, to calculate the derivative using the complex step derivative approximation.
Intended Effect:
Given a function `f` that supports complex number calculation, calculate `f(x)` and `df/dx` through `complex_step_derivative(f, x, x_r, x_i, msg)`. A `static const` perturbation size `h` is set to `1.e-32`.
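A hypothetical call site matching that description (the functor and values are illustrative, not the PR's test code):

```cpp
#include <iostream>
#include <vector>

// Illustrative functor: it must accept std::complex<double> as well as
// double for the complex step to work.
struct inv_cube {
  template <typename T>
  T operator()(const T& x, const std::vector<double>& x_r,
               const std::vector<int>& x_i, std::ostream* msgs) const {
    return 1.0 / (x * x * x);  // the kind of f with a large derivative near 0
  }
};

// std::vector<double> x_r; std::vector<int> x_i;
// stan::math::var theta = 0.001;
// stan::math::var y
//     = complex_step_derivative(inv_cube(), theta, x_r, x_i, &std::cout);
// y carries f(0.001) as its value and df/dx at 0.001 on the AD tape.
```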
How to Verify:
Unit test. The `f` in the unit test has a large derivative near `x=0`. The current implementation gives accuracy comparable to autodiff up to `cout.precision(16)` with step size `h=1.e-32`:

complex_step: 74753.81286589925
auto_diff:    74753.812865899265
Side Effects:
n/a
Documentation:
doxygen
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Metrum Research Group, LLC
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: