Bayesian analytics #3

Open

dapperdrop opened this issue Feb 17, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@dapperdrop
Member

@kingo55 should we consider fleshing out Bayesian analytics again? It would be interesting to develop some functionality to run side-by-side with the Frequentist reports we run, to see how it stacks up.

The main thing to put some thought to is how we calculate priors. We could perhaps calculate them (mean + standard deviation) from the past X months' worth of conversion data?
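
For illustration, a minimal sketch of that idea in R (the `historical_data` frame and its columns are hypothetical):

# Hypothetical: one row per day of historical data, with
# `subjects` and `conversions` columns for the past X months.
daily_cvr <- historical_data$conversions / historical_data$subjects

prior_mean <- mean(daily_cvr)
prior_sd   <- sd(daily_cvr)

# Optionally match a Beta(alpha, beta) prior to that mean and sd
# (method of moments), so it can feed a Beta-Binomial model.
v     <- prior_sd^2
k     <- prior_mean * (1 - prior_mean) / v - 1
alpha <- prior_mean * k
beta  <- (1 - prior_mean) * k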

Another question is how we deal with sizing and how we present the data in our reports.

Refs:

@kingo55
Member

kingo55 commented Feb 17, 2020

Oh yes! That would be awesome.

As you allude to, priors could be generated through our sizing process. Anything is better than no prior. Also, I'm not sure how relevant older data is, considering seasonality... perhaps the 30 days we use in our sizing calculator is sufficient here too?

I don't think we need to size experiments in advance with Bayesian inference, but we'd still need it to establish the prior, I think.

kingo55 added the enhancement label on Feb 17, 2020
@lukasvermeer

> Anything is better than no prior.

There is no such thing as Bayesian with “no prior”.

At a bare minimum there is an “uninformed prior” (which is a bit of a misnomer imho), but you can’t take the prior out of the equation (or out of the philosophy, for that matter).
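
For illustration (made-up counts, standard Beta-Binomial conjugate update): even the "no prior" case is a flat Beta(1, 1) that still enters the posterior.

conversions <- 52; subjects <- 480   # made-up data

# Flat ("uninformed") prior: Beta(1, 1)
flat_mean <- (1 + conversions) / (2 + subjects)

# Informative prior worth 100 pseudo-observations at a ~10% CVR: Beta(10, 90)
inf_mean  <- (10 + conversions) / (100 + subjects)

# The posterior always blends prior and data; only the weights change
c(flat_mean, inf_mean)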

@lukasvermeer

> Another question is how we deal with sizing

Could you explain what you mean by "sizing"? I am not familiar with this term.

@kingo55
Member

kingo55 commented Feb 17, 2020

We haven't documented this, but before running experiments at Mint Metrics, we calculate the traffic we need for a minimum detectable effect using some helper functions, e.g.:

> estimateDurationQuery(
+   app_id = "site_name",
+   trigger_clause = "page_urlpath like '/products/%'",
+   conversion_clause = "page_urlpath = '/order/thank-you/'",
+   delta = -0.07,
+   recipes = 2,
+   stat_power = 0.8
+ )
[1] "Days to run: 31.786299299664"
  subjects conversions  base_cvr target_cvr
1    47862        5221 0.1090845  0.1014485
  • Trigger clause: This selects users who would have been exposed
  • Conversion clause: This selects users who would have converted after being exposed

It gives us a baseline conversion rate for users who would typically be exposed over the last 30 days. Perhaps this baseline conversion rate will be useful as a prior?
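
It could be, e.g. as a Beta prior whose strength is a chosen "effective sample size" (a sketch only; the 100 pseudo-observations are an arbitrary tuning choice):

base_cvr <- 0.1090845   # base_cvr from the sizing output above
ess      <- 100         # prior strength in pseudo-observations (arbitrary)

prior_alpha <- base_cvr * ess         # ~10.9 prior "conversions"
prior_beta  <- (1 - base_cvr) * ess   # ~89.1 prior "non-conversions"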

@kingo55
Member

kingo55 commented Feb 17, 2020

Kind of similar to this calculator: https://www.evanmiller.org/ab-testing/sample-size.html

The actual calculation is performed here in our code: https://github.com/mint-metrics/mojito-r-analytics/blob/master/mojito-functions/experiment_sizing.R#L40
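
For comparison, base R's power.prop.test() performs essentially the same fixed-horizon calculation. The numbers below reuse the example output above; note the target_cvr in that output (0.1014485) is the base CVR scaled by the -7% relative delta.

# Base CVR from the example above, with a -7% relative delta
p1 <- 0.1090845
p2 <- p1 * (1 - 0.07)

# Subjects per recipe at 80% power, 5% significance
power.prop.test(p1 = p1, p2 = p2, power = 0.8, sig.level = 0.05)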

@lukasvermeer

> stat_power = 0.8

I assume this refers to the "desired statistical power"? In a Frequentist paradigm, power is needed to control for the type-II error (false negative) rate. Conversely, in a Bayesian paradigm, there are (afaik) no type-II error rate guarantees.

> I don't think we need to size experiments in advance with Bayesian inference

Indeed there is no need. Sizing (or power) is needed to make guarantees about error rates that Bayesian inference does not consider.

(ftr: imho this is a limitation of the Bayesian approach, not a strength.)

@dapperdrop
Member Author

@lukasvermeer @kingo55

Not sure if my thinking is correct, but could we take an approach that leverages the advantages of both paradigms?

I.e., Frequentist methods to determine a target sample size / test duration to reduce type-II errors, and Bayesian (with strong priors) to reduce type-I errors and make the results easier to disseminate?

I've seen some other CRO agencies use 'hybrid' approaches, albeit not as simplistic as this, so this train of thought may be completely off.
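
A rough sketch of what such a hybrid could look like (purely illustrative; the priors and result counts are made up):

# 1. Frequentist sizing: fix the horizon in advance (type-II control)
size <- power.prop.test(p1 = 0.109, p2 = 0.109 * 0.93, power = 0.8)
n    <- ceiling(size$n)   # subjects per recipe

# 2. Bayesian read-out at that fixed horizon, with a strong prior
#    centred on the baseline CVR; Monte Carlo for P(B > A)
prior_a <- 0.109 * 100
prior_b <- (1 - 0.109) * 100
conv_a  <- 2780; conv_b <- 2900   # made-up conversion counts at n per arm
draws_a <- rbeta(1e5, prior_a + conv_a, prior_b + n - conv_a)
draws_b <- rbeta(1e5, prior_a + conv_b, prior_b + n - conv_b)
mean(draws_b > draws_a)           # posterior probability B beats A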

@lukasvermeer

> Bayesian (with strong priors) to reduce type-I errors

While a Bayesian approach might empirically reduce type-I errors (when evaluated against some simulated data using a Frequentist lens), there are no guarantees about error rates (type-I or type-II).

I really have no idea how one would get the best of both worlds.
