Making the most of your data with Bayes
This page will give you the means for performing simple Bayesian analyses.
For a short example and tutorial:
Dienes, Z., Coulton, S., & Heather, N. (in press). Using Bayes Factors To Evaluate Evidence For No Effect: Examples From The SIPS Project. Addiction,
For a longer tutorial see:
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5: 781. doi: 10.3389/fpsyg.2014.00781
Dienes, Z., & McLatchie, N. (2017). Four reasons to prefer Bayesian over orthodox statistical analyses. Psychonomic Bulletin & Review,
1. Bayes factor.
The Bayes factor tells you how strongly data support one theory (e.g. your pet scientific theory under test) over another (e.g. the null hypothesis). It is a simple intuitive way of performing the Bayesian equivalent of significance testing, telling you the sort of answer which many people mistakenly think they obtain from significance testing, but cannot. A "null" result in significance testing, for example, does not automatically mean you should reduce your confidence in the theory under test; often you should actually increase your confidence. A non-significant p-value does not tell you whether you have evidence for the null or no evidence for any conclusion at all (or indeed evidence against the null). Yet people routinely take a non-significant result as indicating they should reduce their confidence in a theory that predicts a difference.
The Bayes factor needs two types of input: 1) a summary of the data and 2) a specification of what the theories predict. In total you will only need to enter about four numbers!
1) In a situation where you could do a t-test, the data summary is exactly the same as would be used for a t-test:
a) the sample mean difference between conditions, or between a mean and a baseline, call this meandiff; and
b) the standard error of the difference, call this SE.
Note that t = meandiff/SE. Thus, if you know t and the mean difference between conditions, you can get the relevant SE from SE = meandiff/t. This applies for any type of t-test.
(Note more generally: a) could be any sample statistic, such as a regression coefficient, and b) is its standard error, so long as a) is distributed roughly normally.)
In sum: For the first step, enter the difference between conditions in the "sample mean" box and its standard error in the "standard error" box.
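As a quick illustration of the relation t = meandiff/SE, the sketch below recovers the standard error from a reported t and mean difference. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def se_from_t(meandiff, t):
    """Recover the standard error of a difference from t = meandiff / SE."""
    if t == 0:
        raise ValueError("t must be non-zero to recover SE")
    # A standard error is positive, so take the absolute value
    # in case meandiff and t are both reported as negative.
    return abs(meandiff / t)


# Hypothetical report: a mean difference of 5 units with t = 2.5
print(se_from_t(5.0, 2.5))  # prints 2.0
```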
2) Next you specify the theory you are testing against the null hypothesis. Specifying the theory means saying what range of effects is consistent with the theory and whether any are particularly likely. The calculator calls the plot of different plausibilities of population effects given the theory "p(population effect|theory)", and asks if this is uniform. A simple rule is that if you can say what the maximum plausible effect is, say "yes"; otherwise say "no".
a) If you can specify a plausible maximum effect, use a uniform from 0 to that maximum. Enter "0" in the lower limit box and the maximum in the upper limit box. Then click "Go!"
b) If you can specify a plausible predicted effect size P (e.g. based on a previous study or meta-analysis), say "No" to a uniform. Three new boxes will come up, asking for the mean, standard deviation and number of tails (of a normal). A simple rule: If your theory allows effects to go in either direction, set the mean to 0, the standard deviation to P, and the tails to 2. If the theory makes a directional prediction, set mean to 0, SD to P and tails to 1. Then click "Go".
In sum: For the second step, you enter two numbers to describe your theory.
Note: A Bayes factor that is not based on the predictions of your theory will be irrelevant to your theory.
A Bayes factor of 3 or more can be taken as substantial evidence for your theory (and against the null) and of 1/3 or less as evidence for the null (and against your theory). Bayes factors between 1/3 and 3 show the data do not provide much evidence to distinguish your theory from the null: The data are insensitive.
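The calculation that the two steps describe can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code: the function names, the midpoint-rule integration and the number of steps are assumptions made here. It assumes, as this page does, a normal likelihood with known standard error, and it supports a one-tailed normal only with a mean of 0.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor(sample_mean, se, uniform=False, lower=0.0, upper=1.0,
                 prior_mean=0.0, prior_sd=1.0, tails=2, steps=20000):
    """Bayes factor for the theory (H1) over the null of no effect (H0)."""
    if uniform:
        # Every effect between lower and upper is equally plausible.
        lo, hi = lower, upper

        def prior(theta):
            return 1.0 / (upper - lower)
    elif tails == 2:
        # Normal centred on prior_mean; 8 SDs either side covers it.
        lo, hi = prior_mean - 8.0 * prior_sd, prior_mean + 8.0 * prior_sd

        def prior(theta):
            return normal_pdf(theta, prior_mean, prior_sd)
    elif tails == 1:
        if prior_mean != 0.0:
            raise ValueError("this sketch supports one tail only with mean 0")
        # Half-normal: positive effects only, density doubled.
        lo, hi = 0.0, 8.0 * prior_sd

        def prior(theta):
            return 2.0 * normal_pdf(theta, 0.0, prior_sd)
    else:
        raise ValueError("tails must be 1 or 2")

    # p(data | theory): average the likelihood over the effects the
    # theory allows, weighted by their plausibility (midpoint rule).
    width = (hi - lo) / steps
    likelihood_h1 = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        likelihood_h1 += prior(theta) * normal_pdf(sample_mean, theta, se) * width

    # p(data | null): the likelihood of the data if the effect is exactly 0.
    likelihood_h0 = normal_pdf(sample_mean, 0.0, se)
    return likelihood_h1 / likelihood_h0


# Figures from the washing-up example on this page:
# sample mean 1.09, SE 0.65, half-normal with SD = 2
print(round(bayes_factor(1.09, 0.65, prior_sd=2.0, tails=1), 2))
```

With those inputs the sketch should come out close to the B = 2.08 reported for the washing-up example. A uniform from 0 to a maximum is requested with uniform=True, lower=0 and upper set to the maximum.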
- If the theory predicts a direction, the program assumes the predicted difference is in the positive direction. If your mean difference was in the opposite direction to the one the theory predicts, enter it as negative.
- It is assumed that data will be normally distributed around the population mean with known variance. Typically population variance is unknown and only estimated from data (and so the standard error calculated from your data is only estimated), so the assumption of known variance will be problematic for small sample sizes (say less than 30). In that case use the correction given in Box 4.4 on page 94 of Dienes (2008): increase the standard error, SE, to SE*(1 + 20/(df*df)), where df is the degrees of freedom. Alternatively, use this R code.
- To test a Pearson's correlation r, first transform it to make it normal with Fisher's z transform. This site will do that for you. It has standard error SE = 1/sqrt(df - 1). For example, a correlation between mindfulness and hypnotisability is found of 0.2 with 30 participants. The Fisher z transform of 0.20 is 0.20. It has degrees of freedom = 28, so standard error = 0.19. From past research, correlates of hypnotisability, if they exist, are often around r = .30. The Fisher z transform of .30 is .31. Enter sample mean = .20, standard error = .19, no to uniform, 2 for tails (if that is the theory), 0 for mean and .31 for standard deviation. B = 0.78 (i.e. insensitive).
- The sample mean and standard error (for the data summary), and the limits of the uniform, or the mean and standard deviation of a normal (for specifying the theory) must all be in the same units. If your mean is on a Likert scale, the predictions of your theory will also be in terms of a Likert scale. If you need to use standardized effect sizes, then r = sqrt(t^2 / (t^2 + df)). Then analyze r according to the third note above (on Pearson's correlation).
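The small conversions mentioned in these notes can be sketched in Python as below. The function names are hypothetical, and the small-sample correction is read here as SE*(1 + 20/df^2), which is an interpretation of the formula given in the note.

```python
import math


def corrected_se(se, df):
    """Inflate SE for small samples: SE * (1 + 20 / df^2)."""
    return se * (1.0 + 20.0 / (df * df))


def fisher_z(r):
    """Fisher's z transform of a Pearson correlation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def fisher_z_se(df):
    """Standard error of Fisher's z: 1 / sqrt(df - 1), with df = N - 2."""
    return 1.0 / math.sqrt(df - 1)


def r_from_t(t, df):
    """Standardized effect size r from a t statistic and its degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))


# The mindfulness example: r = 0.20 with 30 participants, so df = 28
print(round(fisher_z(0.20), 2), round(fisher_z_se(28), 2))  # prints 0.2 0.19
# The predicted effect: r = .30 on the Fisher z scale
print(round(fisher_z(0.30), 2))  # prints 0.31
```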
A 5-minute short instructional on using the calculator
Now click here to calculate your Bayes factor!
For those who use Matlab, here is Matlab code for calculating Bayes factor in the same way as the flash program above. Baguley and Kaye (2010) provide equivalent R code. John Christie has also provided R code for the calculator, modified so that one can adjust the quality of the estimation of area under the curve; John claims greater accuracy for his calculator. Stefan Wiens provides R code here, including for using the t-distribution to model H1. There is also an example of R code using Bayes factors with logistic mixed effects models (glmer), written by Elizabeth Wonnacott for Wonnacott, Brown, & Nation (2017).
A separate page gives information on how the Rouder et al. (2009) calculator differs from the Dienes (2008) one. That page also provides R code for using t-distributions in specifying H1.
For Bayes factor calculators for the binomial situation see here for two groups and here for one group.
Five minute Bayes:
The weakness of power
How many participants might I need?
How to analyze a 2X2 contingency table
2. Prior and Posterior distributions.
As well as a Bayes factor, it is usually useful to determine what the most plausible set of population mean differences is, given your data and other constraints.
You start with prior beliefs about the population parameter. Assume you can represent your prior by a normal distribution (without grave misrepresentation) and also that your data are normal. Once you have determined the mean and standard deviation of your prior, collected data and hence found the mean and standard deviation of your likelihood (i.e. the mean difference in your data and its standard error), use this flash program to determine the mean and standard deviation of your posterior and look at graphs of the prior, likelihood and posterior distributions. If your prior is quite vague, the posterior is largely determined by the data. Thus your new prior before looking at the next study will be a normal distribution with a mean equal to the mean of Study 1 and a standard deviation equal to the standard error of the mean from Study 1.
Thus you can meta-analytically combine evidence across a series of studies with the same DV in the following way: For the mean of the prior enter the mean (mean difference etc) of Study 1 and for the standard deviation of the prior enter the standard error of the mean difference. (This distribution represents the rational beliefs to have about the population parameter value after seeing Study 1, given a vague prior beforehand.) Enter the mean difference for Study 2 as the mean of the likelihood and the standard error of Study 2 as the standard deviation of the likelihood. The posterior then indicated by the program gives the best estimate of the population parameter and its uncertainty in the light of both Studies 1 and 2. This could form the new prior for combining with a Study 3, and so on iteratively. If you have several studies with the same DV, this procedure can be followed to obtain an overall estimate of the mean difference and its standard error, which can be used in a Bayes factor calculator to determine the overall strength of evidence for H1 versus H0, or to evaluate credibility intervals overall (see Dienes, 2014, for the principles of inference by interval).
Example: a previous study found that asking people to perform maths problems for 5 minutes a day increased their self-discipline generally so they ended up doing the washing up two extra days each week. You replicate with different will-power interventions in three studies, finding the following increases in number of days of doing washing up each week: 0.5 (SE = 1.2), 2 (SE = 0.9), -0.5 (SE = 1.5). After Study 1, and before Study 2, one's prior would have a mean of 0.5 days and a SD of 1.2 (assuming before Study 1 one had a very vague prior). After Study 2 and before Study 3, one's posterior from Study 2, and hence one's prior for Study 3, now has mean 1.46 (SD = 0.72). Finally, after Study 3, one's posterior has mean 1.09 (SD = 0.65). To perform a Bayes factor on the three studies as a whole, enter 1.09 as the sample mean and 0.65 as the standard error. Using a half-normal with SD = 2 days (the effect size from the original study), B = 2.08, indicating that the three studies provide no substantial evidence either for or against the theory that practicing will-power increases washing up episodes. (It might be worth exactly replicating Study 2 or the original study to see how that affects the overall evidence.)
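The updating in this example can be sketched in Python under the stated assumptions (normal prior, normal likelihood). The function name is hypothetical; the rule used is the standard precision-weighted combination of two normals, which is assumed here to be what the program computes.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal likelihood.

    Precision is 1 / variance. The posterior precision is the sum of the
    two precisions, and the posterior mean is the precision-weighted
    average of the prior mean and the data mean.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * data_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# (mean increase in days, SE) for the three washing-up studies
studies = [(0.5, 1.2), (2.0, 0.9), (-0.5, 1.5)]

# With a very vague prior beforehand, Study 1 itself becomes the prior.
mean, sd = studies[0]
for data_mean, data_se in studies[1:]:
    mean, sd = normal_posterior(mean, sd, data_mean, data_se)
    print(round(mean, 2), round(sd, 2))
```

Run on the three studies, this prints 1.46 0.72 after Study 2 and 1.09 0.65 after Study 3, matching the values in the example.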
To test your intuitions concerning Bayesian versus Orthodox statistics try this QUIZ.
For other practical examples of using Bayes factors:
Dienes, Z (2015). How Bayesian statistics are needed to determine whether mental states are unconscious. In M. Overgaard (Ed.), Behavioural Methods in Consciousness Research. Oxford: Oxford University Press.
For a discussion of Bayes and the credibility crisis in Psychology:
Dienes, Z. (2016). How Bayes factors change scientific practice. Journal of Mathematical Psychology, 72, 78-89.
For a discussion of conceptual issues:
Dienes, Z. (2011). Bayesian versus Orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290.
A talk on Four reasons to be Bayesian given at Oxford in 2017; and a follow up workshop on Principles for Bayes factors.
A talk on how to use Bayes given at Lancaster earlier in 2015.
This is a lecture I gave on Bayes to Masters students at University of Sussex in 2014.
An essay I set students is: "Perform a Bayesian analysis on a part of the data from your project or from a paper published this year (consider an interesting question tested by a t-test; one test will do). Compare and contrast the conclusions from your analysis with those that follow from an analysis using Neyman-Pearson (orthodox) statistics."
See also this assessment of several topics from the book.
For papers using a Bayes factor for every inferential test ("a B for every p"):
Fu, Q., Liu, Y., Dienes, Z., Wu, J., Chen, W., Fu, X. (2017). Neural correlates of subjective awareness for natural scene categorization of color photographs and line-drawings. Frontiers in Psychology, 8, 210, doi.org/10.3389/fpsyg.2017.00210
Lush, P., Caspar, E. A., Cleeremans, A., Haggard, P., Magalhães De Saldanha da Gama, P.A., & Dienes, Z. (in press). The Power of Suggestion: Post-hypnotically induced changes in the temporal binding of intentional action outcomes. Psychological Science,
Martin, J. R., Sackur, J., & Dienes, Z. (in press). Attention or Instruction? Do sustained attentional abilities really differ between high and low hypnotisable persons. Psychological Research,
Caspar, E. A., Desantis, A., Dienes, Z., Cleeremans, A., & Haggard, P. (2016). The sense of agency as tracking control. PLoS ONE 11(10): e0163892. doi:10.1371/journal.pone.0163892
Dienes, Z., Lush, P., Semmens-Wheeler, R., Parkinson, J., Scott, R. B., & Naish, P. (2016). Hypnosis as self-deception; Meditation as self-insight. In A. Raz and M. Lifshitz (Eds), Hypnosis and Meditation: Toward an integrative science of conscious planes. Oxford University Press, pp 107-128.
Fu, Q., Liu, Y., Dienes, Z., Wu, J., Chen, W., & Fu, X. (2016). The role of edge-based and surface-based information in natural scene categorization: Evidence from behavior and event-related potentials. Consciousness and Cognition, 43, 152–166.
Ling, X., Li, F., Qiao, F., Guo, X., & Dienes, Z. (2016). Fluency Expresses Implicit Knowledge of Tonal Symmetry. Frontiers in Psychology, 7, 57, doi: 10.3389/fpsyg.2016.00057
Lush, P., Naish, P., & Dienes, Z. (2016). Metacognition of intentions in mindfulness and hypnosis. Neuroscience of Consciousness, 1-10, doi: 10.1093/nc/niw007
Lush, P., Parkinson, J., & Dienes, Z. (2016). Illusory temporal binding in meditators. Mindfulness, 7, 1416–1422, doi: 10.1007/s12671-016-0583-z
Martin, J.R., Sackur, J., Anlló, H., Naish, P., & Dienes, Z. (2016). Perceiving time differences when you should not: Applying the El Greco fallacy to hypnotic time distortions. Frontiers in Psychology, 7, 1309, doi: 10.3389/fpsyg.2016.01309
Norman, E., Scott, R. B., Price, M. C., & Dienes, Z. (2016). The relationship between strategic control and conscious structural knowledge in artificial grammar learning. Consciousness and Cognition, 42, 229–236.
Parkinson, J., Garfinkel, S., Critchley, H., Dienes, Z., & Seth, A. (2016). Don’t make me angry, you wouldn’t like me when I’m angry: Volitional choices to act or inhibit are modulated by subliminal perception of emotional faces. Cognitive, Affective, and Behavioral Neuroscience, doi:10.3758/s13415-016-0477-5
Ziori, E., & Dienes, Z. (2015). Facial beauty affects implicit and explicit learning of men and women differently. Frontiers in Psychology, 6, 1124. doi: 10.3389/fpsyg.2015.01124
Some other papers using Bayes factors for certain key tests:
Greve, A., Cooper, E., & Henson, R. N. (2014). No evidence that ‘fast-mapping’ benefits novel learning in healthy older adults. Neuropsychologia, 60, 52–59.
Allen, C. P. G. , Dunkley, B. T., Muthukumaraswamy, S. D., Edden, R., Evans, C. J., et al. (2014) Enhanced Awareness Followed Reversible Inhibition of Human Visual Cortex: A Combined TMS, MRS and MEG Study. PLoS ONE 9(6): e100350. doi:10.1371/journal.pone.0100350
Allen, C. P. G., Sumner, P., & Chambers, C. D. (in press). The Timing and Neuroanatomy of Conscious Vision as Revealed by TMS-induced Blindsight. Journal of Cognitive Neuroscience
Verbruggen F, Adams RC, van ‘t Wout F, Stevens T, McLaren IPL, et al. (2013) Are the Effects of Response Inhibition on Gambling Long-Lasting? PLoS ONE 8(7): e70155. doi:10.1371/journal.pone.0070155 (see Data analysis section)
Newell, B. R., & Rakow, T. (2011). Revising beliefs about the merit of unconscious thought: Evidence in favour of the null hypothesis? Social Cognition, 29,
Verdonschot, R.G., Kiyama, S., Tamaoka, K., Kinoshita, S., La Heij, W., & Schiller, N.O. (2011). The Functional Unit of Japanese Word Naming: Evidence From Masked Priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(6), 1458–1473.
Thanks to Online Experiments for providing the Flash conversions of my Matlab. Use Online Experiments for convenient programming of all your experiments!
The two-sample t-test is a parametric test that compares the location parameter of two independent data samples.
The test statistic is

$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and $n$ and $m$ are the sample sizes.
In the case where it is assumed that the two data samples are from populations with equal variances, the test statistic under the null hypothesis has Student's t distribution with n + m – 2 degrees of freedom, and the sample standard deviations are replaced by the pooled standard deviation

$$s = \sqrt{\frac{(n - 1)\,s_x^2 + (m - 1)\,s_y^2}{n + m - 2}}$$
In the case where it is not assumed that the two data samples are from populations with equal variances, the test statistic under the null hypothesis has an approximate Student's t distribution with a number of degrees of freedom given by Satterthwaite's approximation. This test is sometimes called Welch’s t-test.
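Both versions of the test can be sketched in Python with the standard library alone. The function name and return format are hypothetical; the degrees of freedom in the unequal-variance case use the usual Satterthwaite formula, which is assumed here to be the approximation meant above.

```python
import math
from statistics import mean, variance


def two_sample_t(x, y, equal_var=False):
    """Return (t, df, se) for a two-sample t-test on independent samples."""
    n, m = len(x), len(y)
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)
    if equal_var:
        # Pooled variance replaces the two separate sample variances.
        pooled_var = ((n - 1) * vx + (m - 1) * vy) / (n + m - 2)
        se = math.sqrt(pooled_var * (1.0 / n + 1.0 / m))
        df = n + m - 2
    else:
        # Welch's test: separate variances, Satterthwaite degrees of freedom.
        a, b = vx / n, vy / m
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))
    return diff / se, df, se


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(two_sample_t(x, y))                  # Welch
print(two_sample_t(x, y, equal_var=True))  # pooled
```

The mean difference and the returned standard error are the same two numbers a Bayes factor calculation for a t-test situation takes as its data summary.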