A week-and-a-bit ago I received the table of contents for SPPS in my inbox (gotta love the smell of research in the morning), and on the list was this study by Will Gervais, Jennifer Jewell, Maxine Najle, and Ben Ng. I read it a while ago, and liked it so much that I presented it to the weekly “journal club” at uni. I also meant to blog about it right away, but clearly needed this additional nudge to actually make it happen!
Gervais gives a full outline of the study on his blog, so I will just summarize very briefly: the main finding was that social psychologists making a hypothetical hiring decision preferred a job candidate with a history of being less productive but running higher-powered studies…when the (negative) consequences of low-powered vs. high-powered research were made salient to them.
Judging from the reactions of the more experienced researchers at our journal club, job candidates are so rarely evaluated in the explicit terms set up in the study that directly translating from this hypothetical set-up to any particular intervention in “the real world” might be too much of a leap. (Not that I think that’s what Gervais and colleagues were aiming for, anyway.) What most of us did agree on, though, was that seeing the consequences of low-powered research quantified like that was, well, powerful.
When I was first introduced to the idea of power (statistically speaking), it was as an explanation for null effects – “maybe we didn’t have enough power to detect that effect; it’s hard to interpret this null effect, given the small sample size”. This in turn meant that the bad consequence I associated with low power was “merely” an increased chance of Type II Errors. I know some argue that we should be just as concerned about false negatives (incorrectly failing to reject the null hypothesis) as false positives (incorrectly rejecting the null hypothesis), but for whatever reason I have always had a much stronger reaction to the possibility of the latter; that is, Type I Errors.
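One standard way of seeing how the two error rates interact (my own illustration here, not necessarily the exact formulas from the paper) is the positive predictive value: the probability that a statistically significant result reflects a real effect, which depends jointly on power, alpha, and the prior odds that the hypothesis is true.

```python
def ppv(power, alpha, prior):
    """Positive predictive value: P(effect is real | p < alpha).

    power: probability of detecting a true effect (1 - beta)
    alpha: false positive rate under the null
    prior: proportion of tested hypotheses that are actually true
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# If half our hypotheses are true (prior = 0.5), at alpha = .05:
print(round(ppv(0.80, 0.05, 0.5), 3))  # well-powered study: 0.941
print(round(ppv(0.35, 0.05, 0.5), 3))  # low-powered study: 0.875
```

The point being that low power doesn’t just cost you true effects; it also erodes how much a “significant” result is worth, since true positives become rarer relative to the (fixed) stream of false positives.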
What I think is really clever about the Gervais and colleagues study is that their formulas and accompanying graph (try it out for yourself!) tie the two types of error together, so they can be considered in relation to each other (as they “should”, at least for some value of “should”). So, when determining the sample size for my most recent study, I had a play around with the graph and various power calculators – aiming to set my sample size based on much more thorough calculations than I have in the past. Which brings me to the second thing I appreciate about the study – the discussion of the costs of high-powered research. In determining the sample size for my study, I was under very real resource constraints, which ultimately boiled down to – for me – do I recruit an additional 100 participants, or do I attend the next major conference in my field (SPSP)?
Now. There would have been other ways to free up funds. I could have paid participants less. I could have abstained from coffee for a month. I could have cancelled my gym membership. But, for whatever reason, the trade-off I found myself making was between sample size and spending money on a conference. In the end I compromised, and recruited not quite as many participants as I would have in an ideal world (sigh), but still enough to have 80% power to detect an effect as small as Cohen’s d = 0.2. The main effect I was looking for has been medium to quite large in my previous research – d’s from 0.4 to 1.6 – but in this study I was introducing a new manipulation that I have not previously used. So, we’ll see. Maybe I’ll have to fall back on the ol’ “it’s hard to interpret this null effect”…
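For the curious, the kind of calculation involved can be sketched in a few lines – this is the standard normal approximation for a two-sample comparison, not whatever particular calculator I used, but it lands in the same ballpark as the usual tools:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test of
    standardized effect size d, via the normal approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z(power)           # quantile corresponding to desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.2))  # d = 0.2 -> roughly 393 per group
print(n_per_group(0.5))  # d = 0.5 -> roughly 63 per group
```

Which makes the resource problem vivid: halving the smallest effect you want to detect roughly quadruples the sample you need.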