The title of this post refers to #bringoutyernulls, which was (is?) making the rounds of twitter a little while ago. The hashtag emerged around the same time as a large-scale attempt at replicating ego depletion was published (hint: nulls), as well as a paper opening an oxytocin file drawer (and this earlier post is also highly relevant). More generally, the argument goes that since lots of labs and research literatures are accompanied by large file drawers, emptying these out might be good for science, for our research practice, for shifting incentives, etc. etc. Previously, people would hide nulls, because they were somehow… I don’t know, shameful? A blot on people’s research oeuvre? [Edit: oh, and because nulls don’t get published, obviously. There just seemed to be this vibe of “nothing to be ashamed of!” around the hashtag, which implies secrecy etc., to me.] While I agree that file drawers are a problem, I felt a vague unease at this hashtag.
Then, about a week later, we had the special issue of JESP, where Baumeister’s take on “flair” prompted discussion of expertise, competence, and n = 10 (?!). Part of this debate, in a simplified version, boils down to the question of how to compare an original study with a replication, when the original study showed a significant effect and the replication study did not. How do you explain this difference? Baumeister seems to explain it by saying that the replication study was poorly done, by incompetent researchers. The original is “true”; the replication is flawed. On the opposing side of the debate, the difference is explained by researcher degrees of freedom in running the original study, small samples, and flukes. The replication is “true”; the original is flawed. (Of course, everyone in this debate is also being extremely nuanced, so this is a caricature.)
I tend to find myself closer to the second side, but this time the two points above added up to feeling some sympathy for Baumeister’s position. (I also assume he represents some larger proportion of people in academia, not speaking only for himself.)
To explain: I don’t think replicators are generally incompetent, or out to ruin anyone’s career. I do remain ambivalent about what might explain the difference between the original study and the replication, however. And this is where my reaction to #bringoutyernulls comes in:
We have all got null results sometimes. And although it probably (?) doesn’t mean we’re incompetent in general, it can mean that we designed or ran that particular study badly. In fact, keeping in mind how hard good psych research is, I’d almost be inclined to assume that it’s easier to get null results than to get significance. And if your study was poorly designed and executed, your nulls will be as meaningless as a p-hacked p < 0.05. Personal example: in the first study of my thesis, I described a few significant findings as “mere blips of significance in a sea of inconsistent correlations”. Knowing how my study was designed, these correlations were most likely completely meaningless. And, having resisted over-interpretation in my thesis, I wouldn’t want to flood you with this sea in a #bringoutyernulls link, either.
So that’s where my resistance comes from. I think when there’s a difference between originals and replications, science is not helped by fetishizing either flair or null results. Quality of design and execution will have some weight in determining the evidentiary value of both studies. Whether the result of either study has some a priori claim on being “true” will depend on all sorts of other things. (This is one reason why I was (rather incoherently) going on about Bayes in the last post.)
Anyway. Other researchers write with more authority and expertise (even flair!) about doing psych research, so I won’t often go on about it here. I will, though, introduce you to these researchers along the way. For now, I’m off to present some pretty decisive (I hope) non-nulls at SASP, the best conference in the world. That finding replicates every year.