Solving the replication crisis: rigor and reproducibility

Started by Puget, July 03, 2019, 09:03:33 AM


Puget

This discussion started on another thread, and I thought it was worthy of its own.
Those in the sciences/quantitative social sciences:

What do you do to enhance rigor and reproducibility (to use the NIH phrasing) in your research?

What (if any) research practices have you changed since becoming more aware of replication problems?

How do you train your students on this?

Let's have a good discussion about best practices!
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

Kron3007

We always repeat our experiments multiple times to ensure the results are reproducible, at least within our lab. I thought this was standard practice everywhere, but a recent discussion with someone in my field taught me that it is not. We have run a number of experiments in our lab that were not reproducible, and if we don't get the same results upon repetition, there is no way they will be reproducible across labs. I think another key is to include very detailed materials and methods. Again, this should be standard, but from trying to replicate published protocols I can tell you that it is not always the case.

Regarding student training, the main thing is that I stress not to blindly trust the scientific literature (a shame that it has come to this) or to expect published protocols to work as written. I am just finishing up a student whose whole thesis failed due to these issues. Based on the literature it should have worked, but based on our results and on discussions with others who have tried, I am convinced it simply doesn't work. I intend to (try to) publish the negative results, which is one of the key solutions to this problem IMO. Too often, only positive results are published. So if something works 1 time in 10 (within the statistical rate of false positives in many cases), the literature will not reflect this, and many more people will waste their time trying to reproduce it.

Unfortunately, I think these issues are really exacerbated by the whole system pushing us to publish only "novel", "high-impact" research rather than recognizing that science is really built on slow, incremental improvements in our understanding.

 

youllneverwalkalone

With respect to data analysis, I transitioned completely to an open environment (R) and increasingly provide the code as an appendix to the papers, or at least make it available upon request.

I don't think I did it in response to any top-down pressure per se; mostly I found it very helpful when others did it and started doing it myself.

I haven't been as forthcoming with making data publicly available, but I guess it will become part of my routine in the near future, as many funding agencies where I am (Europe) increasingly require it.

Also, +1 to everything Kron wrote. In my lab, and in my field generally, the bar for the number of experiments (and the number of data points per experiment) considered "enough" to go ahead and publish has been getting higher in recent years.

pigou

Quote from: Puget on July 03, 2019, 09:03:33 AM
What do you do to enhance rigor and reproducibility (to use the NIH phrasing) in your research?
I write all my papers in R Markdown and make the raw data (with identifiers removed) publicly available. Anyone can "compile" my manuscript, going from raw data to finished manuscript (without journal typesetting), and see the code that produced every statistical result and every figure. Screenshots of all experimental materials go into the appendix.

Part of that was motivated by requests for data years after publishing a paper... it's a pain to go back to old work, so being able to point people to a (well documented) repository is just much easier. After spending years to get a paper published, taking a day to make sure the documentation is up to par is really not that much additional effort.
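For anyone who hasn't seen it, a chunk in the .Rmd looks roughly like this -- the file name, variables, and model are invented for illustration, not pulled from one of my papers:

```{r analysis, message=FALSE}
# Everything from raw data to reported numbers lives in chunks like this,
# so anyone who knits the .Rmd rebuilds the manuscript from scratch.
library(readr)

dat <- read_csv("data/raw_deidentified.csv")   # raw data, identifiers removed
dat <- subset(dat, !is.na(outcome))            # exclusions are visible, not hidden

fit <- lm(outcome ~ condition, data = dat)
summary(fit)                                   # the statistics reported in the text
```

Inline code then pulls the resulting numbers straight into the sentences of the manuscript, so nothing is ever copied by hand.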

Quote
What (if any) research practices have you changed since becoming more aware of replication problems?
I now preregister my studies. It doesn't make a huge difference, because the outcomes in my studies and the analyses are pretty self-explanatory. But it's valuable to think through what could go wrong -- and given the extent of p-hacking in the field, it (hopefully) makes readers more confident that this didn't happen here.

I can see how it's even more valuable in fields where people come up with creative reasons for excluding "outliers" or where they run 10 different scales that could be "moderators" of an effect. When you're reporting three-way interactions with 200 people, there's no way you get anything remotely sensible. So as a reviewer, I now just outright reject those papers.

I've now also increased sample sizes substantially. It increases the cost of doing research, but in the long run it saves money and time that'd otherwise have been wasted on effects that aren't actually real. But that's also a lot easier at places that have excellent research support. If I had to spend all my time for a week or two to manage data collection, I'd get nothing done. And at many universities, it'd simply be impossible to get 200-300 respondents in the first place.

Hibush

Thanks for starting this topic. It is rich for discussion, and that discussion is crucial.

I really like the old chestnuts from Chamberlin (Science, February 1890), The Method of Multiple Working Hypotheses: with this method the dangers of parental affection for a favorite theory can be circumvented. And
Platt (Science, October 1964), Strong Inference: certain systematic methods of scientific thinking may produce much more rapid progress than others.

Both of these diatribes, published a century and a quarter and a half century ago respectively, talk about the scientist's urge to collect data consistent with their favorite model, and how that practice leads to a lack of rigor, reproducibility, and progress. Scientists today are the same.

I try to, and have students try to, lay out all the potential mechanisms underlying the phenomenon of interest. Decide which ones you are going to distinguish between, and then describe how the experimental results will differ if one is true and the other is not. This is the information I want to pre-register. That way, you don't do any hypothesizing after the results are known. (HARKing is another ubiquitous bad practice.)

This practice is pretty effective for both social and natural sciences.

Hibush

Quote from: pigou on July 03, 2019, 10:01:50 AM
Quote from: Puget on July 03, 2019, 09:03:33 AM

What (if any) research practices have you changed since becoming more aware of replication problems?


I've now also increased sample sizes substantially. It increases the cost of doing research, but in the long run it saves money and time that'd otherwise have been wasted on effects that aren't actually real. But that's also a lot easier at places that have excellent research support. If I had to spend all my time for a week or two to manage data collection, I'd get nothing done. And at many universities, it'd simply be impossible to get 200-300 respondents in the first place.

Running a power analysis before collecting data is usually telling. If you have an estimate of your variance, it is not hard. Have the lab member decide how big a difference they want to be able to detect, then run the stats to find out how many samples you need. If the answer is 200-300 and you can't find or afford that many, you have to rethink the project rather than going ahead knowing that you cannot draw any conclusions from the data you will collect.

It is also super helpful for dissuading people from throwing in another treatment. Run that power analysis with a second treatment and an interaction term, and the answer will often let you write the Results section before doing the experiment: "there were no statistically significant differences among the treatments."
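For a simple two-group comparison, the calculation is a one-liner in base R. The numbers below are placeholders, not anything from a real study:

# Hypothetical inputs: the lab member wants to detect a difference of 5 units,
# and pilot data suggest a standard deviation of about 8.
power.t.test(delta = 5, sd = 8, sig.level = 0.05, power = 0.80)
# Solves for n: roughly 41 per group. If that's out of reach, rethink the
# design rather than running an experiment that can't answer the question.

power.t.test() ships with base R's stats package; for more complicated designs, the pwr package or a simulation does the same job.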

Kron3007

Quote from: Hibush on July 03, 2019, 10:28:18 AM

Running a power analysis before collecting data is usually telling. If you have an estimate of your variance, it is not hard. Have the lab member decide how big a difference they want to be able to detect, then run the stats to find out how many samples you need. If the answer is 200-300 and you can't find or afford that many, you have to rethink the project rather than going ahead knowing that you cannot draw any conclusions from the data you will collect.

It is also super helpful for dissuading people from throwing in another treatment. Run that power analysis with a second treatment and an interaction term, and the answer will often let you write the Results section before doing the experiment: "there were no statistically significant differences among the treatments."

Yes, I always tell my students that research is a constant exercise in sacrifice and compromise. In my field, it is not possible to conduct the experiments we would ideally want to with our resources, so we have to choose what to sacrifice to get the most meaningful data we can. In some cases this means testing fewer levels (concentrations) within treatments, in others it means testing fewer treatments with more levels, and in some cases, where the uniformity allows it, we can reduce the numbers per experimental unit. We have not previously run power analyses to inform these decisions, though; I may try it out. Thanks for the idea.

   

Puget

Wow, so many great responses already!
Here's what I'm doing so far-- always looking to add more good practices, so I look forward to continuing this discussion.

-Pre-register all papers (we use OSF, https://osf.io -- free and non-profit, check them out). This applies to papers using existing data sets, not just new data collection -- no one gets to so much as peek at our data files without a preregistration in place. Prevents HARKing. As a side benefit, I've found this to be really pedagogically helpful for forcing the students to carefully think through their hypotheses, methods, and analysis plan up front. I have them write the preregistration as the intro and methods sections of a paper, plus a more detailed analysis plan section, so it also helps them get papers out faster once they have data.

-Make sure studies are adequately powered to detect plausible and meaningful effect sizes by doing Monte Carlo simulation power analyses (a toy sketch follows this list). This goes in the preregistration too.

-We do human research, often with time-intensive data collection (following participants over time, and/or long lab visits), so we can't just repeat studies, but we try to show that results are robust to various semi-arbitrary modeling decisions by running additional analyses under different choices (e.g., with and without various covariates) and showing that they don't meaningfully change the results (see the second sketch after this list). These all go in the online supplemental materials, with brief pointers in the main text.

-When my lab fully owns the data, we generally put the final data files and analysis scripts on OSF with a link in the paper, but see challenges with always doing this below.

-As a reviewer, I have gotten very blunt about calling out bad practices, especially small sample sizes with no indication of how the sample size was determined.
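For anyone who hasn't tried simulation-based power analysis, the core recipe is short. Here's a toy version in R; the sample size, effect size, and model are invented, and a real one would simulate the actual planned design and analysis:

# Toy Monte Carlo power analysis: simulate the planned study many times,
# analyze each simulated data set exactly as planned, and count how often
# the effect of interest comes out significant.
set.seed(2019)
n_sims <- 2000
n      <- 120        # candidate sample size (hypothetical)
b      <- 0.25       # smallest effect we would care about (hypothetical)

p_values <- replicate(n_sims, {
  x <- rnorm(n)                        # predictor
  y <- b * x + rnorm(n)                # assumed data-generating model
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

mean(p_values < 0.05)                  # estimated power at this sample size

If the estimate comes in below the power you're targeting (conventionally .80), bump n and rerun.

And this is the kind of thing I mean by checking robustness to modeling decisions; the variable names are invented, and the stand-in data frame is only there so the sketch runs:

# Stand-in data; in practice this is the real data set.
dat <- data.frame(outcome   = rnorm(200),
                  predictor = rnorm(200),
                  age       = rnorm(200, 35, 10),
                  gender    = rbinom(200, 1, 0.5),
                  income    = rnorm(200, 50, 15))

# Fit the focal model under several defensible covariate choices; the
# coefficient of interest should tell the same story in every column.
m1 <- lm(outcome ~ predictor, data = dat)
m2 <- lm(outcome ~ predictor + age + gender, data = dat)
m3 <- lm(outcome ~ predictor + age + gender + income, data = dat)

sapply(list(m1, m2, m3),
       function(m) coef(summary(m))["predictor", c("Estimate", "Pr(>|t|)")])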

Remaining challenges:

For some papers, data are from a larger data set collected with great effort and expense (like big longitudinal studies), from which future publications addressing different research questions, but using many of the same variables, are planned. This makes it tricky to publicly post data without giving up the right of first publication on it. This gets even trickier with collaborative studies, when lots of people have a stake in being able to publish off the data. So far, my compromise in these situations has been to say we'll make the data available to anyone wanting to directly replicate, but for other purposes we'll post it once we've published our planned papers. That seems fair, but I think we need to come to some consensus in my field about how long you should be able to embargo data for non-replication use.

I also have had mixed results getting collaborators on board with preregistration and other good data practices. Generally, younger researchers are on board, but some more senior folks still don't see why they should have to change the way they've always done things. Honestly, I don't think they will change until forced to by journals and funding agencies.
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

Volhiker78

These are all very good practices, Puget. I work as a statistician for some very large observational studies, and over time I've developed the following practices.

1. Create publication-specific databases (SAS in my case) that contain only the de-identified data used for the publication (see the sketch after this list). This makes it easier to control data updates, and makes it possible to publicly share the data after the publication comes out without including data not in the publication.

2. Develop publication-specific analysis plans prior to analysis. Each contains what I consider the primary analysis and the secondary analyses anticipated for the publication.

3. If possible, I keep myself blinded to the key factors that I am examining until the last possible moment. So, if I am examining potential covariates or data transformations, I decide on these prior to examining the key variables, like drug exposure (again, see the sketch below).

4. I update the analysis plans if changes are made after unblinding.
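My own versions of points 1 and 3 live in SAS, but a rough R sketch (with invented file and variable names) shows the idea:

# Point 1: a publication-specific, de-identified data set containing only
# the variables used in that paper. File and variable names are invented.
full <- read.csv("study_master_file.csv")

pub_vars <- c("drug_exposure", "age", "sex", "outcome")   # identifiers left out
pub_dat  <- full[, pub_vars]
pub_dat$record_id <- seq_len(nrow(pub_dat))               # arbitrary codes, not real IDs
write.csv(pub_dat, "pub_2019_analysis.csv", row.names = FALSE)

# Point 3: one way to stay blinded while choosing covariates and
# transformations is to work from a copy with the key exposure shuffled,
# then swap the real column back in once the analysis plan is fixed.
blind_dat <- pub_dat
blind_dat$drug_exposure <- sample(blind_dat$drug_exposure)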

I do these primarily for my own future reference, but I wouldn't have a problem sharing these things with journal reviewers or other outside researchers.

Puget

Sounds very similar to what we do, Volhiker-- if you aren't already, you might want to consider officially preregistering those analysis plans rather than just making them for yourself. You don't have to use any particular format-- it can be very flexible.

Even if you trust yourself not to engage in motivated reasoning (and I'm a psychologist so I know not to trust myself), I've found this quite useful for suspicious reviewers who are quick to call HARKing -- give them the preregistration link in the MS and you don't have to defend how you really did predict that interaction or plan to include those covariates or whatnot.

We are also very transparent about it in the manuscript when we deviate from the analysis plan, or when something is exploratory. Sometimes things do change between preregistration and analysis. I was just talking about this with one of my grad students today-- she found some new papers that convinced us we really should break down a measure into subfactors and control for another variable, which wasn't in her preregistered analysis plan, so she'll need to note that in the paper when she says the hypotheses were preregistered and explain why we changed the analysis plan. I've indoctrinated her so thoroughly that I had to convince her this really was OK-- that the goal is transparency and good science, not being a slave to a plan you put in place months ago when you've learned new things since.
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes