Opening up academic research
In academia it is common to consider research projects secret until publication. Before publication, academics discuss their ideas only with a select few, under an implicit understanding of confidentiality. This has several reasons. Primarily, people are scared of getting scooped, that is, of having another research group publish first. This is understandable given that the winner typically takes it all. The credit for ground-breaking work goes to the research group that publishes first, and not the one that made the same discovery independently, but was still busy polishing their work. Besides, researchers, like all people, feel uncomfortable about unfinished work that attracts many critical eyes. “Oh lord, what will my peers think about my terrible thinking?” Even if competition and insecurity were not a concern, time would still be. It is challenging enough to get a paper published, but who can afford to curate online resources on top of that?
The secrecy around research projects has always bothered me greatly. I’m interested in more than just research papers, which are only one part of a research project. I want to study others’ code, learn how they process their data and run their experiments. What’s more, research papers typically only talk about what worked out. They don’t talk about negative results, that is, failed experiments along the way. We can learn just as much, if not more, from what does not work. Finally, many researchers, me included, are funded by public money, which means that it is not only our moral but also our professional duty to make our work as useful to the public as we can. We can do better than just churning out papers, many of which are doomed to deteriorate behind paywalls anyway. As part of my recent research projects, I have been experimenting with ways to bring more transparency to my work. By transparency, I mostly mean three aspects.
- I want to enable people to give me early feedback. Numerous flaws in peer-“reviewed” research papers could have been prevented if only the right people had reviewed a key concept. Diverse feedback can also help me steer my research in a direction that makes it more useful to the world.
- I want people to be able to reproduce my work. That is, reproduce it without having to ask me for data, code, or instructions; reproduce it easily, quickly, and successfully. Reproducibility is a key aspect of science, and the fact that it is missing so often makes me wonder whether computer science, given its poor status quo, really deserves the word “science” in its name.
- I want my work to be accessible outside academia. It’s 2016, and important scientific advances take place not only in universities or research labs. I feel very strongly that whoever has an interest in my work must be able to get all they need to get started themselves.
Naturally, I don’t have all the answers, but I have experimented with a number of ways to address these three points. First, I try to publish my results early and semi-often. Much of my work is on the Tor anonymity network, so I occasionally post to Tor’s mailing lists to present preliminary findings. There are always some people who point out things I have not considered. Once I have written up my work, I submit a copy to the arXiv, to make my work available to the public as early as possible. In parallel, I submit my work to academic conferences. In my field, most big conferences accept papers that are already published as a technical report. Having lost hope in most academic publishers, I am now also creating open access bibliographies for my work. This helps readers hunt down my references without hitting ACM, IEEE, or Springer paywalls.
Working in computer science, I also deal with a lot of code. I try to make it a habit to publish my code on GitHub, with useful documentation. This has worked reasonably well for projects such as exitmap, and I take great pride in knowing that the tool was useful to other researchers. Needless to say, my code is far from flawless, and any professional software engineer will get a chuckle out of it, but what matters is that it’s available, and it’s free. In fact, the lack of public prototypes for systems research is not just a nuisance; it makes it much harder to reproduce work and verify claims, jeopardising the quality of published research.
The third crucial piece for me, after papers and code, is datasets. I have yet to find a convenient way to organise and publish datasets, which is why I am hosting them myself, including CSVs, text files, and bulky tarballs. There is little structure, but at least I have backups and the data is unlikely to disappear. Summing up, my limited experience with bringing more transparency to academic research has taught me the following.
- Don’t be afraid of getting scooped. It’s sad that we have to worry about getting scooped in the first place, but it no longer makes me nervous that fellow researchers—my competition, if you will—get to see my work at an early stage. Quite the opposite, because by publishing early, I am “marking my territory” in a way, hopefully encouraging collaboration instead of competition.
- You might get great feedback. I was lucky to have received great feedback from a lot of clever people, much of which I would not have received from formal peer review. I found the diversity of feedback particularly helpful because it was not just academics who shared their opinions; I also received pull requests from software engineers, and usability feedback from users.
- You are helping science. Churning out papers at a frantic pace might be in the interest of advancing your career, but not in the interest of advancing science. We do not need more papers; we need better papers, ones that benefit the world and can stand the test of time. Taking the time to polish your code, data, and writing is a small step in that direction.