# Effective tools for computer systems research

Open review version. Please leave feedback!

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.

Abraham Lincoln

# Introduction

Research is messy. Our body of knowledge is scattered across countless journals, presentations, blog articles, and tweets, making it difficult to get up to speed in a field. Data collection often requires numerous iterations and is frequently insufficiently documented. The subsequent analysis of data is sometimes powered by fragile code and obscure plotting systems. The resulting research paper is originally called paper.doc, then paper-new.doc, paper-new-2.doc, paper-final.doc, and eventually paper-final-FINAL.doc.

It doesn’t have to be like that. A well-structured research project is not only possible but well within your reach – as long as you know how. That’s what this book is about. The following chapters introduce effective tools for doing research, in particular for organising, versioning, reading, writing, programming, visualising, automating, and communicating. Some of these tools are software, some are processes, and some are cognitive techniques.

In addition to discussing these tools, I will explain why they are effective. You will realise that logging your time will give you freedom, good writing is the opposite of academic writing, and having a Twitter account isn’t all about posting cat pictures.

Building an effective tool set yields a significant return on investment in terms of time, sanity, and research quality. You will save time by automating parts of your research pipeline and putting them under version control; you will keep your sanity because you know that your pipeline is robust; and the quality of your research will increase because you are less likely to make unnecessary mistakes.

## Who is this book for?

I wrote this book primarily for new graduate students in computer systems. (The field of computer science consists of “theory” and “systems.”) I work in systems myself, and you will find this book the most useful if you are a systems researcher – for example in a field like security, programming languages, or networking.

If this includes you, the entire book should be of interest. The secondary audience is people who program, write in LaTeX, or do research – basically anyone in a STEM field. This crowd will find a subset of this book useful.

## Why am I writing this book?

I had little research experience when I started my Ph.D. I faced a steep learning curve in the first couple of years, and research frequently felt overwhelming. In addition to reading hundreds of papers and trying to carve out your own area of research, you have to teach, take courses, and adopt best practices in research. I’ve always been curious about how other people work and cope, so I wrote the book that I wanted to read as a young student.

Perverse incentives in research place too much value on the number of papers and citations. As a result, people cut corners to maximise their research output – frequently while sacrificing rigor. Poorly documented workflows and sloppy code can lead to mistakes that jeopardise the correctness of a project. By adopting strict and effective workflows, we can minimise these mistakes and save time.

An unfortunate amount of knowledge in a scientific field is implicit, meaning that it’s rarely spelled out. Pinker calls this phenomenon the “curse of knowledge” (Pinker 2015, chap. 3): Years in a field make it difficult to put yourself in the shoes of a newcomer. Examples of (mostly) implicit knowledge are the reputation of conferences, collaboration etiquette in research projects, and how to organise your time. People do write and talk about these topics, but you may have a hard time finding a course or book that teaches them. In this book, I spell out aspects of research that don’t see much elaboration elsewhere – even if it means that parts of this book may seem obvious to you.

## How should you read this book?

All chapters are self-contained, so dive into whatever chapter appeals to you. The chapter on versioning is of particular importance because subsequent chapters reference it several times. Finally, this is a hands-on book, and I strongly encourage you to read it in front of your laptop – ideally with an open terminal. You will retain much more if you put into practice what you read in this book.

# Organising

A combination of stubbornness and luck got me through my Ph.D. and postdoctoral training without a todo list or schedule. Each day, I would simply continue work where I stopped the day before. My lack of organisation didn’t get me in trouble because for the most part, I was involved in only one or two projects at a time, which was manageable. This changed after my postdoc. I suddenly found myself juggling software projects, monthly reports, research papers, and blog posts; all on tight deadlines. I had no choice but to become more organised.

You may – like me – get away with poor organisational skills during your Ph.D., but why not improve these skills before it’s absolutely necessary? Why not become more efficient in the process, and also prepare for your post-Ph.D. life, which will most likely be more demanding and require juggling several projects in parallel? This chapter discusses organisation tools and behavioural hacks that help you stay on track and make steady progress. After all, research is a marathon and not a sprint.

## Incorporating new skills

The incorporation of new organisational tools requires forming new habits. When adopting new habits – be it organising, exercising, or eating healthily – you may find that you can keep up the new habit for a few days but then drift back into old habits. In his book Atomic Habits (Clear 2018), James Clear explains why: the key to forming good habits (and eliminating bad ones) is to i) create an obvious cue that triggers an action, ii) make the action attractive, iii) make the action easy to perform, and iv) make its outcome satisfying.

## Curate a todo list

As a Ph.D. student (and even as postdoc), I spent most of my days working on very few tasks – so few, in fact, that I was able to keep my todo list in my head. Most days, I worked on moving my research project forward, only interrupted by the occasional paper review, presentation, or work with students. I had few tasks and deadlines to keep in mind at any given time. Once I transitioned from my postdoc into the “real world”, I quickly realised that this had to change. I suddenly found myself having to push forward several small projects: research papers, bug fixes in complex code bases, organising workshops, applying for grants, and analysing data sets. It was no longer possible to keep my todo list in my head, so I started experimenting with tools. Console tools proved a bit too cumbersome while web tools proved too inconvenient and clunky, so I eventually ended up maintaining a simple text file with my todo list.

In essence, a todo list helps you keep track of tasks that you need to get done. As I mentioned before, the trick is to incorporate it into your workflow seamlessly. If adding a new item to your todo list involves spending thirty seconds finding todo.docx on your hard drive, waiting five seconds for Microsoft Office to open, and another three seconds to scroll to the end of the file to add a new task, you will soon give up.

If you feel the need to curate a todo list but find yourself unable to do so, work on minimising friction. First, make it so that you can open your todo list as quickly as possible. For example, configure a keyboard shortcut that opens your todo file. This way, if a new task is assigned to you during a meeting, you should be able to add it to your list within five seconds. Once a new task is in your list, you can forget about it, freeing you of the cognitive load of having to remember the task.
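If you live in a terminal, a shell function gets you close to that five-second goal. The following is a minimal sketch, not a prescribed tool: the function name todo, the default path ~/todo.md, and the TODO_FILE override are all hypothetical. Without arguments it opens the list in your editor; with arguments it appends a new item without opening anything:

```bash
# Hypothetical helper; set TODO_FILE to point it at your own list.
todo () {
    local file="${TODO_FILE:-$HOME/todo.md}"
    if [ "$#" -eq 0 ]; then
        # No arguments: open the whole list in your editor.
        "${EDITOR:-vim}" "$file"
    else
        # Arguments: append them as a new markdown list item.
        printf '* %s\n' "$*" >> "$file"
    fi
}
```

For example, todo write monthly team report appends the line * write monthly team report to the end of the file. New items always land at the bottom; you can move them into the right section the next time you open the list.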

If you work on multiple devices and need your todo list on all these devices, you will have to sync it somehow. In this case, browser tools may be your best bet because they can take care of the syncing part for you. Manual syncing is out of the question because it’s not sustainable and you will wind up with out-of-sync todo lists.

The todo list format I eventually adopted is a section in a markdown-formatted text file, which also holds my work log (see the section on work logs). Some tasks are more pressing than others. To reflect this, I use three sections: “today,” “this week,” and “eventually.” Here is an excerpt of what my todo list currently looks like:

```markdown
# TODO

## Today

* work on presentation for hagenberg
* write monthly team report
* figure out how to move forward with #32126

## This week

* finish python 3 port of bridgedb (#30946)
* write introduction and impacts section for ttp grant
* read salmon research paper (#29288)

## Eventually

* refactor emma with dcf’s feedback (#30794)
* wrap up expired /keys issue (#17548)
* come up with a solution to bridgedb’s broken captcha (#24607)
* look into moat “password” idea (#28015)
```

When I finish a task, I move it from my todo list to my work log, which is in the same file, so it’s a matter of literal seconds. This minimises friction and renders the curation of my todo list easy enough that I actually stick with it.

Again, the details of my setup don’t matter. The best todo list is one that works for you. I’m describing my workflow in the hope that it helps you discover what works for you. My workflow is heavily centered around text files and command line tools. You may be more of a browser person, in which case a pinned browser tab may work significantly better. Experiment with different workflows until you find one that you can sustain.

A habit that I picked up relatively recently, after reading Cal Newport’s excellent book Deep Work (Newport 2016), is to plan my day. Each morning, right after I turn on my laptop, I look at my todo list and derive a plan for the day, based on 30-minute blocks. I draw these blocks on my tablet, but pen and paper work just as well.

I used to come to the office each day without a clear idea of what needed to get done. I would typically continue work where I left off the day before, or turn my attention to whatever seemed the most urgent. This approach may suffice if most of your time goes into a single project whose details you can keep in your head, but it falls apart in the face of more complex responsibilities.

You may think that a detailed plan for each day impedes your creativity. In fact, it’s the opposite. By spending a few minutes planning your day, you reduce your cognitive load throughout the day. You won’t have to think about what to work on next, or when it’s time to switch tasks. You already took care of that in the morning, leaving the rest of the day for deep thinking with minimal context switches.

Perhaps the biggest benefit of a planned day is that it helps you stay on track. I have the annoying habit of polishing finished work more than is necessary or even useful. Having a daily plan in front of my nose reminds me that the perfect is the enemy of the good, and that many other tasks are still waiting to get done. This helps me move on to the next task quickly and be more productive throughout the day, which, for me, means more happiness. At the end of the day, I feel that I have accomplished what I wanted, making it easy to get out of “work mode.” Whenever I feel unaccomplished, I find it difficult to leave work mode because I keep thinking about unfinished tasks. Needless to say, this rumination is far from productive and only prevents me from relaxing and recharging my batteries. A detailed plan for the day is a good antidote and can help you draw a clear line between your work and personal life.

Do you know the feeling of taking a break from work, only to catch yourself an hour later watching obscure YouTube videos? Or the feeling of having put in a full day of work but having little or nothing to show for it, as if the day passed and you accomplished nothing? The solution to these problems is to establish a tight grip on your most precious possession: your time.

I use Time Tracker on Debian Linux. This lightweight tool lives in my system tray, allowing me to quickly open it and take note when I’m switching from one task to another. Each time I switch between tasks, I open the Time Tracker tool and jot down what I’m going to work on next. (Examples of tasks are “answering email,” “changing database API,” “reading research paper,” and so on. Tasks like “writing” or “programming” are likely too general, while tasks like “adding second paragraph to introduction of research paper” are too specific.) On a typical day, I end up with five to ten tasks. At the end of the day, I know exactly what I did, and can contrast it with what I was supposed to do.
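Time Tracker is a graphical tool, but the underlying idea also fits in a few lines of shell. The following is a minimal sketch, not a description of how Time Tracker works internally: the function names track and track_summary, the file ~/timelog.tsv, and the TRACK_FILE override are hypothetical. Each call to track records the moment you switch to a new task, and every task is assumed to last until the next switch:

```bash
# Hypothetical plain-text time tracker: each call records the moment you
# switch to a new task as "<epoch seconds><TAB><task>".
track () {
    local file="${TRACK_FILE:-$HOME/timelog.tsv}"
    printf '%s\t%s\n' "$(date +%s)" "$*" >> "$file"
}

# Print whole minutes spent per task. A task lasts until the next entry
# starts, so the most recent (still running) entry is not counted.
track_summary () {
    local file="${TRACK_FILE:-$HOME/timelog.tsv}"
    awk -F'\t' '
        NR > 1 { minutes[prev_task] += ($1 - prev_start) / 60 }
        { prev_start = $1; prev_task = $2 }
        END { for (t in minutes) printf "%s\t%d\n", t, minutes[t] }
    ' "$file"
}
```

Recording only switch times keeps logging down to a single command, such as track answering email. The price is that you need an explicit entry, say track break, when you stop working; otherwise the break counts toward your last task.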

Tracking your time at such granularity may feel oppressive and stressful. After all, it’s yet another thing to remember and worry about. That’s exactly what I thought until I started doing it, but I learned that policing myself helps me stay focused. It’s easy to feel busy all day long without really getting anything done. One can spend several hours going over meeting notes, mulling over the next email, or stressing about all the things that need to get done. Despite feeling busy all the time, your output may be small. Keeping track of what exactly I’m doing throughout the day helps me notice when I’m actually productive, which is when I finish a handful of well-defined tasks. If you are working on a big project, try to split it into tasks, so you can accomplish a handful of them each day.

Part of my day job consists of working on development tickets for sponsors. These tickets consist of software bugs, feature requests, or small projects. My employer, The Tor Project, is mostly funded through grants, and we need to have a good understanding of how much time a specific development task takes. How long does it take to set up a testbed to evaluate a new pluggable transport protocol? One hour? A day? A week? Time estimation requires both experience and data because our intuition isn’t always a reliable predictor. For each bug tracker ticket, I use my time tracker to record how many hours I spent on it, and then compare that to the hours we projected. Over time, my estimates came closer and closer to how much time I really needed, minus the occasional outlier.
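Comparing projections with tracked hours is a small aggregation. As a minimal sketch, assuming you keep your estimates and an export of your tracked time as two CSV files with lines of the form ticket,hours (a hypothetical format; Time Tracker’s actual export may look different), awk can print estimated and actual hours side by side. The function name compare_estimates is also hypothetical:

```bash
# Print "ticket estimated actual" for every ticket that has an estimate.
# Both inputs are hypothetical CSV files with lines like "#30946,4.5".
compare_estimates () {
    local estimates="$1" tracked="$2"
    awk -F',' '
        # First file (estimates): remember the projected hours per ticket.
        NR == FNR { estimate[$1] = $2; next }
        # Second file (tracked time): add up the hours per ticket.
        { actual[$1] += $2 }
        END {
            for (t in estimate)
                printf "%s %.1f %.1f\n", t, estimate[t], actual[t]
        }
    ' "$estimates" "$tracked"
}
```

Running compare_estimates estimates.csv tracked.csv prints one line per estimated ticket; tickets without any tracked time show 0.0 actual hours. Output order is arbitrary, so pipe it through sort if you want a stable listing.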

I am a big advocate of keeping a log of what I have accomplished throughout the day. Did you finally manage to finish the introduction of your latest research paper? Mention that in your log. Did you finish refactoring the data processing pipeline in your research prototype? This should go straight to your log. Right after I complete a task worth writing down, I spend approximately five seconds adding it to my log and then move on. (Occasionally, I forget to add a completed task to my log right away. I then add it later, sometimes even the day after.)

I don’t bother getting punctuation or even grammar right.

On a typical day, I jot down somewhere between five and ten tasks in my log. I don’t log every single email I write but I do sometimes log emails if they’re both lengthy and important. Here’s what my Oct 14, 2019 work log looked like:

• 2019-10-14
• filed #32064 for improved search results for “download tor” and to incorporate gettor links in our website descriptions
• replied to [redacted] and asked him if he’s willing to run default bridges for orbot
• created wiki page to formalise our “support ngos with private bridges” process
• thought about design for system that can scan the reachability of PT bridges (#31874)
• created summary of obfs4 work for quarterly race report and updated our obfs4 ticket (#30716) with current project status
• reviewed #31384 (snowflake.tp.o language switcher)
• reviewed #31253 (webext packaging target)
• responded to email with points worth communicating at otf summit
• started working on #17548 (deal with expired pgp keys)

You can see that my phrasing is rough around the edges, but that’s okay: you are the primary consumer of this log, and you will likely remember what you meant after the fact. You will interact with your work log several times a day, so minimise the friction of adding tasks to it. My work log is always open on a virtual desktop, so I don’t spend any time opening it. I also added a shortcut to my editor, vim, to quickly add today’s date to the bottom of the file. The format of my work log is markdown, which facilitates conversion into other document formats such as HTML or PDF. (As always, do what works best for you. I’m a terminal person and enjoy fast, lightweight, and robust console tools. You may be more of a browser person, in which case it’s worth looking at web services that help you log your progress.)
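A comparable shortcut also works outside the editor. As a minimal sketch, assuming a markdown log at ~/worklog.md in which each day starts with a heading such as ## 2019-10-14 (the path, the WORKLOG_FILE override, the function name worklog, and this heading format are all hypothetical), a shell function can append an entry and add today’s date heading only when it is still missing:

```bash
# Hypothetical helper: append an entry to a markdown work log, starting a
# new date heading the first time you log something on a given day.
worklog () {
    local file="${WORKLOG_FILE:-$HOME/worklog.md}"
    local today
    today=$(date '+%Y-%m-%d')
    # Add "## YYYY-MM-DD" only if today's heading isn't in the file yet.
    if ! grep -qx "## $today" "$file" 2>/dev/null; then
        printf '\n## %s\n\n' "$today" >> "$file"
    fi
    printf '* %s\n' "$*" >> "$file"
}
```

For example, worklog reviewed #31253 adds * reviewed #31253 under today’s heading. Because the function only ever appends, the log stays in chronological order without any further bookkeeping.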

Using pandoc, converting a markdown-formatted file to PDF is as simple as running the following command (for PDF output, pandoc relies on a LaTeX installation by default):

```bash
pandoc log.md -o log.pdf
```

As a student, I would send my advisor my work log covering the past month at the end of each month. He appreciated seeing what I had been up to, at a level of granularity that was neither too coarse nor too detailed. Sharing your work log with your advisor also serves as insurance: your advisor can never complain that they were not kept in the loop.
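Pulling one month out of a long log is easy to automate. As a minimal sketch, assuming each day in the log starts with a heading like ## 2019-10-14 (a hypothetical format) and using the hypothetical function name month_log, awk can print all entries of a given month:

```bash
# Print all entries of one month (given as YYYY-MM) from a markdown work
# log in which every day starts with a heading like "## 2019-10-14".
month_log () {
    local month="$1" file="${2:-$HOME/worklog.md}"
    awk -v month="$month" '
        # Each date heading switches printing on or off.
        /^## [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
            printing = (substr($2, 1, 7) == month)
        }
        printing
    ' "$file"
}
```

Combined with pandoc, preparing the monthly report becomes a one-liner: month_log 2019-10 > october.md && pandoc october.md -o october.pdf.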

Actually writing down and seeing what you have accomplished throughout the day can be surprising, in a good or a bad way, depending on how much you have accomplished. The value of a progress log is that it allows us to police ourselves. If I see one or two low-progress days in a row, I realise it’s time to make a conscious effort to improve my productivity. Progress logs make it less likely that you drift into a slump and perform poorly for days or even weeks without noticing – something that happened several times throughout my Ph.D.: I wasted many weeks going down rabbit holes, losing sight of the big picture. I mulled over attractive research ideas that were ultimately infeasible, but I was too stubborn to give up on them. I was no longer on track, and I didn’t realise that I had to take a step back to re-evaluate my research direction. Your work log makes it easier to realise when you’re off track, and if you send it to your advisor, they can help you notice too.

Note that time tracking and a work log are very similar but fulfill different goals. Time tracking allows you to police yourself on a micro scale while a work log polices you on a macro scale. It’s possible to be on the right track but spend a significant part of your day watching YouTube videos (a time tracker would reveal this bad habit) or be efficient in your day-to-day business but not spend time doing the right things (ideally, a work log would help you see this).

## Take notes of meetings

I had the privilege of collaborating with 25 people throughout my research career. Several of the projects I was involved in were not led by me, so I took a back seat. In some of these projects, I was surprised by the lack of notes during meetings. Collaborators would meet and discuss projects just as I was used to, but nobody took notes (at least not for the entire group), and the implicit expectation was that everyone would remember what was said and what was left to do. Needless to say, this expectation didn’t always work out. Over the next one to two weeks, people forgot what was discussed, only to end up discussing some of the very same topics at the next meeting. Misunderstandings along the lines of “wait, I thought you were supposed to do this?” also happened.

You can avoid these issues by consistently taking notes. Before each meeting begins, there must be a designated note taker. Multiple people can even take notes simultaneously in a Google Document or a Riseup Pad. The note taker(s) jot down key points during the meeting and todo items for each person. At the end of the meeting, these notes are sent to all participants. If anybody disagrees with any of the notes, they should speak up. This way, all collaborators will be on the same page, there is a written record of what was discussed, and it’s easy to go back in time to find out in which meeting a certain task was discussed. Your collaborators will love you for taking notes.

Note taking isn’t just for high-stakes meetings with important collaborators. I take notes almost every time I am interacting with somebody – including with my advisor, back in my Ph.D. days. I created a simple shell function that facilitates the creation of a new file for each meeting. I simply type meet alice into a terminal, and the command automatically creates a new file, 2019-12-21-alice.md, and opens it in my text editor. Here’s the script, which you can add to your ~/.bashrc:

```bash
meet () {
    # Name the file after today's (UTC) date and the first argument.
    local d file
    d=$(date -u '+%Y-%m-%d')
    file="${HOME}/doc/meeting/${d}-${1}.md"
    vim "$file"
}
```
```make
# PAPER (the main .tex file) and FIGURES are assumed to be defined above.
DOCUMENTS = $(wildcard *.tex)

.PHONY: all pdf clean
all: pdf

pdf: $(DOCUMENTS) $(FIGURES)
	GS_OPTIONS=-dPDFSETTINGS=/prepress rubber -f --pdf -Wrefs -Wmisc $(PAPER)

clean:
	rubber --clean $(PAPER)
```

The environment variable GS_OPTIONS ensures that all fonts that the paper uses are embedded, so the PDF looks the same on each machine, no matter what fonts are installed. Embedded fonts are a requirement of many conferences and generally best practice. When using this Makefile, make sure that the indented lines containing the two rubber commands are prefixed by a tab character and not by spaces. Take a look at chapter TBA to learn more about creating Makefiles.

Makefiles are powerful and excel at tasks that involve repeated processing of files. I use a Makefile to compile this book from markdown to HTML, epub, and PDF, and also to automatically publish new drafts of the book. The Makefile’s central target is index.html, the HTML file I want to create. Its prerequisites are book.md, pandoc.css, and references.bib, the source files that are necessary to produce the HTML file. Finally, the recipe is an invocation of the tool pandoc, which converts my markdown file to an HTML file.

```make
all_input = book.md pandoc.css references.bib metadata.xml
html_output = index.html
epub_output = ebook.epub
all_output = $(html_output) $(epub_output)
publish_files = index.html pandoc.css img
publish_dir = ~/web/nymity.ch/book
# Pandoc 2.11 and newer replace "--filter pandoc-citeproc" with "--citeproc".
pandoc_flags = --toc --standalone --css=pandoc.css --bibliography=references.bib --filter pandoc-citeproc

all: $(all_output)

$(html_output): $(all_input)
	pandoc $(pandoc_flags) book.md -o $(html_output)

$(epub_output): $(all_input)
	pandoc $(pandoc_flags) --epub-metadata=metadata.xml book.md -o $(epub_output)

.PHONY: clean
clean:
	-rm -f $(all_output)

.PHONY: publish
publish: $(html_output)
	@cp -r $(publish_files) $(publish_dir)
	~/web/nymity.ch/deploy_website.sh
```

Whenever I add content, I type make, which compiles the source files into an HTML file that I keep open in my browser. If I type make and nothing has changed since the last build, I see:

```
$ make
make: 'index.html' is up to date.
```

A Makefile can also contain rules that are not about compiling input into output files. To share drafts of my book, I upload them to my personal web server. This involves copying the relevant files into a directory that contains my websites, and then invoking the script that syncs web content from my laptop to my web server. All of this happens by simply running make publish. If the book’s HTML output is missing or outdated, make first rebuilds it (hence the prerequisite on $(html_output)). Then, an invocation of cp copies the book’s HTML files to another directory on my laptop and, finally, I invoke the script that uses rsync to sync all files to my web server.

If you are not a fan of command line tools, you can still benefit from LaTeX by using an online LaTeX editor. Overleaf has been popular among my collaborators.

### A LaTeX template

Below, you can find a LaTeX template that I use for research papers. When submitting a paper to a conference, you typically have to use the conference style – which you can simply add to the template – but you may also have to change or remove parts of the template, depending on how restrictive the conference style is.

Note that \input{introduction} is replaced with the content of introduction.tex. I find it convenient to split sections into separate files because this makes your paper easier to manage, and it also helps with version control if multiple people are working on a paper.

```latex
\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[scaled=0.8]{beramono}
\usepackage[T1]{fontenc}

% For pretty tables.
\usepackage{booktabs}
% Also for pretty tables.
\usepackage{multirow}
% For using colours.
\usepackage{xcolor}
% For smart spacing in custom commands.
\usepackage{xspace}
\usepackage{amsmath}
% For embedded figures.
\usepackage{tikz}

% Bibliography, with back-references to the pages that cite each entry.
\usepackage[backend=biber,backref=true]{biblatex}
\addbibresource{literature.bib}
\renewcommand*{\bibfont}{\footnotesize}
% Add custom text right before back-references in the literature.
\DefineBibliographyStrings{english}{
  backrefpage  = {Cited on p.},
  backrefpages = {Cited on pp.},
}

% For clickable links. biblatex provides the back-references, because
% hyperref's own pagebackref option is meant for BibTeX bibliographies.
\usepackage{hyperref}
\urlstyle{tt}

\definecolor{darkblue}{rgb}{0,0,0.4}
\definecolor{lightgray}{rgb}{0.93,0.93,0.93}

% \title and \author already exist, so keep the metadata under new names.
\newcommand{\papertitle}{Paper title} % Placeholder.
\newcommand{\paperauthors}{Alice and Bob}
\title{\papertitle}
\author{\paperauthors}

\hypersetup{
  colorlinks=true,
  linkcolor=darkblue,
  urlcolor=darkblue,
  citecolor=darkblue,
  pdftitle={\papertitle},
  pdfauthor={\paperauthors},
  pdfkeywords={foo, bar},
}

\begin{document}

\maketitle

\input{introduction}

...

\printbibliography

\end{document}
```

### Pre-submission paper checks

Conferences and journals almost always have specific requirements that paper submissions need to satisfy. It’s frustrating to have your paper rejected for unnecessary reasons like formatting violations, so it’s a good idea to spend five minutes checking the conference’s requirements before pressing the “submit” button.
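Some of these checks are easy to script. As a minimal sketch, assuming your final LaTeX run wrote its log to paper.log (a hypothetical file name) and using the hypothetical function name check_refs, the following function fails if LaTeX’s end-of-run summary reports undefined references. Whether undefined citations also trigger this summary can depend on your bibliography package, so treat it as a first line of defence, not a replacement for the checks below:

```bash
# Hypothetical helper: fail if the LaTeX log still reports undefined
# references. Run it after your final LaTeX (and biber/bibtex) pass.
check_refs () {
    local log="${1:-paper.log}"
    # LaTeX prints this warning at the end of a run that left references
    # unresolved.
    if grep -q "There were undefined references" "$log"; then
        echo "$log: undefined references" >&2
        return 1
    fi
    echo "$log: no undefined references"
}
```

Because the function returns a non-zero status on failure, you can chain it, for example check_refs paper.log && pdffonts paper.pdf.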

• Make sure that your paper is within the page limit. The page limit sometimes includes and sometimes excludes references or appendices, so read carefully.

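• LaTeX shows broken references as question marks: an undefined \ref renders as ??, and with many bibliography styles an undefined citation renders as [?]. Search your PDF (Ctrl + F) for these strings to find broken references.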

• Make sure that all fonts are properly embedded in your PDF. On Linux, I use the tool pdffonts, which is part of the Debian package poppler-utils. I run it as pdffonts file.pdf, and it displays a column called “emb,” which shows whether a given font is embedded. While using pdffonts to write this paragraph, I realised to my dismay that one of my old papers did not embed all of its fonts:

```
$ pdffonts Winter2012a.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
GJYVBN+NimbusRomNo9L-Medi            Type 1            Custom           yes yes no     100  0
NLMFQI+NimbusRomNo9L-Regu            Type 1            Custom           yes yes no     101  0
XNJNRQ+NimbusRomNo9L-ReguItal        Type 1            Custom           yes yes no     102  0
ZZEWFV+CMSY10                        Type 1            Builtin          yes yes no     103  0
UIPGCJ+CMTT8                         Type 1            Builtin          yes yes no     127  0
Helvetica                            Type 1            Custom           no  no  no     174  0
Helvetica                            Type 1            Custom           no  no  no     180  0
HNYWOO+StandardSymL-Slant_167        Type 1            Builtin          yes yes no     203  0
JHYTSG+CMR10                         Type 1            Builtin          yes yes no     204  0
CUJHND+CMMI10                        Type 1            Builtin          yes yes no     205  0
ZapfDingbats                         Type 1            ZapfDingbats     no  no  no     211  0
Helvetica                            Type 1            Custom           no  no  no     212  0
Helvetica                            Type 1            Custom           no  no  no     218  0
XEQPPW+CMTT10                        Type 1            Builtin          yes yes no     242  0
```

## Git integration

LaTeX files are all text files, which makes them prime candidates for version control. I recommend putting all your LaTeX source files into a git repository. (It doesn’t matter if you prefer Subversion, CVS, or Mercurial over git. What matters is that you use some sort of version control. I like git because it has emerged as the most popular system, and with that come great documentation and tooling; also, most people you collaborate with will have at least some understanding of git.)

Having your paper under version control has several advantages:

• No writing is ever lost. Whatever you remove during editing is part of git’s history and can always be recovered.
• You can easily determine the difference between two versions of your paper, making it easy to produce a PDF that highlights differences.
• You can tell who changed what.

### Use tags for milestones

A specific git commit can be assigned a “tag,” which is effectively an arbitrary label. Git tags are often used for version numbers: whenever you publish a new version of your software, you assign the latest commit a tag like “0.2.4.” Tags don’t have to be version numbers, though.
I like to tag important milestones of my writing, for example whenever I submit a paper to a conference or to the arXiv, or when I publish the final camera-ready version of a paper. You can even assign a tag to remember when you sent your paper to your advisor for feedback. Here is an excerpt of such a tagged history:

```
* 5de077a - (tag: ndss17-camera-ready) added cs to my email (3 years, 7 months ago) <laurar>
...
* 2cd29b1 - (tag: arXiv-resubmission-1) fixed last paragraph of internet scale section based on corrected plots (3 years, 9 months ago) <laurar>
...
* fabf1e3 - (tag: arXiv-submission) Turn passive into active voice. (3 years, 10 months ago) <Philipp Winter>
...
* 2187ef7 - (tag: NDSS-submission) Minor style harmonization and spelling fixes. (3 years, 11 months ago) <Philipp Winter>
```

### Learn who changed what

With multiple people working on the same project, you will occasionally notice mistakes in your writing. Some of these mistakes may require discussion, and instead of asking all your collaborators who is responsible for a given piece of writing, you can find out yourself by using git’s “blame” functionality. It’s as simple as running git blame FILE. The output is the text file, annotated for each line with when it was last changed, by whom, and as part of what commit.

### Help git do its job

Remember to make one change per commit. Here are a few examples in the context of research papers:

• Fix one or more typos. If somebody is proof-reading an entire paper, it’s fine to have a single commit that fixes many (or all) typos in the paper.
• Add a reference. Many claims need to be supported by references. Such a commit may add a new reference to the BibTeX file and then cite it in the corresponding LaTeX file.
• Rephrase a paragraph or section. You may not like the way a paragraph (or entire section) is phrased. Rephrasing this paragraph or section should go into one commit. If you want to rephrase several pages’ worth of writing, consider using multiple commits.
• Add more writing. Adding a coherent argument, paragraph, or section should go into a single commit. Adding two independent paragraphs to two separate sections should go into two commits.
• Delete text to meet a page limit. Papers must sometimes be trimmed to meet a page limit. Unless it severely cripples the paper, it’s fine to do this in a single commit.

Note that making small changes is not always possible or reasonable. As you are rewriting a paragraph, you may realise that the rewrite only makes sense if you also rewrite the paragraphs before and after. This is fine. The above recommendations are just that: recommendations.

I personally find it helpful if paragraphs of text are broken into several lines of at most 80 characters each, instead of a single line of text. This makes diffs easier to inspect and helps you understand what change was made. Consider the following example:

```diff
@@ -1 +1 @@
-This is a paragraph that consists of a single, continuous line of text. Such long lines can make it cumbersome to determine what has changed in a lengthy diff. Instead, consider breaking a single long line into multiple lines that end at, say, 80 characters.
+This is a paragraph that consists of a single, continuous line of text. Such long lines can make it cumbersome to determine what has changed in a lengthy diff! Instead, consider breaking a single long line into multiple lines that end at, say, 80 characters.
```

Only a single character changed in this paragraph, which is formatted as a single line. It’s difficult to see what changed because the line is so long.

```diff
@@ -1,4 +1,4 @@
 This is a paragraph that consists of a single, continuous line of text. Such
 long lines can make it cumbersome to determine what has changed in a lengthy
-diff. Instead, consider breaking a single long line into multiple lines that
+diff! Instead, consider breaking a single long line into multiple lines that
 end at, say, 80 characters.
```
Here, the same paragraph (and the same change) is formatted as separate lines. It’s easier to see which character was changed in this commit.

# Communicating

Regardless of what research you do, a substantial part of your job will be communication: mostly with your peers, but ideally also with the general public. We communicate constantly, by writing papers, sending emails, talking to advisors, presenting our work, and complaining on Twitter. Being an outstanding researcher goes a long way, but to truly excel, we also have to master communication.

Effective communication creates numerous opportunities by 1) exposing your research to people who would otherwise not see it, 2) saving time, 3) “selling” your work, and 4) earning the respect of your collaborators.

In this chapter, I will encourage you to create project pages, publish pre-prints, present effectively, engage in popular science writing, and use social media to your advantage. Regarding the more “intimate” communication with your peers, this chapter also covers socialising, managing your collaborators, proper email etiquette, picking the right communication mode, and the reasons for communicating openly.

## …with the world

### Project pages

Have you ever stumbled upon a promising research paper that mentions that its source code is available upon emailing the authors? Only to find out that the authors’ email addresses no longer work? Or that they don’t respond to your email? Or that they do get back to you but cannot find their source code anymore? The main output of a research project is the resulting scientific paper, and once it’s published, there is little incentive for authors to do more. Early on in my Ph.D., I made it a habit to create project pages for almost every research project I have been involved in.

The workload in research can be overwhelming, and having to take care of yet another part of a project may sound daunting, but creating a project page doesn’t take much time: maybe one afternoon, if you take your time. Once you have a template, you can re-use it for your next project, minimising the marginal cost of each new project page. I recommend that project pages have at least the following sections:

• Project summary: Start with a paragraph that summarises your project. Similar to an abstract, it should convey (i) what problem your project solves, (ii) how it solves the problem, and (iii) what the results are. Try to write the project summary for a broad audience; write it the way you would explain your research to somebody in another department, or to somebody in the grocery store. In other words: use simple language and avoid jargon.

• Datasets: Does your research come with a dataset? If so, your project page should link to the data. You don’t need to host datasets yourself, which can be difficult for large datasets. Consider archiving your dataset on the Internet Archive and linking to it from your project page.

• Code: Your code matters because it allows others to reproduce your work. We therefore have an obligation to publish our code. Don’t ever be embarrassed by your code’s quality; code is never perfect, and no reasonable person would ever judge you by it. As with datasets, there is no need to host code yourself: feel free to link to a GitHub or GitLab repository.

• Papers: Research papers are the main outcome of a research project, so make them and any other write-ups available on your project page. Be sure to make your paper openly accessible instead of linking to a paywalled portal. Research papers behind a paywall are an injustice and prevent less wealthy scientists from engaging in the scientific discourse. If you are worried about legal consequences of publishing a paper that’s not meant to be published: don’t be. I have yet to hear of a single case of a scientist getting into trouble for making their own work available.

• Contact information: Consider providing contact information to make it easy for fellow researchers to reach out to you. Try to use email addresses that will still work five years from now, even if this means using your Gmail address instead of your university’s email address.

I recommend keeping your project pages under your control, so you can edit them whenever you need to. It’s difficult to update the page if it is hosted at university.edu/project/ and you are no longer employed by that university. At some point I decided to host all my project pages on my own Web server, nymity.ch, which gives me full control, but this control comes at a price: responsibility. It is then your responsibility to keep your Web server alive and to renew your domain names and HTTPS certificates. If you want the same control with less responsibility, I recommend hosting your pages on services like GitHub Pages.

It is increasingly common to buy fancy domains for project pages, often ending in the desirable “.io” top-level domain. There is nothing wrong with that, but try not to let these domains expire and your project page disappear. Are you still going to pay the yearly $15 fee for myproject.io ten years from now? If not, then don’t go down that route.

To get you started with project pages, feel free to use the following template that gets you a simple, fast, and decent-looking project page in little time.

<!doctype html>

<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>TODO: Page title</title>
<meta name="description" content="TODO: Web page description">
<style>
.toc {
justify-content: space-between;
display: flex;
}
body {
width: 60%;
font-family: sans-serif;
}
</style>

<body>
<div class="toc">
<a href="#overview">Overview</a>
<a href="#writing">Writing</a>
<a href="#code">Code</a>
<a href="#data">Data</a>
<a href="#contact">Contact</a>
</div>

<hr/>

<h2><a id="overview">Overview</a></h2>
<p>This is the project overview</p>

<h2><a id="writing">Writing</a></h2>
<p>An overview of what writing you published.</p>

<h2><a id="code">Code</a></h2>
<p>Links to your code.</p>

<h2><a id="data">Data</a></h2>
<p>Links to your datasets.</p>

<h2><a id="contact">Contact</a></h2>
<p>Contact information</p>

<hr/>

<p><i>Last update: YYYY-MM-DD</i></p>
</body>
</html>

One can think of project pages as documentation of a finished piece of work, but I prefer to think of them as living documents that evolve as a research project progresses. The earlier you can share information about your work, the better. Research papers are often preceded by workshop papers, posters, abstracts, or presentations. All of these are worth putting on a project page early. In fact, a project page can serve as documentation for yourself, to keep track of your project’s output. I am not suggesting that you create project pages purely for altruistic reasons; you get something out of it too:

• You get an idea of your audience by taking a look at your Web server logs. I used to regularly check the visitor log of my project pages. It was interesting to see which universities and departments looked at my work. In fact, it was gratifying to realise that anyone at all was interested in reading it.

• You expose your research to a broader audience. Research papers follow a style of writing and presentation that can be alienating to a general audience. Project pages mitigate this problem. Somebody who would not read your paper may read your project page – and perhaps then decide to take a look at the paper too.

• It signals to potential employers that you go the extra mile and care about the presentation of your work even if you don’t have to.
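If you want to try the log-checking idea yourself, here is a minimal Python sketch rather than a polished tool. It assumes that your Web server writes access logs in the Common Log Format, or in the “combined” format that extends it (used by default on many Apache and nginx installations). The function name page_views, the /project/ prefix, and the log path in the usage comment are illustrative assumptions. The sketch counts successful requests per page and per client address.

```python
import re
from collections import Counter

# Assumed log format: the prefix of a Common Log Format line. The "combined"
# format appends referrer and user agent, which this pattern simply ignores:
#   host ident user [timestamp] "METHOD PATH PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def page_views(lines, prefix="/"):
    """Count successful (2xx) GET/HEAD requests for paths below `prefix`.

    Returns two Counters: path -> hits and client address -> hits.
    Lines that do not match the assumed format are skipped.
    """
    views, visitors = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        path, status = match["path"], match["status"]
        if status.startswith("2") and path.startswith(prefix):
            views[path] += 1
            visitors[match["host"]] += 1
    return views, visitors


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python page_views.py /var/log/nginx/access.log
    for log_file in sys.argv[1:]:
        with open(log_file, encoding="utf-8", errors="replace") as f:
            views, visitors = page_views(f, prefix="/project/")
        for path, hits in views.most_common(10):
            print(f"{hits:6d}  {path}")
        print(f"{len(visitors)} distinct client addresses")
```

To see which institutions the addresses belong to, you could additionally resolve them with reverse DNS (for example with Python’s socket.gethostbyaddr), which this sketch leaves out.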

### Publish preprints

For fear of getting scooped, research projects typically remain confidential until publication of a peer-reviewed paper. Getting a paper through peer review can take many months if not years because it is common for a paper to be submitted multiple times for review. Throughout all this time, your work could have been useful to others.

In a fast-moving field like computer science, this antiquated publication model causes frustrating and unnecessary delays. It does not have to be this way. While we cannot get around publishing peer-reviewed papers (they are academia’s currency, after all), we can publish a technical report before the final, peer-reviewed version of a paper is out. If you are still not convinced: Correa et al. (2020) provide (not yet peer-reviewed) evidence that openly accessible papers are cited more than closed-access papers.

Originally created for the publication of physics pre-prints, the arXiv has also become computer science’s most popular pre-print platform. You “publish” your work on the arXiv by uploading your research paper’s LaTeX code (be sure to first remove all cusswords in the comments), and after a moderator has reviewed your submission, your article will appear on the arXiv, typically within one or two days.
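Because the arXiv makes the source files of LaTeX submissions available for download, anything left in your comments becomes public. The following Python sketch, a hypothetical helper and not an arXiv tool, removes comment text before you upload. It deliberately keeps each % character, because in LaTeX a trailing % also suppresses the space at the end of a line, so deleting it could change the typeset output. It does not understand verbatim-like environments (such as verbatim or lstlisting), where % is literal text, so review the result before uploading.

```python
import re
import sys
from pathlib import Path

# A '%' starts a comment unless it is escaped. It is escaped when preceded by
# an odd number of backslashes ('\%'); an even number ('\\%' is a line break
# followed by a comment) leaves it unescaped. Group 1 keeps those backslash pairs.
COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*$", re.MULTILINE)


def strip_comments(tex: str) -> str:
    """Delete LaTeX comment text, keeping the '%' so line joining is unchanged."""
    return COMMENT.sub(r"\1%", tex)


if __name__ == "__main__":
    # Hypothetical usage: python strip_comments.py paper.tex > upload/paper.tex
    for name in sys.argv[1:]:
        sys.stdout.write(strip_comments(Path(name).read_text(encoding="utf-8")))
```

Run it on a copy of your paper’s sources and compile the cleaned copy once to make sure the PDF is unchanged.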

Conveniently, the arXiv provides a notification system that informs subscribers about new reports in their area of interest. This means that once you publish a new report in computer networks, everybody who subscribes to that field (a non-trivial number of people) will be notified.

A frequent concern about the arXiv is that many conferences don’t allow submissions of papers that have previously been published in a peer-reviewed venue. Fortunately, the arXiv is not peer-reviewed, so a report on the arXiv typically does not count as “published.” In my field of computer security, all top-tier conferences accept papers that were previously “published” on the arXiv. Regardless, if in doubt, ask a conference’s program chairs to clarify their policy on technical reports that are publicly available but not peer-reviewed.

“But Philipp,” you may ask, “why go through the extra trouble of uploading your report to the arXiv?” It’s all about exposure. Once your report is published, many of your peers will come across it, be it through Google Scholar, which crawls the Internet for research papers; the arXiv’s in-house notification system; or other aggregators. Early exposure can result in citations, potential collaborations, or at least people having heard of your work.

### Presenting

A good conference presentation opens doors. Science journalists may approach you to write a popular science article about your work (or, in the time-tested academic tradition of unpaid labour, they may ask you to do it for them), people from industry may wonder how one would deploy your research, and other academics may suggest projects to collaborate on. A great presentation can elevate your research from obscure insignificance to something that people talk about. Even if your research is not spectacular, a great presentation sets you apart from other presenters. Take presentations seriously.

Most conference talks I have attended are a missed opportunity. The average academic talk is difficult to follow, poorly structured, and dispassionate. Entire books have been written on effective presenting and I am not going to compete with these books. Instead, I am going to distill my advice into a few key points:

• Rehearse your talks. Some people believe the myth that great presenters are born rather than made. This is wrong. My best talks were the result of numerous (up to a dozen) rehearsals. That’s why they were my best talks. With more rehearsing comes confidence. You will know what to say, resulting in fewer “ehms,” poor transitions, and awkward pauses because you won’t have to try to make sense of your own slides. Consider recording yourself: it helps you use your voice more effectively, improve your body language, and notice and eliminate fillers like “ehm,” “you know,” and “like.”

• Capture your audience’s attention. Don’t dive right into the research. Try to start with a lighthearted joke, an interesting anecdote, or anything that gets people engaged. I once presented a paper on Sybil attacks and, curiously, my name was listed twice on the conference’s list of accepted papers. I used this fact to start my presentation with a joke that got a few laughs.

• Focus on what matters. It is very common for presenters to ramble on about irrelevant details. Keep in mind that your audience is very limited in what it can take away from your presentation. Ask yourself: what are the two or three most important points that I want my audience to remember? Build your presentation around these points.

• Have a narrative. Every sentence you say should be directly connected to the previous sentence. If you jump from one topic to another without proper transition, you will gradually lose your audience. Even with a proper narrative it can be difficult to follow a talk. Recapitulate occasionally, e.g., by saying “now that we have looked at X and Y, it’s time to talk about Z.”

If you would like to learn more, take a look at Patrick Winston’s excellent lecture on “How To Speak”.

A good presentation uses slides sparingly but effectively. Here are my suggestions for optimal slide use:

• Minimise the number of words on slides and avoid clutter. Your audience is going to read what’s on your slides and while they are reading, they cannot pay attention to you. Your slides are supporting material and are not supposed to keep the audience busy reading.

• Use slide numbers, allowing your audience later in the Q&A to reference specific slides.

• Make sure that the font (even in diagrams) is big enough that even people in the last rows can clearly read it. Most presenters get this one wrong.

• When presenting charts, guide the audience. Explain the axes, discuss how one should read your chart, and highlight important insights.

• Optimise your slides for the 16:9 widescreen format, which is now supported by all modern projectors. To be safe, consider exporting a second slide set for the (outdated and increasingly rare) 4:3 aspect ratio.

• Use a sans-serif font (e.g., Arial) and avoid serif fonts (e.g., Times New Roman), which are optimised for reading large amounts of text. Admittedly, this is not critical advice, but I find it useful nonetheless.

Twitter has a (sometimes deserved) reputation for being a time sink fuelled by conflict and outrage, but in my experience, the platform is all about who you follow. If you follow the right people, you will learn a lot. Over the years, I have compiled a set of Twitter accounts that share sharp insights and thoughtful commentary. On Twitter, one can find high-quality discourse similar to dinner conversations at conferences. The best thing about Twitter is that you don’t have to pay $800 in conference registration fees to participate in these conversations.

In deciding who to follow, I use the heuristic of following somebody for a few days or weeks; if I get the feeling that I don’t learn much from this person, I unfollow them again. Twitter is as much a marketing platform as it is a discussion platform, and some people push the marketing aspect a little too far for my taste.

While you’re at it: use the opportunity to follow people outside your field. And by that, I don’t mean somebody in programming language design if you’re in computer vision. Follow people in psychology, economics, or biology. You can learn a lot by observing what problems scientists in other fields struggle with, and how culture and methods differ.

Twitter can be a great option to stay in the loop on various topics:

• Conferences and workshops (for the non-computer scientist: original research in computer science is typically submitted to conferences instead of journals) carry with them a reputation. When I was new to research, I did not know that. Eventually, I got a feel for this (sometimes informal) “ranking” and which conferences carry the most prestige. Listening to researchers talk about conferences will help you get a sense of where you should submit your work.

• Professors occasionally talk about faculty hiring processes, what they look for in Ph.D. or grant applications, and their opinions on the peer review process. While this knowledge does not necessarily generalise to all of academia, it is still helpful.

• Some researchers openly talk about their paper rejections, which serves as an important reminder that even the people you admire deal with rejection just as much as you do (if not more). This helps calibrate your perspective.

• By engaging in discussions, you will eventually build a following, allowing you to promote your own work more effectively.

By no means do you need to use Twitter to be successful in your field, but controlled use can be an advantage. However, avoid Twitter fights, as they make you look like a combative fool to bystanders. Also, make an effort to post interesting and insightful content instead of just advertising your latest paper.

### Teach the public

Communicating your work does not have to end with your peers. We have a responsibility to make our work accessible to the broader public. To that end, I have published two articles in The Conversation. (I was not paid to write these articles and have no financial interest in this site; I merely mention The Conversation because I have some experience with it.)

I was originally contacted by an editor who encouraged me to explain my research to the public by publishing an article in The Conversation. The site does not pay its authors, but I still found it an interesting endeavour because I had never worked with an editor before. For both articles, I came up with a first draft and my editor left plenty of suggestions and advice. After three or four more iterations, the article was ready to be published.

There are many other outlets that encourage scientists to explain their research to the public. Your research group’s or department’s blog platform is another great opportunity to practice these skills.

## …with collaborators

### Socialise

Academic conferences are where one forms new connections and collaborations. Conferences can be intimidating and uncomfortable, particularly for those who suffer from impostor syndrome. You find yourself surrounded by accomplished and smart people, and believe that your research pales in comparison to theirs. I know the feeling.

If you find approaching somebody at a conference scary, I recommend finding somebody who can introduce you. That could be your advisor or a common friend. Also, keep in mind that there is nothing wrong with approaching someone during a coffee break. Introduce yourself and start with a question, or compliment the person on their research. Most people are flattered to hear that somebody enjoyed (or even just read) their research. If you feel very nervous or anxious, you may spend too much time in your head, focused on your stressful feelings. Try instead to make a conscious effort to focus on the other person. Pay close attention to what they are saying and ask follow-up questions.

While most of the networking at a conference happens in the “hallway track” during breaks, there are more networking opportunities in the evening. People often head out for dinner and drinks, which creates a less formal environment that makes it easier to strike up conversations and meet new collaborators. Consider tagging along with a group to not miss out on this opportunity.

### Manage collaborators

Eventually, you are going to lead a research project. This involves coordinating collaborators, organising meetings, and keeping everyone in the loop on the project’s progress. Things inevitably get messy when people with different personalities, cultures, and communication styles work together. The following tips help make the process smoother:

• Your advisor is a collaborator too and needs “management.” Advisors differ significantly in their style and range from entirely hands-off to micromanaging their students. To learn more, take a look at Nick Feamster’s excellent blog post on the matter.

• If you want a collaborator to work on something, ask specific questions and provide clear instructions. Don’t expect them to realise how busy you are and offer help – they are likely too busy to notice.

• Keep people up-to-date on the project’s progress. Some people like to use email for this; others schedule regular calls to discuss progress. Consider using email for short updates and have the occasional call when there’s more to discuss.

• Don’t be afraid to express frustration but do so respectfully and with the intent to improve the collaboration rather than assign blame. For example, if half of your team always misses meetings, strike up a conversation on how to collaborate in ways that work for everyone.

• Even more important than expressing frustration is the expression of gratitude. Let your collaborators know when they did a good job! We all love to feel appreciated.

• Conflict among collaborators is a common occurrence. If you find it difficult to resolve a conflict yourself, consider involving your advisor as a mediator.

• Whenever there is something to discuss, involve all collaborators by default, unless you have a good reason not to. Your collaborators will feel respected for being kept in the loop. (More on that below.)

• As if all of the above were not difficult enough, the average team consists of researchers from several cultures that have different customs regarding communication. Give people the benefit of the doubt and try to be clear and respectful in your communication.

### Email etiquette

Pick descriptive email subjects that make it clear what your email is about. I occasionally prefix the email subject with “FYI:” or “Action needed:” to let the recipient know that an email can be ignored or does require action.

• Good: “FYI: Paper got accepted”

• Good: “Action needed: Commit missing code to repository”

• Bad: “Need help with code”

• Good: “Please commit missing code to repository”

Try to avoid top-posting when dealing with long and complicated emails because it makes it difficult to follow an email discussion:

I have strong opinions about your email.  You are wrong about X, Y, and Z.

On Tue, Jan 05, 2021 at 07:35:54PM +0000, John Doe wrote:
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
> incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
> nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
> fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
> culpa qui officia deserunt mollit anim id est laborum.

Instead, try to quote and respond to specific parts of the original email:

On Tue, Jan 05, 2021 at 07:35:54PM +0000, John Doe wrote:
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
> incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
> nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

This I agree with.

> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
> fugiat nulla pariatur.

I believe we should do X instead because of Y.

> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
> deserunt mollit anim id est laborum.

Well said.

Use To: and Cc: wisely. Put everyone whose attention you require into the To: field and the remaining collaborators into the Cc: field. Create an email alias to make it easy to reach all of your collaborators. For more email writing tips, take a look at Philip Guo’s excellent article.

### Pick the right communication mode

Find the right balance between the least invasive and the most convenient communication method. It may be convenient for you to call your collaborator each time you need something but she may experience this as distracting and invasive.

To discuss complicated research designs, you typically need a synchronous meeting: either a phone call or an in-person meeting. For topics that require less back-and-forth, asynchronous communication methods like email are a better fit. If you need something right now, an instant message or phone call may be the most appropriate.

Also, keep in mind that everyone’s communication preferences differ. Some people enjoy video calls while others prefer texting. Collaboration often requires compromise, i.e., finding communication methods that work for everyone.

Regarding concrete communication tools, Slack (or its free software alternative Mattermost) is useful because it allows collaborators to self-select which conversations they want to participate in.

### Communicate openly

Imagine a small research project with three collaborators: Alice, Bob, and Eve. There are four possible communication channels, assuming nobody talks to themselves:

1. Alice ↔ Bob
2. Alice ↔ Eve
3. Bob ↔ Eve
4. Alice ↔ Bob ↔ Eve

Four collaborators have eleven possible communication channels, while five collaborators have a whopping twenty-six possible communication channels! (Each channel is a group of at least two collaborators, so the number of channels among n collaborators is the sum of the binomial coefficients “n choose k” for k = 2 to n.)

Five collaborators are by no means unusual – in fact, the top four academic security conferences now average five authors per paper (Balzarotti 2020).
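The channel counts above are easy to check with a few lines of Python. Every channel is a group of at least two collaborators, so for n people the count is the sum of the binomial coefficients C(n, k) for k = 2, …, n. That sum simplifies to 2^n − n − 1: all subsets of the group, minus the empty set and the n one-person “groups.” The function names are illustrative, and math.comb requires Python 3.8 or newer.

```python
from math import comb


def channels(n: int) -> int:
    """Number of groups of two or more people that n collaborators can form."""
    return sum(comb(n, k) for k in range(2, n + 1))


def channels_closed_form(n: int) -> int:
    """Same count: all 2**n subsets minus the empty set and the n singletons."""
    return 2**n - n - 1


if __name__ == "__main__":
    for n in range(3, 9):
        assert channels(n) == channels_closed_form(n)
        print(f"{n} collaborators: {channels(n)} channels")
```

Running it prints 4, 11, and 26 channels for three, four, and five collaborators; with every additional person the count roughly doubles, reaching 247 for eight collaborators.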

The good news is: you don’t have to ponder which of the twenty-six communication channels to opt for before writing an email. Unless you have a good reason not to, err on the side of inclusion when communicating. That is, include everybody in your email CC list by default. If one of your collaborators feels overwhelmed by the communication, they can ask to be omitted from future correspondence, or simply ignore your emails. It should typically be your collaborators’ decision what to participate in, not yours. In my experience, collaborators appreciate being kept in the loop, even if they rarely respond to email threads.

As a young Ph.D. student, I mistakenly believed that I would do my collaborators a favour by not including them unless I really needed their help. After all, isn’t everyone busy with better things to do? This is a fallacy. Collaborators exist to help each other out and generally like to know what’s going on. Give them the opportunity to! Besides, a lack of communication can quickly lead to a culture of distrust when people are left out. Junior collaborators in particular will eventually wonder whether there are ulterior motives for leaving them out.

However, not everything needs to be discussed with all of your collaborators. Do you need your advisor’s signature on a document? Your collaborators won’t care. The same is true if one of your collaborators is unable to log into a machine that you use for experiments. When it comes to the actual research, however, you need to have a good reason to not include somebody.

I know first-hand that it’s often tempting to initiate one-on-one communication; for example when you feel insecure about an idea, and would like to run it by someone before you share it further. Try to avoid this. The more you communicate in the open, the better for you and the project, and your collaborators will respect you for it.

# Acknowledgements

• Thanks to Will Scott for suggesting improvements to the way this book was organised.

• Thanks to Harald Lampesberger for proof-reading and providing invaluable suggestions on improving the content.

# Contact and Support

Send email to phw@nymity.ch.

Balzarotti, Davide. 2020. “System Security Circus 2019.” January 2020. https://s3.eurecom.fr/~balzarot/notes/top4_2019/.

Clear, James. 2018. Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones. Avery.

Correa, Juan C., Henry Laverde-Rojas, Fernando Marmolejo-Ramos, and Julian Tejada. 2020. “The Sci-Hub Effect: Sci-Hub Downloads Lead to More Article Citations.” 2020. https://arxiv.org/pdf/2006.14979.pdf.

Diffie, Whitfield, and Martin E. Hellman. 1976. “New Directions in Cryptography.” Transactions on Information Theory 22 (6). IEEE. https://ee.stanford.edu/~hellman/publications/24.pdf.

Keshav, Srinivasan. 2007. “How to Read a Paper.” SIGCOMM Computer Communication Review 37 (3). ACM. http://ccr.sigcomm.org/online/files/p83-keshavA.pdf.

Newport, Cal. 2016. Deep Work: Rules for Focused Success in a Distracted World. Grand Central Publishing.

Pinker, Steven. 2015. The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century. Penguin Books.

Pollan, Michael. 2009. In Defense of Food: An Eater’s Manifesto. Penguin Books.

Walker, Matthew. 2018. Why We Sleep. Scribner.