Effective tools for computer systems research

Open review version. Please leave feedback!

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.

Abraham Lincoln

Introduction

Research is messy. Our body of knowledge is scattered across countless journals, presentations, blog articles, and tweets, making it difficult to get up to speed in a field. Data collection often requires numerous iterations and is frequently insufficiently documented. The subsequent analysis of data is sometimes powered by fragile code and obscure plotting systems. The resulting research paper starts out as paper.doc, then becomes paper-new.doc, paper-new-2.doc, paper-final.doc, and eventually paper-final-FINAL.doc.

It doesn’t have to be like that. A well-structured research project is not only possible but well within your reach – as long as you know how. That’s what this book is about. The following chapters introduce effective tools for doing research, in particular for organising, versioning, reading, writing, programming, visualising, automating, and communicating. These tools take the form of software, processes, or cognitive techniques.

In addition to discussing these tools, I will explain why they are effective. You will realise that logging your time will give you freedom, good writing is the opposite of academic writing, and having a Twitter account isn’t all about posting cat pictures.

Building an effective tool set yields a significant return on investment in terms of time, sanity, and research quality. You will save time by automating parts of your research pipeline and putting them under version control; you will keep your sanity because you can trust that your pipeline is robust; and your research will improve in quality because you are less likely to make unnecessary mistakes.

Who is this book for?

I wrote this book primarily for new graduate students in computer systems. (Computer science is often divided into “theory” and “systems.”) I myself work in systems, and you will find this book most useful if you are a systems researcher – for example in a field like security, programming languages, or networking.

If this includes you, the entire book should be of interest. The secondary audience is people who program, write in LaTeX, or do research – basically anyone in a STEM field. This crowd will find a subset of this book useful.

Why am I writing this book?

I had little research experience when I started my Ph.D. I faced a steep learning curve in the first couple of years and research frequently felt overwhelming. In addition to reading hundreds of papers and trying to carve out your own area of research, you have to teach, take courses, and adopt best practices in research. I’ve always been curious about how other people work and cope, so I wrote the book that I wanted to read as a young student.

Perverse incentives in research place too much value on the number of papers and citations. As a result, people cut corners to maximise their research output – frequently while sacrificing rigor. Poorly documented workflows and sloppy code can lead to mistakes that jeopardise the correctness of a project. By adopting strict and effective workflows, we can minimise these mistakes and save time.

An unfortunate amount of knowledge in a scientific field is implicit, meaning that it’s rarely spelled out. Pinker calls this phenomenon the “curse of knowledge” (Pinker 2015, chap. 3): years in a field make it difficult to put yourself in the shoes of a newcomer. Examples of (mostly) implicit knowledge are the reputation of conferences, collaboration etiquette in research projects, and how to organise your time. People do write and talk about these topics, but you may have a hard time finding a course or book that teaches them. In this book, I spell out aspects of research that don’t see much elaboration elsewhere – even if it means that parts of this book may seem obvious to you.

How should you read this book?

All chapters are self-contained, so dive into whatever chapter appeals to you. Of particular importance is the chapter on versioning, which is referenced several times in subsequent chapters. Finally, this is a hands-on book and I strongly encourage you to read it while in front of your laptop – ideally with an open terminal. You will retain much more if you put into practice what you read in this book.

Organising

A combination of stubbornness and luck got me through my Ph.D. and postdoctoral training without a todo list or schedule. Each day, I would simply continue work where I stopped the day before. My lack of organisation didn’t get me in trouble because for the most part, I was involved in only one or two projects at a time, which was manageable. This changed after my postdoc. I suddenly found myself juggling software projects, monthly reports, research papers, and blog posts; all on tight deadlines. I had no choice but to become more organised.

You may – like me – get away with poor organisational skills during your Ph.D. but why not improve these skills before it’s absolutely necessary? Why not become more efficient in the process, and also prepare for your post-Ph.D. life, which will most likely be more demanding and require juggling several projects in parallel? This chapter discusses organisation tools and behavioural hacks that help you stay on track and make steady progress. After all, research is a marathon and not a sprint.

Incorporating new skills

The incorporation of new organisational tools requires forming new habits. When adopting new habits – be it organising, exercising, or eating healthily – you may find that you can keep up the new habit for a few days but then drift back into old habits. In his book Atomic Habits (Clear 2018), James Clear provides insight into why that is: the key to forming good habits (and eliminating bad ones) is to i) create an obvious cue to trigger an action, ii) make the action’s outcome attractive, iii) make the action easy to perform, and iv) create a satisfying outcome.

Curate a todo list

As a Ph.D. student (and even as postdoc), I spent most of my days working on very few tasks – so few, in fact, that I was able to keep my todo list in my head. Most days, I worked on moving my research project forward, only interrupted by the occasional paper review, presentation, or work with students. I had few tasks and deadlines to keep in mind at any given time. Once I transitioned from my postdoc into the “real world”, I quickly realised that this had to change. I suddenly found myself having to push forward several small projects: research papers, bug fixes in complex code bases, organising workshops, applying for grants, and analysing data sets. It was no longer possible to keep my todo list in my head, so I started experimenting with tools. Console tools proved a bit too cumbersome while web tools proved too inconvenient and clunky, so I eventually ended up maintaining a simple text file with my todo list.

In essence, a todo list helps you keep track of tasks that you need to get done. As I mentioned before, the trick is to incorporate it into your workflow seamlessly. If adding a new item to your todo list involves spending thirty seconds finding todo.docx on your hard drive, waiting five seconds for Microsoft Office to open, and another three seconds to scroll to the end of the file to add a new task, you will soon give up.

If you feel the need to curate a todo list but find yourself unable to do so, work on minimising friction. First, make it so that you can open your todo list as quickly as possible. For example, configure a keyboard shortcut that opens your todo file. This way, if a new task is assigned to you during a meeting, you should be able to add it to your list within five seconds. Once a new task is in your list, you can forget about it, freeing you of the cognitive load of having to remember the task.
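From a terminal, a tiny shell function gets you well below the five-second mark. The sketch below is only an illustration of the idea, not a prescribed setup: the function name todo, the file ~/todo.md, the “- ” item prefix, and the fallback editor vim are all assumptions. How you bind it to a desktop keyboard shortcut depends on your desktop environment.

```shell
# Hypothetical location of the todo list; adjust to taste.
TODO_FILE="${TODO_FILE:-$HOME/todo.md}"

# todo            -> open the list in your editor
# todo some task  -> append "- some task" without leaving the terminal
todo() {
  if [ "$#" -eq 0 ]; then
    "${EDITOR:-vim}" "$TODO_FILE"
  else
    printf '%s\n' "- $*" >> "$TODO_FILE"
  fi
}
```

Typing todo email reviewers about deadline appends “- email reviewers about deadline” to the end of the file; typing todo on its own opens the whole list.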

If you work on multiple devices and need your todo list on all these devices, you will have to sync it somehow. In this case, browser tools may be your best bet because they can take care of the syncing part for you. Manual syncing is out of the question because it’s not sustainable and you will wind up with out-of-sync todo lists.

The todo list format I eventually adopted is a markdown-formatted section in the same text file as my work log (see the section on work logs). Some tasks are more pressing than others. To reflect this, I use three sections, “today”, “this week”, and “eventually”. Here is an excerpt of what my todo list currently looks like:

When I finish a task, I move it from my todo list to my work log, which is in the same file, so it’s a matter of literal seconds. This minimises friction and renders the curation of my todo list easy enough that I actually stick with it.

Again, the specifics of my setup don’t matter. The best todo list is one that works for you. I’m giving you an idea of my workflow in the hope that it helps you discover what works for you. My workflow is heavily centered around text files and command line tools. You may be more of a browser person, in which case a pinned browser tab may work significantly better. Experiment with different workflows until you find one that you can sustain.

Plan your day

A habit that I picked up relatively recently, after reading Cal Newport’s excellent book Deep Work (Newport 2016), is to plan my day. Each morning, right after I turn on my laptop, I take a look at my todo list and derive a plan for the day, based on 30-minute blocks. I draw these blocks on my tablet but pen and paper work just as well.

This is an example of my daily schedule. I reserve my mornings for design and development, and other tasks that require intense concentration.

Note

A day without a schedule runs the risk of resulting in yak shaving: imagine you want to fix that annoying bug in your code that has been messing with your experiments. While thinking about how to best fix the bug, you notice that the function that contains your bug has poor documentation. So you spend a moment updating the documentation. While doing that, you realise that your functions follow an inconsistent documentation style, which really bugs the perfectionist in you. So you harmonise the way functions are documented in your code and learn in the process that the documentation tool you’re using has released a new version that makes it more convenient to browse your software’s documentation. However, your operating system doesn’t have the newest packages yet, so you set out to compile it manually. Three hours later, you find yourself hunched over your laptop, covered in sweat, having finally managed to compile the new version. That bug that you originally intended to fix? It’s still there.

I used to come to the office each day without having a clear idea of what needed to get done. I would typically continue work where I left off the day before, or turn my attention to whatever seemed the most urgent. This approach may suffice if most of your time goes into a single project whose details you can keep in your head, but it falls apart in the face of more complex responsibilities.

You may think that a detailed plan for each day impedes your creativity. In fact, it’s the opposite. By spending a few minutes planning your day, you reduce your cognitive load throughout the day. You won’t have to think about what to work on next, or when it’s time to switch tasks. You already took care of that in the morning, leaving the rest of the day for deep thinking with minimal context switches.

Perhaps the biggest benefit of a planned day is that it helps you stay on track. I have the annoying habit of polishing finished work more than is necessary or even useful. Having a daily plan in front of my nose serves as a reminder that the perfect is the enemy of the good, and that many other tasks are still waiting to get done. This helps me move on to the next task quickly, and be more productive throughout the day. For me, higher productivity means more happiness. At the end of the day, I feel that I have accomplished what I wanted, making it easy to get out of “work mode.” Whenever I feel unaccomplished, I find it difficult to leave work mode because I keep thinking about unfinished tasks. Needless to say, this rumination is far from productive and only prevents me from relaxing and recharging my batteries. A detailed plan for the day is a good antidote and can help you draw a clear line between your work and personal life.

Track your time

Do you know the feeling of taking a break from work, only to catch yourself an hour later watching obscure YouTube videos? Or the feeling of having spent a full day at work but having little or nothing to show for it, as if the day passed and you accomplished nothing? The solution to these problems is to establish a tight grip on your most precious possession: your time.

I use Time Tracker on Debian Linux. This lightweight tool lives in my system tray, allowing me to quickly open it and take note when I’m switching from one task to another. Each time I switch between tasks, I open the Time Tracker tool and jot down what I’m going to work on next. (Examples of tasks are “answering email,” “changing database API,” “reading research paper,” and so on. Tasks like “writing” or “programming” are likely too general while tasks like “adding second paragraph to introduction of research paper” are too specific.)

On a typical day, I end up with five to ten tasks. At the end of the day, I know exactly what I did, and can contrast it with what I was supposed to do.
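If you prefer plain text files over a tray application, the same idea fits in two shell functions. This is a minimal sketch and not how Time Tracker works internally: the log file ~/.timelog.tsv, its tab-separated “Unix timestamp, task” format, and the function names track and tracksum are assumptions. Each entry is assumed to last until the next one starts, so you record a special stop entry when you finish for the day.

```shell
# Hypothetical log file: one "unix-timestamp<TAB>task" line per task switch.
TRACK_FILE="${TRACK_FILE:-$HOME/.timelog.tsv}"

# Record a task switch, e.g.: track reading research paper
track() {
  printf '%s\t%s\n' "$(date +%s)" "$*" >> "$TRACK_FILE"
}

# Sum minutes per task; each entry lasts until the next entry starts.
# Time between "track stop" and the next entry is ignored.
tracksum() {
  awk -F'\t' '
    NR > 1 { minutes[prev_task] += ($1 - prev_time) / 60 }
    { prev_time = $1; prev_task = $2 }
    END {
      for (t in minutes)
        if (t != "stop") printf "%6.1f min  %s\n", minutes[t], t
    }' "$TRACK_FILE"
}
```

A day might then consist of track answering email, track reading research paper, and finally track stop, after which tracksum prints the minutes spent on each task. Because durations are simply differences between consecutive timestamps, forgetting the final track stop attributes the whole evening to the last task, and tracksum sums over the entire file, so starting a fresh file each day or week keeps the numbers readable.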

A typical day consists of a handful of tasks I worked on. The numbers next to the tasks correspond to bug tracker IDs.

Note

Tracking your time at such granularity may feel oppressive and stressful. After all, it’s yet another thing to remember and worry about. That’s exactly what I thought until I started doing it, but I learned that policing myself helps me stay focused. It’s easy to feel busy all day long without really getting anything done. One can spend several hours going over meeting notes, mulling over the next email, or stressing about all the things that need to get done. Despite feeling busy all the time, your output may be meagre. Keeping track of what exactly I’m doing throughout the day helps me notice when I’m actually productive, which, for me, means finishing a handful of well-defined tasks. If you are working on a big project, try to split it into tasks, so you can accomplish a handful of them each day.

Part of my day job consists of working on development tickets for sponsors. These tickets consist of software bugs, feature requests, or small projects. My employer, The Tor Project, is mostly funded through grants, and we need to have a good understanding of how much time a specific development task takes. How long does it take to set up a testbed to evaluate a new pluggable transport protocol? One hour? A day? A week? It’s important to have both experience and data for time estimation because our intuition isn’t always the most reliable predictor. For each bug tracking ticket, I use my time tracker to record how many hours I spent on it, and then compare that number to the hours we had projected. Over time, my estimates became closer and closer to how much time I really needed, minus the occasional outlier.
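Comparing estimates with actual hours boils down to simple arithmetic: dividing actual by estimated hours gives an overrun factor, and a factor that drifts towards 1.0 over time suggests that your estimates are improving. The sketch below assumes a hypothetical comma-separated file with one line per ticket (ticket, estimated hours, actual hours); this is not an export format of Time Tracker or of any particular bug tracker.

```shell
# Input (hypothetical format): ticket,estimated_hours,actual_hours per line.
# Rows whose estimate is zero or non-numeric (e.g. a header) are skipped.
# Prints each ticket's overrun factor (actual / estimated) and the overall factor.
estimate_report() {
  awk -F',' '
    $2 + 0 > 0 {
      printf "%-8s est %5.1fh  actual %5.1fh  factor %.2f\n", $1, $2, $3, $3 / $2
      est += $2
      act += $3
    }
    END {
      if (est > 0) printf "overall factor %.2f\n", act / est
    }' "$1"
}
```

A ticket estimated at 2 hours that took 3 hours gets a factor of 1.50. The overall factor divides total actual hours by total estimated hours, so large tickets weigh more than small ones; averaging the per-ticket factors instead would give every ticket equal weight.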

Log your progress

I am a big advocate of keeping a log of what I have accomplished throughout the day. Did you finally manage to finish the introduction of your latest research paper? Mention that in your log. Did you finish refactoring the data processing pipeline in your research prototype? This should go straight to your log. Right after I complete a task worth writing down, I spend approximately five seconds adding it to my log and then move on. (Occasionally, I forget to add a completed task to my log right away. I then add it later, or sometimes even the day after.)

I don’t bother getting punctuation or even grammar right.

On a typical day, I jot down somewhere between five and ten tasks in my log. I don’t log every single email I write but I do sometimes log emails if they’re both lengthy and important. Here’s what my Oct 14, 2019 work log looked like:

  • 2019-10-14
    • filed #32064 for improved search results for “download tor” and to incorporate gettor links in our website descriptions
    • replied to [redacted] and asked him if he’s willing to run default bridges for orbot
    • created wiki page to formalise our “support ngos with private bridges” process
    • thought about design for system that can scan the reachability of PT bridges (#31874)
    • created summary of obfs4 work for quarterly race report and updated our obfs4 ticket (#30716) with current project status
    • reviewed #31384 (snowflake.tp.o language switcher)
    • reviewed #31253 (webext packaging target)
    • responded to email with points worth communicating at otf summit
    • started working on #17548 (deal with expired pgp keys)

You can see that my phrasing is rough around the edges but that’s okay: you are the primary consumer of this log and you will likely remember what you meant after the fact. You will interact with your work log several times a day, so minimise the friction of adding tasks to it. My work log is always open on a virtual desktop, so I don’t spend any time opening it. I also added a shortcut to my editor, vim, to quickly add today’s date to the bottom of the file. The format of my work log is markdown, which facilitates conversion into other document formats such as HTML or PDF. (As always, do what works best for you. I’m a terminal person and enjoy fast, lightweight, and robust console tools. You may be more of a browser person, in which case it’s worth looking at web services that help you log your progress.)
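As an editor-independent alternative to a vim shortcut, a shell function can add today’s date and a new entry in one step. This is a sketch under several assumptions: the log lives at ~/doc/log.md (a hypothetical path), each day starts with a “- YYYY-MM-DD” bullet followed by indented entries, and the log sits at the end of the file, because new lines are always appended there.

```shell
# Hypothetical log location; the log is assumed to sit at the end of the file.
WORKLOG="${WORKLOG:-$HOME/doc/log.md}"

# Append an entry under today's date, e.g.: worklog 'reviewed #31253 (webext packaging target)'
worklog() {
  local today
  today="$(date +%F)"   # ISO date, e.g. 2019-10-14
  # Add the date bullet only for the first entry of the day.
  if ! grep -qx -- "- $today" "$WORKLOG" 2>/dev/null; then
    printf '%s\n' "- $today" >> "$WORKLOG"
  fi
  printf '    - %s\n' "$*" >> "$WORKLOG"
}
```

Two calls on the same day produce a single date bullet with two entries beneath it. Quote entries that contain ticket numbers or parentheses: unquoted, the shell treats #31253 as the start of a comment and rejects the parentheses as a syntax error.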

Using pandoc, converting a markdown-formatted file to PDF is as simple as running:

pandoc log.md -o log.pdf

As a student, I would take my work log spanning the past month and send it to my advisor at the end of each month. He appreciated seeing what I had been up to – at a level of granularity that was neither too coarse nor too detailed. Sharing your work log with your advisor also serves as insurance: your advisor will never be able to complain that he or she was not kept in the loop.
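To send only the past month, you can extract that month’s entries before handing them to pandoc. This sketch assumes that each day in the log starts with a line of the form “- YYYY-MM-DD”; the function name monthlog is hypothetical.

```shell
# Print all entries of one month, e.g.: monthlog 2019-10 log.md
# Assumes each day starts with a line of the form "- YYYY-MM-DD".
monthlog() {
  awk -v month="$1" '
    # A date line decides whether it and the following entries are printed.
    /^- [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { keep = (index($2, month) == 1) }
    keep' "$2"
}
```

Since pandoc reads from standard input when no input file is given, monthlog 2019-10 log.md | pandoc -f markdown -o log-2019-10.pdf produces a PDF of October 2019 alone. As with any PDF output from pandoc, this requires a LaTeX engine to be installed.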

Actually writing down and seeing what you have accomplished throughout the day can be surprising – in both a good and bad way, depending on how much you have accomplished. The value of a progress log is that it allows us to police ourselves. If I see one or two low-progress days in a row, I realise it’s time to make a conscious effort to improve my productivity. Progress logs make it less likely that you drift into a slump and perform poorly for many days or even weeks without noticing – something that happened several times throughout my Ph.D.: I wasted many weeks going down rabbit holes, losing sight of the big picture. I mulled over attractive research ideas that were ultimately infeasible but I was too stubborn to give up on them. I was no longer on track and I didn’t realise that I had to take a step back to re-evaluate my research direction. Taking a look at your work log makes it easier to realise when you’re off track, and if you send your work log to your advisor, they can help you notice it, too.

Note that time tracking and a work log are very similar but fulfill different goals. Time tracking allows you to police yourself on a micro scale while a work log polices you on a macro scale. It’s possible to be on the right track but spend a significant part of your day watching YouTube videos (a time tracker would reveal this bad habit) or be efficient in your day-to-day business but not spend time doing the right things (ideally, a work log would help you see this).

Take notes of meetings

I had the privilege of collaborating with 25 people throughout my research career. Several of the projects I was involved in were not led by me, so I took a back seat. In some of these projects I was surprised by the lack of notes during meetings. Collaborators would meet and discuss projects as I was used to, but nobody took notes (at least not for the entire group) and the implicit expectation was that everyone would remember what was said and what was left to do. Needless to say, this expectation didn’t always work out. Throughout the next one to two weeks, people forgot what was discussed, only to end up discussing some of the very same topics at the next meeting. Misunderstandings along the lines of “wait, I thought you were supposed to do this?” also happened.

You can avoid these issues by consistently taking notes. Before each meeting begins, there must be a designated note taker. Better yet, multiple people can take notes simultaneously in a Google Document or a Riseup Pad. The note taker(s) jot down key points during the meeting and todo items for each person. At the end of the meeting, these notes are then sent to all participants. If anybody disagrees with any of the notes that were taken, they should speak up. This way, all collaborators will be on the same page, there is a written record of what was discussed, and it’s also easy to go back in time to find out in what meeting a certain task was discussed. I can guarantee you that your collaborators will love you for taking notes.

Note taking isn’t just for high-stakes meetings with important collaborators. I take notes almost every time I am interacting with somebody – including with my advisor, back in my Ph.D. days. I created a simple shell function that facilitates the creation of a new file for each meeting. I simply type meet alice into a terminal, and the command automatically creates a new file, 2019-12-21-alice.md, and opens it in my text editor. A function like this fits comfortably in your ~/.bashrc.
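A minimal sketch of such a function follows; it is reconstructed from the description rather than copied from an original script. The directory ~/doc/meetings matches the one mentioned below, while the MEETING_DIR variable, the heading written into new files, and the fallback to vim when $EDITOR is unset are assumptions.

```shell
# Directory for meeting notes; override MEETING_DIR to use another location.
MEETING_DIR="${MEETING_DIR:-$HOME/doc/meetings}"

# meet alice -> creates and opens $MEETING_DIR/YYYY-MM-DD-alice.md
meet() {
  if [ -z "$1" ]; then
    echo "usage: meet NAME" >&2
    return 1
  fi
  mkdir -p "$MEETING_DIR"
  local today file
  today="$(date +%F)"
  file="$MEETING_DIR/$today-$1.md"
  # Only new files get a heading; existing notes are reopened untouched.
  if [ ! -e "$file" ]; then
    printf '# Meeting with %s, %s\n\n' "$1" "$today" > "$file"
  fi
  "${EDITOR:-vim}" "$file"
}
```

Typing meet alice on 21 December 2019 would create and open ~/doc/meetings/2019-12-21-alice.md. Running it again on the same day reopens the existing notes instead of overwriting them.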

I take notes in markdown, a format that is both simple and expressive. All my meeting notes are in the same directory, in ~/doc/meetings/. Sometimes, I’m looking for something that was said in a past meeting but I don’t remember in which one, exactly. I then grep all my meeting notes for a keyword that I remember being mentioned. (This is as simple as running grep keyword * in that directory.) Again, this is meant to minimise friction because otherwise I would be too lazy to take meeting notes.
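A slightly more convenient variant of that grep ignores case and prints only the names of matching files. Because the file names start with ISO dates, the shell’s glob expansion typically lists them in chronological order. The function name meetgrep and the MEETING_DIR variable are hypothetical.

```shell
# Directory for meeting notes; override MEETING_DIR to use another location.
MEETING_DIR="${MEETING_DIR:-$HOME/doc/meetings}"

# meetgrep keyword -> names of meeting notes that mention keyword (case-insensitive)
meetgrep() {
  grep -il -- "$1" "$MEETING_DIR"/*.md
}
```

For example, meetgrep snowflake prints every meeting file in which Snowflake came up, oldest first, regardless of capitalisation.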

Persevere

Graduate work takes its toll. Long work hours bring with them loneliness and isolation; paper rejections chew on your self-worth; witnessing colleagues excel fuels imposter syndrome; and seeing childhood friends buy homes and have children makes you wonder if graduate school really was the right choice. It comes as no surprise that poor mental health is common among graduate students, yet it is rarely talked about.

As a Ph.D. student, I was – and still am, to some extent – struggling with insecurities. I would read outstanding papers in my field and immediately feel discouraged by the depth of the work and how it combined concepts I had not even heard of before. How would I ever be able to compete with that?

It is important to understand that your seemingly perfect colleagues are anything but, and often struggle with the very same issues. I was both surprised and relieved when a friend – whose work I greatly respect – mentioned that he, too, struggles with imposter syndrome. I realised that if even somebody that knowledgeable and capable is plagued by these feelings, they must be far more widespread than I had thought.

Mental health takes place in your mind, but a healthy mind lives in a healthy body. There are a few things you can do that greatly affect your mental well-being.

  • Maximise your intake of whole foods (broccoli, potatoes, beans, coffee, apples) and avoid processed foods (cereal, pasta, candy, soda). (Pollan 2009)

  • Exercise regularly by making exercise a fixed part of your daily schedule. Don’t try to squeeze exercise in between other responsibilities; squeeze other responsibilities around your mandatory exercise. Convince a friend to do the same and hold each other accountable.

  • Go to sleep and wake up at a regular time. Get at least seven hours of sleep. It’s fashionable to brag about how little sleep one gets, and in times of pressure it is easy to believe that sleep is a “waste of time” but the opposite is the case. What may take a tired brain three hours to accomplish can be a matter of 30 minutes for a well-rested brain. (Walker 2018)

  • Get a bit of sun every day. I like to go for a run around noon, or sometimes go for a brief walk while listening to a podcast. When I get back, I’m full of energy, calm, and ready to get back to work.

  • Practice meditation. It occupies only ten minutes of my day and I like to meditate right after waking up, with a hot cup of black tea in my hands. I enjoy Sam Harris’s Waking Up app. Using an app to facilitate meditation may sound ironic but there is merit in guided meditation – even if it’s just a few spoken sentences every other minute.

  • Procrastinate productively. Deep thinking requires creativity, and creativity cannot be forced.

Summary

  • Minimise friction by making organisational tasks as quick and pleasant as possible.
  • Experiment with different workflows until you find one that you can sustain.
  • Log your time to become more efficient.
  • Log your progress to become more effective.
  • Take notes during meetings to create a “paper trail” and clearly jot down todo items.

Reading

Graduate work requires an awful lot of reading. Course material, blog posts, textbooks, emails, and most importantly: research papers. This chapter (i) shows that there’s more to reading a research paper than working your way from one page to the next; (ii) presents strategies to organise your reading; (iii) shows how you can learn about relevant new papers; and (iv) explains how you can access papers that are locked away behind paywalls.

How to read a research paper

I spent too much time reading research papers like a novel: cover-to-cover, in the mistaken belief that that’s how I would get the most out of them. I eventually learned that my time is not always best spent trying to understand the minute details of a paper’s method section – especially if I never intend to apply this method myself. Nobody awards you a medal for fighting your way through a paper that lost you by page two. Sometimes, there is no need to read a paper’s method section at all – you may only be interested in the conclusion, or the section on data collection. Before diving into a paper, know what you want to get out of it. Needless to say, there’s nothing wrong with reading a paper cover-to-cover. I still do it all the time, but understand that this is not universally the best approach to reading a paper.

Throughout your research career, you will take a look at hundreds of research papers. You may not necessarily read them all cover to cover but you will at least skim them. When is a paper worth reading cover to cover? And when is it best to only skim a paper? And what exactly does skimming mean? There are heuristics that help answer these questions and save you time. Sometimes your time is better spent understanding a specific key aspect of a paper – for example its method – and ignoring the rest. (In computer science, researchers curiously started referring to their method as methodology, which is the study of methods. If you are writing a paper that compares and contrasts methods in your field, you should refer to it as methodology but otherwise as method.)

When opening a paper for the first time, you will read its title. Not all titles are descriptive or provide a good idea of what a paper is about but as long as it sounds vaguely interesting, you want to read the abstract too. Abstracts are generally short, and can be read in one or two minutes. Ideally, an abstract reveals the problem that the paper tries to solve, explains how it solves the problem, and what the results are. A well-written abstract provides enough information for you to decide if the paper is worth diving into.

Next up is typically the introduction in which authors provide context on what research problem they are working on (the problem statement) and on why it matters (the paper’s motivation). I often skip the introduction of papers that are in my field because I understand the context and I have already bought into the paper’s motivation. For papers far outside my field of expertise, the introduction can be the most interesting part: I occasionally read the introduction (and nothing else) of cryptography research papers because I cannot be bothered to dig into their proofs and mathematical models. The introductions, however, help me understand why a topic matters and how it relates to a broader field.

A handful of other sections can separate a paper’s introduction from its “meat.” Many papers have a dedicated background section in which they explain technical concepts that the reader may not be familiar with. A good rule of thumb is: if one is unlikely to encounter a concept in a standard computer science curriculum, it may warrant a few paragraphs in the background section. A section on related work is also common, in which the authors put their own work into the context of existing work in the field. Well-written related work sections are as helpful as they are rare. Many papers approach their related work as thoughtless lists of references without any context. “Person A did this; person B did that; person C did this.” That completely misses the point. A related work section is meant to answer questions like:

  • How does this work compare to similar work that was published in the field?

  • What advantages and disadvantages does this paper have over others?

  • How do papers overlap and complement each other?

What your readers really want to read is: “Person A did X but we decided to do Y. While X comes with stellar performance benefits, we believe that Y provides security benefits that are needed in our threat model. Future work should study a hybrid approach that incorporates both X and Y.”

After several sections on introduction and background, the actual research begins, typically introduced by a “method” section. A method can make or break a research project. It’s what reviewers pay the most attention to because this is where flaws and biases have the biggest effect. If a given paper is very similar to your own research, you probably want to read its method. In particular, you will want to know if the paper’s method addresses issues that you failed to consider (or the other way around).

Another crucial aspect of a method is its assumptions. Each research project makes assumptions about the format of data, the number of users in a system, the way users interact with a system, the performance of underlying hardware, and so on. Ideally, these assumptions are spelled out explicitly, but sometimes they are implicit and therefore more difficult to understand. Assumptions are often flawed. For example, is the paper assuming a normal distribution of its data even though the data is more likely to follow a power law? If so, it may not be appropriate to use whatever statistical analysis tool the paper uses to study the data. When reading a paper, you want to get a good idea of what its assumptions are. Finding an issue in assumptions can quickly invalidate a project’s results.

The above is a very brief overview of what to pay attention to when reading a paper. A 2007 article by Keshav goes into more detail (Keshav 2007).

Adopt a reviewer mindset

Your mindset is a key aspect in how you approach a paper. It’s tempting to read papers the way we read textbooks: by assuming that each paragraph is an untouchable source of truth that is not to be questioned. (Arguably, even textbooks don’t deserve this trust. They contain mistakes and have errata, but we are conditioned from an early age to believe what’s printed in books.)

A significant part of graduate training consists of unlearning this mindset and replacing it with perpetual questioning. We wouldn’t be where we are today if Galileo hadn’t questioned the geocentric world view. Adopt the mindset that you are reviewing rather than reading a paper. You aren’t passively absorbing but rather actively verifying information. Your null hypothesis should be to distrust a paper’s results, and only sound reasoning and rigorous methods should be able to change your mind.

Peer review is meant to weed out critical flaws in papers but that does not mean that peer-reviewed papers are free of flaws. Peer review is not perfect and has false positives (flawed papers pass the filter) and false negatives (decent papers get rejected – often because of reviewer antics). Be vigilant and question everything, all the time. Throughout my career, the smartest and most capable people in the room all shared one trait: they had a well-calibrated bullshit filter and would not fall for “smooth talkers.” We all need to strive for that.

Don’t just pay attention to what a paper says; pay attention to what it doesn’t say. Do you believe that a dataset requires an important preprocessing step that a paper never mentions? Can you think of an evaluation scenario that’s not discussed in the paper? Is a paper conveniently ignoring a performance analysis of the proposed database? These are examples of problems that arise through omission. Keep in mind, however, that papers are subject to page limits and therefore must omit content at some point. Bad reviewers often forget that and criticise papers for missing their favourite kind of analysis. The trick is to include what matters and omit the rest, which is easier said than done.

I highly recommend taking notes when reading a paper, be it with pen and paper, on a tablet, or in a text file on your laptop. Here are a number of questions that can get you started with thinking critically about a research paper:

  • Briefly summarise the paper’s method.
    • Is the method section detailed enough to facilitate reproduction?
  • What are the paper’s assumptions?
    • How robust and realistic are these assumptions?
    • Can you think of counter-examples to these assumptions?
  • What are the paper’s key results?
    • Do these results contradict or confirm existing work?
  • How could the paper (its method, presentation, or results) be improved?

  • What follow-up research questions come to mind?

  • What’s the paper’s conclusion?
    • Do the results support the conclusions? (Competition at high-ranked journals and conferences is fierce, which creates an incentive for researchers to overstate the importance of their own work. Be wary of that.)

Read and engage in actual peer review

There’s only so much you can do to train your mind to be more vigilant. Luckily, there are ways to draw on the knowledge of more experienced researchers. Some conferences publish peer-reviewed papers together with their reviews. As a young Ph.D. student without any experience in peer review, I found it fascinating to get a glimpse of how peer review works in practice. The Internet Measurement Conference (IMC) used to publish paper reviews (the last iteration that still published reviews was IMC 2013) but eventually stopped because doing so placed a significant burden on its reviewers, and the extra work was not worth the benefit people got out of it.

Other conferences implement a “shadow program committee” that gives graduate students the opportunity to participate in an “alternate universe” reviewing process, which ultimately leads to a “shadow conference program.” (Several conferences are or were running a shadow program committee; for example, the IEEE Symposium on Security & Privacy, the Internet Measurement Conference, EuroSys, and USENIX NSDI.) This is a great resource and I highly recommend participating at least once.

Once you have gotten your feet wet reviewing papers, you may want to join an actual technical program committee (TPC) – the set of people who review a conference’s paper submissions. People usually wind up on TPCs after being invited by the conference chairs. To get invited, the chairs need to know you or your research, which is generally not the case for junior researchers. To work around that, ask your advisor if she or he can get you onto a TPC. Alternatively, your advisor can hand you one or two papers to review for a conference she or he is reviewing for.

Join or organise a reading group

We get the most out of reading a paper by discussing it with colleagues. Many university departments hold regular reading groups for this purpose. Similar to a book club, reading group meetings revolve around a paper. All members are supposed to read the paper in advance and then discuss it during the meeting. Your department may already have a reading group, but if not, why not organise one? It’s as simple as asking your colleagues “let’s meet next Tuesday for an hour to discuss the attached research paper.” (Ask your advisor or your department chair if they are willing to “sponsor” the reading group by ordering pizza. Try to keep the culinary incentive small: if the food is too good, people will show up for the food without having read the paper.)

N+1 brains are strictly better than N brains because everyone brings a unique perspective to the table. Every reading group I have attended left me with a significantly better understanding of the paper and its context than I could have acquired on my own. Besides, you will refine your sense of what to pay attention to when reading a paper. It’s a great way to flex your “reviewer mindset” muscle.

Here are guidelines for successful reading groups:

  1. Nominate a “session lead” who gets to pick a paper, maybe gives a brief summary of the paper before the discussion, and then moderates the subsequent discussion. Session leads can change with every reading group iteration. It can be a humbling experience to explain a paper to someone else: we often don’t know if we truly understand something until we try to explain it to someone else.

  2. Go into the reading group with a set of questions to discuss:
    • What did you like about the paper?
    • What did you dislike about the paper?
    • What are follow-up research questions?

    Reading groups always drift off topic. Without any guidance, you will find yourself discussing the advantages of vim over emacs after five minutes. The session lead should steer the conversation back into productive territory, which is easier when keeping a few questions in mind.

  3. In my experience, the most effective reading groups have fewer than a dozen attendees. As the number of participants increases, discussions become disorganised and the group becomes less productive.

  4. Seek to organise reading groups around specific research areas and not entire fields – for example, “privacy and security” rather than just “computer science”.

Organise your reading

You are likely to read hundreds of research papers throughout your Ph.D. studies. If you’re anything like me, you won’t remember them all. Early on in my Ph.D. career, I was struggling with the sheer number of papers that were waiting to be read. How should I archive these papers? How can I make the list of papers that I have already read easy to search? How can I show my Ph.D. advisor the progress I have made? Eventually, I started building a small Web page that listed all papers I had read; each entry contained the paper’s title, authors, proceedings, year, publisher, BibTeX record, and a pdf copy. (In the hope that it would be useful to somebody else, I decided to publish this page shortly after I built an initial version. It must have been some time in 2012. Eight years after its original creation, I am still curating it because it ended up being a useful resource to me and others.)

My bibliography on Internet censorship-related research papers.


To facilitate the curation of this Web page, I wrote bibliograpy, a Python tool that takes as input a .bib file and turns it into an HTML bibliography. There’s a good chance that you will write your research papers in LaTeX, so a BibTeX file is a convenient way to keep track of your reading. Whenever you write a new paper, you can include your BibTeX file and your citations are readily available. The BibTeX format supports custom fields that are typically not used when generating a bibliography. You can use these fields to take notes on papers. Here’s an example:
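A minimal sketch of such an entry, wrapped in a small LaTeX document so that it compiles on its own. The citation key keshav2007read, the file name reading.bib, and the custom field name mynotes are illustrative choices, not a required convention. Standard BibTeX styles such as plain only print the fields they know, so the notes stay in your .bib file and never appear in the typeset bibliography.

```latex
% Writes the hypothetical file reading.bib before the document is processed.
% The starred variant omits the header comment that filecontents would add.
\begin{filecontents*}{reading.bib}
@article{keshav2007read,
  author  = {Keshav, S.},
  title   = {How to Read a Paper},
  journal = {ACM SIGCOMM Computer Communication Review},
  volume  = {37},
  number  = {3},
  pages   = {83--84},
  year    = {2007},
  mynotes = {Suggests reading a paper in three passes; the third pass
             virtually re-implements the paper.}
}
\end{filecontents*}
\documentclass{article}
\begin{document}
Keshav describes a three-pass approach to reading papers~\cite{keshav2007read}.
% The plain style ignores unknown fields, so mynotes remains private.
\bibliographystyle{plain}
\bibliography{reading}
\end{document}
```

Compile with pdflatex, then bibtex, then pdflatex twice to resolve the citation.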

Once you have your own, growing BibTeX file, I would encourage you to build a bibliography – and ideally publish it, so your colleagues get to benefit from it too. Once you have finished an initial version of your bibliography, maintenance requires little effort. I regularly take a look at the proceedings of relevant conferences (in my field, these are USENIX Security, ACM CCS, IEEE Security & Privacy, the Internet Society’s NDSS, and a few others) and add new papers to my bibliography’s .bib file. I then run a script, deploy_website.sh, which builds the Web page and uploads it to my Web server. Automation is key here. I would never bother to curate my bibliography if it took more than five minutes to add new papers. Occasionally, I receive patches from colleagues who stumbled upon papers that I wasn’t aware of.

Remember, the best system is one that works for you. Some people like to print papers and take notes with a pen. I used to read papers on my laptop, using the tool Mendeley. (I no longer use or recommend Mendeley. It is now owned by the company Elsevier, a long-standing opponent of the open access movement. Elsevier does not deserve our support.) At some point I got a tablet, which I now routinely use to read papers. I like its high-resolution screen, and the tablet is less distracting than my laptop, allowing me to better focus on the paper. Experiment with a few ways of organising your reading and stick with whatever you like.

Learn about new papers

Every conference cycle brings with it a new set of papers with the potential to affect your research. It’s important to stay up to date on these new papers because (i) you must know if a research group has been working on the same problem as you and published before you (this is called getting “scooped”); (ii) you can learn about new research directions that you may want to pursue; and (iii) you may learn about ways to improve your own work, e.g., by building on better methods or datasets. My favourite ways of finding out about new papers are regularly skimming conference proceedings, subscribing to arXiv alerts, and using Google Scholar.

Skim conference proceedings

The primary source of new papers in your field should be your top conferences and journals. Each scientific discipline has a few venues that are considered “top-tier” and your advisor will likely encourage you to publish in these venues. It’s not easy for newcomers in a field to know which venues are of the highest quality, so ask your advisor or colleagues. In my field of computer security, these are the Network and Distributed System Security Symposium (NDSS), the Conference on Computer and Communications Security (CCS), the USENIX Security Symposium, and the IEEE Symposium on Security & Privacy. Each of these conferences takes place once a year – one in every quarter. Once one of these conferences publishes its research papers, I spend thirty minutes skimming the list of papers to see what’s worth reading. (I first skim the titles and then take a look at the abstracts of the papers whose titles seem promising. Depending on the abstract, I may then decide to skim or read the paper.)

There are always a few papers that I find really interesting. I cannot over-emphasise the importance of following your top venues: they are the primary source of new research in your field.

Sign up for Google Scholar

In addition to conferences, there’s another handy way to learn about new papers. Google operates a service, Google Scholar, that provides several useful features for academics, one of which is a notification system that sends you emails about new research papers in your field. I have been using Google Scholar for many years to get emails about new papers from a given set of authors or by keyword. You can provide Google Scholar with a handful of names and it will then send you an email notification each time one of these people publishes new work. After a few months of reading papers, you will be able to name a handful of researchers whose work you find particularly relevant or insightful. Consider adding their names to Google Scholar to get notified whenever they publish a new paper. You can also get email alerts for keywords that Google Scholar extracts from papers. The more specific the keywords, the more useful the alerts will be. I currently have an alert for the keywords “censorship”, “system”, “anonymity”, and “tor.”

Every other day, I get an email with about a dozen new papers. Most are uninteresting and some are amusingly unrelated, but occasionally Google Scholar sends me highly relevant papers that I would not have found otherwise, which makes the service absolutely worth it to me: The value I get from the occasional true positive far outweighs the annoyance of the regular false positives. Google Scholar once helped me learn about an important research paper that nobody in my field knew about because it was published at a somewhat atypical conference.

In addition to research papers, Google Scholar tracks the citation count of each paper. At least among computer scientists, Google Scholar has turned into the source of truth for somebody’s h-index (the largest n for which one can say “I published n papers that were all cited at least n times”). Academic hiring committees frequently use the h-index to quantify the “productivity” and “impact” of researchers. I’m putting these two words in quotation marks because, like all metrics, the h-index is a poor approximation of one’s productivity, and people have learned to abuse it by forming citation rings and engaging in “salami publishing.”
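For readers who prefer a formula, here is the same definition written out. The notation is an illustrative choice: N papers whose citation counts are sorted in non-increasing order, c_1 ≥ c_2 ≥ … ≥ c_N, with the convention h = 0 for an author without any cited papers.

```latex
% Citation counts of N papers, sorted so that c_1 \ge c_2 \ge \dots \ge c_N.
\[
  h = \max\bigl(\{0\} \cup \{\, n \in \{1, \dots, N\} : c_n \ge n \,\}\bigr)
\]
% Worked example: five papers cited 10, 8, 5, 4, and 3 times.
\[
  c_4 = 4 \ge 4, \qquad c_5 = 3 < 5 \quad\Longrightarrow\quad h = 4
\]
```

Because the counts are sorted, every rank up to h satisfies c_n ≥ n and every rank beyond it fails, so you can scan the sorted list from the top and stop at the first paper whose citation count drops below its rank.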

It’s not healthy to obsess over one’s h-index but it can sometimes come in handy: I once had to provide my h-index as part of my green card application for the United States. I applied for a green card using the “national interest waiver” track, which requires the applicant to prove that they have an established research record. Providing one’s h-index is part of this proof.

Use arXiv subscriptions

The arXiv is a database of preprints (papers that have not yet been peer reviewed) that was originally conceived by the physics community. It has since gained popularity in many other fields, including computer science, and some disciplines have built their own version of the arXiv, like bioRxiv in the biological sciences. Many researchers post their papers to the arXiv before (or in parallel to) submitting them to a journal or conference. Note that most journals do not accept work that has been previously published; this rule typically does not cover preprints, but if in doubt, check with the editors.

The arXiv operates an email subscription service that allows you to subscribe to research topics of interest. Similar to Google Scholar, this service will alert you to new papers.

Circumvent paywalls

Most research papers can be found in the databases of academic publishers like IEEE, ACM, Springer, Elsevier, Sage, or Wiley. It is the very job of these publishers to make research available to researchers. Unfortunately, access to these databases is not universally free. One can pay to access single articles, but universities generally sign up for subscriptions, which allow university employees to access (a subset of) a publisher’s articles. The cost of these subscriptions is exorbitant and has prompted numerous universities to cancel them. In 2019, after months of unfruitful negotiations, the University of California system decided to cancel its Elsevier subscription.

A paywall at the ACM Digital Library. ACM non-members have to pay $15 to purchase this article. A simple Google search for the paper title typically uncovers a freely accessible version of the same paper.


Researchers without a university affiliation, or whose employer cannot afford subscriptions, are stuck behind paywalls. I mentioned earlier that a typical Ph.D. student will skim hundreds of research papers throughout her education. Even if it’s just 200 research papers, at an average price of $15 per paper this would amount to $3,000 – unaffordable to many.

Thankfully, there are ways around paywalls. The most popular service these days is Sci-Hub. (Sci-Hub is the brainchild of the scientist Alexandra Elbakyan, who is still curating the service.) The site’s minimalistic interface has a search bar that allows you to search for a paper’s URL or DOI. For the above paper, Sci-Hub promptly opened the pdf for the URL https://dl.acm.org/purchase.cfm?id=2517856. Note that publishers deem Sci-Hub a threat and keep trying to have its domains taken down. If sci-hub.se ever stops working, the website whereisscihub.now.sh can point you to its latest domain.

Sci-Hub’s website, available at sci-hub.se as of February 2020. Add a paper’s URL or DOI in the search box and get instant access.


If Sci-Hub was unable to find a paper for you, then take a look at Library Genesis (often abbreviated as Libgen). The project has similar goals, providing a web frontend for a database full of scholarly articles and books. Another option is social media. On Twitter, people have started using the hash tag #icanhazpdf to ask for papers that other Twitter users will make available. The /r/scholar subreddit is used to request scholarly literature.

The #icanhazpdf Twitter hash tag in action.


Finally, you can often find a paper (or at least its preprint) by simply searching the web for the paper’s title. Many authors (including myself) make their research papers available on their personal websites. (All of my research papers are available on my personal website and I encourage you to do the same. People occasionally hesitate to put their papers online because it may violate the copyright agreement they have with their respective publishers. That may be true, but I’m not aware of anyone ever getting into trouble for that.)

If all else fails, email the authors of the paper you need, and ask for a copy of their work. Don’t worry about crossing a line: If anything, the authors will feel flattered that somebody is showing interest in their work. All authors I’ve personally asked for a copy of their work (most of whom I have never met) have generously sent me a copy. Talent is everywhere but opportunity is not. Many of our colleagues cannot afford to keep up with the literature because science is being held hostage by greedy publishers that failed to adapt to the internet. It is our duty to support our colleagues by making our work freely available.

Summary

  • Don’t feel obliged to read a research paper cover to cover. Know what you want to get out of it and focus on what matters to you.

  • Train yourself to question everything, all the time. Join reading groups and engage in (shadow) program committees to train your questioning skills.

  • Organise your reading somehow. A single BibTeX file can work surprisingly well.

Writing

It is common for researchers to experience writing as the most frustrating part of their work. Most of us get into a scientific career because we’re interested in research, and not the act of writing up results. I believe that part of the frustration with writing comes from poor strategy and organisation: writing is seen as a necessary evil that happens last minute; the idea of creating twelve pages worth of content triggers anxiety; and the obscure nature of LaTeX makes everything worse.

This chapter discusses ways to make the process more structured, effective, and – perhaps most importantly – more pleasant.

Start writing

What is the point of writing tips, you may ask, if you cannot get yourself to write in the first place? We have all been there, many times. Writing is a deeply creative process and therefore difficult to invoke on demand. There are, however, some hacks that can help you get started.

Write consistently

Have you ever accomplished something hard and meaningful, only to have people tell you that they wouldn’t have the “motivation” to do the same? That’s nonsense. Motivation doesn’t help you achieve goals because it comes and goes, and nobody is always motivated. You accomplished your goal because you’re disciplined. When motivation is nowhere to be seen, discipline is what keeps you going. Don’t conflate these two concepts.

And as you may suspect, discipline and consistency are also the key to getting writing done. Motivation alone won’t get you far. I’m writing these very words while feeling unmotivated to work on this book. It’s a sunny Saturday morning and I would rather be outside. However, every day on my to-do list has “Write on book,” and that’s what I’m doing. If I always waited for motivation to strike, I would never get anything done. Sometimes I only edit two sentences and sometimes I spend an hour adding a new chunk of writing. There are good and bad days, but you have to keep showing up and doing the work – even if it’s just a little. I don’t have much writing to show on any given day, but small and steady improvements keep adding up. You will be surprised by the amount of progress you can make by simply being consistent.

Consistent writing doesn’t just help your progress; it also helps your writing! I strongly believe that it is impossible to churn out good writing in a short amount of time. Instead, good writing ages, like an expensive bottle of wine. I wrote my best research papers over multiple months, in small increments, and edited my writing frequently. Each iteration made the writing a tiny bit better. On some days, all I did was add two or three sentences, or rephrase an image caption. On other days, I felt more creative and worked my way through several paragraphs, and maybe even sections. Don’t let anyone (especially yourself!) fool you into thinking that you’re not a good writer. Just keep going at it, one day at a time, and you will eventually have great writing to show.

Capitalise on creativity

Writing is a deeply creative process. Unfortunately, one cannot simply summon creativity, so when it finally does show up, it is important to make the best of it. In the words of the insightful Naval Ravikant: “Inspiration is perishable – act on it immediately.” Open your text editor, brew yourself a delicious cup of coffee, and get to work! (The coffee is not optional. To engage in a task you don’t particularly enjoy, you need to make it more attractive. Applied to writing, this could mean getting your favourite non-alcoholic beverage, listening to relaxing music, or writing in the park, under the sun.) Ignore your surroundings as much as you can until you feel your creativity wane. I argued above that you cannot rely on motivation alone, but if you do feel motivated, make the best of it.

I produce 80% of my work in 20% of my time (the Pareto principle all over again). That is not because I’m lazy and waste 80% of my time, but because those 20% are when I’m particularly focused and creative. I am then able to produce work at a quality and quantity that is outside my reach on most days. Don’t let these periods of peak creativity go to waste!

Living documents

I have met many researchers whose idea of writing is to wait until three days before a submission deadline and then engage in a manic, caffeine-fuelled sprint culminating in twelve hastily written pages. Needless to say, the output is always underwhelming. In addition, the prospect of having to write a full paper in a short amount of time is daunting and agonising, which leads to even more procrastination – a vicious cycle.

I encourage you to instead space out your writing process over time, and let your papers evolve. Try to write a little, whenever you feel like you have something to say. Capitalise on your creativity. I treat my research papers like a living document. One of the first things I do when starting a new research project is to create a paper.tex file. I use it to jot down bullet points about research ideas, how these ideas connect, and how they can ultimately be presented. These bullet points eventually turn into sentences, paragraphs, and then a finished paper. Once I start writing code or collecting measurement results, I try to distill key results and start thinking about a narrative.

I find it helpful to approach a research paper by distilling its key messages. What are your 1–3 core insights? What evidence do you have that supports these insights? The rest of the paper is then spun around these key messages. Not only will your writing quality improve drastically, you will also look at the writing process in a more favourable way, which leads to more writing – a positive feedback loop.

Modular decomposition of a research paper

A finished research paper can evoke emotions of satisfaction or anxiety – depending on where you are in the writing process. The finished product may seem overwhelming, but remind yourself that you won’t be tackling it all at once, just as you don’t climb a tall mountain in one go but take breaks, rest, and make small, steady, consistent progress. Such progress is much easier once you break a research paper into separate modules, each of which is easier to tackle. We use the concept of “modular decomposition” in programming to break down programs into separate modules that interact with each other, and you can do the same with research papers. A research paper consists of several sections: typically an introduction, related work, presentation of your method, experimental results, and so on. Each of these sections consists of several subsections. For example, your experimental results may consist of subsections on experimental setup, data pruning, and visualisation. Drafting a subsection on data pruning is significantly less daunting than writing the entire section on experimental results.

Once you have broken your research paper into pieces, tackle these pieces in isolation. Ask your advisor or collaborators to help you break your paper into pieces.
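In LaTeX, the same decomposition can be mirrored in the file layout. A minimal sketch, assuming a hypothetical sections/ directory with one file per section and a hypothetical references.bib: the top-level file only stitches the modules together, so each one can be drafted, revised, and reviewed on its own.

```latex
% paper.tex: hypothetical top-level file that only assembles the sections.
\documentclass{article}
\title{Working title}
\author{Author names}
\begin{document}
\maketitle
% One file per module; \input{sections/method} reads sections/method.tex.
\input{sections/introduction}
\input{sections/related-work}
\input{sections/method}
\input{sections/results}
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```

Commenting out a single \input line is a quick way to compile only the parts you are working on, although cross-references into the omitted sections will then show up as undefined.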

Note that the above sections apply to any creative work and not just writing. I put them in the writing section because writing happens to be the activity that most people struggle with.

Write better

Many books have been written on writing in general and academic writing in particular.

  • Delete unnecessary words. Many people often make their academic writing very fluffy, which makes it difficult and tedious for a reader to read. Notice how awful that sentence was? Let’s delete the unnecessary words “often,” “very,” “difficult and,” and “for a reader”: Many people make their academic writing fluffy, which makes it tedious to read.

    Go over your writing sentence by sentence and delete every word that isn’t necessary. Removing the fluff from your writing makes it substantially easier to read. Most college students (including yours truly) eventually pick up the annoying habit of adding unnecessary words to sound sophisticated and pad their writing assignments. It is now time to unlearn this habit.

  • Use active instead of passive voice. Instead of “the data was analysed,” write “we analysed the data.” Instead of “it has been shown by Turing,” write “Turing showed that.” Excessive use of the passive voice is yet another habit that’s particularly widespread in academia. Don’t imitate the lifeless and boring writing of your peers.

  • Engage your audience. Never start your paper with something entirely obvious and uncreative like “The Internet has become the world’s biggest communication network.” You are more creative than that. Nobody expects pearls of wisdom at the very beginning of a paper but make an active effort to make your paper as engaging to your reader as possible. Take a look at the introduction of a seminal paper in the field of cryptography (Diffie and Hellman 1976). Its first sentence is a classic.


Finally, keep in mind that good writing is often subjective, and not everyone will agree with the advice above. I once got the following feedback on one of my paper submissions:

First, authors abuse of questions in the paper. Writing a scientific paper is not writing a “thriller”. No suspense is required (this makes the reading pretty much annoying). We only have to write about facts.

Reviewer A

Reviewer A is the kind of person who starts their paper by pointing out how important the Internet has become. Don’t be like reviewer A. Bring some colour to your writing.

Ask for feedback

Good writing rarely happens in isolation. Even professional novelists have editors and actively seek out feedback. In academia, advisors often assume the role of an editor (if you’re lucky) while friends and colleagues can provide you with feedback. As a Ph.D. student, my friends and I would often share drafts with each other even though we worked in different subfields. I would review a friend’s applied cryptography paper and, while I didn’t understand every aspect of it, I would still learn a lot and provide an important perspective: that of a potential reviewer.

The feedback of somebody who’s not intimately familiar with your research is important because it’s not affected by the “curse of knowledge” – a cognitive bias that emerges when a professional has a hard time putting herself in the shoes of a newcomer.

When asking your peers for feedback, keep in mind that somebody can read your work for the first time only once. The first reading is particularly important because somebody approaches your writing with a fresh mind and an unbiased frame of reference. Don’t waste these opportunities. Instead of sending your very first draft to all the people you know, send it to only one or maybe two. Pay close attention to their feedback, address it, and then send the revised draft to the next person. This “feedback chain” approach will result in significantly better and less repetitive feedback.

Some of your colleagues may be inexperienced and not know what exactly they should pay attention to when reading your work. In that case, help them out by letting them know what you want feedback on – be it the narrative in the introduction, or the intelligibility of your method section, or the informative value of your diagrams in the results.

Write effectively

Writing effectively means writing in LaTeX. While LaTeX tries hard to produce good-looking output, most computer science papers are still poorly typeset. This comes as no surprise because unlike other fields, we typeset our own papers and hardly anybody is trained in typesetting. By embracing a handful of rules, you can greatly improve the visual clarity of your writing.

Use LaTeX comments

LaTeX interprets lines that begin with a % character as comments. I recommend using comments in the following scenarios:
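As a small illustration (the two uses and the placeholder prose below are assumptions, not an exhaustive list): everything from a % to the end of its line is ignored when the document is typeset, and a literal percent sign has to be written as \%.

```latex
\documentclass{article}
\begin{document}
% TODO (note to co-authors): double-check this number before submission.
The placeholder system handled 40\% of all requests.
% Earlier phrasing, kept for reference but not typeset:
% The placeholder system handled two out of five requests.
\end{document}
```

Comments never reach the PDF, but they do remain in the .tex source, so remove sensitive ones before sharing your source files, for example when uploading a paper to the arXiv, which makes submitted LaTeX sources available for download.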

Balancing references

If your paper has two or more columns, the balance package balances your references across columns, making them visually more pleasing. Include the package by adding \usepackage{balance} to your list of packages and use it by adding \balance before you include your references. Compare the end of a bibliography without balance:

And here is the same bibliography after using the balance package:
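A minimal sketch of where the command goes, assuming a two-column article, a hypothetical references.bib, and the placeholder citation key some-key:

```latex
\documentclass[twocolumn]{article}
\usepackage{balance}  % provides \balance
\begin{document}
Body text with a citation~\cite{some-key}.  % hypothetical citation key
% Equalise the column lengths on the final page, where the references end.
\balance
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```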

Better tables

Which table looks better? The one on the left or the one on the right? We both know it’s the one on the right! Notice how horizontal and vertical lines are almost absent in the right table, removing visual clutter. Both the top and the bottom of the table exhibit more spacing, making the table less cramped, and the quantities in the two rightmost columns are right-aligned instead of centred, rendering the numbers easier to compare.

The booktabs package helps you create more pleasing tables. Equally important, read its documentation: it does a great job of explaining how to properly typeset tables. For a quick reference, here’s the LaTeX code of the above table on the right.
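As a rough illustration of the rules above, here is a minimal, self-contained booktabs table. The column names and all numbers are hypothetical placeholders, not measurements.

```latex
\documentclass{article}
\usepackage{booktabs} % provides \toprule, \midrule, and \bottomrule

\begin{document}
\begin{table}
  \centering
  \caption{Hypothetical measurements; all values are placeholders.}
  % l = left-aligned labels, r = right-aligned numbers that are easy to compare.
  % There is no | in the column specification, so no vertical rules appear.
  \begin{tabular}{lrr}
    \toprule
    Configuration & Latency (ms) & Requests/s \\
    \midrule
    Baseline      & 12.4         & 8,100      \\
    Optimised     &  9.7         & 10,350     \\
    \bottomrule
  \end{tabular}
\end{table}
\end{document}
```

The booktabs rules add extra vertical space around themselves, which is where the less cramped look at the top and bottom of the table comes from.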

Backreferences

It is possible to add backreferences to your references. At the cost of a little extra space, backreferences list the pages that cite a given reference. While not a must-have, backreferences are a nice feature because they make it easy to find where in the paper a given reference is cited. People also often look for their own work in a paper’s references because they are curious about the context in which their work is cited. Backreferences make this easier.

An example of backreferences for three papers. Note the “Cited on” right after the URLs. The hyperref package makes these page numbers clickable, making it easy to jump directly to the pages that cite each reference. The abbreviation “p.” is short for “page” and “pp.” is short for “pages.”

Note

There are several ways to implement backreferences. One option is to use the popular hyperref package’s pagebackref option as follows:
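A minimal preamble sketch of this option follows. The \backrefalt redefinition, which produces “Cited on p.” and “Cited on pp.” as in the example above, is optional; it follows the pattern that, to my understanding, the backref package documentation recommends, so consult that documentation if your output differs.

```latex
% In the preamble: load hyperref late, after most other packages.
% The pagebackref option makes every bibliography entry list the pages
% on which it is cited, as clickable links.
\usepackage[pagebackref]{hyperref}

% Optional: print "Cited on p. 3." or "Cited on pp. 3, 7." instead of
% the default list of page numbers.
\renewcommand*{\backref}[1]{}  % silence the default back-reference output
\renewcommand*{\backrefalt}[4]{%
  % #1: number of distinct citing pages, #2: list of those pages
  \ifcase #1 %
    % Not cited in the text (e.g., added via \nocite): print nothing.
  \or
    Cited on p.~#2.%
  \else
    Cited on pp.~#2.%
  \fi}
```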

Self-contained diagrams

The TikZ package makes it possible to typeset diagrams directly in LaTeX. TikZ diagrams have a very small file size, use the same fonts as your text (and therefore look less clunky), and can be created with nothing but a text editor. On my personal website I maintain several TikZ examples that help you get started. Granted, it takes time to learn TikZ, but I recommend it for those who value high-quality typesetting.

Note how this TikZ diagram integrates well by sharing a font with the surrounding text. It is typeset directly in LaTeX, making it lightweight and visually appealing.
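To give a flavour of the syntax, here is a minimal, self-contained TikZ diagram: two hypothetical boxes labelled Client and Server, connected by an arrow. It is a sketch to get started with, not the diagram shown above.

```latex
\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{positioning} % enables the "right=of ..." placement syntax

\begin{document}
\begin{tikzpicture}[
    node distance=3cm,
    % A reusable style for the boxes.
    box/.style={draw, rounded corners, minimum width=2cm, minimum height=1cm}
  ]
  \node[box] (client) {Client};
  \node[box, right=of client] (server) {Server};
  % An arrow from client to server, labelled with the message it carries.
  \draw[->] (client) -- node[above] {request} (server);
\end{tikzpicture}
\end{document}
```

Because the labels are typeset by LaTeX itself, they automatically use the same font as the surrounding document.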


Common LaTeX mistakes

While reviewing research papers, I notice the same kinds of LaTeX mistakes over and over again. Fortunately, they are easy to fix.

  • Use thousands separators to make large numbers easier to read.
    Bad: 1000000
    Good: 1,000,000 (or 1.000.000, depending on your language)

  • Use a ~ (a non-breaking space) to prevent a line break between a name and its citation.
    Bad: Newton et al. [1]
    Good: Newton et al.~[1]

  • Citations are not nouns.
    Bad: as discussed in~[1]
    Good: as discussed by Newton~[1]

  • Use proper LaTeX quotation marks.
    Bad: "Foo"
    Good: ``Foo'' (note that quotation marks are language-dependent)

  • Reference more specific parts of a paper if possible.
    Bad: See Newton et al.~\cite{Newton}
    Good: See Newton et al.~\cite[\S~5]{Newton}
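Put together, the good variants look like this in LaTeX source. The sentences are made up, and the citation key Newton is a placeholder for an entry in your BibTeX file.

```latex
% Digit grouping makes the magnitude obvious at a glance.
Our data set contains 1,000,000 measurements.

% The tie (~) keeps the citation on the same line as the author names,
% and the citation is attached to a name instead of being used as a noun.
As discussed by Newton et al.~\cite{Newton}, the effect is small.

% LaTeX opening (``) and closing ('') quotation marks.
We call this the ``warm-up'' phase.

% The optional argument of \cite points readers to a specific section.
See Newton et al.~\cite[\S~5]{Newton} for the full derivation.
```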

Don’t miss the comprehensive typesetting guides by Eddie Kohler, Markus Kuhn, and D. J. Bernstein to learn more about effective and beautiful typesetting in LaTeX.

Building LaTeX papers

LaTeX papers with references are cumbersome to compile. Most people remember the process as “a few runs of pdflatex plus a few runs of bibtex, but nobody knows why.” Wouldn’t it be much simpler if all you need to do is type make? Here’s how: I recommend using a compilation tool such as rubber plus a Makefile. Below is an example of a Makefile that I typically use for all of my research papers. The Makefile assumes that your root document is called paper.tex.

When using this Makefile, make sure that the indented lines containing the two rubber commands are prefixed by a tab character and not by spaces.

If you are not a fan of command-line tools, you can still benefit from LaTeX by using an online editor. Overleaf, for example, has been popular among my collaborators.

Pre-submission paper checks

Conferences and journals almost always have specific requirements that paper submissions need to satisfy. It’s frustrating to have your paper rejected because of formatting violations, so it’s a good idea to spend five minutes checking the conference’s requirements before pressing the submit button.

  • Make sure that your paper is within the page limit. The page limit sometimes includes and sometimes excludes references or appendices, so read the instructions carefully.

  • LaTeX marks broken references with question marks: an undefined citation appears as [?] and an undefined cross-reference as ??. Search your pdf (Ctrl + F) for both strings to find them.

  • Make sure that all fonts are properly embedded in your pdf. On Linux, I use the tool pdffonts, which is part of the Debian package poppler-utils. I run it as pdffonts file.pdf and it displays a column called “emb,” which shows whether a given font is embedded. While using pdffonts to write this paragraph, I realised to my dismay that one of my old papers did not embed all of its fonts (note the “no” entries in the emb column for Helvetica and ZapfDingbats):

    $ pdffonts Winter2012a.pdf
    name                                 type              encoding         emb sub uni object ID
    ------------------------------------ ----------------- ---------------- --- --- --- ---------
    GJYVBN+NimbusRomNo9L-Medi            Type 1            Custom           yes yes no     100  0
    NLMFQI+NimbusRomNo9L-Regu            Type 1            Custom           yes yes no     101  0
    XNJNRQ+NimbusRomNo9L-ReguItal        Type 1            Custom           yes yes no     102  0
    ZZEWFV+CMSY10                        Type 1            Builtin          yes yes no     103  0
    UIPGCJ+CMTT8                         Type 1            Builtin          yes yes no     127  0
    Helvetica                            Type 1            Custom           no  no  no     174  0
    Helvetica                            Type 1            Custom           no  no  no     180  0
    HNYWOO+StandardSymL-Slant_167        Type 1            Builtin          yes yes no     203  0
    JHYTSG+CMR10                         Type 1            Builtin          yes yes no     204  0
    CUJHND+CMMI10                        Type 1            Builtin          yes yes no     205  0
    ZapfDingbats                         Type 1            ZapfDingbats     no  no  no     211  0
    Helvetica                            Type 1            Custom           no  no  no     212  0
    Helvetica                            Type 1            Custom           no  no  no     218  0
    XEQPPW+CMTT10                        Type 1            Builtin          yes yes no     242  0

Git integration

LaTeX files are text files, which makes them prime candidates for version control. I recommend putting all your LaTeX source files into a git repository. It doesn’t matter if you prefer Subversion, CVS, or Mercurial over git. What matters is that you use some sort of version control. I like git because it has emerged as the most popular system. With that come great documentation and tooling, and most people you collaborate with will have at least some understanding of git.

Having your paper under version control has several advantages:

  • No writing is ever lost. Whatever you remove during editing is part of git’s history and can always be recovered.

  • You can easily compare two versions of your paper, for example to produce a pdf that highlights the differences.

  • You can tell who changed what.

Use tags for milestones

One can assign a “tag” to a specific git commit. Tags are effectively arbitrary labels and often used for version numbers. Whenever you publish a new version of your software, you assign the latest commit a tag like “0.2.4.” It doesn’t have to be version numbers though. I like to tag important milestones of my writing, for example whenever I submit a paper to a conference, or to the arXiv, or when I publish the final camera-ready version of a paper. You can even assign a tag to remember when you sent your paper to your advisor for feedback. Here are examples of how I used tags in one of my research papers:

* 5de077a - (tag: ndss17-camera-ready) added cs to my email (3 years, 7 months ago) <laurar>
...
* 2cd29b1 - (tag: arXiv-resubmission-1) fixed last paragraph of internet scale section based on corrected plots (3 years, 9 months ago) <laurar>
...
* fabf1e3 - (tag: arXiv-submission) Turn passive into active voice. (3 years, 10 months ago) <Philipp Winter>
...
* 2187ef7 - (tag: NDSS-submission) Minor style harmonization and spelling fixes. (3 years, 11 months ago) <Philipp Winter>

To tag the latest git commit, run:

git tag "arxiv-submission"

By default, git doesn’t push tags to your upstream repository. You have to push them yourself, as follows:

git push origin --tags

You can learn more about tags from the excellent Pro Git book.

Learn who changed what

With multiple people working on the same project, you will occasionally notice mistakes in the writing. Some of these mistakes may require discussion. Instead of asking all your collaborators who is responsible for a given piece of writing, you can find out yourself by using git’s “blame” functionality. It’s as simple as running:

git blame FILE

The output is the file’s content, with each line annotated with who last changed it, when, and in which commit.

Help git do its job

To reduce merge conflicts and keep a clean and descriptive git history, try to limit each commit to one small and simple change. Here are a few examples in the context of research papers:

  • Fix one or more typos. If somebody is proof-reading an entire paper, it’s fine to have a single commit that fixes many (or all) typos in the paper.

  • Add a reference. Many claims need to be supported by references. Such a commit may add a new reference to the BibTeX file and then cite it in the corresponding LaTeX file.

  • Rephrase a paragraph or section. You may not like the way a paragraph (or entire section) is phrased. Rephrasing this paragraph or section should go into one commit. If you want to rephrase several pages’ worth of writing, consider using multiple commits.

  • Add more writing. Adding a coherent argument, paragraph, or section should go into a single commit. Adding two independent paragraphs or two separate sections should go into two commits.

  • Delete text to meet a page limit. Papers must sometimes be trimmed to meet a page limit. Unless it severely cripples the paper, it’s fine to do this in a single commit.

Note that making small changes is not always possible or reasonable. As you are rewriting a paragraph, you may realise that the rewrite only makes sense if you also rewrite the paragraphs before and after. This is fine. The above recommendations are just that: recommendations.

I personally find it helpful if paragraphs of text are broken into several lines of at most 80 characters each, instead of a single long line. This makes it easier to inspect diffs and understand what change was made. Consider the following example:

Only a single character changed in this paragraph, which is formatted as a single line. It’s difficult to see what changed because the line is so long. Now consider the following example, where the same paragraph (and the same change) is formatted as separate lines. It’s easier to see which character was changed in this commit.
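Here is the same idea in LaTeX source: a sketch of a paragraph, borrowed from this book’s introduction, wrapped at no more than 80 characters per line. Because LaTeX treats a single line break like a space, the wrapping changes nothing in the typeset pdf.

```latex
% A paragraph stored as several short source lines (at most 80 characters
% each). A single line break is typeset as a space, so the output is the
% same as writing the paragraph on one long line; only a blank line starts
% a new paragraph.
Research is messy. Our body of knowledge is scattered across countless
journals, presentations, blog articles, and tweets, making it difficult to
get up to speed in a field.

% A one-character fix in, say, "presentations" now changes only the second
% source line, so git's diff shows one short line instead of the whole
% paragraph.
```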

Acknowledgements

Contact and Support

Send email to phw@nymity.ch.
