Navigating the maze

By Adrian Bridgett | 2020-02-22

Overview

It comes as a surprise to my friends that, despite being interested in computers, I’m not a technology geek. I certainly have some useful gadgets around the house - Chromecast, UEBoom. My first smartphone was an HTC Hero; before that I had an old Nokia. What was the “killer feature” that made me buy the smartphone? It wasn’t email, it was Google Maps.

Maps for a company

What is a map? It shows you the lay of the land. Zoomed out it shows you the overview of the terrain - the limits, what’s explored, the unknowns, your position therein. Close-up it guides you to your destination, highlighting main thoroughfares and alternative routes.

For a company, documentation provides this facility. However, often the documentation is more akin to a multitude of inconsistent maps in different map projections, torn into pieces and thrown into the air, landing where they may. If you pick up a piece it says “here be dragons” (aka “ask Bob about this”).

New employees become archaeologists, scraping away at codebases to try and determine how pieces fit together. Sometimes uncovering buried treasure, sometimes the equivalent of a trap (obsolete or misleading information). Leaning on the map analogy again, documentation allows people to explore exotic lands, ahem, other components of the company without getting lost.

Benefits

Saving time is clearly the biggest win, not only for the person trying to find the answer (who could be blocked for hours or even days), but also for the person who knows the answer and who otherwise answers multiple times (perhaps even to the same person). Common activities (e.g. resetting passwords or generating TLS certificates) - “How do I….” type questions - are easy wins here. Never underestimate how long it takes to “look something up on Google” (and find the example that works with your version of software) compared to cutting and pasting from a document. If the steps are long or complex, a natural next step is to have a script in a git-repo or web-service to make life even easier. “Documentation as code” you could call it.

Documentation helps increase standardisation - you just do “the same thing as that other team”. This is much easier when you know how they did it. With use, documentation improves - as more people read it they add their knowledge in the form of better, faster, safer methods - benefiting everyone.

A key benefit is the prevention of mistakes and misunderstanding. This may be simply by having the correct steps written down, or it may be because it shows the flow of data through the system. Especially when there is a crisis on, avoiding mistakes is critical. By capturing this knowledge ahead of time, we avoid the mistakes inherent when people are placed under great pressure and/or time constraints.

Why bother?

“Documentation is always outdated”

This is a big problem, however it doesn’t have to be this way. Why is documentation outdated? One reason is that it’s difficult to update - so make it easy. Don’t restrict who can update a document; if you must, at least allow others to add comments or submit changes. Good systems can notify document owners of changes so that incorrect ones can be reverted - although the vast majority of changes will be improvements that can be kept.

Documentation may become “out of sync” with code. So generate your documentation from the code - all major programming languages support this. Make this part of the code review. I find that architectural documentation is often better done outside a codebase (in a wiki for example) as it changes (or is improved) on a different timescale.
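As a minimal sketch of what “generate your documentation from the code” can look like, the Python below builds a plain-text reference page from function signatures and docstrings using the standard library’s inspect module. The function reset_password and the helper render_reference are hypothetical names invented for illustration; real projects would more likely use an established documentation generator.

```python
import inspect


def reset_password(username: str, notify: bool = True) -> str:
    """Reset a user's password and return a one-time token.

    Hypothetical example function; it does not talk to any real system.
    """
    return f"token-for-{username}"


def render_reference(*functions) -> str:
    """Build a plain-text reference page from signatures and docstrings."""
    sections = []
    for func in functions:
        signature = inspect.signature(func)
        # getdoc() returns the cleaned-up docstring, or None if there is none.
        doc = inspect.getdoc(func) or "(undocumented)"
        sections.append(f"{func.__name__}{signature}\n\n{doc}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(render_reference(reset_password))
```

Because the page is rebuilt from the source each time, a changed signature or docstring shows up in the reference without anyone remembering to edit a second copy.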

“I don’t have time”

I prefer to phrase this as “We don’t have time not to write documentation”. We’ve all been in situations where only one person can perform a task as they have the critical knowledge. This is bad in many senses (if they leave, are taken ill or are on holiday) and it has a bad effect on team morale - for both the protagonist (too busy) and the rest of the team, who feel like sub-standard members.

A poor memory has encouraged me to document what I do - otherwise I’ll have to figure it out from first principles next time. Most importantly it’s also allowed other people to reuse this information (and improve it). Most selfishly it means that I can hand over work to others (especially repetitive work).

“No one reads it anyway”

This isn’t true. However it may take time - once people start to find useful documentation they’ll start to look. If I’m asked a question or need to fix something I ask myself “should I document this?” How likely it is to happen again is a major influence on this. Setup work I’ll generally document in a wiki. Corrective action I’ll often document in the relevant ticket. Frequent corrective action will be in the runbooks.

“Source code is the truth”

This depends. Yes, if you want to know all of our cloud resources, check Terraform. However, whilst I expect DevOps engineers to do that, I do not expect our data science team to do that. Once a (highly talented) engineer said that people should be reading the test suite feature files to understand how to run a program and what the various command line flags meant. I think that’s the equivalent of telling someone that they should know how to drive a car given the Haynes manual.

When is it written?

A common issue is that documentation is promised but never delivered. A common example is architectural documentation - how the system “hangs together”. What approach was taken, what alternatives were considered, the compromises taken, the reasons behind this.

To spend weeks or months writing code is considered fine. However spending a few hours writing documentation is considered “unimportant”. I think this is completely unacceptable - especially for key architectural documentation. A rushed design often leads to thrown-away code (or “it’s too late to adopt the better design”) - this is a very false economy.

Is a senior developer’s time more important than a junior developer’s? I’d certainly hope so. But is it worth more than that of five junior developers? Or the time taken for the on-call person to debug a business critical problem at 3am?

This sort of documentation should not be skimped. It’ll additionally pay for itself immediately as it forces people to consider the detailed aspects of their design. As Camille Fournier describes in “The Manager’s Path” when discussing time estimation, you need to push through the details and unknowns to get accurate timescales. Here it serves a different purpose - it will help to flag problematic areas or potential showstoppers early.

Even in the early days of a project, a “rough plan” brings a lot of clarity to the table, allowing others to understand the proposal more easily and in greater depth. Better alternatives are easier to see when there’s something to base them upon.

Onboarding

Documentation plays no greater role than when onboarding a new colleague. You can (and should) walk people through the system, asking and answering any questions. When joining a company there’s a huge amount of information to absorb. Knowing that the details are clearly written up allows new-starters to concentrate on the message and the culture rather than hurriedly taking notes. It gives peace of mind and minimises stress and mistakes.

High level (architectural) documentation is helpful during this process - if only for the diagrams, however it’s the low-level guides that are the biggest time savers. I’m talking about the “process” guides (from “how to book a holiday” to “how to get a database snapshot”), those step-by-step instructions. These are huge timesavers for both parties, especially when you consider how much unblocking is done. It empowers people to fix their own problems and work things out by themselves rather than being dependent upon others.

New-starters are also your secret weapon - they should be encouraged to validate the documentation. If it’s wrong, get the owner of that area to fix it. Even better, they may be able to fix it themselves - however this should not be the expectation - in particular it may be worth scanning the documentation before people join to ensure that it’s up-to-date.

Librarians

At university a lecturer complained about the term “software engineering”; he said that it’s more akin to gardening - weeds grow in our code and need pulling out, flowers need nurturing to grow and bloom.

Documentation is no different. You may have heard of code-rot, well there’s also documentation-rot. Despite all your best efforts, it’s going to become messier over time. This is natural - what works for ten people doesn’t work for fifty. What works for fifty people doesn’t work for five hundred.

Just like code we want to avoid repetition. Sometimes this involves taking documentation and splitting it up into smaller pieces. We may need to restructure the documentation hierarchy to make more sense. Perhaps an area has grown so much it needs a whole “overview”. Maybe some areas are going to be replaced or don’t matter any longer.

A few tricks I’ve used:

  • Two-level guides. The top page is just a set of “aide-memoire” bullet points that long-term employees use
    • (Some of) those bullet points then link to step-by-step guides which new (or forgetful) employees use to ensure they execute the steps correctly
  • An “archive/obsolete/deprecated” section where you move documentation that’s no longer relevant. Even better, just delete it (so that it no longer shows up in searches), however sometimes there’s one or two users left. This way you help to make it clear to everyone what’s current or not. An example might be when you have both “old” and “new” processes
  • “Shout outs” at the top of documents. Perhaps a “This is obsolete - see …” note or “Work in progress - see ..”. Obviously the latter shouldn’t be used as an excuse not to finish the docs.
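If the “shout outs” are worded consistently, they also become something a script can check. The Python below is a rough sketch under that assumption: it classifies a document by the note on its first non-blank line. The marker strings mirror the examples above, while the function name banner_status and the status labels are hypothetical.

```python
# Assumed banner wording; adjust to whatever convention your wiki uses.
OBSOLETE_MARKER = "This is obsolete"
WIP_MARKER = "Work in progress"


def banner_status(document_text: str) -> str:
    """Classify a document by the shout-out on its first non-blank line."""
    for line in document_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(OBSOLETE_MARKER):
            return "obsolete"
        if stripped.startswith(WIP_MARKER):
            return "work in progress"
        return "current"  # first real line is not a shout-out
    return "current"  # empty document: nothing says otherwise
```

A periodic run over exported pages could then list everything marked obsolete as candidates for the archive section, or flag “work in progress” pages that have lingered.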

I’d also highly recommend Daniele Procida’s guide where he talks about the four types of documentation. I found this immensely useful to determine what to write and how to write it.

Source of truth

There should be a “source of truth” for the company. This might be documentation (so ensure there aren’t multiple ways to do something, nor multiple guides on how to book time off). However it’s also often in code. Make it clear which it is. For example we use Terraform (code) to manage S3 buckets, however we also have a page which lists the most important ones. That page has a “not authoritative - see (link to terraform code)” note at the top. When people search on the wiki, they find the page and for 90% of people that’s enough. For the remaining 10% we’ve also shown them where the answer is.
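One way to keep such a page honest is to render it with the note built in. The Python below is only a sketch: render_bucket_page is a hypothetical helper, the bucket names are made up, and the link is left as the same placeholder used above rather than a real URL.

```python
def render_bucket_page(bucket_names, terraform_link):
    """Render a plain-text summary page with the 'not authoritative' note first."""
    lines = [
        f"Not authoritative - see {terraform_link}",
        "",
        "Important S3 buckets:",
    ]
    # Sort so the page does not churn when the input order changes.
    lines.extend(f"  • {name}" for name in sorted(bucket_names))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    print(render_bucket_page(["logs", "backups"], "(link to terraform code)"))
```

The note always comes first, so whoever lands on the page from a wiki search sees where the real answer lives before they read the list.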

Summary

Documentation is one of the best returns that you can get for the time invested in it. It does require effort to keep it that way - this is not a fire-and-forget investment. This is also a team responsibility - lack of documentation is one way people build empires where they are the only person who knows. You can’t afford to let that happen; I’d actually say that it’s necessary to prevent them from working in new areas until they’ve adequately documented the old areas.