18 Documentation

18.1 The README that says “run the notebook”

You inherit a project from someone who has left. The README, if there is one, says “run analysis.ipynb”. There are no docstrings. The data comes from “the usual place”. The model in models/ was trained by a script nobody can find, with hyperparameters nobody recorded. Everything needed to use this work lived in one person’s head, and that person is gone.

Documentation is what stands between a project that someone else can pick up and one that dies with its author. For data scientists this is a particularly acute risk, because the notebook feels self-documenting — it has the code, the outputs, even prose between the cells — right up until you’re not there to narrate it. This chapter is about the documentation that actually survives the handover: not more of it, but the right kinds, kept honest.

18.2 Different documents for different questions

The reason so much documentation is bad is that it tries to do several incompatible jobs at once. The Diátaxis framework (Procida 2017), the structure this book itself follows, separates documentation by what the reader needs at the moment they’re reading. A tutorial leads a newcomer by the hand through a first success. A how-to guide walks an already-competent reader through a specific task. Reference lets someone look up a precise fact: what this function takes, what that column means. Explanation gives the why behind a design. A reference page that tries to teach is bloated; a tutorial that lists every option is unfollowable. Knowing which kind you’re writing is most of the battle.

Data Science Bridge

You already produce documentation — you just don’t always call it that. A model card is documentation: a statement of what a model does, the data it was trained on, how it was evaluated, its known limitations, and where it should and shouldn’t be used. A data dictionary is reference documentation: each column with its type, meaning, and units. The shift is to recognise these as documentation, with the discipline that implies — kept current, written for a reader, treated as part of the deliverable — rather than as one-off artefacts you make when asked.

Where the analogy breaks down: a model card has sections that general software documentation doesn’t — intended use, the populations the model was and wasn’t validated on, fairness and ethical considerations — because a model carries risks that ordinary code doesn’t. Documenting a model is documenting a thing that makes consequential decisions about people, so it answers questions (“who might this harm, and where is it not safe to use?”) that a function’s reference page never has to.
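One way to keep a data dictionary current is to store it as data and compare it with the columns the dataset actually has. The sketch below is illustrative only: the `ColumnDoc` class, the helper functions, and every column name are assumptions made for the example, not part of any particular library or of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDoc:
    """One data dictionary entry: a column with its type, meaning, and units."""
    name: str
    dtype: str
    meaning: str
    units: str


# Hypothetical entries for an illustrative customer table.
DATA_DICTIONARY = [
    ColumnDoc("customer_id", "str", "Unique customer identifier", "n/a"),
    ColumnDoc("spend", "float", "Total spend over the period", "pounds"),
    ColumnDoc("active_days", "int", "Days the customer was active", "days"),
]


def undocumented_columns(columns, dictionary):
    """Columns present in the data but missing from the dictionary."""
    documented = {entry.name for entry in dictionary}
    return sorted(set(columns) - documented)


def stale_entries(columns, dictionary):
    """Dictionary entries describing columns the data no longer has."""
    present = set(columns)
    return sorted(entry.name for entry in dictionary if entry.name not in present)


# Hypothetical column list, standing in for something like list(df.columns).
dataset_columns = ["customer_id", "spend", "active_days", "last_login_source"]
print("undocumented:", undocumented_columns(dataset_columns, DATA_DICTIONARY))
print("stale:", stale_entries(dataset_columns, DATA_DICTIONARY))
```

Run as part of a test suite, a check like this turns "kept current" from a good intention into something that fails when a column is added without being described.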

18.3 Docstrings: documentation that lives in the code

The documentation least likely to rot is the documentation closest to the code, and nothing is closer than a docstring. A docstring sits inside the function it describes, is accessible at runtime and in every editor, and travels with the code wherever it goes:

import inspect

def spend_per_active_day(spend: float, active_days: int) -> float:
    """Average daily spend over a customer's active period.

    Parameters
    ----------
    spend : float
        Total spend over the period, in pounds.
    active_days : int
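        Number of days the customer was active; must be positive.

    Returns
    -------
    float
        Spend divided by active days.

    Raises
    ------
    ValueError
        If active_days is not positive.

    Examples
    --------
    >>> spend_per_active_day(100.0, 4)
    25.0
    """
    # Enforce the documented contract instead of dividing by zero or a negative.
    if active_days <= 0:
        raise ValueError("active_days must be positive")
    return spend / active_days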

# The docstring is live documentation — available wherever the code is.
print(inspect.getdoc(spend_per_active_day).splitlines()[0])
print(f"example checks out: {spend_per_active_day(100.0, 4)}")

Average daily spend over a customer's active period.
example checks out: 25.0

The docstring is retrievable at runtime (inspect.getdoc, or help()), pops up in the IDE as you type the call, and — because it states the contract right where the contract lives — is far harder to leave out of date than a separate document. A consistent convention (the NumPy style shown here, or Google’s) covers what the function does, its parameters, what it returns, what it raises, and an example.
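For comparison, here is roughly how the same contract might read in Google style. This is a sketch rather than a canonical rendering: the `ValueError` check is an assumption, added so that the "Raises" section has something real to document.

```python
import inspect


def spend_per_active_day(spend: float, active_days: int) -> float:
    """Average daily spend over a customer's active period.

    Args:
        spend: Total spend over the period, in pounds.
        active_days: Number of days the customer was active; must be positive.

    Returns:
        Spend divided by active days.

    Raises:
        ValueError: If active_days is not positive.

    Example:
        >>> spend_per_active_day(100.0, 4)
        25.0
    """
    # Assumed validation, so the documented contract is enforced.
    if active_days <= 0:
        raise ValueError("active_days must be positive")
    return spend / active_days


# Same runtime access as before: the convention changes the layout, not the mechanism.
print(inspect.getdoc(spend_per_active_day).splitlines()[0])
```

Which convention you pick matters less than picking one, because documentation generators and readers both rely on the layout being predictable.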

18.4 Writing down why

Of the four Diátaxis kinds, explanation is the one data science projects lose most completely, and the loss is the expensive one. Reference can be regenerated from the code. A how-to can be reconstructed by someone patient. But the reasoning behind a decision exists nowhere except in the head of whoever made it, and it evaporates on a timescale of weeks — including from your own head.

Think about what a model actually accumulates. You dropped a feature because it turned out to leak. You chose recall over precision because the cost of a missed churn is roughly eight times the cost of a wasted retention offer. You capped an input at the 99th percentile because a handful of corporate accounts were dragging the whole distribution. You settled on 0.73 as the threshold after a conversation with the commercial team about how many calls they could actually make in a week. Every one of those is a judgement, made with context, and none of them is visible in the code that resulted. What the code shows is threshold = 0.73 and a feature that simply isn’t there.

The consequence is predictable and you have probably lived it from the other side. Six months on, someone — plausibly you — notices the missing feature, adds it back because it looks predictive, and reintroduces the leak. Or asks why the threshold isn’t tuned for F1, retunes it, and quietly breaks the commercial constraint nobody wrote down. Undocumented decisions don’t stay decided; they get re-litigated, and each round costs more than writing them down would have.

The remedy is lightweight. Engineers keep architecture decision records: short numbered notes, committed alongside the code, each capturing one decision in a few paragraphs (the context, the options weighed, what was chosen, and the consequences accepted). A docs/decisions/ directory with a dozen half-page files is enough, and the format matters far less than the habit:

docs/decisions/
├── 0001-use-recall-over-precision.md
├── 0002-drop-last_login_source-feature.md
└── 0003-cap-spend-at-99th-percentile.md

Two things make this work where a wiki page doesn’t. The records are immutable — you don’t edit a decision when it changes, you write a new one that supersedes it, so the history of the thinking survives rather than being overwritten. And they live in the repository, so they arrive with git clone and get reviewed alongside the change that motivated them (Chapter 17). This is the same instinct as the experiment log from Chapter 2, pointed at design rather than results: the commit records what changed, and the decision record records why anyone thought that was a good idea.
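The habit is easier to keep when starting a record costs one command. The helper below is a minimal sketch under stated assumptions: the function name, the template headings, and the "Status" and "Supersedes" fields are all hypothetical, chosen to mirror the context, options, decision, and consequences structure described above.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical template; the headings follow the structure described in the text.
TEMPLATE = """# {number:04d}. {title}

Status: accepted
Supersedes: {supersedes}

## Context

## Options considered

## Decision

## Consequences
"""


def new_decision_record(directory, title, supersedes=None):
    """Create the next numbered record in `directory` and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Find the highest existing four-digit prefix and continue from it.
    numbers = [
        int(match.group(1))
        for path in directory.glob("*.md")
        if (match := re.match(r"(\d{4})-", path.name))
    ]
    number = max(numbers, default=0) + 1
    # Lowercase slug; underscores are kept so feature names stay readable.
    slug = re.sub(r"[^a-z0-9_]+", "-", title.lower()).strip("-")
    path = directory / f"{number:04d}-{slug}.md"
    path.write_text(
        TEMPLATE.format(number=number, title=title, supersedes=supersedes or "none"),
        encoding="utf-8",
    )
    return path


# Demonstration in a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    first = new_decision_record(tmp, "Use recall over precision")
    second = new_decision_record(
        tmp, "Tune threshold for call capacity", supersedes=first.name
    )
    print(first.name)
    print(second.name)
```

The helper only ever adds files. A changed decision becomes a new record whose "Supersedes" line points at the old one, which stays in place as history.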

18.5 Documentation that stays true

The chronic disease of documentation is rot: prose that confidently describes code which has since changed. The defences are structural, not motivational. Keep documentation as close to the code as possible, so a change to one prompts a change to the other: docstrings over separate documents. Generate reference documentation from the code (Sphinx with its autodoc extension, MkDocs with the mkdocstrings plugin, and Quarto with quartodoc all read your docstrings), so the reference cannot describe a function signature that no longer exists. And make examples executable: a doctest or a tested snippet turns a stale example into a failing test, so the example in the docstring above can’t silently lie. At the project level, the highest-leverage document of all is a README that covers the few things a newcomer most needs: what this is, how to set up the environment, how to run it, and where the data comes from (the orientation promised back in Chapter 9).
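To make the "stale example becomes a failing test" point concrete, the sketch below runs the examples in two docstrings using the standard library's `doctest` module. The second function and the `failed_examples` helper are hypothetical, invented to simulate drift: the behaviour changed from daily to weekly spend, but the example was left alone.

```python
import doctest
import inspect


def spend_per_active_day(spend, active_days):
    """Average daily spend over a customer's active period.

    >>> spend_per_active_day(100.0, 4)
    25.0
    """
    return spend / active_days


def spend_per_active_week(spend, active_days):
    """Average weekly spend over a customer's active period.

    >>> spend_per_active_week(100.0, 4)
    25.0
    """
    # Simulated drift: the code now returns a weekly figure,
    # but the example above still shows the old daily value.
    return 7 * spend / active_days


def failed_examples(func):
    """Run the examples in func's docstring and return how many failed."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        inspect.getdoc(func), {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count matters here.
    result = runner.run(test, out=lambda text: None)
    return result.failed


print("daily:", failed_examples(spend_per_active_day))
print("weekly:", failed_examples(spend_per_active_week))
```

The daily function reports no failures and the drifted weekly one reports one. In a real project you would not normally write this helper yourself: running `python -m doctest` on a module, or pytest with `--doctest-modules`, collects docstring examples in the same way.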

Author’s Note

Documentation feels like pure overhead in the moment. It’s writing about the work instead of doing the work, for a reader who is hypothetical and may never come. The data science instinct sharpens this: the notebook, with its inline outputs and prose, seems to document itself, so why write more? The answer is that the notebook documents itself only while you’re standing next to it to explain which cell to skip and what the magic number meant.

The reframe is that every document is a letter to a specific future reader — and that reader is, more often than not, you, six months from now, with no memory of today’s context. Failing that, it’s the colleague who inherits the project when you move on. Neither can ask you. Writing the README is the difference between your work becoming a foundation others build on and a dead end they quietly rewrite from scratch because they couldn’t understand it. And of all the documentation you could write, the README earns its keep fastest: the single page that takes a newcomer from git clone to a running system in minutes is worth more than any volume of prose nobody can find.

18.6 Summary

Documentation is what lets work outlive the person who wrote it:

1. Different documents answer different questions. Tutorials teach, how-to guides walk through a task, reference lets you look things up, explanation gives the why, and mixing them is why documentation fails.
2. You already write documentation. A model card and a data dictionary are documentation; treat them as such, with the discipline of being kept current and written for a reader.
3. Docstrings live in the code. Co-located with the function, accessible at runtime and in the IDE, they’re the documentation least likely to drift; write them to a consistent convention.
4. Fight rot structurally. Keep docs near the code, generate reference from the code, and make examples executable, and prioritise the README, the cheapest, highest-leverage document there is.

The next chapter confronts what accumulates when these practices are skipped under deadline — and how to manage it: technical debt.

18.7 Exercises

1. Write or rewrite the README for one of your projects so that a newcomer can go from git clone to a running system in minutes: what it does, how to set up the environment, how to run it, where the data comes from. Hand it to someone unfamiliar and note the first point at which they get stuck.
2. Add proper docstrings to the public functions of one of your modules: what it does, parameters, returns, and an example. Then call help() on one and ask whether the rendered documentation is enough to use the function without reading its body.
3. Write a model card for a model you’ve built: intended use, training data, evaluation, known limitations, and where it should not be used. Which section was hardest to write, and what does that difficulty tell you about the model?
4. Conceptual: Using the Diátaxis categories (tutorial, how-to, reference, explanation), classify a README, a docstring, a tutorial notebook, and a model card. Where does each fit, and why does mixing two of these jobs into one document make it worse at both?
5. Conceptual: Documentation rots. Describe two practices that structurally keep documentation in sync with the code, and explain why “remember to update the docs” is not one of them.

Content: CC BY-NC-SA 4.0 | Code: MIT - see /LICENSE.md