17 Code review

17.1 Your first pull request

You’ve done the engineering. The logic is in a module, it’s tested, it’s configured, and you open a pull request to merge it. Within the hour, twelve comments come back. Some question your approach, some point out a case you didn’t handle, one asks why you named something the way you did. If you’ve worked alone — as most data scientists have — this lands as criticism: the work was put forward and found wanting.

It isn’t. Code review is a routine part of how teams ship software. It applies to every change regardless of who wrote it, and the most senior engineer on the team gets the same twelve comments. Reframing it from “my work is being judged” to “my work is getting a second pair of eyes” is the first and most important step. Review is one of the highest-value practices in this book, and data science culture, built around solo notebooks, almost entirely lacks it.

17.2 What review is for

Review does two jobs at once. The obvious one is quality. A second reader catches what the author cannot see (a missed edge case, a subtle bug, an assumption that won’t hold) precisely because they didn’t write it and aren’t blinded by what they meant. The less obvious, and arguably more valuable, job is knowledge sharing. The reviewer comes away understanding how the code works, so the project no longer lives in one person’s head. On a team where every change is reviewed, no single departure takes the only understanding of a system out the door.

What review is not for is style. Single versus double quotes, or whether a line is too long, are questions for the formatter and linter to settle automatically (Chapter 5), not for a human to litigate in comments. A review that descends into style nitpicks is a sign that the automation hasn’t been set up yet.
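Why can a tool settle these questions without a human? Style choices don’t change what the program means. The minimal sketch below uses only the standard library’s `ast` module. Two versions of a line that differ only in quoting and wrapping parse to the same syntax tree, while a one-word change in meaning does not. The `same_program` helper and the sample lines are illustrative only and are not part of any formatter’s API.

```python
import ast

# Two versions of the same statement that differ only in style:
# quote character and how the call is wrapped across lines.
version_a = "total = compute(x, y, label='train')"
version_b = 'total = compute(\n    x,\n    y,\n    label="train",\n)'

# A version that differs in meaning, not just style.
version_c = "total = compute(x, y, label='test')"

def same_program(src_a: str, src_b: str) -> bool:
    """Return True if two snippets parse to the same abstract syntax tree.

    ast.dump omits line and column positions by default, so layout
    and quoting differences disappear from the comparison.
    """
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))

print(same_program(version_a, version_b))  # style-only difference
print(same_program(version_a, version_c))  # a real change in behavior
```

A difference that vanishes at the syntax-tree level is one a machine can normalize safely. A difference that survives it is one a human should look at.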

Data Science Bridge

Code review is peer review, applied to code. You already know and trust peer review: before an analysis goes to a stakeholder or a paper goes out, a colleague reads it, checks the reasoning, questions the method, and catches the error you were too close to see. Code review is the same institution — a knowledgeable peer reads your change, examines the logic, and pushes back where it’s unclear or wrong. The cultural machinery you respect in research already exists for code.

Where the analogy breaks down: peer review of a paper is rare and heavy — a whole study, reviewed in depth, once. Code review is frequent and light — a single change, reviewed quickly, many times a week. That difference inverts an instinct carried over from research, where you polish a large body of work before submitting it. Code review works best in the opposite mode: small, frequent changes reviewed in minutes, not a quarter’s work dropped in one enormous request.

17.3 Small changes, reviewed often

The single biggest determinant of whether review works is the size of the change. A thousand-line pull request gets a rubber stamp. No reviewer can hold that much in their head, so they skim it and approve. A fifty-line pull request gets a real review, because a reviewer can understand all of it and reason about whether it’s correct. Small changes are also easier to revert cleanly (the small commits of Chapter 2). They keep the main branch moving rather than blocked behind a giant merge.
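One way to keep size visible is to measure it before asking for review. The sketch below assumes you pipe in the output of `git diff --numstat` (for example against a hypothetical `main` branch with `git diff --numstat main...HEAD`). Each line of that output is `added<TAB>deleted<TAB>path`, and binary files show `-` for both counts. The 400-line limit and the helper names are hypothetical choices for illustration, not a standard.

```python
from typing import Optional

# Hypothetical threshold; pick one that suits your team.
MAX_CHANGED_LINES = 400

def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary file: git reports no line counts
        total += int(added) + int(deleted)
    return total

def review_size_warning(numstat: str, limit: int = MAX_CHANGED_LINES) -> Optional[str]:
    """Return a warning message if the change is too large to review well."""
    n = changed_lines(numstat)
    if n > limit:
        return f"{n} changed lines exceeds {limit}: consider splitting this pull request."
    return None

# Illustrative numstat output for a small, focused change.
sample = "42\t8\tsrc/features.py\n15\t0\ttests/test_features.py\n-\t-\tdocs/diagram.png"
print(changed_lines(sample))
print(review_size_warning(sample))
```

A check like this only counts lines. It can’t tell whether a change is self-contained, which is still the author’s judgment.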

This cuts directly against a notebook habit: the instinct to do all the work, then share it once it’s “done”. Reviewable work is the opposite — a stream of small, self-contained changes, each understandable on its own. Splitting work this way is a skill, and it’s worth developing, because it’s what makes everything else about review function.

17.4 What automated checks free you to do

Continuous integration (Chapter 13) and linters handle everything mechanical: does it pass the tests, is it formatted, does it type-check. That isn’t a substitute for review — it’s what enables good review, by clearing away the trivia so the human can spend their attention on what only a human can judge. And what only a human can judge is the conceptual correctness that no tool will catch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))

# Both versions run, pass type checks, and lint cleanly. One leaks.
def prepare_leaky(X):
    X_scaled = StandardScaler().fit_transform(X)       # fit on ALL data...
    return train_test_split(X_scaled, test_size=0.25)  # ...then split

def prepare_correct(X):
    X_train, X_test = train_test_split(X, test_size=0.25)  # split first...
    scaler = StandardScaler().fit(X_train)                  # ...fit on TRAIN only
    return scaler.transform(X_train), scaler.transform(X_test)

# Call both: neither raises, and each returns a (train, test) pair.
train_a, test_a = prepare_leaky(X)
train_b, test_b = prepare_correct(X)

print("Both functions execute without error and pass every automated check.")
print("Only a reviewer who knows what they're reading catches that")
print("prepare_leaky() fits the scaler on test data: a leak no linter sees.")
```

```
Both functions execute without error and pass every automated check.
Only a reviewer who knows what they're reading catches that
prepare_leaky() fits the scaler on test data: a leak no linter sees.
```

Both functions run, both pass the linter and the type checker, and both would sail through CI. But prepare_leaky fits the scaler on the whole dataset, so the means and standard deviations used to scale the training data include the test rows. Information from the test set bleeds into preprocessing. That kind of conceptual flaw produces an over-optimistic evaluation, and only a human reviewer who understands the domain will catch it. This is the heart of what review is for: the bugs that aren’t syntactic.
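To make the leak concrete, the sketch below compares what each version’s scaler learns, using the same synthetic data. It fits the two scalers directly rather than calling the functions, and it fixes `random_state=0` (an assumption added here) so the split is repeatable. With a 150/50 split, the leaky scaler’s mean is exactly a weighted blend of the training and test means. A quarter of what it learned comes from rows the model should never have seen.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# What prepare_leaky effectively does: fit on every row, test rows included.
leaky_scaler = StandardScaler().fit(X)
# What prepare_correct does: fit on the training rows only.
clean_scaler = StandardScaler().fit(X_train)

# The leaky mean is 0.75 * train mean + 0.25 * test mean (150 vs 50 rows),
# so scaling the training data depends on the held-out test rows.
blend = 0.75 * X_train.mean(axis=0) + 0.25 * X_test.mean(axis=0)

print("mean fit on all rows:  ", leaky_scaler.mean_.round(3))
print("mean fit on train only:", clean_scaler.mean_.round(3))
print("leaky mean equals train/test blend:", np.allclose(leaky_scaler.mean_, blend))
```

Nothing here raises an error or fails a type check. The difference only shows up once someone asks which rows each statistic came from.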

17.5 Giving and receiving review

Giving good review is a skill. Be specific (point at the line, not the vibe). Be kind (it’s the code under discussion, never the person). Distinguish a blocking problem (“this leaks the test set”) from a suggestion (“this might read more clearly as a comprehension”) so the author knows what must change and what’s optional. And say why, because the reasoning is what teaches. Receiving review is also a skill. Read comments as being about the code, respond to each one (even if only to disagree, with a reason), and ask when something’s unclear rather than guessing.
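The blocking-versus-suggestion distinction can be made explicit in how comments are recorded. The sketch below is hypothetical: `ReviewComment` and its fields are invented for illustration and are not any review platform’s API. The file paths and line numbers in the sample comments are made up too. Each comment carries its reason alongside the requested change, and only blocking comments gate the merge.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReviewComment:
    # Hypothetical structure for illustration only.
    path: str
    line: int
    body: str       # what should change
    reason: str     # why: the part that teaches
    blocking: bool  # True = must change before merge; False = optional suggestion

def must_address(comments: List[ReviewComment]) -> List[ReviewComment]:
    """Return only the comments that block the merge."""
    return [c for c in comments if c.blocking]

def render(c: ReviewComment) -> str:
    """Format a comment so its severity is the first thing the author reads."""
    label = "blocking" if c.blocking else "suggestion"
    return f"[{label}] {c.path}:{c.line}: {c.body} (why: {c.reason})"

comments = [
    ReviewComment("src/prepare.py", 12, "Fit the scaler on the training split only.",
                  "Fitting on all rows leaks the test set into preprocessing.", blocking=True),
    ReviewComment("src/prepare.py", 30, "This loop might read more clearly as a comprehension.",
                  "Shorter, and the intent is visible at a glance.", blocking=False),
]

for c in comments:
    print(render(c))
print(len(must_address(comments)), "comment(s) must be resolved before merging")
```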

A short checklist helps a reviewer be systematic: is it correct, is it tested, is it readable, are the edge cases handled, and, for data science specifically, does it leak, does it hard-code a secret, are the data assumptions stated. The first four apply to any codebase. The last three are the ones a data science team must add.
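A checklist is easiest to follow when it appears where the review happens. The sketch below stores the items above as data and renders them as a Markdown task list that could be pasted into a pull request description. GitHub, for example, renders `- [ ]` lines as checkboxes. The grouping follows the text. The `render_checklist` name is hypothetical, and the wording of the data science items is lightly expanded from the text.

```python
# Items from the checklist above, grouped as the text groups them.
GENERAL = [
    "Is it correct?",
    "Is it tested?",
    "Is it readable?",
    "Are the edge cases handled?",
]
DATA_SCIENCE = [
    "Does it leak information from the test set?",
    "Does it hard-code a secret?",
    "Are the data assumptions stated?",
]

def render_checklist() -> str:
    """Render both groups as a Markdown task list for a PR description."""
    lines = ["General:"]
    lines += [f"- [ ] {item}" for item in GENERAL]
    lines += ["", "Data science:"]
    lines += [f"- [ ] {item}" for item in DATA_SCIENCE]
    return "\n".join(lines)

print(render_checklist())
```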

Author’s Note

Data science work is overwhelmingly solo and unreviewed. Your notebook is yours; nobody reads it; the first time another human sees the code is often when it has already broken something. Against that backdrop, review feels like exposure — an audit, a vote of no confidence — and the instinct is to be defensive.

The reframe is that being reviewed is a privilege rather than an indictment. Twelve comments means someone has actually read and now understands your work — so it’s no longer trapped in your head, can be maintained by someone else, and has had at least one bug caught before it became your 3am incident. That is the cheapest quality-and-knowledge mechanism a team has, and it’s running for you. The same is true in reverse: reviewing other people’s code is the fastest way to learn a codebase, far quicker than any documentation, because you see how it’s really put together one change at a time. The discomfort of the first few reviews is the price of never again being the only person who understands a critical piece of work.

17.6 Summary

Review is how a team keeps quality high and knowledge shared:

  1. Review does two jobs. It catches what the author can’t see, and it spreads understanding so a system doesn’t live in one head. Style is the linter’s job, not the reviewer’s.

  2. Small changes get real review. A huge pull request gets rubber-stamped; a small one gets understood — so split work into self-contained changes, against the notebook instinct to share it all at once.

  3. Automation frees humans for the conceptual. CI and linters handle the mechanical so reviewers can focus on correctness, design, and the bugs no tool catches — like a data leak that runs perfectly.

  4. Review is a skill, both ways. Give it specifically and kindly, separating blocking issues from suggestions; receive it as being about the code, responding to each comment.

The next chapter is about the other half of sharing understanding — the kind that doesn’t need a reviewer present to read it: documentation.

17.7 Exercises

  1. Open a small, focused pull request (aim for under ~100 lines) for a change to one of your own projects, with a description of what it does and why. Then read it back as a reviewer would — is it easy to follow, and what specifically made it so or not?

  2. Review someone else’s pull request (a colleague’s, or an open-source one) and leave at least three comments, each clearly marked as a blocking issue or a suggestion, each saying why. Which was harder: finding the issues, or phrasing them constructively?

  3. In some code, find (or plant) a conceptual bug that automated checks would miss, such as a data leak, the wrong metric, or an off-by-one in a split. Explain why the linter and the tests pass it, and what kind of reviewer attention would catch it.

  4. Conceptual: The Data Science Bridge compares code review to peer review of a paper. Give one way the analogy holds and one way it breaks down. How do the cadence and size of the two differ, and what does that imply about how to submit work for review?

  5. Conceptual: Draft a short code-review checklist for a data science team. What items would you add that a general software checklist wouldn’t have, and what would you deliberately leave off — and why does leaving those off make reviews better?