Appendix A — Tooling reference
This appendix is a one-stop reference for the tools the book uses, grouped by the job they do. For each, it gives what the tool is for, the command or entry point that gets you started, and the chapter that introduces it. You do not need all of them, and you certainly do not need them all at once — reach for each when the problem it solves actually appears in your work. Where two tools do a similar job, the more modern or widely used default is listed first.
A.1 Version control and collaboration
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| Git | Versioning code and recording the history of decisions | git init, git add, git commit | 2 |
| nbdime | Notebook-aware diffs and merges (readable .ipynb changes) | nbdiff notebook.ipynb | 2 |
| Jupytext | Pairing a notebook with a plain-text script that diffs cleanly | jupytext --set-formats ipynb,py:percent nb.ipynb | 2 |
| nbstripout | Stripping notebook outputs before commit | nbstripout --install | 2 |
| GitHub (or GitLab) | Hosting repositories, pull requests, and code review | open a pull request | 17 |
A.2 Environments and packaging
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| venv | Isolating a project’s Python environment | python -m venv .venv | 3 |
| pip-tools | Compiling an abstract spec into a pinned lockfile | pip-compile requirements.in | 3 |
| uv | A fast, modern alternative for environments and locking | uv pip compile, uv venv | 3 |
| conda / mamba | Environments with heavy binary or GPU dependencies | conda create -n proj | 3 |
| pyproject.toml | Declaring an installable package and its dependencies | pip install -e . | 6 |
A.3 Code quality
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| ruff | Fast linting and formatting in one tool | ruff check ., ruff format . | 5 |
| black | Opinionated code formatting | black . | 5 |
| mypy | Static type checking from your type hints | mypy src/ | 5 |
A.4 Testing, debugging, and profiling
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| pytest | Running tests; the standard test runner | pytest | 7 |
| Hypothesis | Property-based testing (generates inputs for you) | @given(...) | 7 |
| pdb | Interactive debugging at a breakpoint | breakpoint() | 8 |
| logging | Levelled, structured diagnostics that scale past print | logging.getLogger(__name__) | 8 |
| cProfile / line_profiler | Finding where the time actually goes | python -m cProfile -s cumtime script.py | 8 |
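As a rough illustration of two of the entry points above, the sketch below obtains a module-level logger with `logging.getLogger(__name__)` and runs a function under `cProfile` from inside Python, sorting by cumulative time as `-s cumtime` does on the command line. The names `slow_sum` and `profile_call` are hypothetical, invented for this example; only the standard library is used.

```python
import cProfile
import io
import logging
import pstats

# One logger per module, named after the module itself.
logger = logging.getLogger(__name__)


def slow_sum(n: int) -> int:
    """Hypothetical workload: the sum of squares below n."""
    logger.debug("summing squares below %d", n)
    return sum(i * i for i in range(n))


def profile_call(func, *args) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    logger.info("profiled call returned %r", result)

    # Write the top five entries into a string instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumtime").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(profile_call(slow_sum, 100_000))
```

Returning the report as a string rather than printing it is a design choice for this sketch: it keeps the helper easy to call from a test or a notebook.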
A.5 Project structure and pipelines
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| cookiecutter | Scaffolding a standard project layout | cookiecutter <template> | 9 |
| make | A simple task runner and dependency graph | a Makefile with named targets | 4, 10 |
| Snakemake / Prefect / Dagster | Orchestrating complex, scheduled pipelines with retries | declare stages and dependencies | 10 |
| pandera / Great Expectations | Validating a DataFrame against a declared schema | schema.validate(df) | 10 |
A.6 Configuration, secrets, and APIs
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| pydantic | Typed, validated configuration and data models | class Config(BaseModel): ... | 11, 12 |
| PyYAML | Reading and writing YAML configuration files | yaml.safe_load(...) | 11 |
| python-dotenv | Loading secrets from a local, untracked .env | load_dotenv() | 11 |
| Hydra | Composing and sweeping over experiment configurations | @hydra.main(...) | 11 |
| FastAPI | Serving a model behind a typed HTTP API | @app.post("/predict") | 12 |
| uvicorn | The server that runs a FastAPI app | uvicorn main:app | 12 |
| httpx | Making HTTP requests; backs FastAPI’s TestClient | httpx.post(...) | 12 |
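To make the `class Config(BaseModel): ...` entry point concrete, here is a minimal sketch of a typed configuration model, assuming pydantic is installed. The field names (`data_path`, `threshold`, `max_batch_size`) and the helper `load_config` are hypothetical, chosen only for illustration; the input dict stands in for whatever a loader such as `yaml.safe_load(...)` would return.

```python
from pydantic import BaseModel, ValidationError


class Config(BaseModel):
    # Hypothetical fields, invented for this example.
    data_path: str               # required: no default
    threshold: float = 0.5       # optional, with a default
    max_batch_size: int = 32     # optional, with a default


def load_config(raw: dict) -> Config:
    """Validate a plain dict (e.g. parsed YAML) into a typed Config."""
    return Config(**raw)


# A numeric string is coerced to float by pydantic's default validation.
good = load_config({"data_path": "data/raw", "threshold": "0.8"})
print(good.threshold, good.max_batch_size)

# A missing required field or an unparseable value raises ValidationError.
try:
    load_config({"threshold": "not-a-number"})
except ValidationError as exc:
    print(f"rejected config: {len(exc.errors())} problem(s) found")
```

The point of the pattern is that a bad configuration fails at load time with a readable error, instead of surfacing later as a confusing failure deep inside the pipeline.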
A.7 Operations: containers, CI/CD, deployment, monitoring
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| Docker | Packaging the whole environment as a portable image | a Dockerfile, docker build | 14 |
| docker-compose | Running several services together as one stack | compose.yaml, docker compose up | 14 |
| GitHub Actions | Running tests and checks automatically on every change | .github/workflows/ci.yml | 13 |
| pre-commit | Fast local checks before a commit is recorded | .pre-commit-config.yaml, pre-commit install | 13 |
| cron / Airflow | Scheduling batch jobs (simple to complex) | a cron entry, or an Airflow DAG | 15 |
| Prometheus / Grafana | Collecting and dashboarding service metrics | scrape /metrics, build a dashboard | 16 |
| Evidently | Off-the-shelf data and prediction drift reports | compare a reference to live data | 16 |
A.8 Documentation, data, and model versioning
| Tool | What it’s for | Where to start | Ch. |
|---|---|---|---|
| MkDocs / Sphinx / Quarto | Generating documentation from your code and prose | mkdocs serve / quarto render | 18 |
| DVC | Versioning large data and models alongside Git | dvc add data/raw/... | 22 |
| MLflow | Tracking experiments and registering model versions | mlflow.log_metric(...), the model registry | 23 |
| joblib | Serialising a trained model to a portable artefact | joblib.dump(model, path) | 15, 21 |
A closing note in the spirit of the book: this list is deliberately a menu, not a checklist. A throwaway analysis needs almost none of it; a model that real users depend on eventually touches most of it. The skill is choosing the smallest set of tools that makes a given piece of work reliable enough for what it has to do — and reaching for the next one only when the problem it solves is the problem you actually have.