3.1 “It works on my machine”
In the previous chapter’s opening we met a colleague who pulls your notebook and gets different numbers, and spends two hours on a call before discovering that scikit-learn changed a default. That story is so familiar it’s a punchline, but it’s worth taking seriously, because it points at something most data science training never mentions: your code does not run in isolation. It runs on top of a tower of other people’s code (pandas, NumPy, scikit-learn, and the hundreds of packages they in turn depend on), pinned to a particular Python interpreter, on a particular operating system. Change any layer of that tower and the behaviour of your code can change, even though you didn’t touch a line of it.
You already control one source of this variation without thinking about it. When you write train_test_split(X, y, random_state=42), you’re fixing the random seed so the split is reproducible. But you leave the version of scikit-learn that performs the split floating free, and a new release that changes a default (the solver, the handling of a tie, the meaning of a parameter) can shift your results just as an unseeded random number generator would. The seed is an input you learned to control. The environment is an input you probably haven’t.
3.2 The environment is an input
It helps to stop thinking of the environment as “what happens to be installed” and start thinking of it as an input to your analysis, on equal footing with the code and the data. A result is the product of all three. If you change the data, you expect different numbers; you’d never call a finding reproducible if it only held for one particular CSV that lives on your laptop. The environment is no different, but because it’s invisible — you don’t import a version number, you just import pandas — it’s easy to forget it’s there at all.
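One way to make the environment visible is to write it down next to the results it produced. The following is a minimal sketch, not a standard tool: the function name describe_environment and the choice of fields are illustrative assumptions, and it relies only on the standard library.

```python
import json
import platform
from importlib.metadata import PackageNotFoundError, version


def describe_environment(packages):
    """Return a small, JSON-serialisable record of the current environment."""
    installed = {}
    for name in packages:
        try:
            installed[name] = version(name)
        except PackageNotFoundError:
            installed[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": installed,
    }


# Hypothetical usage: list the packages your analysis actually imports.
record = describe_environment(["numpy", "pandas", "scikit-learn"])
print(json.dumps(record, indent=2))
```

Saving this record alongside a table or figure means a later reader can at least see which environment the numbers came from, even before a proper lockfile exists.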
Note: Data Science Bridge
Pinning your dependencies is the same move as fixing a random seed, applied one level down. You set random_state so that the one stochastic step in your pipeline produces the same output every time; pinning versions does the same for the code your pipeline is built from. A fully specified environment is a controlled experiment — you hold every factor fixed so that when a result changes, you know it changed because of something you did, not because a dependency shifted underneath you.
The analogy has a limit worth stating. A fixed seed gives bit-for-bit identical results within a given environment. Pinning package versions removes one more source of variation, but it still can’t promise identical numbers across different hardware or operating systems — floating-point arithmetic and the underlying maths libraries (BLAS, LAPACK) can differ. Closing that last gap needs the heavier machinery of containers (the subject of Containerisation), and even then GPU operations can introduce their own non-determinism. Pinning versions is not the whole of reproducibility, but it is the cheapest and highest-leverage part of it.
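The floating-point caveat is easy to see in miniature. Floating-point addition is not associative, so a maths library that sums the same numbers in a different order can return a slightly different answer. This small sketch uses plain Python floats rather than BLAS itself, purely as an illustration of the effect:

```python
import math

# The same three numbers, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact equality fails because each order rounds differently along the way.
print(left_to_right == right_to_left)

# Comparing with a tolerance is the usual way to check such results.
print(math.isclose(left_to_right, right_to_left))
```

This is why checks for numerical reproducibility across machines usually compare within a tolerance rather than demanding identical bits.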
3.3 One project, one environment
The first practical step is isolation: each project gets its own environment, separate from your system Python and from every other project. The reason is collisions. Project A needs an old version of a library because you haven’t had time to migrate it; project B needs the latest. If both draw on a single shared set of installed packages, upgrading for B silently breaks A, and you won’t find out until you next run it. A data scientist juggling several analyses on one machine hits this constantly, usually diagnosing it as “something broke” rather than “my environments are entangled”.
A virtual environment solves this. venv (built into Python) creates an isolated directory with its own Python executable (a copy of, or a link to, the interpreter that created it) and its own installed packages:

    python -m venv .venv               # create an isolated environment in .venv/
    source .venv/bin/activate          # activate it (Windows: .venv\Scripts\activate)
    pip install -r requirements.txt
For projects with heavy binary or GPU dependencies, conda (or its faster sibling mamba) plays the same role and additionally manages non-Python libraries like CUDA. The choice between them matters less than the habit: one environment per project, never installed into the system Python, and the environment directory itself excluded from version control (it’s generated, not authored — exactly the distinction from the previous chapter’s .gitignore).
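A script can check for itself whether it is running inside an isolated environment. The sketch below assumes a venv-style environment; the helper name in_virtual_environment is made up for illustration. A conda environment is a separate Python installation rather than a venv, so this particular test would most likely not flag it.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print("Using environment at", sys.prefix)
else:
    print("Warning: not running inside a venv; packages come from", sys.prefix)
```

Placing a guard like this at the top of a project's entry point is one cheap way to catch the "installed into the system Python" mistake early.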
3.4 Abstract versus locked dependencies
Here is the distinction that the requirements.txt you’ve probably seen does not make clear, and it’s the heart of reproducible environments. There are two different things you might want to write down, and they serve different purposes.
An abstract specification records what your project can tolerate: “I need pandas 2.2 or newer.” It’s written by a human, expresses intent, and deliberately leaves room — you don’t care whether you get 2.2.1 or 2.2.3, only that it’s recent enough. A locked specification records what you actually ran: the exact version of every package, including the dependencies of your dependencies, resolved at a moment in time. It’s generated by a tool, not hand-written, and it exists to be reproduced exactly.
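To make the two categories concrete, here is a rough sketch that labels each line of a requirements file as abstract or locked. It is deliberately simplified: the function name classify_requirement is hypothetical, the sample file contents are invented, and it ignores URLs, extras, and other parts of the real requirements format.

```python
def classify_requirement(line):
    """Label one requirements-file line as 'locked', 'abstract', or None."""
    text = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not text or text.startswith("-"):  # blank line, or an option such as -r
        return None
    spec = text.split(";", 1)[0]          # ignore environment markers
    # An exact pin uses == with no wildcard; anything else leaves room.
    if "==" in spec and "*" not in spec:
        return "locked"
    return "abstract"


# Invented example contents, for illustration only.
example = """\
pandas>=2.2
scikit-learn~=1.4
numpy==1.26.4
"""

for line in example.splitlines():
    label = classify_requirement(line)
    if label:
        print(f"{line:<20}{label}")
```

A file in which every line comes back as locked was almost certainly generated by a tool; a file of abstract lines is a statement of intent.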
The trouble with a hand-written requirements.txt full of >= constraints is that it’s an abstract spec masquerading as a reproducible one. Two people who install it on different days can end up with different environments, because “newest version satisfying >=2.2” is a moving target.
    from importlib.metadata import version

    # What an abstract requirements file might *ask* for...
    requested = {
        "numpy": ">=1.26",
        "pandas": ">=2.2",
        "scikit-learn": ">=1.4",
    }

    # ...versus what is *actually installed* in this environment right now.
    print(f"{'package':<16}{'requested':<12}{'installed'}")
    for package, spec in requested.items():
        print(f"{package:<16}{spec:<12}{version(package)}")

    print("\nThe abstract spec is the left column; a lockfile records the right one.")

    package         requested   installed
    numpy           >=1.26      2.4.6
    pandas          >=2.2       3.0.3
    scikit-learn    >=1.4       1.9.0

    The abstract spec is the left column; a lockfile records the right one.
The installed versions on the right are what produced these results. If you only ever record the left column, you’re trusting that whoever reproduces your work happens to resolve to the same versions you did — which, over time, they won’t. A lockfile captures the right column so the environment can be rebuilt exactly.
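The drift over time can be imitated with a toy resolver. This is a sketch under heavy assumptions: the release lists are invented, versions are treated as plain dotted integers, and real resolvers such as pip's do far more than pick the newest match.

```python
def parse(version_string):
    """Turn '2.2.1' into (2, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version_string.split("."))


def resolve(minimum, available):
    """Pick the newest available release that satisfies '>= minimum'."""
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    return max(candidates, key=parse)


# Hypothetical release history for one package, seen on two different days.
index_in_march = ["2.1.4", "2.2.0", "2.2.1"]
index_in_october = index_in_march + ["2.2.3", "3.0.0"]

print(resolve("2.2", index_in_march))    # 2.2.1
print(resolve("2.2", index_in_october))  # 3.0.0
```

The constraint never changed, yet the second install crosses a major version. That is the gap a lockfile closes.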
The tooling has matured a great deal here. pip freeze produces a crude lock — a flat list of everything currently installed — but it doesn’t distinguish your direct dependencies from their transitive ones, so it’s hard to maintain. pip-tools formalises the two-file split: you write your intent in requirements.in (the abstract spec) and run pip-compile to generate a fully pinned requirements.txt (the lock), with every version resolved and annotated with the reason it’s there. uv does the same far faster and is rapidly becoming the default; poetry and pipenv bundle the same idea with environment management. The common principle, whichever tool you pick: develop against the abstract spec, reproduce against the lock.
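To see why a flat freeze is called crude, it helps to build one by hand. The sketch below approximates the idea behind pip freeze using only the standard library; it is not how pip implements it, and the function name freeze_lines is an assumption for illustration.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata is missing a name
            pins[name.lower()] = f"{name}=={dist.version}"
    return [pins[key] for key in sorted(pins)]


for line in freeze_lines():
    print(line)
```

Nothing in the output says which packages you asked for and which arrived as dependencies of something else. That missing provenance is what the two-file approach adds.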
3.5 Recreating an environment
A lockfile earns its keep the moment someone (a colleague, a server, or you in six months) needs to rebuild your environment. With one, the recipe is mechanical: create a fresh virtual environment and install from the lock, and you get back the exact set of versions that produced the original results.

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt    # the locked, fully pinned file
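After rebuilding, it is worth confirming that the environment really matches the lock. The following is a rough sketch that assumes a pinned file of simple name==version lines; the helper name check_against_lock is hypothetical, and it compares version strings literally, so extras and unusual version spellings are not handled.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def check_against_lock(lock_text):
    """Compare 'name==version' pins with what is installed; return mismatches."""
    problems = []
    for raw in lock_text.splitlines():
        # Drop comments, environment markers, and line-continuation backslashes.
        line = raw.split("#", 1)[0].split(";", 1)[0].rstrip("\\ ").strip()
        if "==" not in line or line.startswith("-"):
            continue  # blank lines, options such as --hash, unpinned entries
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


# Assumes the lock lives in requirements.txt in the current directory.
lock_file = Path("requirements.txt")
if lock_file.exists():
    for name, pinned, installed in check_against_lock(lock_file.read_text()):
        print(f"{name}: lock says {pinned}, environment has {installed}")
```

An empty result means every pinned package is present at the pinned version. It says nothing about extra packages that are installed but not listed.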
One thing the lockfile does not capture is the Python interpreter itself. A project locked against Python 3.11 can still misbehave on 3.13, so record the intended Python version too: in a .python-version file, in the requires-python field of your pyproject.toml, or in the project README. This is also the natural boundary where pinning versions stops being enough and containerisation takes over: a container image freezes the interpreter, the system libraries, and the operating system alongside your packages, sealing the whole tower rather than just its top. We’ll build one in Containerisation; for now, a lockfile plus a recorded Python version covers the great majority of reproduction failures.
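A recorded Python version is only useful if something checks it. Here is a small sketch that compares the running interpreter with the contents of a .python-version file. It assumes the file holds a plain numeric version such as 3.11 or 3.11.9, and the function name python_matches is invented for this example.

```python
import sys
from pathlib import Path


def python_matches(expected, actual=None):
    """True when the interpreter version starts with e.g. '3.11' or '3.11.9'."""
    if actual is None:
        actual = sys.version_info[:3]
    wanted = tuple(int(part) for part in expected.strip().split("."))
    # Compare only as many components as the recorded version specifies.
    return tuple(actual[: len(wanted)]) == wanted


# Assumes the version is recorded in .python-version in the current directory.
version_file = Path(".python-version")
if version_file.exists():
    expected = version_file.read_text().strip()
    if not python_matches(expected):
        running = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Expected Python {expected}, but running {running}")
```

Recording only the major and minor version (3.11) is usually the practical level of detail, since patch releases rarely change behaviour; whether that is enough for a given project is a judgment call.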
Tip: Author’s Note
Data scientists tend to treat the environment as ambient — it’s “just what’s installed”, something Anaconda or the platform team manages, not part of the actual work. Recording it feels like plumbing: tedious, far from the science, the kind of thing you’ll get to later. The resistance is understandable, because in the exploratory phase it genuinely doesn’t matter; you’re the only person running the code and the environment is whatever it is.
The reframing that makes it click is to recognise that the environment is as much a part of your result as the code and the data. A finding you can’t reproduce because “it needed an older pandas and I can’t remember which” isn’t yet a finding — it’s a rumour. Recording the environment is the cheapest insurance in this entire book: one pip-compile now, against a lost afternoon of version archaeology later, or a result that quietly stops being reproducible the moment you upgrade. It is the smallest possible act of taking your own work seriously enough to expect it to outlive the session that produced it.
3.6 Summary
Reproducible work depends on controlling the environment, not just the code:
1. The environment is an input. Your results depend on the versions of everything you import and the interpreter you run, just as they depend on the data. Invisible doesn’t mean absent.
2. Isolate every project. One virtual environment per project keeps one analysis from breaking another; keep the environment directory itself out of version control.
3. Abstract and locked specifications are different jobs. A >= requirements file expresses what you can tolerate; a lockfile records what you actually ran. Develop against the first, reproduce against the second.
4. Record what the lockfile misses. Pin the Python version alongside the packages, and reach for containers when you need to seal the interpreter and operating system too.
In the next chapter we leave the notebook behind entirely and learn to run code where there’s no button to click: the command line.
3.7 Exercises
1. Take one of your own projects and produce a proper lockfile from its dependencies: declare your direct dependencies in a requirements.in and run pip-compile (or uv pip compile) to generate a pinned requirements.txt. Then create a fresh virtual environment, install from the lock, and confirm the project still runs.
2. Audit a recent project for unpinned dependencies: compare what’s declared (often nothing, or a few >= constraints) against what pip freeze reports is actually installed. Identify one library where a major-version upgrade could plausibly change your results, and explain how.
3. Conceptual: The chapter argues that a locked environment is like a controlled experiment. Name one source of variation that pinning your Python-package versions controls, and one source it does not. Which tool would you reach for to control the second?
4. Reproduce a colleague’s environment from their lockfile (or, if they don’t have one, help them generate it). Document every step that the lockfile alone didn’t capture (the Python version, a system library, an environment variable) and note where that missing information should have lived.
5. Conceptual: Not every project warrants a fully pinned lockfile. Describe one situation where abstract >= requirements are the right choice, and one where an exact lock is essential. What is different about the two situations that justifies the different level of rigour?
Content: CC BY-NC-SA 4.0 | Code: MIT - see /LICENSE.md