Appendix B — Concept bridge: Data science ↔︎ Software engineering

When you first hear “continuous integration” it sounds like operations jargon, until you realise it is re-running your evaluation suite on every change. A test suite is a holdout set for code. Pinned dependencies are a fixed random seed, one level down. A model registry is an experiment log for things you ship. The concepts are mostly ones you already own; the vocabulary is what’s unfamiliar.

This appendix maps the data science concepts you know to the software engineering concepts this book teaches. It is organised by theme rather than by chapter, so you can look up an idea you understand and find the engineering practice it connects to. Where the analogy breaks down, that is noted: a misleading bridge is worse than no bridge at all.

B.1 Reproducibility and control

The rows below map the instinct to control sources of variation, which you already apply to randomness, onto the inputs engineering also controls.

DS concept	SE concept	How they connect	Ch.
Fixing a random seed	Pinning dependencies (a lockfile)	Both remove a source of variation so a result repeats. A seed pins the randomness; a lockfile pins the code your result is built on. The mechanism differs: a seed is one integer in the code, a lockfile is generated and records every transitive version.	3
A frozen holdout set	Versioned data (DVC)	Both pin an input so comparisons are fair and results reproducible. Unlike a seed, data is large and lives outside the code, so it needs external storage and a committable pointer rather than a line in a script.	22
The experiment log / lab notebook	Git history	Both are a dated, attributed record of what you tried and why. Git versions files, not runs, so it answers “what is the code?”; you still need an experiment tracker to answer “what did this run score?”.	2
A controlled experiment	A reproducible pipeline	Hold every input fixed and the only thing that can move the result is a change you made deliberately. Reproducibility is that discipline applied to a whole analysis.	22
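
As a minimal sketch of the first two rows, using only the Python standard library: a seed makes a random draw repeatable, and a content hash gives large data a small pointer you can commit. The function names and the sample bytes are hypothetical. SHA-256 is used purely for illustration and is not a description of how DVC computes its own pointers.

```python
import hashlib
import random


def sample_scores(seed: int, n: int = 5) -> list[float]:
    """Draw n pseudo-random numbers from a generator pinned to `seed`."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [rng.random() for _ in range(n)]


def content_pointer(data: bytes) -> str:
    """Return a short, committable fingerprint of a (possibly large) dataset."""
    return hashlib.sha256(data).hexdigest()


# Same seed, same draw: the randomness is pinned.
assert sample_scores(42) == sample_scores(42)

# Same bytes, same pointer: the data is pinned.
holdout = b"id,label\n1,0\n2,1\n"  # hypothetical holdout file contents
assert content_pointer(holdout) == content_pointer(b"id,label\n1,0\n2,1\n")
```

If a single byte of the data changes, the pointer changes with it, so a comparison against the wrong holdout fails loudly instead of drifting silently.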

B.2 Testing and validation

This is the richest source of shared-but-divergent vocabulary, and the place where the difference in verdict matters most.

DS concept	SE concept	How they connect	Ch.
A holdout set	A test suite	Both verify against cases the development process didn’t see. The crucial difference: a holdout yields a graded score you interpret; a test yields pass/fail, and “mostly passes” is a broken test.	7
Model validation (a continuum)	Software testing (binary)	Both are called “validation”, but model validation tolerates degradation (a score of 82% may be fine) while a test is a gate that either passes or fails. Carrying the continuum mindset into testing produces flaky tests.	7
Data validation / schema checks	Input validation at boundaries	Rejecting malformed data at a pipeline seam, or a bad request at an API, is the same move: catch the problem where it enters, not three stages downstream.	10, 12
Re-running until the numbers look right	A flaky test re-run until green	Both exploit repeated attempts to get a misleadingly good result by chance — the peeking problem, reincarnated as the test everyone re-runs until it passes.	7, 13
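
The difference in verdict can be sketched as follows. This is an illustration under stated assumptions rather than a recommended layout: `accuracy`, `normalise_email`, and `passes_quality_gate` are hypothetical names, and the 0.80 minimum is an arbitrary example value.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """A holdout verdict: a graded score that someone has to interpret."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def normalise_email(raw: str) -> str:
    """Hypothetical function under test."""
    return raw.strip().lower()


def test_normalise_email() -> None:
    """A test verdict: the assertion holds or it does not."""
    assert normalise_email("  Ada@Example.COM ") == "ada@example.com"


def passes_quality_gate(score: float, minimum: float = 0.80) -> bool:
    """Turn a graded score into a binary gate by fixing the threshold in advance."""
    return score >= minimum


score = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])  # 4 of 5 correct
test_normalise_email()  # raises AssertionError on failure, returns None on success
print(score, passes_quality_gate(score))
```

The gate is where the two worlds meet: the score stays continuous, but the decision to accept it is written down once instead of being renegotiated on each run.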

B.3 Code, reuse, and configuration

DS concept	SE concept	How they connect	Ch.
A `PARAMS` block or hyperparameter dict	Application configuration	Both pull the knobs out of the logic into one adjustable place. Application config adds a dimension a hyperparameter dict lacks: the same code, different values across dev, staging, and production.	11
A helper copied across notebooks	A package / importable module	The copy-paste you already do is duplication waiting to drift; a package is the single source of truth you import instead, so a fix lands everywhere at once.	6
`df2`, `tmp`, `final_final`	Readable code (names, types, docstrings)	Exploratory naming is fine while code is disposable; the moment it’s kept, names become the cheapest documentation there is.	5
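
The extra dimension in the first row, the same code with different values per environment, might look like this. It is a minimal sketch: the `AppConfig` fields, the file paths, and the `load_config` helper are all hypothetical, and real projects often read these values from files or environment variables instead of module-level dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Every knob in one place; frozen so it cannot be changed mid-run."""
    data_path: str
    learning_rate: float
    debug: bool


# Values shared by every environment (hypothetical).
DEFAULTS = {"data_path": "data/sample.csv", "learning_rate": 0.01, "debug": False}

# Only the values that differ per environment (hypothetical).
OVERRIDES = {
    "dev": {"debug": True},
    "staging": {"data_path": "data/staging.csv"},
    "production": {"data_path": "data/full.csv"},
}


def load_config(env: str) -> AppConfig:
    """Merge the defaults with one environment's overrides."""
    if env not in OVERRIDES:
        raise ValueError(f"unknown environment: {env!r}")
    return AppConfig(**{**DEFAULTS, **OVERRIDES[env]})


print(load_config("dev"))
print(load_config("production"))
```

A hyperparameter dict corresponds to `DEFAULTS` alone; the `OVERRIDES` layer is the part that application configuration adds.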

B.4 Pipelines and architecture

DS concept	SE concept	How they connect	Ch.
An `sklearn` `Pipeline`	A data / workflow pipeline	The chained-transformers idea you use for modelling, scaled up to the whole workflow as composable stages — but spanning processes and files, so it needs explicit artefacts and an orchestrator.	10
Chaining pandas operations	Shell pipes	Both compose small single-purpose steps left to right; the shell passes untyped text between programs, pandas passes a typed object within one process.	4
The analysis as a sequence of cells	A build graph (`make`)	A pipeline declares each stage’s inputs and outputs, so a runner rebuilds only what changed — the cure for re-running the whole notebook to update one figure.	10
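
The idea in the last row can be sketched in a few lines, assuming Python 3.9 or later for the standard-library `graphlib` module. The stage names and the `stages_to_rebuild` helper are hypothetical, and a real runner such as `make` works out what changed from file timestamps rather than from a set you pass in.

```python
from graphlib import TopologicalSorter

# Each stage maps to the inputs it depends on (hypothetical analysis).
DEPENDS_ON: dict[str, set[str]] = {
    "clean": {"raw"},
    "features": {"clean"},
    "model": {"features"},
    "figure": {"clean"},
}


def stages_to_rebuild(changed: set[str], depends_on: dict[str, set[str]]) -> list[str]:
    """Return the stages that are stale, in an order that is safe to run."""
    stale = set(changed)
    rebuild = []
    # static_order() yields every node after all of its dependencies.
    for node in TopologicalSorter(depends_on).static_order():
        is_stage = node in depends_on  # "raw" is an input, not a stage
        if is_stage and (node in stale or depends_on[node] & stale):
            stale.add(node)
            rebuild.append(node)
    return rebuild


# Editing only the plotting code leaves the model untouched.
print(stages_to_rebuild({"figure"}, DEPENDS_ON))
# Changing the feature code invalidates the model but not the figure.
print(stages_to_rebuild({"features"}, DEPENDS_ON))
```

Because each stage declares its inputs, staleness can be computed rather than remembered, which is what a notebook's cell order cannot offer.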

B.5 Serving and operations

DS concept	SE concept	How they connect	Ch.
`model.predict()`	An API endpoint	An endpoint is `predict` over HTTP. The difference is the caller: a notebook trusts its input, an endpoint receives it from strangers and must validate, handle errors, and version.	12
A saved model (`.pkl`)	A deployable build artefact	Both are produced once and run elsewhere; the model artefact is versioned, stored, and loaded at serving time rather than retrained in place.	14, 15
Tracking experiment runs	A model registry / versioned releases	The registry is the experiment log for things you ship — each model versioned with its lineage and metrics, so promotion and rollback are a matter of naming a version.	23
The experiment–iterate cycle	The MLOps loop (CI/CD for models)	The train–evaluate–adjust loop you drive by hand, automated and triggered by monitoring instead of curiosity — with your judgement encoded as gates because no one watches each turn.	23
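
To make the first row concrete, here is a framework-free sketch of what an endpoint adds around `predict`. Everything in it is an assumption made for illustration: the `handle_request` function, the three-feature input, the stand-in `predict`, the version string, and the choice of 400 and 422 as status codes. A real service would sit behind a web framework and load a trained model.

```python
import json

MODEL_VERSION = "1.0.0"  # hypothetical version label
EXPECTED_FEATURES = 3    # hypothetical input width


def predict(features: list[float]) -> float:
    """Stand-in for model.predict: the mean of the features."""
    return sum(features) / len(features)


def handle_request(body: str) -> tuple[int, dict]:
    """Validate an untrusted request body, then call predict."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}

    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != EXPECTED_FEATURES:
        return 422, {"error": f"'features' must be a list of {EXPECTED_FEATURES} numbers"}
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in features):
        return 422, {"error": "every feature must be a number"}

    return 200, {"prediction": predict(features), "model_version": MODEL_VERSION}


print(handle_request('{"features": [1, 2, 3]}'))
print(handle_request('{"features": [1, "a", 3]}'))
```

Only the last line of `handle_request` is the part a notebook already has; the rest exists because the caller is a stranger.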

B.6 Monitoring

DS concept	SE concept	How they connect	Ch.
Validating a model once	Monitoring in production	Monitoring is validation that never stops — but production labels lag or never arrive, so you watch the inputs and predictions as a proxy rather than measuring accuracy directly.	16
Comparing two distributions	Drift detection and alerting	The two-sample test or stability index you’d run to check whether two datasets match, wired to an alert that fires when live data drifts from the training reference.	16
Concept drift (the relationship changes)	A changed requirement breaking old assumptions	A model whose world has shifted is like code whose spec quietly changed: it still runs, but it’s now solving the wrong problem.	16
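
One common stability index is the population stability index (PSI), which sums (live share minus reference share) times the log of their ratio over a set of bins. The sketch below is a plain-Python illustration under several assumptions: equal-width bins taken from the reference range, a small `eps` to avoid dividing by zero, and an alert threshold of 0.2 chosen only as an example. The function names are hypothetical.

```python
import math


def psi(reference: list[float], live: list[float], n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of `live` against `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into the end bins
            counts[idx] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


def drift_alert(reference: list[float], live: list[float], threshold: float = 0.2) -> bool:
    """Fire when the index exceeds an agreed threshold (illustrative value)."""
    return psi(reference, live) > threshold


training = [i / 100 for i in range(100)]  # hypothetical training reference
shifted = [x + 0.5 for x in training]     # the same shape, moved to the right
print(drift_alert(training, training), drift_alert(training, shifted))
```

Note that this watches an input distribution, not accuracy, which is the proxy the first row describes for cases where labels lag or never arrive.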

B.7 Collaboration

DS concept	SE concept	How they connect	Ch.
Peer review of a paper	Code review	A knowledgeable peer checks the reasoning before it counts — but code review is frequent and small (one change), not rare and large (a whole study).	17
A model card / data dictionary	Documentation	You already write these; treating them as documentation — kept current, written for a reader — is the shift. A model card carries risk and limitation sections ordinary code docs don’t.	18
The messy notebook you’re afraid to touch	Technical debt	The interest you pay reopening a tangled analysis is technical debt — invisible until you have to change the code, then charged all at once.	19
The DS/SE vocabulary gap	A data contract / team interface	An agreed schema lets two systems exchange data without guessing; a team contract does the same for people — plus the intent and ownership no schema captures.	20

The recurring lesson across these tables is that crossing the bridge is rarely about learning a wholly new idea. It is about recognising a practice you already follow in one domain, seeing the engineering version of it, and, just as importantly, noticing the one place where each analogy stops holding, because that gap is usually where the real work of adopting the practice lies.

Content: CC BY-NC-SA 4.0 | Code: MIT (see /LICENSE.md)