Pragmatic Data Science with Python

Skip the toy datasets and tutorial-grade code. This book equips you with the modern Python data science stack — the tools, patterns, and hard-won lessons that separate production ML systems from Kaggle notebooks.

From high-performance data manipulation with Polars and DuckDB, through robust feature engineering and gradient-boosted models, to production deployment with FastAPI and Docker — every chapter confronts the real failure modes you will encounter and shows you how to handle them.

What You Will Learn

Production-grade environments: Dependency management with uv, type-checked data validation with Pydantic, and structured ML repositories
High-performance data manipulation: Polars DataFrames, DuckDB analytical SQL, and out-of-core processing
Robust feature engineering: Handling missing data mechanisms, target leakage, distribution shifts, and high-cardinality encodings
Pragmatic modeling: XGBoost/LightGBM tuning, monotonic constraints, imbalanced data strategies, and deep learning where it actually helps
Applied NLP and LLMs: RAG pipelines, vector databases, evaluation of generative output, and parameter-efficient fine-tuning
Honest evaluation: Calibration, proper cross-validation, A/B testing, and causal inference
Production deployment: ONNX packaging, FastAPI inference APIs, Docker containerization, drift detection, and retraining triggers

Chapter Overview

Chapter	Topic	Key Concepts
1	Setting Up for Production	uv, Pydantic, ML repo structure, DVC
2	High-Performance Data	Polars, DuckDB, lazy evaluation, out-of-core
3	Failure Modes of Real Data	Missing data, target leakage, outliers, distribution shift
4	Feature Engineering	Target encoding, time-series traps, embeddings, PCA/UMAP
5	Gradient Boosted Trees	XGBoost, LightGBM, monotonic constraints, imbalanced data
6	Deep Learning Where It Counts	PyTorch, tabular DL, transfer learning, GPU realities
7	Applied NLP & LLMs	RAG, vector DBs, LLM evaluation, LoRA fine-tuning
8	Evaluation Beyond Accuracy	Calibration, CV strategies, A/B testing
9	Causal Inference	Propensity scores, DiD, synthetic control, uplift
10	Deploying Models	ONNX, FastAPI, Docker, serverless
11	Monitoring & Continual Learning	MLflow, drift detection, feedback loops, canary releases

11 Chapters

5h 25m total

64,975 words

Feb 27, 2026

About This Book

Voice: Pragmatic-Practitioner

Tone: Direct, opinionated, and battle-tested

Categories: Analytical, Definitional, Argumentative, Narrative

Chapter 1: Setting Up for Production, Not Just Notebooks

5 min read
This chapter confronts the uncomfortable gap between a data scientist's local notebook and a production system that runs reliably at 3 AM without human intervention. We examine the four pillars of production readiness — dependency management, type safety, repository structure, and data version control — and build a modern Python toolchain that prevents the most common deployment failures before they happen.
1. Dependency Management and Type-Safe Validation
  9 min read
2. Repository Structure and Version Control for Data
  11 min read

Chapter 2: High-Performance Data Manipulation

4 min read
This chapter confronts the reality that Pandas — the default choice for Python data work — becomes a liability at scale. You will learn when and why Pandas fails, understand three competing paradigms (eager, lazy, and SQL-native), and build the mental model needed to pick the right tool for your workload. We cover Polars for high-performance lazy evaluation, DuckDB for analytical SQL without a server, and out-of-core techniques for datasets that exceed available RAM.
1. Polars: The DataFrame That Respects Your Hardware
  9 min read
2. DuckDB and Out-of-Core Processing
  12 min read

Chapter 3: Failure Modes of Real-World Data

5 min read
This chapter maps the four failure modes that silently destroy production ML systems: missing data mechanisms, target leakage, outliers, and distribution shift. We open with a post-mortem of a model that achieved 99.5% accuracy yet cost its company $2M because of target leakage, then build a diagnostic framework for recognizing each failure mode from its symptoms. Every dataset you encounter in production has at least one of these problems — the question is whether you find it before or after deployment.
1. Missing Data and Target Leakage
  12 min read
2. Outliers and Distribution Shift
  12 min read

Chapter 4: High-Signal Feature Engineering

7 min read
Feature engineering is where domain knowledge meets mathematical rigor, and it is the single highest-leverage activity in the entire ML pipeline. This chapter covers four pillars: categorical encoding for high-cardinality variables, time-series feature construction with temporal integrity, text featurization from TF-IDF to dense embeddings, and dimensionality reduction that preserves signal while discarding noise. Every technique includes runnable code and explicit warnings about the traps that silently corrupt models: target leakage through naive encoding, lookahead bias in lag features, vocabulary explosion in text pipelines, and information loss from premature dimensionality reduction.
1. Categorical Encoding and Time-Series Features
  13 min read
2. Text Features and Dimensionality Reduction
  13 min read

Chapter 5: The 'You Probably Just Need XGBoost' Chapter

7 min read
Most tabular ML problems have the same answer: XGBoost or LightGBM. This chapter confronts that uncomfortable truth head-on, then equips you with the skills to use gradient-boosted trees at a professional level. We start with linear baselines — not because they win, but because they serve as an indispensable diagnostic tool and a performance floor you must beat to justify model complexity. From there, we cover the mechanics of gradient boosting, the hyperparameters that actually matter (and the fifty you can ignore), and how to use SHAP for reliable feature importance. The chapter closes with two areas where naive modeling fails catastrophically: monotonic constraints for domain-consistent predictions and imbalanced data strategies that go beyond the usual SMOTE advice into class weights, threshold tuning, and proper evaluation with precision-recall metrics.
1. Linear Baselines and Gradient Boosted Trees
  15 min read
2. Monotonic Constraints and Imbalanced Data
  14 min read

Chapter 6: Deep Learning Where It Counts

7 min read
Deep learning is the most overapplied tool in modern data science. On tabular data, gradient-boosted trees almost always win; the evidence from Chapter 5 is unambiguous. But the moment your data is images, text, audio, sequences, or any combination of modalities, trees are not a contender, and deep learning stops being optional: it is usually the only viable approach. This chapter draws a clean decision boundary: when DL genuinely earns its complexity cost, and when you are paying GPU bills for no measurable gain. We cover PyTorch from the ground up (the 20% you use 80% of the time), confront the narrow cases where tabular deep learning edges out trees, build transfer learning pipelines that leverage billions of parameters you did not have to train, and close with the hardware realities that determine whether your model ships or sits in a notebook.
1. PyTorch Fundamentals and Tabular Deep Learning
  11 min read
2. Transfer Learning and Hardware Realities
  14 min read

Chapter 7: Applied NLP and the LLM Ecosystem

5 min read
NLP in 2025 is unrecognizable from NLP in 2020. The central skill is no longer training models — it is orchestrating them. Prompting, retrieval, evaluation, and selective fine-tuning have replaced feature engineering and model selection as the primary work. This chapter lays out the three strategic options — prompt engineering, retrieval-augmented generation, and parameter-efficient fine-tuning — and provides a concrete decision framework based on the cost/quality/latency trade-off triangle. You will build a complete RAG pipeline, evaluate generative output with methods that actually work, and fine-tune a model with LoRA when prompting and retrieval are not enough.
1. RAG Pipelines and Vector Databases
  11 min read
2. Evaluating LLMs and Parameter-Efficient Fine-Tuning
  14 min read

Chapter 8: Why Your High AUC Is Lying to You

10 min read
A 0.97 AUC means nothing if your model sends the business backward. AUC is a rank-ordering metric that says nothing about probability estimates, nothing about the cost asymmetry of errors, and nothing about whether your offline evaluation reflects reality. This chapter dismantles the default evaluation workflow — train, call model.score(), celebrate — and replaces it with four pillars that survive contact with production: metric selection driven by business cost matrices, cross-validation strategies that respect the structure in your data, probability calibration so your model's confidence means something, and statistical testing so you know whether Model B actually beats Model A or you are chasing noise.
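To make the rank-ordering point concrete, here is a minimal sketch in plain Python on synthetic data. The helpers `auc` and `brier` are our own illustrative functions, not a library API, and the data-generating process is an assumption chosen for clarity: each label is drawn from a known true probability. Pushing every score through a strictly increasing transform (here `p ** 8`) leaves AUC exactly unchanged, while the Brier score, which does care about the probability values, gets markedly worse.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, label in zip(scores, y_true) if label == 1]
    neg = [s for s, label in zip(scores, y_true) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - label) ** 2 for p, label in zip(probs, y_true)) / len(y_true)

random.seed(0)
# Assumed data-generating process: each example's label is drawn from a known true probability.
true_p = [random.random() for _ in range(1000)]
y = [1 if random.random() < p else 0 for p in true_p]

calibrated = true_p                          # scores equal the true probabilities
miscalibrated = [p ** 8 for p in true_p]     # strictly increasing on [0, 1]: same ranking, wrong values

print(f"AUC:   calibrated={auc(y, calibrated):.3f}  miscalibrated={auc(y, miscalibrated):.3f}")
print(f"Brier: calibrated={brier(y, calibrated):.3f}  miscalibrated={brier(y, miscalibrated):.3f}")
```

Both score lists rank every customer identically, so AUC cannot tell them apart; only a probability-aware metric exposes that one of them is badly miscalibrated.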
1. Metrics That Matter and Cross-Validation Done Right
  11 min read
2. Calibration and A/B Testing
  13 min read

Chapter 9: An Introduction to Causal Inference

9 min read
Your model predicts that customers who receive a discount are more likely to buy — but is the discount causing the purchase, or are you targeting people who were already going to buy? This distinction is the core problem of causal inference, and ignoring it means burning budget on interventions that look effective in your dashboard but produce zero incremental value. This chapter introduces the potential outcomes framework, explains why randomized experiments are the gold standard and why you often cannot run them, and surveys the observational methods — propensity score matching, inverse probability weighting, difference-in-differences, synthetic control, and uplift modeling — that let you estimate causal effects from the messy, non-random data you actually have. Each method comes with assumptions you must verify and failure modes you must anticipate.
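A toy simulation shows how targeting alone can manufacture a lift. Everything here is an assumption for illustration: a latent `intent` score drives purchases, the discount has zero true effect (`TRUE_EFFECT = 0.0`), and `simulate` is a hypothetical helper. When marketing sends the discount only to high-intent customers, the naive treated-minus-untreated difference looks like a large effect; when assignment is a coin flip, the same comparison falls to roughly zero.

```python
import random

random.seed(42)
N = 20_000
TRUE_EFFECT = 0.0  # assumption: in this toy world the discount changes nothing

def simulate(targeted: bool) -> float:
    """Return the purchase rate of discounted customers minus that of everyone else."""
    treated_buys, control_buys = [], []
    for _ in range(N):
        intent = random.random()                     # latent purchase propensity (the confounder)
        if targeted:
            gets_discount = intent > 0.7             # marketing targets already-likely buyers
        else:
            gets_discount = random.random() < 0.5    # coin-flip assignment, i.e. an experiment
        p_buy = intent + (TRUE_EFFECT if gets_discount else 0.0)
        bought = random.random() < p_buy
        (treated_buys if gets_discount else control_buys).append(bought)
    return sum(treated_buys) / len(treated_buys) - sum(control_buys) / len(control_buys)

naive_lift = simulate(targeted=True)
experimental_lift = simulate(targeted=False)
print(f"naive lift: {naive_lift:+.3f}   randomized lift: {experimental_lift:+.3f}")
```

The naive number measures who was targeted, not what the discount did; randomization breaks the link between intent and treatment, which is exactly what the observational methods in this chapter try to emulate.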
1. Confounders and Observational Methods
  12 min read
2. Difference-in-Differences and Uplift Modeling
  14 min read

Chapter 10: Deploying Models as Services

5 min read
Your model performs well offline — on your laptop, with your Python version, with your exact package versions. Congratulations: you have completed roughly 30% of the project. The remaining 70% is getting that model into production, which means solving four problems in sequence. First, packaging: serializing the model so it loads reliably across environments without introducing security vulnerabilities. Second, API construction: wrapping the model in a service that validates inputs, handles errors gracefully, and processes concurrent requests without exhausting memory. Third, containerization: bundling everything — code, dependencies, model weights — into an image that deploys identically on any host. Fourth, infrastructure selection: choosing between serverless, containers, and managed platforms based on your actual latency, throughput, and cost requirements rather than what a blog post recommended.
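The security concern is concrete for pickle-based formats, which joblib files also rely on: the Python documentation warns that unpickling can execute arbitrary code, so only trusted data should ever be loaded. A minimal, harmless sketch (the `Payload` class is hypothetical) shows the mechanism: `__reduce__` tells pickle which callable to invoke at load time, and the "model" you get back is whatever that callable returns.

```python
import pickle

class Payload:
    """A hypothetical 'model' artifact whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # pickle stores this callable and its arguments, then calls it during loads();
        # an attacker would substitute something far worse than print.
        return (print, ("code executed during pickle.loads",))

blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # prints the message; restored is print's return value (None), not a Payload
```

Formats that store a computation graph or plain data, such as ONNX, avoid this particular risk because loading them does not reconstruct arbitrary Python objects.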
1. Model Packaging and Inference APIs
  14 min read
2. Containerization and Infrastructure
  12 min read

Chapter 11: Monitoring and Continual Learning

5 min read
Deploying a model is not the finish line — it is the starting line. Every model degrades over time as the world changes around it. User behavior evolves, market conditions shift, upstream data pipelines break silently. This chapter builds the infrastructure that keeps models healthy after deployment: artifact tracking with MLflow so you know exactly which model is in production and why, drift detection that catches distribution shifts before they corrupt predictions, feedback loops that close the gap between prediction and ground truth, and safe deployment strategies — shadow mode, canary releases, automatic rollback — that let you ship new models without risking production traffic. The complete lifecycle is: train, validate, register, shadow, canary, promote, monitor, retrain. Every step has failure modes, and this chapter confronts each one.
1. Model Registries and Drift Detection
  10 min read
2. Feedback Loops and Safe Deployment
  15 min read