Pragmatic Data Science with Python
Skip the toy datasets and tutorial-grade code. This book equips you with the modern Python data science stack — the tools, patterns, and hard-won lessons that separate production ML systems from Kaggle notebooks.
From high-performance data manipulation with Polars and DuckDB, through robust feature engineering and gradient-boosted models, to production deployment with FastAPI and Docker — every chapter confronts the real failure modes you will encounter and shows you how to handle them.
What You Will Learn
- Production-grade environments: Dependency management with uv, type-checked data validation with Pydantic, and structured ML repositories
- High-performance data manipulation: Polars DataFrames, DuckDB analytical SQL, and out-of-core processing
- Robust feature engineering: Handling missing data mechanisms, target leakage, distribution shifts, and high-cardinality encodings
- Pragmatic modeling: XGBoost/LightGBM tuning, monotonic constraints, imbalanced data strategies, and deep learning where it actually helps
- Applied NLP and LLMs: RAG pipelines, vector databases, evaluation of generative output, and parameter-efficient fine-tuning
- Honest evaluation: Calibration, proper cross-validation, A/B testing, and causal inference
- Production deployment: ONNX packaging, FastAPI inference APIs, Docker containerization, drift detection, and retraining triggers
Chapter Overview
| Chapter | Topic | Key Concepts |
|---|---|---|
| 1 | Setting Up for Production | uv, Pydantic, ML repo structure, DVC |
| 2 | High-Performance Data | Polars, DuckDB, lazy evaluation, out-of-core |
| 3 | Failure Modes of Real Data | Missing data, target leakage, outliers, distribution shift |
| 4 | Feature Engineering | Target encoding, time-series traps, embeddings, PCA/UMAP |
| 5 | Gradient Boosted Trees | XGBoost, LightGBM, monotonic constraints, imbalanced data |
| 6 | Deep Learning Where It Counts | PyTorch, tabular DL, transfer learning, GPU realities |
| 7 | Applied NLP & LLMs | RAG, vector DBs, LLM evaluation, LoRA fine-tuning |
| 8 | Evaluation Beyond Accuracy | Calibration, CV strategies, A/B testing |
| 9 | Causal Inference | Propensity scores, DiD, synthetic control, uplift |
| 10 | Deploying Models | ONNX, FastAPI, Docker, serverless |
| 11 | Monitoring & Continual Learning | MLflow, drift detection, feedback loops, canary releases |
11 Chapters
5h 25m total
64,975 words
About This Book
Voice: Pragmatic-Practitioner
Tone: Direct, opinionated, and battle-tested
Categories: Analytical, Definitional, Argumentative, Narrative