Pragmatic-Practitioner

Pragmatic Data Science with Python

Skip the toy datasets and tutorial-grade code. This book equips you with the modern Python data science stack — the tools, patterns, and hard-won lessons that separate production ML systems from Kaggle notebooks.

From high-performance data manipulation with Polars and DuckDB, through robust feature engineering and gradient-boosted models, to production deployment with FastAPI and Docker — every chapter confronts the real failure modes you will encounter and shows you how to handle them.

What You Will Learn

  • Production-grade environments: Dependency management with uv, type-checked data validation with Pydantic, and structured ML repositories
  • High-performance data manipulation: Polars DataFrames, DuckDB analytical SQL, and out-of-core processing
  • Robust feature engineering: Handling missing data mechanisms, target leakage, distribution shifts, and high-cardinality encodings
  • Pragmatic modeling: XGBoost/LightGBM tuning, monotonic constraints, imbalanced data strategies, and deep learning where it actually helps
  • Applied NLP and LLMs: RAG pipelines, vector databases, evaluation of generative output, and parameter-efficient fine-tuning
  • Honest evaluation: Calibration, proper cross-validation, A/B testing, and causal inference
  • Production deployment: ONNX packaging, FastAPI inference APIs, Docker containerization, drift detection, and retraining triggers

Chapter Overview

Chapter  Topic                            Key Concepts
1        Setting Up for Production        uv, Pydantic, ML repo structure, DVC
2        High-Performance Data            Polars, DuckDB, lazy evaluation, out-of-core
3        Failure Modes of Real Data       Missing data, target leakage, outliers, distribution shift
4        Feature Engineering              Target encoding, time-series traps, embeddings, PCA/UMAP
5        Gradient Boosted Trees           XGBoost, LightGBM, monotonic constraints, imbalanced data
6        Deep Learning Where It Counts    PyTorch, tabular DL, transfer learning, GPU realities
7        Applied NLP & LLMs               RAG, vector DBs, LLM evaluation, LoRA fine-tuning
8        Evaluation Beyond Accuracy       Calibration, CV strategies, A/B testing
9        Causal Inference                 Propensity scores, DiD, synthetic control, uplift
10       Deploying Models                 ONNX, FastAPI, Docker, serverless
11       Monitoring & Continual Learning  MLflow, drift detection, feedback loops, canary releases
11 Chapters
5h 25m total
64,975 words

About This Book

Voice: Pragmatic-Practitioner
Tone: Direct, opinionated, and battle-tested
Categories: Analytical, Definitional, Argumentative, Narrative

Table of Contents