Six SQL Patterns for Scalable Transaction Fraud Detection
These articles are AI-generated summaries. Please check the original sources for full details.
Six SQL patterns I use to catch transaction fraud
Program Integrity Analyst Fixel Smith uses standard SQL rather than complex machine learning models to identify high-risk anomalies in movement-of-money logs. One critical signal is impossible travel: two transactions on the same card in locations so far apart that getting from one to the other would require moving faster than a commercial jet’s 600 mph cruise speed.
Why This Matters
Industry trends emphasize graph databases and machine learning, but the article argues that for program integrity teams SQL remains the most efficient tool for identifying fraud shapes. Complex models often stretch the iteration loop to weeks, whereas SQL window functions let analysts test and deploy a new fraud hypothesis in hours. Static thresholds frequently fail in production because of seasonality and differences in merchant size, so rolling baselines such as a 168-hour (one-week) trailing average are needed to reduce false positives and avoid blocking legitimate transactions. Two operational mistakes are called out: failing to handle sentinel values like ‘9999-12-31’, and running window functions on unfiltered datasets. Either can waste warehouse credits and cause missed signals.
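As a rough sketch of the rolling-baseline idea, the query below computes a 168-hour trailing average of hourly transaction counts per merchant and flags hours that exceed three times that average. The column names `merchant_id` and `ts`, the generated sample rows, and the date bound are illustrative assumptions rather than the author's code. It is written for a PostgreSQL-style dialect, and it filters out the sentinel row and bounds the date range before any window function runs.

```sql
-- Hypothetical sample data: one transaction per hour for eight days,
-- a burst of five extra transactions in the final hour,
-- and one row carrying the '9999-12-31' sentinel date.
WITH transactions AS (
    SELECT 'm1' AS merchant_id, g.ts
    FROM generate_series(TIMESTAMP '2024-01-01 00:00:00',
                         TIMESTAMP '2024-01-08 23:00:00',
                         INTERVAL '1 hour') AS g(ts)
    UNION ALL
    SELECT 'm1', TIMESTAMP '2024-01-08 23:00:00' + s.n * INTERVAL '1 minute'
    FROM generate_series(1, 5) AS s(n)
    UNION ALL
    SELECT 'm1', TIMESTAMP '9999-12-31 00:00:00'
),
hourly AS (
    -- Filter first, so the window function only sees the rows it needs.
    SELECT merchant_id,
           date_trunc('hour', ts) AS hour_bucket,
           count(*) AS tx_count
    FROM transactions
    WHERE ts < TIMESTAMP '9999-12-31 00:00:00'   -- drop sentinel rows
      AND ts >= TIMESTAMP '2024-01-01 00:00:00'  -- bound the scan (assumed range)
    GROUP BY merchant_id, date_trunc('hour', ts)
),
baselined AS (
    SELECT merchant_id,
           hour_bucket,
           tx_count,
           -- Average of the preceding 168 hours, excluding the current hour.
           avg(tx_count) OVER (
               PARTITION BY merchant_id
               ORDER BY hour_bucket
               RANGE BETWEEN INTERVAL '168 hours' PRECEDING
                         AND INTERVAL '1 hour' PRECEDING
           ) AS trailing_avg
    FROM hourly
)
SELECT merchant_id, hour_bucket, tx_count, trailing_avg
FROM baselined
WHERE trailing_avg IS NOT NULL          -- the first hour has no baseline yet
  AND tx_count > 3 * trailing_avg;      -- spike: more than three times the baseline
```

In this sample only the final hour, with six transactions against a trailing average of one, should be flagged. Hours with no transactions produce no row in `hourly`, so the average covers active hours only; a production version might join against an hour calendar first.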
Key Insights
- Impossible travel detection uses the Haversine distance function to catch cloned cards: consecutive uses of one card are flagged when the distance between them implies a travel speed above 600 mph.
- Velocity patterns use sliding windows (1-minute, 5-minute, and 1-hour) to distinguish between rapid card-testing server hits and slower benefit-trafficking rings.
- Amount anomalies target round values like $1.00 for card testing and values just under thresholds, such as $99.99 or $499.99, to evade ID checks or ATM caps.
- Suspicious merchant detection requires a 168-hour rolling average to account for weekly seasonality, flagging spikes of three times the baseline.
- Off-hours analysis establishes a 90-day behavioral baseline for cardholders, requiring at least two purchases in a specific hour to qualify it as ‘normal’ behavior.
- Window function primitives like LAG and ROW_NUMBER enable chained signals that allow analysts to filter complex fraud rings using simple Boolean expressions.
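To illustrate the amount-anomaly signal and the idea of filtering with simple Boolean expressions, here is a minimal sketch that turns each check into a named Boolean column. The sample rows, the column names, the $100 and $500 thresholds, and the one-dollar "just under" margin are assumptions made for illustration; the summary only gives $1.00, $99.99, and $499.99 as examples.

```sql
-- Hypothetical sample transactions.
WITH transactions (tx_id, cardholder_id, amount) AS (
    VALUES (1, 'c1',   1.00),   -- typical card-testing amount
           (2, 'c1',  99.99),   -- just under an assumed $100 ID-check threshold
           (3, 'c2', 499.99),   -- just under an assumed $500 ATM cap
           (4, 'c2',  42.17),   -- ordinary purchase
           (5, 'c3', 500.00)    -- at the cap, not under it
),
flagged AS (
    SELECT tx_id,
           cardholder_id,
           amount,
           (amount = 1.00) AS is_card_test_amount,
           -- Within one dollar below either assumed threshold.
           ((amount >= 99.00 AND amount < 100.00)
            OR (amount >= 499.00 AND amount < 500.00)) AS is_just_under_threshold
    FROM transactions
)
SELECT tx_id, cardholder_id, amount,
       is_card_test_amount, is_just_under_threshold
FROM flagged
WHERE is_card_test_amount OR is_just_under_threshold
ORDER BY tx_id;
```

Keeping each signal as its own Boolean column means further signals, such as a velocity or impossible-travel flag, can be combined in the final `WHERE` clause with `AND` or `OR` without rewriting the underlying logic.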
Working Examples
Basic velocity check using hour buckets and count thresholds.
SELECT cardholder_id,
       date_trunc('hour', timestamp) AS hour_bucket,
       count(*) AS tx_count,
       min(timestamp) AS first_tx,
       max(timestamp) AS last_tx
FROM transactions
WHERE timestamp >= current_date - INTERVAL '30 days'
GROUP BY 1, 2
HAVING count(*) > 10;
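Sliding-window velocity using the QUALIFY clause for high-frequency detection. QUALIFY is dialect-specific (Snowflake and BigQuery support it, for example); elsewhere the same filter can be applied in an outer query.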
SELECT cardholder_id,
       timestamp,
       count(*) OVER (
           PARTITION BY cardholder_id
           ORDER BY timestamp
           RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
       ) AS tx_in_last_5min
FROM transactions
QUALIFY tx_in_last_5min >= 5
ORDER BY cardholder_id, timestamp;
Impossible travel logic using Haversine distance and a 600 mph velocity threshold.
WITH ordered_tx AS (
    SELECT cardholder_id,
           timestamp,
           location,
           LAG(timestamp) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_ts,
           LAG(location) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_loc
    FROM transactions
)
SELECT cardholder_id,
       prev_ts,
       timestamp,
       haversine(prev_loc, location)
           / nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0) * 3600 AS mph
FROM ordered_tx
WHERE prev_ts IS NOT NULL
  AND haversine(prev_loc, location)
          / nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0) * 3600 > 600;
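The `haversine(prev_loc, location)` call above is presumably a user-defined or warehouse-specific function, since the summary does not define it. As a hedged sketch of what such a function might compute, the query below spells out the Haversine formula with standard math functions. The separate latitude and longitude columns, the `tx_pairs` sample row (New York to Los Angeles, one hour apart), and the Earth radius of 3958.8 miles are assumptions for illustration.

```sql
-- Hypothetical pair of consecutive transactions for one cardholder:
-- New York, then Los Angeles 3600 seconds later.
WITH tx_pairs (cardholder_id, prev_lat, prev_lon, lat, lon, elapsed_seconds) AS (
    VALUES ('c1', 40.7128, -74.0060, 34.0522, -118.2437, 3600)
),
distances AS (
    SELECT cardholder_id,
           elapsed_seconds,
           -- Haversine great-circle distance; 3958.8 is Earth's mean radius in miles.
           2 * 3958.8 * asin(sqrt(
               power(sin(radians(lat - prev_lat) / 2), 2)
               + cos(radians(prev_lat)) * cos(radians(lat))
                 * power(sin(radians(lon - prev_lon) / 2), 2)
           )) AS distance_miles
    FROM tx_pairs
)
SELECT cardholder_id,
       distance_miles,
       -- miles per second times 3600 gives miles per hour;
       -- nullif avoids division by zero for same-second transactions.
       distance_miles / nullif(elapsed_seconds, 0) * 3600 AS mph
FROM distances
WHERE distance_miles / nullif(elapsed_seconds, 0) * 3600 > 600;
```

The sample pair is roughly 2,450 miles apart, so a one-hour gap implies a speed far above the 600 mph threshold and the row is returned.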
Practical Applications
- Credit card issuers use velocity thresholds to block stolen cards that are being drained. Pitfall: failing to whitelist high-volume legitimate users, such as vending machine operators, leads to customer friction.
- Public-sector benefit programs use off-hours analysis to flag 3am transactions for users with 9-to-5 habits. Pitfall: applying this to new accounts without a 90-day history produces unreliable alerts.
- E-commerce platforms identify card-testing rings by monitoring for round dollar amounts like $1.00. Pitfall: static merchant thresholds fail to account for size, since a Costco naturally processes more volume than a bookshop.
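The off-hours pattern and its new-account pitfall can be sketched as follows. This is one possible reading of the description, not the author's query: an hour of day counts as normal for a cardholder when it has at least two purchases in the 90 days before an evaluation date, and cardholders first seen less than 90 days ago are skipped. The sample rows, the `params.as_of` evaluation date, and the column name `ts` are hypothetical.

```sql
WITH transactions (cardholder_id, ts) AS (
    VALUES ('c1', TIMESTAMP '2023-12-20 09:05:00'),  -- establishes account age
           ('c1', TIMESTAMP '2024-03-01 09:15:00'),
           ('c1', TIMESTAMP '2024-03-08 09:40:00'),
           ('c1', TIMESTAMP '2024-03-15 12:05:00'),
           ('c1', TIMESTAMP '2024-03-22 12:30:00'),
           ('c1', TIMESTAMP '2024-04-02 03:10:00'),  -- 3am purchase to evaluate
           ('c2', TIMESTAMP '2024-04-02 03:20:00')   -- brand-new account
),
params (as_of) AS (
    VALUES (TIMESTAMP '2024-04-02 00:00:00')         -- assumed evaluation date
),
account_age AS (
    SELECT cardholder_id, min(ts) AS first_seen
    FROM transactions
    GROUP BY cardholder_id
),
normal_hours AS (
    -- Hours of day with at least two purchases in the 90-day baseline.
    SELECT t.cardholder_id,
           EXTRACT(HOUR FROM t.ts) AS hour_of_day
    FROM transactions t
    CROSS JOIN params p
    WHERE t.ts >= p.as_of - INTERVAL '90 days'
      AND t.ts < p.as_of
    GROUP BY t.cardholder_id, EXTRACT(HOUR FROM t.ts)
    HAVING count(*) >= 2
)
SELECT t.cardholder_id, t.ts AS off_hours_tx
FROM transactions t
CROSS JOIN params p
JOIN account_age a
  ON a.cardholder_id = t.cardholder_id
LEFT JOIN normal_hours n
  ON n.cardholder_id = t.cardholder_id
 AND n.hour_of_day = EXTRACT(HOUR FROM t.ts)
WHERE t.ts >= p.as_of                                -- only evaluate new activity
  AND a.first_seen <= p.as_of - INTERVAL '90 days'   -- skip accounts without 90 days of history
  AND n.cardholder_id IS NULL;                       -- hour is not in the normal set
```

With this sample data the 3:10am purchase by `c1` is returned because hours 9 and 12 are that cardholder's only normal hours, while the 3:20am purchase by `c2` is ignored because the account has no history to compare against.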