
Tokenization Engines and PAN Security Architecture

9 min read Chapter 6 of 21


A Primary Account Number (PAN), the number printed on your card (typically 16 digits, though the numbering standard allows up to 19), is the most valuable piece of data in a payment transaction. It’s the key that unlocks the ability to charge a cardholder’s account. Every system that stores, processes, or transmits a PAN falls under PCI DSS scope, which brings expensive annual audits, network segmentation, encryption, access controls, and logging requirements.

Tokenization replaces the PAN with a surrogate value (a token) that is useless outside the specific context it was issued for. The token looks like a card number, passes Luhn validation, and flows through existing payment infrastructure, but it cannot be used to initiate a payment unless the Token Service Provider (TSP) de-tokenizes it.
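As a minimal sketch of what "passes Luhn validation" means, the check below doubles every second digit counting from the right and tests the digit sum modulo 10. The function name `luhn_valid` is illustrative, and `4111111111111111` is a widely used test number rather than a real account.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a string of digits satisfies the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit; every second digit is doubled.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))  # True
print(luhn_valid("4111111111111112"))  # False
```

Luhn is an error-detection checksum, not a security control: it catches mistyped digits, which is why a token has to satisfy it to survive validation in existing systems.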


Token Vault Architecture

The token vault is the core of any tokenization system: a database that maps PANs to tokens. Its design determines the security, performance, and reliability of the entire tokenization layer.

Design Requirements

  1. Bijective mapping: Each PAN maps to exactly one token per domain (merchant + channel). Same PAN at the same merchant always returns the same token (for subscription billing).

  2. Format preservation: Tokens must be the same length as PANs, pass Luhn check, and have the correct BIN prefix (so routing infrastructure still works).

  3. Non-reversible without vault access: Given a token, you cannot derive the PAN mathematically. This distinguishes tokenization from encryption.

  4. High availability: The vault is in the critical payment path. If it’s down, no transactions process. Target: 99.999% uptime.
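To make the availability target concrete, the arithmetic below converts an uptime percentage into a yearly downtime budget. It assumes a 365-day year, and the helper name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Maximum minutes of downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


for target in (99.9, 99.99, 99.999):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows {budget:.2f} minutes of downtime per year")
```

At 99.999% the budget is a little over five minutes per year, so a vault that needs even a short maintenance window on a single node would exceed it without redundancy.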

import hashlib
import secrets
from typing import Optional

class TokenVault:
    """
    Payment token vault with domain-restricted token generation.
    
    Production implementations use HSM-backed encryption for PAN
    storage and deterministic token generation for idempotency.
    """
    
    def __init__(self, encryption_key: bytes, db_connection):
        self._key = encryption_key
        self._db = db_connection
    
    def tokenize(
        self,
        pan: str,
        merchant_id: str,
        channel: str = "ecommerce"
    ) -> str:
        """
        Generate or retrieve a token for a PAN within a specific domain.
        
        Domain = merchant_id + channel. The same PAN tokenized for
        different merchants produces different tokens, preventing
        cross-merchant correlation.
        """
        domain = f"{merchant_id}:{channel}"
        
        # Check if token already exists for this PAN + domain
        existing = self._lookup_by_pan(pan, domain)
        if existing:
            return existing
        
        # Generate a new format-preserving token
        token = self._generate_format_preserving_token(pan)
        
        # Store the mapping (PAN is encrypted at rest)
        encrypted_pan = self._encrypt_pan(pan)
        self._store_mapping(encrypted_pan, token, domain)
        
        return token
    
    def detokenize(self, token: str, merchant_id: str, channel: str) -> str:
        """
        Retrieve the original PAN from a token.
        
        Only the TSP can perform this operation. The merchant never
        sees the PAN after initial tokenization.
        
        Domain restriction: a token issued for Merchant A cannot be
        detokenized by Merchant B, even if both call the TSP.
        """
        domain = f"{merchant_id}:{channel}"
        encrypted_pan = self._lookup_by_token(token, domain)
        
        if not encrypted_pan:
            raise ValueError("Token not found or domain mismatch")
        
        return self._decrypt_pan(encrypted_pan)
    
    def _generate_format_preserving_token(self, pan: str) -> str:
        """
        Generate a token that:
        1. Preserves the BIN (first 6 digits) for routing
        2. Preserves the last 4 digits (for display: **** **** **** 0234)
        3. Replaces middle digits with random values
        4. Passes Luhn validation
        """
        bin_prefix = pan[:6]
        last_four = pan[-4:]
        middle_length = len(pan) - 10  # Subtract BIN (6) + last 4
        
        # Rejection sampling: the BIN and last four are fixed, so redraw the
        # middle digits until the whole number is Luhn-valid (about 1 draw
        # in 10) and unused. A loop avoids unbounded recursion on collisions.
        while True:
            middle = ''.join(str(secrets.randbelow(10)) for _ in range(middle_length))
            token = bin_prefix + middle + last_four
            
            # The final digit must equal the Luhn check digit of the rest
            if self._luhn_check_digit(token[:-1]) != int(token[-1]):
                continue
            
            # A token must never equal the PAN it replaces, and must be unique
            if token == pan or self._token_exists(token):
                continue
            
            return token
    
    @staticmethod
    def _luhn_check_digit(partial: str) -> int:
        """Calculate the Luhn check digit for a number missing its final digit."""
        digits = [int(d) for d in partial]
        # The check digit will sit to the right of `partial`, so the last
        # digit of `partial` is in a doubled position, as is every second
        # digit moving left from it.
        doubled_sum = sum(sum(divmod(2 * d, 10)) for d in digits[-1::-2])
        plain_sum = sum(digits[-2::-2])
        total = doubled_sum + plain_sum
        return (10 - (total % 10)) % 10
    
    def _encrypt_pan(self, pan: str) -> bytes:
        """Encrypt PAN for storage using AES-256-GCM."""
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        import os
        nonce = os.urandom(12)
        aesgcm = AESGCM(self._key)
        ct = aesgcm.encrypt(nonce, pan.encode(), b"pan-storage")
        return nonce + ct
    
    def _decrypt_pan(self, encrypted: bytes) -> str:
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        nonce, ct = encrypted[:12], encrypted[12:]
        aesgcm = AESGCM(self._key)
        return aesgcm.decrypt(nonce, ct, b"pan-storage").decode()
    
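    def _lookup_by_pan(self, pan: str, domain: str) -> Optional[str]:
        """Look up an existing token by keyed PAN hash + domain."""
        import hmac
        # Use a keyed hash (HMAC): PANs have little entropy, so a plain
        # SHA-256 of a PAN can be reversed by enumerating card numbers.
        # A production system would use a dedicated index key here rather
        # than reusing the encryption key.
        pan_hash = hmac.new(
            self._key, f"{domain}|{pan}".encode(), hashlib.sha256
        ).hexdigest()
        # Query DB by pan_hash (not the PAN itself)
        return None  # Placeholder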
    
    def _lookup_by_token(self, token: str, domain: str) -> Optional[bytes]:
        """Look up encrypted PAN by token + domain."""
        return None  # Placeholder
    
    def _store_mapping(self, encrypted_pan: bytes, token: str, domain: str):
        """Store the PAN-token mapping."""
        pass  # Placeholder
    
    def _token_exists(self, token: str) -> bool:
        """Check if a token is already in use."""
        return False  # Placeholder
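
The placeholder methods above hide the behavior that matters most: idempotency within a domain and refusal across domains. The sketch below fills them in with plain dictionaries so that behavior can be exercised. It is a toy under loud assumptions: `InMemoryVault` and its field names are hypothetical, there is no database or HSM, and PANs are held unencrypted in memory, which a real vault must never do.

```python
import hashlib
import hmac
import secrets


def luhn_check_digit(partial: str) -> int:
    """Luhn check digit for a number that is missing its final digit."""
    digits = [int(d) for d in partial]
    doubled = sum(sum(divmod(2 * d, 10)) for d in digits[-1::-2])
    plain = sum(digits[-2::-2])
    return (10 - (doubled + plain) % 10) % 10


class InMemoryVault:
    """Toy vault: dicts stand in for the database; PANs are NOT encrypted."""

    def __init__(self, index_key: bytes):
        self._index_key = index_key
        self._token_by_pan_hash = {}  # keyed hash of (domain, PAN) -> token
        self._pan_by_token = {}       # (token, domain) -> PAN
        self._tokens_in_use = set()

    def _pan_hash(self, pan: str, domain: str) -> str:
        message = f"{domain}|{pan}".encode()
        return hmac.new(self._index_key, message, hashlib.sha256).hexdigest()

    def _new_token(self, pan: str) -> str:
        # Keep BIN and last four; redraw the middle until Luhn-valid and unused.
        while True:
            middle = "".join(str(secrets.randbelow(10)) for _ in range(len(pan) - 10))
            token = pan[:6] + middle + pan[-4:]
            if luhn_check_digit(token[:-1]) != int(token[-1]):
                continue
            if token != pan and token not in self._tokens_in_use:
                return token

    def tokenize(self, pan: str, merchant_id: str, channel: str = "ecommerce") -> str:
        domain = f"{merchant_id}:{channel}"
        pan_hash = self._pan_hash(pan, domain)
        if pan_hash not in self._token_by_pan_hash:
            token = self._new_token(pan)
            self._token_by_pan_hash[pan_hash] = token
            self._pan_by_token[(token, domain)] = pan
            self._tokens_in_use.add(token)
        return self._token_by_pan_hash[pan_hash]

    def detokenize(self, token: str, merchant_id: str, channel: str) -> str:
        domain = f"{merchant_id}:{channel}"
        try:
            return self._pan_by_token[(token, domain)]
        except KeyError:
            raise ValueError("Token not found or domain mismatch") from None


vault = InMemoryVault(index_key=secrets.token_bytes(32))
pan = "4111111111111111"  # well-known test number
token_a = vault.tokenize(pan, "merchant-a")
print(token_a == vault.tokenize(pan, "merchant-a"))  # True: idempotent per domain
print(token_a != vault.tokenize(pan, "merchant-b"))  # True: new domain, new token
```

Keying the reverse lookup on `(token, domain)` rather than on the token alone is what makes a cross-merchant de-tokenization fail by construction instead of relying on a separate permission check.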

Format-Preserving Encryption (FPE)

An alternative to vault-based tokenization is Format-Preserving Encryption (FPE), which uses a cryptographic algorithm to transform a PAN into a token of the same format. Unlike vault-based tokens, FPE tokens can be de-tokenized using the encryption key alone — no database lookup required.

NIST SP 800-38G specifies two FPE modes, FF1 and FF3; after cryptanalysis of FF3, the draft revision of that standard replaces it with FF3-1. The outline below follows FF1:

# FF1 mode conceptual sketch (not runnable as-is: _ff1_round_function is not defined here)
# In production, use a vetted FF1 implementation or HSM-native FPE

def ff1_encrypt(key: bytes, tweak: bytes, plaintext: str, radix: int = 10) -> str:
    """
    FF1 Format-Preserving Encryption (NIST SP 800-38G).
    
    Encrypts a string of digits into another string of the same
    length using the same alphabet (digits 0-9).
    
    This is a Feistel network where the round function uses AES:
    - Split input into left (A) and right (B) halves
    - 10 rounds of: A, B = B, A + F(B, round, tweak)
    - F uses AES in CBC-MAC mode to generate pseudorandom values in the right range
    
    Parameters:
        key: AES key (128, 192, or 256 bits)
        tweak: Additional input that restricts the token domain
               (use merchant_id + channel as tweak for domain restriction)
        plaintext: The PAN digits to encrypt
        radix: The alphabet size (10 for decimal digits)
    """
    n = len(plaintext)
    u = n // 2
    v = n - u
    
    # Convert string to integer representation
    A = int(plaintext[:u])
    B = int(plaintext[u:])
    
    for round_num in range(10):
        # Build the round input
        # P = [version || method || round || ... || tweak || B]
        # Q = derived from tweak + round + B
        
        # Round function F uses AES-CBC-MAC
        # F output is a number in [0, radix^m) where m = len(half)
        F_output = _ff1_round_function(key, tweak, round_num, B, radix, v)
        
        # Feistel step
        C = (A + F_output) % (radix ** u)
        A = B
        B = C
        
        # Swap u and v after every round so the modulus tracks the half being replaced
        u, v = v, u
    
    # Reconstruct the ciphertext
    return str(A).zfill(n // 2) + str(B).zfill(n - n // 2)
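
Because the round function above is left undefined, the following self-contained sketch shows the same alternating Feistel structure end to end, including decryption. It is explicitly not FF1: it swaps in HMAC-SHA256 as the round function instead of the AES-based construction in the standard, the names `toy_fpe_encrypt` and `toy_fpe_decrypt` are hypothetical, and it must not be used to protect real card data.

```python
import hashlib
import hmac

ROUNDS = 10  # same round count as FF1; everything else here is simplified


def _round_value(key: bytes, tweak: bytes, round_num: int, half: int, modulus: int) -> int:
    """Pseudorandom integer in [0, modulus) derived from the other half."""
    message = tweak + b"|" + str(round_num).encode() + b"|" + str(half).encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _split(digits: str):
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("input must be a string of at least two decimal digits")
    u = len(digits) // 2
    return u, len(digits) - u, int(digits[:u]), int(digits[u:])


def toy_fpe_encrypt(key: bytes, tweak: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being replaced
        c = (a + _round_value(key, tweak, i, b, 10 ** m)) % 10 ** m
        a, b = b, c
    return str(a).zfill(u) + str(b).zfill(v)


def toy_fpe_decrypt(key: bytes, tweak: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        # Undo one round: the current `a` is the previous `b`.
        previous_b = a
        previous_a = (b - _round_value(key, tweak, i, previous_b, 10 ** m)) % 10 ** m
        a, b = previous_a, previous_b
    return str(a).zfill(u) + str(b).zfill(v)


key = b"\x00" * 32  # demo key only
tweak = b"merchant-a:ecommerce"  # domain restriction via the tweak
ciphertext = toy_fpe_encrypt(key, tweak, "4111111111111111")
print(len(ciphertext), ciphertext.isdigit())  # 16 True
print(toy_fpe_decrypt(key, tweak, ciphertext) == "4111111111111111")  # True
```

Note that the output of a Feistel FPE is a same-length digit string but is not guaranteed to pass the Luhn check or keep the BIN; a deployment that needs those properties typically encrypts only the middle digits and recomputes or adjusts for the checksum.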

When to Use FPE vs Vault

| Aspect | Vault-Based | FPE |
| --- | --- | --- |
| De-tokenization | Requires DB lookup | Key-based (no DB) |
| Token uniqueness | Guaranteed by DB constraint | Guaranteed by encryption bijectivity |
| Scalability | Limited by DB throughput | Limited by crypto throughput |
| Key compromise impact | Attacker needs both key AND DB | Attacker with key can de-tokenize all |
| PCI DSS | Vault is in scope | Key management is in scope |
| Offline de-tokenization | Not possible | Possible with key |

Most production payment systems use vault-based tokenization because key compromise in FPE is catastrophic — one key exposes every token ever generated. Vault-based systems offer defense in depth: even with the encryption key, the attacker still needs access to the vault database.

Network Tokenization

Visa Token Service (VTS) and Mastercard Digital Enablement Service (MDES) operate at the network level. They replace the PAN before it reaches the merchant, so the merchant never handles real card numbers:

Traditional flow:
  Cardholder → PAN → Merchant → PAN → Acquirer → PAN → Network → PAN → Issuer

Network-tokenized flow:
  Cardholder → PAN → [TSP] → Token → Merchant → Token → Acquirer → Token → Network
  Network de-tokenizes internally → PAN → Issuer

Network tokens include domain restrictions: a token issued for Amazon’s e-commerce channel cannot be used at a physical terminal, and vice versa. The TSP enforces these restrictions during de-tokenization.
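
The enforcement step can be pictured as a lookup against the restrictions recorded when the token was issued. The following is a minimal sketch under assumptions: `TokenDomain`, `check_domain`, and the reason strings are hypothetical names for illustration, not part of VTS or MDES.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDomain:
    """Restrictions recorded by the TSP when a token is provisioned."""
    allowed_merchant_ids: frozenset
    allowed_channels: frozenset
    status: str = "ACTIVE"


def check_domain(domain: TokenDomain, merchant_id: str, channel: str):
    """Return (allowed, reason) for an attempted use of a token."""
    if domain.status != "ACTIVE":
        return False, "token is not active"
    if merchant_id not in domain.allowed_merchant_ids:
        return False, "merchant not permitted for this token"
    if channel not in domain.allowed_channels:
        return False, "channel not permitted for this token"
    return True, "ok"


ecommerce_token = TokenDomain(
    allowed_merchant_ids=frozenset({"merchant-a"}),
    allowed_channels=frozenset({"ecommerce"}),
)
print(check_domain(ecommerce_token, "merchant-a", "ecommerce"))    # (True, 'ok')
print(check_domain(ecommerce_token, "merchant-a", "contactless"))  # rejected: wrong channel
```

Only when this check succeeds would the TSP go on to de-tokenize and forward the PAN toward the issuer; a failed check means a stolen token is declined without the PAN ever being looked up.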

The shift toward network tokenization is driven by a compelling economic incentive: Visa and Mastercard report that network-tokenized transactions have 26% lower fraud rates than non-tokenized transactions. Where the networks price tokenized transactions more favorably, this can also translate into lower interchange rates for merchants, a direct financial reward for adopting tokenization.

from dataclasses import dataclass

@dataclass
class NetworkToken:
    """
    Representation of a network-issued payment token.
    """
    token_pan: str              # The token value (looks like a PAN)
    token_expiry: str           # Token-specific expiry (may differ from card)
    token_requestor_id: str     # Identifies who requested the token
    token_reference_id: str     # TSP's internal reference
    
    # Domain restrictions
    allowed_merchant_ids: list[str]
    allowed_channels: list[str]   # "ecommerce", "contactless", "in_app"
    allowed_countries: list[str]  # ISO 3166-1 alpha-2
    
    # Lifecycle
    status: str                   # ACTIVE, SUSPENDED, DELETED
    
    # Cryptogram support
    supports_dynamic_cryptogram: bool  # True for mobile wallets
    
    def generate_payment_cryptogram(self, transaction_data: bytes) -> bytes:
        """
        Generate a per-transaction cryptogram for this token.
        
        Mobile wallets (Apple Pay, Google Pay) generate a unique
        cryptogram for each transaction, making the token useless
        even if intercepted — the cryptogram can't be replayed.
        """
        if not self.supports_dynamic_cryptogram:
            raise ValueError("This token does not support dynamic cryptograms")
        
        # In production, this is computed inside the device's
        # Secure Element using keys provisioned by the TSP
        import hmac
        import hashlib
        # Simplified — actual cryptogram uses EMV session key derivation
        return hmac.new(
            b"token-session-key",  # Would be derived per-transaction
            transaction_data,
            hashlib.sha256
        ).digest()[:8]
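
A cryptogram only prevents replay if the verifier rejects values it has already accepted. The sketch below shows one way that could look on the verifying side, keeping the same simplification as the method above (truncated HMAC-SHA256 rather than EMV session key derivation). The monotonically increasing transaction counter and the names `make_cryptogram` and `CryptogramVerifier` are assumptions introduced for illustration.

```python
import hashlib
import hmac


def make_cryptogram(session_key: bytes, transaction_data: bytes, counter: int) -> bytes:
    """Bind the cryptogram to both the transaction and a per-token counter."""
    message = counter.to_bytes(4, "big") + transaction_data
    return hmac.new(session_key, message, hashlib.sha256).digest()[:8]


class CryptogramVerifier:
    """Accepts each counter value at most once, in increasing order."""

    def __init__(self, session_key: bytes):
        self._session_key = session_key
        self._last_counter = -1

    def verify(self, transaction_data: bytes, counter: int, cryptogram: bytes) -> bool:
        if counter <= self._last_counter:
            return False  # replayed or stale cryptogram
        expected = make_cryptogram(self._session_key, transaction_data, counter)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, cryptogram):
            return False
        self._last_counter = counter
        return True


demo_key = b"demo-session-key"  # placeholder; real keys are provisioned by the TSP
verifier = CryptogramVerifier(demo_key)
payload = b"amount=1999;currency=USD;merchant=merchant-a"
cryptogram = make_cryptogram(demo_key, payload, counter=1)

print(verifier.verify(payload, 1, cryptogram))  # True: first use
print(verifier.verify(payload, 1, cryptogram))  # False: replay of the same counter
```

Because the counter is part of the authenticated message, an attacker cannot reuse a captured cryptogram with a higher counter value; the recomputed HMAC would no longer match.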

Tokenization doesn’t eliminate risk — it moves and concentrates it. The token vault becomes the highest-value target in the system. But concentrating risk is the point: you’d rather defend one hardened vault with HSMs, FIPS 140-2 Level 3 hardware, and 24/7 monitoring than try to secure PANs across every merchant’s system worldwide.