In September 2024, Slim CD disclosed that cybercriminals had accessed its payment processing systems for ten months. The breach exposed credit card numbers, expiration dates, names, and addresses for 1.7 million people. Slim CD handles electronic payments for merchants across the US and Canada. The attackers had access from August 2023 to June 2024, though the company says the window in which card data was actually accessed was narrower.
A month later, the Oregon Zoo warned 117,815 customers that their payment card information had been compromised through a third-party vendor that processed its online ticket purchases. The attackers had redirected customer transactions without the zoo's knowledge.
These breaches share a common pattern: payment card data existed in systems and documents where organizations didn't expect it or couldn't protect it adequately. Card numbers appear not just in payment processing systems but in invoices, receipts, support tickets, contract files, and email attachments throughout organizations. Finding that data before attackers do requires systematic detection.
The short version: if you need to find and redact payment card data in documents, including before those documents reach AI systems, PaperVeil handles that layer. The rest of this article explains how PCI detection works and where it fits in a broader compliance program.
What Counts as Payment Card Data
PCI-DSS defines specific data elements that require protection:
Primary Account Numbers (PANs)
The number printed or embossed on the front of the card, typically 15 or 16 digits and at most 19. This is the core identifier. Any document containing a full PAN triggers PCI-DSS requirements for how that document must be stored, transmitted, and accessed.
Different card networks use different patterns:
- Visa: Starts with 4, typically 16 digits
- Mastercard: Starts with 51-55 or 2221-2720, 16 digits
- American Express: Starts with 34 or 37, 15 digits
- Discover: Starts with 6011, 644-649, or 65, 16-19 digits
- JCB: Starts with 3528-3589, 16-19 digits
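The prefix and length rules above translate into a small lookup table. The sketch below is a hypothetical Python helper (`card_brand` and `BRAND_RULES` are illustrative names, not a library API) that checks length first and then compares the same number of leading digits as each range bound. It also accepts Visa's less common 13- and 19-digit lengths; production BIN tables are far larger and change over time.

```python
from typing import Optional

# Simplified prefix/length rules from the list above (illustrative only).
# Each rule: (network, inclusive prefix ranges, allowed PAN lengths)
BRAND_RULES = [
    ("Visa", [(4, 4)], {13, 16, 19}),
    ("Mastercard", [(51, 55), (2221, 2720)], {16}),
    ("American Express", [(34, 34), (37, 37)], {15}),
    ("Discover", [(6011, 6011), (644, 649), (65, 65)], set(range(16, 20))),
    ("JCB", [(3528, 3589)], set(range(16, 20))),
]


def card_brand(pan: str) -> Optional[str]:
    """Return the card network for a digits-only PAN, or None if no rule matches."""
    for network, ranges, lengths in BRAND_RULES:
        if len(pan) not in lengths:
            continue
        for low, high in ranges:
            # Compare as many leading digits as the range bound has.
            prefix = int(pan[: len(str(low))])
            if low <= prefix <= high:
                return network
    return None
```

For example, a 15-digit number starting with 4 returns None, because no rule allows that combination.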
Cardholder Name
The name embossed or printed on the card. It is less sensitive than the PAN on its own, but PCI-DSS requires protecting it whenever it is stored alongside the PAN, and combined with other data elements it adds to the exposure.
Expiration Date
The month and year the card expires. Appears in various formats (MM/YY, MM/YYYY, Month Year) across documents.
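Because expiration dates vary in format, detection usually needs more than one pattern. A minimal sketch, assuming Python's `re` module and that two-digit years always mean 20YY; the function and pattern names are hypothetical.

```python
import re
from typing import List, Tuple

# Month names and common abbreviations (an assumed list; "Sept" included).
MONTH = (r"jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
         r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?")
MONTH_NUMBERS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# MM/YY, MM/YYYY, M/YY with a slash or dash and optional single spaces.
EXPIRY_NUMERIC = re.compile(r"\b(0?[1-9]|1[0-2])\s?[/-]\s?(\d{4}|\d{2})\b")
# "March 2027", "Mar. 27", "Sept 2026"
EXPIRY_WORDS = re.compile(r"\b(" + MONTH + r")\.?\s+(\d{4}|\d{2})\b", re.IGNORECASE)


def _four_digit_year(year: str) -> int:
    # Assumption: a two-digit year on a card means 20YY.
    return int(year) + 2000 if len(year) == 2 else int(year)


def find_expiry_dates(text: str) -> List[Tuple[int, int]]:
    """Return (month, year) pairs for every expiry-like string in text."""
    results = []
    for m in EXPIRY_NUMERIC.finditer(text):
        results.append((int(m.group(1)), _four_digit_year(m.group(2))))
    for m in EXPIRY_WORDS.finditer(text):
        month = MONTH_NUMBERS[m.group(1).lower()[:3]]
        results.append((month, _four_digit_year(m.group(2))))
    return results
```

Numeric patterns like these also match ordinary dates such as 12/25, so an expiry match is most useful as a context signal near a candidate PAN rather than as a finding on its own.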
Service Code
A three-digit code encoded on the magnetic stripe (and chip) that specifies card acceptance requirements and usage restrictions. Rarely appears in documents but must be detected when present.
Sensitive Authentication Data
This category includes elements that must never be stored after authorization:
CVV/CVC/CVV2: The 3-4 digit security code. Should never appear in stored documents, but frequently does in support tickets, order forms, and email correspondence where customers provide card details manually.
PIN data: Personal identification numbers for debit transactions. Also should never be stored but occasionally appears in customer communications or manual records.
Full magnetic stripe data: The complete track data from the card's magnetic stripe, or its equivalent on the chip. Extremely sensitive, because it is enough to produce a counterfeit card.
Where Card Data Hides
Beyond obvious payment systems, card data accumulates across organizations:
Customer support systems. Customers email, chat, or submit tickets with their card numbers when disputing charges or requesting refunds. Those numbers persist in ticket archives, chat logs, and email systems.
Order documentation. Manual orders, phone orders, and fax orders often capture full card numbers on paper or in attached files that get scanned and stored.
Accounting records. Invoices, receipts, and financial documents may contain partial or full card numbers for record-keeping purposes.
Contracts and agreements. Payment terms sections sometimes include card details for recurring charges.
Email attachments. Customers send card information through attachments, screenshots, or typed into email bodies.
Backup systems. Even if you remove card data from production systems, it may persist in backups, archives, and disaster recovery environments.
How PCI Detection Works
Effective detection combines multiple techniques because card data appears in varied formats and contexts:
Pattern Matching with Card Network Rules
Each card network has specific patterns for valid card numbers:
IIN/BIN ranges: The leading digits (historically six, increasingly eight since ISO/IEC 7812 moved to eight-digit IINs) identify the card network and issuing bank. Pattern matching starts by identifying numbers that begin with valid IIN ranges.
Length validation: Different card types have specific length requirements. A number starting with 4 that's 15 digits long isn't a valid Visa card.
Format variations: Card numbers appear with spaces (4111 1111 1111 1111), dashes (4111-1111-1111-1111), or continuous digits (4111111111111111). Detection must handle all variations.
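A candidate-extraction pattern has to tolerate those separators without matching inside longer digit runs. A hypothetical sketch using Python's `re`: 13 to 19 digits, optionally separated by single spaces or dashes, with lookarounds so a match never starts or ends in the middle of a number.

```python
import re
from typing import Iterator, Tuple

# 13-19 digits, each optionally preceded by one space or dash; the
# lookarounds reject matches that begin or end inside a longer digit run.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def find_candidates(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, digits) for each digit run that could be a PAN."""
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())  # strip separators
        yield m.start(), m.end(), digits
```

One known limitation: a short unrelated number directly before a card number (for example "12 4111111111111111") can be absorbed into a single 18-digit candidate that then fails validation, so a fuller implementation would also test sub-windows of long matches.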
Luhn Algorithm Validation
The Luhn algorithm (also called mod-10) validates that a number could be a real card number. It detects every single-digit error and most transpositions of adjacent digits, and it rejects most random digit sequences that happen to match card patterns.
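The algorithm works as follows:
- Starting from the rightmost digit (the check digit) and moving left, double every second digit, beginning with the second digit from the right
- If doubling produces a number greater than 9, subtract 9
- Sum all the digits, doubled and undoubled
- The number is valid if the sum is divisible by 10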
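The steps translate directly into code. A minimal Python sketch (the name `luhn_valid` is illustrative) that expects a digits-only string, such as a candidate with its separators already stripped:

```python
def luhn_valid(number: str) -> bool:
    """Return True if a digits-only string passes the Luhn (mod-10) checksum."""
    if not number.isdigit():
        return False  # rejects empty strings and anything with separators
    total = 0
    # Walk from the rightmost digit (the check digit) leftward.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # every second digit, starting left of the check digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

For 4111 1111 1111 1111 the doubled digits contribute 8 (the leading 4) plus seven 2s, the undoubled digits contribute eight 1s, and the total of 30 is divisible by 10, so the number passes.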
Luhn validation dramatically reduces false positives. A random 16-digit number matching Visa patterns has only about a 1-in-10 chance of passing, so the check discards roughly 90 percent of accidental matches. Combined with IIN validation, false positive rates drop further.
Context Analysis
Not every valid card number is a real card number in active use. Context helps distinguish:
Surrounding text. Numbers near words like "card," "payment," "VISA," "credit" are more likely to be card data.
Document type. Support tickets, order forms, and payment documents are high-probability locations.
Format indicators. "Exp:" or "CVV:" near number sequences suggest card data.
Test numbers. Known test card numbers (4111 1111 1111 1111 for Visa testing) should be flagged differently than potential real cards.
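Context analysis can start as simply as counting card-related keywords near each candidate. This is a hypothetical sketch: the keyword list and the 50-character window are assumptions to tune against your own documents, not established values.

```python
import re

# Assumed keyword list; extend with terms common in your own documents.
CARD_KEYWORDS = re.compile(
    r"\b(card|payment|visa|mastercard|amex|credit|debit|exp|expiry|cvv|cvc)\b",
    re.IGNORECASE,
)
WINDOW = 50  # characters inspected on each side of a candidate (assumption)


def context_score(text: str, start: int, end: int) -> int:
    """Count card-related keywords within WINDOW characters of text[start:end]."""
    before = text[max(0, start - WINDOW):start]
    after = text[end:end + WINDOW]
    return len(CARD_KEYWORDS.findall(before)) + len(CARD_KEYWORDS.findall(after))
```

A candidate preceded by "card:" and followed by "Exp:" scores 2, while the same digits labeled as a serial number score 0 and can be ranked lower for review.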
OCR for Image and PDF Content
Card numbers appear in scanned documents, screenshots, and images:
Scanned receipts. Paper receipts scanned for record-keeping may show full or partial card numbers.
Screenshots. Customer support interactions often include screenshots of payment screens or card images.
Embedded images. PDFs and documents may contain images with card information that text extraction misses.
Detection pipelines must apply OCR before pattern matching to catch card data in visual formats.
Building a PCI Detection Pipeline
Organizations need systematic detection across document types and storage locations:
Stage 1: Document Ingestion
Collect documents from all potential sources:
- Email systems: Attachments and body text from customer communications
- Ticket systems: Support tickets, chat logs, case files
- File shares: Scanned documents, order files, accounting records
- Cloud storage: Documents in SharePoint, Google Drive, Box, Dropbox
- Databases: Text fields, BLOB content, attached documents
Each source requires appropriate connectors and extraction methods. Email attachments need MIME parsing. PDFs need text extraction. Images need OCR.
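For email sources, Python's standard `email` package handles the MIME parsing. The sketch below (the `extract_parts` name and the tuple layout are assumptions) separates text bodies, which can be scanned immediately, from attachments, which are passed on as bytes to PDF extraction, OCR, or archive unpacking.

```python
from email import policy
from email.parser import BytesParser


def extract_parts(raw: bytes):
    """Split a raw email into (label, kind, payload) tuples.

    Text bodies are returned as str; attachments and non-text parts are
    returned as decoded bytes for later extraction stages.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.is_attachment() or part.get_content_maintype() != "text":
            label = part.get_filename() or part.get_content_type()
            parts.append((label, "attachment", part.get_payload(decode=True)))
        else:
            parts.append((part.get_content_type(), "text", part.get_content()))
    return parts
```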
Stage 2: Text Extraction and Normalization
Prepare content for analysis:
- PDF processing: Extract text layers and apply OCR to image-only pages
- Office documents: Extract text from Word, Excel, PowerPoint including headers, footers, and embedded content
- Image OCR: Convert images to searchable text
- Encoding normalization: Handle character encoding variations
- Whitespace handling: Standardize spacing for consistent pattern matching
Quality of text extraction directly affects detection accuracy. Poor OCR produces garbled text that hides card numbers.
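Normalization is where visually identical card numbers are made textually identical. A minimal sketch using Python's `unicodedata`: NFKC folds compatibility characters such as full-width digits and no-break spaces into their plain equivalents, and a small, assumed list of zero-width characters is stripped because they can silently split a digit run.

```python
import re
import unicodedata

# Zero-width characters that can hide inside a digit run (assumed list).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_for_detection(text: str) -> str:
    """Normalize text so that visually identical card numbers match the same pattern."""
    # NFKC maps full-width digits (e.g. "４" -> "4") and no-break spaces to ASCII.
    text = unicodedata.normalize("NFKC", text)
    # Delete zero-width characters (translate drops keys mapped to None).
    text = text.translate(ZERO_WIDTH)
    # Collapse runs of spaces and tabs, e.g. from column-aligned OCR output.
    return re.sub(r"[ \t]+", " ", text)
```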
Stage 3: Detection Execution
Apply detection methods:
Pattern matching. Scan for sequences matching card network patterns. Handle format variations (spaces, dashes, no separators).
Luhn validation. Verify candidates pass the Luhn checksum.
IIN validation. Confirm the leading digits (six, or eight for newer ranges) match valid issuer identification numbers.
Context scoring. Weight findings based on surrounding text and document type.
Run detection comprehensively. A single missed card number creates PCI-DSS scope for the entire document and potentially the entire system containing it.
Stage 4: Classification and Scoring
Categorize findings:
- Card type (Visa, Mastercard, Amex, etc.)
- Confidence level (high for Luhn-valid numbers with context, lower for pattern-only matches)
- Location (document, page, position for audit purposes)
- Sensitivity (full PAN vs. truncated, presence of CVV)
Classification determines appropriate response. A high-confidence full PAN requires different handling than a truncated number or possible false positive.
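The classification fields above map naturally onto a record type. This sketch uses a Python dataclass; the field names and the three confidence tiers are assumptions modeled on the criteria listed, not a standard scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record for one detected candidate."""
    card_type: Optional[str]  # e.g. "Visa", or None if the IIN is unrecognized
    document: str             # location fields support the audit trail
    page: int
    offset: int
    luhn_ok: bool
    context_hits: int         # card-related keywords found nearby
    truncated: bool
    cvv_nearby: bool

    def confidence(self) -> str:
        # Assumed tiers: Luhn plus context is high, Luhn alone is medium,
        # pattern-only matches are low.
        if self.luhn_ok and self.context_hits > 0:
            return "high"
        if self.luhn_ok:
            return "medium"
        return "low"

    def sensitivity(self) -> str:
        level = "truncated PAN" if self.truncated else "full PAN"
        return level + (" + CVV" if self.cvv_nearby else "")
```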
Stage 5: Action
Based on findings:
Immediate remediation. Remove or redact card data from documents that shouldn't contain it.
Quarantine. Hold documents with card data for review before further processing or sharing.
Alert. Notify appropriate personnel when card data appears in unexpected locations.
Logging. Maintain audit trails of detection activity and remediation actions for PCI-DSS compliance documentation.
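Audit logs must not themselves become a new store of card data. One approach, sketched here with a hypothetical record layout, logs only the last four digits plus a keyed HMAC-SHA-256 fingerprint from Python's standard library, so repeat sightings of the same card can be correlated without retaining the PAN. A plain unkeyed hash would be weaker, because the space of valid PANs is small enough to brute-force, and the HMAC key must be protected accordingly.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def audit_record(pan: str, document: str, action: str, key: bytes) -> str:
    """Build a JSON audit entry that never contains the full PAN."""
    # Keyed fingerprint: same card + same key -> same value, without storing the PAN.
    fingerprint = hmac.new(key, pan.encode(), hashlib.sha256).hexdigest()
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "document": document,
        "action": action,  # e.g. "redacted", "quarantined", "alerted"
        "pan_last4": pan[-4:],
        "pan_fingerprint": fingerprint,
    }
    return json.dumps(entry)
```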
Optimizing Detection Accuracy
PCI detection must balance catching all card data against avoiding excessive false positives:
Reducing False Negatives
Missed card numbers create the most serious exposure:
Handle all formats. Card numbers appear with spaces, dashes, dots, or no separators. Some documents use unusual formatting like parentheses around groups.
Process all content types. Embedded images, headers, footers, comments, and metadata all can contain card data.
Check archives and backups. Historical documents may predate security policies and contain unprotected card data.
Scan attachments recursively. An email may contain a ZIP file that contains PDFs that contain scanned images of receipts, and each layer must be processed.
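Recursive unpacking can be sketched with Python's `zipfile` module. The depth limit is an assumed safeguard against pathological nesting, and the function only handles ZIP archives; other container formats and the PDF and OCR stages would need their own handlers.

```python
import io
import zipfile

MAX_DEPTH = 5  # assumed guard against deeply nested or malicious archives


def iter_archive(data: bytes, path: str = "", depth: int = 0):
    """Yield (path, bytes) for every non-archive member, descending into nested ZIPs."""
    if depth > MAX_DEPTH:
        # Surface the archive for manual review instead of silently skipping it.
        raise ValueError(f"archive nesting deeper than {MAX_DEPTH}: {path}")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = zf.read(info)
            member_path = f"{path}/{info.filename}" if path else info.filename
            if zipfile.is_zipfile(io.BytesIO(member)):
                yield from iter_archive(member, member_path, depth + 1)
            else:
                yield member_path, member
```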
Reducing False Positives
Excessive false positives waste review time and create alert fatigue:
Validate with Luhn. This single check eliminates most random number sequences.
Verify IIN ranges. Numbers whose leading digits fall outside assigned IIN ranges aren't payment cards.
Exclude known test numbers. Published test card numbers appear in documentation and training materials.
Consider context. Numbers in contexts clearly unrelated to payments (serial numbers, identifiers) can be deprioritized.
Tune for your environment. If your organization uses 16-digit internal identifiers, add exclusion patterns.
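Exclusions are easiest to maintain as explicit, reviewable lists. In this hypothetical sketch the test numbers are widely published test PANs, and the `9900`-prefixed internal ID format is an invented example standing in for whatever identifiers your organization actually uses.

```python
import re

# A small sample of widely published test PANs; extend from your processor's docs.
KNOWN_TEST_PANS = {
    "4111111111111111",  # Visa
    "5555555555554444",  # Mastercard
    "378282246310005",   # American Express
    "6011111111111117",  # Discover
}

# Hypothetical internal identifier format: "9900" followed by 12 digits.
INTERNAL_ID_PATTERNS = [re.compile(r"9900\d{12}")]


def triage(pan: str) -> str:
    """Route a digits-only candidate to an exclusion bucket or to review."""
    if pan in KNOWN_TEST_PANS:
        return "test-number"
    if any(p.fullmatch(pan) for p in INTERNAL_ID_PATTERNS):
        return "excluded-internal-id"
    return "review"
```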
Handling Truncation and Masking
PCI-DSS allows storage of truncated PANs showing at most the first six and last four digits. Detection must distinguish:
Full PANs require immediate remediation and full PCI-DSS controls.
Properly truncated numbers (e.g., 411111XXXXXX1111) may be acceptable depending on use case.
Insufficient masking (e.g., 4111-1111-1111-XXXX showing 12 digits) still creates exposure.
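The three cases above can be told apart mechanically once separators are removed. A minimal sketch, assuming masked positions use X, *, # or •, and taking the six-and-four limits from the text as adjustable parameters; `classify_masking` is an illustrative name.

```python
import re

# Leading digits, a run of mask characters, trailing digits (assumed mask set).
MASKED = re.compile(r"(\d*)[Xx*#\u2022]+(\d*)")


def classify_masking(token: str, max_first: int = 6, max_last: int = 4) -> str:
    """Classify a PAN-like token as full, properly truncated, or insufficiently masked."""
    chars = re.sub(r"[ -]", "", token)  # drop space and dash separators
    if chars.isdigit():
        return "full PAN"
    m = MASKED.fullmatch(chars)
    if m and len(m.group(1)) <= max_first and len(m.group(2)) <= max_last:
        return "properly truncated"
    # Too many visible digits, or digits scattered inside the masked region.
    return "insufficient masking"
```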
PCI-DSS Compliance Context
Detection supports but doesn't guarantee compliance. Understanding requirements helps prioritize:
PCI-DSS 4.0 Requirements
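PCI-DSS 4.0.1 is now the only active version of the standard, and the requirements that were future-dated in version 4.0 became mandatory on March 31, 2025. Key requirements affecting document handling: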
Requirement 3: Protect stored account data. The PAN must be rendered unreadable anywhere it's stored, and sensitive authentication data must not be stored after authorization. Detection identifies where that data exists.
Requirement 7: Restrict access by business need to know. Only personnel with legitimate need should access card data. Detection shows what documents contain card data for access control purposes.
Requirement 10: Log and monitor all access. Detection activity itself generates audit logs demonstrating due diligence.
Requirement 12: Support information security with policies. Detection enables policy enforcement by identifying violations.
Scope Implications
Any system containing card data falls within PCI-DSS scope. This includes:
- The document itself
- The system storing the document
- Systems that can access that system
- Network segments connected to those systems
Uncontrolled card data in documents expands compliance scope dramatically. Detection enables scope reduction by identifying and remediating card data before it spreads.
Annual Assessments
Organizations must annually define and document PCI-DSS scope. Detection provides the foundation for accurate scope definition by showing exactly where card data exists.
Enterprise Integration
Detection must integrate into existing workflows to provide continuous protection:
Payment Processing Workflows
Integrate detection at key points in payment workflows:
Before archiving. Scan documents before they enter long-term storage. Card data in archives creates persistent exposure.
During AI processing. Any document going to AI tools must be scanned and card data removed. AI systems ingest and potentially retain content you provide.
Before external sharing. Documents leaving your organization should be verified free of card data unless sharing card data is the explicit purpose.
Customer Communication Channels
Support systems are primary vectors for card data accumulation:
Email scanning. Inbound customer emails frequently contain card numbers. Detect and quarantine before support agents see them.
Chat monitoring. Live chat logs should trigger alerts when card patterns appear.
Ticket processing. Support ticket systems should automatically redact detected card numbers while preserving ticket context.
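Redaction that preserves ticket context can replace each validated card number with a marker that keeps only the last four digits, which agents often need to identify a card. This is a self-contained sketch: the regex and Luhn check mirror the pattern-matching and Luhn steps described earlier in this article, and the `[CARD REDACTED ...]` marker format is an assumption.

```python
import re

# 13-19 digits with optional single space/dash separators.
PAN_LIKE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def _luhn_ok(digits: str) -> bool:
    total = 0
    for i, c in enumerate(reversed(digits)):
        n = int(c) * (2 if i % 2 else 1)  # double every second digit from the right
        total += n - 9 if n > 9 else n
    return total % 10 == 0


def redact_ticket(text: str) -> str:
    """Replace Luhn-valid card numbers with a last-four marker; leave other numbers alone."""
    def mask(m: re.Match) -> str:
        digits = re.sub(r"[ -]", "", m.group())
        if not _luhn_ok(digits):
            return m.group()  # e.g. order numbers that only look like PANs
        return f"[CARD REDACTED ****{digits[-4:]}]"
    return PAN_LIKE.sub(mask, text)
```

Order numbers and other long identifiers that fail the checksum stay readable, so the ticket keeps its meaning for the agent handling it.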
Incident Response Integration
Detection supports breach response:
Scope assessment. When incidents occur, detection shows exactly which documents contain card data that may have been exposed.
Notification decisions. Knowing which cards were in compromised systems determines notification requirements.
Forensic support. Detection logs provide evidence of what protections existed and when.
The Detection Imperative
Only 27.9 percent of organizations are fully PCI-DSS compliant according to Verizon's Payment Security Report. The organizations suffering breaches like Slim CD and Oregon Zoo likely believed they had adequate controls. They discovered otherwise when attackers found card data in locations those controls didn't reach.
Card data accumulates in documents through normal business operations: customers sending card numbers through support channels, staff creating order documentation, systems generating receipts and invoices. Without systematic detection, this accumulation goes unnoticed until a breach reveals it.
Modern detection combines pattern matching, Luhn validation, IIN verification, and context analysis to find card data across document types and formats. The technology works. The question is whether you've deployed it comprehensively across your document workflows, storage systems, and communication channels.
The organizations that find card data before attackers do avoid becoming the next breach headline.
PaperVeil combines pattern matching, Luhn validation, and context analysis to find payment card data in your documents. Automatic detection and redaction in a simple drag-and-drop interface. Audit trails that document what was found and how it was handled. The detection layer that finds card numbers before PCI violations occur.