In late 2020, Nitro PDF suffered a security incident that exposed customer data. The cloud-based PDF service processed documents for thousands of organizations. When attackers breached Nitro's systems, they gained access to documents that customers had assumed were processed and discarded.
The incident highlighted a fundamental tension with online document tools. Convenience requires uploading files to someone else's servers. Once uploaded, you no longer control what happens to that data.
For redaction specifically, this creates an ironic situation. You're trying to protect sensitive information by uploading it, unredacted, to a third-party service. The tool removes the sensitive data from the document, but the service now has a copy of the original, unredacted file. The information you wanted to protect has been transmitted to and processed by a company whose security practices you may not have evaluated.
The short version: If you need to redact sensitive documents before they reach AI systems, PaperVeil handles that layer. The rest of this article explains where it fits in the broader governance architecture.
How Online Redaction Tools Work
When you use an online PDF redaction tool like Smallpdf, iLovePDF, or similar services, your document goes through a predictable process:
- Upload. Your PDF transmits from your device to the service's servers, typically hosted in cloud infrastructure like AWS or Google Cloud.
- Processing. The service's software processes your file. For redaction, this means parsing the PDF structure and either removing content or overlaying it with visual masks.
- Download. You receive a processed file. The redacted version downloads to your device.
- Retention. The service retains your uploaded file for some period. This varies by provider from one hour to indefinitely.
At each stage, your unredacted document exists on infrastructure you don't control. The service's security practices, employee access policies, and data retention decisions determine what happens to your sensitive information.
What Happens to Your Data
Major online PDF services publish their data handling practices. Understanding these reveals the exposure you accept when using them.
Smallpdf states that files are permanently removed from its servers one hour after processing. The company holds ISO 27001 certification and complies with GDPR. However, Smallpdf does not operate under a zero-knowledge architecture. Until deletion, its systems have full access to your document content.
iLovePDF claims automatic deletion within two hours of processing. They encrypt files in transit with HTTPS. However, iLovePDF lacks formal third-party security certification. Security researchers have noted that files processed through iLovePDF sometimes show "iLovePDF" in the document metadata, raising questions about what modifications occur during processing.
Free tools with unclear policies present the highest risk. Many smaller online redaction services provide no information about data retention, server location, or security practices. Your document could be retained indefinitely, accessed by employees, or stored without adequate protection.
The common pattern: your unredacted document exists on third-party servers for some period, processed by software you can't inspect, under security practices you haven't verified.
The Redaction Paradox
Online redaction creates a logical contradiction. You're trying to prevent sensitive information from reaching unauthorized parties. To accomplish this, you send the unredacted document to a third party.
Consider a scenario: You have a contract containing Social Security numbers that you need to share with an external party. You upload it to an online redaction tool. The tool removes the SSNs. You download and share the redacted version.
But during that process, the unredacted document containing the SSNs was transmitted across the internet to a company whose security practices you likely haven't reviewed. For that period, the information you wanted to protect was accessible to that service's infrastructure, employees, and any attackers who might compromise their systems.
The redacted version protects the data from the recipient. But the unredacted version was exposed to the redaction service.
Real Security Risks
Several concrete risks emerge from online redaction:
Visual masking instead of true redaction. Many free online tools draw black boxes over text without removing the underlying content. Research has found that 65% of documents claimed to be redacted still exposed hidden information. With a masked file, anyone who selects and copies the text beneath the box can still recover the sensitive data.
Data retention beyond stated policies. Stated retention periods may not reflect actual practice. Backup systems, logging infrastructure, and error recovery processes can retain copies of your documents beyond the advertised deletion window. You have no way to verify deletion actually occurred.
Third-party subprocessors. Online services typically use cloud infrastructure providers, content delivery networks, and other subprocessors. Your document may traverse multiple third-party systems during processing. Each represents additional exposure.
Employee access. Service employees may have access to uploaded documents for debugging, quality assurance, or support purposes. Privacy policies may permit this access even when you'd prefer your documents remain unexamined.
Jurisdiction and legal access. Services operating in different jurisdictions face different legal requirements for data disclosure. Your documents may be subject to legal frameworks you haven't considered.
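The visual-masking risk described above is one you can probe on your own machine before sharing a file. What follows is a rough Python sketch, not a complete PDF parser. It assumes content streams are either uncompressed or Flate-compressed, that the text is stored in a simple encoding where digits appear literally, and that Social Security numbers are the data of interest. The names `leftover_matches`, `STREAM_RE`, and `SSN_RE` are illustrative.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Illustrative pattern for US Social Security numbers (an assumption; adapt as needed).
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def leftover_matches(pdf_bytes: bytes, pattern: re.Pattern = SSN_RE) -> list[bytes]:
    """Return pattern matches still present in the file's content streams."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams are Flate-compressed. decompressobj tolerates
            # the trailing line break that the regex captures with the data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate data; search the bytes as they are
        found.extend(pattern.findall(data))
    return found
```

You would call it on the bytes of the file you are about to share, for example the result of reading a hypothetical `redacted.pdf` in binary mode. A non-empty result shows the data survived the "redaction". An empty result is weaker evidence: text stored with custom font encodings, in images, or in metadata would not be caught by this check.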
When Online Tools Create Acceptable Risk
Online redaction isn't categorically wrong. The question is whether the exposure matches the sensitivity of your documents.
Acceptable scenarios:
- Public documents where the "sensitive" information is already available elsewhere
- Internal drafts where exposure would be embarrassing but not harmful
- Non-regulated personal documents with low-stakes content
- Documents where the benefit of convenience genuinely outweighs limited risk
Unacceptable scenarios:
- Documents containing PII subject to privacy regulations (GDPR, CCPA)
- Healthcare records with PHI (HIPAA requirements)
- Financial documents subject to compliance frameworks (SOX, GLBA)
- Legal documents with privileged information
- Any document where exposure could create liability or harm
For regulated data, online redaction services create compliance exposure. You've disclosed protected information to a third party. Even if that party deletes it promptly, the disclosure occurred.
Why This Matters for AI Workflows
If you're redacting documents before AI processing, online tools compound the exposure problem.
The workflow becomes: Upload unredacted document to redaction service. Download redacted document. Upload redacted document to AI service.
Your document now sits on two third-party services. The redaction service holds the full unredacted version. The AI service holds the redacted version, which is only as safe as the redaction was thorough. You've added a point of exposure to achieve the protection you wanted.
For AI preparation specifically, local processing becomes essential. The document should never leave your environment until sensitive data has been removed.
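To make "remove it before it leaves your environment" concrete, here is a minimal Python sketch of local, pattern-based redaction applied to text that has already been extracted from a document. It is an illustration under stated assumptions, not a description of how any particular product works: the two patterns, the placeholder format, and the name `redact_text` are all hypothetical, and real documents need a much broader, tested set of detectors.

```python
import re

# Illustrative patterns only (assumptions); real documents need a broader set.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_text(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a labeled placeholder and count replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made.
        text, counts[label] = pattern.subn(f"[{label} REDACTED]", text)
    return text, counts
```

Only the returned text would be passed on to an AI service. The counts give you a local record of what was removed, which helps when someone later asks what the redaction step actually did.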
The Local Alternative
Desktop redaction tools process documents without transmitting them to external servers:
Adobe Acrobat Pro performs true redaction locally. The document never leaves your machine during processing. The trade-off is cost and software installation requirements.
Preview on Mac includes redaction features that process locally. Effective for occasional use, though limited in features.
PDF-XChange Editor offers local redaction on Windows. The free version includes watermarks on output.
The common advantage: your unredacted document stays on your system. You control access. You verify deletion. No third party receives the sensitive version.
Evaluating Online Services
If you must use online redaction, evaluate services systematically:
Security certification. Look for ISO 27001, SOC 2, or equivalent third-party validation. These indicate audited security practices.
Data retention policy. Understand exactly how long files are retained and whether backups extend that period.
Processing location. Know where servers are located and what legal jurisdiction applies.
True redaction vs. masking. Verify the service performs actual content removal, not visual overlays.
Encryption. Confirm TLS for transit and encryption at rest during the retention period.
Subprocessor list. Understand what other services handle your data.
Most free tools fail multiple criteria. Even premium services may not meet requirements for regulated data.
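If you review several services, it can help to record the six criteria above in a consistent shape. The Python sketch below is one hypothetical way to do that. The class `ServiceReview`, its field names, and `failed_criteria` are assumptions made for illustration, and the true/false answers still have to come from your own review of each provider's documentation.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceReview:
    """One reviewer's findings for a single online service (hypothetical structure)."""
    third_party_certification: bool         # ISO 27001, SOC 2, or equivalent
    retention_period_documented: bool       # including how backups are handled
    processing_location_known: bool         # server location and jurisdiction
    true_redaction_verified: bool           # content removed, not overlaid
    encrypted_in_transit_and_at_rest: bool  # TLS plus encryption at rest
    subprocessors_listed: bool              # other services that handle the data


def failed_criteria(review: ServiceReview) -> list[str]:
    """Names of the criteria the service did not meet, in declaration order."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]
```

An empty list does not mean a service is suitable for regulated data. It only means none of these six checks ruled it out.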
The Audit Question
When regulators or auditors ask how you protect sensitive data, "we use an online tool that deletes files after an hour" is a weak answer.
You can't prove deletion occurred. You can't demonstrate the service's security practices. You can't show what controls protected the data during processing.
Local processing creates a different answer: "Sensitive data is processed locally and never transmitted to third parties." This is verifiable, auditable, and defensible.
For compliance-critical workflows, auditability matters as much as actual security.
Making the Decision
Online PDF redaction tools solve a convenience problem. They let you redact documents without installing software or learning complex tools. For low-stakes documents, this convenience has value.
But convenience comes with exposure. Your unredacted document transmits to third-party infrastructure. For that period, you've lost control over sensitive information.
The decision framework is straightforward:
- Could exposure of this document create regulatory liability?
- Could exposure create business harm or competitive damage?
- Would I be uncomfortable if this document appeared publicly?
- Is there any regulated data (PII, PHI, financial data) in the document?
If any answer is yes, online tools create unacceptable risk. Process locally.
If all answers are no, online tools may offer acceptable convenience. But verify the service performs true redaction rather than visual masking.
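The framework reduces to a single rule: any one risk signal is enough to require local processing. A small Python sketch of that rule follows, with each question phrased so that `True` always means risk. The names `DocumentRisk` and `choose_processing` and the two return strings are hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass
class DocumentRisk:
    """Answers to the four questions, phrased so that True always signals risk."""
    regulatory_liability: bool
    business_harm: bool
    public_exposure_unacceptable: bool
    contains_regulated_data: bool


def choose_processing(risk: DocumentRisk) -> str:
    """Return "local" if any risk signal is present, else "online-acceptable"."""
    return "local" if any(astuple(risk)) else "online-acceptable"
```

The rule is deliberately asymmetric: one flag forces local processing, while the online option needs every answer to be clear.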
PaperVeil processes documents locally before they reach any external service. Detect and remove sensitive data on your infrastructure. No upload of unredacted content. No third-party exposure. The redaction layer that keeps sensitive data where it belongs.