You are about to forward a support transcript to a colleague. Or drop a chunk of a production log into Slack. Or paste a CSV excerpt into ChatGPT to get it summarized. The content is useful — but it contains real names, real email addresses, maybe a phone number, maybe a card number that shows up in an order confirmation line. The second that document lands somewhere it should not, you have a problem.
Redacting data sounds simple. In practice, almost everyone gets it wrong at least once. They draw a black box over text in a PDF, and the text underneath can still be selected and copied. They swap out a name in the first paragraph and miss the same name in paragraph four. They pseudonymize a field that still traces back to the original person through context. This guide is the practical workflow that catches all of that: what PII really is, the three common failure modes, the right order of operations, and the tools that do it in your browser.
Every tool referenced here runs entirely client-side. Nothing you paste leaves your device, which is the only sensible way to handle a document you are cleaning precisely because it is sensitive.
What counts as PII (and what you probably missed)
Personally Identifiable Information is broader than most people think. The classic list — names, emails, phone numbers, addresses, dates of birth, Social Security numbers, credit card numbers, account numbers — is the starting point, not the ceiling.
The rest of it matters just as much:
- IP addresses (both IPv4 and IPv6; GDPR treats these as personal data when they can be linked to a person).
- Device IDs, session IDs, and persistent cookies.
- Exact geolocation coordinates.
- Biometric hashes — even one-way hashed fingerprints or face embeddings.
- Employee IDs, student IDs, customer numbers — anything that maps back to one person in some system.
- Timestamps of known individual actions (e.g. "logged in at 09:03:17 from the office network").
Then there are the "quasi-identifiers". These look harmless on their own but re-identify a person when combined. A five-digit ZIP code plus a full date of birth plus a gender uniquely identifies roughly 87% of US residents. Latanya Sweeney demonstrated this using 1990 census data, and it has not gotten harder since. A job title plus a company plus a city often points to a single employee. "Remote SRE at small Austin fintech" is not anonymous.
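To make the quasi-identifier risk concrete, here is a minimal sketch that counts how many rows in a dataset are unique on a chosen combination of fields. The records and the `unique_rows` helper are hypothetical and exist only for illustration; no names appear in the data, yet some rows still single out one person.

```python
from collections import Counter

# Hypothetical records: (zip_code, date_of_birth, gender). No names present.
records = [
    ("78701", "1988-03-14", "F"),
    ("78701", "1988-03-14", "F"),
    ("78701", "1991-07-02", "M"),
    ("78702", "1975-11-30", "F"),
]

def unique_rows(rows):
    """Return rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]

exposed = unique_rows(records)
print(f"{len(exposed)} of {len(records)} rows are unique on zip + DOB + gender")
```

Any row returned by `unique_rows` points at exactly one person in this dataset, so it needs generalizing or removing before the data is shared.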
When you redact, you are not just deleting obvious identifiers. You are deleting anything that, combined with other fields in the document, can point back to an individual.
The three ways redaction fails
Every data leak from a "redacted" document I have seen falls into one of these three categories. Avoid all three and you are 95% of the way there.
1. Black highlighter on a digital document
The most famous failure mode. Someone opens a PDF, drags a black rectangle over the sensitive text, saves, and sends it. The recipient opens the PDF, selects the blacked-out region, copy-pastes the original text straight out of it. The visual overlay never touched the underlying text stream. Court documents, government reports, and legal filings have all leaked this way — it keeps happening because drawing a black box *feels* like deleting something.
2. Replacing text but leaving metadata
You scrub the body of a Word doc but the author field still has the original employee name. The PDF was exported from a slide deck and the original slides had comments. The image in the doc has EXIF data with a GPS coordinate and a device ID. The Git commit history attached to the exported markdown still shows every original revision. None of these show up in a normal read-through, but every one of them can be pulled out by anyone who knows where to look.
3. Inconsistent redaction across a document
Someone redacts "Jane Smith" in the first mention but leaves "Jane" standalone in paragraph three. Or they redact the email jane@acme.com but leave the sentence "reach out to Jane at her acme.com address". The unredacted fragment plus the redacted one makes the subject just as identifiable — sometimes more so, because the act of redacting part of it signals which part was sensitive.
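One way to catch this failure mode is to search the redacted output for fragments of the values you meant to remove. The sketch below is an assumption about how such a check could look, not how any tool in this guide works; `leftover_fragments` and the four-character minimum are hypothetical choices.

```python
import re

def leftover_fragments(redacted_text, sensitive_values, min_length=4):
    """Return lowercase parts of each sensitive value still present in the text."""
    found = set()
    for value in sensitive_values:
        # Split "Jane Smith" or "jane@acme.com" into word-like parts,
        # ignoring short parts such as "com" that would only add noise.
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", value) if len(p) >= min_length]
        for part in parts:
            if re.search(rf"\b{re.escape(part)}\b", redacted_text, re.IGNORECASE):
                found.add(part.lower())
    return sorted(found)

text = "[NAME] opened the ticket. Reach out to Jane at her acme.com address."
print(leftover_fragments(text, ["Jane Smith", "jane@acme.com"]))
```

An empty list means none of the fragments survived. Anything else is a leftover to redact before sharing.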
Redact, remove, anonymize, pseudonymize — pick the right one
These four words get used interchangeably. They are not the same thing, and GDPR treats them as distinct categories. If you are sharing data under a compliance requirement, the distinction matters.
- **Remove**: the sensitive fragment is deleted and the document is now shorter. Simplest, loses the most information.
- **Redact**: the sensitive fragment is replaced with a placeholder like [REDACTED]. The document keeps its shape but you cannot recover the original.
- **Anonymize**: identifiers are replaced with generic tokens like [EMAIL] or [PERSON_1] and critically cannot be reversed back to the original by anyone. True anonymization is harder than it sounds because of quasi-identifiers.
- **Pseudonymize**: identifiers are replaced with consistent tokens that a separate key can reverse. Jane Smith always becomes USER_4721 in every occurrence. Still considered personal data under GDPR because the mapping exists.
For most "I am about to share this with a third party" scenarios, you want redaction or anonymization — not pseudonymization, because a consistent mapping is still PII under most frameworks. The Redact Text and Anonymize Text tools handle the two common cases and neither stores any mapping server-side.
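The difference between the four approaches is easiest to see side by side. This is a minimal sketch applied to email addresses only; the regex is a simplified assumption rather than a full address grammar, and the `USER_n` token format and function names are hypothetical.

```python
import re

# Simplified email pattern, good enough for illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def remove(text):
    return EMAIL.sub("", text)

def redact(text):
    return EMAIL.sub("[REDACTED]", text)

def anonymize(text):
    return EMAIL.sub("[EMAIL]", text)

def pseudonymize(text, mapping):
    """Replace each address with a stable token; `mapping` is the separate key."""
    def token(match):
        address = match.group().lower()
        if address not in mapping:
            mapping[address] = f"USER_{len(mapping) + 1}"
        return mapping[address]
    return EMAIL.sub(token, text)

sample = "Mail jane@acme.com, then jane@acme.com again, cc bob@acme.com."
key = {}
print(redact(sample))
print(anonymize(sample))
print(pseudonymize(sample, key))
print(key)  # whoever holds this dict can reverse the pseudonyms
```

The `key` dictionary is exactly why pseudonymized output still counts as personal data: as long as the mapping exists somewhere, the tokens can be traced back.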
The identify → replace → verify workflow
A working redaction pass is three core steps, in order, followed by two checks for what the text passes cannot see. Skip any of them and you leak something.
Step 1: Identify
Read the document once and list every PII type present. Not the specific strings yet — just the categories: "this has names, emails, two phone numbers, one IP, one order amount, no card numbers". That list tells you which tools you need.
Step 2: Replace, one type at a time
Run a dedicated tool per category. Mixing all of them into one regex is what gives you inconsistent redaction — these tools each target a single pattern with much higher precision than a generic sweep.
One tool per PII type
Example: paste your text into Remove Email from Text. Every address is detected and replaced with a placeholder. Take the output, paste it into Remove Phone Numbers. Repeat for IP addresses if the document is a log. Repeat for credit card numbers if it is a transaction export. The tool uses the Luhn checksum, so it skips most random 16-digit strings (about nine in ten fail the check) instead of flagging every one. Repeat for dates if dates of birth or event timestamps are present. Finish with Mask Names, which heuristically detects capitalised name-like tokens.
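For readers curious about the Luhn check mentioned above, here is a minimal sketch of the checksum and of a card pass built on it. The candidate regex, the 13 to 19 digit length range, and the [CARD] placeholder are assumptions for illustration, not the tool's actual behaviour.

```python
import re

def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def mask_cards(text):
    """Replace only the digit runs that pass the Luhn check."""
    return CANDIDATE.sub(
        lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(), text
    )

print(mask_cards("Paid with 4111 1111 1111 1111, ref 1234567812345678."))
```

The widely used test number 4111 1111 1111 1111 passes the checksum and is masked, while the reference number fails it and is left alone. A failed checksum does not prove a digit run is harmless, so long numbers still deserve a look during verification.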
Example input and output for the email pass:
Input:
Contact jane@acme.com or fall back to ops+billing@acme.co.uk
Output:
Contact [EMAIL] or fall back to [EMAIL]
Names are the hardest category. No regex catches all of them, and none avoids false positives (common words that look like names). Mask Names is a strong starting point, but budget 30 seconds of eyeballing afterward: "Bob from finance" should become "[NAME] from finance", and the tool will usually get that, but unusual names or typos in the source can slip through.
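The one-type-at-a-time approach in this step can be sketched as a chain of single-purpose passes. Everything here is an assumption for illustration: the simplified regexes, the [PHONE] and [IP] placeholders, and the function names. The order is deliberate, because a loose phone pattern would also swallow a dotted IPv4 address if it ran first.

```python
import ipaddress
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Loose phone pattern: a digit, at least seven digits/separators, a digit.
PHONE = re.compile(r"\+?\d[\d ().-]{7,}\d")

def remove_emails(text):
    return EMAIL.sub("[EMAIL]", text)

def remove_ips(text):
    def swap(match):
        try:
            ipaddress.IPv4Address(match.group())  # rejects e.g. 999.1.1.1
        except ValueError:
            return match.group()
        return "[IP]"
    return IPV4.sub(swap, text)

def remove_phones(text):
    return PHONE.sub("[PHONE]", text)

def run_passes(text, passes):
    """Apply each single-type pass to the output of the previous one."""
    for single_pass in passes:
        text = single_pass(text)
    return text

sample = "jane@acme.com called from +1 (512) 555-0147, client IP 10.0.0.12"
print(run_passes(sample, [remove_emails, remove_ips, remove_phones]))
```

Keeping each pass separate makes it easy to test one pattern at a time and to see which pass was responsible when something slips through.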
Step 3: Verify
Read the output once, top to bottom. Look for:
- Any word still capitalised mid-sentence — often a missed name.
- Strings of digits longer than four — missed phone, card, or account numbers.
- Anything that looks like a date in any format (2026-04-12, April 12 2026, 12/04/2026).
- URLs with paths that include slugs (/users/jsmith/orders/4421) which leak both a username and an order ID.
- Any combination of quasi-identifiers that points to one person (job title + company + city).
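Part of this checklist can be automated as a second pair of eyes. The sketch below flags candidates for a human to review; it does not replace the read-through. The patterns are rough assumptions, will produce false positives (month names count as capitalised words, for example), and cannot judge quasi-identifier combinations at all.

```python
import re

CHECKS = {
    # A capitalised word that follows a lowercase word, comma, or semicolon.
    "capitalised mid-sentence": re.compile(r"(?<=[a-z,;] )[A-Z][a-z]+"),
    "digit run longer than four": re.compile(r"\d{5,}"),
    "date-like string": re.compile(
        r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b"
        r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2},? \d{4}\b"
    ),
    "URL path with slugs": re.compile(r"/[A-Za-z]+/[\w.-]+"),
}

def review(text):
    """Return (label, matched text) pairs that deserve a second look."""
    hits = []
    for label, pattern in CHECKS.items():
        hits.extend((label, match.group()) for match in pattern.finditer(text))
    return hits

sample = (
    "[NAME] emailed on April 12 2026 and asked Priya to refund "
    "order 4421987 at /users/jsmith/orders/4421"
)
for label, fragment in review(sample):
    print(f"{label}: {fragment}")
```

An empty result is a good sign, not a guarantee. The final judgment still comes from reading the output top to bottom.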
Step 4: Hidden metadata in attachments
If what you are sharing includes files — a screenshot, a PDF, a Word doc — the file metadata is separate from the body text and you have not touched it. Image EXIF can contain GPS coordinates. PDF properties include the original author. Office doc history tracks every edit. For images, any metadata stripper will clean EXIF. For PDFs and Office docs, "save as new copy" and clear the properties from Document → Info.
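To see what an Office file still says about its author, you can inspect it locally before sending. This sketch assumes an Office Open XML file (.docx, .xlsx, .pptx), which is a zip archive that conventionally stores author fields in `docProps/core.xml`. It only reads the fields, it does not remove them, and the `office_metadata` helper is hypothetical. The demo builds a tiny stand-in archive in memory rather than opening a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def office_metadata(file_like):
    """Return author-related core properties found in an OOXML archive."""
    with zipfile.ZipFile(file_like) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {"creator": "dc:creator", "lastModifiedBy": "cp:lastModifiedBy"}
    found = {}
    for name, path in fields.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            found[name] = node.text
    return found

# Stand-in archive containing only the core properties part.
core_xml = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:creator>Jane Smith</dc:creator>"
    "<cp:lastModifiedBy>Jane Smith</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)

print(office_metadata(buffer))
```

If the result is not empty, the body text may be clean while the file itself still names the person you just redacted.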
Step 5: Quasi-identifier review
After redaction, re-read once more asking the question "could someone guess who this is?". If the answer is yes — a single job title, an unusual combination of details, a distinctive writing style — you have not anonymized, you have only removed the obvious labels. Consider removing the specifics: round a salary to the nearest $10k, blur a date to the month, replace a specific city with a region.
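The three generalizations suggested above are simple to express in code. This is a minimal sketch; the helper names, the city-to-region lookup, and the [REGION] fallback are hypothetical.

```python
from datetime import date

def round_salary(amount: int, step: int = 10_000) -> int:
    """Round to the nearest multiple of `step` (halves round up)."""
    return (amount + step // 2) // step * step

def blur_to_month(day: date) -> str:
    """Keep the year and month, drop the day."""
    return day.strftime("%Y-%m")

# Hypothetical lookup; a real one would cover every city in your data.
CITY_TO_REGION = {"Austin": "Texas", "Leeds": "Northern England"}

def generalize_city(city: str) -> str:
    return CITY_TO_REGION.get(city, "[REGION]")

print(round_salary(87_400))             # 90000
print(blur_to_month(date(2026, 4, 12))) # 2026-04
print(generalize_city("Austin"))        # Texas
```

Each step trades precision for safety, so generalize only as far as the recipient's task allows.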
Redacting before you paste into ChatGPT or Claude
This is the most common real-world scenario in 2026 — you have a customer email thread or a list of transactions and you want an LLM to summarize, classify, or draft a reply. Pasting it straight in means the content goes to the LLM provider, gets logged somewhere, and may be used to train future models depending on your plan.
The workflow is small and becomes muscle memory fast:
- Copy the source text into Anonymize Text.
- Run the one-click redaction for emails, phones, URLs, and IPs.
- If names matter, pipe the result through Mask Names.
- If card numbers could be present, run Remove Credit Card Numbers.
- Paste the cleaned text into your LLM. Ask it to answer in terms of the placeholders — e.g. "summarize the complaint from [PERSON_1] about the charge on [DATE]".
You still get a useful answer from the model because structure and content are preserved. But every downstream system now sees anonymized text, which is what you want.
Why browser-based matters for redaction tools
This is the one decision you cannot fudge. If you are redacting a document because it contains sensitive data, uploading that document to a third-party server just to clean it defeats the entire point. The document is now sitting on someone else's disk, possibly in a log, possibly backed up, possibly indexed for "quality improvements".
Every tool linked in this guide runs entirely in your browser. The JavaScript ships once when the page loads and all the regex matching happens locally. You can verify this yourself — open DevTools → Network tab, paste a long document, click Redact, and watch: nothing leaves. No request, no upload, no log. This is the only safe pattern for a privacy tool.
The 60-second redaction workflow
- Paste the source into Anonymize Text and run the default sweep to catch emails, phones, URLs, and IPs.
- If the content has names, pipe the result into Mask Names.
- If the content has card numbers or dates, run Remove Credit Card Numbers and Remove Dates.
- For anything custom — a specific term, product name, internal project codename — finish with Redact Text and add that term to the list.
- Read the output once. Check for capitalised missed names and long digit strings.
- Strip metadata from any attached files separately.
- Send.
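The custom-term step in the list above amounts to a case-insensitive search for each term on your list. Here is a minimal sketch of that idea; the `redact_terms` helper and its matching rules are assumptions, not a description of how Redact Text works.

```python
import re

def redact_terms(text, terms, placeholder="[REDACTED]"):
    """Replace whole-word occurrences of each term, ignoring case."""
    # Longest first, so "Project Falcon" wins over the shorter "Falcon".
    ordered = sorted((t for t in terms if t), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)

sample = "Project Falcon ships Friday; falcon-api logs attached."
print(redact_terms(sample, ["Falcon", "Project Falcon"]))
```

The whole-word boundaries keep the term from matching inside a longer word, while the hyphenated `falcon-api` is still caught because a hyphen is not a word character.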
Bookmark the Redact Text page. Next time a customer sends you a message thread to share internally, or a client asks for a log excerpt from production, or you want to paste an email into an LLM for summarizing — it is a 60-second job, and it keeps your organization out of an incident report.
Frequently Asked Questions
What is the difference between redacting and anonymizing?
Redacting replaces the sensitive text with a fixed marker like [REDACTED] — the content is gone but the location is obvious. Anonymizing replaces identifiers with generic tokens like [PERSON_1] or [EMAIL] so the text still reads naturally, and critically the tokens cannot be reversed back to the originals. GDPR treats fully anonymous data as outside its scope, while pseudonymized data (reversible with a key) is still regulated as personal data.
Is drawing a black box over text in a PDF a safe way to redact?
No. The black box is a visual overlay — it does not remove the underlying text. Any recipient can select the "redacted" area, copy-paste, and read the original. Real redaction means the characters are replaced in the source text stream, not covered by a shape on the page. This is the single most common way documents leak after a "redaction" step.
Does the redaction tool send my text to a server?
No. All processing happens inside your browser using JavaScript that was loaded once when the page rendered. You can confirm this by opening DevTools → Network and watching while you redact — there are no outbound requests. This is deliberate: a privacy tool that uploads your content defeats its own purpose.
What counts as PII under GDPR?
Any information that relates to an identified or identifiable person. That includes the obvious (name, email, phone, address, government ID) plus IP addresses, device IDs, location data, and quasi-identifiers (combinations like zip + DOB + gender that uniquely identify someone). The EDPB guidance is clear: if the combination is enough to point to one individual, it is personal data.
How do I redact data before pasting it into ChatGPT or Claude?
Run the text through the anonymizer first. Open Anonymize Text, paste the source, click the default sweep to replace emails, phones, URLs, and IPs with placeholders, and for names run Mask Names afterward. Paste the cleaned output into the LLM and phrase your prompt using the placeholder tokens ("summarize the complaint from [PERSON_1] about [DATE]"). You get a useful answer; the LLM never sees the real data.
Why does my redacted document still leak information?
Two common reasons. First, quasi-identifiers — job title plus company plus city is often enough to identify one employee even after all direct identifiers are gone. Second, file metadata — EXIF on images, author fields on PDFs, edit history on Office docs. Redact the body text, review the quasi-identifiers, and strip metadata from any attached files separately before sharing.
Should I use one all-in-one redaction tool or one tool per PII type?
Use the all-in-one for a first pass (the Anonymize Text tool catches emails, phones, URLs, and IPs in one click), then use a per-type tool for anything specific that the general sweep missed — card numbers via Remove Credit Card Numbers (which uses Luhn validation), dates via Remove Dates, names via Mask Names. The two-pass approach catches more than either on its own.