Data Cleaning
Upload qualitative data files for cleaning. The tool removes timestamps, filler words, and personally identifying information (PII): pattern matching catches structured identifiers first, then an AI review catches names, locations, and contextual identifiers.
Before you start
Supported file types:
- .docx — Word documents (text is extracted automatically)
- .txt — Plain text files
- .csv — CSV files (you'll choose which columns to clean)
- .pdf — PDF documents (text is extracted automatically)
Not supported: Excel (.xlsx) — export as CSV first (File → Save As → CSV). Scanned or image-based PDFs cannot be processed — use a text-based PDF or convert to .docx first.
Time estimate: Around 1–2 minutes per file.
How your data is handled
- Step 1 — Pattern scan: on-server regular expressions catch phone numbers, email addresses, postal codes, Social Insurance Numbers (SINs), URLs, social media handles, and long ID numbers. Your data doesn't leave our server for this step.
- Step 2 — AI review: your text is sent to DeepSeek V4 Pro via Tensorix (servers across Europe, with 100% EU data residency, Zero Data Retention, and CLOUD Act exemption) to catch names, initials, locations, organisation names, job titles, and identifying combinations.
- All files are automatically deleted from our server after 1 hour, when the session expires.
- Always review cleaned files before sharing. Automated de-identification is a strong first pass, not a replacement for human review.
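For readers curious what a Step 1 pattern scan can look like, here is a minimal sketch in Python. It is an illustration only: the pattern list, the placeholder labels, and the function name pattern_scan are assumptions made for this example, not the expressions the tool actually runs. The patterns are deliberately simple and will miss or over-match some real-world formats.

```python
import re

# Hypothetical patterns for illustration only; not the tool's real expressions.
# Order matters: more specific patterns run before more general ones.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("URL", re.compile(r"\bhttps?://\S+|\bwww\.\S+", re.IGNORECASE)),
    # Nine digits in three separated groups, e.g. 123-456-789
    ("SIN", re.compile(r"\b\d{3}[ -]\d{3}[ -]\d{3}\b")),
    # North American style phone numbers, with optional +1 prefix
    ("PHONE", re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)")),
    # Canadian style postal codes, e.g. M5V 2T6
    ("POSTAL_CODE", re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")),
    # Social media handles; the lookbehind avoids matching inside other tokens
    ("HANDLE", re.compile(r"(?<![\w@])@\w{2,30}")),
    # Any remaining run of eight or more digits
    ("LONG_ID", re.compile(r"\b\d{8,}\b")),
]


def pattern_scan(text):
    """Replace each structured identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = ("Email jane.doe@example.com or call 416-555-0199. "
              "Postal code M5V 2T6, SIN 123-456-789, "
              "see https://example.org/profile and @jane_d.")
    print(pattern_scan(sample))
```

A scan like this only handles identifiers with a predictable shape. Names, locations, and identifying combinations have no fixed pattern, which is why a second review step and a final human check are still needed.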