Files are read locally — nothing is uploaded to any server.
What is a CSV Cleaner?
A CSV cleaner removes the typical noise found in CSV exports (duplicate rows from joined queries, fully empty rows left by spreadsheet drag-fills, stray whitespace from manual data entry) and normalizes encoding so the same file can move between systems without breaking. In real-world data work, the raw CSV is almost never the file you want: it has a UTF-8 BOM that confuses your parser, Windows-style CRLF mixed with bare LF, dates that look like 2026/05/18 in one row and 18-May-2026 in the next, or, most painful of all, bytes encoded in Shift-JIS that turn into mojibake (文字化け) the moment you open the file in the wrong tool. A good CSV cleaner deals with these problems before the data ever reaches your pipeline.
DevFormatLab's CSV Cleaner is built for messy real-world inputs, with first-class support for the Japanese ecosystem: it auto-detects UTF-8, Shift-JIS and EUC-JP on file open, and lets you download the cleaned file either as UTF-8 (with or without BOM) or as Shift-JIS for legacy Windows Excel, accounting systems and government portals. The RFC 4180 parser correctly handles quoted fields with embedded commas, doubled quotes, and multi-line values. The cleaning toggles (remove duplicates, remove empty rows, trim cell whitespace) operate on the entire dataset, while the preview table stays capped at 50 rows so the page remains responsive even on 100 MB files.
Everything runs in the browser via the File API: your CSV is read, processed, and downloaded locally, with no upload, no analytics on contents, and no third-party services. That makes it safe for HR exports, customer lists, financial extracts, and any other CSV that shouldn't be emailed around.
Features
- Remove exact-duplicate rows (byte-equal after normalization)
- Remove completely empty rows left by spreadsheet drag-fills
- Trim leading and trailing whitespace from every cell
- Auto-detect UTF-8, Shift-JIS and EUC-JP on file open
- Download as UTF-8 (with or without BOM) or Shift-JIS
- RFC 4180 parser: quoted fields, doubled quotes, embedded newlines
- Live preview of the first 50 rows in a sortable table
- Pure browser File API — no upload, safe for sensitive data
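The three cleaning operations listed above can be sketched in a few lines. This is a minimal illustration in Python, not DevFormatLab's actual implementation (which runs in the browser); the function name `clean_rows` and its parameters are hypothetical, and it assumes the first row is the header.

```python
import csv
import io


def clean_rows(rows, trim=True, drop_empty=True, drop_duplicates=True):
    """Apply the three cleaning toggles; the first row is treated as the header."""
    rows = list(rows)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    if trim:
        # Trim leading and trailing whitespace from every cell
        header = [cell.strip() for cell in header]
        body = [[cell.strip() for cell in row] for row in body]
    cleaned = [header]  # the header is always kept
    seen = set()
    for row in body:
        # A row counts as empty when every cell is the empty string
        if drop_empty and all(cell == "" for cell in row):
            continue
        if drop_duplicates:
            key = tuple(row)  # exact match on the (already trimmed) cells
            if key in seen:
                continue
            seen.add(key)
        cleaned.append(row)
    return cleaned


raw = "id,name\n1, Alice \n1,Alice\n,\n2,Bob\n"
result = clean_rows(csv.reader(io.StringIO(raw)))
print(result)
```

Trimming runs before duplicate detection here, so `1, Alice ` and `1,Alice` collapse into one row. With trimming switched off they would be kept as two distinct rows.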
How to use
- Click "Open file" and pick a CSV from your computer — encoding is auto-detected from the first 4 KB, or you can force UTF-8 / Shift-JIS / EUC-JP from the dropdown. You can also paste CSV text directly into the input area.
- Toggle the cleaning operations you want: Remove duplicates, Remove empty rows, Trim whitespace. Each toggle runs over the full dataset, not just the preview.
- Inspect the preview table on the right — the first 50 rows are shown, but the row counter above tells you the true total.
- If the preview looks garbled, switch the encoding dropdown manually until Japanese / Chinese / Korean characters render correctly, then proceed.
- Click Download UTF-8 (with BOM) to produce a file Excel for Windows can open without mojibake, or Download Shift-JIS for legacy systems, or plain UTF-8 (no BOM) for Linux pipelines, Google Sheets and modern editors.
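The three download options in the last step differ only in how the same text is turned into bytes. A rough sketch in Python, assuming hypothetical target names (`utf-8-bom`, `utf-8`, `shift-jis`) and assuming the Windows code page `cp932` is an acceptable stand-in for Shift-JIS; the tool's own behavior for unrepresentable characters is not documented here:

```python
def encode_csv(text, target):
    """Encode CSV text for one of three illustrative download targets."""
    if target == "utf-8-bom":
        # "utf-8-sig" prepends the three BOM bytes EF BB BF
        return text.encode("utf-8-sig")
    if target == "utf-8":
        return text.encode("utf-8")
    if target == "shift-jis":
        # Assumption: characters cp932 cannot represent are replaced with "?"
        return text.encode("cp932", errors="replace")
    raise ValueError(f"unknown target: {target}")


sample = "id,name\r\n1,山田\r\n"
with_bom = encode_csv(sample, "utf-8-bom")
plain = encode_csv(sample, "utf-8")
sjis = encode_csv(sample, "shift-jis")
print(len(with_bom), len(plain), len(sjis))
```

The BOM variant is byte-identical to plain UTF-8 apart from the three leading bytes, which is why it is safe for most modern tools while still being recognized by Excel.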
Frequently Asked Questions
How do I fix garbled Japanese / Chinese characters (mojibake / 文字化け)?
Mojibake appears when the CSV is decoded with the wrong charset. Typical patterns:
- 譁?ュ怜喧縺? ← UTF-8 bytes read as Shift-JIS
- 中文ä¹±ç ← UTF-8 bytes read as Latin-1
Fix: use the encoding selector when you click Open file (Auto, UTF-8, Shift-JIS, EUC-JP). The auto-detector inspects the first 4 KB and usually picks correctly. If the preview is still garbled, switch to the encoding that produces readable Japanese, then click Download UTF-8 to normalize the file going forward.
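To make the mechanism concrete, the sketch below reproduces mojibake by decoding bytes with the wrong charset, and then guesses an encoding by trial-decoding the first 4 KB. This is an illustrative Python approach, not the detector the tool actually uses; `guess_encoding` and the candidate order are assumptions. Trial decoding is also known to be weak at telling Shift-JIS and EUC-JP apart, because many EUC-JP byte sequences are also valid Shift-JIS, so a real detector needs additional heuristics.

```python
import codecs

# Assumed candidate order: UTF-8 is the strictest, so it is tried first
CANDIDATES = ("utf-8", "cp932", "euc_jp")


def guess_encoding(data, sample_size=4096):
    """Return the first candidate that decodes the leading sample, else None."""
    sample = data[:sample_size]
    for name in CANDIDATES:
        decoder = codecs.getincrementaldecoder(name)(errors="strict")
        try:
            # final=False tolerates a multi-byte character cut at the sample boundary
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return name
    return None


original = "文字化け"
utf8_bytes = original.encode("utf-8")
sjis_bytes = original.encode("cp932")

# Decoding with the wrong charset is what produces mojibake
garbled = utf8_bytes.decode("cp932", errors="replace")
print(garbled)

print(guess_encoding(utf8_bytes), guess_encoding(sjis_bytes))
```

When the guess is wrong, the manual override described above is the reliable fallback: a human can tell readable Japanese from garbage far more easily than a byte-level check can.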
Why does the cleaned file not open correctly in Japanese Excel?
Excel for Windows (Japanese edition) opens a double-clicked CSV as Shift-JIS by default and ignores UTF-8 unless a BOM is present.
- Choose Download Shift-JIS → opens directly in Excel
- Or choose Download UTF-8 (BOM) → the BOM tells Excel to use UTF-8
- Plain UTF-8 without BOM → Excel will show mojibake; use macOS Numbers / Google Sheets / VS Code instead
Why are quoted fields with commas being split into multiple columns?
Make sure each value with a comma is wrapped in double quotes, and any literal double quote inside is doubled. The parser follows RFC 4180:
    id,name,note
    1,"Smith, John","He said ""hi"""
    2,Alice,Hello
If an external tool produced lines like 1,Smith, John,He said "hi" without quoting, fix the source export; there is no unambiguous way to recover the original columns.
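The quoting rules can be checked with any standards-aware parser. The sketch below uses Python's built-in `csv` module with its default dialect, which handles these particular cases the same way; it is a stand-in for illustration, not the tool's own parser.

```python
import csv
import io

text = 'id,name,note\r\n1,"Smith, John","He said ""hi"""\r\n2,Alice,Hello\r\n'
# newline="" leaves line endings untouched so the csv module can interpret them
rows = list(csv.reader(io.StringIO(text, newline="")))
for row in rows:
    print(row)

# The same record without quoting: the comma inside the name splits the field
unquoted = list(csv.reader(io.StringIO('1,Smith, John,He said "hi"\r\n', newline="")))
print(len(unquoted[0]), unquoted[0])
```

The properly quoted record parses into three fields, while the unquoted one yields four, and nothing in the line itself says which comma was meant as data.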
How are duplicate rows detected?
Two rows are considered duplicates only when every cell matches byte-for-byte after the selected normalization (Trim whitespace, lowercase header if enabled). The header row is always preserved. Matching is exact, so if you need fuzzy de-duplication (e.g. case-insensitive emails), normalize the relevant column beforehand (for example, lowercase it) and enable Trim so stray whitespace does not hide a match.
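If case-insensitive matching on one column is needed, it has to happen outside the exact-match comparison described above. A possible pre-processing sketch in Python; `dedupe_by_key` is a hypothetical helper, not a feature of the tool, and it keeps the first row seen for each normalized key.

```python
def dedupe_by_key(rows, column, normalize=lambda value: value.strip().lower()):
    """Keep the header and the first row seen for each normalized value of `column`."""
    header, body = rows[0], rows[1:]
    index = header.index(column)
    seen = set()
    kept = [header]
    for row in body:
        key = normalize(row[index])
        if key in seen:
            continue  # a row with the same normalized key was already kept
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    ["id", "email"],
    ["1", "Ana@Example.com"],
    ["2", " ana@example.com"],
    ["3", "bob@example.com"],
]
deduped = dedupe_by_key(rows, "email")
print(deduped)
```

Unlike exact de-duplication, this discards rows that differ in other columns, so it is worth reviewing which copy survives before relying on the result.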
Why does the preview only show 50 rows?
The on-screen table is capped at 50 rows so the page stays responsive even for 100 MB files. The cleaning operations themselves run over the entire dataset; the downloaded file contains every row. Use the row counter above the preview to confirm the total.
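The split between a capped preview and a full-dataset count can be sketched as a single pass over the parsed rows. This Python version is only an illustration; `preview_and_count` is a hypothetical name, and it assumes the 50-row cap applies to data rows with the header counted separately.

```python
import csv
import io
from itertools import islice

PREVIEW_LIMIT = 50  # assumed to mirror the on-screen cap


def preview_and_count(text, limit=PREVIEW_LIMIT):
    """Return (header, first `limit` data rows, total data row count)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    preview = list(islice(reader, limit))
    # Count the remaining rows without keeping them in memory
    total = len(preview) + sum(1 for _ in reader)
    return header, preview, total


text = "n\n" + "".join(f"{i}\n" for i in range(120))
header, preview, total = preview_and_count(text)
print(header, len(preview), total)
```

Only the preview rows are materialized for display; everything past the cap is counted and discarded, which is the property that keeps a table responsive on large inputs.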
Can I process Excel .xlsx files directly?
Not yet — DevFormatLab works on text CSV. From Excel choose File → Save As → "CSV UTF-8 (Comma delimited) (*.csv)" or "CSV (Comma delimited) (*.csv)" first. From Google Sheets choose File → Download → Comma-separated values (.csv). The exported file can then be loaded here, cleaned and re-saved in either encoding.
Canonical: https://devformatlab.com/en/csv-cleaner