Trim / Normalize Whitespace
Clean up text by trimming spaces, collapsing blank lines, normalizing line endings, and converting tabs.
Operations
About this tool
Invisible whitespace characters are a common source of subtle bugs in data processing. Text copied from web pages or documents frequently contains non-breaking spaces (U+00A0), which render identically to regular spaces but are a distinct code point: they cause exact string comparisons to fail, and some regex flavors do not match them with \s by default. PDFs often insert extra spaces and line breaks that do not reflect the original document's structure. Windows line endings (CRLF, \r\n) mixed with Unix line endings (LF, \n) cause mismatches when comparing strings from different sources.
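As a minimal sketch of the fixes described above, the snippet below normalizes line endings to LF and replaces non-breaking spaces with regular spaces (the function name is illustrative, not part of the tool):

```python
def normalize_basic(text: str) -> str:
    """Normalize line endings to LF and replace non-breaking spaces."""
    # CRLF (Windows) and bare CR both become a single LF
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # U+00A0 looks like a space but is a distinct code point
    return text.replace("\u00a0", " ")

print(normalize_basic("a\u00a0b\r\nc"))  # "a b" then "c" on the next line
```

Replacing "\r\n" before bare "\r" matters: doing it in the other order would turn each CRLF into two line breaks.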
Text normalization goes beyond simple trimming. Collapsing multiple spaces to a single space handles copy-paste artifacts. Converting non-standard whitespace characters (tabs, non-breaking spaces, em spaces, thin spaces) to regular spaces ensures consistent processing. Normalizing line endings to a single standard avoids cross-platform incompatibilities. Unicode Normalization Form C (NFC: canonical decomposition followed by canonical composition) resolves multiple representations of the same character, such as é encoded either as the single composed code point U+00E9 or as e followed by the combining acute accent U+0301.
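These steps can be combined in a few lines using Python's standard library; this is a sketch of the approach, not the tool's actual implementation, and the set of exotic spaces handled here is deliberately small:

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    # NFC composes "e" + U+0301 into the single code point U+00E9
    text = unicodedata.normalize("NFC", text)
    # Convert tabs, NBSP, em space, and thin space to a plain space
    text = re.sub(r"[\t\u00a0\u2003\u2009]", " ", text)
    # Collapse runs of spaces left by copy-paste artifacts
    return re.sub(r" {2,}", " ", text)

print(normalize_text("cafe\u0301\t\u2003menu"))  # "café menu"
```

After NFC, "cafe" plus a combining accent and "café" compare equal, which is usually what string matching and deduplication expect.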
Zero-width characters are particularly insidious: zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), and the byte order mark (U+FEFF) are completely invisible in most text editors and terminals. They can be accidentally pasted from web content, introduced by copy-paste from certain apps, or deliberately inserted to evade text filters. These characters cause string matching failures and sorting irregularities that are extremely difficult to debug without a tool that makes them visible.
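One way to make these characters tractable is simply to strip them; the regex below covers exactly the four code points listed above (a sketch, not an exhaustive list of invisible characters):

```python
import re

# Zero-width space, ZWNJ, ZWJ, and the byte order mark
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")

def strip_zero_width(text: str) -> str:
    return ZERO_WIDTH.sub("", text)

s = "pass\u200bword\ufeff"
print(len(s), len(strip_zero_width(s)))  # 10 8
```

Note that ZWNJ and ZWJ carry real meaning in some scripts (e.g. Persian) and in emoji sequences, so blanket removal is only safe when the text is known not to need them.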