Text Cleaner

Free Text Cleaner Online

Clean messy text instantly: Paste text copied from documents, emails, PDFs, spreadsheets, websites, or chat tools and fix common formatting problems — including invisible characters you cannot see — directly in your browser.

What is a Text Cleaner?

A text cleaner is a utility that removes unwanted characters and formatting problems from plain text. It turns messy copied text into clean content that is easier to edit, publish, paste into forms, or process with other tools.

This tool is especially useful when text copied from PDFs, Word documents, or web pages contains invisible characters, non-breaking spaces, curly quotes, or HTML markup that is not visible but causes problems in code editors, databases, and search engines.

Why Does Copied Text Get Messy?

The problem rarely starts with you. It starts with how applications store and transfer text.

PDFs store positions, not flow. A PDF is a fixed-layout document. It stores each character at an absolute coordinate on the page — there is no concept of a “word” or a “line” as a flowing stream of text. When you copy from a PDF, the reader reconstructs the flow by inferring word boundaries from character positions. This reconstruction is imperfect: it inserts extra spaces between characters, breaks hyphenated words across lines, and fails to distinguish a visual line break from a paragraph break.

HTML clipboard format. Most modern applications (browsers, email clients, Slack, Notion, Google Docs) write HTML to the clipboard when you copy. When you paste into a plain-text field, the receiving app strips the tags, but artifacts remain: non-breaking spaces used for indentation and invisible Unicode marks left over from the original markup structure.

Word processors apply smart typography. Microsoft Word, Google Docs, and Apple Pages automatically convert straight quotes to curly quotes, double hyphens to em dashes, and regular spaces to non-breaking spaces in certain contexts. Useful inside the word processor, but a source of broken characters everywhere else.

Line ending mismatch. Windows uses CRLF (\r\n) as a line ending. macOS and Linux use LF (\n) only. Text pasted across platforms carries stray \r characters that are invisible but cause rendering issues in terminals, code editors, and command-line tools.

What This Text Cleaner Can Do

Cleanup options:

  • Trim each line: Remove spaces at the start and end of every line.
  • Collapse extra spaces: Convert repeated spaces, tabs, and non-breaking spaces into a single space.
  • Remove empty lines: Delete all blank lines from the text.
  • Limit blank lines to 1: Preserve paragraph breaks but collapse runs of two or more blank lines into one.
  • Remove invisible characters: Strip zero-width spaces, BOM characters, and soft hyphens that are invisible but cause problems in editors, databases, and search engines.
  • Straighten smart quotes: Convert typographic curly quotes (“ ” ‘ ’) to straight quotes (" '), useful for code, markdown, and data entry.
  • Strip HTML tags: Remove <b>, <p>, <span>, and all other HTML markup, leaving only the plain text content.
  • Remove duplicate lines: Keep the first copy of each line and remove later duplicates.

Transform:

  • Convert line breaks to spaces: Turn multi-line text into a single paragraph.

How to Use This Text Cleaner

  1. Paste or type your text into the input field, or click Upload to load text from a local file.
  2. Select the cleanup options you want to apply.
  3. The result updates instantly as you type or change options.
  4. Click Copy to copy the cleaned result, or Download to save it as a .txt file.

Before and After Examples

PDF copy with extra spaces and broken lines:

Before:

This  is  a  sentence  with  extra   spaces.
   It  has  a  leading  indent   too.

After (Trim each line + Collapse extra spaces):

This is a sentence with extra spaces.
It has a leading indent too.

HTML-pasted content from a website or CMS:

Before:

<p><strong>Project Update</strong></p>
<ul>
<li>Task 1 is complete</li>
<li>Task 2 is in progress</li>
</ul>

After (Strip HTML tags + Remove empty lines):

Project Update
Task 1 is complete
Task 2 is in progress

Smart quotes from Word or Google Docs:

Before:

He said “this won’t work” and closed the file.

After (Straighten smart quotes):

He said "this won't work" and closed the file.

Excessive blank lines between paragraphs:

Before:

Introduction paragraph.



Second section.




Final notes.

After (Limit blank lines to 1):

Introduction paragraph.

Second section.

Final notes.

Remove Invisible Characters

Text copied from PDFs, Word documents, and web pages often contains invisible characters that are impossible to spot by eye: zero-width spaces (U+200B), zero-width non-joiners (U+200C), BOM markers (U+FEFF), and soft hyphens (U+00AD). These can break word counts, search functions, spell checkers, and database entries.

Enable Remove invisible characters to strip them all in one step. This option is on by default.

Whitespace and Invisible Character Reference

Character | Unicode | Name | Common Source
(space) | U+0020 | Space | All sources
(tab) | U+0009 | Horizontal tab | Code, spreadsheets
&nbsp; | U+00A0 | Non-breaking space | HTML, Word, Google Docs
(invisible) | U+200B | Zero-width space | Web pages, PDFs, Wikipedia
(invisible) | U+200C | Zero-width non-joiner | Web content, RTL text
(invisible) | U+200D | Zero-width joiner | Emoji sequences, web content
(invisible) | U+00AD | Soft hyphen | Word, typesetting tools
(invisible) | U+FEFF | BOM / Zero-width no-break space | Windows Notepad, UTF-8 exports
(invisible) | U+2028 | Line separator | Legacy systems
(invisible) | U+2029 | Paragraph separator | Legacy systems

Remove invisible characters targets U+200B, U+200C, U+200D, U+FEFF, and U+00AD. Collapse extra spaces handles U+00A0 alongside regular spaces and tabs. The line separator U+2028 and paragraph separator U+2029 are normalized automatically along with line endings.
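For readers who want the same cleanup in a script, here is a minimal Python sketch of removing the five code points listed above. The function name remove_invisible is hypothetical, and the tool's own implementation (which runs as JavaScript in the browser) may differ.

```python
# Code points named in the text above; the tool's real list may differ.
INVISIBLE_CODE_POINTS = {0x200B, 0x200C, 0x200D, 0xFEFF, 0x00AD}

# str.translate deletes any character whose code point maps to None.
_DELETE_TABLE = dict.fromkeys(INVISIBLE_CODE_POINTS)


def remove_invisible(text: str) -> str:
    """Return text with the listed invisible characters removed."""
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    sample = "zero\u200bwidth and soft\u00adhyphen"
    print(len(sample), len(remove_invisible(sample)))
```

Note that stripping U+200D also splits emoji built from zero-width-joiner sequences into their component emoji, so a script that must preserve such emoji would need a narrower list.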

Collapse Extra Spaces

The Collapse extra spaces option replaces repeated whitespace — including tabs and non-breaking spaces (&nbsp;, U+00A0) common in HTML-pasted content — with a single regular space. Enable Trim each line at the same time to also remove leading and trailing whitespace from every line.
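The two options can be approximated in Python as follows. This is a sketch under the assumption that "whitespace" means spaces, tabs, and U+00A0 only, as described above; collapse_spaces and trim_lines are illustrative names, not the tool's API.

```python
import re

# One or more spaces, tabs, or non-breaking spaces in a row.
_SPACE_RUN = re.compile(r"[ \t\u00a0]+")


def collapse_spaces(text: str) -> str:
    """Replace each run of spaces/tabs/NBSPs with one regular space, line by line."""
    return "\n".join(_SPACE_RUN.sub(" ", line) for line in text.split("\n"))


def trim_lines(text: str) -> str:
    """Remove leading and trailing spaces/tabs/NBSPs from every line."""
    return "\n".join(line.strip(" \t\u00a0") for line in text.split("\n"))


if __name__ == "__main__":
    messy = "   It  has  a  leading  indent   too.  "
    print(trim_lines(collapse_spaces(messy)))
```

Working line by line keeps line breaks intact, so the sketch never merges two lines into one.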

Strip HTML Tags and Straighten Smart Quotes

Strip HTML tags is useful when copying rich text from websites, emails, or CMS editors. It removes all markup and leaves only the readable text.

Straighten smart quotes converts typographic curly quotes back to standard ASCII quotes. Word processors and web editors automatically replace straight quotes with curly ones, which can cause problems in code, CSV files, and structured data.
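A rough Python equivalent of both options is sketched below, using the standard library's html.parser to keep only text nodes and a translation table for the four common curly quotes. The names strip_tags and straighten_quotes are hypothetical, and the tool itself may treat edge cases (such as script or style content, or other quote variants) differently.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # also decodes &amp; etc.
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Left/right double and single curly quotes mapped to ASCII quotes.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def straighten_quotes(text: str) -> str:
    """Convert typographic curly quotes to straight ASCII quotes."""
    return text.translate(_QUOTES)


if __name__ == "__main__":
    print(strip_tags("<p><strong>Project Update</strong></p>"))
    print(straighten_quotes("He said \u201cthis won\u2019t work\u201d"))
```

A parser is used here rather than a regular expression because tag-matching patterns tend to misfire on attributes that contain a > character.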

Control Blank Lines

Choose between two mutually exclusive options:

  • Remove empty lines deletes every blank line for compact, continuous output.
  • Limit blank lines to 1 collapses runs of multiple blank lines into one, preserving paragraph breaks while removing excessive spacing.
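The difference between the two options can be sketched in Python. The sketch assumes that a line containing only whitespace counts as blank, which may not match the tool exactly; both function names are illustrative.

```python
def remove_empty_lines(text: str) -> str:
    """Drop every blank (empty or whitespace-only) line."""
    return "\n".join(line for line in text.split("\n") if line.strip())


def limit_blank_lines(text: str) -> str:
    """Keep at most one blank line between blocks of text."""
    kept: list[str] = []
    blank_run = 0
    for line in text.split("\n"):
        if line.strip():
            blank_run = 0
            kept.append(line)
        else:
            blank_run += 1
            if blank_run == 1:  # keep only the first blank line of a run
                kept.append("")
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "Introduction paragraph.\n\n\n\nSecond section.\n\n\n\n\nFinal notes."
    print(limit_blank_lines(sample))
```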

Cleaning Text from Specific Sources

From Google Docs

Google Docs writes HTML to the clipboard when you copy. The pasted result often includes non-breaking spaces used for indentation, smart quotes, and occasional invisible Unicode marks. Recommended combination: Straighten smart quotes + Collapse extra spaces + Trim each line. For structured documents, also enable Remove invisible characters.

From Microsoft Word

Word is the most aggressive smart-typography engine in common use. It converts straight quotes to curly quotes, double hyphens to em dashes, and inserts non-breaking spaces in specific positions. Pasting Word content into code, markdown, or CSV almost always requires Straighten smart quotes and Collapse extra spaces at minimum.

From PDFs

PDFs are the messiest source. Expect extra spaces between words, broken hyphenated words split across lines, and invisible characters from the PDF’s internal encoding. Best combination: Remove invisible characters + Collapse extra spaces + Trim each line. Add Remove empty lines or Limit blank lines to 1 depending on whether you want to preserve paragraph breaks.

From Excel or Google Sheets

Cells copied from spreadsheets carry tab characters between columns and line breaks within cells. Use Collapse extra spaces to normalize whitespace and Remove empty lines to clean up blank rows.

Line Endings: CRLF vs LF

Every line of text ends with one or more invisible control characters that tell applications where a line stops:

  • LF (\n, U+000A): macOS, Linux, Unix — the modern standard for most development environments.
  • CRLF (\r\n, U+000D + U+000A): Windows and DOS — the standard for Windows applications and text file exports.
  • CR (\r, U+000D): Older Mac systems (pre-OS X) — rare today.

When text moves between platforms, the \r character causes visible artifacts — the ^M symbol in vim, broken line counts in scripts — or silent errors in string processing. This tool normalizes all line endings to LF automatically before applying any other option, regardless of the source platform.
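As an illustration, line-ending normalization can be written in a few lines of Python. This sketch assumes that CRLF, lone CR, U+2028, and U+2029 are all mapped to LF; the mapping for the two Unicode separators is an assumption, since the page only says they are normalized.

```python
import re

# CRLF must come first so "\r\n" becomes one "\n", not two.
_LINE_END = re.compile("\r\n|\r|\u2028|\u2029")


def normalize_line_endings(text: str) -> str:
    """Convert CRLF, CR, and Unicode line/paragraph separators to LF."""
    return _LINE_END.sub("\n", text)


if __name__ == "__main__":
    windows_text = "first\r\nsecond\r\nthird"
    print(normalize_line_endings(windows_text).split("\n"))
```

Running this step first means every later step can safely split on "\n" alone.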

Data as Parameter

You can pre-fill the input field with the ?input= query parameter:

https://www.uprek.com/en/tools/text-cleaner?input=hello%20%20world

For private text, avoid sharing URLs that contain the content itself.
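When generating such links from a script, the text has to be percent-encoded. The Python sketch below uses the URL and the input parameter shown above; the helper name prefill_url is hypothetical.

```python
from urllib.parse import quote

# Base URL taken from the example above.
BASE_URL = "https://www.uprek.com/en/tools/text-cleaner"


def prefill_url(text: str) -> str:
    """Build a link that opens the tool with `text` in the input field."""
    # safe="" also encodes "/", "&", and "=" so they cannot break the query string.
    return f"{BASE_URL}?input={quote(text, safe='')}"


if __name__ == "__main__":
    print(prefill_url("hello  world"))
```

Because the text becomes part of the URL, it can end up in browser history and anywhere the link is shared, which is why the advice above about private text applies.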

Your Text Never Leaves Your Browser

When cleaning text that includes internal documents, customer data, API keys, or confidential communications, pasting it into a server-side tool creates a real security risk.

At UPREK, our philosophy is simple: Your data stays yours. We don’t want it, we don’t collect it, and we can’t see it.

  • 100% Local Processing: All cleaning and transformation algorithms run locally on your machine via your browser’s JavaScript engine.
  • Zero Server Uploads: Your input text is never routed through, processed by, or uploaded to our servers.
  • No Logs or Backups: We do not log, store, or back up any of the text or files you input into this tool.
  • Instant Deletion: The text you work with exists only in your browser’s active memory. Close the tab and the data is gone.
  • No Server-Side Exposure: Because we never receive or store your text, it cannot be exposed by a breach of our servers.

Real-World Use Cases

1. Cleaning Text Copied from PDFs

PDFs are notorious for introducing invisible characters, non-breaking spaces, and broken line breaks when text is copied. Enable Remove invisible characters, Collapse extra spaces, and Trim each line to quickly produce clean, portable text from any PDF extract.

2. Stripping HTML from CMS Exports

When exporting content from WordPress, Notion, or any rich-text CMS, the exported text is often littered with inline HTML tags. Use Strip HTML tags to reduce it to plain text before importing elsewhere or processing with scripts.

3. Normalizing Data Before Database Import

User-submitted text often arrives with inconsistent whitespace, smart quotes, and invisible characters. Running it through the cleaner before inserting into a database prevents encoding issues, broken queries, and search index corruption.

4. Fixing Smart Quotes in Code and Markdown

Word processors and web editors automatically replace straight quotes with typographic curly quotes. This breaks code samples, YAML files, and markdown. Use Straighten smart quotes to convert them back to ASCII-safe " and ' in a single step.

5. Deduplicating and Cleaning Lists

When aggregating data from multiple sources — search results, exported rows, crawled URLs — you often end up with duplicate or inconsistently formatted entries. Combine Remove duplicate lines, Trim each line, and Remove empty lines to produce a clean, normalized list ready for further processing.
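The combination described above can be sketched in Python as a single pass. The name clean_list is hypothetical, and the sketch assumes lines are compared after trimming, so that "a.com" and " a.com " count as duplicates.

```python
def clean_list(text: str) -> str:
    """Trim each line, drop empty lines, and keep only the first copy of each line."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line in seen:
            continue  # skip blanks and later duplicates
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    urls = "  b.example \na.example\n\nb.example\n a.example"
    print(clean_list(urls))
```

Original order is preserved, which matches the "keep the first copy" behavior described for Remove duplicate lines.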

6. Preparing Log Snippets for Documentation

Server logs and debug output often contain extra whitespace, carriage returns, and Unicode line separators that render unpredictably in documentation or tickets. The cleaner normalizes all line endings and strips noise characters, producing clean text ready to paste into Jira, Confluence, or a pull request description.

Frequently Asked Questions

Is the text I paste actually private?

Yes. Your text is processed entirely within your browser's JavaScript engine and never transmitted to any server. UPREK cannot see, access, or store the text you paste into this tool.

What are invisible characters and why do they matter?

Invisible characters are Unicode code points with no visible glyph: zero-width spaces (U+200B), BOM markers (U+FEFF), and soft hyphens (U+00AD). They are silently introduced when copying from PDFs, Word documents, and web pages. They can break string comparisons, corrupt search indexes, cause unexpected behavior in code, and inflate character counts without any visible sign.
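To check whether a string contains such characters before cleaning it, a short Python sketch can list them by position. It relies on the Unicode general category Cf (format characters), which covers the code points named above but also others, so treat it as a diagnostic aid rather than the tool's exact definition. The function name is illustrative.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each format character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    suspicious = "pay\u200bment"
    print(len(suspicious))            # one more than the visible letters
    print(find_format_chars(suspicious))
    print(suspicious == "payment")    # comparison fails despite looking identical
```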

What is the difference between "Remove empty lines" and "Limit blank lines to 1"?

Remove empty lines deletes every blank line, producing a compact block with no gaps. Limit blank lines to 1 collapses runs of two or more consecutive blank lines into a single blank line, preserving paragraph structure while removing excessive spacing. The two options are mutually exclusive — enabling one disables the other.

Does the tool change my actual text content?

Only in ways you explicitly select. Trimming whitespace, collapsing spaces, and removing invisible characters only affect whitespace and control characters — your readable words and sentences are untouched. Strip HTML tags removes markup but leaves text content intact. Straighten smart quotes changes the quote characters themselves, which is the intended behavior for code-safe output.

Can I clean very large text files?

Yes. All processing runs in your browser and is not subject to a server upload limit. Performance depends on your device, but the tool handles typical log files and document exports without issue.

How do I remove special characters from text?

"Special characters" typically means one of three things: visible punctuation and symbols (@, #, $), Unicode characters outside standard ASCII, or invisible control characters. This tool targets the third category — invisible characters that silently corrupt text. It does not strip visible punctuation or symbols, as those are usually part of the meaning of the text.

What is a BOM character and why does it cause problems?

BOM stands for Byte Order Mark (U+FEFF). It was originally used to signal the byte order of a UTF-16 file. In UTF-8 files, a BOM is unnecessary and often harmful: it triggers "invalid character" errors in JSON parsers, SQL imports, and command-line tools that do not expect it at the start of a file. BOM characters are commonly introduced by Windows Notepad and some Excel exports. Enable Remove invisible characters to strip them.
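The JSON case is easy to reproduce in Python, whose json module rejects a string that starts with U+FEFF. The sketch below shows the failure and one way around it when reading bytes yourself: decoding with the utf-8-sig codec, which drops a leading BOM if present. The helper name parses is hypothetical.

```python
import json

# A UTF-8 file that starts with a BOM (bytes EF BB BF), as some editors write it.
RAW = b'\xef\xbb\xbf{"ok": true}'


def parses(text: str) -> bool:
    """Return True if `text` is valid JSON according to json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(parses(RAW.decode("utf-8")))      # BOM kept as U+FEFF, parsing fails
    print(parses(RAW.decode("utf-8-sig")))  # BOM dropped, parsing succeeds
```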

Why does text from a PDF have strange spacing?

PDFs store characters at absolute page coordinates rather than as flowing text. When you copy, the PDF reader estimates word and line boundaries from those coordinates — a process that frequently inserts extra spaces between words, breaks hyphenated words across lines, and adds invisible encoding characters. Use Collapse extra spaces + Trim each line + Remove invisible characters together to resolve the most common PDF copy artifacts.

Changelog

v1.1.0 May 20, 2026
  • Rebuilt UI with line number sidebars, bordered panels, toolbar, and size counters
  • Added four new cleaning options: remove invisible characters, straighten smart quotes, strip HTML tags, limit consecutive blank lines to one
  • Fixed collapse-spaces to also handle non-breaking spaces (U+00A0)
v1.0.0 May 10, 2026
  • Remove extra spaces, trim each line, remove blank lines, and deduplicate lines
  • Upload a text file; copy or download cleaned output