normalize-html-whitespace
Safely remove repeating whitespace from HTML text.
Using `\s` to normalize HTML whitespace will strip out characters that are actually rendered by a web browser. That is a lossy change: the result looks different on screen. This package collapses runs of multiple whitespace characters down to a single space, while leaving the following characters untouched:

* `\u00a0` or `&nbsp;` (non-breaking space)
* `\ufeff` or `&#65279;` (zero-width non-breaking space)

…as well as these lesser-known ones:

* `\u1680` or `&#5760;` (Ogham space mark)
* `\u180e` or `&#6158;` (Mongolian vowel separator)
* `\u2000` or `&#8192;` (en quad)
* `\u2001` or `&#8193;` (em quad)
* `\u2002` or `&#8194;` (en space)
* `\u2003` or `&#8195;` (em space)
* `\u2004` or `&#8196;` (three-per-em space)
* `\u2005` or `&#8197;` (four-per-em space)
* `\u2006` or `&#8198;` (six-per-em space)
* `\u2007` or `&#8199;` (figure space)
* `\u2008` or `&#8200;` (punctuation space)
* `\u2009` or `&#8201;` (thin space)
* `\u200a` or `&#8202;` (hair space)
* `\u2028` or `&#8232;` (line separator)
* `\u2029` or `&#8233;` (paragraph separator)
* `\u202f` or `&#8239;` (narrow non-breaking space)
* `\u205f` or `&#8287;` (medium mathematical space)
* `\u3000` or `&#12288;` (ideographic space)

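One way to see why a `\s`-based normalizer is lossy is to ask JavaScript's own regular-expression engine which of the characters listed above `\s` matches. The sketch below is a standalone check, not part of the package; the array simply restates the list. U+180E is left out on purpose: Unicode 6.3 reclassified it from a space separator to a format character, so engines built on newer Unicode data (including recent versions of Node.js) may no longer match it with `\s`.

```js
// Code points from the list above (U+180E omitted; see the note before this block).
const listedCodePoints = [
  0x00a0, 0xfeff, 0x1680,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200a,
  0x2028, 0x2029, 0x202f, 0x205f, 0x3000,
];

// Format a code point the way this README writes it, e.g. 0xa0 -> "\u00a0".
const toEscape = (cp) => '\\u' + cp.toString(16).padStart(4, '0');

// Every listed character that a /\s/ pattern would match, and therefore strip.
const matchedByBackslashS = listedCodePoints
  .filter((cp) => /\s/.test(String.fromCodePoint(cp)))
  .map(toEscape);

console.log(matchedByBackslashS.length === listedCodePoints.length); // true
```

In other words, every one of these rendered characters is fair game for a `\s` pattern, which is exactly what this package avoids.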
For the sake of completeness, the following character, which is not part of `\s`, will also not be affected:

* `\u200b` or `&#8203;` (zero-width breaking space)

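The difference between the two approaches can be sketched in a few lines. Both helpers below are hypothetical and written only for this illustration; they are not the package's source. `collapseAsciiWhitespace` assumes that only the HTML spec's "ASCII whitespace" characters (space, tab, line feed, form feed, carriage return) are safe to collapse, and, in line with the description above, it only touches runs of two or more.

```js
// Hypothetical, for illustration: collapse runs of 2+ HTML "ASCII whitespace"
// characters (U+0020, U+0009, U+000A, U+000C, U+000D) into one space.
function collapseAsciiWhitespace(text) {
  return text.replace(/[ \t\n\f\r]{2,}/g, ' ');
}

// Naive version: \s also matches \u00a0, \u2009, \u3000 and the rest of the list.
function collapseWithBackslashS(text) {
  return text.replace(/\s{2,}/g, ' ');
}

// Two non-breaking spaces, a zero-width space, then a run of ordinary whitespace.
const sample = 'Total:\u00a0\u00a042\u200b \t\n EUR';

// Naive: the two non-breaking spaces become one ordinary space.
console.log(collapseWithBackslashS(sample) === 'Total: 42\u200b EUR'); // true
// ASCII-only: the non-breaking spaces and the zero-width space survive.
console.log(collapseAsciiWhitespace(sample) === 'Total:\u00a0\u00a042\u200b EUR'); // true
```

The `{2,}` quantifier leaves a lone tab or newline as it is, so only genuinely repeated whitespace is rewritten.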
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
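Because the package works on plain strings, the caller decides which strings to pass in. A minimal sketch, assuming a hypothetical parsed tree of `{ type: 'text', value }` and `{ type: 'element', name, children }` objects (substitute whatever your HTML parser produces): only `text` nodes are normalized, and the element structure is left alone. `normalize` here is a local stand-in so the example runs without installing anything; in real code it would be the function exported by `normalize-html-whitespace`.

```js
// Local stand-in for require('normalize-html-whitespace'), so this runs on its own.
// Assumption: collapse runs of 2+ ASCII whitespace characters into one space.
const normalize = (text) => text.replace(/[ \t\n\f\r]{2,}/g, ' ');

// Return a copy of the tree with every text node's value normalized.
function normalizeTextNodes(node) {
  if (node.type === 'text') {
    return Object.assign({}, node, { value: normalize(node.value) });
  }
  if (node.type === 'element') {
    return Object.assign({}, node, { children: node.children.map(normalizeTextNodes) });
  }
  return node; // Other node types (comments, doctypes, ...) pass through unchanged.
}

const tree = {
  type: 'element',
  name: 'p',
  children: [
    { type: 'text', value: '  Hello,\n\n   ' },
    { type: 'element', name: 'b', children: [{ type: 'text', value: 'world\u00a0\u00a0!' }] },
  ],
};

const result = normalizeTextNodes(tree);
console.log(JSON.stringify(result.children[0].value)); // " Hello, "
```

In a browser, the same idea applies to the `Text` nodes visited by `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)`, updating each node's `nodeValue`.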
Installation
Node.js >= 8 is required. Type this at the command line:
```shell
npm install normalize-html-whitespace
```
Usage
```js
const normalizeWhitespace = require('normalize-html-whitespace');

normalizeWhitespace('   foo  \n\n  bar  \t  baz   ');
//-> ' foo bar baz '
```