[html] Soft hyphen in HTML (<wbr> vs. &shy;)

How do you handle soft hyphens on your web pages? A text can contain long words that you might want to break across lines with a hyphen, but you do not want the hyphen to show if the whole word fits on one line.

According to comments on this page, <wbr> is non-standard "tag soup invented by Netscape". It seems that &shy; has its own problems with standards compliance as well. There seems to be no solution that works in all browsers.

What is your way of handling soft hyphens, and why did you choose it? Is there a preferred solution or best practice?


See related SO Discussion here.


Web browsers sometimes seem more forgiving if you use the numeric character reference &#173; rather than the named &shy; entity.
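Both references decode to the same character, U+00AD, so a build step could normalise one form to the other. Here is a minimal sketch, assuming the markup is available as a string; shyToNumeric is a hypothetical helper name, and the plain text replacement does not try to skip script or pre content:

```javascript
// Hypothetical helper: rewrite the named entity as the numeric reference.
// Both decode to U+00AD (decimal 173), so rendering should be unchanged.
function shyToNumeric(html) {
  return html.replace(/&shy;/g, "&#173;");
}

const SOFT_HYPHEN = "\u00AD";
console.log(SOFT_HYPHEN.codePointAt(0)); // 173
console.log(shyToNumeric("inneh&aring;lls&shy;f&ouml;rteckning"));
```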


It is very important to notice that, as of HTML5, <wbr> and &shy; are not supposed to do the same thing!

Soft hyphens

&shy; is a soft hyphen, i.e., U+00AD: SOFT HYPHEN. For example,

innehålls&shy;förteckning

might be rendered as

innehållsförteckning

or as

innehålls-
förteckning

As of today, soft hyphens work in Firefox, Chrome, and Internet Explorer.

The wbr element

The wbr element is a word-break opportunity, which will not display a hyphen if a line break occurs. For example,

ABCDEFG<wbr/>abcdefg

might be rendered as

ABCDEFGabcdefg

or as

ABCDEFG
abcdefg

As of today, this element works in Firefox and Chrome.


The zero-width space character reference can be used in place of the <wbr> tag reliably on virtually every platform.

&#8203;

Also useful is the word joiner character reference, which can be used to prohibit a break. (Insert it between every pair of characters in a word, except where you want the break.)

&#8288;

With the two of these, you can do anything.
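The combination described above could be sketched as follows. This is only an illustration: controlBreaks and its breakAfter parameter are hypothetical names, and the characters used are U+200B (&#8203;) and U+2060 (&#8288;).

```javascript
const ZWSP = "\u200B";        // zero-width space: allows a break
const WORD_JOINER = "\u2060"; // word joiner: prohibits a break

// Hypothetical helper: `breakAfter` lists the character counts after which
// a break is allowed; every other boundary gets a word joiner.
function controlBreaks(word, breakAfter) {
  const allowed = new Set(breakAfter);
  const chars = Array.from(word); // split by code point, not UTF-16 unit
  return chars
    .map((ch, i) => {
      if (i === chars.length - 1) return ch; // nothing after the last char
      return ch + (allowed.has(i + 1) ? ZWSP : WORD_JOINER);
    })
    .join("");
}

// Allow a break only between "ABCDEFG" and "abcdefg".
const marked = controlBreaks("ABCDEFGabcdefg", [7]);
```

The result is meant to be assigned as text (for example via textContent), not parsed as markup.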


I use &shy;, inserted manually where necessary.

I always find it a pity when people avoid a technique because some browser, perhaps an old or strange one, doesn’t handle it the way it was specified. I have found that &shy; works properly in recent versions of both Internet Explorer and Firefox, and that should be enough. You could include a browser check that tells people with an unusual browser to switch to something mature or continue at their own risk.

Syllabification isn’t that easy, and I cannot recommend leaving it to JavaScript. It is a language-specific topic, and the result may need careful revision by a desk editor if you don’t want it to irritate your readers. Some languages, such as German, form compound words that are prone to decomposition problems. For example, Spargelder (German: saved money, plural) may, by syllabification rules, be wrapped in two places (Spar-gel-der). However, wrapping it at the second position makes the first part appear as Spargel- (German: asparagus), which activates a completely misleading concept in the reader’s head and should therefore be avoided.

And what about the string Wachstube? It could mean either ‘guardroom’ (Wach-stu-be) or ‘tube of wax’ (Wachs-tu-be). You can probably find similar examples in other languages. You should aim to provide an environment that supports the desk editor in creating well-syllabified text and proofreading every critical word.
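One way to support that kind of manual review is a list of editor-approved break points that is applied to the text before publishing. The sketch below is only an assumption about how such a tool might look; approvedBreaks and applyApprovedBreaks are hypothetical names. A word like Wachstube, whose correct breaks depend on its meaning, cannot be handled by a plain list and still needs a decision per occurrence.

```javascript
const SOFT_HYPHEN = "\u00AD";

// Hypothetical editor-maintained list: "|" marks the approved break points.
const approvedBreaks = {
  Spargelder: "Spar|gelder", // never "Spargel-der" (reads as asparagus)
};

// Replace each listed word with its approved soft-hyphenated form.
// Words that are not in the list are left untouched.
function applyApprovedBreaks(text, overrides) {
  return text.replace(/\p{L}+/gu, (word) =>
    Object.prototype.hasOwnProperty.call(overrides, word)
      ? overrides[word].split("|").join(SOFT_HYPHEN)
      : word
  );
}

const result = applyApprovedBreaks("Die Spargelder sind weg.", approvedBreaks);
```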


I suggest using wbr, so the code can be written like this:

<p>???????,???<wbr
>??;?????</p>

Because the source line break sits inside the tag, it does not introduce a space between characters, whereas &shy; does not prevent the spaces created by source line breaks.



Feb 2015 summary (partially updated Nov 2017)

They all perform pretty well; &#173; edges it because Google can still index words containing it.

  • In browsers: &shy; and &#173; both display as expected in major browsers (even old IE!). <wbr> isn't supported in recent versions of IE (10 or 11) and doesn't work properly in Edge.
  • When copied and pasted from browsers (tested 2015): works as expected for &shy; and &#173; in Chrome and Firefox on Mac. On Windows (10), it keeps the characters, pasting hard hyphens into Notepad and invisible soft hyphens into applications that support them. IE (Win7) always pastes with hyphens, even in IE10, and Safari (Mac) copies in a way that pastes as hyphens in some applications (e.g. MS Word) but not others.
  • Find on page works for &shy; and &#173; in all browsers except IE, which only matches exact copied-and-pasted matches (even up to IE11).
  • Search engines: Google matches words containing &#173; with words typed normally. As of 2017 it appears to no longer match words containing &shy;. Yandex appears to be the same. Bing and Baidu seem to match neither.

Test it

For up-to-date live testing, here are some examples of unique words with soft hyphens.

  • &shy; - confumbabbl&shy;ication&shy;ism - confumbabbl­ication­ism
    • ..............................................................................................................confumbabbl­ication­ism
    • ..................................................................................................................confumbabbl­ication­ism

  • <wbr> - donfounbabbl<wbr>ication<wbr>ism. This site removes <wbr/> from output. Here's a jsbin.com snippet for testing.

  • &#173; - eonfulbabbl&#173;ication&#173;ism - eonfulbabbl­ication­ism
    • .................................................................................................................eonfulbabbl­ication­ism
    • ....................................................................................................................eonfulbabbl­ication­ism

Here they are with no shy hyphens (this is for copying and pasting into find-on-page testing; written in a way which won't break the search engine tests):

ZZZconfumbabblicationismZZZdonfounbabblicationismZZZeonfulbabblicationismZZZ

Display across browsers

Success: displaying as a normal word, except where it should break, when it breaks and hyphenates in the specified place.

Failure: displaying unusually, or failing to break in the intended place.

  • Chrome (40.0.2214.115, Mac): &shy; success, <wbr> success, &#173; success
  • Firefox (35.0.1, Mac): &shy; success, <wbr> success, &#173; success
  • Safari (6.1.2, Mac): &shy; success, <wbr> not tested yet, &#173; success
  • Edge (Windows 10): &shy; success, <wbr> fail (break but no hyphen), &#173; success
  • IE11 (Windows 10): &shy; success, <wbr> fail (no break), &#173; success
  • IE10 (Windows 10): &shy; success, <wbr> fail (no break), &#173; success
  • IE8 (Windows 7): erratic - sometimes, none of them work at all and they all just follow css word-wrap. Sometimes, they seem to all work. Not yet found any clear pattern as to why.
  • IE7 (Windows 7): &shy; success, <wbr> success, &#173; success

Copy-paste across browsers

Success: copying and pasting the whole word, unhyphenated. (tested on Mac pasting into browser search, MS Word 2011, and Sublime Text)

Failure: pasting with a hyphen, space, line break, or with junk characters.

  • Chrome (40.0.2214.115, Mac): &shy; success, <wbr> success, &#173; success
  • Firefox (35.0.1, Mac): &shy; success, <wbr> success, &#173; success
  • Safari (6.1.2, Mac): &shy; fail into MS Word (pastes all as hyphens), success in other applications; <wbr> fail; &#173; fail into MS Word (pastes all as hyphens), success in other applications
  • IE10 (Win7): &shy; fail (pastes all as hyphens), <wbr> fail, &#173; fail (pastes all as hyphens)
  • IE8 (Win7): &shy; fail (pastes all as hyphens), <wbr> fail, &#173; fail (pastes all as hyphens)
  • IE7 (Win7): &shy; fail (pastes all as hyphens), <wbr> fail, &#173; fail (pastes all as hyphens)
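Where pasted hyphens are a problem, one possible mitigation is to strip soft hyphens from the copied text. The sketch below is untested against the browsers listed above and assumes a browser that implements the standard copy event with clipboardData (old IE versions do not); stripSoftHyphens is a hypothetical helper name.

```javascript
// Remove soft hyphens (U+00AD) so pasted text contains the plain word.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

// Browser-only wiring; skipped when no DOM is present (e.g. under Node).
if (typeof document !== "undefined") {
  document.addEventListener("copy", (event) => {
    const selected = String(document.getSelection());
    event.clipboardData.setData("text/plain", stripSoftHyphens(selected));
    event.preventDefault(); // use the cleaned text instead of the default
  });
}
```

Note that this replaces the clipboard content with plain text only, so any rich formatting in the selection is dropped.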

Search engine matching

Updated in November 2017. <wbr> not tested because StackOverflow's CMS stripped it out.

Success: searches on the whole, non-hyphenated word find this page.

Failure: search engines only find this page on searches for the broken segments of the words, or a word with hyphens.

  • Google: &shy; fails, &#173; succeeds
  • Bing: &shy; fails, &#173; fails
  • Baidu: &shy; fails, &#173; fails (can match fragments within longer strings but not the words on their own containing a &#173; or &shy;)
  • Yandex: &shy; fails, &#173; succeeds (though it's possible it's matching a string fragment like Baidu, not 100% sure)

Find on page across browsers

Success and failure are defined as for search engine matching.

  • Chrome (40.0.2214.115, Mac): &shy; success, <wbr> success, &#173; success
  • Firefox (35.0.1, Mac): &shy; success, <wbr> success, &#173; success
  • Safari (6.1.2, Mac): &shy; success, <wbr> success, &#173; success
  • IE10 (Win7): &shy; fail (only matches when both contain shy hyphens), <wbr> success, &#173; fail (only matches when both contain shy hyphens)
  • IE8 (Win7): &shy; fail (only matches when both contain shy hyphens), <wbr> success, &#173; fail (only matches when both contain shy hyphens)
  • IE7 (Win7): &shy; fail (only matches when both contain shy hyphens), <wbr> success, &#173; fail (only matches when both contain shy hyphens)

There is an ongoing effort to standardize hyphenation in CSS3.

Some modern browsers, notably Safari and Firefox, already support this. Here is a good and up to date reference on browser support.

Once the CSS hyphenation gets implemented universally, that would be the best solution. In the meantime, I can recommend Hyphenator - a JS script that figures out how to hyphenate your text in the way most appropriate for a particular browser.

Hyphenator:

  • relies on Franklin M. Liang's hyphenation algorithm, commonly known from LaTeX and OpenOffice,
  • uses CSS3 hyphenation where it is available,
  • automatically inserts &shy; on most other browsers,
  • supports multiple languages,
  • is highly configurable,
  • gracefully falls back in case JavaScript is not enabled.

I've used it and it works great!
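To decide at run time whether a script-based fallback is needed at all, a page can ask the browser whether it recognises the CSS hyphens property. This is a minimal sketch, not Hyphenator's own detection logic: supportsCssHyphens is a hypothetical name, and the css parameter exists only so the function can be exercised outside a browser. CSS.supports is the standard browser API.

```javascript
// Returns true when the given CSS interface reports support for
// `hyphens: auto`. `css` defaults to the global CSS object in browsers.
function supportsCssHyphens(css = globalThis.CSS) {
  return Boolean(
    css &&
      typeof css.supports === "function" &&
      (css.supports("hyphens", "auto") ||
        css.supports("-webkit-hyphens", "auto"))
  );
}

console.log(supportsCssHyphens()); // false where no CSS object exists
```

A positive result only shows that the property is recognised. Automatic hyphenation also depends on the element's language being declared (for example with the lang attribute) and on the browser having hyphenation rules for that language.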



<wbr> and &shy;

Today you can use both.

Use <wbr> to allow a line break without inserting any visible character.

For example, use it when displaying links:

 https://stackoverflow.com/questions/226464/soft-hyphen-in-html-wbr-vs-shy

Use &shy; where a hyphen is needed: the text may break at that point, and a hyphen is added if it does.

Example:

"É im&shy;pos&shy;sí&shy;vel pa&shy;ra um ho&shy;mem a&shy;pren&shy;der a&shy;qui&shy;lo que ele acha que já sa&shy;be."

div {
  max-width: 130px;
  border-width: 2px;
  border-style: dashed;
  border-color: #f00;
  padding: 10px;
}

<div>https://<wbr>stackoverflow.com<wbr>/questions/226464<wbr>/soft-hyphen-in-<wbr>html-wbr-vs-shy</div>

<div>É im&shy;pos&shy;sí&shy;vel pa&shy;ra um ho&shy;mem a&shy;pren&shy;der a&shy;qui&shy;lo que ele acha que já sa&shy;be.</div>
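Placing <wbr> in links by hand gets tedious, so the markup could be generated instead. The sketch below is one possible approach with hypothetical helper names (escapeHtml, urlWithBreaks); unlike the hand-written example, it allows a break before every slash in the path.

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: allow a break after "://" and before every later "/".
function urlWithBreaks(url) {
  const schemeEnd = url.indexOf("://");
  const head = schemeEnd === -1 ? "" : url.slice(0, schemeEnd + 3);
  const rest = schemeEnd === -1 ? url : url.slice(schemeEnd + 3);
  const brokenRest = escapeHtml(rest).replace(/\//g, "<wbr>/");
  return escapeHtml(head) + (head ? "<wbr>" : "") + brokenRest;
}

const html = urlWithBreaks("https://stackoverflow.com/questions/226464");
```

The returned string is markup for the visible link text, so it belongs inside the element (for example via innerHTML), not in the href attribute.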



I have used the soft hyphen Unicode character successfully in a few desktop and mobile browsers to solve this issue.

The Unicode character is \u00AD, and it is easy to insert into a Python Unicode string, like s = u'????? ? ?????? ???????\u00AD??\u00AD??\u00AD??\u00AD???'.

Another solution is to insert the Unicode character itself; the source string will then look perfectly ordinary in editors like Sublime Text, Kate, Geany, etc. (though the cursor will still step over the invisible character).

Hex editors or in-house tools can automate this task easily.

An easy kludge is to use a rare, visible character such as ¦, which is easy to copy and paste, and replace it with a soft hyphen using, e.g., a frontend script in $(document).ready(...). Source code like s = u'????? ? ?????? ???¦???¦?¦??¦??¦??¦???'.replace('¦', u'\u00AD') is easier to read than s = u'????? ? ?????? ???\u00AD?\u00AD??\u00AD?\u00AD??\u00AD??\u00AD??\u00AD???'.
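A frontend version of that kludge might look like the sketch below. It uses the plain DOMContentLoaded event rather than $(document).ready(...), and placeholdersToSoftHyphens is a hypothetical helper name. It assumes ¦ never appears in the page as a legitimate character.

```javascript
const SOFT_HYPHEN = "\u00AD";

// Swap the visible placeholder for the invisible soft hyphen.
// The default placeholder "\u00A6" is the broken bar character.
function placeholdersToSoftHyphens(text, placeholder = "\u00A6") {
  return text.split(placeholder).join(SOFT_HYPHEN);
}

// Browser-only: convert every text node once the DOM is ready.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
    while (walker.nextNode()) {
      const node = walker.currentNode;
      node.nodeValue = placeholdersToSoftHyphens(node.nodeValue);
    }
  });
}
```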



If you are unlucky enough to still have to use JSF 1, the only solution is to use &#173;; &shy; does not work.



This is a cross-browser solution I was looking at a little while ago; it runs on the client and uses jQuery:

(function($) { 
  $.fn.breakWords = function() { 
    this.each(function() { 
      if(this.nodeType !== 1) { return; } 

      if(this.currentStyle && typeof this.currentStyle.wordBreak === 'string') { 
        //Lazy Function Definition Pattern, Peter's Blog 
        //From http://peter.michaux.ca/article/3556 
        this.runtimeStyle.wordBreak = 'break-all'; 
      } 
      else if(document.createTreeWalker) { 

        //Faster Trim in Javascript, Flagrant Badassery 
        //http://blog.stevenlevithan.com/archives/faster-trim-javascript 

        var trim = function(str) { 
          str = str.replace(/^\s\s*/, ''); 
          var ws = /\s/, 
          i = str.length; 
          while (ws.test(str.charAt(--i))); 
          return str.slice(0, i + 1); 
        }; 

        //Lazy Function Definition Pattern, Peter's Blog 
        //From http://peter.michaux.ca/article/3556 

        //For Opera, Safari, and Firefox 
        var dWalker = document.createTreeWalker(this, NodeFilter.SHOW_TEXT, null, false); 
        var node, s, c = String.fromCharCode(8203); // U+200B ZERO WIDTH SPACE
        while (dWalker.nextNode()) { 
          node = dWalker.currentNode; 
          //we need to trim the string, otherwise Firefox will display 
          //incorrect text-indent with space characters 
          s = trim( node.nodeValue ).split('').join(c); 
          node.nodeValue = s; 
        } 
      } 
    }); 

    return this; 
  }; 
})(jQuery); 
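The core of the text-node branch, separated from jQuery and the DOM, could be sketched like this. breakAnywhere is a hypothetical name, and String.prototype.trim stands in for the hand-written trim function, which is an assumption that the target browsers provide it.

```javascript
const ZWSP = "\u200B"; // same character as String.fromCharCode(8203)

// Allow a break between any two characters of the trimmed string.
// Array.from splits by code point, so surrogate pairs (e.g. emoji) stay
// intact, which split('') would not guarantee.
function breakAnywhere(text) {
  return Array.from(text.trim()).join(ZWSP);
}

const spaced = breakAnywhere("  ABCDEFGabcdefg ");
```

Because a zero-width space ends up between every pair of characters, find-on-page and copy-paste will see those extra characters too, so this fits best where the text only needs to be displayed.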

