How to display raw html code in PRE or something like it but without escaping it

Question

I'd like to display raw HTML. We all know one has to escape each < and > like this:

    <PRE> this is a test &lt;DIV&gt; </PRE>

However, I do not want to do this. I'd like a way to keep the HTML code as is (since it is easier to read inside the editor, and I might want to copy it and use it again myself as actual HTML code, and do not want to have to change it again or have 2 versions of the same code, one escaped and one not escaped).

Is there any other environment that is more "raw" than PRE that might allow this? So one does not have to keep editing HTML and changing everything each time they want to show some raw HTML code, maybe in HTML5?

Something like:

    <REALLY_REALLY_VERBATIM>
       ...
       <
    </REALLY_REALLY_VERBATIM>

The JavaScript solution does not work on FF 21 (screenshot omitted).

The first solution still does not work on Firefox (screenshot omitted).

User · Answer

    echo '<pre>' . htmlspecialchars('<div><b>raw HTML</b></div>') . '</pre>';

I think that's what you're looking for? In other words, use htmlspecialchars() in PHP.
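For a page without PHP, the same idea can be sketched in JavaScript. This is only an illustration: escapeHtml is a hypothetical helper name, not a built-in function, and it escapes the five characters that are usually treated as special in HTML text and attribute values.

```javascript
// Hypothetical helper (not a built-in): escape characters that are special in HTML.
function escapeHtml(raw) {
  return raw
    .replace(/&/g, "&amp;")   // must run first, so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

const snippet = "<div><b>raw HTML</b></div>";
console.log("<pre>" + escapeHtml(snippet) + "</pre>");
// prints: <pre>&lt;div&gt;&lt;b&gt;raw HTML&lt;/b&gt;&lt;/div&gt;</pre>
```

Note that this still produces an escaped copy of the snippet; it only moves the escaping out of the hand-edited source.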

User · Answer

Cheap and cheerful answer:

    <textarea>Some raw content</textarea>

The textarea will handle tabs, multiple spaces, newlines and line wrapping, all verbatim. It copies and pastes nicely, and it's valid HTML all the way. It also allows the user to resize the code box. You don't need any CSS, JS, escaping or encoding.

You can alter the appearance and behaviour as well. Here's a monospace font, editing disabled, smaller font, no border:

    <textarea
        style="width:100%; font-family: Monospace; font-size:10px; border:0;"
        rows="30" disabled
    >Some raw content</textarea>

This solution is probably not semantically correct. So if you need that, it might be best to choose a more sophisticated answer.

User · Answer

You can use the xmp element, see "What was the <XMP> tag used for?". It has been in HTML since the beginning and is supported by all browsers. Specifications frown upon it, but HTML5 CR still describes it and requires browsers to support it (though it also tells authors not to use it, but it cannot really prevent you).

Everything inside xmp is taken as such; no markup (tags or character references) is recognized there, except, for apparent reason, the end tag of the element itself, </xmp>.

Otherwise xmp is rendered like pre.

When using "real XHTML", i.e. XHTML served with an XML media type (which is rare), the special parsing rules do not apply, so xmp is treated like pre. But in "real XHTML", you can use a CDATA section, which implies similar parsing rules. It has no special formatting, so you would probably want to wrap it inside a pre element:

    <pre><![CDATA[
    This is a demo, tags like <p> will
    appear literally.
    ]]></pre>

I don't see how you could combine xmp and a CDATA section to achieve so-called polyglot markup.
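Since the end tag is the only thing xmp recognizes, one can check up front whether a snippet can be pasted in untouched. The following is a minimal sketch, assuming the end tag is detected the way HTML5 describes for raw text elements ("</" plus the tag name, followed by whitespace, ">" or "/", case-insensitively). The names XMP_CLOSER and fitsInXmpVerbatim are made up for this illustration.

```javascript
// Matches the one sequence xmp cannot hold verbatim: "</xmp" followed by
// tab, LF, FF, CR, space, ">" or "/" (case-insensitive).
const XMP_CLOSER = /<\/xmp[\t\n\f\r \/>]/i;

// Hypothetical helper: true when the snippet can go inside <xmp> without any edits.
function fitsInXmpVerbatim(snippet) {
  return !XMP_CLOSER.test(snippet);
}

console.log(fitsInXmpVerbatim("<div>Tom &amp; Jerry</div>")); // true
console.log(fitsInXmpVerbatim("<xmp>nested</XMP >"));         // false
```

A snippet that fails this check would end the xmp element early, so it cannot be embedded as-is.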

User · Answer

xmp is the way to go, i.e.:

    <xmp>
      your code
    </xmp>

User · Answer

Essentially the original question can be broken down into 2 parts:

1. Main objective/challenge: embedding (transporting) a raw formatted code-snippet (any kind of code) in a web-page's markup (for simple copy/paste/edit, because nothing needs encoding/escaping).
2. Correctly displaying/rendering that code-snippet (and possibly editing it) in the browser.

The short (but ambiguous) answer is: you can't, ...but you can (get very close). (I know, those are 3 contradicting answers, so read on...)

(Polyglot) (X)(HT)ML markup languages rely on wrapping (almost) everything between begin/opening and end/closing tags or character sequences. So, to embed any kind of raw code-snippet inside your markup language, one will always have to escape/encode every instance (inside that snippet) that resembles the character sequence that would close the wrapping "container" element in the markup. (During this post I'll refer to this as rule no. 1.)

Think of "some "data" here" or <i>..close italics with '</i>'-tag</i>, where it is obvious one should escape/encode something in </i and " (or change the container's quote character from " to ').

So, because of rule no. 1, you can't "just" embed "any" unknown raw code-snippet inside markup. If one has to escape/encode even one character inside the raw snippet, then that snippet is no longer the same original "pure raw code" that anyone can copy/paste/edit in the document's markup without further thought. It would lead to malformed/illegal markup and mojibake (mainly) because of entities. Also, should that snippet contain such characters, you'd still need some JavaScript to "translate" that character sequence from (and to) its escaped/encoded representation to display the snippet correctly in the web page (for copy/paste/edit).

That brings us to (some of) the datatypes that markup languages specify. These datatypes essentially define what are considered "valid characters" and their meaning (per tag, property, etc.):
- PCDATA (Parsed Character DATA): will expand entities, and one must escape <, & (and > depending on markup language/version). Most tags like body, div, pre, etc., but also textarea (until HTML5), fall under this type. So not only do you need to encode all the container's closing character sequences inside the snippet, you also have to encode all <, & (and >) characters (at minimum). Needless to say, encoding/escaping this many characters falls outside this objective's scope of embedding a raw snippet in the markup.

  "..But a textarea seems to work...": yes, either because of the browser's error handling trying to make something out of it, or because of HTML5:

- RCDATA (Replaceable Character DATA): will not treat tags inside the text as markup (but is still governed by rule 1), so one doesn't need to encode < (>). BUT entities are still expanded, so they and "ambiguous ampersands" (&) need special care. The current HTML5 spec says the textarea is now an RCDATA field and (quote):

      The text in raw text and RCDATA elements must not contain any
      occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS)
      followed by characters that case-insensitively match the tag name of
      the element followed by one of U+0009 CHARACTER TABULATION (tab),
      U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
      (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).

  Thus, no matter what, textarea needs a hefty entity translation handler or it will eventually mojibake on entities!

- CDATA (Character Data): will not treat tags inside the text as markup and will not expand entities. So as long as the raw snippet code does not violate rule 1 (that one can't have the container's closing character sequence inside the snippet), this requires no other escaping/encoding.

Clearly this boils down to: how can we minimize the number of characters/character sequences that still need to be encoded in the snippet's raw source, and the number of times such a sequence might appear in an average snippet? That is also of importance for the JavaScript that handles the translation of these characters (if they occur).

So what "containers" have this CDATA context?

Most value properties of tags are CDATA, so one could (ab)use a hidden input's value property (there was a proof of concept on jsfiddle). However (conforming to rule 1) this creates an encoding/escaping problem with nested quotes (" and ') in the raw snippet, and one needs some JavaScript to get/translate the snippet and set it in another (visible) element (or simply set it as a textarea's value). Somehow this gave me problems with entities in FF (just like in a textarea). But it doesn't really matter, since the "price" of having to escape/encode nested quotes is higher than with an (HTML5) textarea (quotes are quite common in source code).

What about trying to (ab)use <![CDATA[<tag>bla & bla</tag>]]>? As Jukka points out in his extended answer, this would only work in (rare) "real XHTML".

I thought of using a script tag (with or without such a CDATA wrapper inside the script tag) together with a multi-line comment /* */ that wraps the raw snippet (script tags can have an id and you can access them by count). But since this obviously introduces an escaping problem with */, ]]> and </script in the raw snippet, this doesn't seem like a solution either.

Please post other viable "containers" in the comments to this answer.

By the way, encoding or counting the number of - characters and balancing them out inside a comment tag <!-- --> is just insane for this purpose (apart from rule 1).

That leaves us with Jukka K. Korpela's excellent answer: the <xmp> tag seems the best option!

The "forgotten" <xmp> holds CDATA, is intended for this purpose AND is indeed still in the current HTML 5 spec (and has been at least since HTML 3.2); exactly what we need! It's also widely supported, even in IE6 (that is... until it suffers from the same regression as the scrolling table-body). Note: as Jukka pointed out, this will not work in true XHTML or polyglot markup (that will treat it as a pre), and the xmp tag must still adhere to rule no. 1. But that's the "only" rule.

Consider the following markup:

    <!-- ATTENTION: replace any occurrence of &lt;/xmp with </xmp -->
    <xmp id="snippet-container">
    <div>
        <div>this is an example div &amp; holds an xmp tag:<br />
            <xmp>
    <html><head>  <!-- indentation col 0!! -->
        <title>My Title</title>
    </head><body>
        <p>hello world !!</p>
    </body></html>
            &lt;/xmp>  <!-- note this encoded/escaped tag -->
        </div>
        This line is also part of the snippet
    </div>
    </xmp>

The above code block illustrates a raw piece of markup where <xmp id="snippet-container"> contains an (almost raw) code-snippet (containing div > div > xmp > html-document). Notice the encoded closing tag in this markup? To comply with rule no. 1, it was encoded/escaped.

So embedding/transporting the (sometimes almost) raw code is (or seems) solved.

What about displaying/rendering the snippet (and that encoded &lt;/xmp>)?

The browser will (or should) render the snippet (the contents inside snippet-container) exactly the way you see it in the code block above (with some discrepancy amongst browsers as to whether or not the snippet starts with a blank line). That includes the formatting/indentation, entities (like the string &amp;), full tags, comments AND the encoded closing tag &lt;/xmp> (just like it was encoded in the markup). And depending on the browser (version), one could even try to use the property contenteditable="true" to edit this snippet (all that without JavaScript enabled). Doing something like textarea.value=xmp.innerHTML is also a breeze.

So you can... if the snippet doesn't contain the container's closing character sequence.

However, should a raw snippet contain the closing character sequence </xmp (because it is an example of xmp itself, or it contains some regex, etc.), you must accept that you have to encode/escape that sequence in the raw snippet AND need a JavaScript handler to translate that encoding, so that the encoded &lt;/xmp> is displayed/rendered like </xmp> inside a textarea (for editing/posting) or (for example) a pre, just to correctly render the snippet's code (or so it seems).

There was a very rudimentary jsfiddle example of this. Note that getting/embedding/displaying/retrieving-to-textarea worked perfectly even in IE6. But setting the xmp's innerHTML revealed some interesting "would-be-intelligent" behavior on IE's part. There is a more extensive note and workaround on that in the fiddle.

But now comes the important kicker (another reason why you only get very close). Just as an over-simplified example, imagine this rabbit-hole.

Intended raw code-snippet:

    <!-- remember to translate between </xmp> and &lt;/xmp> -->
    <xmp>
    <p>a paragraph</p>
    </xmp>

Well, to comply with rule 1, we "only" need to encode those </xmp[> \n\r\t\f\/] sequences, right?

So that gives us the following markup (using just one possible encoding):

    <xmp id="container">
    <!-- remember to translate between &lt;/xmp> and &lt;/xmp> -->
    <xmp>
    <p>a paragraph</p>
    &lt;/xmp>
    </xmp>

Hmm... shalt I get my crystal ball or flip a coin? No, let the computer look at its system clock and state that a derived number is "random". Yes, that should do it...

Using a regex like:

    xmp.innerHTML.replace(/&lt;(?=\/xmp[> \n\r\t\f\/])/gi, '<');

would translate "back" to this:

    <!-- remember to translate between </xmp> and </xmp> -->
    <xmp>
    <p>a paragraph</p>
    </xmp>

Hmm... seems this random generator is broken... Houston...? Should you have missed the joke/problem, read again starting at the "intended raw code-snippet".

Wait, I know, we (also) need to encode ... to ...! OK, rewind to "intended raw code-snippet" and read again.

Somehow this all begins to smell like the famous hilarious-but-true regex answer on SO, a good read for people fluent in mojibake.

Maybe someone knows a clever algorithm or solution to fix this problem, but I assume that the embedded raw code will get more and more obscure, to the point where you'd be better off properly escaping/encoding just your <, & (and >), just like the rest of the world.

Conclusion (using the xmp tag):

- it can be done with known snippets that do not contain the container's closing character sequence;
- we can get very close to the original objective with known snippets that only use "basic first-level" escaping/encoding, so we don't fall into the rabbit-hole;
- but ultimately it seems that one can't do this reliably in a "production environment" where people can/should copy/paste/edit "any unknown" raw snippets while not knowing/understanding the implications/rules/rabbit-hole (depending on your implementation of handling/translating for rule 1 and the rabbit-hole).

Hope this helps!

PS: Whilst I would appreciate an upvote if you find this explanation useful, I kind of think Jukka's answer should be the accepted answer (should no better option/answer come along), since he was the one who remembered the xmp tag (which I forgot about over the years, getting "distracted" by the commonly advocated PCDATA elements like pre, textarea, etc.). This answer originated in explaining why you can't do it (with any unknown raw snippet) and in explaining some obvious pitfalls that some other (now deleted) answers overlooked when advising a textarea for embedding/transport. I've expanded my existing explanation to also support and further explain Jukka's answer (since all that entity and *CDATA stuff is almost harder than code-pages).
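The rabbit-hole described above can be reproduced with a few lines of plain JavaScript. This is only a sketch: naiveEncode and naiveDecode are hypothetical helpers written for this illustration (they work on strings, not on the DOM), and the encoding shown is just the "first-level" one discussed in the answer.

```javascript
// Hypothetical first-level encoding for rule no. 1:
// hide every closing sequence "</xmp" + delimiter from the HTML parser.
function naiveEncode(snippet) {
  return snippet.replace(/<(\/xmp[\t\n\f\r \/>])/gi, "&lt;$1");
}

// Hypothetical matching decoder: turn every "&lt;/xmp" + delimiter back into "</xmp".
function naiveDecode(markup) {
  return markup.replace(/&lt;(\/xmp[\t\n\f\r \/>])/gi, "<$1");
}

const plain = "<xmp><p>a paragraph</p></xmp>";
const documented = "<!-- translate between </xmp> and &lt;/xmp> -->";

console.log(naiveDecode(naiveEncode(plain)) === plain);           // true
console.log(naiveDecode(naiveEncode(documented)) === documented); // false
console.log(naiveDecode(naiveEncode(documented)));
// prints: <!-- translate between </xmp> and </xmp> -->
```

The second snippet already contained the escaped form, so after encoding both occurrences look identical and the decoder cannot tell which one was raw. Escaping the escape character itself (for example encoding & as well) would make the round trip unambiguous, at the cost of a snippet that is no longer raw.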

User · Answer

GitaarLAB and Jukka elaborate that the <xmp> tag is obsolete, but still the best. When I use it like this:

    <xmp>
    <div>Lorem ipsum</div>
    <p>Hello</p>
    </xmp>

then the first EOL is inserted in the code, and it looks awful.

It can be solved by removing that EOL:

    <xmp><div>Lorem ipsum</div>
    <p>Hello</p>
    </xmp>

but then it looks bad in the source. I used to solve it with a wrapping <div>, but recently I figured out a nice CSS3 rule; I hope it also helps somebody:

    xmp { margin: 5px 0; padding: 0 5px 5px 5px; background: #CCC; }
    xmp:before { content: ""; display: block; height: 1em; margin: 0 -5px -2em -5px; }

This looks better.

User · Answer

If you have jQuery enabled you can use an escapeXml function and not have to worry about escaping arrows or special characters.

    <pre>
      ${fn:escapeXml('
        <!-- all your code -->
      ')};
    </pre>
