How to remove HTML special chars?

Question

I am creating an RSS feed file for my application, in which I want to remove HTML tags. That part is done by strip_tags. But strip_tags does not remove HTML special code chars such as:

    &nbsp; &amp; &copy;

etc. Please tell me any function I can use to remove these special code chars from my string.

User · Answer

Try this:

    <?php
    $str = "\x8F!!!";

    // Outputs an empty string
    echo htmlentities($str, ENT_QUOTES, "UTF-8");

    // Outputs "!!!"
    echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
    ?>

User · Answer

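    <?php
    // Strips only the listed tags; with $stripContent = true the enclosed content goes too.
    function strip_only($str, $tags, $stripContent = false) {
        $content = '';
        if (!is_array($tags)) {
            // Accept either a plain tag name ('font') or a tag list ('<font><b>').
            $tags = (strpos($tags, '>') !== false
                ? explode('>', str_replace('<', '', $tags))
                : array($tags));
            if (end($tags) == '') array_pop($tags);
        }
        foreach ($tags as $tag) {
            $tag = preg_quote($tag, '#');
            if ($stripContent)
                $content = '(.+</' . $tag . '[^>]*>|)';
            $str = preg_replace('#</?' . $tag . '[^>]*>' . $content . '#is', '', $str);
        }
        return $str;
    }

    $str = '<font color="red">red</font> text';
    $tags = 'font';
    $a = strip_only($str, $tags);       // "red text"
    $b = strip_only($str, $tags, true); // " text"
    ?>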

User · Answer

Use html_entity_decode to convert HTML entities. You'll need to set the charset to make it work correctly.
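A minimal sketch of the advice above, assuming the input is UTF-8 and PHP 7.0 or later (for the "\u{...}" escape). The sample string and variable names are illustrative, not taken from the question.

```php
<?php
// Sample input containing the entities named in the question (illustrative).
$html = 'Tom &amp; Jerry&nbsp;&copy; 2010';

// ENT_QUOTES decodes both quote styles; ENT_HTML5 selects the HTML5 entity table.
// The third argument sets the output charset explicitly.
$text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; decodes to U+00A0 (non-breaking space), not an ASCII space,
// so replace it if ordinary spaces are wanted.
$text = str_replace("\u{00A0}", ' ', $text);

echo $text; // Tom & Jerry © 2010
```

Decoding keeps the characters the entities stand for, which is usually preferable to deleting the entities outright.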

User · Answer

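It looks like what you really want is:

    function xmlEntities($string) {
        $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

        $from = array();
        $to = array();
        foreach ($translationTable as $char => $entity) {
            $from[] = $entity;
            // mb_ord() (PHP 7.2+) returns the code point; ord() would only read the first byte
            $to[] = '&#' . mb_ord($char, 'UTF-8') . ';';
        }
        return str_replace($from, $to, $string);
    }

It replaces the named entities with their numeric equivalents.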

User · Answer

A plain vanilla strings way to do it, without engaging the preg regex engine:

    function remEntities($str) {
        if (substr_count($str, '&') && substr_count($str, ';')) {
            // Find the ampersand
            $amp_pos = strpos($str, '&');
            // Find the ;
            $semi_pos = strpos($str, ';');
            // Only if the ; is after the &
            if ($semi_pos > $amp_pos) {
                // It is an HTML entity, try to remove it
                $tmp = substr($str, 0, $amp_pos);
                $tmp = $tmp . substr($str, $semi_pos + 1, strlen($str));
                $str = $tmp;
                // Has another entity in it?
                if (substr_count($str, '&') && substr_count($str, ';'))
                    $str = remEntities($tmp);
            }
        }
        return $str;
    }

User · Answer

You can try htmlspecialchars_decode($string). It works for me.

http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
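One caveat worth illustrating: htmlspecialchars_decode only reverses the handful of entities that htmlspecialchars produces, so entities such as &nbsp; and &copy; from the question pass through unchanged. The sketch below compares it with html_entity_decode; the sample input is made up, and UTF-8 is assumed.

```php
<?php
// Illustrative input mixing "special chars" entities with other named entities.
$input = '&lt;b&gt;Fish &amp; chips&lt;/b&gt;&nbsp;&copy;';

// Decodes only &amp; &lt; &gt; and the quote entities.
$special = htmlspecialchars_decode($input);
// $special is '<b>Fish & chips</b>&nbsp;&copy;'

// Decodes the full named-entity table.
$all = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// $all is "<b>Fish & chips</b>" followed by U+00A0 and U+00A9
```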

User · Answer

Either decode them using html_entity_decode or remove them using preg_replace:

    $Content = preg_replace("/&#?[a-z0-9]+;/i", "", $Content);

EDIT: Alternative according to Jacco's comment:

"It might be nice to replace the '+' with {2,8} or something. This will limit the chance of replacing entire sentences when an unencoded '&' is present."

    $Content = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $Content);
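To see what the bounded quantifier changes, here is a small sketch run on a made-up string that contains two real entities plus a raw, unencoded "&" followed by a long token and a semicolon. The input and variable names are assumptions for illustration.

```php
<?php
// Illustrative input: real entities, then an unencoded "&" with a long token after it.
$content = 'Fish&nbsp;&amp;&nbsp;chips, ref=a&verylongunencodedtoken;';

// Unbounded: anything of the form &word; is removed, however long the word is.
$greedy = preg_replace('/&#?[a-z0-9]+;/i', '', $content);
// $greedy is 'Fishchips, ref=a'

// Bounded: only 2 to 8 characters are allowed between "&" and ";".
$bounded = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $content);
// $bounded is 'Fishchips, ref=a&verylongunencodedtoken;'
```

Note that both variants delete the entities rather than translating them, so "Fish&nbsp;&amp;&nbsp;chips" loses its spacing and its ampersand. Decoding first avoids that.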

User · Answer

You may want to take a look at htmlentities() and html_entity_decode() here:

    $orig = "I'll \"walk\" the <b>dog</b> now";
    $a = htmlentities($orig);
    $b = html_entity_decode($a);
    echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
    echo $b; // I'll "walk" the <b>dog</b> now

User · Answer

This might work well to remove special characters:

    $modifiedString = preg_replace('/[^a-zA-Z0-9_ -]/s', '', $content);
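A short sketch of how such a whitelist behaves, assuming the pattern keeps only ASCII letters, digits, underscore, space, and hyphen. Applied directly to markup, it removes the "&", ";", "<" and ">" characters but leaves entity and tag names behind as plain letters, so decoding and stripping tags first may give a cleaner result. The sample string is made up.

```php
<?php
$content = 'Hello, W&ouml;rld! 100% <b>free</b>_stuff - ok';

// Whitelist applied directly: punctuation goes, but "ouml" and "b" remain as letters.
$naive = preg_replace('/[^a-zA-Z0-9_ -]/s', '', $content);
// $naive is 'Hello Woumlrld 100 bfreeb_stuff - ok'

// Decode entities and strip tags first, then apply the whitelist.
$decoded = html_entity_decode($content, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$cleaned = preg_replace('/[^a-zA-Z0-9_ -]/s', '', strip_tags($decoded));
// $cleaned is 'Hello Wrld 100 free_stuff - ok' (the non-ASCII "ö" is dropped too)
```

Because the whitelist is ASCII-only, accented letters disappear entirely; that may or may not be acceptable for a feed.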

User · Answer

If you want to convert the HTML special characters and not just remove them, as well as strip things down and prepare for plain text, this was the solution that worked for me:

    function htmlToPlainText($str) {
        $str = str_replace('&nbsp;', ' ', $str);
        $str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT, 'UTF-8');
        $str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
        $str = html_entity_decode($str);
        $str = htmlspecialchars_decode($str);
        $str = strip_tags($str);

        return $str;
    }

    $string = '<p>this is (&nbsp;) a test</p> <div>Yes this is! &amp; does it get "processed"? </div>';

    htmlToPlainText($string);
    // this is ( ) a test Yes this is! & does it get "processed"?

html_entity_decode with ENT_QUOTES | ENT_XML1 converts things like &#39;, htmlspecialchars_decode converts things like &amp;, html_entity_decode converts things like &lt;, and strip_tags removes any HTML tags left over.

EDIT: Added str_replace('&nbsp;', ' ', $str); and several other html_entity_decode() calls, as continued testing has shown a need for them.

User · Answer

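    $string = "äáčé";

    $convert = Array(
        'ä' => 'a',
        'Ä' => 'A',
        'á' => 'a',
        'Á' => 'A',
        'à' => 'a',
        'À' => 'A',
        'ã' => 'a',
        'Ã' => 'A',
        'â' => 'a',
        'Â' => 'A',
        'č' => 'c',
        'Č' => 'C',
        'ć' => 'c',
        'Ć' => 'C',
        'ď' => 'd',
        'Ď' => 'D',
        'ě' => 'e',
        'Ě' => 'E',
        'é' => 'e',
        'É' => 'E',
        'ë' => 'e',
    );

    $string = strtr($string, $convert);

    echo $string; // aace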

User · Answer

In addition to the good answers above, PHP also has a built-in filter function that is quite useful: filter_var. To remove HTML characters, use:

    $cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);

Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.

More info: filter_var, FILTER_SANITIZE_STRING.

User · Answer

The function I used to perform the task, incorporating the improvement suggested by schnaader, is:

    mysql_real_escape_string(
        preg_replace_callback("/&#?[a-z0-9]+;/i", function ($m) {
            // $m[0] is the whole matched entity (the pattern has no capture group)
            return mb_convert_encoding($m[0], "UTF-8", "HTML-ENTITIES");
        }, strip_tags($row['cuerpo']))
    );

This removes every HTML tag and converts every HTML entity to its UTF-8 character, ready to save in MySQL.

User · Answer

If you are working in WordPress and, like me, simply need to check for an empty field (and there is a copious amount of random HTML entities in what seems like a blank string), then take a look at:

    sanitize_title_with_dashes( string $title, string $raw_title = '', string $context = 'display' )

For people not working on WordPress, I found this function REALLY useful as a basis for my own sanitizer. Take a look at the full code; it's really in depth.

User · Answer

What I did was use html_entity_decode first, then strip_tags to remove the tags.
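The two-step approach can be sketched as follows, assuming UTF-8 input and PHP 7.0 or later. The helper name feedText and the sample markup are hypothetical, and the final XML escaping is an added assumption based on the RSS use case in the question.

```php
<?php
// Hypothetical helper: decode entities, strip tags, normalise non-breaking spaces.
function feedText(string $html): string
{
    // Step 1: turn entities such as &amp; and &copy; into real characters.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: remove the tags (this also removes tags that were entity-encoded).
    $text = strip_tags($text);

    // &nbsp; decoded to U+00A0; use an ordinary space instead.
    $text = str_replace("\u{00A0}", ' ', $text);

    return trim($text);
}

$plain = feedText('<p>Caf&eacute; &amp; bar&nbsp;&copy; 2010</p>');
// $plain is 'Café & bar © 2010'

// When writing the text into the feed, escape it again for XML.
$forXml = htmlspecialchars($plain, ENT_XML1 | ENT_QUOTES, 'UTF-8');
// $forXml is 'Café &amp; bar © 2010'
```

Decoding before stripping means that decoded text containing a literal "<" may be treated by strip_tags as the start of a tag, so it is worth testing with representative content.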
