In our case, we receive XML as a String and need to fix the values that contain "special" characters such as &, < and so on. Basically, someone can send us XML in this form:
<notes>
<note>
<to>jenice & carl </to>
<from>your neighbor <; </from>
</note>
</notes>
So I need to find the values jenice & carl and your neighbor <; in that String and properly escape the & and the < (otherwise the XML is invalid when you later pass it to an engine that shall remain unnamed).
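In XML text content, a literal `&` has to be written as `&amp;` and a `<` as `&lt;`. Here is a minimal sketch of just that escaping step for a single value. The class name and the `escapeText` helper are hypothetical. The lookahead that leaves an existing entity such as `&amp;` or `&#38;` untouched is an assumption that you don't want those escaped twice.

```java
import java.util.regex.Pattern;

public class EscapeText {
    // Matches an & that does not already start an entity such as &amp;, &#38; or &#x26;
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    // Hypothetical helper: escapes every bare & and every < in a single text value
    static String escapeText(String text) {
        String withAmps = BARE_AMP.matcher(text).replaceAll("&amp;");
        return withAmps.replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeText("jenice & carl"));      // jenice &amp; carl
        System.out.println(escapeText("your neighbor <;"));   // your neighbor &lt;;
        System.out.println(escapeText("already &amp; fine")); // already &amp; fine
    }
}
```

This is stricter than a plain `replace("&", "&amp;")`, which would turn an already valid `&amp;` into `&amp;amp;`.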
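Doing this with a regex is a rather dumb idea to begin with, since a regex cannot really parse XML, but it's cheap and easy. Note that it only handles elements whose opening tag, value and closing tag sit on a single line. So, for the brave ones who would like to do the same thing I did, here you go: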
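import java.util.regex.Matcher;
import java.util.regex.Pattern;

String xml = ...
// group 1 = tag name, group 2 = text on the same line, group 3 = the same tag name again (\\1);
// (?!\\R<) skips parent elements whose content starts on the next line
Pattern p = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");
Matcher m = p.matcher(xml);
String result = m.replaceAll(mr -> {
    // escape & first, so the & in a freshly written &lt; is not escaped a second time
    // (an entity that is already valid, such as &amp;, does get escaped again)
    String text = mr.group(2).replace("&", "&amp;").replace("<", "&lt;");
    // quoteReplacement keeps replaceAll from reading $ or \ in the text as group references
    return Matcher.quoteReplacement("<" + mr.group(1) + ">" + text + "</" + mr.group(3) + ">");
});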
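Here is a self-contained sketch of the same regex approach, applied to the sample from the question. It needs Java 9+, where `Matcher.replaceAll(Function<MatchResult, String>)` was added. The class and method names are my own. It wraps the returned string in `Matcher.quoteReplacement` because `replaceAll` otherwise treats `$` and `\` in it as group references and can throw on input like `cost $5`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeXmlValues {
    // <tag>text</tag> on one line; the lookahead skips parent elements
    // whose content starts on the next line
    private static final Pattern ELEMENT = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");

    static String escapeValues(String xml) {
        Matcher m = ELEMENT.matcher(xml);
        return m.replaceAll(mr -> {
            // Naive escaping: & first, then <
            String text = mr.group(2).replace("&", "&amp;").replace("<", "&lt;");
            return Matcher.quoteReplacement(
                    "<" + mr.group(1) + ">" + text + "</" + mr.group(3) + ">");
        });
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<notes>",
                "<note>",
                "<to>jenice & carl </to>",
                "<from>your neighbor <; </from>",
                "</note>",
                "</notes>");
        System.out.println(escapeValues(xml));
    }
}
```

On the sample, only the `<to>` and `<from>` lines change, becoming `<to>jenice &amp; carl </to>` and `<from>your neighbor &lt;; </from>`. The `<notes>` and `<note>` lines are skipped because of the lookahead.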