In RegEx, I want to find the tag and everything between two XML tags, like the following:
<primaryAddress>
<addressLine>280 Flinders Mall</addressLine>
<geoCodeGranularity>PROPERTY</geoCodeGranularity>
<latitude>-19.261365</latitude>
<longitude>146.815585</longitude>
<postcode>4810</postcode>
<state>QLD</state>
<suburb>Townsville</suburb>
<type>PHYSICAL</type>
</primaryAddress>
I want to find the primaryAddress tag and everything inside it, and erase that. Everything between the primaryAddress tags is variable, but I want to remove the entire tag and its sub-tags whenever I get primaryAddress.
Does anyone have any idea how to do that?
Using regex for this is not a good idea, but if you really want to do it with regex:
<primaryAddress.*>((.|\n)*?)<\/primaryAddress>
The accepted answer returns the tags as well, but this one returns just the value between the tags (in the first capture group).
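You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/s
The s (dotall) flag is needed because the content spans several lines and . does not match line breaks by default. The content between the tags will be in the first capture group.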
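For the original goal (erasing the element rather than capturing its content), the same idea could be sketched in Java roughly as follows. The class name, method name and sample XML are made up for illustration; Pattern.DOTALL plays the role of the s flag, and the pattern assumes primaryAddress elements are never nested inside each other.

```java
import java.util.regex.Pattern;

final class RemovePrimaryAddress {
    // DOTALL lets '.' cross line breaks; [^>]* tolerates attributes on the opening tag.
    // The trailing \s* also swallows the line break left behind by the removed element.
    private static final Pattern PRIMARY_ADDRESS =
            Pattern.compile("<primaryAddress\\b[^>]*>.*?</primaryAddress>\\s*", Pattern.DOTALL);

    static String removePrimaryAddress(String xml) {
        return PRIMARY_ADDRESS.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, shortened from the question.
        String xml = "<customer>\n"
                + "<name>Example</name>\n"
                + "<primaryAddress>\n"
                + "<postcode>4810</postcode>\n"
                + "<state>QLD</state>\n"
                + "</primaryAddress>\n"
                + "</customer>";
        System.out.println(removePrimaryAddress(xml));
    }
}
```

The lazy .*? stops at the first closing tag, which is why nested primaryAddress elements would break it.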
In our case, we receive XML as a String and need to deal with the values that contain some "special" characters, like &, <, > etc. Basically, someone can provide XML to us in this form:
<notes>
<note>
<to>jenice & carl </to>
<from>your neighbor <; </from>
</note>
</notes>
So I need to find the values jenice & carl and your neighbor <; in that String and properly escape & and < (otherwise this is invalid XML if you later pass it to an engine that shall remain unnamed).
Doing this with regex is a rather dumb idea to begin with, but it's cheap and easy. So for the brave ones who would like to do the same thing I did, here you go:
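import java.util.regex.Matcher;
import java.util.regex.Pattern;

String xml = ...
// Matches <tag>value</tag> on one line; the lookahead skips tags that are followed by a line break and a child tag.
Pattern p = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");
Matcher m = p.matcher(xml);
String result = m.replaceAll(mr -> {
    String value = mr.group(2);
    if (value.contains("&")) {
        value = value + "+ some change"; // placeholder for the real escaping
    }
    // Read groups from mr (the current match), not m; quoteReplacement keeps '$' and '\' literal.
    return Matcher.quoteReplacement("<" + mr.group(1) + ">" + value + "</" + mr.group(3) + ">");
});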
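To make the "some change" placeholder concrete, here is one way the escaping step might look. This is only a sketch: the class and method names are invented, it reuses the pattern above unchanged, and it assumes each text value sits on a single line with one element per line. It leaves existing entities such as &amp;amp; alone so they are not escaped twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeTextValues {
    // Same pattern as above: a tag, a single-line text value, and the matching end tag.
    private static final Pattern ELEMENT = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");
    // An ampersand that does not already start a predefined or numeric entity.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)");

    static String escapeText(String value) {
        // Escape ampersands first so the entities added next are not escaped again.
        String escaped = BARE_AMP.matcher(value).replaceAll("&amp;");
        return escaped.replace("<", "&lt;").replace(">", "&gt;");
    }

    static String escapeTextValues(String xml) {
        return ELEMENT.matcher(xml).replaceAll(mr -> Matcher.quoteReplacement(
                "<" + mr.group(1) + ">" + escapeText(mr.group(2)) + "</" + mr.group(3) + ">"));
    }

    public static void main(String[] args) {
        String xml = "<notes>\n<note>\n<to>jenice & carl </to>\n"
                + "<from>your neighbor <; </from>\n</note>\n</notes>";
        System.out.println(escapeTextValues(xml));
    }
}
```

On the sample above this should turn the two values into jenice &amp;amp; carl and your neighbor &amp;lt;; while leaving the surrounding tags untouched.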
This can capture most outermost pairs of tags, even ones with attributes inside, as well as self-closing tags and comments. It relies on recursion ((?R)), so it needs a PCRE-style engine:
(<!--((?!-->).)*-->|<\w*((?!\/<).)*\/>|<(?<tag>\w+)[^>]*>(?>[^<]|(?R))*<\/\k<tag>\s*>)
Edit: as mentioned in a comment above, regex is never enough to parse XML. Trying to modify the regex to fit more situations only makes it longer without making it reliable.
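For comparison, when the input is well-formed XML, removing an element with a real parser is not much longer. The following is a rough sketch using the JDK's built-in DOM and Transformer APIs; the class name, method name and sample input are invented for illustration, and it will throw on malformed input such as the unescaped example earlier in this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveElementWithDom {
    static String removeElements(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getElementsByTagName returns a live list, so iterate backwards while removing.
        NodeList matches = doc.getElementsByTagName(tagName);
        for (int i = matches.getLength() - 1; i >= 0; i--) {
            Node node = matches.item(i);
            node.getParentNode().removeChild(node);
        }

        // Serialize the modified tree back to a String without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, shortened from the question.
        String xml = "<customer><name>Example</name>"
                + "<primaryAddress><postcode>4810</postcode><state>QLD</state></primaryAddress>"
                + "</customer>";
        System.out.println(removeElements(xml, "primaryAddress"));
    }
}
```

Unlike the regex versions, this handles attributes, nesting and comments without extra work, at the cost of requiring valid input.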
Source: Stackoverflow.com