Java regex to extract text between tags

Question

I have a file with some custom tags and I'd like to write a regular expression to extract the string between the tags. For example, if my tag is:

    <customtag>String I want to extract</customtag>

how would I write a regular expression to extract only the string between the tags? This code seems like a step in the right direction:

    Pattern p = Pattern.compile("<customtag>(.+?)</customtag>");
    Matcher m = p.matcher("<customtag>String I want to extract</customtag>");

Not sure what to do next. Any ideas? Thanks!

User · Answer

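    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Square-bracket tags: the reluctant (.+?) captures everything up to the first "[/tag".
    final Pattern pattern = Pattern.compile("tag\\](.+?)\\[/tag");
    final Matcher matcher = pattern.matcher("[tag]String I want to extract[/tag]");
    if (matcher.find()) {
        System.out.println(matcher.group(1)); // String I want to extract
    }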
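The reluctant quantifier "(.+?)" matters here: the trailing "?" makes each match end at the first closing tag instead of the last one. The sketch below (class and helper names are illustrative, not from the answer) runs a greedy and a reluctant version on a string with two square-bracket tag pairs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantVsGreedy {

    // Collects group 1 of every match of the given regex in the input.
    static List<String> allGroup1(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "[tag]first[/tag] and [tag]second[/tag]";
        // Greedy: (.+) runs to the LAST "[/tag]", swallowing the text in between.
        System.out.println(allGroup1("\\[tag\\](.+)\\[/tag\\]", input));  // [first[/tag] and [tag]second]
        // Reluctant: (.+?) stops at the FIRST "[/tag]" after each opening tag.
        System.out.println(allGroup1("\\[tag\\](.+?)\\[/tag\\]", input)); // [first, second]
    }
}
```

With the greedy form there is a single match spanning both pairs; the reluctant form yields one match per pair.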

User · Answer

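A generic, simpler and somewhat primitive approach to find the tag, attributes and value:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 1: tag name, group 2: attributes, groups 3 and 4: body; \\1 requires the same closing tag.
    Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");
    System.out.println(pattern.matcher("<asd> TEST</asd>").find());          // true
    System.out.println(pattern.matcher("<asd TEST</asd>").find());           // false
    System.out.println(pattern.matcher("<asd attr='3'> TEST</asd>").find()); // true
    System.out.println(pattern.matcher("<asd> <x>TEST<x>asd>").find());      // false
    System.out.println("-------");
    Matcher matcher = pattern.matcher("<as x> TEST</as>");
    if (matcher.find()) {
        // Group 0 is the whole match; print every group with its index.
        for (int i = 0; i <= matcher.groupCount(); i++) {
            System.out.println(i + ":" + matcher.group(i));
        }
    }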
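A backreference such as \\1 forces the closing tag to repeat whatever the first group captured. Since Java 7 the same idea can be written with a named group and \\k<name>, which some find easier to read. A minimal sketch (class and method names are illustrative; unlike the attribute-aware pattern, it assumes tags without attributes):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedBackreference {

    // (?<name>\w+) captures the tag name; \k<name> requires the closing tag
    // to repeat exactly that name. Attributes are not handled here.
    static final Pattern ELEMENT =
            Pattern.compile("<(?<name>\\w+)>(?<body>.*?)</\\k<name>>");

    // Returns "name=body" for the first element found, or null if none matches.
    static String firstElement(String input) {
        Matcher m = ELEMENT.matcher(input);
        return m.find() ? m.group("name") + "=" + m.group("body") : null;
    }

    public static void main(String[] args) {
        System.out.println(firstElement("<asd> TEST</asd>")); // asd= TEST
        System.out.println(firstElement("<a>x</b>"));          // null: tag names differ
    }
}
```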

User · Answer

You're on the right track. Now you just need to extract the desired group, as follows:

    final Pattern pattern = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
    final Matcher matcher = pattern.matcher("<tag>String I want to extract</tag>");
    matcher.find();
    System.out.println(matcher.group(1)); // Prints String I want to extract

If you want to extract multiple hits, try this:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TagValues {
        // DOTALL lets "." match line breaks, so multi-line tag contents are captured too.
        private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

        public static void main(String[] args) {
            final String str = "<tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag>";
            System.out.println(Arrays.toString(getTagValues(str).toArray())); // Prints [apple, orange, pear]
        }

        // Collects the body of every <tag>...</tag> pair, in order of appearance.
        private static List<String> getTagValues(final String str) {
            final List<String> tagValues = new ArrayList<>();
            final Matcher matcher = TAG_REGEX.matcher(str);
            while (matcher.find()) {
                tagValues.add(matcher.group(1));
            }
            return tagValues;
        }
    }

However, I agree that regular expressions are not the best answer here. I'd use XPath to find the elements I'm interested in. See "The Java XPath API" for more info.
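Following the XPath suggestion, here is a minimal sketch using the JDK's built-in DOM and javax.xml.xpath APIs. It assumes the input is well-formed XML with a single root element, so the sample string is wrapped in a hypothetical <root> element; the class and method names are illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTagValues {

    // Parses the input as XML and returns the text content of every <tag> element.
    static List<String> tagValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//tag", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Parser configuration, SAX, I/O and XPath errors are all wrapped here.
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag></root>";
        System.out.println(tagValues(xml)); // [apple, orange, pear]
    }
}
```

Unlike the regex, this copes with attributes, nesting and entities, at the cost of requiring well-formed input.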

User · Answer

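    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "<B><G>Test</G></B><C>Test1</C>";
    // Group 1: tag name; group 2: text containing no further tags; \\1 repeats the tag name.
    String pattern = "\\<(.+)\\>([^\\<\\>]+)\\<\\/\\1\\>";
    int count = 0; // number of matches found
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group(2)); // prints Test, then Test1
        count++;
    }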
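In Java's regex syntax '<', '>' and '/' have no special meaning, so escaping them is harmless but not needed. A minimal sketch of the same leaf-element idea with a plainer pattern (class and method names are illustrative; "(\\w+)" is a narrower choice than "(.+)" and assumes tag names without attributes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeafElementText {

    // Tag name in group 1, text (no '<' or '>') in group 2, closing tag via \1.
    static final Pattern LEAF = Pattern.compile("<(\\w+)>([^<>]+)</\\1>");

    // Returns the text of every element whose content contains no further tags.
    static List<String> leafTexts(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = LEAF.matcher(s);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(leafTexts("<B><G>Test</G></B><C>Test1</C>")); // [Test, Test1]
    }
}
```

The outer <B> element is skipped because its content starts with another tag, which "[^<>]+" rejects.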

User · Answer

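Try this, replacing any_tag with the tag name:

    Pattern p = Pattern.compile("(?<=\\<(any_tag)\\>)(\\s*.*?\\s*)(?=\\<\\/(any_tag)\\>)");
    Matcher m = p.matcher(anyString);

For example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "<TR> <TD>1Q Ene</TD> <TD>3.08%</TD> </TR>";
    // The lookbehind (?<=...) and lookahead (?=...) are not part of the match, so m.group() is just the cell text.
    // The reluctant .*? stops at the nearest </TD>; a greedy .* would span both cells in one match.
    Pattern p = Pattern.compile("(?<=\\<TD\\>)(\\s*.*?\\s*)(?=\\<\\/TD\\>)");
    Matcher m = p.matcher(str);
    while (m.find()) {
        System.out.println("Regex result: " + m.group());
    }

Output:

    Regex result: 1Q Ene
    Regex result: 3.08%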
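To use the lookaround approach with an arbitrary tag name rather than a hard-coded one, the pattern can be assembled at runtime. This is a sketch under stated assumptions (class and method names are illustrative, tags carry no attributes): Pattern.quote makes the tag name match literally, and a reluctant ".*?" makes each match stop at the nearest closing tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTagText {

    // Builds "(?<=<NAME>)(.*?)(?=</NAME>)" with NAME quoted as a literal.
    static Pattern forTag(String tagName) {
        String t = Pattern.quote(tagName);
        return Pattern.compile("(?<=<" + t + ">)(.*?)(?=</" + t + ">)");
    }

    // Returns the content of every <tagName>...</tagName> pair in the input.
    static List<String> contents(String tagName, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = forTag(tagName).matcher(input);
        while (m.find()) {
            out.add(m.group()); // the lookarounds are not part of the match
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "<TR> <TD>1Q Ene</TD> <TD>3.08%</TD> </TR>";
        System.out.println(contents("TD", str)); // [1Q Ene, 3.08%]
    }
}
```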

User · Answer

To be quite honest, regular expressions are not the best idea for this type of parsing. The regular expression you posted will probably work great for simple cases, but if things get more complex you are going to have huge problems (the same reason why you can't reliably parse HTML with regular expressions).

I know you probably don't want to hear this (I know I didn't when I asked the same type of question), but string parsing became WAY more reliable for me after I stopped trying to use regular expressions for everything.

jTopas is an AWESOME tokenizer that makes it quite easy to write parsers by hand. I STRONGLY suggest jTopas over the standard Java Scanner and similar libraries. If you want to see jTopas in action, here are some parsers I wrote using jTopas to parse this type of file.

If you are parsing XML files, you should be using an XML parser library. Don't do it yourself unless you are just doing it for fun; there are plenty of proven options out there.
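As one example of a proven XML parser, the JDK ships the StAX streaming API (javax.xml.stream), which reads a document token by token, somewhat in the spirit of a hand-written tokenizer. A minimal sketch, assuming well-formed input with a single root element and text-only target elements; the class, method and the <file> root are illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTagValues {

    // Streams through the document and collects the text of every element
    // with the given local name.
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        // Reads up to the matching end tag; throws if the
                        // element contains child elements.
                        values.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<file><customtag>String I want to extract</customtag></file>";
        System.out.println(textOf(xml, "customtag")); // [String I want to extract]
    }
}
```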

User · Answer

I'll preface this reply with: you shouldn't use a regular expression to parse XML; it's only going to result in edge cases that don't work right, and a forever-increasing-in-complexity regex while you try to fix it.

That being said, you need to proceed by matching the string and grabbing the group you want:

    if (m.matches()) {
        String result = m.group(1);
        // do something with result
    }
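Note that matches() only succeeds when the whole input fits the pattern, which holds for the question's sample string; when the tags are embedded in larger text, find() is the method to use. A minimal sketch comparing the two on the question's pattern (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {

    static final Pattern P = Pattern.compile("<customtag>(.+?)</customtag>");

    // matches() succeeds only if the WHOLE input fits the pattern.
    static String viaMatches(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // find() succeeds if the pattern occurs ANYWHERE in the input.
    static String viaFind(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String exact = "<customtag>String I want to extract</customtag>";
        String embedded = "before " + exact + " after";
        System.out.println(viaMatches(exact));    // String I want to extract
        System.out.println(viaMatches(embedded)); // null
        System.out.println(viaFind(embedded));    // String I want to extract
    }
}
```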
