Searching for UUIDs in text with regex

Question

I'm searching for UUIDs in blocks of text using a regex. Currently I'm relying on the assumption that all UUIDs will follow a pattern of 8-4-4-4-12 hexadecimal digits. Can anyone think of a use case where this assumption would be invalid and would cause me to miss some UUIDs?

User · Answer

    /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89AB][0-9a-f]{3}-[0-9a-f]{12}$/i

Gajus' regexp rejects UUID V1-3 and 5, even though they are valid.

User · Answer

    [\w]{8}(-[\w]{4}){3}-[\w]{12}

has worked for me in most cases. Or if you want to be really specific:

    [\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}

User · Answer

If you want to check or validate a specific UUID version, here are the corresponding regexes.

Note that the only difference is the version number, which is explained in the 4.1.3. Version chapter of the UUID RFC 4122.

The version number is the first character of the third group: [VERSION_NUMBER][0-9A-F]{3}

UUID v1:
    /^[0-9A-F]{8}-[0-9A-F]{4}-[1][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

UUID v2:
    /^[0-9A-F]{8}-[0-9A-F]{4}-[2][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

UUID v3:
    /^[0-9A-F]{8}-[0-9A-F]{4}-[3][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

UUID v4:
    /^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

UUID v5:
    /^[0-9A-F]{8}-[0-9A-F]{4}-[5][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
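Since only one character differs between those five patterns, the version can be treated as a parameter. The following is a minimal Python sketch of that idea, not part of the original answer; the helper names uuid_version_regex and detect_version are made up for illustration, and it assumes canonical hyphenated UUIDs of the RFC 4122 variant.

```python
import re


def uuid_version_regex(version):
    """Build an anchored, case-insensitive pattern for one UUID version (1-5).

    Hypothetical helper: only the first character of the third group changes.
    """
    if version not in (1, 2, 3, 4, 5):
        raise ValueError("expected a version between 1 and 5")
    return re.compile(
        r"^[0-9A-F]{8}-[0-9A-F]{4}-%d[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$"
        % version,
        re.IGNORECASE,
    )


def detect_version(candidate):
    """Return the version (1-5) whose pattern matches candidate, else None."""
    for version in (1, 2, 3, 4, 5):
        if uuid_version_regex(version).match(candidate):
            return version
    return None
```

For example, detect_version("f47ac10b-58cc-4372-a567-0e02b2c3d479") should report version 4, while the all-zero UUID matches none of the five patterns.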

User · Answer

So, I think Richard Bronosky actually has the best answer to date, but I think you can do a bit to make it somewhat simpler (or at least terser):

    re_uuid = re.compile(r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}', re.I)

User · Answer

Here is the working regex (https://www.regextester.com/99148):

    const regex = /[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}/;

User · Answer

For bash:

    grep -E "[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}"

For example:

    $ echo "f2575e6a-9bce-49e7-ae7c-bff6b555bda4" | grep -E "[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}"
    f2575e6a-9bce-49e7-ae7c-bff6b555bda4

User · Answer

If using POSIX regex (grep -E, MySQL, etc.), this may be easier to read and remember:

    [[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}

Edit: Perl and PCRE flavours also support POSIX character classes, so this'll work with them. For those, change the (...) to a non-capturing subgroup (?:...).

User · Answer

    $UUID_RE = join '-', map { "[0-9a-f]{$_}" } 8, 4, 4, 4, 12;

BTW, allowing only 4 in one of the positions is only valid for UUIDv4. But v4 is not the only UUID version that exists. I have met v1 in my practice as well.
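For readers who do not use Perl, the same construction can be sketched in Python. This is only an illustrative translation; the names GROUP_LENGTHS, UUID_PATTERN and UUID_RE are chosen for this example.

```python
import re

# Digit counts of the five groups in the canonical text form.
GROUP_LENGTHS = (8, 4, 4, 4, 12)

# Joins "[0-9a-f]{8}", "[0-9a-f]{4}", ... with hyphens.
UUID_PATTERN = "-".join("[0-9a-f]{%d}" % n for n in GROUP_LENGTHS)

# Case-insensitive so that upper-case UUIDs are found as well.
UUID_RE = re.compile(UUID_PATTERN, re.IGNORECASE)
```

Because no version digit is fixed, this pattern finds v1 and v4 values alike.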

User · Answer

I agree that by definition your regex does not miss any UUID. However, it may be useful to note that if you are searching especially for Microsoft's Globally Unique Identifiers (GUIDs), there are five equivalent string representations for a GUID:

    "ca761232ed4211cebacd00aa0057b223"
    "CA761232-ED42-11CE-BACD-00AA0057B223"
    "{CA761232-ED42-11CE-BACD-00AA0057B223}"
    "(CA761232-ED42-11CE-BACD-00AA0057B223)"
    "{0xCA761232, 0xED42, 0x11CE, {0xBA, 0xCD, 0x00, 0xAA, 0x00, 0x57, 0xB2, 0x23}}"
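One possible way to search for the first four of those forms is sketched below in Python. It is an assumption-laden illustration rather than a complete GUID parser: the names GUID_RE and find_guids are invented here, the nested 0x form is deliberately left out, and the standard uuid module is used only to normalise each hit.

```python
import re
import uuid

H = "[0-9a-fA-F]"
HYPHENATED = H + "{8}-" + H + "{4}-" + H + "{4}-" + H + "{4}-" + H + "{12}"
BARE = H + "{32}"

# The lookarounds stop a match from starting or ending inside a longer run of
# hex digits or hyphens. Braces and parentheses are simply left outside the match.
GUID_RE = re.compile(
    "(?<![0-9a-fA-F-])(?:" + HYPHENATED + "|" + BARE + ")(?![0-9a-fA-F-])"
)


def find_guids(text):
    """Return a uuid.UUID for every bare or hyphenated GUID found in text."""
    return [uuid.UUID(m.group(0)) for m in GUID_RE.finditer(text)]
```

Applied to a text containing the first four representations above, find_guids returns four equal uuid.UUID values.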

User · Answer

The regex for uuid is:

    \b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
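To illustrate what the word boundaries buy you when searching free text, here is a small Python comparison. It is a sketch under the assumption of lower-case input; PLAIN and BOUNDED are names invented for the example, and the \b anchors are placed only at the two ends.

```python
import re

PLAIN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)
BOUNDED = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)

# Ten hex digits in the first group and fourteen in the last: not a UUID.
too_long = "00f2575e6a-9bce-49e7-ae7c-bff6b555bda4ff"

# Without boundaries, a UUID-shaped substring is still pulled out of it.
plain_hit = PLAIN.search(too_long)

# With boundaries, the over-long token is rejected as a whole.
bounded_hit = BOUNDED.search(too_long)
```

One caveat: \b treats the underscore as a word character, so a UUID glued to a prefix such as uuid_ is not found by BOUNDED.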

User · Answer

Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B, e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.

(source: http://en.wikipedia.org/wiki/Uuid#Definition)

Therefore, this is technically more correct:

    [a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}

User · Answer

@ivelin: UUID can have capitals. So you'll either need to toLowerCase() the string or use:

    [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}

Would have just commented this but not enough rep.

User · Answer

By definition, a UUID is 32 hexadecimal digits, separated in 5 groups by hyphens, just as you have described. You shouldn't miss any with your regular expression.

http://en.wikipedia.org/wiki/Uuid#Definition

User · Answer

Wanted to give my contribution, as my regex covers all cases from the OP and correctly groups all relevant data in the match groups (you don't need to post-process the string to get each part of the UUID, this regex already gets it for you):

    ([\d\w]{8})-?([\d\w]{4})-?([\d\w]{4})-?([\d\w]{4})-?([\d\w]{12})|[{0x]*([\d\w]{8})[0x, ]{4}([\d\w]{4})[0x, ]{4}([\d\w]{4})[0x, {]{5}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})

User · Answer

In Python re, you can span from numeric to upper-case alpha. So:

    import re
    test = "01234ABCDEFGHIJKabcdefghijk01234abcdefghijkABCDEFGHIJK"
    re.compile(r'[0-f]+').findall(test)  # Bad: matches all uppercase alpha chars
    ## ['01234ABCDEFGHIJKabcdef', '01234abcdef', 'ABCDEFGHIJK']
    re.compile(r'[0-F]+').findall(test)  # Partial: does not match lowercase hex chars
    ## ['01234ABCDEF', '01234', 'ABCDEF']
    re.compile(r'[0-F]+', re.I).findall(test)  # Good
    ## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
    re.compile(r'[0-f]+', re.I).findall(test)  # Bad on Python 3: the range already contains G-Z, and re.I adds g-z
    ## ['01234ABCDEFGHIJKabcdefghijk01234abcdefghijkABCDEFGHIJK']
    re.compile(r'[0-Fa-f]+').findall(test)  # Good (with uppercase-only magic)
    ## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
    re.compile(r'[0-9a-fA-F]+').findall(test)  # Good (with no magic)
    ## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']

That makes the simplest Python UUID regex:

    re_uuid = re.compile("[0-F]{8}-([0-F]{4}-){3}[0-F]{12}", re.I)

I'll leave it as an exercise to the reader to use timeit to compare the performance of these.

Enjoy. Keep it Pythonic!

NOTE: Those spans will also match the characters :;<=>?@ so, if you suspect that could give you false positives, don't take the shortcut. Thank you Oliver Aubert for pointing that out in the comments.
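The false-positive risk mentioned in the NOTE can be shown with a short Python check. This is just an illustrative sketch; SHORTCUT, EXPLICIT, real and junk are names and values made up for the demonstration.

```python
import re

# The shortcut range 0-F also covers the ASCII characters between 9 and A.
SHORTCUT = re.compile(r"[0-F]{8}-([0-F]{4}-){3}[0-F]{12}", re.I)

# The explicit class lists only hexadecimal digits.
EXPLICIT = re.compile(r"[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}", re.I)

real = "f47ac10b-58cc-4372-a567-0e02b2c3d479"

# UUID-shaped string built purely from punctuation inside the 0-F range.
junk = ":;<=>?@:" + "-;;;;-<<<<-====-" + ">" * 12

shortcut_accepts_junk = SHORTCUT.fullmatch(junk) is not None
explicit_accepts_junk = EXPLICIT.fullmatch(junk) is not None
```

Both patterns accept the real UUID, but only the shortcut accepts the punctuation string.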

User · Answer

Variant for C++:

    #include <regex>   // Required include
    #include <string>

    ...

    // Source string
    std::wstring srcStr = L"String with GUID: {4d36e96e-e325-11ce-bfc1-08002be10318} any text";

    // Regex and match
    std::wsmatch match;
    std::wregex rx(L"(\\{[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}\\})", std::regex_constants::icase);

    // Search
    std::regex_search(srcStr, match, rx);

    // Result (first capture group, braces included)
    std::wstring strGUID = match[1];

User · Answer

For UUIDs generated on OS X with uuidgen, the regex pattern is:

    [A-F0-9]{8}-[A-F0-9]{4}-4[A-F0-9]{3}-[89AB][A-F0-9]{3}-[A-F0-9]{12}

Verify with:

    uuidgen | grep -E "[A-F0-9]{8}-[A-F0-9]{4}-4[A-F0-9]{3}-[89AB][A-F0-9]{3}-[A-F0-9]{12}"
