Update (2020), Gyum Fox proposes https://regex101.com/r/U9Yqqg/2 (note regex101.com
did not exist when I wrote originally this answer)
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|\s*\/?[>"']))+.)["']?
Applied to:
<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href='test.html' class="xyz">
<script type="text/javascript" defer async id="something" onload="alert('hello');"></script>
<img src="test.png">
<img src="a test.png">
<img src=test.png />
<img src=a test.png />
<img src=test.png >
<img src=a test.png >
<img src=test.png alt=crap >
<img src=a test.png alt=crap >
Original answer (2008): If you have an element like
<name attribute=value attribute="value" attribute='value'>
this regex could be used to find successively each attribute name and value
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
Applied on:
<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href='test.html' class="xyz">
it would yield:
'href' => 'test.html'
'class' => 'xyz'
Note: This does not work with numeric attribute values e.g.
<div id="1">
won't work.
Edited: Improved regex for getting attributes with no value and values with " ' " inside.
([^\r\n\t\f\v= '"]+)(?:=(["'])?((?:.(?!\2?\s+(?:\S+)=|\2))+.)\2?)?
Applied on:
<script type="text/javascript" defer async id="something" onload="alert('hello');"></script>
it would yield:
'type' => 'text/javascript'
'defer' => ''
'async' => ''
'id' => 'something'
'onload' => 'alert(\'hello\');'