[javascript] Named capturing groups in JavaScript regex?

As far as I know there is no such thing as named capturing groups in JavaScript. What is the alternative way to get similar functionality?

This question is related to javascript regex

The answer is


You can use XRegExp, an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods:

  • Adds new regex and replacement text syntax, including comprehensive support for named capture.
  • Adds two new regex flags: s, to make dot match all characters (aka dotall or singleline mode), and x, for free-spacing and comments (aka extended mode).
  • Provides a suite of functions and methods that make complex regex processing a breeze.
  • Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
  • Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.

Another possible solution: create an object containing the group names and indexes.

var regex = new RegExp("(.*) (.*)");
var regexGroups = { FirstName: 1, LastName: 2 };

Then, use the object keys to reference the groups:

var m = regex.exec("John Smith");
var f = m[regexGroups.FirstName];

This improves the readability/quality of the code using the results of the regex, but not the readability of the regex itself.


While you can't do this with vanilla JavaScript, maybe you can use some Array.prototype function like Array.prototype.reduce to turn indexed matches into named ones using some magic.

Obviously, the following solution will need that matches occur in order:

_x000D_
_x000D_
// @text Contains the text to match_x000D_
// @regex A regular expression object (f.e. /.+/)_x000D_
// @matchNames An array of literal strings where each item_x000D_
//             is the name of each group_x000D_
function namedRegexMatch(text, regex, matchNames) {_x000D_
  var matches = regex.exec(text);_x000D_
_x000D_
  return matches.reduce(function(result, match, index) {_x000D_
    if (index > 0)_x000D_
      // This substraction is required because we count _x000D_
      // match indexes from 1, because 0 is the entire matched string_x000D_
      result[matchNames[index - 1]] = match;_x000D_
_x000D_
    return result;_x000D_
  }, {});_x000D_
}_x000D_
_x000D_
var myString = "Hello Alex, I am John";_x000D_
_x000D_
var namedMatches = namedRegexMatch(_x000D_
  myString,_x000D_
  /Hello ([a-z]+), I am ([a-z]+)/i, _x000D_
  ["firstPersonName", "secondPersonName"]_x000D_
);_x000D_
_x000D_
alert(JSON.stringify(namedMatches));
_x000D_
_x000D_
_x000D_


In ES6 you can use array destructuring to catch your groups:

let text = '27 months';
let regex = /(\d+)\s*(days?|months?|years?)/;
let [, count, unit] = regex.exec(text) || [];

// count === '27'
// unit === 'months'

Notice:

  • the first comma in the last let skips the first value of the resulting array, which is the whole matched string
  • the || [] after .exec() will prevent a destructuring error when there are no matches (because .exec() will return null)

Naming captured groups provide one thing: less confusion with complex regular expressions.

It really depends on your use-case but maybe pretty-printing your regex could help.

Or you could try and define constants to refer to your captured groups.

Comments might then also help to show others who read your code, what you have done.

For the rest I must agree with Tims answer.


There is a node.js library called named-regexp that you could use in your node.js projects (on in the browser by packaging the library with browserify or other packaging scripts). However, the library cannot be used with regular expressions that contain non-named capturing groups.

If you count the opening capturing braces in your regular expression you can create a mapping between named capturing groups and the numbered capturing groups in your regex and can mix and match freely. You just have to remove the group names before using the regex. I've written three functions that demonstrate that. See this gist: https://gist.github.com/gbirke/2cc2370135b665eee3ef


Update: It finally made it into JavaScript (ECMAScript 2018)!


Named capturing groups could make it into JavaScript very soon.
The proposal for it is at stage 3 already.

A capture group can be given a name inside angular brackets using the (?<name>...) syntax, for any identifier name. The regular expression for a date then can be written as /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u. Each name should be unique and follow the grammar for ECMAScript IdentifierName.

Named groups can be accessed from properties of a groups property of the regular expression result. Numbered references to the groups are also created, just as for non-named groups. For example:

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';

// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';

Don't have ECMAScript 2018?

My goal was to make it work as similar as possible to what we are used to with named groups. Whereas in ECMAScript 2018 you can place ?<groupname> inside the group to indicate a named group, in my solution for older javascript, you can place (?!=<groupname>) inside the group to do the same thing. So it's an extra set of parenthesis and an extra !=. Pretty close!

I wrapped all of it into a string prototype function

Features

  • works with older javascript
  • no extra code
  • pretty simple to use
  • Regex still works
  • groups are documented within the regex itself
  • group names can have spaces
  • returns object with results

Instructions

  • place (?!={groupname}) inside each group you want to name
  • remember to eliminate any non-capturing groups () by putting ?: at the beginning of that group. These won't be named.

arrays.js

// @@pattern - includes injections of (?!={groupname}) for each group
// @@returns - an object with a property for each group having the group's match as the value 
String.prototype.matchWithGroups = function (pattern) {
  var matches = this.match(pattern);
  return pattern
  // get the pattern as a string
  .toString()
  // suss out the groups
  .match(/<(.+?)>/g)
  // remove the braces
  .map(function(group) {
    return group.match(/<(.+)>/)[1];
  })
  // create an object with a property for each group having the group's match as the value 
  .reduce(function(acc, curr, index, arr) {
    acc[curr] = matches[index + 1];
    return acc;
  }, {});
};    

usage

function testRegGroups() {
  var s = '123 Main St';
  var pattern = /((?!=<house number>)\d+)\s((?!=<street name>)\w+)\s((?!=<street type>)\w+)/;
  var o = s.matchWithGroups(pattern); // {'house number':"123", 'street name':"Main", 'street type':"St"}
  var j = JSON.stringify(o);
  var housenum = o['house number']; // 123
}

result of o

{
  "house number": "123",
  "street name": "Main",
  "street type": "St"
}

As Tim Pietzcker said ECMAScript 2018 introduces named capturing groups into JavaScript regexes. But what I did not find in the above answers was how to use the named captured group in the regex itself.

you can use named captured group with this syntax: \k<name>. for example

var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/

and as Forivin said you can use captured group in object result as follow:

let result = regexObj.exec('2019-28-06 year is 2019');
// result.groups.year === '2019';
// result.groups.month === '06';
// result.groups.day === '28';

_x000D_
_x000D_
  var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/mgi;_x000D_
_x000D_
function check(){_x000D_
    var inp = document.getElementById("tinput").value;_x000D_
    let result = regexObj.exec(inp);_x000D_
    document.getElementById("year").innerHTML = result.groups.year;_x000D_
    document.getElementById("month").innerHTML = result.groups.month;_x000D_
    document.getElementById("day").innerHTML = result.groups.day;_x000D_
}
_x000D_
td, th{_x000D_
  border: solid 2px #ccc;_x000D_
}
_x000D_
<input id="tinput" type="text" value="2019-28-06 year is 2019"/>_x000D_
<br/>_x000D_
<br/>_x000D_
<span>Pattern: "(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>";_x000D_
<br/>_x000D_
<br/>_x000D_
<button onclick="check()">Check!</button>_x000D_
<br/>_x000D_
<br/>_x000D_
<table>_x000D_
  <thead>_x000D_
    <tr>_x000D_
      <th>_x000D_
        <span>Year</span>_x000D_
      </th>_x000D_
      <th>_x000D_
        <span>Month</span>_x000D_
      </th>_x000D_
      <th>_x000D_
        <span>Day</span>_x000D_
      </th>_x000D_
    </tr>_x000D_
  </thead>_x000D_
  <tbody>_x000D_
    <tr>_x000D_
      <td>_x000D_
        <span id="year"></span>_x000D_
      </td>_x000D_
      <td>_x000D_
        <span id="month"></span>_x000D_
      </td>_x000D_
      <td>_x000D_
        <span id="day"></span>_x000D_
      </td>_x000D_
    </tr>_x000D_
  </tbody>_x000D_
</table>
_x000D_
_x000D_
_x000D_