[regex] Regular expression for address field validation

I am trying to write a regular expression that matches an address, for example "21-big walk way" or "21 St.Elizabeth's drive". I came up with the following regular expression, but I am not sure how to incorporate all the characters (alphanumeric, space, dash, full stop, apostrophe).

"regexp=^[A-Za-z-0-99999999'


The answer is


Just to add to Serzas' answer (since I don't have enough rep to comment): letters and digits can effectively be replaced by \w, which matches word characters (including the underscore). Additionally, the apostrophe, comma and period don't need a backslash inside a character class, and neither does the hyphen as long as it comes last; between two characters it defines a range. My requirement also involved forward and back slashes, hence \\ and /, and finally whitespace with \s. The working regex for me was:

pattern: "[\w',\\/.\s-]"
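
The position of the hyphen inside a character class matters, and a bare class only matches a single character. The following is a minimal Python sketch of both points, assuming the class is meant to validate a whole field; the names range_class, literal_class, address_field and is_valid_address are made up for illustration.

```python
import re

# A hyphen between two characters forms a range:
# ",-\\" spans every character from "," (0x2C) to "\" (0x5C), including "@".
range_class = re.compile(r"[,-\\]")

# A hyphen placed last in the class is a literal hyphen.
literal_class = re.compile(r"[\w',\\/.\s-]")

# Repeat the class with "+" and use fullmatch() to validate the entire field.
address_field = re.compile(r"[\w',\\/.\s-]+")


def is_valid_address(text):
    """Return True if every character of text is in the allowed set."""
    return address_field.fullmatch(text) is not None
```

With this sketch, is_valid_address("21 St.Elizabeth's drive") is accepted, while a string containing "@" is rejected.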

I have successfully used:

' Requires: Imports System.Text and Imports System.Text.RegularExpressions
Dim regexString = New StringBuilder
With regexString
    .Append("(?<h>^[\d]+[ ])(?<s>.+$)|")                'e.g. 2013 1st ambonstreet
    .Append("(?<s>^.*?)(?<h>[ ][\d]+[ ])(?<e>[\D]+$)|") 'e.g. 1-7-4 Dual Ampstreet 130 A
    .Append("(?<s>^[\D]+[ ])(?<h>[\d]+)(?<e>.*?$)|")    'e.g. Terheydenlaan 320 B3
    .Append("(?<s>^.*?)(?<h>\d*?$)")                    'e.g. 245e oosterkade 9
End With

' s = street name, h = house number, e = extension
Dim Address As Match = Regex.Match(DataRow("customerAddressLine1").ToString(), regexString.ToString(), RegexOptions.Multiline)

If Not String.IsNullOrEmpty(Address.Groups("s").Value) Then StreetName = Address.Groups("s").Value
If Not String.IsNullOrEmpty(Address.Groups("h").Value) Then HouseNumber = Address.Groups("h").Value
If Not String.IsNullOrEmpty(Address.Groups("e").Value) Then Extension = Address.Groups("e").Value

The regex tries the alternatives in order: if one does not match, it moves on to the next. If no result is found, none of the four formats was present.
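
The same "try each format in order" idea can be sketched in Python. This is not a port of the exact patterns above: Python's re module does not allow one pattern to reuse a group name, so this sketch assumes a list of simplified patterns tried one by one. ADDRESS_FORMATS and split_address are hypothetical names.

```python
import re

# Tried in order; the first format that matches wins.
# Each format is compiled separately because Python's re module
# rejects a single pattern that defines the same group name twice.
ADDRESS_FORMATS = [
    re.compile(r"^(?P<h>\d+) (?P<s>.+)$"),              # "2013 1st ambonstreet"
    re.compile(r"^(?P<s>.*?) (?P<h>\d+) (?P<e>\D+)$"),  # "1-7-4 Dual Ampstreet 130 A"
    re.compile(r"^(?P<s>\D+) (?P<h>\d+)(?P<e>.*)$"),    # "Terheydenlaan 320 B3"
    re.compile(r"^(?P<s>.*?) (?P<h>\d+)$"),             # "245e oosterkade 9"
]


def split_address(line):
    """Return (street, house_number, extension), or None if no format matches."""
    for fmt in ADDRESS_FORMATS:
        m = fmt.match(line.strip())
        if m:
            groups = m.groupdict()
            # Formats without an "e" group yield an empty extension.
            return (groups["s"].strip(), groups["h"], groups.get("e", "").strip())
    return None
```

For example, split_address("Terheydenlaan 320 B3") falls through the first two formats and is split by the third into street, house number and extension.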


This one worked for me:

\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?

The source: https://www.codeproject.com/Tips/989012/Validate-and-Find-Addresses-with-RegEx
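
As a rough illustration of how this pattern behaves, here is a small Python sketch that uses it to pull addresses out of free text. The names STREET_ADDRESS and find_addresses are made up for the example, and the pattern is case-sensitive as written, so an upper-case suffix such as "DR." is not found unless re.IGNORECASE is added.

```python
import re

# The pattern from above, split over two lines for readability.
STREET_ADDRESS = re.compile(
    r"\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+"
    r"(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?"
)


def find_addresses(text):
    """Return every substring of text that looks like a street address."""
    return [m.group(0) for m in STREET_ADDRESS.finditer(text)]
```

Note that the match stops at the street suffix, so anything after it (a suite number, a city) is not part of the result.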


Regular expression for simple address validation

^[#.0-9a-zA-Z\s,-]+$

Example of an address that matches:

#1, North Street, Chennai - 11 

Example of an address that does not match:

$1, North Street, Chennai @ 11
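
The two cases can be checked with a few lines of Python. This is only a sketch; SIMPLE_ADDRESS and is_simple_address are names chosen for the example.

```python
import re

# Allowed: "#", ".", digits, letters, whitespace, "," and "-" (hyphen last, so literal).
SIMPLE_ADDRESS = re.compile(r"^[#.0-9a-zA-Z\s,-]+$")


def is_simple_address(text):
    """Return True if text consists only of the allowed characters."""
    return SIMPLE_ADDRESS.match(text) is not None
```

Because the apostrophe is not in the class, an address such as "21 St.Elizabeth's drive" from the question is rejected by this pattern; add ' to the class if you need it.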

Here is the approach I have taken to finding addresses using regular expressions:

A set of patterns is useful for catching the many forms an address can take, starting with simply a number followed by a set of strings (e.g. 1 Basic Road) and then getting more specific, such as looking for "P.O. Box", "c/o", "attn:", etc.

Below is a simple test in Python. The test finds all the addresses but not the last four items, which are company names. This example is not comprehensive, but it can be altered to suit your needs and to catch examples you find in your data.

import re
strings = [
    '701 FIFTH AVE',
    '2157 Henderson Highway',
    'Attn: Patent Docketing',
    'HOLLYWOOD, FL 33022-2480',
    '1940 DUKE STREET',
    '111 MONUMENT CIRCLE, SUITE 3700',
    'c/o Armstrong Teasdale LLP',
    '1 Almaden Boulevard',
    '999 Peachtree Street NE',
    'P.O. BOX 2903',
    '2040 MAIN STREET',
    '300 North Meridian Street',
    '465 Columbus Avenue',
    '1441 SEAMIST DR.',
    '2000 PENNSYLVANIA AVENUE, N.W.',
    '465 Columbus Avenue',
    '28 STATE STREET',
    'P.O, Drawer 800889.',
    '2200 CLARENDON BLVD.',
    '840 NORTH PLANKINTON AVENUE',
    '1025 Connecticut Avenue, NW',
    '340 Commercial Street',
    '799 Ninth Street, NW',
    '11318 Lazarro Ln',
    'P.O, Box 65745',
    'c/o Ballard Spahr LLP',
    '8210 SOUTHPARK TERRACE',
    '1130 Connecticut Ave., NW, Suite 420',
    '465 Columbus Avenue',
    "BANNER & WITCOFF , LTD",
    "CHIP LAW GROUP",
    "HAMMER & ASSOCIATES, P.C.",
    "MH2 TECHNOLOGY LAW GROUP, LLP",
]

# Raw strings keep the backslashes intact for the regex engine.
patterns = [
    r"c/o [\w ]{2,}",
    r"C/O [\w ]{2,}",
    r"P\.O\. [\w ]{2,}",
    r"P\.O, [\w ]{2,}",
    r"[\w.]{2,5} BOX \d{2,8}",
    r"^[#\d]{1,7} [\w ]{2,}",
    r"[A-Z]{2} \d{5}",
    r"Attn: \w{2,}",
    r"ATTN: \w{2,}",
    r"Attention: \w{2,}",
    r"ATTENTION: \w{2,}"
]
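contact_list = []
total_count = len(strings)
found_count = 0
for string in strings:
    # Pattern numbers are 1-based, following the order of the list above.
    for pat_no, pattern in enumerate(patterns, start=1):
        match = re.search(pattern, string.strip())
        if match:
            print("Item found: " + match.group(0) + " | Pattern no: " + str(pat_no))
            contact_list.append(match.group(0))
            found_count += 1
            break  # count each string once, even if several patterns match

print("-- Total: " + str(total_count) + " Found: " + str(found_count))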

Regex is a very bad choice for this kind of task. Try to find a web service or an address database or a product which can clean address data instead.

As a simple one-line expression, I recommend this:

^([a-zA-Z0-9/\\'(),\s-]{2,255})$


If you don't have a fixed format for the address as mentioned above, I would use a regular expression just to eliminate the symbols that are not used in addresses (special symbols such as &(%#$^). The set of allowed characters would be:

[A-Za-z0-9'\.\-\s\,]
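
To actually remove the unwanted symbols, one option is to negate the class and substitute every match with an empty string. A minimal Python sketch, where DISALLOWED and strip_disallowed are names made up for the example:

```python
import re

# The leading "^" negates the class: it matches any character NOT in the allowed set.
DISALLOWED = re.compile(r"[^A-Za-z0-9'.\-\s,]")


def strip_disallowed(text):
    """Return text with every character outside the allowed set removed."""
    return DISALLOWED.sub("", text)
```

Input that contains only allowed characters, such as "21-big walk way", comes back unchanged.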