What characters are allowed in an email address?

Question

I'm not asking about full email validation. I just want to know what the allowed characters are in the user-name and server parts of an email address. This may be oversimplified, and maybe email addresses can take other forms, but I don't care. I'm asking only about this simple form: user-name@server (e.g. wild.wezyr@best-server-ever.com), and the allowed characters in both parts.

User · Answer

For simplicity's sake, I sanitize the submission by removing all text within double quotes, together with the surrounding double quotes, before validation, putting the kibosh on email address submissions based on what is disallowed. Just because someone can have the John.."The*$hizzle*Bizzle"..Doe@whatever.com address doesn't mean I have to allow it in my system. We are living in the future, where it maybe takes less time to get a free email address than to do a good job wiping your butt. And it isn't as if the email criteria are not plastered right next to the input saying what is and isn't allowed.

I also sanitize what is specifically not allowed by various RFCs after the quoted material is removed. The list of specifically disallowed characters and patterns seems to be a much shorter list to test for.

Disallowed:

    local part starts with a period   ( .account@host.com )
    local part ends with a period     ( account.@host.com )
    two or more periods in series     ( lots..of...dots@host.com )
    &'`*|/                            ( some&thing`bad@host.com )
    more than one @                   ( which@one@host.com )
    :%                                ( mo:characters%mo:problems@host.com )

In the example given:

    John.."The*$hizzle*Bizzle"..Doe@whatever.com --> John..Doe@whatever.com
    John..Doe@whatever.com --> John.Doe@whatever.com

Sending a confirmation email to the leftover result, upon an attempt to add or change the email address, is a good way to see if your code can handle the email address submitted. If the email passes validation after as many rounds of sanitization as needed, then fire off that confirmation. If a request comes back from the confirmation link, then the new email can be moved from the holding (temporary, purgatory) status or storage to become a real, bona fide, first-class stored email.

A notification of email address change failure or success can be sent to the old email address if you want to be considerate. Unconfirmed account setups might fall out of the system as failed attempts entirely after a reasonable amount of time.

I don't allow stinkhole emails on my system; maybe that is just throwing away money. But 99.9% of the time people just do the right thing and have an email that doesn't push conformity limits to the brink utilizing edge case compatibility scenarios. Be careful of regex DDoS; this is a place where you can get into trouble. And this is related to the third thing I do: I put a limit on how long I am willing to process any one email. If it needs to slow down my machine to get validated, it isn't getting past my incoming data API endpoint logic.

Edit: This answer kept on getting dinged for being "bad", and maybe it deserved it. Maybe it is still bad, maybe not.
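The sanitize-then-reject approach described above can be sketched roughly as follows. This is only an illustration in Python, not the answerer's actual code: the function names, the exact character blacklist, and the 254-character cap (a crude stand-in for the processing-time limit) are all assumptions.

```python
import re

# Hypothetical names throughout; the rules loosely mirror the "Disallowed" list.
QUOTED = re.compile(r'"[^"]*"')       # a double-quoted run, quotes included
DISALLOWED_CHARS = set("&'`*|/:%")    # assumption: illustrative blacklist, tune to taste
MAX_LENGTH = 254                      # assumption: length cap standing in for a time limit


def sanitize(address: str) -> str:
    """Drop quoted runs, then collapse the dot runs that removal leaves behind."""
    stripped = QUOTED.sub("", address)
    return re.sub(r"\.{2,}", ".", stripped)


def is_acceptable(address: str) -> bool:
    """Apply the blacklist rules to an address (normally an already sanitized one)."""
    if len(address) > MAX_LENGTH or address.count("@") != 1:
        return False
    local, host = address.split("@")
    if not local or not host:
        return False
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    return not (DISALLOWED_CHARS & set(local))


cleaned = sanitize('John."The Bizzle".Doe@whatever.com')
print(cleaned, is_acceptable(cleaned))
```

Both patterns are linear-time, which sidesteps the regex denial-of-service concern for this particular sketch; the confirmation email remains the real test of deliverability.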

User · Answer

I created this regex according to RFC guidelines       w                 amp                -                             w         w   -

User · Answer

The format of an e-mail address is local-part@domain-part (max. 64@255 characters, no more than 256 in total).

The local-part and domain-part can have different sets of permitted characters, but that's not all, as there are more rules to it.

In general, the local part can have these ASCII characters:

    lowercase Latin letters: abcdefghijklmnopqrstuvwxyz
    uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
    digits: 0123456789
    special characters: !#$%&'*+-/=?^_`{|}~
    dot: . (not first or last character or repeated unless quoted)
    space and punctuation such as: "(),:;<>@[\] (with some restrictions)
    comments: () (are allowed within parentheses, e.g. (comment)john.smith@example.com)

Domain part:

    lowercase Latin letters: abcdefghijklmnopqrstuvwxyz
    uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
    digits: 0123456789
    hyphen: - (not first or last character)
    can contain an IP address surrounded by square brackets: jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1]

These e-mail addresses are valid:

    prettyandsimple@example.com
    very.common@example.com
    disposable.style.email.with+symbol@example.com
    other.email-with-dash@example.com
    x@example.com (one-letter local part)
    "much.more unusual"@example.com
    "very.unusual.@.unusual.com"@example.com
    "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
    example-indeed@strange-example.com
    admin@mailserver1 (local domain name with no top-level domain)
    #!$%&'*+-/=?^_`{}|~@example.org
    "()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org
    " "@example.org (space between the quotes)
    example@localhost (sent from localhost)
    example@s.solutions (see the List of Internet top-level domains)
    user@com
    user@localserver
    user@[IPv6:2001:db8::1]

And these are examples of invalid addresses:

    Abc.example.com (no @ character)
    A@b@c@example.com (only one @ is allowed outside quotation marks)
    a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters in this local part are allowed outside quotation marks)
    just"not"right@example.com (quoted strings must be dot separated or the only element making up the local part)
    this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
    this\ still\"not\\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
    john..doe@example.com (double dot before @; with caveat: Gmail lets this through)
    john.doe@example..com (double dot after @)
    a valid address with a leading space
    a valid address with a trailing space

Source: Email address at Wikipedia.

Perl's RFC2822 regex for validating emails is a single expression running to several thousand characters: the full regexp for RFC2822 addresses was a mere 3.7k.

See also: RFC 822 Email Address Parser in PHP.

The formal definitions of e-mail addresses are in:

    RFC 5322 (sections 3.2.3 and 3.4.1, obsoletes RFC 2822)
    RFC 5321
    RFC 3696
    RFC 6531 (permitted characters)

Related: The true power of regular expressions.
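As a rough illustration of the unquoted local-part rules listed in this answer, here is a minimal Python sketch. The function name is hypothetical, the character set is the RFC 5322 "atext" set, and quoted strings and comments are deliberately left out, so treat it as a partial check rather than a validator.

```python
import string

# Characters allowed in an unquoted local part (RFC 5322 "atext"); the dot is
# handled separately because of its position rules.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_plain_local_part(local: str) -> bool:
    """Check an unquoted local part: atext and dots only, no edge or double dots."""
    if not 1 <= len(local) <= 64:          # 64-character cap, as stated in the answer
        return False
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    return all(ch in ATEXT or ch == "." for ch in local)


for candidate in ["very.common", "disposable.style.email.with+symbol", "john..doe", "A@b@c"]:
    print(candidate, is_plain_local_part(candidate))
```

A quoted local part such as "much.more unusual" would be rejected by this sketch even though it is valid, which is the trade-off of handling only the common unquoted form.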

User · Answer

Gmail will only allow the + sign as a special character, and in some cases the period (.), but no other special characters are allowed at Gmail. The RFCs say that you can use special characters, but you should avoid sending mail to Gmail with special characters.

User · Answer

The answer is (almost) ALL (7-bit ASCII), if the inclusion rule is "...allowed under some/any/none conditions...".

Just by looking at one of several possible inclusion rules for allowed text in the "domain text" part in RFC 5322, at the top of page 17, we find:

    dtext           =   %d33-90 /          ; Printable US-ASCII
                        %d94-126 /         ;  characters not including
                        obs-dtext          ;  "[", "]", or "\"

The only characters missing from this description are the ones used in domain-literal ([ and ]), the one used to form a quoted-pair (\), and the white space character (%d32). With those, the whole range 32-126 (decimal) is used. A similar requirement appears as "qtext" and "ctext". Many control characters are also allowed/used. One list of such control characters appears on page 31, section 4.1 of RFC 5322, as obs-NO-WS-CTL:

    obs-NO-WS-CTL   =   %d1-8 /            ; US-ASCII control
                        %d11 /             ;  characters that do not
                        %d12 /             ;  include the carriage
                        %d14-31 /          ;  return, line feed, and
                        %d127              ;  white space characters

All these control characters are allowed, as stated at the start of section 3.5:

    .... MAY be used, the use of US-ASCII control characters (values
    1 through 8, 11, 12, and 14 through 31) is discouraged ....

Such an inclusion rule is therefore "just too wide". Or, in another sense, the expected rule is "too simplistic".
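The arithmetic behind that claim can be checked directly. The short Python sketch below rebuilds the two ABNF ranges as sets of code points (the variable names are illustrative, and the obsolete obs-dtext alternative is ignored) and lists what is left out of the printable range 32-126.

```python
# dtext = %d33-90 / %d94-126 (obs-dtext is not modelled here)
dtext = set(range(33, 91)) | set(range(94, 127))

# Printable US-ASCII including the space character
printable = set(range(32, 127))

# Characters in 32-126 that dtext does not cover
missing_chars = [chr(code) for code in sorted(printable - dtext)]
print(missing_chars)  # [' ', '[', '\\', ']']

# obs-NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127
obs_no_ws_ctl = set(range(1, 9)) | {11, 12} | set(range(14, 32)) | {127}

# Tab (9), line feed (10) and carriage return (13) are the control codes left out
excluded_controls = sorted((set(range(1, 32)) | {127}) - obs_no_ws_ctl)
print(excluded_controls)  # [9, 10, 13]
```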

User · Answer

Wikipedia has a good article on this, and the official spec is here. From Wikipedia:

The local-part of the e-mail address may use any of these ASCII characters:

    Uppercase and lowercase English letters (a-z, A-Z)
    Digits 0 to 9
    Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
    Character . (dot, period, full stop), provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.

Additionally, quoted-strings (i.e. "John Doe"@example.com) are permitted, thus allowing characters that would otherwise be prohibited; however, they do not appear in common practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

User · Answer

A good read on the matter.

Excerpt:

These are all valid email addresses!

    "Abc\@def"@example.com
    "Fred Bloggs"@example.com
    "Joe\\Blow"@example.com
    "Abc@def"@example.com
    customer/department=shipping@example.com
    $A12345@example.com
    !def!xyz%abc@example.com
    _somename@example.com

User · Answer

Google do an interesting thing with their gmail.com addresses. gmail.com addresses allow only letters (a-z), numbers, and periods (which are ignored).

E.g., pikachu@gmail.com is the same as pi.kachu@gmail.com, and both email addresses will be delivered to the same mailbox. PIKACHU@gmail.com is also delivered to the same mailbox.

So to answer the question: sometimes it depends on the implementer how much of the RFC standards they want to follow. Google's gmail.com address style is compatible with the standards. They do it that way to avoid confusion where different people would take similar email addresses, e.g.

    *** gmail.com accepting rules ***
    d.oy.smith@gmail.com   - accepted
    d_oy_smith@gmail.com   - bounce, and account can never be created
    doysmith@gmail.com     - accepted
    D.Oy'Smith@gmail.com   - bounce, and account can never be created

The Wikipedia link is a good reference on what email addresses generally allow: http://en.wikipedia.org/wiki/Email_address
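If you need to compare gmail.com addresses the way this answer describes (periods ignored, case ignored), a small normalization helper is one way to do it. The sketch below is an assumption-laden illustration, not Google's own logic: the function name is made up, it assumes the input contains an "@", and it leaves addresses at other domains untouched apart from lower-casing the domain.

```python
def canonical_gmail(address: str) -> str:
    """Lower-case the address and drop dots from the local part (gmail.com only)."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain != "gmail.com":
        # Other providers may treat dots and case as significant, so leave the local part alone.
        return f"{local}@{domain}"
    return local.replace(".", "").lower() + "@gmail.com"


print(canonical_gmail("pi.kachu@gmail.com"))   # pikachu@gmail.com
print(canonical_gmail("PIKACHU@gmail.com"))    # pikachu@gmail.com
```

This is useful for spotting duplicate sign-ups; the address the user typed should still be the one stored and mailed.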

User · Answer

You can start from the Wikipedia article:

    Uppercase and lowercase English letters (a-z, A-Z)
    Digits 0 to 9
    Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
    Character . (dot, period, full stop), provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.

User · Answer

The accepted answer refers to a Wikipedia article when discussing the valid local-part of an email address, but Wikipedia is not an authority on this.

IETF RFC 3696 is an authority on this matter, and should be consulted at section 3, "Restrictions on email addresses", on page 5:

   Contemporary email addresses consist of a "local part" separated from a "domain part" (a fully-qualified domain name) by an at-sign ("@"). The syntax of the domain part corresponds to that in the previous section. The concerns identified in that section about filtering and lists of names apply to the domain names used in an email context as well. The domain name can also be replaced by an IP address in square brackets, but that form is strongly discouraged except for testing and troubleshooting purposes.

   The local part may appear using the quoting conventions described below. The quoted forms are rarely used in practice, but are required for some legitimate purposes. Hence, they should not be rejected in filtering routines but, should instead be passed to the email system for evaluation by the destination host.

   The exact rule is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character. For example

      Abc\@def@example.com

   is a valid form of an email address. Blank spaces may also appear, as in

      Fred\ Bloggs@example.com

   The backslash character may also be used to quote itself, e.g.,

      Joe.\\Blow@example.com

   In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings. For example

      "Abc@def"@example.com

      "Fred Bloggs"@example.com

   are alternate forms of the first two examples above. These quoted forms are rarely recommended, and are uncommon in practice, but, as discussed above, must be supported by applications that are processing email addresses. In particular, the quoted forms often appear in the context of addresses associated with transitions from other systems and contexts; those transitional requirements do still arise and, since a system that accepts a user-provided email address cannot "know" whether that address is associated with a legacy system, the address forms must be accepted and passed into the email environment.

   Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters

      ! # $ % & ' * + - / = ? ^ _ ` . { | } ~

   period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting. If any of that list of excluded characters are to appear, they must be quoted. Forms such as

      user+mailbox@example.com

      customer/department=shipping@example.com

      $A12345@example.com

      !def!xyz%abc@example.com

      _somename@example.com

   are valid and are seen fairly regularly, but any of the characters listed above are permitted.

As others have done, I submit a regex that works for both PHP and JavaScript to validate email addresses:

    /^[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$/i

User · Answer

The short answer is that there are 2 answers. There is one standard for what you should do, i.e. behaviour that is wise and will keep you out of trouble. There is another (much broader) standard for the behaviour you should accept without making trouble. This duality works for sending and accepting email but has broad application in life.

For a good guide to the addresses you create, see: http://www.remote.org/jochen/mail/info/chars.html

To filter valid emails, just pass on anything comprehensible enough to see a next step. Or start reading a bunch of RFCs; caution, here be dragons.

User · Answer

See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.

RFC 822 also covers email addresses, but it deals mostly with their structure:

    addr-spec   =  local-part "@" domain        ; global address
    local-part  =  word *("." word)             ; uninterpreted
                                                ; case-preserved

    domain      =  sub-domain *("." sub-domain)
    sub-domain  =  domain-ref / domain-literal
    domain-ref  =  atom                         ; symbolic reference

And as usual, Wikipedia has a decent article on email addresses:

The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

In addition to ASCII characters, as of 2012 you can use international characters above U+007F, encoded as UTF-8, as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !# and @:.

For validation, see Using a regular expression to validate an email address.

The domain part is defined as follows:

The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
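The hostname label rules quoted above translate fairly directly into a check. The Python sketch below is one possible reading: the function name is hypothetical, and the 63-character label limit and 253-character overall limit are DNS limits added here as assumptions, since the quoted text does not mention them. Bracketed IP literals and internationalized names are not handled.

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen; 1-63 characters.
# Leading digits are allowed, following the RFC 1123 relaxation mentioned above.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(domain: str) -> bool:
    """Return True if every dot-separated label follows the letter/digit/hyphen rule."""
    if not 1 <= len(domain) <= 253:      # assumption: overall DNS length limit
        return False
    return all(LABEL.fullmatch(label) for label in domain.split("."))


for name in ["best-server-ever.com", "3com.com", "-bad.com", "exa_mple.com"]:
    print(name, is_hostname(name))
```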

User · Answer

Name:   abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~.

Server: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.

User · Answer

As can be found in this Wikipedia link:

The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local-parts.

A quoted string may exist as a dot-separated entity within the local-part, or it may exist when the outermost quotes are the outermost characters of the local-part (e.g., abc."defghi".xyz@example.com or "abcdefghixyz"@example.com are allowed. Conversely, abc"defghi"xyz@example.com is not; neither is abc\"def\"ghi@example.com). Quoted strings and characters, however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

The local-part postmaster is treated specially: it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore jsmith@example.com and JSmith@example.com specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent.

Despite the wide range of special characters which are technically valid, organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). Common advice is to avoid using some special characters to avoid the risk of rejected emails.

User · Answer

Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).

To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).

The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and, more importantly, there's a large and growing number of implementations supporting this improvement that are currently in service.

I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:

http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981

I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.

So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)", then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode.

After doing that you can follow (most of) the advice above.
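The convert-to-Punycode step can be sketched with Python's built-in "idna" codec, which implements the IDNA rules of RFC 3490 (the newer IDNA 2008 rules need a third-party package, which is not used here). The function name is hypothetical, and only the domain part is converted; a non-ASCII local part is passed through unchanged.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain part of an address to its ASCII (Punycode) form."""
    local, _, domain = address.rpartition("@")
    # The "idna" codec encodes each label, adding the "xn--" prefix where needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


converted = to_ascii_address("wildwezyr@fahrvergnügen.net")
print(converted)

# Decoding the domain again should recover the original spelling.
print(converted.split("@")[1].encode("ascii").decode("idna"))
```

Once the domain is in this ASCII form, the letter, digit and hyphen rules for hostname labels apply to it as usual.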

User · Answer

Check for @ and . and then send an email for them to verify.

I still can't use my .name email address on 20% of the sites on the internet because someone screwed up their email validation, or because it predates the new addresses being valid.
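A minimal version of that permissive check might look like the Python sketch below. The function name is made up, and the verification email itself is out of scope here; the only point is how little needs to be tested before handing the address to the mail system.

```python
def looks_like_email(address: str) -> bool:
    """Accept anything with text before the last "@" and a dot inside the part after it."""
    local, at, domain = address.rpartition("@")
    # strip(".") ignores leading or trailing dots, so the dot must sit between two labels
    return bool(at and local and "." in domain.strip("."))


print(looks_like_email("me@example.name"))         # True
print(looks_like_email("no-at-sign.example.com"))  # False
```

Note that requiring a dot rejects dotless domains such as user@localhost, which is usually acceptable for a public sign-up form.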

User · Answer

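In my PHP I use this check:

    <?php
    if (preg_match(
        '/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
        "tim'qqq@gmail.com"
    )) {
        echo "legit email";
    } else {
        echo "NOT legit email";
    }
    ?>

Try it yourself: http://phpfiddle.org/main/code/9av6-d10r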
