[security] The definitive guide to form-based website authentication

Form-based authentication for websites

We believe that Stack Overflow should not just be a resource for very specific technical questions, but also for general guidelines on how to solve variations on common problems. "Form based authentication for websites" should be a fine topic for such an experiment.

It should include topics such as:

  • How to log in
  • How to log out
  • How to remain logged in
  • Managing cookies (including recommended settings)
  • SSL/HTTPS encryption
  • How to store passwords
  • Using secret questions
  • Forgotten username/password functionality
  • Use of nonces to prevent cross-site request forgeries (CSRF)
  • OpenID
  • "Remember me" checkbox
  • Browser autocompletion of usernames and passwords
  • Secret URLs (public URL protected by digest)
  • Checking password strength
  • E-mail validation
  • and much more about form based authentication...

It should not include things like:

  • Roles and authorization
  • HTTP basic authentication

Please help us by:

  1. Suggesting subtopics
  2. Submitting good articles about this subject
  3. Editing the official answer

This question is related to security http authentication language-agnostic article

The answer is


Use OpenID Connect or User-Managed Access.

As nothing is more efficient than not doing it at all.


I just thought I'd share this solution that I found to be working just fine.

I call it the Dummy Field (though I haven't invented this so don't credit me).

In short: you just have to insert this into your <form> and check for it to be empty at when validating:

<input type="text" name="email" style="display:none" />

The trick is to fool a bot into thinking it has to insert data into a required field, that's why I named the input "email". If you already have a field called email that you're using you should try naming the dummy field something else like "company", "phone" or "emailaddress". Just pick something you know you don't need and what sounds like something people would normally find logical to fill in into a web form. Now hide the input field using CSS or JavaScript/jQuery - whatever fits you best - just don't set the input type to hidden or else the bot won't fall for it.

When you are validating the form (either client or server side) check if your dummy field has been filled to determine if it was sent by a human or a bot.

Example:

In case of a human: The user will not see the dummy field (in my case named "email") and will not attempt to fill it. So the value of the dummy field should still be empty when the form has been sent.

In case of a bot: The bot will see a field whose type is text and a name email (or whatever it is you called it) and will logically attempt to fill it with appropriate data. It doesn't care if you styled the input form with some fancy CSS, web-developers do it all the time. Whatever the value in the dummy field is, we don't care as long as it's larger than 0 characters.

I used this method on a guestbook in combination with CAPTCHA, and I haven't seen a single spam post since. I had used a CAPTCHA-only solution before, but eventually, it resulted in about five spam posts every hour. Adding the dummy field in the form has stopped (at least until now) all the spam from appearing.

I believe this can also be used just fine with a login/authentication form.

Warning: Of course this method is not 100% foolproof. Bots can be programmed to ignore input fields with the style display:none applied to it. You also have to think about people who use some form of auto-completion (like most browsers have built-in!) to auto-fill all form fields for them. They might just as well pick up a dummy field.

You can also vary this up a little by leaving the dummy field visible but outside the boundaries of the screen, but this is totally up to you.

Be creative!


When hashing, don't use fast hash algorithms such as MD5 (many hardware implementations exist). Use something like SHA-512. For passwords, slower hashes are better.

The faster you can create hashes, the faster any brute force checker can work. Slower hashes will therefore slow down brute forcing. A slow hash algorithm will make brute forcing impractical for longer passwords (8 digits +)


A good article about realistic password strength estimation is:

Dropbox Tech Blog » Blog Archive » zxcvbn: realistic password strength estimation


First, a strong caveat that this answer is not the best fit for this exact question. It should definitely not be the top answer!

I will go ahead and mention Mozilla’s proposed BrowserID (or perhaps more precisely, the Verified Email Protocol) in the spirit of finding an upgrade path to better approaches to authentication in the future.

I’ll summarize it this way:

  1. Mozilla is a nonprofit with values that align well with finding good solutions to this problem.
  2. The reality today is that most websites use form-based authentication
  3. Form-based authentication has a big drawback, which is an increased risk of phishing. Users are asked to enter sensitive information into an area controlled by a remote entity, rather than an area controlled by their User Agent (browser).
  4. Since browsers are implicitly trusted (the whole idea of a User Agent is to act on behalf of the User), they can help improve this situation.
  5. The primary force holding back progress here is deployment deadlock. Solutions must be decomposed into steps which provide some incremental benefit on their own.
  6. The simplest decentralized method for expressing an identity that is built into the internet infrastructure is the domain name.
  7. As a second level of expressing identity, each domain manages its own set of accounts.
  8. The form “account@domain” is concise and supported by a wide range of protocols and URI schemes. Such an identifier is, of course, most universally recognized as an email address.
  9. Email providers are already the de-facto primary identity providers online. Current password reset flows usually let you take control of an account if you can prove that you control that account’s associated email address.
  10. The Verified Email Protocol was proposed to provide a secure method, based on public key cryptography, for streamlining the process of proving to domain B that you have an account on domain A.
  11. For browsers that don’t support the Verified Email Protocol (currently all of them), Mozilla provides a shim which implements the protocol in client-side JavaScript code.
  12. For email services that don’t support the Verified Email Protocol, the protocol allows third parties to act as a trusted intermediary, asserting that they’ve verified a user’s ownership of an account. It is not desirable to have a large number of such third parties; this capability is intended only to allow an upgrade path, and it is much preferred that email services provide these assertions themselves.
  13. Mozilla offers their own service to act like such a trusted third party. Service Providers (that is, Relying Parties) implementing the Verified Email Protocol may choose to trust Mozilla's assertions or not. Mozilla’s service verifies users’ account ownership using the conventional means of sending an email with a confirmation link.
  14. Service Providers may, of course, offer this protocol as an option in addition to any other method(s) of authentication they might wish to offer.
  15. A big user interface benefit being sought here is the “identity selector”. When a user visits a site and chooses to authenticate, their browser shows them a selection of email addresses (“personal”, “work”, “political activism”, etc.) they may use to identify themselves to the site.
  16. Another big user interface benefit being sought as part of this effort is helping the browser know more about the user’s session – who they’re signed in as currently, primarily – so it may display that in the browser chrome.
  17. Because of the distributed nature of this system, it avoids lock-in to major sites like Facebook, Twitter, Google, etc. Any individual can own their own domain and therefore act as their own identity provider.

This is not strictly “form-based authentication for websites”. But it is an effort to transition from the current norm of form-based authentication to something more secure: browser-supported authentication.


I would like to add one very important comment: -

  • "In a corporate, intra- net setting," most if not all of the foregoing might not apply!

Many corporations deploy "internal use only" websites which are, effectively, "corporate applications" that happen to have been implemented through URLs. These URLs can (supposedly ...) only be resolved within "the company's internal network." (Which network magically includes all VPN-connected 'road warriors.')

When a user is dutifully-connected to the aforesaid network, their identity ("authentication") is [already ...] "conclusively known," as is their permission ("authorization") to do certain things ... such as ... "to access this website."

This "authentication + authorization" service can be provided by several different technologies, such as LDAP (Microsoft OpenDirectory), or Kerberos.

From your point-of-view, you simply know this: that anyone who legitimately winds-up at your website must be accompanied by [an environment-variable magically containing ...] a "token." (i.e. The absence of such a token must be immediate grounds for 404 Not Found.)

The token's value makes no sense to you, but, should the need arise, "appropriate means exist" by which your website can "[authoritatively] ask someone who knows (LDAP... etc.)" about any and every(!) question that you may have. In other words, you do not avail yourself of any "home-grown logic." Instead, you inquire of The Authority and implicitly trust its verdict.

Uh huh ... it's quite a mental-switch from the "wild-and-wooly Internet."


I'd like to add one suggestion I've used, based on defense in depth. You don't need to have the same auth&auth system for admins as regular users. You can have a separate login form on a separate url executing separate code for requests that will grant high privileges. This one can make choices that would be a total pain to regular users. One such that I've used is to actually scramble the login URL for admin access and email the admin the new URL. Stops any brute force attack right away as your new URL can be arbitrarily difficult (very long random string) but your admin user's only inconvenience is following a link in their email. The attacker no longer knows where to even POST to.


Definitive Article

Sending credentials

The only practical way to send credentials 100% securely is by using SSL. Using JavaScript to hash the password is not safe. Common pitfalls for client-side password hashing:

  • If the connection between the client and server is unencrypted, everything you do is vulnerable to man-in-the-middle attacks. An attacker could replace the incoming javascript to break the hashing or send all credentials to their server, they could listen to client responses and impersonate the users perfectly, etc. etc. SSL with trusted Certificate Authorities is designed to prevent MitM attacks.
  • The hashed password received by the server is less secure if you don't do additional, redundant work on the server.

There's another secure method called SRP, but it's patented (although it is freely licensed) and there are few good implementations available.

Storing passwords

Don't ever store passwords as plaintext in the database. Not even if you don't care about the security of your own site. Assume that some of your users will reuse the password of their online bank account. So, store the hashed password, and throw away the original. And make sure the password doesn't show up in access logs or application logs. OWASP recommends the use of Argon2 as your first choice for new applications. If this is not available, PBKDF2 or scrypt should be used instead. And finally if none of the above are available, use bcrypt.

Hashes by themselves are also insecure. For instance, identical passwords mean identical hashes--this makes hash lookup tables an effective way of cracking lots of passwords at once. Instead, store the salted hash. A salt is a string appended to the password prior to hashing - use a different (random) salt per user. The salt is a public value, so you can store them with the hash in the database. See here for more on this.

This means that you can't send the user their forgotten passwords (because you only have the hash). Don't reset the user's password unless you have authenticated the user (users must prove that they are able to read emails sent to the stored (and validated) email address.)

Security questions

Security questions are insecure - avoid using them. Why? Anything a security question does, a password does better. Read PART III: Using Secret Questions in @Jens Roland answer here in this wiki.

Session cookies

After the user logs in, the server sends the user a session cookie. The server can retrieve the username or id from the cookie, but nobody else can generate such a cookie (TODO explain mechanisms).

Cookies can be hijacked: they are only as secure as the rest of the client's machine and other communications. They can be read from disk, sniffed in network traffic, lifted by a cross-site scripting attack, phished from a poisoned DNS so the client sends their cookies to the wrong servers. Don't send persistent cookies. Cookies should expire at the end of the client session (browser close or leaving your domain).

If you want to autologin your users, you can set a persistent cookie, but it should be distinct from a full-session cookie. You can set an additional flag that the user has auto-logged in, and needs to log in for real for sensitive operations. This is popular with shopping sites that want to provide you with a seamless, personalized shopping experience but still protect your financial details. For example, when you return to visit Amazon, they show you a page that looks like you're logged in, but when you go to place an order (or change your shipping address, credit card etc.), they ask you to confirm your password.

Financial websites such as banks and credit cards, on the other hand, only have sensitive data and should not allow auto-login or a low-security mode.

List of external resources


My favourite rule in regards to authentication systems: use passphrases, not passwords. Easy to remember, hard to crack. More info: Coding Horror: Passwords vs. Pass Phrases


I dont't know whether it was best to answer this as an answer or as a comment. I opted for the first option.

Regarding the poing PART IV: Forgotten Password Functionality in the first answer, I would make a point about Timing Attacks.

In the Remember your password forms, an attacker could potentially check a full list of emails and detect which are registered to the system (see link below).

Regarding the Forgotten Password Form, I would add that it is a good idea to equal times between successful and unsucessful queries with some delay function.

https://crypto.stanford.edu/~dabo/papers/webtiming.pdf


I do not think the above answer is "wrong" but there are large areas of authentication that are not touched upon (or rather the emphasis is on "how to implement cookie sessions", not on "what options are available and what are the trade-offs".

My suggested edits/answers are

  • The problem lies more in account setup than in password checking.
  • The use of two-factor authentication is much more secure than more clever means of password encryption
  • Do NOT try to implement your own login form or database storage of passwords, unless the data being stored is valueless at account creation and self-generated (that is, web 2.0 style like Facebook, Flickr, etc.)

    1. Digest Authentication is a standards-based approach supported in all major browsers and servers, that will not send a password even over a secure channel.

This avoids any need to have "sessions" or cookies as the browser itself will re-encrypt the communication each time. It is the most "lightweight" development approach.

However, I do not recommend this, except for public, low-value services. This is an issue with some of the other answers above - do not try an re-implement server-side authentication mechanisms - this problem has been solved and is supported by most major browsers. Do not use cookies. Do not store anything in your own hand-rolled database. Just ask, per request, if the request is authenticated. Everything else should be supported by configuration and third-party trusted software.

So ...

First, we are confusing the initial creation of an account (with a password) with the re-checking of the password subsequently. If I am Flickr and creating your site for the first time, the new user has access to zero value (blank web space). I truly do not care if the person creating the account is lying about their name. If I am creating an account of the hospital intranet/extranet, the value lies in all the medical records, and so I do care about the identity (*) of the account creator.

This is the very very hard part. The only decent solution is a web of trust. For example, you join the hospital as a doctor. You create a web page hosted somewhere with your photo, your passport number, and a public key, and hash them all with the private key. You then visit the hospital and the system administrator looks at your passport, sees if the photo matches you, and then hashes the web page/photo hash with the hospital private key. From now on we can securely exchange keys and tokens. As can anyone who trusts the hospital (there is the secret sauce BTW). The system administrator can also give you an RSA dongle or other two-factor authentication.

But this is a lot of a hassle, and not very web 2.0. However, it is the only secure way to create new accounts that have access to valuable information that is not self-created.

  1. Kerberos and SPNEGO - single sign-on mechanisms with a trusted third party - basically the user verifies against a trusted third party. (NB this is not in any way the not to be trusted OAuth)

  2. SRP - sort of clever password authentication without a trusted third party. But here we are getting into the realms of "it's safer to use two-factor authentication, even if that's costlier"

  3. SSL client side - give the clients a public key certificate (support in all major browsers - but raises questions over client machine security).

In the end, it's a tradeoff - what is the cost of a security breach vs the cost of implementing more secure approaches. One day, we may see a proper PKI widely accepted and so no more own rolled authentication forms and databases. One day...


Examples related to security

Monitoring the Full Disclosure mailinglist Two Page Login with Spring Security 3.2.x How to prevent a browser from storing passwords JWT authentication for ASP.NET Web API How to use a client certificate to authenticate and authorize in a Web API Disable-web-security in Chrome 48+ When you use 'badidea' or 'thisisunsafe' to bypass a Chrome certificate/HSTS error, does it only apply for the current site? How does Content Security Policy (CSP) work? How to prevent Screen Capture in Android Default SecurityProtocol in .NET 4.5

Examples related to http

Access blocked by CORS policy: Response to preflight request doesn't pass access control check Axios Delete request with body and headers? Read response headers from API response - Angular 5 + TypeScript Android 8: Cleartext HTTP traffic not permitted Angular 4 HttpClient Query Parameters Load json from local file with http.get() in angular 2 Angular 2: How to access an HTTP response body? What is HTTP "Host" header? Golang read request body Angular 2 - Checking for server errors from subscribe

Examples related to authentication

Set cookies for cross origin requests How Spring Security Filter Chain works What are the main differences between JWT and OAuth authentication? http post - how to send Authorization header? ASP.NET Core Web API Authentication Token based authentication in Web API without any user interface Custom Authentication in ASP.Net-Core Basic Authentication Using JavaScript Adding ASP.NET MVC5 Identity Authentication to an existing project LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903A9, comment: AcceptSecurityContext error, data 52e, v1db1

Examples related to language-agnostic

IOException: The process cannot access the file 'file path' because it is being used by another process Peak signal detection in realtime timeseries data Match linebreaks - \n or \r\n? Simple way to understand Encapsulation and Abstraction How can I pair socks from a pile efficiently? How do I determine whether my calculation of pi is accurate? What is ADT? (Abstract Data Type) How to explain callbacks in plain english? How are they different from calling one function from another function? Ukkonen's suffix tree algorithm in plain English Private vs Protected - Visibility Good-Practice Concern

Examples related to article

justify-content property isn't working HTML5 best practices; section/header/aside/article elements The definitive guide to form-based website authentication