[url] Should URL be case sensitive?

I noticed that

HTTP://STACKOVERFLOW.COM/QUESTIONS/ASK

and

http://stackoverflow.com/questions/ask

both work fine; in fact, the first one is converted to lowercase.

I think that this makes sense for the user.

If I look at Google then this URL works fine:

http://www.google.com/intl/en/about/corporate/index.html  

but this one with "ABOUT" does not work:

http://www.google.com/intl/en/ABOUT/corporate/index.html   

Should the URL be case sensitive?


The answer is


The domain name portion of a URL is not case sensitive since DNS ignores case: http://en.example.org/ and HTTP://EN.EXAMPLE.ORG/ both open the same page.

The path is used to specify and perhaps find the resource requested. It is case-sensitive, though it may be treated as case-insensitive by some servers, especially those based on Microsoft Windows.

If the server is case sensitive and http://en.example.org/wiki/URL is correct, then http://en.example.org/WIKI/URL or http://en.example.org/wiki/url will display an HTTP 404 error page, unless these URLs point to valid resources themselves.
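A minimal Python sketch of that difference, assuming a hypothetical server whose resources live in a dictionary keyed by path (the names RESOURCES and lookup are illustrative only). In case-sensitive mode only the exact path is found; in case-insensitive mode, like the Windows-based servers mentioned above, paths are compared after case folding.

```python
# Hypothetical resource table; the path mirrors the example above.
RESOURCES = {"/wiki/URL": "<html>URL article</html>"}


def lookup(path: str, case_sensitive: bool = True) -> tuple[int, str]:
    """Return (status, body) for a request path."""
    if case_sensitive:
        # Exact, octet-for-octet match only.
        if path in RESOURCES:
            return 200, RESOURCES[path]
    else:
        # Compare case-folded paths, as a case-insensitive server would.
        folded = path.casefold()
        for known, body in RESOURCES.items():
            if known.casefold() == folded:
                return 200, body
    return 404, "Not Found"
```

With the default case_sensitive=True, "/WIKI/URL" and "/wiki/url" both give 404; with case_sensitive=False they resolve to the same page as "/wiki/URL".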


The case sensitivity of URLs in general (including whether two URLs that differ only in case refer to the same thing) needs to be looked at from the following perspectives:

  • Resource Equivalence
  • URL Comparison

From the perspective of resource equivalence, it is generally not possible to say whether two URLs differing only in case (lower case, upper case, sentence case, camel case, or any combination) are different from each other unless the resource is retrieved from both URLs, which in many cases is not practical (RFC 3986, section 6.1, para 1). Therefore, where the resource cannot be retrieved, the comparison perspective is used.

However, where it is possible to retrieve the resource, the matter gets (as expected) more complicated. Consider RFC 3986, section 3.3, para 5:

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax

From this it would appear that the generic syntax allows no assumption to be made about the rest of a URI/URL beyond its scheme and authority (including whether it is case sensitive).

For the scheme and the host part of the authority, however, the specification does (charitably) state that they are case insensitive; refer to RFC 3986, section 3.1, para 1 and RFC 3986, section 6.2.2.1, para 2.
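As a rough Python sketch of that rule (the function name normalize_case is hypothetical), the two case-insensitive components can be lowercased before comparison while the path, query and fragment are left exactly as given. It deliberately skips the other normalizations in RFC 3986, section 6.2.2, such as uppercasing the hex digits of percent-encodings.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase only the scheme and host, which RFC 3986 treats as
    case-insensitive; path, query and fragment are kept as-is."""
    parts = urlsplit(url)
    # Userinfo before "@" is not covered by the host rule, so keep it unchanged
    # and lowercase only the host[:port] part.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path,
                       parts.query, parts.fragment))
```

Under this sketch, "HTTP://EN.EXAMPLE.ORG/wiki/URL" normalizes to "http://en.example.org/wiki/URL", while "/WIKI/URL" and "/wiki/URL" remain distinct paths.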

Having exhausted this line of inquiry, one should look at the comparison perspective to determine whether URIs/URLs are case sensitive.

The first hint in that direction comes from section 6.2.2.1 (above):

The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme

This is further supported by RFC 2616, section 3.2.3:

When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs

So is the enquiry finally settled, and URLs are case sensitive? (Heh!) Not quite: the operative words are 'opaque', 'client' and 'comparing'.

Beyond syntax, the RFCs above say nothing about the actual interpretation of the path and query, except that they are 'opaque', and they only specify how (with a SHOULD, not a MUST) a 'client' may 'compare' URLs. They say nothing about how a server SHOULD (let alone MUST) interpret the rest of the URL beyond the scheme and authority.

Therefore servers have full latitude to interpret a URL as they please, and they do, as earlier answers have shown.


URLs should be case insensitive unless there is a good reason why they should not be.

This is not mandatory (it is not part of any RFC), but it makes the communication and storage of URLs far more reliable.

If I have two pages on a website:

http://stackoverflow.com/ABOUT.html

and

http://stackoverflow.com/about.html

How should they differ? Maybe one is written 'shouting style' (caps), but from an information architecture (IA) point of view, the distinction should never be made by a change in the case of the URL.

Moreover, it is easy to implement this in Apache: just use CheckSpelling On from mod_speling (with CheckCaseOnly On if you only want case differences to be corrected).
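mod_speling's real matching is more elaborate than this, so the following is only a rough Python sketch of the case-only idea, assuming a hypothetical set of known paths: serve an exact match directly, and if exactly one known path differs only by case, redirect to it.

```python
from typing import Optional

# Hypothetical set of paths that actually exist on the site.
KNOWN_PATHS = {"/about.html", "/contact.html"}


def resolve(path: str) -> tuple[int, Optional[str]]:
    """Return (200, path) for an exact match, (301, target) when exactly one
    known path differs only by case, and (404, None) otherwise."""
    if path in KNOWN_PATHS:
        return 200, path
    matches = [p for p in KNOWN_PATHS if p.casefold() == path.casefold()]
    if len(matches) == 1:
        # Redirect to the canonical spelling instead of serving a duplicate.
        return 301, matches[0]
    # No match, or an ambiguous one: this sketch simply gives up.
    return 404, None
```

Redirecting, rather than silently serving the page, keeps one canonical URL per resource, which also helps caches and search engines.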


Consider the following:

https://www.example.com/createuser.php?name=Paul%20McCartney

In this hypothetical example, an HTML form - using the GET method - sends the "name" parameter to a PHP script that creates a new user account.

And the point I'm making with this example is that this GET parameter needs to be case-sensitive to preserve the capitalisation of "McCartney" (or, as another example, to preserve "Walter d'Isney", as there are other ways for names to break the usual capitalisation rules).

It's cases like these that guide the W3C recommendation that the scheme and host are case insensitive, while everything after them is potentially case sensitive and left up to the server. Forcing case insensitivity by standard would make the above example incapable of preserving the case of user input passed as a GET query parameter.
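A small Python illustration using the hypothetical createuser.php URL from the example: decoding the query as sent preserves "McCartney", whereas lowercasing the whole URL first, as a blanket case-insensitive rule might, loses the capitalisation. The helper name_param is illustrative only.

```python
from urllib.parse import parse_qs, urlsplit


def name_param(url: str) -> str:
    """Return the decoded value of the (hypothetical) "name" query parameter."""
    return parse_qs(urlsplit(url).query)["name"][0]


url = "https://www.example.com/createuser.php?name=Paul%20McCartney"
print(name_param(url))          # Paul McCartney
print(name_param(url.lower()))  # paul mccartney
```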

But what I'd say is that though this is necessarily the letter of the law to accommodate such cases, the spirit of the law is that, where case is irrelevant, behave in a case insensitive way. The standards, though, can't tell you where case is irrelevant because, like the examples I've given, it's a context-dependent thing.

(e.g. an account username is probably best forced to case insensitivity - as "User123" and "user123" being different accounts could prove confusing - even if their real name, as above, is best left case sensitive.)
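One way to get both behaviours, sketched in Python with a hypothetical in-memory account store: key accounts by the case-folded username so "User123" and "user123" collide, but store the real name exactly as typed.

```python
# Hypothetical in-memory account store: the lookup key is the case-folded
# username, while the real name is stored exactly as the user typed it.
accounts: dict[str, dict[str, str]] = {}


def register(username: str, real_name: str) -> bool:
    """Create an account unless a username differing only in case exists."""
    key = username.casefold()
    if key in accounts:
        return False  # "User123" and "user123" would be the same account
    accounts[key] = {"username": username, "real_name": real_name}
    return True
```

Registering "User123" succeeds, a later "user123" is rejected, and the stored real name keeps its original capitalisation.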

Sometimes it's relevant, most times it isn't. But it has to be left up to the server / web developer to decide these things - and can't be prescribed by standard - as only at that level could the context be known.

The scheme and host are case insensitive (which shows the standard's preference for case insensitivity, where it can be universally prescribed). The rest is left up to you to decide, as you understand the context better. But, as has been discussed, you probably should, in the spirit of the law, default to case insensitivity unless you have a good reason not to.


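It is possible to make URLs case-insensitive in Apache by redirecting any path that contains uppercase letters to its lowercase form. Note that RewriteMap is only allowed in the server or virtual-host configuration, not in .htaccess:

RewriteEngine On
# Built-in map that lowercases its argument
RewriteMap lowercase int:tolower
# $1 is the path captured by the RewriteRule below; only act if it contains uppercase letters
RewriteCond $1 [A-Z]
# Permanently redirect to the lowercased path
RewriteRule ^/(.*)$ /${lowercase:$1} [R=301,L]

This makes, for example, example.com/About and example.com/ABOUT redirect to example.com/about. (The host name needs no such rule: Google.com and GOOGLE.com already reach google.com, because host names are case insensitive.)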
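The same redirect decision can be sketched in Python for frameworks without mod_rewrite (the function name is hypothetical). Like the [A-Z] condition and int:tolower map above, it only looks at ASCII letters, and like RewriteRule it only sees the path, not the query string.

```python
import string
from typing import Optional

# Lowercase ASCII letters only, matching the [A-Z] condition in the rule above.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lowercase_redirect(path: str) -> Optional[str]:
    """Return the path to 301-redirect to if it contains uppercase ASCII
    letters; return None if the request can be served as-is."""
    if any(ch in string.ascii_uppercase for ch in path):
        return path.translate(_ASCII_LOWER)
    return None
```

For instance, "/About" maps to "/about", while "/about" returns None and is served normally.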


For websites hosted on a Linux server, URLs are usually case sensitive: http://www.google.com/about and http://www.google.com/About can lead to different locations. On a Windows server, URLs are usually case-insensitive, just like folder names, so both lead to the same location.


I think this and many of the answers about what the spec does or does not say miss the point of the question. Should they be case sensitive? That's a loaded question really. From a user's point of view, case sensitivity is a pain point; not everyone knows it makes a difference. Whether URIs should or shouldn't be case sensitive depends on the context of the question. For technical flexibility, yes, they should be. For usability, no, they should not be.


It depends on the hosting OS. Sites hosted on Windows tend to be case insensitive, as the underlying file system is case insensitive. Sites hosted on Unix-type systems tend to be case sensitive, as their underlying file systems are typically case sensitive. The host name part of the URL is always case insensitive; it's the rest of the path that varies.
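Since this depends on the filesystem behind the site rather than strictly on the OS name, a minimal Python probe (the function name is hypothetical) can check a given directory: create a file with a lowercase name and see whether the uppercase spelling refers to it. macOS, for example, typically uses a case-insensitive filesystem by default even though it is Unix-like.

```python
import os
import tempfile


def filesystem_is_case_sensitive(directory: str) -> bool:
    """Create a lowercase-named temporary file in `directory` and check
    whether the same name in uppercase also resolves to an existing file."""
    # tempfile's random suffix uses lowercase letters, digits and "_".
    with tempfile.NamedTemporaryFile(prefix="casecheck_", dir=directory) as f:
        upper = os.path.join(directory, os.path.basename(f.name).upper())
        # If the uppercase spelling does not exist, names differing in case
        # are distinct files, i.e. the filesystem is case sensitive.
        return not os.path.exists(upper)
```

Running it against a site's document root gives a hint of how a server that maps URL paths straight onto files will behave there.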


I am not a fan of bumping old articles, but because this was one of the first results for this particular issue, I felt a need to clarify something.

As @Bhavin Shah's answer states, the domain part of the URL is case insensitive, so

http://google.com 

and

http://GOOGLE.COM 

and

http://GoOgLe.CoM 

are all the same, but everything after the domain name part is considered case sensitive.

so...

http://GOOGLE.COM/ABOUT

and

http://GOOGLE.COM/about

are different.

Note: I am talking "technically", not "literally". In a lot of cases (most, actually) servers are set up to handle these the same, but it is possible to set them up so that they are NOT handled the same.

Different servers handle this differently, and in some cases they have to be case sensitive. In many cases query string values are encoded (such as session IDs or Base64-encoded data passed as a query string value). These values are case sensitive by their nature, so the server has to be case sensitive when handling them.
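A quick Python illustration with a made-up token: the Base64 string "SGVsbG8=" decodes to b"Hello", but the same string lowercased is still valid Base64 and silently decodes to entirely different bytes, so a server that lowercased query values would corrupt the data rather than fail.

```python
import base64

token = "SGVsbG8="  # made-up example: Base64 for b"Hello"

print(base64.b64decode(token))          # b'Hello'
print(base64.b64decode(token.lower()))  # different bytes entirely
```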

So to answer the question, "should" servers be case sensitive in grabbing this data, the answer is "yes, most definitely."

Of course not everything needs to be case sensitive but the server should be aware of what that is and how to handle those cases.


@Hart Simha's comment basically says the same thing. I missed it before I posted so I want to give credit where credit is due.



Domain names are case insensitive according to RFC 4343. The rest of the URL (path and query) is sent to the server in the HTTP request, for example as the target of a GET request, and the server may or may not treat it as case sensitive.

Take this page as an example: stackoverflow.com receives the request path /questions/7996919/should-url-be-case-sensitive and sends an HTML document to your browser. Stackoverflow.com is case insensitive here because it produces the same result for /QUEStions/7996919/Should-url-be-case-sensitive.

On the other hand, Wikipedia is case sensitive except for the first character of the title. The URLs https://en.wikipedia.org/wiki/Case_sensitivity and https://en.wikipedia.org/wiki/case_sensitivity lead to the same article, but https://en.wikipedia.org/wiki/CASE_SENSITIVITY returns 404.
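The Wikipedia behaviour described here amounts to normalizing only the first character of the title. A minimal Python sketch of that idea (a hypothetical helper, not MediaWiki's actual code):

```python
def canonical_title(title: str) -> str:
    """Uppercase only the first character of a page title; every other
    character stays exactly as given, so it remains case sensitive."""
    return title[:1].upper() + title[1:]
```

"case_sensitivity" and "Case_sensitivity" both map to "Case_sensitivity", while "CASE_SENSITIVITY" stays a different title.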


Characters that cannot appear literally in a URL are converted into hex codes (if you've ever noticed spaces in URLs displayed as %20, etc.), and since uppercase and lowercase letters have different character codes, it makes sense that URLs are, at the byte level, case sensitive. However, the spirit of the question seems to be whether that SHOULD be the standard, and I say no, but it is. It's up to the developer/provider to account for this in their code if they want URLs to work regardless of case for end users.
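A short Python check of both points: ordinary letters are not percent-encoded, but "A" and "a" are still different character codes, while a character such as a space is escaped to its hex code.

```python
from urllib.parse import quote

# Letters are sent as-is, but "A" and "a" are different bytes.
print(hex(ord("A")), hex(ord("a")))  # 0x41 0x61

# Only characters not allowed literally, such as a space, are percent-encoded.
print(quote("/about us"))            # /about%20us
```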


Case Preservation

URLs are case-preserving, between client and server. But portions of URLs may or may not be case-sensitive, depending on the server, for a couple of reasons.

Case Sensitivity

The following parts of URLs may be case-sensitive, depending on the site and/or server configuration: everything after the host (here /abc/def.ghi?jkl=mno#pqr), and the user part of an email address.

    http://www.example.com/abc/def.ghi?jkl=mno#pqr

    user@example.com

Rationale

Case-sensitivity in URLs can have several uses. Mainly:

  1. Native compatibility with case-sensitive filesystems.
  2. More compact data encoding within URLs, such as for serialization, hashing, IDs, permalinks, and URL shorteners.

As a developer, I believe the above can often be handled in better ways, but I also understand there are cases where a situation may not permit this.

For example, imagine an existing product that requires a lot of data placed in the "GET" URL, yet it must be compatible with the maximum URL lengths of all major servers, browsers, and caching/proxy mechanisms. To fit even a moderate-length command string (under 1,024 characters for some older browsers), you'd need to use every unique URL-safe character you could (which is basically what base64url encoding is).
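To make the size argument concrete, here is a small Python comparison of encoding the same 32 random bytes (say, a 256-bit ID or hash) with case-insensitive alphabets (hex, Base32) versus case-sensitive base64url, with padding stripped:

```python
import base64
import os

data = os.urandom(32)  # e.g. a 256-bit ID or hash

hex_id = data.hex()                                           # 0-9a-f: case-insensitive
b32_id = base64.b32encode(data).decode().rstrip("=")          # A-Z2-7: case-insensitive
b64_id = base64.urlsafe_b64encode(data).decode().rstrip("=")  # A-Za-z0-9-_: case-sensitive

print(len(hex_id), len(b32_id), len(b64_id))  # 64 52 43
```

The lengths do not depend on the random content; base64url is the shortest precisely because it uses uppercase and lowercase letters as distinct symbols.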

In an Ideal World

Whether or not URLs should be case-sensitive is debatable. I personally believe they should not be, for simplicity (though it may create longer URLs, we have percent-escapes to easily handle cases where we must ensure preservation of exact characters, and there are ways to transfer data other than right in the URL).

Many seem to agree, given that case-insensitive URLs are explicitly enabled for many popular sites and services in order to increase usability. The most prominent example is the username portion of email addresses: most email providers ignore case, and sometimes even dots and other symbols, so that an address written with different capitalization or punctuation still reaches the same mailbox. This is despite the fact that, according to the spec, email usernames are case-sensitive.

However, the fact is that despite what I or others might want, this is the state of how things currently work. And while an eventual worldwide transition to a case-insensitive URL standard is certainly possible, it would likely take quite a long time since case-sensitivity is currently used extensively around the web for various purposes.

Best Practices

As far as best practices go, as a user you can reasonably stick to lowercase for most situations and expect things to work. The main exceptions would be URLs that use case-based encoding or document paths with direct filesystem equivalents. However, such complex URLs are typically copy-pasted (or simply clicked) rather than manually typed.

As a web developer, you should consider keeping URLs as case-insensitive as possible, though, depending on context, there are clearly some situations where that is difficult, as noted above.


The question is: should URLs be case sensitive?

I see no use or good practice behind case-sensitive URLs. It's stupid, it sucks, and it should be avoided at all times.

Just to back up my opinion: when someone asks you for a URL, how could you explain which characters of the URL are upper or lower case? That's nonsense, and no one should ever tell you otherwise.


Old question, but I stumbled here, so why not take a shot at it, since the question seeks various perspectives rather than a definitive answer.

The W3C may have its recommendations, which I care about a lot, but I want to rethink them since the question is here.

Why does the W3C consider domain names case insensitive and leave everything after them case sensitive?

I think the rationale is that the domain part of the URL is typed by hand by a user, while everything after it, being hypertext, is resolved by machines (the browser and the server in the back).

Machines can handle case insensitivity better than humans (not the technical kind :)).

But the question is: just because machines CAN handle it, should it be done that way?

I mean, what are the benefits of naming and accessing a resource at hereIsTheResource vs hereistheresource?

The latter is much less readable than the camel-case one. Readable to humans, that is (including the technical kind).

So here are my points:

The resource path falls somewhere in between: it is part of the program's structure, yet it is sometimes close to the end user behind the browser.

Your URL (excluding the domain name) should be case insensitive if your users are expected to touch it or type it etc. You should develop your application to AVOID having users type the path as much as possible.

Your URL (excluding the domain name) should be case sensitive if your users would never type it by hand.

Conclusion

Paths should be case sensitive; my points weigh towards case-sensitive paths.


Look at the specification here: section 2.7.3 http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-25#page-19

The scheme and host are case-insensitive and normally provided in lowercase; all other components are compared in a case-sensitive manner.