I've got a Python program where two variables are set to the value 'public'
. In a conditional expression I have the comparison var1 is var2
which fails, but if I change it to var1 == var2
it returns True
.
Now if I open my Python interpreter and do the same "is" comparison, it succeeds.
>>> s1 = 'public'
>>> s2 = 'public'
>>> s2 is s1
True
What am I missing here?
This question is related to
python
string
comparison
identity
equality
If you're not sure what you're doing, use the '=='. If you have a little more knowledge about it you can use 'is' for known objects like 'None'.
Otherwise you'll end up wondering why things doesn't work and why this happens:
>>> a = 1
>>> b = 1
>>> b is a
True
>>> a = 6000
>>> b = 6000
>>> b is a
False
I'm not even sure if some things are guaranteed to stay the same between different python versions/implementations.
From my limited experience with python, is
is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. ==
is used to determine if the values are identical.
Here is a good example:
>>> s1 = u'public'
>>> s2 = 'public'
>>> s1 is s2
False
>>> s1 == s2
True
s1
is a unicode string, and s2
is a normal string. They are not the same type, but are the same value.
Other answers here are correct: is
is used for identity comparison, while ==
is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is
operator is simply wrong and you should be using ==
instead.
The reason is
works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fills up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented as a pointer equality test, normally just a single machine instruction with no memory reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn't always happen, and the rules for when this happens are quite convoluted, so please don't rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is
operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string -- that is, it has a different identity, because it is stored in a different place in memory.
Actually, the is
operator checks for identity and == operator checks for equality.
From the language reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that the strings, which are immutable types, may fail when checked with "is" and may succeed when checked with "is".
The same applies for int
and tuple
which are also immutable types.
This is a side note, but in idiomatic Python, you will often see things like:
if x is None:
# Some clauses
This is safe, because there is guaranteed to be one instance of the Null Object (i.e., None).
I believe that this is known as "interned" strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python "is" operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
I think it has to do with the fact that, when the 'is' comparison evaluates to false, two distinct objects are used. If it evaluates to true, that means internally it's using the same exact object and not creating a new one, possibly because you created them within a fraction of 2 or so seconds and because there isn't a large time gap in between it's optimized and uses the same object.
This is why you should be using the equality operator ==
, not is
, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, I made s2, which was a different string object previously equal to 'one' but it is not the same object as s
, because the interpreter did not use the same object as I did not initially assign it to 'one', if I had it would have made them the same object.
The basic concept we have to be clear while approaching this question is to understand the difference between is and ==.
"is" is will compare the memory location. if id(a)==id(b), then a is b returns true else it returns false.
so, we can say that is is used for comparing memory locations. Whereas,
== is used for equality testing which means that it just compares only the resultant values. The below shown code may acts as an example to the above given theory.
In the case of string literals(strings without getting assigned to variables), the memory address will be same as shown in the picture. so, id(a)==id(b). remaining this is self-explanatory.
One last thing to note, you may use the sys.intern
function to ensure that you're getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out above, you should not be using is
to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is
.
Note that the intern
function used to be a builtin on Python 2 but was moved to the sys
module in Python 3.
The ==
operator tests value equivalence. The is
operator tests object identity, and Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a
and b
refers to it. The reason is that Python internally caches and reuses some strings as an optimization. There really is just a string 'banana' in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a
refers to an object and you assign b = a
, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
The is
keyword is a test for object identity while ==
is a value comparison.
If you use is
, the result will be true if and only if the object is the same object. However, ==
will be true any time the values of the object are the same.
is
is identity testing and ==
is equality testing. This means is
is a way to check whether two things are the same things, or just equivalent.
Say you've got a simple person
object. If it is named 'Jack' and is '23' years old, it's equivalent to another 23-year-old Jack, but it's not the same person.
class Person(object):
def __init__(self, name, age):
self.name = name
self.age = age
def __eq__(self, other):
return self.name == other.name and self.age == other.age
jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)
jack1 == jack2 # True
jack1 is jack2 # False
They're the same age, but they're not the same instance of person. A string might be equivalent to another, but it's not the same object.
is
is identity testing and ==
is equality testing (see the Python documentation).
In most cases, if a is b
, then a == b
. But there are exceptions, for example:
>>> nan = float('nan')
>>> nan is nan
True
>>> nan == nan
False
So, you can only use is
for identity tests, never equality tests.
is
will compare the memory location. It is used for object-level comparison.
==
will compare the variables in the program. It is used for checking at a value level.
is
checks for address level equivalence
==
checks for value level equivalence
Source: Stackoverflow.com