The Python Tourist #4: None, empty, and nothing.

While learning Python, I read in several places that you should always use a test like if obj is None when checking for the None value. For some reason, I tend to ignore blanket statements that are presented without a supporting rationale. If the underlying rationale isn't stated, I generally assume it's some esoteric detail that doesn't really matter. Only after it bites me, and I understand the logic, do I pay attention.

Here are a couple of cases where not explicitly testing for None has gotten me into trouble. Maybe these can help someone else avoid the same headaches.

Look at this sample:
Simple parsing function
def parse_file(filename):
    """
    Parse a file, returning a list of tags.
    Returns None on error.
    """

    with open(filename, 'r') as f:
        if not check_format(f):
            return None  # file is wrong format

        tags = []

        for line in f:
            tags.append(parse_line(line))

    return tags

if parse_file(filename):
    print("Parsed OK!")
else:
    print("** ERROR **")
Looks correct enough. The if branch should run when I get a list back, and the else branch when I get None. There is one little problem, though. Look at the following snippet:
The boolean value of 'empty'
if []: print("True")
if not []: print("False")
This always prints "False". Coming from a C background, I want to think of None as "the absence of something", like a NULL pointer. Unfortunately, Python treats empty containers (and zero) as false in a boolean context. To me, an empty object is still something, as opposed to None, which (I think) should be nothing, so this is confusing.
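Here is a quick, self-contained way to check which values count as false, and to see how the truthiness-only test in the caller above lumps an empty file in with a real error. naive_check() is a hypothetical stand-in for that caller, not part of the original code.

```python
# Truth values of a few common objects. Empty containers and zero are
# false in a boolean context, just like None.
samples = [None, [], (), {}, "", 0, [0], "0", [[]]]
for value in samples:
    print("{!r:8} -> {}".format(value, bool(value)))


def naive_check(tags):
    """Hypothetical stand-in for the caller: tests truthiness only."""
    if tags:
        return "Parsed OK!"
    return "** ERROR **"


# An empty file (empty list) is reported exactly like an error (None).
print(naive_check(None))     # ** ERROR **
print(naive_check([]))       # ** ERROR **
print(naive_check(["tag"]))  # Parsed OK!
```

Note that [0], "0" and [[]] are all true: only emptiness matters, not what the contents are.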

I think the thing to do is recognize that this function has three exit states:
  1. None, indicating an error.
  2. An empty list, indicating an empty file.
  3. A non-empty list, holding tags.

The correct test is then:
Explicitly test for None
tags = parse_file(filename)
if tags is None:
    print("** ERROR **")
elif len(tags) == 0:
    print("Empty file")
else:
    print("OK!")
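As a sketch, the three-way test can be wrapped in a small function so each exit state can be exercised directly. report() is a name made up for this illustration, and it returns strings instead of printing them.

```python
def report(tags):
    """Classify the three exit states of parse_file (illustrative helper)."""
    if tags is None:        # state 1: error
        return "** ERROR **"
    elif len(tags) == 0:    # state 2: empty file
        return "Empty file"
    else:                   # state 3: tags found
        return "OK!"


for result in (None, [], ["a", "b"]):
    print(report(result))
```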
We can make this worse and give it four exit states, with the same functionality:
Now with four exit states ...
def parse_file(filename):
    """
    Parse a file, returning a list of tags.
    Returns None on error.
    """

    with open(filename, 'r') as f:
        if not check_format(f):
            return None  # file is wrong format

        tags = []

        for line in f:
            # look for special end-of-file tag
            if end_of_file(line):
                return tags
            else:
                tags.append(parse_line(line))
Although it looks like the same logic, I've introduced a (sort of) hidden fourth state: If the "end-of-file" tag isn't found, the for loop will exit without returning a value. When you don't return a value, None is returned. For example:
Not returning a value == None
def foo():
    pass
    
print("The value is %s" % foo())

Prints "The value is None".
Of course, the code sample above is buggy; I shouldn't let it fall out of the loop. Once again, my C background gave me a false sense of security. A C compiler will warn you when a routine can exit in different ways (with and without a return value), so mistakes like this won't slip through if you pay attention to the compiler warnings. Python's compiler does no such checking: a function is free to return different kinds of values from different paths, and falling off the end simply returns None. (Linters such as pylint can flag inconsistent return statements, but the language itself won't.)
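One way to close the hole is to make every path out of the function end in an explicit return or raise. The sketch below is a simplification under stated assumptions: it parses an in-memory list of lines instead of a file, and a hypothetical "END" line stands in for end_of_file(). parse_lines_buggy() reproduces the fall-through; parse_lines() raises ValueError when the marker is missing.

```python
END_TAG = "END"  # hypothetical end-of-file marker, stands in for end_of_file()


def parse_lines_buggy(lines):
    """Same shape as the four-exit-state parse_file: can fall off the end."""
    tags = []
    for line in lines:
        line = line.strip()
        if line == END_TAG:
            return tags
        tags.append(line)
    # No return here, so Python implicitly returns None.


def parse_lines(lines):
    """Collect tags up to END_TAG; every path ends in return or raise."""
    tags = []
    for line in lines:
        line = line.strip()
        if line == END_TAG:
            return tags
        tags.append(line)
    # Falling out of the loop means the marker was missing: say so
    # explicitly instead of silently returning None.
    raise ValueError("end-of-file tag %r not found" % END_TAG)


print(parse_lines_buggy(["a", "b"]))          # None
print(parse_lines(["a\n", "b\n", "END\n"]))   # ['a', 'b']
```

Raising is only one choice; returning an explicit value after the loop would also work, as long as it can't be mistaken for the error case.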

Anyway, disregarding the buggy code for the moment, recognize that the above function has four distinct exit states:
  1. None, indicating an error.
  2. An empty list, indicating an empty file.
  3. A non-empty list, holding tags.
  4. None, indicating no return value.

The first and last cases bother me a little bit. I don't like that None can have two meanings:
  1. The value None.
  2. The absence of a value.

In my "C thinking" about the first example, I was assuming None meant "the absence of a value", so I was surprised to find that an empty list was (apparently) the same as nothing. Of course, that isn't the case; an empty list simply evaluates to the same boolean value as None.
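The two meanings of None show up outside of return values too. A dict that stores None as a real value can't tell you, through get(), whether a key is missing. A common idiom (not something from this post) is a private sentinel object that can never be a real value; settings and lookup() below are made up for illustration.

```python
# A dict where None is a legitimate stored value.
settings = {"proxy": None}

# dict.get() returns None both for a stored None and for a missing key.
print(settings.get("proxy"))    # None
print(settings.get("timeout"))  # None

# A private sentinel object cannot be confused with any real value.
_MISSING = object()


def lookup(mapping, key):
    """Distinguish 'stored None' from 'no value at all'."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return "no value"
    return "value is %r" % (value,)


print(lookup(settings, "proxy"))    # value is None
print(lookup(settings, "timeout"))  # no value
```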

I appreciate that Python is a practical language. An impractical language could "fix" this by forcing you to use exactly True or False in boolean expressions. Python loosens the rules as far as is practical without going overboard. (Some languages, like Perl, go overboard with their coercion rules, which I think leads to even harder-to-understand code.) I wish that empty lists didn't evaluate to False, but that's the way it is, so you just have to keep it in mind.
NOTE
Normally, if you don't like the way an object behaves, you can subclass it and override the behavior you don't like, and truthiness turns out to be no exception. If L is a list, the test if L: ... first looks for a __bool__ method (__nonzero__ in Python 2) and only falls back to __len__() when there isn't one. A plain list has no __bool__, so an empty list gets its truth value from its length of 0 and is False. A list subclass can define __bool__ to change that without affecting len() or any other list behavior. There was also a proposal, PEP 335: Overloadable Boolean Operators, but it dealt with the and, or, and not operators rather than plain truth testing, and it was eventually rejected.
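A minimal sketch of overriding truthiness in a list subclass; TruthyList is a hypothetical name used only for this example. The class keeps all normal list behavior, including len(), and only changes what if tags: decides.

```python
class TruthyList(list):
    """A list that is true even when empty (illustrative only)."""

    def __bool__(self):          # Python 3 truth-test hook
        return True

    __nonzero__ = __bool__       # Python 2 spelling of the same hook


tags = TruthyList()
print(len(tags))       # 0: len() is unaffected
print(bool(tags))      # True
if tags:
    print("empty, but true")
```

Whether this is a good idea is another matter: code that receives a TruthyList and tests it with if tags: will behave differently from code that receives a plain list.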
One final note: the correct test for None is if obj is None, not if obj == None. The reason not to use == is that an object can define its own __eq__ method, and might implement it in a way that makes it compare equal to None without actually being None. The is operator means "the same object", and since there is only one None object, it is the correct test here.
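A contrived class shows the difference: Anything (a hypothetical name) claims to be equal to every object, so == None is fooled while is None is not.

```python
class Anything(object):
    """Claims to be equal to every object, including None (illustrative)."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):     # needed in Python 2; Python 3 derives != from __eq__
        return False


obj = Anything()
print(obj == None)   # True: Anything.__eq__ decides the answer
print(obj is None)   # False: obj is not the None object
```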
Written in WikklyText.

Comments

len(tags)

anonymous wrote:

"len(tags) visits every element in the list"

No, it doesn't. A list knows how long it is without looking at any of its elements, so len() is O(1) and fast. It's a fine idea.

There is a nice treatise on the concept of "something vs. nothing" in Python from back when Python first introduced a bool type:

http://mail.python.org/pipermail/python-list/2002-April/136887.html

Anyway, I think the problem here is the use of None to indicate an error: that's what exceptions are for. I'd be inclined to write it:

class ParseError(Exception):
    pass

def parse_file(filename):
    """
    Parse a file, returning a list of tags.
    Raises ParseError on error.
    """
    with open(filename, 'r') as f:
        if not check_format(f):
            raise ParseError  # file is wrong format
        tags = []
        for line in f:
            tags.append(parse_line(line))
    return tags

Now you can write:

try:
    tags = parse_file(filename)
    for tag in tags:
        do_something(tag)
except ParseError:
    print("whoops")
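A minimal, self-contained sketch of this exception-based style, assuming an in-memory list of lines and a hypothetical format rule (the first line must be "TAGS") in place of check_format(); parse_tags() is likewise a made-up name. Because errors now arrive as a ParseError rather than as None, an empty result can no longer be confused with a failure.

```python
class ParseError(Exception):
    """Raised when the input is not in the expected format."""


def parse_tags(lines):
    # Hypothetical format check: the first line must be "TAGS".
    if not lines or lines[0].strip() != "TAGS":
        raise ParseError("missing TAGS header")
    return [line.strip() for line in lines[1:]]


for data in (["TAGS", "a", "b"], ["TAGS"], ["junk"]):
    try:
        tags = parse_tags(data)
    except ParseError as err:
        print("whoops: %s" % err)
    else:
        print("%d tag(s)" % len(tags))
```

With this shape, the caller's only remaining distinction is between an empty list and a non-empty one, and for tag in tags: handles both without a special case.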



Thanks for the heads up!

I fully agree with your statements. I have noted several other points that bother me as well, and you can read them here.

My specific problems were around databases. Now about 50% of my code is checking and dealing with exceptional cases (not just "exceptions" but also dealing with stuff you don't expect, but that can make your app misbehave).

Is this normal? How much error checking do you have in your code, and how does it compare to other scripting languages like Perl?

Thanks again - Nico

Python helps you to avoid

Python helps you to avoid bad references by this mechanism. Even in C it would be better to return something legal rather than a null pointer that can be passed on to wreak havoc in someone else's code.

Consider the following: a function called foo that we expect to return a list.

If it fails, you can return None or an empty list. If client code has something like:

L = foo()
for item in L:
    print(item)

You can expect this to go bang when foo() fails and returns None. If you always return a legal list, the app won't die when the client is careless.

If you need better error detection you can return another value denoting an error code or even use exception handling. Both of these are painless and tidy in python.




if len(tags) == 0

Couldn't you just say something like:
if tags is None:
    # 'Error' handling
    print("** ERROR **")
elif not tags:
    # Empty list case
    print("Empty file")
else:
    # Process returned list
    print("OK!")
I like it better than saying "if len(tags) == 0", and if you've already tested for None then it has the same effect. What do you think?

len(tags)

I prefer len(tags) because, to me, not tags relies on the "falseness" of an empty list, which I find annoying. I think if I reread the above code a year later, I would be confused as to why I had used tags is None followed by not tags, and might try to factor one of them out. I find len(tags) more self-documenting. But that's just my preference; it works either way, of course.

len(tags)

len(tags) visits every element in the list. tags is None only has to check the first.

I vote for tags is None

