
On a whim, I decided to compare the processing speed of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and got almost identical results: [] and {} both take around 0.128 sec per million iterations, while list() and dict() each take around 0.428 sec per million.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copy of some empty stock literal, while their explicitly named counterparts (list(), dict(), tuple(), str()) go through the full process of creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I'd love to find out. I couldn't find an answer in the docs or on SO, which was a little surprising. (It may be because it's difficult to search for empty brackets.)

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries respectively. I'm running Python 2.7.9.
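For reference, a minimal script reproducing the methodology described above (a sketch only; absolute numbers will vary with machine and interpreter version, and only the relative gap between literals and calls matters):

```python
import timeit

# Compare literal syntax against the equivalent constructor calls.
# Absolute numbers vary by machine and Python version; the point is
# the relative gap between the literal forms and the calls.
n = 10**6
for expr in ("[]", "list()", "{}", "dict()"):
    t = timeit.timeit(expr, number=n)
    print("%-8s %.3f sec / million iterations" % (expr, t))
```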

Exactly, yep. This is why ["wham bam"] will give you a list with one item, whereas list("wham bam") will give you ["w", "h", "a", "m", ...]. –  Torxed May 13 at 13:20
As an unrelated fun fact: this is also true in other languages; for example, void 0 is marginally faster than undefined in JavaScript for the same reason. –  Benjamin Gruenbaum May 13 at 13:54
Other languages can optimize this out, though. SBCL can optimize both (list) and '() to the same assembly. I'm pretty sure GCC can optimize out many function calls which return the same thing each time. –  Alan Shutko yesterday

4 Answers

Accepted answer (185 upvotes)

Because [] and {} are literal syntax. Python can emit bytecode that does nothing more than create the list or dictionary object:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list() and dict() are function calls on ordinary objects. Their names need to be resolved, arguments have to be pushed onto the stack, the frame has to be stored so it can be restored later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (in a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you can never overcome that CALL_FUNCTION cost.
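On Python 3 the opcode names differ slightly (CALL replaces CALL_FUNCTION from 3.11 on), but the shape is the same. A quick sketch verifying that the alias turns the name lookup into a cheap local load while the call opcode remains:

```python
import dis

def make_empty():
    _list = list    # local alias: subsequent access is a cheap LOAD_FAST
    return _list()  # but the call itself cannot be compiled away

ops = [ins.opname for ins in dis.get_instructions(make_empty)]
assert "LOAD_FAST" in ops                        # the alias is a local load
assert any(op.startswith("CALL") for op in ops)  # the call opcode remains
```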

So basically, Python cannot currently optimize that out like some other languages do. –  Alan Shutko yesterday
@AlanShutko Python can't do that in general, because the values bound to the names list and dict may change. Also, if you don't need the function call for one reason or another, why wouldn't you use the literal syntax anyway? –  Robin yesterday
At some point, it's known that the value for list hasn't changed. I wonder if Jython will jit the function call out of existence. –  Alan Shutko yesterday
My understanding is that a JIT optimizes on types: it observes, makes some type assumptions, and generates machine code based on those assumptions. It does not check whether values have changed. –  user49117 yesterday
@AlanShutko: Take into account you can rebind the name at any time. I can reach into a module that is already loaded and rebind the name. –  Martijn Pieters yesterday

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None
I did the same, before seeing your answer. Particularly, I also created a lambda calling list instead of passing list itself to dis, so the results would be comparable. But then I tried dis.dis(list), i.e. without the lambda, and it just returned None. Can you tell me why? –  tobias_k 2 days ago
list is a built-in function written in C (in CPython at least) and as such it doesn't have bytecode for dis.dis to disassemble. –  Dan D. 2 days ago

Because list() is a function that converts an iterable — say, a string — into a list object, while [] creates a list right off the bat. Try this (it might make more sense to you):

>>> x = "wham bam"
>>> a = list(x)
>>> a
['w', 'h', 'a', 'm', ...]

While

>>> y = ["wham bam"]
>>> y
['wham bam']

Gives you an actual list containing whatever you put in it.

Much more intuitive answer than just "the bytecode is different" answers. –  Inverse 2 days ago
Personally, I wouldn't call it intuitive. –  Sebi yesterday

Python does not have permission to optimize instantiation of the base types, as one may want to override them. Why might that be needed? For debugging purposes, for example.
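To illustrate the rebinding at play here: list is just a name, so any code can shadow it at runtime, which is why the compiler cannot assume that list() builds a list. A sketch (the noisy_list stand-in is purely hypothetical):

```python
calls = []

def noisy_list(*args):
    """Hypothetical debugging stand-in for the built-in list."""
    calls.append(args)                    # record every invocation
    return [*args[0]] if args else []

builtin_list = list   # keep a reference to the real built-in
list = noisy_list     # rebind the global name

result = list()       # now invokes noisy_list, not the built-in
list = builtin_list   # restore the original binding

assert result == []
assert calls == [()]                  # the stand-in saw one zero-arg call
assert list("ab") == ["a", "b"]       # the built-in works again
```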

share|improve this answer
It doesn't answer the question. –  Grijesh Chauhan 12 hours ago
This has nothing to do with 'permission', or overrides. It simply cannot optimise away the list lookup or call because the name could be bound to anything, and can be rebound at any time. –  Martijn Pieters 4 hours ago
@GrijeshChauhan: it is an attempt to answer the question. Not a very good one, but still. Unless the OP meant this to be a response to an existing comment on this page, in which case, it would not be an answer. –  Martijn Pieters 4 hours ago
