
On a whim, I decided to compare the processing speed of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and got almost identical results: [] and {} both take around 0.128 sec per million iterations, while list() and dict() each take around 0.428 sec per million.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copy of some empty stock literal, while their explicitly named counterparts (list(), dict(), tuple(), str()) go through the full process of creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I'd love to find out. I couldn't find an answer in the docs or on SO, which was a little surprising. (It may be because it's difficult to search for empty brackets.)

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries respectively. I'm running Python 2.7.9.
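For reference, a minimal script reproducing the methodology described above (a sketch only; absolute numbers will vary with machine and interpreter version, and only the relative gap between literals and calls matters):

```python
import timeit

# Compare literal syntax against the equivalent constructor calls.
# Absolute numbers vary by machine and Python version; the point is
# the relative gap between the literal forms and the calls.
n = 10**6
for expr in ("[]", "list()", "{}", "dict()"):
    t = timeit.timeit(expr, number=n)
    print("%-8s %.3f sec / million iterations" % (expr, t))
```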

Exactly, yep. This is why ["wham bam"] will give you a list with one item, whereas list("wham bam") will give you ["w", "h", "a", "m", ...]. –  Torxed May 13 at 13:20
As an unrelated fun fact: this is also true in other languages; for example, void 0 is marginally faster than undefined in JavaScript for the same reason. –  Benjamin Gruenbaum May 13 at 13:54
Other languages can optimize this out, though. SBCL can optimize both (list) and '() to the same assembly. I'm pretty sure GCC can optimize out many function calls which return the same thing each time. –  Alan Shutko yesterday

4 Answers

Accepted answer (185 upvotes)

Because [] and {} are literal syntax. Python can emit bytecode that does nothing more than create the list or dictionary object:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list() and dict() are function calls on ordinary objects. Their names need to be resolved, arguments have to be pushed onto the stack, the frame has to be stored so it can be restored later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (in a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you can never overcome that CALL_FUNCTION cost.
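On Python 3 the opcode names differ slightly (CALL replaces CALL_FUNCTION from 3.11 on), but the shape is the same. A quick sketch verifying that the alias turns the name lookup into a cheap local load while the call opcode remains:

```python
import dis

def make_empty():
    _list = list    # local alias: subsequent access is a cheap LOAD_FAST
    return _list()  # but the call itself cannot be compiled away

ops = [ins.opname for ins in dis.get_instructions(make_empty)]
assert "LOAD_FAST" in ops                        # the alias is a local load
assert any(op.startswith("CALL") for op in ops)  # the call opcode remains
```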

So basically, Python cannot currently optimize that out like some other languages do. –  Alan Shutko yesterday
@AlanShutko Python can't do that in general, because the values bound to the names list and dict may change. Also, if you don't need the function call for one reason or another, why wouldn't you use the literal syntax anyway? –  Robin yesterday
At some point, it's known that the value for list hasn't changed. I wonder if Jython will jit the function call out of existence. –  Alan Shutko yesterday
My understanding is that a JIT optimizes on types: it observes, makes some type assumptions, and generates machine code based on those assumptions. It does not check whether values have changed. –  user49117 yesterday
@AlanShutko: Take into account you can rebind the name at any time. I can reach into a module that is already loaded and rebind the name. –  Martijn Pieters yesterday

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None
I did the same, before seeing your answer. Particularly, I also created a lambda calling list instead of passing list itself to dis, so the results would be comparable. But then I tried dis.dis(list), i.e. without the lambda, and it just returned None. Can you tell me why? –  tobias_k 2 days ago
list is a built-in function written in C (in CPython at least) and as such it doesn't have bytecode for dis.dis to disassemble. –  Dan D. 2 days ago

Because list() is a function that converts an iterable — say, a string — into a list object, while [] creates a list right off the bat. Try this (it might make more sense to you):

>>> x = "wham bam"
>>> a = list(x)
>>> a
['w', 'h', 'a', 'm', ...]

While

>>> y = ["wham bam"]
>>> y
['wham bam']

Gives you an actual list containing whatever you put in it.

Much more intuitive answer than just "the bytecode is different" answers. –  Inverse 2 days ago
Personally, I wouldn't call it intuitive. –  Sebi yesterday

Python does not have permission to optimize instantiation of the base types, as one may want to override them. Why might that be needed? For debugging purposes, for example.
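To illustrate the rebinding at play here: list is just a name, so any code can shadow it at runtime, which is why the compiler cannot assume that list() builds a list. A sketch (the noisy_list stand-in is purely hypothetical):

```python
calls = []

def noisy_list(*args):
    """Hypothetical debugging stand-in for the built-in list."""
    calls.append(args)                    # record every invocation
    return [*args[0]] if args else []

builtin_list = list   # keep a reference to the real built-in
list = noisy_list     # rebind the global name

result = list()       # now invokes noisy_list, not the built-in
list = builtin_list   # restore the original binding

assert result == []
assert calls == [()]                  # the stand-in saw one zero-arg call
assert list("ab") == ["a", "b"]       # the built-in works again
```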

share|improve this answer
It doesn't answer the question. –  Grijesh Chauhan 12 hours ago
This has nothing to do with 'permission', or overrides. It simply cannot optimise away the list lookup or call because the name could be bound to anything, and can be rebound at any time. –  Martijn Pieters 4 hours ago
@GrijeshChauhan: it is an attempt to answer the question. Not a very good one, but still. Unless the OP meant this to be a response to an existing comment on this page, in which case, it would not be an answer. –  Martijn Pieters 4 hours ago
