How can I find the duplicates in a Python list and create another list of the duplicates? The list is just integers.
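For example (illustrative input and desired output, not part of the original question):

    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]   # example input
    # desired result: the values that occur more than once, e.g. [1, 2, 5]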
To remove duplicates, use set(a). To print the duplicates, you can count occurrences, for example with collections.Counter:
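    # a sketch of the Counter-based approach described above;
    # assumes the input list is named a
    import collections

    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print([item for item, count in collections.Counter(a).items() if count > 1])
    # [1, 2, 5]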
Note that Counter is probably overkill here; a plain set will perform better. The following computes a list of the unique elements in source order:
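    # a sketch of the plain-set approach: collect the unique elements
    # while preserving their original order
    seen = set()
    uniq = []
    for x in a:
        if x not in seen:
            uniq.append(x)
            seen.add(x)
    print(uniq)   # [1, 2, 3, 5, 6]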
or, more concisely:
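    # the same idea as a one-liner; it relies on set.add() returning None,
    # which is why this style is discouraged below
    seen = set()
    uniq = [x for x in a if x not in seen and not seen.add(x)]
    print(uniq)   # [1, 2, 3, 5, 6]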
I don't recommend the latter style, though. If the list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic-time solution (compare each with each), for example:
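    # a quadratic-time sketch for unhashable elements (e.g. nested lists):
    # an element is a duplicate if it already appeared earlier in the list
    a = [[1], [2], [1], [5], [3], [5]]
    dupes = [x for n, x in enumerate(a) if x in a[:n]]
    print(dupes)   # [[1], [5]]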
You don't need the count, just whether or not the item was seen before. Adapted that answer to this problem:
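    # a sketch of the seen-set approach: an element lands in seen_twice only
    # if it was already in seen (set.add returns None, which is falsy)
    def list_duplicates(seq):
        seen = set()
        seen_add = seen.add          # cache the bound method for speed
        seen_twice = set(x for x in seq if x in seen or seen_add(x))
        return list(seen_twice)

    print(list_duplicates([1, 2, 3, 2, 1, 5, 6, 5, 5, 5]))   # [1, 2, 5]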
Just in case speed matters, here are some timings:
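A minimal timeit sketch of this kind of comparison (not the full benchmark that produced the results below):

    import collections
    import random
    import timeit

    a = [random.randrange(1000) for _ in range(10000)]

    def dupes_counter(seq):
        return [x for x, c in collections.Counter(seq).items() if c > 1]

    def dupes_seen_set(seq):
        seen = set()
        seen_add = seen.add
        return list(set(x for x in seq if x in seen or seen_add(x)))

    print(timeit.timeit(lambda: dupes_counter(a), number=100))
    print(timeit.timeit(lambda: dupes_seen_set(a), number=100))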
Here are the results: (well done @JohnLaRooy!)
Interestingly, it is not just the absolute timings that change when PyPy is used; the ranking changes slightly as well. Most notably, the Counter-based approach benefits hugely from PyPy's optimizations, whereas the method-caching approach I suggested seems to gain almost nothing.
Apparently this effect is related to the "duplicatedness" of the input data. I have set
I came across this question whilst looking into something related, and wondered why no one had offered a generator-based solution. Solving this problem that way would look like:
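    # a sketch of a generator-based approach: sort the input, pair each element
    # with its successor, and yield a value the first time it equals its neighbour
    from itertools import tee

    def getDupes(seq):
        a, b = tee(sorted(seq))
        next(b, None)                  # offset the second iterator by one
        last = object()                # sentinel: nothing yielded yet
        for x, y in zip(a, b):
            if x == y and x != last:   # equal neighbours -> a duplicate
                yield x
                last = x

    print(list(getDupes([1, 2, 3, 2, 1, 5, 6, 5, 5, 5])))   # [1, 2, 5]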
I was concerned with scalability, so I tested several approaches, including naive ones that work well on small lists but scale horribly as lists get larger (note: it would have been better to use timeit, but this is illustrative). I included @moooeeeep's approach for comparison (it is impressively fast: fastest if the input list is completely random) and an itertools approach that is faster still for mostly sorted lists. The pandas approach from @firelynx is now included as well -- slow, but not horribly so, and simple. Note: the sort/tee/zip approach is consistently fastest on my machine for large, mostly ordered lists, and @moooeeeep's is fastest for shuffled lists, but your mileage may vary.

Advantages
Assumptions
Fastest solution, 1m entries:
Approaches tested
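A rough sketch of the kinds of approaches compared (illustrative only, not the original benchmark code; the function names are made up here):

    # naive approach: quadratic because of list.count, fine only for small lists
    def getDupes_naive(seq):
        return list(set(x for x in seq if seq.count(x) > 1))

    # set-based approach: linear, reports each duplicate once
    def getDupes_set(seq):
        seen, dupes = set(), set()
        for x in seq:
            if x in seen:
                dupes.add(x)
            else:
                seen.add(x)
        return list(dupes)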
The results for the 'all dupes' test were consistent, finding the "first" duplicate and then "all" duplicates in this array:
When the lists are shuffled first, the price of the sort becomes apparent: efficiency drops noticeably and the @moooeeeep approach dominates, with the set- and dict-based approaches being similar but lesser performers.
collections.Counter is new in Python 2.7:
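    # a sketch using collections.Counter (available from Python 2.7 on)
    from collections import Counter

    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print([item for item, count in Counter(a).items() if count > 1])   # [1, 2, 5]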
In an earlier version you can use a conventional dict instead:
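    # a sketch with a plain dict, which also works on older Python versions
    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    counts = {}
    for elem in a:
        counts[elem] = counts.get(elem, 0) + 1

    print([x for x, n in counts.items() if n > 1])   # [1, 2, 5]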
I would do this with pandas, because I use pandas a lot
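Something like this (the input values here are illustrative):

    import pandas as pd

    a = [1, 2, 3, 3, 3, 4, 5, 6, 6, 7]
    vc = pd.Series(a).value_counts()        # count each value
    print(vc[vc > 1].index.tolist())        # keep the ones seen more than once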
Gives
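    [3, 6]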
Probably isn't very efficient, but it sure is less code than a lot of the other answers, so I thought I would contribute.
Here's a neat and concise solution:
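    # a sketch of one concise way: keep elements whose count is greater than one
    # (list.count inside the comprehension makes this quadratic)
    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print(list(set(x for x in a if a.count(x) > 1)))   # [1, 2, 5] (set order may vary)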
A bit late, but maybe helpful for some. For a largish list, I found this worked for me.
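A sketch of such an approach (using a seen set so it stays fast on large lists):

    l = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    seen = set()
    dupes = []
    for x in l:
        if x in seen and x not in dupes:   # a repeat we haven't recorded yet
            dupes.append(x)
        else:
            seen.add(x)
    print(dupes)   # [2, 1, 5]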
It shows just the duplicates (and all of them) and preserves order.
Using pandas:
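    # a sketch using pandas .duplicated(), which flags repeats after the
    # first occurrence of each value
    import pandas as pd

    a = [1, 2, 1, 3, 3, 3, 0]
    s = pd.Series(a)
    print(s[s.duplicated()].unique())   # [1 3]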
The third example of the accepted answer gives an erroneous answer and does not attempt to give duplicates. Here is the correct version:
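    # a sketch of a corrected version: record an element as a duplicate
    # the moment it is seen again
    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    seen = set()
    dupes = []
    for x in a:
        if x in seen:
            dupes.append(x)
        else:
            seen.add(x)
    print(dupes)   # [2, 1, 5, 5, 5]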
A very simple and quick way of finding dupes with one iteration in Python is:
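    # a sketch of a single-pass approach: count occurrences in a dict as you go
    testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']
    counts = {}
    for item in testList:
        counts[item] = counts.get(item, 0) + 1
    print(counts)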
Output will be as follows:
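    {'red': 2, 'blue': 3, 'green': 1}

(with a modern Python, where dicts keep insertion order)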
This and more in my blog: http://www.howtoprogramwithpython.com
One line solution:
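    # a sketch of a one-liner (quadratic, but a single expression)
    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print(set(x for x in a if a.count(x) > 1))   # {1, 2, 5}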
There are a lot of answers up here, but I think this is a relatively readable and easy-to-understand approach:
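    # a readable sketch: sort first, then compare each element with its neighbour
    def get_duplicates(values):
        dupes = []
        sorted_vals = sorted(values)
        for previous, current in zip(sorted_vals, sorted_vals[1:]):
            if previous == current and current not in dupes:
                dupes.append(current)
        return dupes

    print(get_duplicates([1, 2, 3, 2, 1, 5, 6, 5, 5, 5]))   # [1, 2, 5]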
Notes:
Here's a fast generator that uses a dict to store each element as a key with a boolean value for checking if the duplicate item has already been yielded. For lists with all elements that are hashable types:
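    # a sketch matching that description: the dict value records whether the
    # element's duplicate has already been yielded
    def gen_dupes(items):
        seen = {}
        for value in items:
            if value not in seen:
                seen[value] = False      # seen once, duplicate not yet yielded
            elif not seen[value]:
                seen[value] = True       # second occurrence: yield it once
                yield value

    print(list(gen_dupes([1, 2, 2, 3, 4, 1, 5, 2, 6, 6])))   # [2, 1, 6]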
For lists that might contain lists:
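    # a sketch for lists whose elements may be unhashable (e.g. nested lists):
    # plain lists replace the dict, at the cost of quadratic membership tests
    def gen_dupes_unhashable(items):
        seen = []
        yielded = []
        for value in items:
            if value not in seen:
                seen.append(value)
            elif value not in yielded:
                yielded.append(value)
                yield value

    print(list(gen_dupes_unhashable([[1], [2], [2], [3], [1]])))   # [[2], [1]]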
You can use iteration_utilities.duplicates:
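    # assuming the library referred to is iteration_utilities (third-party);
    # duplicates() yields every element that has already been seen once
    from iteration_utilities import duplicates

    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print(list(duplicates(a)))   # [2, 1, 5, 5, 5]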
or, if you only want one of each duplicate, this can be combined with iteration_utilities.unique_everseen:
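    # unique_everseen() drops the repeats, so each duplicate is reported once
    from iteration_utilities import duplicates, unique_everseen

    a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
    print(list(unique_everseen(duplicates(a))))   # [2, 1, 5]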
1 This is from a third-party library I have written: iteration_utilities.
How about simply looping through each element in the list, checking its number of occurrences, and adding the repeated ones to a set, which then gives the duplicates? Hope this helps someone out there.
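A sketch of that idea:

    # check each element's count and collect the repeated ones in a set
    # (quadratic because of list.count, but very simple)
    myList = [2, 4, 6, 8, 4, 6, 12]
    dupes = set()
    for i in myList:
        if myList.count(i) > 1:
            dupes.add(i)
    print(dupes)   # {4, 6}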
This is the way I had to do it, because I challenged myself not to use other methods:
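    # a sketch in that spirit: plain nested loops and comparisons only
    def dup_list(values):
        dupes = []
        for i in range(len(values)):
            for j in range(i + 1, len(values)):
                if values[i] == values[j] and values[i] not in dupes:
                    dupes.append(values[i])
        return dupes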
so your sample works as:
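    # using the illustrative list from the top of the page
    print(dup_list([1, 2, 3, 2, 1, 5, 6, 5, 5, 5]))   # [1, 2, 5]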
Use the