2011-04-11

Python: group a list into sub-lists of n items

You have a long list and want to process its items n at a time. Easy enough, but how do you split that list into sublists of n elements (except possibly the last one, of course)?

I looked through the stdlib a bit, but there doesn't seem to be anything I could use (oh, did I say I'm still on 2.4?), so I turned to Google and found a nice recipe at ActiveState. Its problem is that it discards the last list if it has fewer than n items.

Searching again, I had better luck with this article: it's a generator of tuples from a list, splitting every n elements and optionally returning the last, partially filled tuple. I slightly modified it to obtain:

def group_iter(iterator, n=2):
    """ Given an iterator, it returns sub-lists made of n items
    (except the last that can have len < n)
    inspired by http://countergram.com/python-group-iterator-list-function"""
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n: # tested as fast as separate counter
            yield accumulator
            accumulator = [] # tested faster than accumulator[:] = []
            # and tested as fast as re-using one list object
    if len(accumulator) != 0:
        yield accumulator

How would you have done it?

12 comments:

Jaime said...

What I usually do is this small line...


lists = [original_list[i:i+list_size] for i in xrange(0, len(original_list), list_size)]

It's a little scary the first time you see it, but it's easy: take the indexes from 0 to the length in steps of list_size, then create a sublist for each.
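
For anyone on Python 3, here is a rough sketch of the same slicing idea with range in place of xrange. The helper name chunk_by_slicing and the sample data are made up for illustration, and it assumes a sequence that supports len() and slicing:

```python
def chunk_by_slicing(original_list, list_size):
    """Return consecutive slices of at most list_size items (illustrative helper)."""
    # Start indexes are 0, list_size, 2*list_size, ...
    # A slice that runs past the end of the list simply comes back shorter.
    return [original_list[i:i + list_size]
            for i in range(0, len(original_list), list_size)]


chunks = chunk_by_slicing(list(range(7)), 3)
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because it builds every sublist up front, this suits lists that already fit in memory; it will not work on a plain iterator.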

Jean-Paul Calderone said...

My favorite solution is zip(*[iter(input)]*N), where input is your input list and N is how many elements you want per sub-list. It's not exactly the same as your solution, since it drops dangling items instead of giving you back a short sublist. Replacing zip() with map(None, ...) gives you a solution that None-pads the last sublist if necessary, instead. However, this is "favorite" in a kind of Perl golf way, not a use-it-in-real-software sort of way.
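
The trick works because the list holds N references to the same iterator, so each tuple that zip() builds pulls N consecutive items from it. A small sketch on Python 3, where map(None, ...) no longer exists and itertools.zip_longest stands in for the padding variant (the sample data and variable names are illustrative only):

```python
from itertools import zip_longest

data = [1, 2, 3, 4, 5, 6, 7]
N = 3

# N references to ONE iterator: each output tuple consumes N consecutive items.
# zip() stops at the first incomplete tuple, so the dangling 7 is dropped.
dropped = list(zip(*[iter(data)] * N))

# zip_longest pads the last tuple with None instead of dropping it.
padded = list(zip_longest(*[iter(data)] * N))

print(dropped)  # [(1, 2, 3), (4, 5, 6)]
print(padded)   # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```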

tag said...

I'd adapt the grouper recipe. It's based on izip_longest. I would also upgrade to a recent version of Python -- there are lots of new itertools goodies.
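
One possible adaptation, sketched for Python 3 (the function is itertools.zip_longest there and izip_longest on 2.6/2.7; neither exists on 2.4). The name grouper_no_padding and the sentinel approach are assumptions for illustration, not part of the recipe itself:

```python
from itertools import zip_longest


def grouper_no_padding(iterable, n):
    """Yield lists of n items; the last one may be shorter (hypothetical helper)."""
    fill = object()  # unique sentinel, so real values such as None are never stripped
    for group in zip_longest(*[iter(iterable)] * n, fillvalue=fill):
        # Only the final group can contain the sentinel; filter it out.
        yield [item for item in group if item is not fill]


groups = list(grouper_no_padding("ABCDEFG", 3))
print(groups)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```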

Nobu said...

Not as simple or elegant as some of the solutions, and I don't know how efficient it is, but here's something I threw together:

def splitarray(array, gsize):
    arraylen = len(array)
    # full groups first (integer division on Python 2)
    for i in range(arraylen / gsize):
        yield array[i * gsize:(i * gsize) + gsize]
    # then whatever is left over, if anything
    if arraylen % gsize != 0:
        yield array[-(arraylen % gsize):]

Nobu said...

Actually, looks like mine is the fastest of yours, Jaime's and mine. Of course, this is just a benchmark; it may be different in real-world usage:

http://pastebin.ca/2045170

Mine is t, yours is t2, Jaime's is t3.

Also note, I moved orig = range(23) into the setup part of timeit and that improved the time to 7.75~.

Gary Robinson said...

>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> [seq[i::num] for i in range(num)]

I wrote a longish blog post about this some time ago... http://www.garyrobinson.net/2008/04/splitting-a-pyt.html
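
Note that this slices by stride, so it produces num interleaved lists rather than consecutive runs of num items. A quick comparison, assuming num = 3 since the snippet above leaves num undefined:

```python
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
num = 3  # assumed value; not given in the comment

# Striding: num lists, each taking every num-th item starting at offset i.
strided = [seq[i::num] for i in range(num)]

# Chunking, for comparison: consecutive runs of num items.
chunked = [seq[i:i + num] for i in range(0, len(seq), num)]

print(strided)  # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
print(chunked)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```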

eswald said...

A simple translation of Jaime's solution to a generator function yields times almost identical to Nobu's:

def split(sequence, size):
    for i in xrange(0, len(sequence), size):
        yield sequence[i:i+size]

Nobu said...

Which is why I still consider myself a novice. ;-)

Much more readable than mine. If I could tell what Jaime's was doing, I might've tried something like that....

Jeff said...

Sticking with iterators...

import itertools

def group_iter(iterator,n=2):
  while True:
    li = list(itertools.islice(iterator,n))
    if len(li):
      yield li
    else:
      break
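
One caveat worth checking: islice() on a list starts from the beginning on every call, so the version above seems to expect a real iterator and would loop forever on a plain list. A sketch that calls iter() up front so lists and other iterables work too (the name group_iter_safe is made up here):

```python
import itertools


def group_iter_safe(iterable, n=2):
    """Yield lists of up to n items from any iterable (illustrative variant)."""
    iterator = iter(iterable)  # islice on a bare list would restart at index 0 each time
    while True:
        chunk = list(itertools.islice(iterator, n))
        if not chunk:
            return  # nothing left, so end the generator
        yield chunk


result = list(group_iter_safe([1, 2, 3, 4, 5], 2))
print(result)  # [[1, 2], [3, 4], [5]]
```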

Craig McQueen said...

Following your tangent... I'm curious to know why you're on 2.4. As a small-time package developer, I thought I only had to care about 2.5 and above by now.

Robert said...

Nobu

your code doesn't seem to work.
Did you actually test it?

Sandro Tosi said...

@all: thanks for all your replies and alternative solutions (some not exactly what I need, but appreciated nonetheless)

@Craig: simply because the server where I need this snippet only has 2.4 (and upgrading it is not an option)