I looked a bit into the stdlib, but nothing there seemed to fit (oh, did I mention I'm still on 2.4?), so I turned to Google and found a nice recipe at ActiveState; unfortunately it discards the last list if it has fewer than n items.
Searching again, I had more luck with this article: it's a generator that takes a list and yields tuples of n elements each, optionally returning the last, shorter tuple as well. I slightly modified it to obtain:
def group_iter(iterator, n=2):
    """Given an iterator, it returns sub-lists made of n items
    (except the last, which can have len < n).
    Inspired by http://countergram.com/python-group-iterator-list-function"""
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n:  # tested as fast as a separate counter
            yield accumulator
            accumulator = []  # tested faster than accumulator[:] = []
                              # and as fast as re-using one list object
    if len(accumulator) != 0:
        yield accumulator
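A quick check of the behavior (restating the function for a self-contained demo, in modern Python; the original targets 2.4):

```python
def group_iter(iterator, n=2):
    """Yield sub-lists of n items each; the last may be shorter."""
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n:
            yield accumulator
            accumulator = []
    if accumulator:
        yield accumulator

# The trailing, shorter group is kept rather than discarded.
print(list(group_iter([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```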
How would you have done it?
13 comments:
What I usually do is this small line...
lists = [original_list[i:i+list_size] for i in xrange(0, len(original_list), list_size)]
It's a little scary the first time you see it, but it's easy: just get the indexes from 0 to the length, in steps of list_size, then create a sublist for each.
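Written out with example data (and range in place of the Python 2-only xrange), the comprehension behaves like this:

```python
original_list = list(range(7))  # example data, not from the post
list_size = 3

# One slice per start index: 0, 3, 6, ...
lists = [original_list[i:i + list_size]
         for i in range(0, len(original_list), list_size)]
print(lists)  # [[0, 1, 2], [3, 4, 5], [6]]
```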
My favorite solution is zip(*[iter(input)]*N), where input is your input list and N is how many elements you want per sub-list. It's not exactly the same as your solution, since it drops dangling items instead of giving you back a short sublist. Replacing zip() with map(None, ...) gives you a solution that None-pads the last sublist instead. However, this is "favorite" in a kind of Perl-golf way, not a use-it-in-real-software sort of way.
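A sketch of both variants (in Python 3 map(None, ...) no longer pads, so itertools.zip_longest stands in for it here):

```python
from itertools import zip_longest  # Python 3 name for izip_longest

data = list(range(10))
N = 3

# zip(*[iter(data)]*N): the same iterator is consumed N items at a time,
# so the dangling 10th element is silently dropped.
full_groups = list(zip(*[iter(data)] * N))
print(full_groups)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

# zip_longest None-pads the final group instead of dropping it.
padded = list(zip_longest(*[iter(data)] * N))
print(padded[-1])  # (9, None, None)
```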
I'd adapt the grouper recipe. It's based on izip_longest. I would also upgrade to a recent version of Python -- there are lots of new itertools goodies.
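The grouper recipe from the itertools documentation is essentially this (shown with the Python 3 name zip_longest; 2.x spelled it izip_longest):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n  # n references to one shared iterator
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```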
Not as simple or elegant as some of the solutions, and I don't know how efficient it is, but here's something I threw together (indentation restored; Blogger doesn't like <code> tags):

def splitarray(array, gsize):
    arraylen = len(array)
    for i in range(arraylen / gsize):
        yield array[i * gsize:(i * gsize) + gsize]
    if arraylen % gsize != 0:
        yield array[-(arraylen % gsize):]
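Under Python 3, where / is float division, the same idea needs // (a sketch of the approach, not the commenter's exact code):

```python
def splitarray(array, gsize):
    arraylen = len(array)
    # Yield the full-size groups first...
    for i in range(arraylen // gsize):
        yield array[i * gsize:(i * gsize) + gsize]
    # ...then whatever is left over, as one shorter group.
    if arraylen % gsize != 0:
        yield array[-(arraylen % gsize):]

print(list(splitarray([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```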
Actually, it looks like mine is the fastest of the three (mine, yours, and Jaime's). Of course, this is just a benchmark; it may be different in real-world usage:
http://pastebin.ca/2045170
Mine is t, yours is t2, Jaime's is t3.
Also note, I moved orig = range(23) into the setup part of timeit, and that improved the time to ~7.75.
>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> num = 3
>>> [seq[i::num] for i in range(num)]
[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
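Worth noting: this comprehension interleaves the sequence into num strided sub-lists rather than cutting it into consecutive runs of num items, so it solves a slightly different problem:

```python
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
num = 3  # example group count

# Stride-based: element i, i+num, i+2*num, ... in each sub-list.
strided = [seq[i::num] for i in range(num)]
print(strided)  # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]

# Contiguous chunks of size num, for comparison:
chunks = [seq[i:i + num] for i in range(0, len(seq), num)]
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```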
I wrote a longish blog post about this some time ago... http://www.garyrobinson.net/2008/04/splitting-a-pyt.html
A simple translation of Jaime's solution to a generator function yields times almost identical to Nobu's:
def split(sequence, size):
    for i in xrange(0, len(sequence), size):
        yield sequence[i:i+size]
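The generator version (with range in place of the Python 2 xrange) can be exercised lazily, one group at a time:

```python
def split(sequence, size):
    # Lazy variant of the slicing comprehension: one slice per iteration.
    for i in range(0, len(sequence), size):
        yield sequence[i:i + size]

gen = split('abcdefg', 3)
print(next(gen))  # 'abc'
print(list(gen))  # ['def', 'g']
```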
Which is why I still consider myself a novice. ;-)
Much more readable than mine. If I could tell what Jaime's was doing, I might've tried something like that....
Sticking with iterators...
import itertools
def group_iter(iterator, n=2):
    while True:
        li = list(itertools.islice(iterator, n))
        if len(li):
            yield li
        else:
            break
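One caveat with this approach: islice() calls iter() on its argument on every pass, so if a plain list is passed in, each pass re-reads the same first n items and the while loop never terminates; it only works when handed a true iterator. Forcing an iterator up front makes it safe for both (a sketch, not the commenter's exact code):

```python
import itertools

def group_iter(iterable, n=2):
    # iter() is the crucial fix: islice must consume one shared iterator,
    # otherwise a list argument would be re-sliced from the start forever.
    iterator = iter(iterable)
    while True:
        li = list(itertools.islice(iterator, n))
        if li:
            yield li
        else:
            break

print(list(group_iter([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```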
Following your tangent... I'm curious to know why you're on 2.4. As a small-time package developer, I thought I only had to care about 2.5 and above by now.
Nobu
Your code doesn't seem to work.
Did you actually test it?
@all: thanks for all your replies and alternative solutions (some not exactly what I need, but appreciated nonetheless)
@Craig: simply because the server where I need this snippet only has 2.4 (and upgrading is not an option)