How do I process the elements of a sequence in batches, idiomatically?
For example, with the sequence "abcdef" and a batch size of 2, I would like to do something like the following:
for x, y in "abcdef":
print "%s%s\n" % (x, y)
ab
cd
ef
Of course, this doesn't work because it is expecting a single element from the list which itself contains 2 elements.
What is a nice, short, clean, pythonic way to process the next n elements of a list in a batch, or sub-strings of length n from a larger string (two similar problems)?
-
Responses to this question show a few methods.
-
One solution, although I challenge someone to do better ;-)
a = 'abcdef' b = [[a[i-1], a[i]] for i in range(1, len(a), 2)] for x, y in b: print "%s%s\n" % (x, y) -
I am sure someone is going to come up with some more "Pythonic" but how about:
for y in range(0, len(x), 2): print "%s%s" % (x[y], x[y+1])Note that this would only work if you know that
len(x) % 2 == 0;jcoon : start the range at 1 and then using x[y-1] will work for len(x)%2 == 1rpr : though this answer is neither quite pythonic nor genericPaolo Bergantino : It solved the OP's problem in a short and simple way. Your answer may be the most pythonic (and I even noted in my answer that it isn't pythonic) but that's hardly a reason for a downvote...rpr : Downvote is neither necessary nor relevant, I agree. However, the problem is stated as "What is a nice, short, clean, pythonic way..?" and not as "a short and simple" solution. And independent votes reflect the quality of answers as how they match and satisfy the stated question. In this case, the OP chose what he/she thinks to satisfy the need and that is it. Though, I don't quite agree with that, still. Thanks...Paolo Bergantino : Err.. right. Let's just agree to disagree. -
you can create the following generator
def chunks(seq, size): a = range(0, len(seq), size) b = range(size, len(seq) + 1, size) for i, j in zip(a, b): yield seq[i:j]and use it like this:
for i in chunks('abcdef', 2): print(i) -
A generator function would be neat:
def batch_gen(data, batch_size): for i in range(0, len(data), batch_size): yield data[i:i+batch_size]Example use:
a = "abcdef" for i in batch_gen(a, 2): print iprints:
ab cd ef -
Don't forget about the zip() function:
a = 'abcdef' for x,y in zip(a[::2], a[1::2]): print '%s%s' % (x,y)culebrón : a very elegant solution! -
but the more general way would be (inspired by this answer):
for i in zip(*(seq[i::size] for i in range(size))): print(i) # tuple of individual valueskigurai : +1 for elegant answer! But, there is one ")" too much in the end of the for-line -
I've got an alternative approach, that works for iterables that don't have a known length.
def groupsgen(seq, size): it = iter(seq) while True: values = () for n in xrange(size): values += (it.next(),) yield valuesIt works by iterating over the sequence (or other iterator) in groups of size, collecting the values in a tuple. At the end of each group, it yield the tuple.
When the iterator runs out of values, it produces a StopIteration exception which is then propagated up, indicating that groupsgen is out of values.
It assumes that the values come in sets of size (sets of 2, 3, etc). If not, any values left over are just discarded.
-
How about itertools?
from itertools import islice, groupby def chunks_islice(seq, size): while True: aux = list(islice(seq, 0, size)) if not aux: break yield "".join(aux) def chunks_groupby(seq, size): for k, chunk in groupby(enumerate(seq), lambda x: x[0] / size): yield "".join([i[1] for i in chunk])dbr : To make code-blocks, you indent the code by four spaces (instead of usingtags), the "101010" button in the editor toolbar does this for the selected text too -
>>> a = "abcdef" >>> size = 2 >>> [a[x:x+size] for x in range(0, len(a), size)] ['ab', 'cd', 'ef']..or, not as a list comprehension:
a = "abcdef" size = 2 output = [] for x in range(0, len(a), size): output.append(a[x:x+size])Or, as a generator, which would be best if used multiple times (for a one-use thing, the list comprehension is probably "best"):
def chunker(thelist, segsize): for x in range(0, len(thelist), segsize): yield thelist[x:x+segsize]..and it's usage:
>>> for seg in chunker(a, 2): ... print seg ... ab cd ef -
And then there's always the documentation.
def pairwise(iterable): "s -> (s0,s1), (s1,s2), (s2, s3), ..." a, b = tee(iterable) try: b.next() except StopIteration: pass return izip(a, b) def grouper(n, iterable, padvalue=None): "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)Note: these produce tuples instead of substrings, when given a string sequence as input.
-
s = 'abcdefgh' for e in (s[i:i+2] for i in range(0,len(s),2)): print(e)
0 comments:
Post a Comment