Sunday, May 1, 2011

Iterate over a Python sequence in multiples of n?

How do I process the elements of a sequence in batches, idiomatically?

For example, with the sequence "abcdef" and a batch size of 2, I would like to do something like the following:

for x, y in "abcdef":
    print("%s%s" % (x, y))

and get:

ab
cd
ef

Of course, this doesn't work: iterating over a string yields one character at a time, and a single character can't be unpacked into the two names x and y (Python raises a ValueError).

What is a nice, short, clean, pythonic way to process the next n elements of a list in a batch, or sub-strings of length n from a larger string (two similar problems)?
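A quick way to see the failure described above, as a Python 3 sketch: the hypothetical helper try_pair_unpack just runs the two-name loop and reports whether it raised. A string's items are single characters, so the loop fails; a list whose items are two-character strings unpacks fine, which is why the answers below all start by building pairs (or substrings) first.

```python
def try_pair_unpack(seq):
    """Run `for x, y in seq` and return the ValueError it raises, or None."""
    try:
        for x, y in seq:      # each item must itself hold exactly two values
            pass
    except ValueError as exc:
        return exc
    return None

print(try_pair_unpack("abcdef"))            # items are 1-char strings: unpacking fails
print(try_pair_unpack(["ab", "cd", "ef"]))  # None: each 2-char string unpacks into x, y
```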

From Stack Overflow:
  • Responses to this question show a few methods.

  • One solution, although I challenge someone to do better ;-)

    a = 'abcdef'
    b = [[a[i-1], a[i]] for i in range(1, len(a), 2)]
    
    for x, y in b:
        print("%s%s" % (x, y))
    
  • I am sure someone is going to come up with some more "Pythonic" but how about:

    for y in range(0, len(x), 2):
        print("%s%s" % (x[y], x[y+1]))
    

    Note that this only works if you know that len(x) % 2 == 0; otherwise x[y+1] raises an IndexError on the last pass.

    jcoon : start the range at 1 and then using x[y-1] will work for len(x)%2 == 1
    rpr : though this answer is neither quite pythonic nor generic
    Paolo Bergantino : It solved the OP's problem in a short and simple way. Your answer may be the most pythonic (and I even noted in my answer that it isn't pythonic) but that's hardly a reason for a downvote...
    rpr : Downvote is neither necessary nor relevant, I agree. However, the problem is stated as "What is a nice, short, clean, pythonic way..?" and not as "a short and simple" solution. And independent votes reflect the quality of answers as how they match and satisfy the stated question. In this case, the OP chose what he/she thinks to satisfy the need and that is it. Though, I don't quite agree with that, still. Thanks...
    Paolo Bergantino : Err.. right. Let's just agree to disagree.
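A small check of jcoon's suggestion on an odd-length string, sketched in Python 3 with two hypothetical helper names. Starting the range at 1 avoids the IndexError, but the final unpaired character is silently skipped; slicing with x[y:y+2] instead keeps it as a one-character piece, because slices never go out of range.

```python
def pairs_jcoon(x):
    """jcoon's variant: start the range at 1 and pair x[y-1] with x[y]."""
    return [x[y - 1] + x[y] for y in range(1, len(x), 2)]

def pairs_sliced(x):
    """Slice instead of indexing; slices never raise IndexError."""
    return [x[y:y + 2] for y in range(0, len(x), 2)]

print(pairs_jcoon("abcdef"))   # ['ab', 'cd', 'ef']
print(pairs_jcoon("abcde"))    # ['ab', 'cd']  (no error, but 'e' is skipped)
print(pairs_sliced("abcde"))   # ['ab', 'cd', 'e']
```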
  • You can create the following generator (note that it silently drops a trailing chunk shorter than size):

    def chunks(seq, size):
        a = range(0, len(seq), size)
        b = range(size, len(seq) + 1, size)
        for i, j in zip(a, b):
            yield seq[i:j]
    

    and use it like this:

    for i in chunks('abcdef', 2):
        print(i)
    
  • A generator function would be neat:

    def batch_gen(data, batch_size):
        for i in range(0, len(data), batch_size):
            yield data[i:i+batch_size]
    

    Example use:

    a = "abcdef"
    for i in batch_gen(a, 2):
        print(i)
    

    prints:

    ab
    cd
    ef
    
  • Don't forget about the zip() function:

    a = 'abcdef'
    for x, y in zip(a[::2], a[1::2]):
        print('%s%s' % (x, y))
    
    culebrón : a very elegant solution!
  • But the more general way would be (inspired by this answer):

    for i in zip(*(seq[i::size] for i in range(size))):
        print(i)                            # tuple of individual values
    
    kigurai : +1 for elegant answer! But, there is one ")" too much in the end of the for-line
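How the general version works, step by step (a Python 3 sketch using a seven-character string so the remainder shows up): each seq[i::size] picks every size-th item starting at offset i, and zip reads those strided slices in parallel to rebuild the chunks. Because zip stops at the shortest input, a trailing partial chunk is dropped.

```python
seq = "abcdefg"
size = 3

# One strided slice per offset within a chunk.
columns = [seq[i::size] for i in range(size)]
print(columns)   # ['adg', 'be', 'cf']

# zip reads the slices in parallel, one item from each per tuple,
# and stops at the shortest slice, so the trailing 'g' is dropped.
chunks = list(zip(*columns))
print(chunks)    # [('a', 'b', 'c'), ('d', 'e', 'f')]
```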
  • I've got an alternative approach that works for iterables that don't have a known length.

    def groupsgen(seq, size):
        it = iter(seq)
        while True:
            values = ()
            try:
                for _ in range(size):
                    values += (next(it),)
            except StopIteration:
                return          # iterator exhausted; an incomplete group is discarded
            yield values
    

    It works by iterating over the sequence (or other iterator) in groups of size, collecting the values in a tuple. At the end of each group, it yields the tuple.

    When the iterator runs out of values, next() raises StopIteration; the generator catches it and returns, which signals that groupsgen is out of values. (In Python 2 the exception could simply be left to propagate out of the generator, but since Python 3.7 a StopIteration escaping a generator is turned into a RuntimeError.)

    It assumes that the values come in sets of size (sets of 2, 3, etc.). If not, any values left over are just discarded.
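If the leftover values should be kept rather than discarded, a minimal Python 3 variant (the name groups_keep_tail is hypothetical) can pull up to size items at a time with itertools.islice and stop when it gets nothing back. Like groupsgen, it only needs an iterable, not a known length.

```python
from itertools import islice

def groups_keep_tail(iterable, size):
    """Yield tuples of up to `size` items; the last tuple may be shorter."""
    it = iter(iterable)                    # one shared iterator, so islice resumes where it left off
    while True:
        group = tuple(islice(it, size))    # take at most `size` items
        if not group:                      # nothing left: stop cleanly
            return
        yield group

letters = (c for c in "abcde")             # a generator: no len() available
print(list(groups_keep_tail(letters, 2)))  # [('a', 'b'), ('c', 'd'), ('e',)]
```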

  • How about itertools?

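    from itertools import islice, groupby
    
    def chunks_islice(seq, size):
        it = iter(seq)  # one iterator, so each islice call resumes where the last one stopped
        while True:
            aux = list(islice(it, size))
            if not aux: break
            yield "".join(aux)
    
    def chunks_groupby(seq, size):
        # floor division gives the same key to each run of `size` consecutive indices
        for k, chunk in groupby(enumerate(seq), lambda x: x[0] // size):
            yield "".join([i[1] for i in chunk])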
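To see what chunks_groupby is grouping on, the Python 3 sketch below (hypothetical helper index_groups) lists each key with its items. The key relies on integer division: with size 2, indices 0 to 4 map to 0, 0, 1, 1, 2. Under Python 3, plain / would give 0.0, 0.5, 1.0, ... and every item would land in its own group, so // is the safe spelling in both Python 2 and 3. Unlike the zip-based answers, this keeps a short final group.

```python
from itertools import groupby

def index_groups(seq, size):
    """Return (key, items) pairs as chunks_groupby would see them."""
    # Key each (index, item) pair by index // size, so runs of `size` indices share a key.
    return [(key, [item for _, item in chunk])
            for key, chunk in groupby(enumerate(seq), lambda pair: pair[0] // size)]

print(index_groups("abcde", 2))
# [(0, ['a', 'b']), (1, ['c', 'd']), (2, ['e'])]
```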
    
  • >>> a = "abcdef"
    >>> size = 2
    >>> [a[x:x+size] for x in range(0, len(a), size)]
    ['ab', 'cd', 'ef']
    

    ...or, not as a list comprehension:

    a = "abcdef"
    size = 2
    output = []
    for x in range(0, len(a), size):
        output.append(a[x:x+size])
    

    Or, as a generator function, which is best if you need the chunking in more than one place (for a one-off, the list comprehension is probably "best"):

    def chunker(thelist, segsize):
        for x in range(0, len(thelist), segsize):
            yield thelist[x:x+segsize]
    

    ...and its usage:

    >>> for seg in chunker(a, 2):
    ...     print(seg)
    ... 
    ab
    cd
    ef
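
The reuse point made for the generator version is about the generator function, not a generator object: each call to chunker starts a fresh pass, but any single generator can be consumed only once. A short Python 3 sketch:

```python
def chunker(thelist, segsize):
    """Yield consecutive slices of length segsize (the last one may be shorter)."""
    for x in range(0, len(thelist), segsize):
        yield thelist[x:x + segsize]

gen = chunker("abcdef", 2)
first = list(gen)                    # ['ab', 'cd', 'ef']
second = list(gen)                   # []: this generator object is already exhausted
again = list(chunker("abcdef", 2))   # a new call starts over: ['ab', 'cd', 'ef']
```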
    
  • And then there's always the documentation: the itertools recipes include these two helpers.

    from itertools import tee, chain, repeat

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)      # advance the second copy by one element (no-op if empty)
        return zip(a, b)
    
    def grouper(n, iterable, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        # n references to the same padded iterator, so zip takes n consecutive items per tuple
        return zip(*[chain(iterable, repeat(padvalue, n-1))]*n)
    

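    Note: these produce tuples instead of substrings when given a string as input, and pairwise yields overlapping pairs (('a','b'), ('b','c'), ...), so only grouper actually answers the question.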
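For comparison, a Python 3 sketch of grouper written with itertools.zip_longest (Python 3's name for izip_longest), with the tuples joined back into substrings. The pad value fills out the last group, so pass "" when joining strings; the default None would make "".join fail on an incomplete last group. On Python 3.12 and later, itertools.batched(iterable, n) yields the tuples directly, with a shorter final tuple instead of padding.

```python
from itertools import zip_longest

def grouper(n, iterable, padvalue=None):
    """Collect items into tuples of length n, padding the last one with padvalue."""
    # n references to ONE iterator: zip_longest takes n consecutive items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=padvalue)

print(["".join(t) for t in grouper(2, "abcdef")])          # ['ab', 'cd', 'ef']
print(["".join(t) for t in grouper(3, "abcdefg", "x")])    # ['abc', 'def', 'gxx']
print(["".join(t) for t in grouper(3, "abcdefg", "")])     # ['abc', 'def', 'g']
```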

  • A generator expression does the same thing inline:
    s = 'abcdefgh'
    for e in (s[i:i+2] for i in range(0,len(s),2)):
      print(e)
    
