
Understanding how the grouper function works in Python


A question that often comes up on Stack Overflow is “How do I break a sequence into parts of equal size?” Let’s try to figure it out.

If it really is a sequence, and not an arbitrary iterable, you can break it up using slices. However, the itertools documentation offers a great way to do this for any iterable: the grouper function.
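For a true sequence (a list, tuple, or string), the slicing approach could look like the sketch below. The name chunks_by_slicing is hypothetical, chosen only for illustration, and the last slice may be shorter than the others.

```python
def chunks_by_slicing(seq, n):
    """Split a sequence into consecutive slices of length n.

    Works only for objects that support len() and slicing;
    the final slice may contain fewer than n elements.
    """
    return [seq[i:i + n] for i in range(0, len(seq), n)]


print(chunks_by_slicing([0, 1, 2, 3, 4, 5, 6], 3))  # [[0, 1, 2], [3, 4, 5], [6]]
print(chunks_by_slicing("ABCDEF", 2))               # ['AB', 'CD', 'EF']
```

This does not work for generators or other one-shot iterators, which is where grouper comes in.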

Although the example is very short and simple, not everyone understands how it works. It is worth understanding, because once you see how grouper is built, more complex iterator-based code becomes easier to follow.


Pairs of objects

Let’s start with a simpler function that groups an iterable with an even number of elements into an iterator over pairs:

def pairs(iterable):
    it = iter(iterable)
    return zip(it, it)
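Before looking at the explanation, here is a quick check of what pairs produces. The function is repeated so the snippet runs on its own; the sample inputs are arbitrary.

```python
def pairs(iterable):
    it = iter(iterable)
    return zip(it, it)


print(list(pairs([1, 2, 3, 4, 5, 6])))  # [(1, 2), (3, 4), (5, 6)]
print(list(pairs("abcd")))              # [('a', 'b'), ('c', 'd')]
```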

How does it work?

First, we create an iterator over the iterable. An iterator is an object that yields its elements one at a time, each time next() is called on it. An iterator is itself iterable: calling iter() on it returns the iterator itself. Most often we meet iterators as the objects returned by generators, by the functions map, filter, and zip, by the functions in itertools, and so on. However, we can create an iterator for any iterable using the iter function. For example:

>>> a = range(5) 
>>> list(a)
[0, 1, 2, 3, 4]
>>> list(a)
[0, 1, 2, 3, 4]
>>> i = iter(a) 
>>> list(i)
[0, 1, 2, 3, 4]
>>> list(i)
[]

Since we consumed i during the first call, nothing is left for the second. For a better understanding, let’s look at functions like islice or takewhile, which consume only part of an iterator:

>>> from itertools import islice
>>> i = iter(a)
>>> list(islice(i, 3))
[0, 1, 2]
>>> list(islice(i, 3))
[3, 4]
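The same partial consumption can be seen with takewhile, with one difference worth knowing: takewhile has to read the first element that fails the test in order to stop, so that element is consumed as well. A small sketch (the predicate x < 2 is an arbitrary choice):

```python
from itertools import takewhile

i = iter(range(5))

head = list(takewhile(lambda x: x < 2, i))  # stops after reading 2
rest = list(i)                              # 2 has already been consumed

print(head)  # [0, 1]
print(rest)  # [3, 4]
```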

Perhaps you are wondering what would have happened if a had been an iterator from the beginning. In that case, iter simply returns a.
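This can be verified with an identity check. The snippet below is only an illustration; the variable names are arbitrary.

```python
a = iter(range(5))           # a is an iterator from the beginning
same_object = iter(a) is a   # iter() hands back the iterator itself

first = next(a)              # consume one element through a
rest = list(iter(a))         # iter(a) continues from where a stopped

r = range(5)                 # a range is iterable but not an iterator
is_own_iterator = iter(r) is r

print(same_object, first, rest, is_own_iterator)  # True 0 [1, 2, 3, 4] False
```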

If we have two references to the same iterator and we advance one of them, the other advances as well. Creating two separate iterators over the same iterable avoids this. For example:

>>> a = range(10)
>>> i1, i2 = iter(a), iter(a)
>>> list(i1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(i2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i1 = i2 = iter(a) 
>>> list(i1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(i2)
[]

Of course, if a were already an iterator, both calls to iter(a) would return a itself, and the results of the first and second examples would be the same.
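A short sketch of that case, where a starts out as an iterator rather than a range:

```python
a = iter(range(10))          # a is already an iterator
i1, i2 = iter(a), iter(a)    # both calls return a itself

first = list(i1)             # consumes everything
second = list(i2)            # nothing is left

print(i1 is i2)  # True
print(first)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(second)    # []
```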

And what happens if you apply zip to two references to the same iterator? Each reference receives every other value:

>>> i1 = i2 = iter(a)
>>> list(zip(i1, i2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

If it’s hard to see why this happens, look at how this simplified version of zip works in pure Python:

def fake_zip(i1, i2):
    while True:
        v1 = next(i1)
        v2 = next(i2)
        yield v1, v2

If i1 and i2 are the same iterator, then after the line v1 = next(i1) both i1 and i2 point to the value that follows v1, so v2 = next(i2) gets that value.

Translator’s note: this infinite loop ends with a StopIteration exception when one of the iterators is exhausted. However, since Python 3.7, a StopIteration raised inside a generator no longer ends it normally; it is re-raised to the caller as RuntimeError (PEP 479). So, to support newer versions of Python, this code needs to be modified. More details can be found on the official Python website.
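One possible modification, shown here as a sketch rather than as the original author's code, is to catch StopIteration explicitly and return from the generator:

```python
def fake_zip(i1, i2):
    """Simplified two-argument zip that also ends cleanly on Python 3.7+."""
    while True:
        try:
            v1 = next(i1)
            v2 = next(i2)
        except StopIteration:
            # End the generator normally instead of letting StopIteration
            # escape and be converted into RuntimeError.
            return
        yield v1, v2


it = iter(range(10))
print(list(fake_zip(it, it)))  # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```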

That’s all there is to know about the pairs function.

Parts of an arbitrary size

And how do we make n references to the same iterator? There are several ways to do this, but here is the simplest:

args = [iter(iterable)] * n

Now we need to figure out how to use zip in this case. Since zip takes any number of arguments, not just two, we can use argument list unpacking:

zip(*args)
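It may help to see why the list multiplication matters. The sketch below contrasts n references to one iterator with n independent iterators; the names shared and separate are made up for this comparison.

```python
iterable = range(6)
n = 3

shared = [iter(iterable)] * n                  # one iterator, referenced n times
separate = [iter(iterable) for _ in range(n)]  # n independent iterators

all_same = all(ref is shared[0] for ref in shared)

grouped = list(zip(*shared))      # consecutive elements end up in one tuple
repeated = list(zip(*separate))   # every iterator yields the same element

print(all_same)      # True
print(grouped)       # [(0, 1, 2), (3, 4, 5)]
print(repeated[:2])  # [(0, 0, 0), (1, 1, 1)]
```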

And now we can almost write grouper:

def grouper(iterable, n):
    args = [iter(iterable)] * n
    return zip(*args)
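A quick check of this first version, using inputs whose length divides evenly by n (the function is repeated so the snippet is self-contained):

```python
def grouper(iterable, n):
    args = [iter(iterable)] * n
    return zip(*args)


print(list(grouper(range(9), 3)))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(list(grouper("ABCDEF", 2)))  # [('A', 'B'), ('C', 'D'), ('E', 'F')]
```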

Parts of different sizes

And finally, what happens if the elements cannot be divided into parts of equal size? For example, what happens if you want groups of 3 elements from range(10)? Here are a few possible options:

  1. [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
  2. [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 0, 0)]
  3. [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
  4. [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
  5. A ValueError exception is raised

Using zip, we get the fourth result: the incomplete group does not appear at all. Sometimes this is exactly what you need, but more often you will want one of the first two results.

For this, the itertools library has the zip_longest function (izip_longest in Python 2.x). In place of missing values it inserts None or the value of the fillvalue argument. For example:

>>> from itertools import zip_longest
>>> iters = [iter(range(10))] * 3
>>> list(zip_longest(*iters, fillvalue=0))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 0, 0)]
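Options 3 and 5 from the list above are not produced by zip or zip_longest directly. One way they might be obtained is with islice, as in the sketch below. The names chunked and strict_grouper are hypothetical helpers, not functions from itertools.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield tuples of up to n items; the last may be shorter (option 3)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk


def strict_grouper(iterable, n):
    """Yield tuples of exactly n items; raise ValueError on a leftover (option 5)."""
    for chunk in chunked(iterable, n):
        if len(chunk) != n:
            raise ValueError("iterable length is not a multiple of n")
        yield chunk


print(list(chunked(range(10), 3)))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

Calling list(strict_grouper(range(10), 3)) raises ValueError once the short final group is reached, while range(9) passes through unchanged.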

And now we have everything we need to write grouper and understand how it works:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
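To see the final version in action, here is a short usage sketch. The function is repeated so the snippet runs on its own, and the inputs mirror the examples used earlier.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Groups come back as tuples; join them to reproduce the comment in the recipe.
print(["".join(g) for g in grouper("ABCDEFG", 3, "x")])  # ['ABC', 'DEF', 'Gxx']

# Without fillvalue, the incomplete group is padded with None.
print(list(grouper(range(10), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```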