Extending Iterators for use as Queue

Version:0.2
Date:2007-01-15
Copyright:2005, 2007 Guenter Milde. Released under the terms of the GNU General Public License (v. 2 or later)
Changelog:2005-06-29 initial version; 2007-01-07 literate version, more examples
Abstract:There are many variants of “rich iterators” with varying efficiency, conventions, naming, and behaviour. This survey compares them and makes a case for the inclusion of a “rich iterator wrapper” in the Python Standard Library.
"""iterqueue: mutable iterators

Classes for "extended iterators" with methods that let iterators be used as
a queue:

   `push` or
   `appendleft`  -- push back values
   `peek`        -- get a value without "using it up"
   `__bool__`    -- test for an empty iterator

"""

Imports

The itertools module provides a set of building blocks for working with iterators (but lacks a class for “mutable” iterators).

import itertools

The collections module with the efficient double-ended queue deque was introduced in Python 2.4. The following construct provides a minimal compatibility definition if it is not available:

try:
    from collections import deque
except ImportError:
    class deque(list):
        def appendleft(self, value):
            self.insert(0, value)

Iterables and Iterators

Iterables and iterators are defined by the iterator protocol as laid out in the section on Iterator Types in the Python Library Reference:

Iterables:

One method needs to be defined for container objects to provide iteration support:

__iter__(): Return an iterator object. […] If a container supports different types of iteration, additional methods can be provided to specifically request iterators for those iteration types. […]

def is_iterable(object):
    """Check if the argument is iterable"""
    return hasattr(object, "__iter__") and is_iterator(iter(object))
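
Analogous to the is_iterator test below, usage might look like this (these doctest lines are additions, not part of the original test suite):

>>> import iterqueue
>>> iterqueue.is_iterable([1, 2])
True
>>> iterqueue.is_iterable(23)
False
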
Iterators:

The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:

__iter__():

Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements…

__next__():

Return the next item from the container. If there are no further items, raise the StopIteration exception.

[…] once an iterator’s __next__() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

Check if an object is a Python 3 iterator. (In Python 3, one could instead test against the abstract base class collections.abc.Iterator.)

def is_iterator(object):
    """check if the argument is an iterator"""
    if not hasattr(object, "__iter__"):
        return False
    return (object.__iter__() is object) and hasattr(object, "__next__")

Try it:

>>> import iterqueue
>>> iterqueue.is_iterator(23)
False
>>> iterqueue.is_iterator(iter(range(3)))
True

The iterator protocol was primarily designed to be the minimum necessary to work in for statements, translating (behind the scenes):

|  for item in iterable:
|      <statements>

into the equivalent of:

|  iterator = iter(iterable)
|  while 1:
|      try:
|          item = next(iterator)
|      except StopIteration:
|          break
|      <statements>

To add iterator behaviour to your classes, define an __iter__ method which returns an object with a __next__ method. If the class defines __next__, then __iter__ can just return self. (tutorial chapter on iterators)

Python’s generators provide a convenient way to implement the iterator protocol. Generator objects are returned by generator functions (functions with the yield keyword, new in 2.3) and generator expressions (new in 2.4).
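
As a brief illustration of both approaches, a minimal sketch (the Countdown class and the countdown function are hypothetical examples, not part of the iterqueue module):

class Countdown:
    """Count down from `start` to 1 (class-based iterator)."""
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        return self                  # an iterator returns itself
    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signal exhaustion
        self.current -= 1
        return self.current + 1

def countdown(start):
    """The same behaviour as a generator function."""
    while start > 0:
        yield start
        start -= 1

Both list(Countdown(3)) and list(countdown(3)) evaluate to [3, 2, 1].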

Limitations of iterator objects

Most built-in Python iterator objects (including generator objects) are not mutable (apart from the state change caused by calling __next__). They “produce the data just in time”, which is fast and memory efficient.

However:

  1. On some occasions, it is important to

    • find out whether an iterator is empty or
    • to “peek” at a data value

    without advancing the iterator.

  2. In a state machine, an iterator holding the input values can be passed around to the state-handling functions. If a state handler realises that a value should be processed by another state handler, it needs to “push it back” (see the sketch after this list).

  3. One might want to modify the object iterated over in a for statement.

    Generally, the object in a for statement cannot be changed inside the loop.

    >>> from collections import deque
    >>> it = deque(range(3))
    >>> for v in it:
    ...     print( v, end=',' )
    ...     if v == 1:
    ...        it.appendleft("eins")
    ...
    Traceback (most recent call last):
      File "doctest.py", line 1248, in __run
        compileflags, 1) in test.globs
      File "<doctest iterqueue.py.txt[8]>", line 1, in ?
        for v in it:
    RuntimeError: deque mutated during iteration
    
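For illustration, such a state handler might look like the following sketch (the handler and state names are hypothetical; it assumes a push-back capable iterator such as the PushIterator defined below):

def handle_numbers(tokens):
    """Consume integer tokens; hand anything else back to the caller."""
    for token in tokens:
        if not isinstance(token, int):
            tokens.push(token)    # push back for the next state handler
            return "text"         # name of the next state
        print(token)
    return None                   # input exhausted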

Pushing the limits

There are many ways to live with the limits of iterators. Most often it helps to gain a true understanding of their nature and to account for it in the code. However, the “never ending” discussion and the varying recipes for enhanced iterators show the ongoing public demand. This is why I argue for the inclusion of a ‘rich iterator’ wrapper class in the standard library, based on the standardisation argument made for the itertools module.

Standardisation helps avoid the readability and reliability problems which arise when many different individuals create their own slightly varying implementations, each with their own quirks and naming conventions.

Recode to work with iterators as they are

The most straightforward way is to translate code like

>>> def print_first(l):
...     if not l:
...         print( "list empty" )
...     else:
...         print( l[0] )
...
>>> print_first([1, 2])
1
>>> print_first([])
list empty

into something along the lines of

>>> def print_next(it):
...     try:
...         value = next(it)
...     except StopIteration:
...         print( "list empty" )
...     else:
...         print( value )
...
>>> print_next(iter([1, 2]))
1
>>> print_next(iter([]))
list empty

In a for statement, the else clause can be used to execute a block if the end of the iterator is reached:

>>> def find_five(iterable):
...     for i in iterable:
...         if i == 5:
...             print( "5 found" )
...             break
...     else:
...         print( "5 not found" )

If the loop is aborted, the else clause is skipped

>>> find_five(range(7))
5 found

Otherwise it prints its message:

>>> find_five(range(3))
5 not found

However, there might be cases where this is not applicable and a test for the emptiness or a peek at the first value without advancing the iterator would enable much cleaner code.

Use a container object

One could wrap e.g. a generator into a list or collections.deque to add random access as well as extensibility.

>>> que = deque(range(3))
>>> que.appendleft("foo")
>>> print( que )
deque(['foo', 0, 1, 2])

However, it will put all iterator values into memory, which becomes a problem for large iterators (and is infeasible for unbounded iterators).

Also, iterating in a for statement will lose the rich behaviour. Instead, a construct with a while statement is needed, e.g.:

>>> que = deque(range(3))
>>> while que:
...     v = que.popleft()
...     print( v, end=',' )
...     if v == 1:
...        que.appendleft("eins")
...
0,1,eins,2,

Use a rich iterator

If the argument of a for statement is an iterator (whose __iter__ method returns self), it is available unchanged inside the loop. A rich iterator provides additional methods besides the ones required for the iterator protocol.

State Reporting Iterator

An iterator that, when converted to Boolean, reports whether it is “full or empty” (values waiting or not) will be called a state-reporting iterator:

def is_state_reporting(object):
    return hasattr(object, "__bool__") or hasattr(object, "__len__")

Peekable Iterator

An iterator that provides a peek method will be called a peekable iterator:

def is_peekable(object):
    return hasattr(object, "peek")

Push Iterator

An iterator that provides for push-back will be called a push iterator:

def is_pushable(object):
    return hasattr(object, "appendleft") or hasattr(object, "push")

Push iterators can be easily extended with peek and test of emptiness (see PushIterator).

Iterator Queue

An iterator that also provides methods for appending and extending will be called iterator_queue.

Methods that need access from the “right” side or knowledge of the length of the iterator are not included in the iterator_queue specification, as they clash with the “just in time” acquisition of values that gives iterators their time and memory advantage over sequences.

def is_iterator_queue(object):
    return (is_state_reporting(object)
            and hasattr(object, "append")
            and hasattr(object, "appendleft")
            and hasattr(object, "extend")
            and hasattr(object, "extendleft")
            and hasattr(object, "clear")
            and hasattr(object, "rotate")
           )

Rich iterator examples

The following examples are the result of an Internet survey and the author’s own ideas. They will be compared and profiled in this paper.

All of them are iterator-wrappers:

def is_iterator_wrapper(obj):
    """Test whether `obj` can wrap an iterable into an iterator"""
    try:
        it = obj(range(1))
    except Exception:
        return False
    try:
        return is_iterator(it)
    except Exception:
        return False
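
For example, the wrapper classes collected below pass this test (these doctest lines are additions, not part of the original test suite):

>>> iterqueue.is_iterator_wrapper(iterqueue.PushIterator)
True
>>> iterqueue.is_iterator_wrapper(list)
False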

Xiterwrapper

Tom Andersson suggested on the Python mailing list an xiterable protocol for extended iterables and a wrapper to convert a “traditional” iterator to an extended version:

def xiter(iterable):
    if (hasattr(iterable, "__xiter__")):
        return iterable.__xiter__()
    else:
        return xiterwrapper(iter(iterable))

class xiterwrapper(object):
    def __init__(self, it):
        self.it = it
        self.advance()
    def hasNext(self):
        return hasattr(self, "_next")
    def __next__(self):
        try:
            cur = self._next
            self.advance()
            return cur
        except AttributeError:
            raise StopIteration
    def peek(self):
        try:
            return self._next
        except AttributeError:
            raise StopIteration
    def advance(self):
        try:
            self._next = next(self.it)
        except StopIteration:
            if (hasattr(self, "_next")):
                del self._next
    def __xiter__(self):
        return self
    def __iter__(self):
        return self

Usage

>>> import iterqueue
>>> it = iterqueue.xiter(range(3))
>>> iterqueue.is_iterator(it)
True
>>> iterqueue.is_peekable(it)
True
>>> iterqueue.is_pushable(it)
False

We add a __bool__ method for a non-destructive test of waiting values, which provides the state-reporting feature:

__bool__ = hasNext

>>> iterqueue.is_state_reporting(it)
True

Adding a push method is not possible without major changes to the code.

IteratorWrapper BFL

In a post on python-3000 Guido van Rossum argued against inclusion of an “emptiness” test in the iterator protocol, as “that’s just not something that generators can be expected to support” and hence would exclude generators from the definition of an iterator.

… you can always write a helper class that takes an iterator and returns an object that represents the same iterator, but sometimes buffers one element. But the buffering violates the coroutine-ish properties of generators, so it should not be the only (or even the default) way to access generators. …

Here’s a sample wrapper (untested)

class IteratorWrapperBFL(object):
    def __init__(self, it):
        self.it = iter(it)
        self.buffer = None
        self.buffered = False
        self.exhausted = False
    def __iter__(self):
        return self
    def __next__(self):
        if self.buffered:
            value = self.buffer
            self.buffered = False
            self.buffer = None
            return value
        if self.exhausted:
            raise StopIteration()
        try:
            return next(self.it)
        except StopIteration:
            self.exhausted = True
            raise
    def __bool__(self):
        if self.buffered:
            return True
        if self.exhausted:
            return False
        try:
            self.buffer = next(self.it)
        except StopIteration:
            self.exhausted = True
            return False
        self.buffered = True
        return True

This example provides an “emptiness” test but no peek or push-back:

>>> it = iterqueue.IteratorWrapperBFL(range(3))
>>> iterqueue.is_state_reporting(it)
True

Peeking could be easily added, though:

def peek(self):
    self.buffer = next(self)
    self.buffered = True
    return self.buffer

>>> iterqueue.is_peekable(it)
True

IteratorWrapper DD

Daniel Dittmar wrote on Tuesday, 22 July 2003, on comp.lang.python:

It shouldn’t be too difficult to write an iterator wrapper class that does exactly what you want (not tested):

class IteratorWrapperDD:
    def __init__(self, iterArg):
        iterArg = iter(iterArg)
        try:
            self.firstElement = next(iterArg)
            self.isEmpty = False
            self.__next__ = self.returnFirstElement
            self.baseIter = iterArg
        except StopIteration:
            self.isEmpty = True
            self.__next__ = self.throwStopIteration

    def returnFirstElement(self):
        self.__next__ = self.baseIter.__next__
        return self.firstElement

    def throwStopIteration(self):
        raise StopIteration

Note that this recipe (like the __next__ remapping in PushIt below) relies on rebinding __next__ on the instance; in Python 3, for statements and the next() builtin look the __next__ method up on the type, so the rebinding only takes effect when __next__ is called explicitly on the instance.

PushIterator

In the slides of the Effective Python Programming OSCON 2005 tutorial by Anthony Baxter, I found an ingeniously simple example of an iterator with a push method.

class PushIterator:
    def __init__(self, iterable):
        """Store iterator as data argument and set up cache"""
        self.it = iter(iterable)
        self.cache = []

    def __iter__(self):
        return self

    def __next__(self):
        """Return next value (from cache or iterator)"""
        if self.cache:
            return self.cache.pop()
        return next(self.it)

    def push(self, value):
        """Push back one value (will become the `next` value)"""
        self.cache.append(value)

Once push is defined, it is easy to add peek and __bool__.

The question arises what peek() should return if the iterator is empty. The easiest option is to raise StopIteration, but this can be inconvenient in some cases. My proposal is to add an optional default argument that is returned if the iterator is empty. (As there is no sensible default value for this argument, it cannot be implemented as a keyword argument; an argument list is used instead):

def peek(self, *defaults):
    """Return next value but do not advance the iterator"""
    try:
        value = next(self)
    except StopIteration:
        if defaults:
            return defaults[0]
        raise
    self.push(value)
    return value

def __bool__(self):
    """Test whether the iterator is empty"""
    try:
        self.peek()
    except StopIteration:
        return False
    return True

An alias makes the class more compatible with collections.deque:

appendleft = push

Optimisation of peek and __bool__ is left out in favour of improved clarity.

Usage

Create an instance from an iterable object:

>>> it = iterqueue.PushIterator(range(4))

Test for values:

>>> bool(it)
True

Have a peek …

>>> it.peek(None)
0

the value is still there:

>>> next(it)
0

See what is left

>>> [i for i in it]
[1, 2, 3]

It should be empty now:

>>> bool(it)
False

So a peek will return the default:

>>> print( it.peek(None) )
None

PushIt

The wrapping of an iterator in a class leads to performance loss, as every call to next() is a relatively costly function call.

Remapping self.__next__ leads to a more efficient implementation of the PushIterator for the case that peek or push is called far less frequently than __next__ (‘normal’ iterating with occasional peek or backtrack). (As noted for IteratorWrapperDD above, Python 3 looks __next__ up on the type, so the instance-level remapping is bypassed by for statements and the next() builtin; the class still behaves correctly via the inherited PushIterator.__next__, but without the intended speed-up.)

class PushIt(PushIterator):
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.cache = []
        self.__next__ = self.it.__next__

    def _next(self):
        """Return next element. Try cache first."""
        if self.cache:
            return self.cache.pop()
        self.__next__ = self.it.__next__
        return next(self)

    def push(self, value):
        """Push back one value to the iterator"""
        self.cache.append(value)
        self.__next__ = self._next

    def peek(self):
        """Return next value but do not advance the iterator"""
        if self.cache:
            return self.cache[-1]
        value = next(self.it)
        self.push(value)
        return value
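
Usage

A brief sketch (these doctest lines are additions, not part of the original test suite); the behaviour matches the PushIterator:

>>> it = iterqueue.PushIt(range(3))
>>> next(it)
0
>>> it.push("zero")
>>> it.peek()
'zero'
>>> [i for i in it]
['zero', 1, 2]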

IterQueue

The IterQueue class adds iterator behaviour to a double-ended queue:

class IterQueue(deque):
    """Iterator object that is also a queue"""
    def __iter__(self):
        return self
    def __next__(self):
        try:
            return self.popleft()
        except IndexError:
            raise StopIteration
    #
    def peek(self):
        """Return next value but do not advance the iterator"""
        value = next(self)
        self.appendleft(value)
        return value

Usage

Creating an instance wraps an iterable in an iterator queue

>>> it = iterqueue.IterQueue(range(3))

which is an iterator according to the iterator protocol with “queue” methods

>>> iterqueue.is_iterator_queue(it)
True

We can test whether there is data in the iterator or get the length of it:

>>> bool(it)
True
>>> len(it)
3

It is possible to modify this iterator in the middle of a for statement:

>>> for v in it:
...     print( v, end=',' )
...     if v == 1:
...        it.appendleft("eins")
...
0,1,eins,2,

As iteration is done on the object itself and not on a copy, it is exhausted now:

>>> print( it )
IterQueue([])

(the string representation is inherited from deque, as we did not override the __str__ method)

We can define our own

def __str__(self):
    return "{0}({1})".format( self.__class__.__name__, list(self) )

but there might be other problems left…

Problems

Converting an iterable to a collections.deque object creates a list of all values in memory, losing the memory-saving advantage of generator objects with “just in time” production of the data.

Printing (and probably other uses as well) “uses up” the iterator:

>>> it = iterqueue.IterQueue(range(3))
>>> print( it )
IterQueue([0, 1, 2])
>>> print( it )
IterQueue([])

IQueue

The following class implements an iterator queue that is

  • memory efficient, as generators are kept as generators
  • mostly compatible with collections.deque (offering all of deque’s methods for appending)

It does not offer

  • random access to the values, nor
  • pop from the right end,

as this would require converting the iterator to a sequence, losing the memory-saving advantage.

Iterating over instances is slower, as the __next__() method is redefined (a function call is needed for every step). An implementation in C would improve the speed.

But,

itertools.queue() was rejected because it didn’t fit naturally into applications – you had to completely twist your logic around just to accommodate it. Besides, it is already simple to iterate over a list while appending items to it as needed.

—Raymond Hettinger 03-13-05 http://www.codecomments.com/message423138.html

However, both the speed increase and the standardisation argument given for itertools also hold in this case. Maybe IQueue should become a collections candidate?

class IQueue:
    """Iterator with "on-line extensibility"

    An iterator with methods to append or extend it from left or right
    (even while the iterator is in use).

    Can be conceived as a mixture of `itertools.chain` and
    `collections.deque`.

    As `__next__` is redefined, there is a performance loss when iterating
    over large iterators.
    """

    def __init__(self, *iterables):
        """Convert `iterables` to a queue object"""
        self.iterators = deque(iterables)
    #
    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self.iterators[0])
            except TypeError:
                # next() raises TypeError for a non-iterator:
                # convert the iterable and retry
                self.iterators[0] = iter(self.iterators[0])
            except StopIteration:       # switch to next iterator
                del(self.iterators[0])
            except IndexError:          # all iterators exhausted
                raise StopIteration
    #
    def append(self, value):
        """append `value` to self

        The value is wrapped in an iterable and
        appended to the queue of iterables
        """
        self.iterators.append(iter((value,)))
    #
    def appendleft(self, value):
        """Prepend one (scalar) value to the iterator.

        The value is wrapped in an iterable and
        inserted at the first position in the list of iterables
        """
        self.iterators.appendleft(iter((value,)))
    #
    def clear(self):
        """Remove all elements from the iterator.
        """
        self.iterators.clear()
    #
    def extend(self, iterable):
        """append `iterable` to self"""
        self.iterators.append(iter(iterable))
    #
    def extendleft(self, iterable):
        """prepend `iterable` to self"""
        self.iterators.appendleft(iter(iterable))
    #
    def peek(self):
        """Return the next value without advancing the iterator

        Yield the next value but push it back again.
        This way you may "peek" at an iterator without loss.

        Raises `StopIteration` if the iterator is empty.
        """
        value = next(self)
        self.iterators.appendleft(iter((value,)))
        return value
    #
    def rotate(self, n=1):
        """append the next `n` values to the end of the iterator

        Similar to `collections.deque.rotate`, but
         * negative `n` leads to error
         * a list of the `n` rotated values is returned
        """
        result = list(itertools.islice(self, n))
        self.iterators.append(iter(result))  # re-append as an iterator
        return result

    #
    def __repr__(self):
        """Return a string representation"""
        return "IQueue({0!r})".format(self.iterators)
    #
    def __bool__(self):
        """Test for a non-zero length of the iterator"""
        if len(self.iterators) > 1:
            return True
        try:
            self.peek()
        except StopIteration:
            return False
        return True
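
Usage

A brief usage sketch (these doctest lines are additions, not part of the original test suite):

>>> it = iterqueue.IQueue(range(3))
>>> iterqueue.is_iterator_queue(it)
True
>>> bool(it)
True
>>> next(it)
0
>>> it.appendleft("zero")
>>> it.extend("ab")
>>> [i for i in it]
['zero', 1, 2, 'a', 'b']
>>> bool(it)
False

The rotate method moves the next n values to the right end and returns them as a list:

>>> it = iterqueue.IQueue("ab", "cd")
>>> it.rotate()
['a']
>>> [i for i in it]
['b', 'c', 'd', 'a']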

XIter

The XIter class is an optimised version of IQueue for the case where appending a value is done less frequently than calling __next__. It could do so by aliasing __next__ to the underlying iterator’s __next__ method when there is only one iterator in the iterables chain.

class XIter:
    """'Mutable iterator' class"""
    def __init__(self, *iterables):
        self.iterators = deque(iter(i) for i in iterables)
        #if len(self.iterators) is 1:
        #    self.__next__ = self.iterators[0].__next__
        #else:
        #    self.__next__ = self._next     # iterate over argument
    #
    def __iter__(self): return self        # "I am an iterator!"
    #
    def __next__(self):
        """get next in turn if there are more than one iterators"""
        try:
            return next(self.iterators[0])
        except StopIteration:
            del(self.iterators[0])             # switch to next iterator
            if len(self.iterators) == 0: raise
            #assert len(self.iterators) >= 1
            #if len(self.iterators) is 1:
            #    self.__next__ = self.iterators[0].__next__
            return next(self)
        except IndexError:
            raise StopIteration
    #
    def append(self, element):
        """append `element` to self"""
        self.iterators.append(iter((element,)))
        #self.__next__ = self._next        # iterate over cache
    #
    def appendleft(self, element):
        """prepend `element` to self"""
        self.iterators.appendleft(iter((element,)))
        #self.__next__ = self._next        # iterate over cache
    #
    def extend(self, iterable):
        """append `iterable` to self"""
        self.iterators.append(iter(iterable))
        #self.__next__ = self._next        # iterate over cache
    #
    def extendleft(self, iterable):
        """prepend `iterable` to self"""
        self.iterators.appendleft(iter(iterable))
        #self.__next__ = self._next        # iterate over cache

    #
    def peek(self):
        """Return the next value without advancing the iterator

        Yield the next value but push it back again.
        This way you may "peek" at an iterator without loss.

        Raises `StopIteration` if the iterator is empty.
        """
        value = next(self)
        self.appendleft(value)
        return value
    #
    def rotate(self, n=1):
        """append the next `n` values to the end of the iterator

        Similar to `collections.deque.rotate`, but
         * negative `n` leads to error
         * a list of the `n` rotated values is returned
        """
        result = list(itertools.islice(self, n))
        self.iterators.append(iter(result))  # wrap in an iterator: XIter.__next__ expects iterators
        return result

    #
    def __repr__(self):
        """Return a string representation"""
        return "XIter({0!r})".format(self.iterators)
    #
    def __bool__(self):
        """Test for a non-zero length of the iterator"""
        if len(self.iterators) > 1:
            return True
        try:
            self.peek()
        except StopIteration:
            return False
        return True
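
Usage

A brief usage sketch (these doctest lines are additions, not part of the original test suite):

>>> it = iterqueue.XIter(range(3), "ab")
>>> next(it)
0
>>> it.appendleft("zero")
>>> [i for i in it]
['zero', 1, 2, 'a', 'b']
>>> bool(it)
False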

Some optimisation could be done by adapting a round-robin example posted by R. Hettinger on 2004-04-30 15:58 in comp.lang.python:

##
# For the record, here a simple and efficient roundrobin task
# server based on collections.deque:
#
# def roundrobin(*iterables):
#     pending = deque( iter(i).__next__ for i in iterables)
#     gettask, scheduletask = pending.popleft, pending.append
#     while pending:
#         task = gettask()
#         try:
#             yield task()
#         except StopIteration:
#             continue
#         scheduletask(task)
#
# for value in roundrobin('abc', 'd', 'efgh'):
#     print( value )

Run the doctests if the module is executed as a script (the test() function is also picked up by nosetests):

def test():
    import doctest
    print( "doctest" )
    doctest.testmod( verbose=True )

if __name__ == "__main__":
    test()