Simple understanding of generator based state machines in Python

  • 2020-05-09 18:44:43
  • OfStack

The   simple generator has many advantages. In addition to being able to express a flow of type 1 problems in a more natural way, the generator greatly improves many of the inefficiencies. In Python, function calls are expensive; Among other things, it will take a while to work through the function parameter list (analyzing the location parameters and the default parameters, among other things). There are also some setup steps to take to initialize the framework object (according to Tim Peters on comp.lang.python, there are more than 100 lines of C language programs; I haven't checked the Python source code myself. In contrast, restoring a generator is quite effortless; The parameters have been parsed, and the framework object is "sitting idle" waiting for recovery (with little extra initialization required). Of course, if speed is of the essence, you shouldn't use dynamic languages where bytecode has been compiled; But even when speed isn't a major concern, it's better to be faster than slower.
Recall state machine

In another article before "cute Python," I introduced the StateMachine class, which allows users to add as many state handlers as a given machine needs. In the model, one or more states are defined as the final state (end state) and only one state is defined as the initial state (start state) (which is configured by calling class methods). Each handler has some kind of required structure; The handler will perform a series of 1 operations, and after a while, it will return to the loop in the StateMachine.run () method with a flag that indicates the desired next state. Similarly, the cargo variable allows one state to pass some (unprocessed) information to the next state.

The typical use of the StateMachine class I described is to use input in a stateful manner. For example, one of the text processing tools I used (Txt2Html) reads several lines from a file; Each row needs to be treated in a special way, depending on the category it belongs to. However, you often need to look at the context provided by the previous lines to determine which category the current row belongs to (and what to do with it). An implementation of this procedure built on the StateMachine class can define an A handler that reads a few lines and then processes them in a manner similar to A. Before long, a condition is met so that the next batch of lines should be processed by the B handler. A passes the control back to the.run () loop, indicating a switch to the B state, as well as any additional lines that A cannot handle correctly and that B should process before reading the additional lines. Finally, a handler passes its control to a state designated as an end state and the processing stops (halt).

For the specific code example in the previous section, I used a simplified application. Instead of dealing with multiple lines of content, I deal with a digital stream generated by an iterative function. Each state handler prints only those Numbers that are in the range of the desired number (and a few messages about the valid state). When one of the Numbers in the stream moves to a different range, a different handler takes over the "processing". For this part, we'll look at another way to implement the same digital stream processing with a generator (with some extra tricks and features). However, an example of a more complex generator might handle the input stream more like the one mentioned in the previous paragraph. Let's take a look at the truncated version of the previous state machine:
Listing 1. statemachine_test. py


from statemachine import StateMachine
def ones_counter(val):
  print "ONES State:  ",
  while 1:
    if val <= 0 or val >= 30:
      newState = "Out_of_Range" ; break
    elif 20 <= val < 30:
      newState = "TWENTIES";   break
    elif 10 <= val < 20:
      newState = "TENS";     break
    else:
      print " @ %2.1f+" % val,
    val = math_func(val)
  print " >>"
  return (newState, val)
# ... other handlers ...
def math_func(n):
  from math import sin
  return abs(sin(n))*31
if __name__== "__main__":
  m = StateMachine()
  m.add_state("ONES", ones_counter)
  m.add_state("TENS", tens_counter)
  m.add_state("TWENTIES", twenties_counter)
  m.add_state("OUT_OF_RANGE", None, end_state=1)
  m.set_start("ONES")
  m.run(1)

Readers who are next interested in the imported StateMachine class and how its methods work should take a look at the previous article.


Use generator

The full version of the generator-based state machine is slightly longer than the code sample I'd like to cover in this column. However, the code sample below is a complete application and does not need to import a separate statemachine module to provide support. Overall, this version is a bit shorter than the class-based version (we'll see that it's a bit special and quite powerful).
Listing 2. stategen_test. py


from __future__ import generators
import sys
def math_gen(n):  # Iterative function becomes a generator
  from math import sin
  while 1:
    yield n
    n = abs(sin(n))*31
# Jump targets not state-sensitive, only to simplify example
def jump_to(val):
  if  0 <= val < 10: return 'ONES'
  elif 10 <= val < 20: return 'TENS'
  elif 20 <= val < 30: return 'TWENTIES'
  else:        return 'OUT_OF_RANGE'
def get_ones(iter):
  global cargo
  while 1:
    print "\nONES State:   ",
    while jump_to(cargo)=='ONES':
      print "@ %2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def get_tens(iter):
  global cargo
  while 1:
    print "\nTENS State:   ",
    while jump_to(cargo)=='TENS':
      print "#%2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def get_twenties(iter):
  global cargo
  while 1:
    print "\nTWENTIES State: ",
    while jump_to(cargo)=='TWENTIES':
      print "*%2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def exit(iter):
  jump = raw_input('\n\n[co-routine for jump?] ').upper()
  print "...Jumping into middle of", jump
  yield (jump, iter.next())
  print "\nExiting from exit()..."
  sys.exit()
def scheduler(gendct, start):
  global cargo
  coroutine = start
  while 1:
    (coroutine, cargo) = gendct[coroutine].next()
if __name__ == "__main__":
  num_stream = math_gen(1)
  cargo = num_stream.next()
  gendct = {'ONES'    : get_ones(num_stream),
       'TENS'    : get_tens(num_stream),
       'TWENTIES'  : get_twenties(num_stream),
       'OUT_OF_RANGE': exit(num_stream)     }
  scheduler(gendct, jump_to(cargo))

There is much to learn about state machines based on generators. Point 1 is largely superficial. We've arranged that stategen_test.py can only use functions, not classes (at least in my sense, generators have a more functional programming feel than object-oriented programming (OOP)). However, you can easily wrap the same generic model into one or more classes if you wish.

The main function in our sample is scheduler(), which is completely 1-like (but much shorter than StateMachine in the previous model). The function scheduler() requires a generator - iterator object dictionary (" instantiated "generator) as a parameter. The string name for each generator can be any name you want, and the literal name of the generator factory function is an obvious choice, but I use uppercase keyword names in my example. The scheduler() function also takes "initial state" as an argument, but you can probably automatically select a default if you want.

Each "scheduled" generator follows a few simple conventions. Each generator runs for a period of time, and then produces a pair of values containing the expected jump and some "cargo" just as in the previous model 1. No generator is explicitly marked as an "end state." Instead, we allow each generator to choose to generate an error to end scheduler(). In special cases, if the generator "leaves" the terminal state or reaches an return state, the generator will generate an StopIteration exception. You can catch this exception (or a different exception) if you want. In our example, we use sys.exit () to terminate the application, which we will encounter in the exit() generator, sys.exit ().

Notice two small things about the code. The sample above USES a cleaner loop generator - iterator, rather than using an iteration function to generate our sequence of Numbers. The generator emits only one (infinite/indeterminate) stream of Numbers with each subsequent call, rather than continuously returning the "last value". This is a small but useful sample generator. Furthermore, the "state transition" is isolated in a separate function. In a real program, a state transition jump is more context-dependent and may be determined in a real generator. This approach simplifies the sample. It may not be useful, but let's just say that we can simplify this one step further by using a function factory to generate generator functions. But in the 1 case, each generator will not be similar enough to the others to make this approach practical.

Coroutines and semi-coroutines

Careful readers may have noticed that we actually stumbled into a much more useful flow control structure than initially indicated. In the sample code, there is more than just a state machine. In fact, the pattern above is a very efficient coroutine common system. Most readers will probably need some background here.

A coroutine is a collection of program functions that allow arbitrary branching into other control contexts and arbitrary recovery flows from branch points. The subroutine we are familiar with in most programming languages is a very limited branching case of a generic coroutine. The subroutine enters only from the top one fixed point and exits only once (it cannot be restored). The subroutine also always sends the stream back to its caller. In essence, each coroutine represents a callable continuation, although adding a new word does not necessarily clarify the meaning of the word to people who do not know it. The "Cocall Sequence Between Two Processes" illustration in Randall HydeAn The Art of Assembly goes a long way in explaining the coroutine. See resources for a link to this diagram. See resources for a link to Hyde's comprehensive discussion, which is quite good.

Regardless of the negative effects, you'll notice that the infamous goto statement in many languages also allows arbitrary branching, but in a less structured context, it can lead to "macaroni code."

The generator of Python 2.2+ takes a big step towards a coroutine. This 1 big step means that the generator and function/subroutine streams are recoverable and can be evaluated after multiple calls. However, the Python generator is nothing more than the "semi-coroutine" described by Donald Knuth. The generator is recoverable and can branch control elsewhere, but it can only branch control back to the caller who called it directly. Specifically, the generator context (like any context 1) can call other generators or functions on its own or even make its own recursive calls, but each final return must be passed through a linear hierarchy of the return context. The Python generator does not consider the common use of "producer" and "consumer" coroutines (feel free to continue from the middle of each other).

Fortunately, it is fairly easy to simulate a fully equipped coroutine with the Python generator. The easy trick is to use the scheduler() function similar to generator 10 in the sample code above. In fact, our proposed state machine itself is a much more common pattern of a coroutine framework. Adapting to this pattern can overcome the small flaws that still exist in the Python generator (allowing even a careless programmer to get the full power of macaroni code).


Stategen in operation

The easiest way to know exactly what's going on in stategen_test.py is to run it:
Listing 3. Running STATEGEN (manual jump control)


% python stategen_test.py
ONES State:    @ 1.0
TWENTIES State:  *26.1  *25.3
ONES State:    @ 4.2
TWENTIES State:  *26.4  *29.5  *28.8
TENS State:    #15.2  #16.3  #16.0
ONES State:    @ 9.5  @ 3.8
TENS State:    #18.2  #17.9
TWENTIES State:  *24.4
TENS State:    #19.6
TWENTIES State:  *21.4
TENS State:    #16.1  #12.3
ONES State:    @ 9.2  @ 6.5  @ 5.7
TENS State:    #16.7
TWENTIES State:  *26.4  *30.0
[co-routine for jump?] twenties
 ...Jumping into middle of TWENTIES
TWENTIES State:
TENS State:    #19.9
TWENTIES State:  *26.4  *29.4  *27.5  *22.7
TENS State:    #19.9
TWENTIES State:  *26.2  *26.8
Exiting from exit()...

This output is essentially the same as the output in the previous statemachine_test.py. Each line in the result represents the flow used in a particular handler or generator, respectively; The flow context is declared at the beginning of the line. However, whenever another coroutine branch goes into the generator, the generator version resumes execution (in a loop), rather than just calling the handler function again. Assuming that all the get_*() coroutines are contained in an infinite loop, this difference is less obvious.

To understand the essential differences in stategen_test.py, see what happens in the exit() generator. The first time a generator-iterator is called, a jump target is collected from the user (this is a simple case of event-driven branching decisions that might be exploited in a real-world application). However, when exit() is called again, it displays the exit message in a later stream context of the generator and calls sys.exit (). The user in the interaction sample can jump directly to "out_of_range" and exit without going to another "handler" (but it will perform a recursive jump into the same generator).


conclusion

As I said in the introduction, I expect the state machine version of the coroutine to run much faster than the "class-with-callback-handler" version. Recovery generators-iterators are much more efficient. The specific example is so simple that it is hardly a criterion, but I welcome reader feedback on the specific results.

But whatever progress the "coroutine pattern" I've described might make in terms of speed will be overshadowed by the amazing universal flow control it implements. Many readers on the comp.lang.python newsgroup have asked how generic Python's new generator is. I think the usability of the framework I described was answered by the following: "and what you want!" For most things related to Python, there are certain things that are usually much easier to program than to understand. Try my model; I think you'll find it useful.


Related articles: