Simply understand the generator based state machine in Python

  • 2020-05-07 20:01:16
  • OfStack

The   simple generator has many advantages. In addition to being able to express the flow of type 1 problems in a more natural way, the generator greatly improves many of the inefficiencies. In Python, function calls are expensive; Among other things, it will take a while to work out the function parameter list (analyzing, among other things, location parameters and default parameters). There are also 1 set of steps to initialize the framework object (more than 100 lines of C language, according to comp.lang.python; I haven't checked the Python source code myself. In contrast, restoring a generator is relatively painless; The arguments have been parsed, and the framework object is "idling around" waiting for recovery (with little extra initialization required). Of course, if speed is of the essence, you should not use dynamic languages where bytecode has been compiled; But even when speed isn't a major concern, it's better to be faster than slower.
recall state machine

In another article before "cute Python," I introduced the StateMachine class, which allows users to add as many state handlers as a given machine requires. In the model, one or more states are defined as final (end state), and only one state is defined as initial (start state) (which is configured by calling class methods). Each handler has some kind of required structure; The handler will perform a series of 1 operations, and after a while, it will return to the loop in the StateMachine.run () method with a flag that indicates the desired next state. Also, the cargo variable allows one state to pass one bit of information to the next state.

The typical use of the StateMachine class I described is to use the input in a stateful manner. For example, one of my text-processing tools (Txt2Html) reads several lines from a file; Depending on the category to which each line belongs, it needs to be processed in a special way. However, you often need to look at the context provided in the previous lines to determine which category the current row belongs to (and how it should be handled). An implementation of this procedure built on the StateMachine class can define an A handler that reads a few lines and then processes them in a manner similar to A. Before long, a condition is met so that the next batch of lines should be processed by the B processor. A passes control back to the.run () loop, indicating to switch to the B status and any additional lines that A cannot handle correctly and that B should handle before reading the additional lines. Finally, a handler passes its control to a state specified as an end state, and the processing stops (halt).

For the specific code example in the previous part, I used a simplified application. Instead of dealing with multiple lines of content, I deal with a digital stream generated by an iterative function. Each state handler prints only those Numbers that are in the expected range of Numbers (and a few messages about the valid state). When one of the Numbers in the stream moves to a different range, a different handler takes over. For this part, we'll look at another way to implement the same digital stream processing with a generator (with some extra tricks and functionality). However, an example of a more complex generator might handle the input stream more like the one mentioned in the previous paragraph. Let's look at the truncated version of the previous state machine:
Listing 1. statemachine_test. py


from statemachine import StateMachine
def ones_counter(val):
  print "ONES State:  ",
  while 1:
    if val <= 0 or val >= 30:
      newState = "Out_of_Range" ; break
    elif 20 <= val < 30:
      newState = "TWENTIES";   break
    elif 10 <= val < 20:
      newState = "TENS";     break
    else:
      print " @ %2.1f+" % val,
    val = math_func(val)
  print " >>"
  return (newState, val)
# ... other handlers ...
def math_func(n):
  from math import sin
  return abs(sin(n))*31
if __name__== "__main__":
  m = StateMachine()
  m.add_state("ONES", ones_counter)
  m.add_state("TENS", tens_counter)
  m.add_state("TWENTIES", twenties_counter)
  m.add_state("OUT_OF_RANGE", None, end_state=1)
  m.set_start("ONES")
  m.run(1)

Readers who are next interested in the imported StateMachine class and how its methods work should take a look at the previous article.


USES the generator

The full version of the generator-based state machine is slightly longer than the code sample I'd like to cover in this column. However, the following code sample is a complete application and does not need to import a separate statemachine module to provide support. In general, this version is a bit shorter than the class-based version (we'll see that it's a bit special and quite powerful).
Listing 2. stategen_test. py


from __future__ import generators
import sys
def math_gen(n):  # Iterative function becomes a generator
  from math import sin
  while 1:
    yield n
    n = abs(sin(n))*31
# Jump targets not state-sensitive, only to simplify example
def jump_to(val):
  if  0 <= val < 10: return 'ONES'
  elif 10 <= val < 20: return 'TENS'
  elif 20 <= val < 30: return 'TWENTIES'
  else:        return 'OUT_OF_RANGE'
def get_ones(iter):
  global cargo
  while 1:
    print "\nONES State:   ",
    while jump_to(cargo)=='ONES':
      print "@ %2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def get_tens(iter):
  global cargo
  while 1:
    print "\nTENS State:   ",
    while jump_to(cargo)=='TENS':
      print "#%2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def get_twenties(iter):
  global cargo
  while 1:
    print "\nTWENTIES State: ",
    while jump_to(cargo)=='TWENTIES':
      print "*%2.1f " % cargo,
      cargo = iter.next()
    yield (jump_to(cargo), cargo)
def exit(iter):
  jump = raw_input('\n\n[co-routine for jump?] ').upper()
  print "...Jumping into middle of", jump
  yield (jump, iter.next())
  print "\nExiting from exit()..."
  sys.exit()
def scheduler(gendct, start):
  global cargo
  coroutine = start
  while 1:
    (coroutine, cargo) = gendct[coroutine].next()
if __name__ == "__main__":
  num_stream = math_gen(1)
  cargo = num_stream.next()
  gendct = {'ONES'    : get_ones(num_stream),
       'TENS'    : get_tens(num_stream),
       'TWENTIES'  : get_twenties(num_stream),
       'OUT_OF_RANGE': exit(num_stream)     }
  scheduler(gendct, jump_to(cargo))

There is much more to learn about state machines based on generators. The first point is largely superficial. We've arranged that stategen_test.py can only use functions, not classes (at least in my sense, generators have a more functional programming feel than object oriented programming (OOP)). However, you can easily wrap the same generic model into one or more classes if you wish.

The main function in our sample is scheduler(), which is completely 1 general (but much shorter than StateMachine in the previous pattern). The function scheduler() requires a generator - iterator object dictionary (" instantiated "generator) as an argument. The string name for each generator can be any name you want, and the literal name of the producer factory function is an obvious choice, but I use uppercase keyword names in the example. The scheduler() function also takes "initial state" as an argument, but you can probably automatically select a default if you want.

Each "scheduled" generator follows a few simple conventions. Each generator runs for a period of time, and then produces a pair of values with the expected jump and some "cargo" just like in model 1 above. No generator is explicitly marked as an "end state". Instead, we allow each generator to choose to produce an error to end scheduler(). In special cases, if the generator "leaves" the terminal state or reaches an return state, the generator will produce an StopIteration exception. You can catch this exception (or a different exception) if you want. In our example, we used sys.exit () to terminate the application, which we encountered in the exit() generator, sys.exit ().

Notice two minor problems with the code. The above sample USES a cleaner loop generator - iterator instead of using an iteration function to generate our sequence of Numbers. The generator only emits one (infinite/indeterminate) stream of Numbers with each subsequent call, rather than continuously returning the "last value". This is a small but useful sample generator. Furthermore, the "state transition" is isolated in a separate function. In a real program, a state transition jump is more contextual and may be determined in an actual generator. This approach simplifies the sample. It might not be very useful, but let's just say that we can simplify one step further by generating generator functions from a function factory. But in general, each generator is not similar enough to the others to make this approach practical.

coroutine and semi-coroutine

Careful readers may have noticed that we've actually stumbled into a much more useful flow control structure than initially suggested. In the sample code, there is more than just a state machine. In fact, the above pattern is a very effective common system for coroutines. Most readers will probably need some background here.

A coroutine is a collection of program functions that allow arbitrary branching into other control contexts and arbitrary recovery flows from branch points. The subroutine that we are familiar with in most programming languages is a very limited branching case of a common coroutine. The subroutine enters only from the top one fixed point and exits only once (it cannot be restored). The subroutine also always sends the stream back to its caller. In essence, each coroutine represents a callable continuation, although adding a new word does not necessarily clarify the meaning of the word to those who do not know it. The "Cocall Sequence Between Two Processes" illustration in The Art of Assembly of Randall HydeAn goes a long way in explaining the coroutines. See resources for a link to this diagram. Resources also has a link to Hyde's comprehensive discussion, which is pretty good.

Regardless of the negative effects, you'll notice that the infamous goto statement in many languages also allows arbitrary branching, but in a less structured context, it can lead to "macaroni code."

The generator of Python 2.2+ takes a big step towards a coroutine. This 1 big step means that the generator and function/subroutine streams are recoverable and can be evaluated after multiple calls. However, the Python generator is nothing more than a "semi-coroutine" as described by Donald Knuth. The generator is recoverable and can branch control elsewhere, but it can only branch control back to the caller who directly invokes it. Rather, the generator context (like any context 1) can call other generators or functions on its own or even make recursive calls on its own, but each final return must be passed through a linear hierarchy of return contexts. The Python generator does not take into account the common usage of "producer" and "consumer" (you can continue from the middle of each other at will).

Fortunately, it is fairly easy to simulate a fully equipped coroutine with the Python generator. The easy trick is to class the scheduler() function like generator 10 in the sample code above. In fact, the proposed state machine itself is a much more common pattern of the coroutine framework. Adapting to this pattern overcomes the small flaws that still exist in the Python generator (allowing even careless programmers to get the full power out of macaroni code).


Stategen in operation

The easiest way to know exactly what's going on in stategen_test.py is to run it:
Listing 3. Running STATEGEN (manual jump control)


% python stategen_test.py
ONES State:    @ 1.0
TWENTIES State:  *26.1  *25.3
ONES State:    @ 4.2
TWENTIES State:  *26.4  *29.5  *28.8
TENS State:    #15.2  #16.3  #16.0
ONES State:    @ 9.5  @ 3.8
TENS State:    #18.2  #17.9
TWENTIES State:  *24.4
TENS State:    #19.6
TWENTIES State:  *21.4
TENS State:    #16.1  #12.3
ONES State:    @ 9.2  @ 6.5  @ 5.7
TENS State:    #16.7
TWENTIES State:  *26.4  *30.0
[co-routine for jump?] twenties
 ...Jumping into middle of TWENTIES
TWENTIES State:
TENS State:    #19.9
TWENTIES State:  *26.4  *29.4  *27.5  *22.7
TENS State:    #19.9
TWENTIES State:  *26.2  *26.8
Exiting from exit()...

This output is essentially identical to the output in the previous statemachine_test.py. Each line in the result represents the stream used in a particular handler or generator; The flow context is declared at the beginning of the line. However, each time another coroutine branch goes into the generator, the generator version resumes execution (in a loop), rather than just calling the handler function again. Assuming that all the get_*() coroutines are contained in an infinite loop, this difference is not obvious.

To understand the essential differences in stategen_test.py, see what happens in the exit() generator. The first time the generator-iterator is invoked, a jump target is collected from the user (this is a simple case of event-driven branching decisions that might be exploited in a real-world application). However, when exit() is called again, it displays the exit message in a later stream context of the generator and calls sys.exit (). The user in the interaction sample can simply jump to "out_of_range" and exit without going to another "handler" (but it will perform a recursive jump to the same generator).


conclusion

As I said in the introduction, I expect the state machine version of the coroutine to run much faster than the "class-with-callback-handler" version of the callback handler. Recovery generators-iterators are much more efficient. The specific examples are so simple that they are hardly enough to be judged, but I welcome reader feedback on specific results.

But whatever progress the "co-op pattern" I've described might make in terms of speed will be overshadowed by the amazing universal flow control it achieves. Many readers on the comp.lang.python newsgroup have asked how generic Python's new generator is. I think the usability of the framework I described answered "and you want 1!" For most things related to Python, there are certain things that are usually much easier to program than to understand. Try my pattern; I think you'll find it useful.


Related articles: