Simplify the introduction of complex programming models with Python's SimPy library

2020-05-09 18:50:00
OfStack

I learned about the SimPy package from one of its founders, Klaus Miller, when I met him. Dr. Miller has read several lovely Python columns that suggest the use of Python 2.2 + generators to implement semi-collaborative routines and "lightweight" threads. In particular (to my delight), he found these techniques useful when implementing Simula-67 style simulations with Python.

It turned out that Tony Vignaux and Chang Chui had previously created another Python library that was conceptually closer to Simscript, and that used standard threading techniques rather than my semi-co-routine techniques. At the time of one study, the team decided that a generating-based style was much more effective, and recently launched a project on SourceForge to use GPL, called SimPy (see resources for a link to the SimPy home page), currently in beta beta. Professor Vignaux wants to use the SimPy package in his future university teaching at Victoria university of Wellington (University of Victoria); I believe that the library is also well suited to all kinds of practical problems.

I admit that prior to the recent communications and research, I had no basic knowledge of simulation in the programming world. I suspect that most readers of this column know as little as I do. Although one might find this style of programming somewhat novel, simulations are useful in understanding the behavior of real-world systems with limited resources. Whether you are interested in bandwidth-limited networks, automotive traffic behavior, market and commercial optimization, biological/evolutionary interactions, or other "random" systems, SimPy provides a simple Python tool for such modeling.
Random definition

Similar to "connected," it's one of the most appropriate words to describe their assignments - and it couldn't be more appropriate:

Random (stochastic), from the Greek stokhastikos (adjective)
1) speculative, related to or characteristic of speculation; Good guess.
2) in statistics: 1 random variable or multiple random variables are involved or included, or chance or probability is involved.

Source: Dictionary com

In this column, I'll use a fairly simple example of a multiaisle payment area in a grocery store. Using the simulation demonstrated, we can ask questions based on the economic and wait time implications of various changes made to scanner technology, shopper habits, staffing requirements, and so on. The advantage of this modeling is that it allows you to plan ahead when you have a clear idea of what the changes will mean. Obviously, most readers will not specialize in running a grocery store, but these techniques can be applied to a wide variety of systems.
The concept of simulation

The SimPy library provides only three abstract/parent classes, and they correspond to the three basic concepts of the simulation. There are many other general functions and constants used to control the running of the simulation, but important concepts are combined with these classes in 1.

The core concept in simulation is the process. A process is just an object that completes certain tasks and then sometimes waits for a while before it is ready to complete the next task. In SimPy, you can also "passivate" a process, which means that after one process has completed one task, it will only do it if other processes ask it to complete other tasks. It is often useful to think of a process as an attempt to accomplish a goal. When you write a process, you usually write it as a loop in which you can perform multiple operations. Between each operation, an Python "yield" statement can be inserted, which allows the mock scheduler to perform the operation of each waiting process before returning control.

Much of what a process does depends on the use of resources. Resources are limited only in terms of availability. In biological models, the resource might be the food supply; In the network model, resources can be routers or limited bandwidth channels. In our market simulation, the resource is the payment channel. The only task a resource performs is to limit its use to one particular process at any given time. Under the SimPy programming model, the process alone decides how long it wants to keep the resource, and the resource itself is passive. In a real system, the SimPy model may or may not fit the conceptual scheme. It is easy to imagine that the resource will essentially limit its utilization (for example, if the server computer does not get a satisfactory response within the required time frame, it will break the connection). But as a programming matter, it doesn't really matter whether a process or resource is an "active" party (just make sure you understand your intent).

The last class, SimPy, is the monitor. The monitor isn't really important, it's just convenient. All the monitor does is record the events reported to it and keep statistics about them (mean, count, variance, and so on). The Monitor class provided by the library is a useful tool for logging simulation measures, but you can also log events using any other technique you want. In fact, my example subclassed Monitor to provide some (slightly) enhanced capabilities.

Set up shop: program the simulation

For most of the articles I've written, I'll give you a sample application right away, but in this case, I think it's more useful to take you through each step of the grocery store application. You can cut and paste each part in 1 if you like; The creators of SimPy will include my example in a future release.

The first step in the SimPy simulation is a few general import (import) statements:
Listing 1. Import the SimPy library


#!/usr/bin/env python
from __future__ import generators
from SimPy import Simulation
from SimPy.Simulation import hold, request, release, now
from SimPy.Monitor import Monitor
import random
from math import sqrt

Some of the examples that come with SimPy use the import * style, but I prefer to make the namespace I populate clearer. For Python 2.2 (the minimum version required for SimPy), you will need to import the generator feature, as indicated. For versions of Python 2.3 and beyond, this is not required.

For my application, I've defined several runtime constants that describe several scenarios that I'm interested in during a particular simulation run. When I change the schema, I have to edit these constants in the main script. If the application had been more substantial, I might have configured these parameters with command-line options, environment variables, or configuration files. But for now, it's enough:
Listing 2. Configuring the simulation parameters


AISLES = 5     # Number of open aisles
ITEMTIME = 0.1   # Time to ring up one item
AVGITEMS = 20   # Average number of items purchased
CLOSING = 60*12  # Minutes from store open to store close
AVGCUST = 1500   # Average number of daily customers
RUNS = 10     # Number of times to run the simulation

The primary task for our simulation is to define one or more processes. For the simulated grocery store, the process we are interested in is the customer paying at the aisle.
Listing 3. Defining customer actions


class Customer(Simulation.Process):
  def __init__(self):
    Simulation.Process.__init__(self)
    # Randomly pick how many items this customer is buying
    self.items = 1 + int(random.expovariate(1.0/AVGITEMS))
  def checkout(self):
    start = now()      # Customer decides to check out
    yield request, self, checkout_aisle
    at_checkout = now()   # Customer gets to front of line
    waittime.tally(at_checkout-start)
    yield hold, self, self.items*ITEMTIME
    leaving = now()     # Customer completes purchase
    checkouttime.tally(leaving-at_checkout)
    yield release, self, checkout_aisle

Each customer has decided to purchase a fixed quantity of goods. (our simulation does not involve selecting items from the grocery aisle; Customers just push their trolleys to the checkout.) I'm not sure that the exponential variable distribution here is really an accurate model. At the low end I felt right, but I felt a little misinformed about the maximum limit of how much actual shoppers actually purchased. In any case, you can see how easy it is to tune our simulation if you can use better model information.

What the customer is doing is what we're looking at. The customer's "execution method" is.checkout (). This process method is usually named.run () or.execute (), but in my example,.checkout () seems to be the most descriptive. You can call it whatever you want. The actual action taken by the Customer object is simply to check the simulation time at a few points and record the duration in the waittime and checkouttime monitors. But between these operations is the crucial yield statement. In case 1, the customer requests the resource (payment channel). Only when the customer has the resources they need can they do anything else. Once at the checkout aisle, the customer is actually paying -- the time spent is proportional to the number of items purchased. Finally, after passing through the checkout counter, the customer releases the resource so that other customers can use it.

The code above defines the operation of the Customer class, but we need to create some actual customer objects before running the simulation. We can generate a customer object for each customer who will be shopping in one day and assign a corresponding payment time to each customer. But a simpler approach is to have the factory object generate the required customer objects "when each customer arrives at the store." In fact, the simulation is not interested in all the customers who will be shopping in one day, but only those who will be competing for the payment channel at the same time. Note: the Customer_Factory class itself is part 1 of the simulation -- it is a process. Although you might associate this customer factory with artificial robot workers (la Fritz Lang Metropolis), you should think of it only as a convenient tool for programming; It does not directly correspond to anything in the modeled domain.
Listing 4. Generating the customer flow


class Customer_Factory(Simulation.Process):
  def run(self):
    while 1:
      c = Customer()
      Simulation.activate(c, c.checkout())
      arrival = random.expovariate(float(AVGCUST)/CLOSING)
      yield hold, self, arrival

As I mentioned earlier, I want to collect some statistics that the current SimPy Monitor class does not address. That is, I am not only interested in the average payment time, but also in the worst-case scenario for a given scenario. So I created an enhanced monitor that collects minimum and maximum counts.
Monitor the simulation with a monitor


class Monitor2(Monitor):
  def __init__(self):
    Monitor.__init__(self)
    self.min, self.max = (int(2**31-1),0)
  def tally(self, x):
    Monitor.tally(self, x)
    self.min = min(self.min, x)
    self.max = max(self.max, x)

The last step in our simulation is of course to run it. In most standard examples, you only run the simulation once. But for my grocery store, I decided to run a loop through several simulations, each corresponding to a given day of business. This seems like a good idea, since some statistics can vary considerably from day to day (since the number of customers who arrive and the number of items purchased use randomly generated values).
Listing 6. Running the simulation every day


for run in range(RUNS):
  waittime = Monitor2()
  checkouttime = Monitor2()
  checkout_aisle = Simulation.Resource(AISLES)
  Simulation.initialize()
  cf = Customer_Factory()
  Simulation.activate(cf, cf.run(), 0.0)
  Simulation.simulate(until=CLOSING)
  #print "Customers:", checkouttime.count()
  print "Waiting time average: %.1f" % waittime.mean(), \
     "(std dev %.1f, maximum %.1f)" % (sqrt(waittime.var()),waittime.max)
  #print "Checkout time average: %1f" % checkouttime.mean(), \
  #   "(standard deviation %.1f)" % sqrt(checkouttime.var())
print 'AISLES:', AISLES, ' ITEM TIME:', ITEMTIME

3 people don't like: 1 some results (and what they mean)

When I first thought about the grocery store model, I thought the simulation would answer several immediate questions. For example, I imagine the owner might choose to buy an improved scanner (ITEMTIME less) or hire more staff (AISLES more). I want to just run the simulation under each scenario (assuming the employee and technology costs are given) and determine which of the above two options is more cost efficient.

It was only when I ran the simulation that I realized something more interesting than I had expected might happen. Looking at all the data collected, I realized I didn't know what to try to optimize. What. For example, which is more important, reducing the average payment time or reducing the worst-case time? What will improve overall customer satisfaction? Also, how do you compare the time it takes a customer to wait before paying and the time it takes to scan the purchase? In my personal experience, I get impatient in the waiting queue, but I don't have much trouble scanning my items (even if it takes a while).

Of course, I don't run a grocery store, so I don't know the answers to all of these questions. But the simulation did allow me to determine exactly what the compromise was; And it's simple enough to be adapted for many behaviors (including those that aren't explicitly parameterized yet -- for example, "will customers really keep coming in all day? ).

I only need to show the last example to illustrate the value of this model. I wrote above that the behavior of complex systems is difficult to conceptualize. I think the example here proves this fact. What do you think will happen when the number of channels available is reduced from 6 to 5 (other parameters unchanged)? Initially, I thought I would increase the payment time slightly in the worst-case scenario. But that's not the case:
Listing 7. Two samples run before and after the number of channels changes


% python Market.py
Waiting time average: 0.5 (std dev 0.9, maximum 4.5)
Waiting time average: 0.3 (std dev 0.6, maximum 3.7)
Waiting time average: 0.4 (std dev 0.8, maximum 5.6)
Waiting time average: 0.4 (std dev 0.8, maximum 5.2)
Waiting time average: 0.4 (std dev 0.8, maximum 5.8)
Waiting time average: 0.3 (std dev 0.6, maximum 5.2)
Waiting time average: 0.5 (std dev 1.1, maximum 5.2)
Waiting time average: 0.5 (std dev 1.0, maximum 5.4)
AISLES: 6  ITEM TIME: 0.1
% python Market.py
Waiting time average: 2.1 (std dev 2.3, maximum 9.5)
Waiting time average: 1.8 (std dev 2.3, maximum 10.9)
Waiting time average: 1.3 (std dev 1.7, maximum 7.3)
Waiting time average: 1.7 (std dev 2.1, maximum 9.5)
Waiting time average: 4.2 (std dev 5.6, maximum 21.3)
Waiting time average: 1.6 (std dev 2.6, maximum 12.0)
Waiting time average: 1.3 (std dev 1.6, maximum 7.5)
Waiting time average: 1.5 (std dev 2.1, maximum 11.2)
AISLES: 5  ITEM TIME: 0.1

Instead of increasing the average wait time by 1/5 or something like that, the reduction of one payment channel increases it by about 4 times. Moreover, the waiting time for the most unlucky customers (during these particular runs) increased from 6 minutes to 21 minutes. If I were a manager, I think it is extremely important for customer satisfaction to understand this extreme situation. Who would have known that?