Problems with loops and closures referencing the loop variable in Python and Ruby (a hidden bug?)

  • 2020-04-02 13:40:59
  • OfStack

Although I encountered this problem in Python, it is easier to explain in Ruby. In Ruby, there are many ways to traverse an array. The two most common are for and each:


arr = ['a', 'b', 'c']

arr.each { |e|
  puts e
}

for e in arr
  puts e
end

I usually prefer the latter because it looks nicer. I also assumed it should be slightly faster, since each invokes a block (essentially a lambda) for every element, and setting up the context for each call has a cost that, while usually not noticeable, is real, especially in dynamic languages without JIT inlining. (In practice Ruby's for is itself implemented on top of each, so any difference is small.) This time, though, the problem isn't performance. It has to do with the fact that each creates a new scope for every element and for does not.

Take a look at the following code:


arr = ['a', 'b', 'c']
h1 = Hash.new
h2 = Hash.new

arr.each { |e|
  h1[e] = lambda { e+'!'}
}

for e in arr
  h2[e] = lambda { e+'!' }
end

h1['a'].call # => ?
h2['a'].call # => ?

What does each call return? Take a guess. The answers are 'a!' and 'c!' respectively. The second is 'c!' because for does not create a new scope at every step of the loop, so the three lambda closures all refer to the same variable, which was last assigned 'c'.
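The same sharing can be reproduced in Python, whose for loop also rebinds a single variable instead of creating a fresh one per iteration. The following is only a minimal sketch; the names shared, fresh, and make_suffixer are made up for illustration and do not come from the original program.

```python
arr = ['a', 'b', 'c']

# Every lambda closes over the same variable `e`, which the loop keeps rebinding.
shared = {}
for e in arr:
    shared[e] = lambda: e + '!'


def make_suffixer(value):
    # Each call gets its own `value`, so each closure sees a different variable.
    return lambda: value + '!'


fresh = {}
for e in arr:
    fresh[e] = make_suffixer(e)

print(shared['a']())  # c!
print(fresh['a']())   # a!
```

Wrapping the body in a function call plays roughly the role that the block passed to each plays in Ruby: every call brings its own variable with it.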

The problem actually came up in a small Python program of mine, in code that looked something like this:


for prop in public_props:
    setattr(proxy, 'get_%s'%prop, lambda: self.get_prop(prop))

proxy is a proxy object that I hand out, exposing some public properties of self. Since I want to restrict access to non-public properties, I don't want to store any reference to self in the proxy. Otherwise, because Python has no access restrictions, the private state could easily be reached with something like proxy._orig_self.some_private_prop. So I ended up with the code above.

Unfortunately, as I said, for does not create a separate scope for each iteration, so all the closures refer to the same variable, and every getter returns the value of the last property. If I had seen such a weird bug in C/C++, I would definitely have suspected a memory or pointer problem. Here it took me a long time before it finally clicked! Python has nothing as convenient as Ruby's each, and lambda alone does not help either, so in the end I solved it by defining a local function:


def proxy_prop(name):
    # name is a fresh local on every call, so each lambda captures its own value
    setattr(proxy, 'get_%s' % name, lambda: self.get_prop(name))

for prop in public_props:
    proxy_prop(prop)
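To make the fix easier to try out, here is a self-contained sketch of the local helper function approach, next to a common alternative that freezes the value with a default argument. The Proxy and Account classes, their property names, and the method names make_proxy and make_proxy_with_default are all hypothetical stand-ins; only get_prop, public_props, and proxy_prop mirror the snippet above.

```python
class Proxy:
    """Hypothetical empty holder for the exported getters."""


class Account:
    """Hypothetical owner object; only `public_props` should be exposed."""

    public_props = ['name', 'email']

    def __init__(self):
        self._props = {
            'name': 'alice',
            'email': 'alice@example.com',
            'password': 'secret',
        }

    def get_prop(self, name):
        return self._props[name]

    def make_proxy(self):
        proxy = Proxy()

        def proxy_prop(name):
            # `name` is a fresh local for every call, so each lambda keeps its own.
            setattr(proxy, 'get_%s' % name, lambda: self.get_prop(name))

        for prop in self.public_props:
            proxy_prop(prop)
        return proxy

    def make_proxy_with_default(self):
        proxy = Proxy()
        for prop in self.public_props:
            # A default argument is evaluated when the lambda is created,
            # which freezes the current value of `prop`.
            setattr(proxy, 'get_%s' % prop,
                    lambda prop=prop: self.get_prop(prop))
        return proxy


account = Account()
proxy = account.make_proxy()
print(proxy.get_name())   # alice
print(proxy.get_email())  # alice@example.com
```

The default argument version is shorter, but it lets a caller override prop by passing an argument, so the helper function is arguably the safer choice for a restricted proxy.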

Last but not least, in the Ruby example, if you reverse the order of each and for, you get a different result:

arr = ['a', 'b', 'c']
h1 = Hash.new
h2 = Hash.new

for e in arr
  h2[e] = lambda { e+'!' }
end

arr.each { |e|
  h1[e] = lambda { e+'!'}
}

h1['a'].call # => 'c!'
h2['a'].call # => 'c!'

Now both of them are 'c!'. This is because in Ruby 1.8 a block parameter is not an ordinary function parameter: it can be assigned to almost anything, from an existing local variable to a global variable. Since the preceding for statement has already created e as a local variable in the current scope, each simply assigns to that local variable on every iteration, so all the closures refer to the same thing again, which makes for a well-hidden bug!

Thankfully, this "feature" of blocks was removed in Ruby 1.9, where block parameters are always local to the block, so the problem no longer occurs. Hopefully 1.9 will catch on soon!
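Python has no block parameters, so it cannot reproduce the Ruby 1.8 behaviour exactly, but a loosely analogous trap exists: two loops in the same scope that reuse a variable name share one variable. A rough sketch, with build and callbacks as purely illustrative names:

```python
def build():
    callbacks = {}
    for e in ['a', 'b', 'c']:
        callbacks[e] = lambda: e + '!'
    # A later loop that reuses the name `e` rebinds the very same variable,
    # so the closures created above now see its new value.
    for e in ['x', 'y']:
        pass
    return callbacks


callbacks = build()
print(callbacks['a']())  # y!
```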

