C#: Beware of Variable Sharing Caused by Anonymous Methods (Case Analysis)

2021-08-28 20:54:06
OfStack

This article uses a real-world case to illustrate why you should be wary of the variable sharing caused by anonymous methods in C#. It is shared here for your reference:

Anonymous methods

Anonymous methods are an advanced feature introduced in C# 2.0 (.NET 2.0). "Anonymous" means that the implementation can be written inline inside a method to form a delegate object, without an explicit method name. For example:


static void Test()
{
  Action<string> action = delegate(string value)
  {
    Console.WriteLine(value);
  };
  action("Hello World");
}

But anonymity is not the key point. The most powerful feature of an anonymous method is that it forms a closure: it can be passed to another method as an argument and still access the local variables of the method in which it was defined, as well as other members of the current class. For example:


class TestClass
{
  private void Print(string message)
  {
    Console.WriteLine(message);
  }
  public void Test()
  {
    string[] messages = new string[] { "Hello", "World" };
    int index = 0;
    Action<string> action = (m) =>
    {
      this.Print((index++) + ". " + m);
    };
    Array.ForEach(messages, action);
    Console.WriteLine("index = " + index);
  }
}

As shown above, in the Test method of TestClass, the action delegate calls the private method Print, which is also defined in TestClass, and both reads and writes the local variable index of the Test method. With the lambda expressions added in C# 3.0, anonymous methods became even more widely used. Used improperly, however, they can easily cause problems that are hard to track down.

Problem case

A friend of mine recently wrote a simple data import program. Its main job is to read data from text files, analyze and reorganize it, and then write it to a database. The logic is roughly as follows:


static void Process()
{
  List<Item> batchItems = new List<Item>();
  foreach (var item in ...)
  {
    batchItems.Add(item);
    if (batchItems.Count > 1000)
    {
      DataContext db = new DataContext();
      db.Items.InsertAllOnSubmit(batchItems);
      db.SubmitChanges();
      batchItems = new List<Item>();
    }
  }
}

Each item read from the data source is added to the batchItems list, and the batch is committed once batchItems holds more than 1000 items. The code works correctly, but unfortunately most of the time is spent in the database submission: data is fetched and processed quickly, while each submission takes a long time. Since data submission and data processing do not contend for any resources, why not move the submission to another thread? So the code was rewritten to use ThreadPool:


static void Process()
{ 
  List<Item> batchItems = new List<Item>();
  foreach (var item in ...)
  {
    batchItems.Add(item);
    if (batchItems.Count > 1000)
    {
      ThreadPool.QueueUserWorkItem((o) =>
      {
        DataContext db = new DataContext();
        db.Items.InsertAllOnSubmit(batchItems);
        db.SubmitChanges();
      });
      batchItems = new List<Item>();
    }
  }
}

The data submission is now handed over to the ThreadPool, which runs it when a thread in the pool becomes available. Because the submission no longer blocks data processing, the intention was that data would be processed continuously and all of it would eventually be submitted to the database. It is a good idea, but at run time the code that used to work (without multithreading) now throws exceptions "inexplicably". Even more bizarrely, data goes missing: 1 million records were processed and "submitted", yet part of them never reached the database. My friend went over the code again and again and remained puzzled.

Do you see the cause of the problem?

Analyzing the cause

To find the problem, we must understand how anonymous methods are implemented in the .NET environment.

There is no such thing as an "anonymous method" in .NET itself, nor any similar new runtime feature. An anonymous method is pure compiler magic: the compiler packs everything the anonymous method needs to access into a closure object, so that every member access conforms to ordinary .NET rules. For example, the second example in the first section of this article actually looks like this after the compiler has processed it (the generated names have of course been made "friendly" here):


class TestClass
{
  ...
  private sealed class AutoGeneratedHelperClass
  {
    public TestClass m_testClassInstance;
    public int m_index;
    public void Action(string m)
    {
      this.m_testClassInstance.Print((this.m_index++) + ". " + m);
    }
  }
  public void TestAfterCompiled()
  {
    AutoGeneratedHelperClass helper = new AutoGeneratedHelperClass();
    helper.m_testClassInstance = this;
    helper.m_index = 0;
    string[] messages = new string[] { "Hello", "World" };
    Action<string> action = new Action<string>(helper.Action);
    Array.ForEach(messages, action);
    Console.WriteLine("index = " + helper.m_index);
  }
}

This shows how the compiler implements a closure:

The compiler automatically generates a private nested helper class and marks it sealed; an instance of this class becomes the closure object.

If the anonymous method needs to access a parameter or local variable of the enclosing method, that parameter or local variable is "promoted" to a public field of the helper class.

If the anonymous method needs to access other members of the class, the current instance of the class is stored in the helper class.

It is worth mentioning that these three rules do not all apply in every case. In some particularly simple cases (for example, when the anonymous method touches no local variables or other members at all), the compiler may simply generate a static method and construct the delegate from it, because that performs better.
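The practical consequence of this "promotion" is that the enclosing method and every delegate created in it read and write the same storage. The following is only a minimal sketch to make that visible; the class name SharedClosureDemo and the method Run are hypothetical names introduced for illustration.

```csharp
using System;

static class SharedClosureDemo
{
    // Returns the values observed at each step so the sharing is visible.
    public static int[] Run()
    {
        int counter = 0;                      // captured, so it lives in the closure object
        Action increment = () => counter++;   // writes the shared variable
        Func<int> read = () => counter;       // reads the same shared variable

        increment();
        int afterDelegateWrite = read();      // the write made by one delegate is seen by the other

        counter = 10;                         // the method itself writes the same storage
        int afterMethodWrite = read();        // the delegate sees the method's write

        return new[] { afterDelegateWrite, afterMethodWrite };
    }
}
```

Calling SharedClosureDemo.Run() returns { 1, 10 }: there is only one counter, no matter who touches it.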

Let us now rewrite the earlier case in the same way, "avoiding" the anonymous method so that the cause of the problem shows clearly:


private class AutoGeneratedClass
{
  public List<Item> m_batchItems;
  public void WaitCallback(object o)
  {
    DataContext db = new DataContext();
    db.Items.InsertAllOnSubmit(this.m_batchItems);
    db.SubmitChanges();
  }
}
static void Process()
{ 
  var helper = new AutoGeneratedClass();
  helper.m_batchItems = new List<Item>();
  foreach (var item in ...)
  {
    helper.m_batchItems.Add(item);
    if (helper.m_batchItems.Count > 1000)
    {
      ThreadPool.QueueUserWorkItem(helper.WaitCallback);
      helper.m_batchItems = new List<Item>();
    }
  }
}

The compiler automatically generates an AutoGeneratedClass class, and the Process method uses a single instance of this class in place of the original batchItems local variable. Likewise, the delegate handed to the ThreadPool is no longer an anonymous method but a public method of that AutoGeneratedClass instance. As a result, every work item the thread pool runs calls WaitCallback on this one shared instance.

The problem should be clear by now. When a delegate is handed to the thread pool, it is not executed immediately but at some later, suitable moment. When WaitCallback finally runs, it reads whatever object the m_batchItems field references at that moment. By then the Process method has already "discarded" the list we actually wanted to submit, so that data never reaches the database. Moreover, while one batch is still being filled, two pending submissions may well start running; when two threads submit the same batch of Item objects, the "inexplicable" exception is thrown.
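The effect can be reproduced without a database or real threads. In the sketch below, which is an illustration and not the original program, a plain list of delegates stands in for the thread pool queue, int stands in for Item, and the batch size is reduced to 3; DeferredCaptureDemo and Run are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class DeferredCaptureDemo
{
    // Returns the batch sizes that the deferred "submissions" actually see.
    public static List<int> Run()
    {
        var pending = new List<Func<int>>();   // stands in for the thread pool queue
        List<int> batchItems = new List<int>();

        for (int item = 1; item <= 6; item++)
        {
            batchItems.Add(item);
            if (batchItems.Count >= 3)
            {
                // The lambda captures the variable batchItems,
                // not the list that the variable refers to right now.
                pending.Add(() => batchItems.Count);
                batchItems = new List<int>();
            }
        }

        // The queued work runs only now, after the variable has been reassigned.
        var seen = new List<int>();
        foreach (var work in pending)
        {
            seen.Add(work());
        }
        return seen;
    }
}
```

Two batches of three items were queued, yet DeferredCaptureDemo.Run() returns [0, 0]: both deferred delegates look at the newest, empty list, and the six items are lost.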

Solving the problem

Once the problem has been found, it is easy to solve:


private class WrapperClass
{
  private List<Item> m_items;
  public WrapperClass(List<Item> items)
  {
    this.m_items = items;
  }
  public void WaitCallback(object o)
  {
    DataContext db = new DataContext();
    db.Items.InsertAllOnSubmit(this.m_items);
    db.SubmitChanges();
  }
}
static void Process()
{
  List<Item> batchItems = new List<Item>();
  foreach (var item in ...)
  {
    batchItems.Add(item);
    if (batchItems.Count > 1000)
    {
      ThreadPool.QueueUserWorkItem(
        new WrapperClass(batchItems).WaitCallback);
      batchItems = new List<Item>();
    }
  }
}

Here we explicitly create a wrapper class that holds the data to be committed. Each submission works on the list it was given when it was created, so no "data sharing" occurs and the errors are avoided.
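A lighter-weight alternative to a hand-written wrapper class is to copy the reference into a local variable declared inside the loop body, so that each queued delegate captures its own variable. The following is a sketch under assumptions: int stands in for Item, the batch size is reduced to 3, the database work is replaced by counting items, and PerBatchCaptureDemo, Run and toSubmit are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PerBatchCaptureDemo
{
    // Returns the total number of items the queued work actually received.
    public static int Run()
    {
        int submitted = 0;
        using (var done = new CountdownEvent(2))   // two batches will be queued
        {
            List<int> batchItems = new List<int>();
            for (int item = 1; item <= 6; item++)
            {
                batchItems.Add(item);
                if (batchItems.Count >= 3)
                {
                    // Declared inside the block: every batch gets its own
                    // variable, which is never reassigned afterwards.
                    List<int> toSubmit = batchItems;
                    ThreadPool.QueueUserWorkItem(_ =>
                    {
                        // Stand-in for the database submission.
                        Interlocked.Add(ref submitted, toSubmit.Count);
                        done.Signal();
                    });
                    batchItems = new List<int>();
                }
            }
            done.Wait();   // wait until both work items have run
        }
        return submitted;
    }
}
```

PerBatchCaptureDemo.Run() returns 6, so nothing is lost. Another option is the ThreadPool.QueueUserWorkItem(WaitCallback, Object) overload, which passes the list to the callback as its state argument instead of capturing it.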

Summary

Anonymous methods are powerful, but they also set some traps that are hard to notice. Be careful with any delegate created from an anonymous method if it is not executed immediately and synchronously, and if it uses local variables of the enclosing method. The "local variable" has in fact been turned by the compiler into a field on an instance of an automatically generated class, and that field is shared between the current method and the delegate object. If you modify the shared "local variable" after creating the delegate, make sure that this is what you intend and that it will not cause problems.

This kind of problem is not limited to anonymous methods. If you use a lambda expression to create an expression tree that also uses a "local variable", the expression tree will likewise see the "current" value when it is parsed or executed, not the value at the time the expression tree was created.
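The expression tree case can be checked with a few lines. This is only a minimal sketch; ExpressionTreeCaptureDemo, Run, factor and scale are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionTreeCaptureDemo
{
    // Builds an expression tree over a local variable, changes the
    // variable afterwards, then compiles and runs the tree.
    public static int Run()
    {
        int factor = 2;
        Expression<Func<int, int>> scale = x => x * factor;

        factor = 10;   // modified after the expression tree was created

        Func<int, int> compiled = scale.Compile();
        return compiled(3);   // uses the current value of factor, not 2
    }
}
```

ExpressionTreeCaptureDemo.Run() returns 30 rather than 6, because the tree refers to the shared variable itself and not to a snapshot of its value.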

This is why, in Java, a local variable used inside an anonymous inner class must be declared final (or, since Java 8, be effectively final): the variable can be assigned only once, which avoids the "odd problems" that later modifications might cause.

I hope this article is helpful for your C# programming.