Principle Analysis of Android ANR

  • 2021-12-13 09:33:48
  • OfStack

Contents: Jank principle, Jank monitoring, ANR principle

Jank principle

Time-consuming operations on the main thread cause jank; once the delay exceeds a threshold, an ANR is triggered. When an application process starts, it is forked from Zygote and the main method of ActivityThread is invoked via reflection, which starts the message loop. ActivityThread (api29):


    public static void main(String[] args) {
        Looper.prepareMainLooper();
        ...
        Looper.loop();
        throw new RuntimeException("Main thread loop unexpectedly exited");
    }

loop method of Looper:


// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
        for (;;) {
            // 1 Fetch the message
            Message msg = queue.next(); // might block
            ...
            // This must be in a local variable, in case a UI event sets the logger
            // 2 Callback before message processing 
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " +
                        msg.callback + ": " + msg.what);
            }
            ...
            // 3 Message processing begins 
            msg.target.dispatchMessage(msg);
            ...
            // 4 Callback after message processing 
            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }
        }
}

Because loop contains an endless for loop, the main thread can keep running for as long as the process lives. To run a task on the main thread, post it to the message queue through a Handler; the loop then fetches the msg and hands it to the msg's target (a Handler) for processing.
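
The post-and-dispatch cycle described above can be sketched in plain Java. This is only an illustration under stated assumptions: MiniLooper, its QUIT sentinel and the demo method are hypothetical names, and a BlockingQueue of Runnables stands in for MessageQueue and Message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for Looper + MessageQueue; not the Android classes.
final class MiniLooper {
    // Sentinel task that tells loop() to return.
    private static final Runnable QUIT = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Plays the role of Handler.post: may be called from any thread.
    void post(Runnable task) {
        queue.add(task);
    }

    void quit() {
        queue.add(QUIT);
    }

    // Runs on the calling thread until the QUIT sentinel is reached.
    void loop() {
        for (;;) {
            Runnable task;
            try {
                task = queue.take(); // might block, like queue.next()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (task == QUIT) {
                return;
            }
            task.run(); // plays the role of msg.target.dispatchMessage(msg)
        }
    }

    // Posts three tasks, then drains them on the current thread in FIFO order.
    static List<String> demo() {
        MiniLooper looper = new MiniLooper();
        List<String> order = new ArrayList<>();
        looper.post(() -> order.add("a"));
        looper.post(() -> order.add("b"));
        looper.post(() -> order.add("c"));
        looper.quit();
        looper.loop();
        return order;
    }
}
```

Every task runs on the thread that called loop(), one after another, so a single slow task delays everything queued behind it.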

There are two places that may lead to jank:

  • Comment 1: queue.next()
  • Comment 3: dispatchMessage takes too long

MessageQueue#next, the part that can take time (api29):


    @UnsupportedAppUsage
    Message next() {
        for (;;) {
            // 1 Blocks when nextPollTimeoutMillis is not 0 (-1 blocks indefinitely)
            nativePollOnce(ptr, nextPollTimeoutMillis);
            // 2 First check whether the first message is a sync barrier (its target is null)
            if (msg != null && msg.target == null) {
                    // 3 On a sync barrier, skip the synchronous messages and take the next asynchronous message; the barrier stalls the synchronous ones
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
             }
             // 4 Normal message handling: check whether the message is delayed
             if (msg != null) {
                    if (now < msg.when) {
                        // Next message is not ready.  Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // 5 No message was found: with -1, the next nativePollOnce at comment 1 blocks indefinitely
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }
        }
    }

MessageQueue is a linked list. next() first checks whether the head of the queue (the first message) is a sync barrier, that is, a message whose target is null. A barrier stalls synchronous messages so that only asynchronous messages are processed.

If a barrier is found, the synchronous messages behind it are skipped and the first asynchronous message is picked. If there is no asynchronous message, execution reaches comment 5: nextPollTimeoutMillis is set to -1, and the next call to nativePollOnce at comment 1 blocks indefinitely.

If a message is found, synchronous or asynchronous, it is handled the same way: comment 4 checks whether it is delayed. If so, nextPollTimeoutMillis is set to the remaining delay and the next call to nativePollOnce at comment 1 blocks for that long. Otherwise the message is returned directly so that its Handler can process it.
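
The selection logic can be modelled with a small self-contained sketch. It is a simplification under assumptions, not the framework code: BarrierQueueSketch, Msg, enqueue and pollOnce are hypothetical names, and pollOnce returns null where the real next() would go back to nativePollOnce with the computed timeout.

```java
// Hypothetical, simplified model of the selection step in MessageQueue.next().
final class BarrierQueueSketch {
    static final class Msg {
        final String name;
        final boolean isBarrier;     // models target == null
        final boolean asynchronous;
        final long when;             // due time in ms
        Msg next;

        Msg(String name, boolean isBarrier, boolean asynchronous, long when) {
            this.name = name;
            this.isBarrier = isBarrier;
            this.asynchronous = asynchronous;
            this.when = when;
        }
    }

    private Msg head;               // plays the role of mMessages
    int nextPollTimeoutMillis;      // what would be passed to nativePollOnce

    // Inserts the message so that the list stays sorted by due time.
    void enqueue(Msg m) {
        if (head == null || m.when < head.when) {
            m.next = head;
            head = m;
            return;
        }
        Msg p = head;
        while (p.next != null && p.next.when <= m.when) {
            p = p.next;
        }
        m.next = p.next;
        p.next = m;
    }

    // Returns the message to dispatch now, or null when the caller should block.
    Msg pollOnce(long now) {
        Msg prev = null;
        Msg msg = head;
        if (msg != null && msg.isBarrier) {
            // Stalled by a barrier: look for the next asynchronous message.
            do {
                prev = msg;
                msg = msg.next;
            } while (msg != null && !msg.asynchronous);
        }
        if (msg == null) {
            nextPollTimeoutMillis = -1; // nothing to run: block indefinitely
            return null;
        }
        if (now < msg.when) {
            // Not due yet: block until it is.
            nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
            return null;
        }
        if (prev != null) {
            prev.next = msg.next;
        } else {
            head = msg.next;
        }
        msg.next = null;
        nextPollTimeoutMillis = 0;
        return msg;
    }

    // Barrier at the head: the async message runs, the sync one stays stalled.
    static String demo() {
        BarrierQueueSketch q = new BarrierQueueSketch();
        q.enqueue(new Msg("barrier", true, false, 0));
        q.enqueue(new Msg("sync", false, false, 1));
        q.enqueue(new Msg("async", false, true, 2));
        Msg first = q.pollOnce(10);
        Msg second = q.pollOnce(10);
        return first.name + "," + (second == null ? "blocked" : second.name)
                + "," + q.nextPollTimeoutMillis;
    }
}
```

In the demo the synchronous message is already due, yet it is never returned while the barrier sits at the head; this is why a barrier that is never removed leaves the synchronous messages stuck.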

The next method keeps taking messages from MessageQueue: it returns a message when one is ready and blocks in nativePollOnce when there is none. Underneath, nativePollOnce relies on the Linux epoll mechanism, a form of IO multiplexing.

Linux IO multiplexing schemes include select, poll and epoll. Among them, epoll has the best performance and supports the largest concurrency.

select: a system call to which the program passes an array of file descriptors; the operating system traverses the array, determines which descriptors are readable or writable, and tells the program to handle them.

poll: works much like select; the main difference is that it removes select's limit of 1024 file descriptors.

epoll: improves on three weaknesses of select:

1 The kernel maintains the set of file descriptors, so the user does not need to pass the whole set on every call, only tell the kernel what has changed.
2 The kernel no longer polls to find ready file descriptors; it is woken by asynchronous IO events instead.
3 The kernel returns only the file descriptors that have IO events, so the user does not need to traverse the whole set.
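
The same "block until a descriptor is ready" idea can be tried from plain Java with the NIO Selector API, which on Linux is typically backed by epoll. The sketch below is an analogy, not the Looper implementation: SelectorWakeSketch and waitForWake are hypothetical names, and a Pipe plays the role of the descriptor that is written to in order to wake the waiting thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SelectorWakeSketch {
    // Blocks in select() until another thread writes one byte into the pipe.
    // Returns the number of ready channels reported by select().
    static int waitForWake(long writeDelayMillis) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // Register interest once; the selector keeps the descriptor set.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);

                Thread waker = new Thread(() -> {
                    try {
                        Thread.sleep(writeDelayMillis);
                        pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
                    } catch (IOException | InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                waker.start();

                // The calling thread sleeps here without spinning until the
                // pipe becomes readable or the 5 second safety timeout expires.
                int ready = selector.select(5000);
                waker.join();
                return ready;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

While blocked in select() the thread uses no CPU, which is the property that lets an idle main thread wait in nativePollOnce without being busy.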

Synchronous barrier message

An Android app cannot post a sync barrier directly, because postSyncBarrier is not part of the public SDK. MessageQueue (api29):


    @TestApi
    public int postSyncBarrier() {
        return postSyncBarrier(SystemClock.uptimeMillis());
    }

    private int postSyncBarrier(long when) {
        ...
    }

High-priority system operations use sync barriers. For example, when a View is drawn, the scheduleTraversals method of ViewRootImpl inserts a sync barrier and the barrier is removed again afterwards. ViewRootImpl (api29):


    @UnsupportedAppUsage
    void scheduleTraversals() {
        if (!mTraversalScheduled) {
            mTraversalScheduled = true;
            mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
            mChoreographer.postCallback(
                    Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
            if (!mUnbufferedInputDispatch) {
                scheduleConsumeBatchedInput();
            }
            notifyRendererOfFramePending();
            pokeDrawLockIfNeeded();
        }
    }
    
    void unscheduleTraversals() {
        if (mTraversalScheduled) {
            mTraversalScheduled = false;
            mHandler.getLooper().getQueue().removeSyncBarrier(mTraversalBarrier);
            mChoreographer.removeCallbacks(
                    Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
        }
    }

To keep View drawing from being delayed by other main-thread tasks, a sync barrier is inserted into MessageQueue before drawing and a Vsync listener is then registered; Choreographer$FrameDisplayEventReceiver receives the Vsync callback.


    private final class FrameDisplayEventReceiver extends DisplayEventReceiver
            implements Runnable {
        @Override
        public void onVsync(long timestampNanos, long physicalDisplayId, int frame) {
            Message msg = Message.obtain(mHandler, this);
            // 1 Send an asynchronous message
            msg.setAsynchronous(true);
            mHandler.sendMessageAtTime(msg, timestampNanos / TimeUtils.NANOS_PER_MS);
        }

        @Override
        public void run() {
            // 2 doFrame runs with priority
            doFrame(mTimestampNanos, mFrame);
        }
    }

After the Vsync callback arrives, comment 1 posts an asynchronous message to the main thread's MessageQueue, which ensures that doFrame at comment 2 runs first.

doFrame is where View drawing really starts: it calls doTraversal and performTraversals of ViewRootImpl, and performTraversals in turn calls onMeasure, onLayout and onDraw of View.

Although an app cannot post a sync barrier, it is allowed to use asynchronous messages.

Asynchronous messages: apps are not meant to touch the flag directly. In the Message class the flags field is package-private, so an app marks a message through the public Message#setAsynchronous method instead:


    @UnsupportedAppUsage
    /*package*/ int flags;

Use asynchronous messages with care: used improperly, they can make the main thread appear frozen.

Handler#dispatchMessage


    /**
     * Handle system messages here.
     */
    public void dispatchMessage(@NonNull Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (mCallback != null) {
                if (mCallback.handleMessage(msg)) {
                    return;
                }
            }
            handleMessage(msg);
        }
    }

A message can therefore be handled in three ways, in this order of priority:

  • the Runnable passed to Handler#post(Runnable r)
  • the Callback passed to the Handler constructor
  • the handleMessage method overridden by the Handler subclass

Application jank is generally caused by a Handler taking too long to process a message (the method itself, an inefficient algorithm, CPU preemption, insufficient memory, an IPC timeout, etc.).
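
The dispatch order in dispatchMessage can be checked with a plain-Java model. The types below (DispatchOrderSketch, MiniHandler, Msg, Callback) are hypothetical stand-ins written for this sketch; only the branching mirrors the listing above.

```java
final class DispatchOrderSketch {
    // Stand-in for Handler.Callback.
    interface Callback {
        boolean handleMessage(Msg msg);
    }

    // Stand-in for Message; handledBy records which path ran last.
    static final class Msg {
        Runnable callback;          // set when the message wraps a posted Runnable
        String handledBy = "none";
    }

    static class MiniHandler {
        private final Callback mCallback;

        MiniHandler(Callback callback) {
            mCallback = callback;
        }

        // Subclasses would override this, as with Handler#handleMessage.
        void handleMessage(Msg msg) {
            msg.handledBy = "handleMessage";
        }

        final void dispatchMessage(Msg msg) {
            if (msg.callback != null) {
                msg.callback.run();                 // 1st: the posted Runnable
            } else {
                if (mCallback != null && mCallback.handleMessage(msg)) {
                    return;                         // 2nd: Callback consumed it
                }
                handleMessage(msg);                 // 3rd: handleMessage
            }
        }
    }

    static String demo(boolean withRunnable, boolean withCallback, boolean callbackConsumes) {
        final Msg msg = new Msg();
        if (withRunnable) {
            msg.callback = () -> { msg.handledBy = "runnable"; };
        }
        Callback callback = null;
        if (withCallback) {
            callback = m -> {
                m.handledBy = "callback";
                return callbackConsumes;
            };
        }
        new MiniHandler(callback).dispatchMessage(msg);
        return msg.handledBy;
    }
}
```

Note that a Callback returning false does not end the dispatch: handleMessage still runs afterwards, so both contribute to the time spent on that message.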

Jank monitoring

Jank monitoring scheme 1: Looper#loop


// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
        for (;;) {
            // 1 Fetch the message
            Message msg = queue.next(); // might block
            ...
            // This must be in a local variable, in case a UI event sets the logger
            // 2 Callback before message processing 
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " +
                        msg.callback + ": " + msg.what);
            }
            ...
            // 3 Message processing begins 
            msg.target.dispatchMessage(msg);
            ...
            // 4 Callback after message processing 
            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }
        }
}

The logging.println calls at comments 2 and 4 are a hook exposed by the API for measuring how long a Handler takes: install a Printer with Looper.getMainLooper().setMessageLogging(printer) and record the time before and after each message. The drawback is that by the time jank is detected, dispatchMessage has already returned, so a stack captured at that moment no longer contains the code that caused the jank.
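
A minimal sketch of such a Printer follows. To stay runnable off-device it declares its own Printer interface with the same single println(String) method as android.util.Printer, and it takes the clock as a parameter; JankPrinterSketch, JankPrinter and the report text are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class JankPrinterSketch {
    // Stand-in for android.util.Printer so the sketch compiles off-device.
    interface Printer {
        void println(String x);
    }

    static final class JankPrinter implements Printer {
        private final long thresholdMillis;
        private final LongSupplier clock;   // assumption: injected for testability
        private long startMillis = -1;
        final List<String> reports = new ArrayList<>();

        JankPrinter(long thresholdMillis, LongSupplier clock) {
            this.thresholdMillis = thresholdMillis;
            this.clock = clock;
        }

        @Override
        public void println(String x) {
            if (x.startsWith(">>>>> Dispatching")) {
                // Comment 2 in loop(): a message is about to be dispatched.
                startMillis = clock.getAsLong();
            } else if (x.startsWith("<<<<< Finished") && startMillis >= 0) {
                // Comment 4 in loop(): the message has been handled.
                long cost = clock.getAsLong() - startMillis;
                startMillis = -1;
                if (cost > thresholdMillis) {
                    reports.add("slow message: " + cost + " ms");
                }
            }
        }
    }
}
```

On a device, the equivalent class would implement android.util.Printer, read the time from a monotonic clock such as SystemClock.uptimeMillis(), and be installed with Looper.getMainLooper().setMessageLogging(printer).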

A workaround is to sample the main thread's stack periodically and store the samples in a map, with the timestamp as key and the stack trace as value. When jank occurs, the stacks captured within the jank interval can be retrieved. This approach is suitable for offline (development) use.
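
The timestamp-to-stack map can be sketched as follows. The class and method names (StackSamplerSketch, sampleAt, stacksBetween) are hypothetical; a real monitor would call sampleAt from a background thread on a fixed interval and would also cap the number of stored samples, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class StackSamplerSketch {
    private final Thread target;

    // key: sample time in ms, value: the stack captured at that time
    private final ConcurrentNavigableMap<Long, StackTraceElement[]> samples =
            new ConcurrentSkipListMap<>();

    StackSamplerSketch(Thread target) {
        this.target = target;
    }

    // Captures the target thread's current stack under the given timestamp.
    void sampleAt(long timestampMillis) {
        samples.put(timestampMillis, target.getStackTrace());
    }

    // Returns the stacks captured in [fromMillis, toMillis], oldest first.
    List<StackTraceElement[]> stacksBetween(long fromMillis, long toMillis) {
        return new ArrayList<>(samples.subMap(fromMillis, true, toMillis, true).values());
    }
}
```

When the Printer-based monitor reports a slow message, the start and end times of that message are the range to pass to stacksBetween.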

This scheme has costs. logging.println involves string concatenation; because it is called frequently, it creates a large number of objects and causes memory churn. Sampling the main thread's stack frequently from the background also hurts performance, because capturing the stack suspends the main thread.

Jank monitoring scheme 2

Online (production) jank monitoring requires bytecode instrumentation.

With a Gradle plugin plus ASM, one line of code is inserted at the beginning and at the end of every method at compile time to measure how long the method takes; this is the jank monitoring approach used by WeChat Matrix, for example. Points to note:

  • Avoid method-count explosion: assign each method its own ID and pass it as a parameter.
  • Filter out simple functions: add a blacklist to cut statistics for functions that do not need them.
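
What an instrumented method could look like is sketched below at source level. Everything here is an assumption made for illustration: Tracer, enter, exit and the ID constant are hypothetical names rather than Matrix's API, the arrays are not thread-safe and ignore recursion, and try/finally is used only because it is the simplest way to express "at every exit" in source code.

```java
final class MethodTraceSketch {
    // Hypothetical tracer that the compile-time plugin would call into.
    static final class Tracer {
        private static final int MAX_METHODS = 1024;
        private static final long[] startNanos = new long[MAX_METHODS];
        static final long[] totalNanos = new long[MAX_METHODS];
        static final int[] calls = new int[MAX_METHODS];

        // Inserted at the beginning of a method; the int ID avoids
        // generating one extra helper method per traced method.
        static void enter(int methodId) {
            startNanos[methodId] = System.nanoTime();
        }

        // Inserted at the end of a method.
        static void exit(int methodId) {
            totalNanos[methodId] += System.nanoTime() - startNanos[methodId];
            calls[methodId]++;
        }
    }

    // ID the plugin would assign to sumTo at compile time.
    static final int ID_SUM_TO = 1;

    // A method as it might look after instrumentation.
    static long sumTo(int n) {
        Tracer.enter(ID_SUM_TO);
        try {
            long sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += i;
            }
            return sum;
        } finally {
            Tracer.exit(ID_SUM_TO);
        }
    }
}
```

A blacklist would simply make the plugin skip trivial methods such as getters, so they never receive the enter and exit calls.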

WeChat Matrix has been heavily optimized: package size grows by 1% to 2%, the frame rate drops by less than 2 frames, and it is used in gray-release (staged rollout) builds.

Principle of ANR

  • Service Timeout: a foreground service does not finish within 20s; for a background service the limit is 200s.
  • BroadcastQueue Timeout: a foreground broadcast does not finish within 10s, or a background broadcast within 60s.
  • ContentProvider Timeout: publishing times out after 10s.
  • InputDispatching Timeout: dispatching an input event takes longer than 5s, including key and touch events.

These thresholds are defined as constants in the system server, for example in ActivityManagerService (api29).



ANR Trigger Flow

Burying a bomb

Background service call chain: Context.startService --> AMS.startService --> ActiveServices.startServiceLocked --> ActiveServices.realStartServiceLocked



realStartServiceLocked internally ends up calling scheduleServiceTimeoutLocked.



Before AMS notifies the application to start the service, a delayed Handler message is sent. If that message has not been removed when the delay expires (20s for a foreground service), ActiveServices#serviceTimeout is called.

Disarm a bomb

Starting a Service always goes through AMS first; AMS then notifies the application to run the Service lifecycle, and the handleCreateService method of ActivityThread is called.


    @UnsupportedAppUsage
    private void handleCreateService(CreateServiceData data) {
        ...
        try {
            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManager.getService());
            // 1 Call the service's onCreate
            service.onCreate();
            mServices.put(data.token, service);
            try {
                // 2 Defuse the bomb
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
        } catch (Exception e) {
            ...
        }
    }

  • Comment 1: the onCreate method of the Service is called.
  • Comment 2: the serviceDoneExecuting method of AMS is called, which finally calls ActiveServices.serviceDoneExecutingLocked.



Once onCreate has finished and serviceDoneExecuting has been called, the delayed message is removed and the bomb is defused.
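
The bury/defuse pattern is not specific to Android and can be sketched in plain Java. In this illustration a ScheduledExecutorService stands in for the system server's Handler and its delayed message; TimeoutBombSketch, runWithTimeout and sleepQuietly are hypothetical names, and the timeouts in the usage note are far shorter than the real ones.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutBombSketch {
    // Runs the work on the calling thread and returns true if the
    // timeout fired before the work finished.
    static boolean runWithTimeout(Runnable work, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean detonated = new AtomicBoolean(false);
        try {
            // Bury the bomb: schedule the timeout before starting the work.
            ScheduledFuture<?> bomb = timer.schedule(
                    () -> detonated.set(true), timeoutMillis, TimeUnit.MILLISECONDS);

            work.run(); // plays the role of Service.onCreate()

            // Defuse the bomb: the work is done, cancel the pending timeout.
            bomb.cancel(false);
            return detonated.get();
        } finally {
            timer.shutdownNow();
        }
    }

    // Helper for the demo: sleeps without a checked exception.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, runWithTimeout(() -> TimeoutBombSketch.sleepQuietly(300), 50) lets the bomb go off, while work that returns promptly cancels it in time. The timer runs on its own thread, just as the real timeout message lives in the system server rather than in the app, so a blocked app thread cannot prevent the detonation.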

Detonate the bomb

If onCreate of the Service runs longer than the timeout (20s for a foreground service), the bomb detonates: the ActiveServices#serviceTimeout method is called (api29).



Every ANR eventually ends up calling the appNotResponding method of ProcessRecord (api29).


appNotResponding performs the following steps:

  • Write to the event log.
  • Write to the main log.
  • Generate the traces file.
  • Output the ANR to logcat (visible on the console).
  • If the traces file could not be obtained, send a SIGNAL_QUIT signal, which triggers the collection of thread stack information, and write it to the traces file.
  • Output to DropBox.
  • For a background ANR, kill the process directly.
  • Report the error.
  • Show the ANR dialog by calling the AppErrors#handleShowAnrUi method.


To get the ANR details, grab the system's /data/anr/traces.txt file; note that newer Android versions require root permission to read this directory.

ANRWatchDog (github.com/SalomonBrys …): an open source library that detects ANRs automatically.
