Pitfalls of Spring Cloud Health Checks

  • 2021-11-02 00:52:13
  • OfStack


Health checks

Health checks based on Spring Boot Actuator are an essential part of Spring Cloud microservices: they are how we make sure our services are available.

After adding Spring Boot Actuator, we can open http://ip:port/health and see the default results provided by HealthEndpoint, including the disk space and database checks, as follows:


{
    "status": "UP",
    "diskSpace": {
        "status": "UP",
        "total": 398458875904,
        "free": 315106918400,
        "threshold": 10485760
    },
    "db": {
        "status": "UP",
        "database": "MySQL",
        "hello": 1
    }
}

Excluding unnecessary health check items

One day, a caller suddenly reported that they could not call our service. The Eureka console showed the service status as UP, and the service process itself was running normally. At a loss, I suddenly wondered whether the health check was to blame: the Eureka client decides whether a service is available based on its health check (when eureka.client.healthcheck.enabled=true), and in Spring Boot Actuator, if any single health indicator is DOWN, the health status of the whole application is DOWN. The caller therefore treats the service as unavailable.

Checking http://ip:port/health again, I found that the mail health check was failing.

The project had recently added spring-boot-starter-mail in order to send emails.

The mail server had failed, which turned the health status of the whole service DOWN:


{
  "status": "DOWN",
  "mail": {
    "status": "DOWN",
    "location": "email-smtp.test.com:-1",
    "error": "javax.mail.AuthenticationFailedException: 535 Authentication Credentials Invalid\n"
  },
  "diskSpace": {
    "status": "UP",
    "total": 266299998208,
    "free": 146394308608,
    "threshold": 10485760
  },
  "hystrix": {
    "status": "UP"
  }
}

Because sending mail is not a core function, non-core components like this can be excluded from the health check so that they cannot make the whole service unavailable.

Disable the mail health check with the following configuration:


management.health.mail.enabled=false
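To make the aggregation rule concrete (one DOWN indicator makes the whole application DOWN, and removing the mail indicator restores UP), here is a minimal plain-Java sketch with no Spring dependency. The status order mirrors Spring Boot's default order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN); the class HealthAggregationSketch and its methods are hypothetical and only model the idea, they are not Actuator's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Spring's API): the aggregate health status is the
 * most severe status reported by any individual indicator.
 */
public class HealthAggregationSketch {

    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    // Most severe first, mirroring Spring Boot's default status order.
    static final List<Status> ORDER =
            Arrays.asList(Status.DOWN, Status.OUT_OF_SERVICE, Status.UP, Status.UNKNOWN);

    /** Returns the most severe status among all indicators, or UNKNOWN if there are none. */
    static Status aggregate(Map<String, Status> indicators) {
        Status result = Status.UNKNOWN;
        for (Status s : indicators.values()) {
            if (ORDER.indexOf(s) < ORDER.indexOf(result)) {
                result = s;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Status> indicators = new LinkedHashMap<>();
        indicators.put("mail", Status.DOWN);
        indicators.put("diskSpace", Status.UP);
        indicators.put("hystrix", Status.UP);
        System.out.println(aggregate(indicators)); // DOWN: one failing indicator wins

        // Equivalent of management.health.mail.enabled=false: the indicator is gone.
        indicators.remove("mail");
        System.out.println(aggregate(indicators)); // UP
    }
}
```

The design point is that the aggregate is a "worst wins" reduction, so every enabled indicator, core or not, has veto power over the whole service.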

The big pitfall of Spring Cloud health checks: timeouts

0. Terminology

service: a single microservice.

server: an application instance that provides exactly one microservice; one service has multiple servers.

1. The problem

In production, we ran into a problem where, at some point, all servers of a service were removed.

2. Cause analysis

By default, Spring Cloud uses the Spring Boot Actuator health URL as the health check, and the default check timeout is 10s. If the production environment runs into problems such as network trouble, or the DB or Redis being slow or down, the health check request times out. The registry then considers the server abnormal and changes its status to critical (the critical status and HealthServiceServerListFilter point to Spring Cloud Consul), and the service caller (Feign) removes the abnormal server from its load-balancing list (HealthServiceServerListFilter).

If a whole network segment, or a shared dependency such as the network or the DB, has trouble on a larger scale, the registry removes every server of a service, making the service completely unavailable.

Yet the server may only be partially affected: for example, only the DB or Redis is slow rather than unavailable, but the server is still forcibly removed by the registry.
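The failure mode above can be simulated in a few lines. This is a hypothetical plain-Java sketch with no Spring or Consul dependency: each instance has a health-check latency, the registry side treats anything over the 10s timeout as failing, and the caller side keeps only passing instances, which is the role HealthServiceServerListFilter plays. The class, addresses and latencies are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not the Consul or Spring Cloud API): a shared slow
 * dependency pushes every /health call past the timeout, so the caller-side
 * filter ends up with no servers at all.
 */
public class HealthTimeoutSketch {

    static final long CHECK_TIMEOUT_MS = 10_000; // the 10s default timeout cited in the text

    /** An instance and how long its /health endpoint takes to respond. */
    static final class Instance {
        final String address;
        final long healthLatencyMs;

        Instance(String address, long healthLatencyMs) {
            this.address = address;
            this.healthLatencyMs = healthLatencyMs;
        }
    }

    /** Registry side: a check that does not answer within the timeout counts as failed (critical). */
    static boolean isPassing(Instance i) {
        return i.healthLatencyMs <= CHECK_TIMEOUT_MS;
    }

    /** Caller side: keep only instances whose health check is passing. */
    static List<String> availableServers(List<Instance> instances) {
        List<String> result = new ArrayList<>();
        for (Instance i : instances) {
            if (isPassing(i)) {
                result.add(i.address);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Instance> instances = new ArrayList<>();
        // A slow DB makes every /health call take about 12s, even if business requests still work.
        instances.add(new Instance("10.0.0.1:8080", 12_000));
        instances.add(new Instance("10.0.0.2:8080", 12_000));
        System.out.println(availableServers(instances)); // [] : the whole service disappears
    }
}
```

Because every instance shares the same slow dependency, the filter removes them all at once, which matches the "all servers removed" symptom.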

3. Solutions

3.1 General solution

Turn off the health checks so that the endpoint always returns UP: as long as the program has started normally, it is considered able to serve requests.

The following is the default health check output of our project template:


{
 "description": "",
 "status": "UP",
 "diskSpace": {
  "description": "",
  "status": "UP",
  "total": 50715856896,
  "free": 7065239552,
  "threshold": 10485760
 },
 "solr": {
  "description": "",
  "status": "UP",
  "solrStatus": "OK"
 },
 "redis": {
  "description": "",
  "status": "UP",
  "version": "2.8.21"
 },
 "db": {
  "description": "",
  "status": "UP",
  "authDataSource": {
   "description": "",
   "status": "UP",
   "database": "MySQL",
   "hello": "x"
  },
  "autodealerDataSource": {
   "description": "",
   "status": "UP",
   "database": "Microsoft SQL Server",
   "hello": "x"
  }
 }
}

How to turn off the health checks:


# in application*.yml
management:
  health:
    defaults:
      enabled: false

Health check result after turning them off:


{
 "description": "",
 "status": "UP",
 "application": {
  "description": "",
  "status": "UP"
 }
}

4. If a specific health check is still needed

After turning off the health checks, if you still need a particular kind of check, enable it individually as follows:


management:
  health:
    defaults:
      enabled: false
    # re-enable only the db health check
    db:
      enabled: true

The health check result is then:


{
 "description": "",
 "status": "UP",
 "db": {
  "description": "",
  "status": "UP",
  "authDataSource": {
   "description": "",
   "status": "UP",
   "database": "MySQL",
   "hello": "x"
  },
  "autodealerDataSource": {
   "description": "",
   "status": "UP",
   "database": "Microsoft SQL Server",
   "hello": "x"
  }
 }
}
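The configuration above works because of a fallback: an indicator's own management.health.<name>.enabled switch wins if it is set, otherwise management.health.defaults.enabled applies, which itself defaults to true. As far as we know this mirrors how Spring Boot resolves the switch; the plain-Java class below (HealthToggleSketch, with a Map standing in for the application properties) is a hypothetical model, not Spring's code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Spring's implementation) of resolving whether a
 * health indicator is enabled: the specific key wins, then the defaults key,
 * then true.
 */
public class HealthToggleSketch {

    static boolean isIndicatorEnabled(Map<String, String> props, String name) {
        String specific = props.get("management.health." + name + ".enabled");
        if (specific != null) {
            return Boolean.parseBoolean(specific); // explicit per-indicator setting
        }
        // Fall back to the global default, which is true when not configured.
        String defaults = props.getOrDefault("management.health.defaults.enabled", "true");
        return Boolean.parseBoolean(defaults);
    }

    public static void main(String[] args) {
        // Same settings as the YAML above.
        Map<String, String> props = new HashMap<>();
        props.put("management.health.defaults.enabled", "false");
        props.put("management.health.db.enabled", "true");

        for (String name : new String[] {"db", "redis", "diskspace", "solr"}) {
            System.out.println(name + " -> " + isIndicatorEnabled(props, name));
        }
        // db -> true, redis -> false, diskspace -> false, solr -> false
    }
}
```

Only db survives, which is why the output above contains nothing but the db section.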
