PostgreSQL pg_ctl start timeout: an analysis with examples

  • 2020-05-06 11:56:16
  • OfStack

1. Problem

pg_ctl start fails and exits with the error: pg_ctl: server did not start in time. What is the timeout? From which point to which stage is the timeout measured?

2. Analysis

The message is printed in the do_start function of pg_ctl, which shows:

1. After pg_ctl start invokes start_postmaster to launch PG's main process, it checks every 0.1 s whether ready or standby has been written to postmaster.pid

2. At most 600 checks are made. That is, after launching the main process, pg_ctl waits 60 s at most

3. The default waiting time is 60 s. If pg_ctl start -t specifies a waiting time, that value is used instead
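
The arithmetic behind the 60 s limit can be sketched as follows. This is a minimal illustration, not pg_ctl's own code: the constant values (10 checks per second, a 60 second default) are inferred from the 600 checks and the 0.1 s interval described above, and the helpers max_checks and check_interval_usec are hypothetical names.

```c
#include <assert.h>

#define USEC_PER_SEC   1000000L
#define WAITS_PER_SEC  10   /* assumed: 10 checks per second, 0.1 s apart */
#define DEFAULT_WAIT   60   /* assumed default waiting time in seconds */

/* Hypothetical helper: how many times postmaster.pid is checked. */
static long
max_checks(int wait_seconds)
{
    assert(wait_seconds >= 0);
    return (long) wait_seconds * WAITS_PER_SEC;
}

/* Hypothetical helper: sleep between two checks, in microseconds. */
static long
check_interval_usec(void)
{
    return USEC_PER_SEC / WAITS_PER_SEC;
}
```

With the default, max_checks(DEFAULT_WAIT) gives 600 checks that are 100000 microseconds apart. A waiting time of 120 s given with -t would give 1200 checks under the same assumptions.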

3. When is ready/standby written to postmaster.pid

1. If it is a primary, regardless of whether hot_standby is set

1) When the startup process finishes recovery, it calls the proc_exit function and exits, which causes a SIGCHLD signal to be delivered to the main process

2) After the main process receives the signal, the signal handler reaper calls AddToDataDirLockFile to write ready to the postmaster.pid file

2. If it is a standby, a recovery.conf file is in the data directory, and hot_standby is set

1) The startup process sends the PMSIGNAL_RECOVERY_STARTED signal to the main process; the main process calls the signal handler sigusr1_handler, which sets pmState=PM_RECOVERY

2) Each time before the next xlog record is read, the CheckRecoveryConsistency function is called for a consistency check:

2.1 On entering the consistent state, the startup process sends the PMSIGNAL_BEGIN_HOT_STANDBY signal to the main process; after receiving it, the main process calls sigusr1_handler -> AddToDataDirLockFile, which writes ready to postmaster.pid

3. If it is a standby, a recovery.conf file is in the data directory, hot_standby is set, and the consistent position is not reached before the actual recovery

1) The startup process sends the PMSIGNAL_RECOVERY_STARTED signal to the main process; the main process calls the signal handler sigusr1_handler, which sets pmState=PM_RECOVERY

2) Each time before the next xlog record is read, the CheckRecoveryConsistency function is called for a consistency check. As long as the consistent state has not been entered, no signal is sent and replay continues

3) When local log recovery is complete and the log source is switched, the CheckRecoveryConsistency function is also called to check consistency:

3.1 On entering the consistent state, the startup process sends the PMSIGNAL_BEGIN_HOT_STANDBY signal to the main process; after receiving it, the main process calls sigusr1_handler -> AddToDataDirLockFile, which writes ready to the postmaster.pid file

4. If it is a standby, a recovery.conf file is in the data directory, hot_standby is set, and the consistent position is reached before the actual recovery

1) The startup process sends the PMSIGNAL_RECOVERY_STARTED signal to the main process; the main process calls the signal handler sigusr1_handler, which sets pmState=PM_RECOVERY

2) The CheckRecoveryConsistency function performs the consistency check and sends the PMSIGNAL_BEGIN_HOT_STANDBY signal to the main process. After the main process receives the signal, it calls sigusr1_handler -> AddToDataDirLockFile, which writes ready to postmaster.pid

5. If it is a standby, a recovery.conf file is in the data directory, and hot_standby is not set

1) The startup process sends the PMSIGNAL_RECOVERY_STARTED signal to the main process

2) After the main process receives the signal, it writes standby to the postmaster.pid file and sets pmState=PM_RECOVERY
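
The five cases above reduce to a small decision table. The sketch below is only a summary aid under stated assumptions: it treats "primary" as "no recovery.conf in the data directory", and every type and function name in it (PidFileStatus, Trigger, StartupOutcome, expected_outcome) is hypothetical rather than part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this summary; none of them are PostgreSQL types. */
typedef enum { WRITES_READY, WRITES_STANDBY } PidFileStatus;
typedef enum { ON_SIGCHLD, ON_BEGIN_HOT_STANDBY, ON_RECOVERY_STARTED } Trigger;

typedef struct
{
    PidFileStatus status;   /* what is written to postmaster.pid */
    Trigger       trigger;  /* event that makes the main process write it */
} StartupOutcome;

static StartupOutcome
expected_outcome(bool has_recovery_conf, bool hot_standby)
{
    StartupOutcome out;

    if (!has_recovery_conf)
    {
        /* case 1: primary, hot_standby does not matter */
        out.status = WRITES_READY;
        out.trigger = ON_SIGCHLD;
    }
    else if (hot_standby)
    {
        /* cases 2 to 4: standby that accepts connections once consistent */
        out.status = WRITES_READY;
        out.trigger = ON_BEGIN_HOT_STANDBY;
    }
    else
    {
        /* case 5: standby without hot_standby */
        out.status = WRITES_STANDBY;
        out.trigger = ON_RECOVERY_STARTED;
    }

    /* every branch decides on exactly one of the two statuses */
    assert(out.status == WRITES_READY || out.status == WRITES_STANDBY);
    return out;
}
```

Read this way, cases 2 to 4 are the ones most exposed to the timeout: nothing is written to postmaster.pid until the consistent state is reached, however long replay takes.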

4. Code analysis

1. pg_ctl start process


do_start->
 pm_pid = start_postmaster();
 if (do_wait){
  print_msg(_("waiting for server to start..."));
  switch (wait_for_postmaster(pm_pid, false)){
   case POSTMASTER_READY:
    print_msg(_(" done\n"));
    print_msg(_("server started\n"));
    break;
   case POSTMASTER_STILL_STARTING:
    print_msg(_(" stopped waiting\n"));
    write_stderr(_("%s: server did not start in time\n"), progname);
    exit(1);
    break;
   case POSTMASTER_FAILED:
    print_msg(_(" stopped waiting\n"));
    write_stderr(_("%s: could not start server\n" "Examine the log output.\n"), progname);
    exit(1);
    break;
  }
 }else
  print_msg(_("server starting\n"));

wait_for_postmaster->
 for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++){
  if ((optlines = readfile(pid_file, &numlines)) != NULL && numlines >= LOCK_FILE_LINE_PM_STATUS){
   pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
   pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
   if (pmstart >= start_time - 2 && pmpid == pm_pid){
    char  *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
    if (strcmp(pmstatus, PM_STATUS_READY) == 0 || strcmp(pmstatus, PM_STATUS_STANDBY) == 0){
     /* postmaster is done starting up */
     free_readfile(optlines);
     return POSTMASTER_READY;
    }
   }
  }
  free_readfile(optlines);
  if (waitpid((pid_t) pm_pid, &exitstatus, WNOHANG) == (pid_t) pm_pid)
   return POSTMASTER_FAILED;
  pg_usleep(USEC_PER_SEC / WAITS_PER_SEC);
 }
 /* out of patience; report that postmaster is still starting up */
 return POSTMASTER_STILL_STARTING;
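
The status test at the heart of wait_for_postmaster can be tried out in isolation with a sketch like the one below. It rests on assumptions that the excerpt does not show: that the status is the 8th line of postmaster.pid and that the status words may be padded with trailing blanks. The helper pidfile_reports_up is hypothetical, and it leaves out the PID and start-time checks that the real loop also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define STATUS_LINE 8   /* assumed position of the status line in postmaster.pid */

/*
 * Hypothetical helper: return true if line STATUS_LINE of the given file
 * contents is "ready" or "standby" (trailing blanks are ignored).
 */
static bool
pidfile_reports_up(const char *contents)
{
    const char *line = contents;
    size_t      len;
    int         i;

    assert(contents != NULL);

    /* skip the first STATUS_LINE - 1 lines */
    for (i = 1; i < STATUS_LINE; i++)
    {
        line = strchr(line, '\n');
        if (line == NULL)
            return false;       /* file is too short: still starting */
        line++;
    }

    /* measure the line, then trim trailing spaces and carriage returns */
    len = strcspn(line, "\n");
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\r'))
        len--;

    return (len == 5 && strncmp(line, "ready", 5) == 0) ||
           (len == 7 && strncmp(line, "standby", 7) == 0);
}
```

A file whose status line is still "starting", or which has fewer lines than expected, makes the helper return false, which corresponds to another pass through the loop above.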

2. Server main process and signal handlers


PostmasterMain->
 pqsignal_no_restart(SIGUSR1, sigusr1_handler); /* message from child process */
 pqsignal_no_restart(SIGCHLD, reaper); /* handle child termination */
 ...
 StartupXLOG();
 ...
 proc_exit(0); // the startup process exits here, which delivers a SIGCHLD signal to the main process

reaper-> // SIGCHLD handler: a child process has terminated or stopped
 AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);

The postmaster process receives a signal:
sigusr1_handler->
 if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
  pmState == PM_STARTUP && Shutdown == NoShutdown){
  CheckpointerPID = StartCheckpointer();
  BgWriterPID = StartBackgroundWriter();
  if (XLogArchivingAlways())
   PgArchPID = pgarch_start();
  // hot_standby set to TRUE in postgresql.conf means that
  // connections are allowed during recovery
  if (!EnableHotStandby){
   // write standby to the postmaster.pid file: the server is up but connections are not allowed
   AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STANDBY);
  }
  pmState = PM_RECOVERY;
 }
 if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
  pmState == PM_RECOVERY && Shutdown == NoShutdown){
  PgStatPID = pgstat_start();
  // write ready to the postmaster.pid file: connections are allowed
  AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);
  pmState = PM_HOT_STANDBY;
 }
 ...
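
To make the two guarded branches of sigusr1_handler easier to follow, here is a toy model of the state transitions. It is a sketch only: the ToyPostmaster structure, the Toy-prefixed enums and toy_sigusr1 are hypothetical, the Shutdown == NoShutdown condition is left out, and no child processes are started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pmState, the signals, and the status line. */
typedef enum { TOY_PM_STARTUP, TOY_PM_RECOVERY, TOY_PM_HOT_STANDBY } ToyState;
typedef enum { TOY_SIG_RECOVERY_STARTED, TOY_SIG_BEGIN_HOT_STANDBY } ToySignal;
typedef enum { TOY_STATUS_NONE, TOY_STATUS_STANDBY, TOY_STATUS_READY } ToyStatus;

typedef struct
{
    ToyState  state;
    bool      hot_standby;  /* stands in for EnableHotStandby */
    ToyStatus status;       /* last status written to postmaster.pid */
} ToyPostmaster;

static void
toy_sigusr1(ToyPostmaster *pm, ToySignal sig)
{
    assert(pm != NULL);

    if (sig == TOY_SIG_RECOVERY_STARTED && pm->state == TOY_PM_STARTUP)
    {
        /* without hot standby, "standby" is written right away */
        if (!pm->hot_standby)
            pm->status = TOY_STATUS_STANDBY;
        pm->state = TOY_PM_RECOVERY;
    }
    else if (sig == TOY_SIG_BEGIN_HOT_STANDBY && pm->state == TOY_PM_RECOVERY)
    {
        /* consistent state reached: "ready" is written */
        pm->status = TOY_STATUS_READY;
        pm->state = TOY_PM_HOT_STANDBY;
    }
    /* any other combination is ignored, mirroring the guarded ifs above */
}
```

Feeding the model TOY_SIG_RECOVERY_STARTED with hot_standby enabled leaves the status at TOY_STATUS_NONE, which matches the point made earlier: pg_ctl keeps waiting until the second signal arrives.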

3. Startup process


StartupXLOG->
 ReadCheckpointRecord
 if (ArchiveRecoveryRequested && IsUnderPostmaster){
  // ArchiveRecoveryRequested is TRUE when a recovery.conf file exists
  PublishStartupProcessInformation();
  SetForwardFsyncRequests();
  // send the PMSIGNAL_RECOVERY_STARTED signal to the main process
  SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
  bgwriterLaunched = true;
 }
 CheckRecoveryConsistency();-->...
 |-- if (standbyState == STANDBY_SNAPSHOT_READY && !LocalHotStandbyActive &&
 |  reachedConsistency && IsUnderPostmaster){
 |  SpinLockAcquire(&XLogCtl->info_lck);
 |  XLogCtl->SharedHotStandbyActive = true;
 |  SpinLockRelease(&XLogCtl->info_lck);
 |  LocalHotStandbyActive = true;
 |  SendPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY);
 |-- }
 ...
 // after a record is replayed, CheckRecoveryConsistency is called each time before the next record is read
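
The timing of the hot-standby signal relative to replay can be modelled with a short loop. This is a simplified sketch, not the startup process's code: it assumes that consistency is decided by comparing the replayed position with a known consistent position, it ignores the standbyState condition shown above, and the function records_before_hot_standby and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: records[] holds the end position of each WAL record
 * in replay order, start is the position replay begins from, and
 * consistent_at is the position from which the data counts as consistent.
 * Returns the number of records replayed before the hot-standby signal
 * would be sent, or -1 if it is never sent.
 */
static int
records_before_hot_standby(const unsigned long *records, size_t n,
                           unsigned long start, unsigned long consistent_at)
{
    unsigned long replayed_to = start;
    size_t        i;

    assert(records != NULL || n == 0);

    for (i = 0;; i++)
    {
        /* the check runs before each read of the next record */
        if (replayed_to >= consistent_at)
            return (int) i;
        if (i == n)
            break;              /* nothing left to replay from this source */
        replayed_to = records[i];   /* "replay" record i */
    }
    return -1;
}
```

A return value of 0 corresponds to case 4 (consistent before any record is replayed), a positive value to cases 2 and 3, and -1 to a standby that runs out of local records before becoming consistent, in which case pg_ctl keeps waiting.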


