Fixed upstream as of commit e8bf0161
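For context, the qdevice heuristics setup used by the reproducer below can be expressed in corosync.conf roughly like this (a sketch only; the qnetd host name and the exact exec command are assumptions, not the test machines' actual configuration — see corosync-qdevice(8) for the full option list):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            # hypothetical qnetd host
            host: qnetd-host.example.com
            algorithm: lms
        }
        heuristics {
            # "sync" runs heuristics during membership changes,
            # delaying the quorum decision by the exec runtime
            mode: sync
            exec_slow: /usr/bin/sleep 10
        }
    }
}
```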
env: 2-node cluster with qdevice/qnetd in lms mode, with slow heuristics (e.g. `sleep 10`) configured

reproducer:
1. pcs cluster stop node2 --wait
2. on node2: systemctl start corosync && sleep 3 && systemctl start pacemaker
   (or just fence the node, if fencing is configured to reboot and cluster services to start on boot)

before fix (pacemaker-2.1.2-2.el9)
==================================

> [root@virt-273 ~]# journalctl -u corosync -u corosync-qdevice -u pacemaker -f
> Feb 01 14:18:29 virt-273 systemd[1]: Starting Corosync Cluster Engine...
> Feb 01 14:18:29 virt-273 corosync[3567]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
> Feb 01 14:18:29 virt-273 corosync[3567]: [MAIN ] Corosync built-in features: dbus systemd xmlconf vqsim nozzle snmp pie relro bindnow
> Feb 01 14:18:29 virt-273 corosync[3567]: [TOTEM ] Initializing transport (Kronosnet).
> Feb 01 14:18:29 virt-273 corosync[3567]: [TOTEM ] totemknet initialized
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib64/kronosnet/crypto_nss.so
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync configuration map access [0]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QB ] server name: cmap
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync configuration service [1]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QB ] server name: cfg
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QB ] server name: cpg
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync profile loading service [4]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QUORUM] Using quorum provider corosync_votequorum
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QB ] server name: votequorum
> Feb 01 14:18:29 virt-273 corosync[3567]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Feb 01 14:18:29 virt-273 corosync[3567]: [QB ] server name: quorum
> Feb 01 14:18:29 virt-273 corosync[3567]: [TOTEM ] Configuring link 0
> Feb 01 14:18:29 virt-273 corosync[3567]: [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:111, port=5405
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 has no active links
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 has no active links
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 1 has no active links
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
> Feb 01 14:18:29 virt-273 corosync[3567]: [KNET ] host: host: 2 has no active links
> Feb 01 14:18:29 virt-273 corosync[3567]: [QUORUM] Sync members[1]: 2
> Feb 01 14:18:29 virt-273 corosync[3567]: [QUORUM] Sync joined[1]: 2
> Feb 01 14:18:29 virt-273 corosync[3567]: [TOTEM ] A new membership (2.6f) was formed. Members joined: 2
> Feb 01 14:18:29 virt-273 corosync[3567]: [QUORUM] Members[1]: 2
> Feb 01 14:18:29 virt-273 corosync[3567]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 01 14:18:29 virt-273 systemd[1]: Started Corosync Cluster Engine.
> Feb 01 14:18:31 virt-273 corosync[3567]: [KNET ] rx: host: 1 link: 0 is up
> Feb 01 14:18:31 virt-273 corosync[3567]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:18:31 virt-273 corosync[3567]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 1157 to 1365
> Feb 01 14:18:31 virt-273 corosync[3567]: [KNET ] pmtud: Global data MTU changed to: 1365
> Feb 01 14:18:31 virt-273 corosync[3567]: [QUORUM] Sync members[2]: 1 2
> Feb 01 14:18:31 virt-273 corosync[3567]: [QUORUM] Sync joined[1]: 1
> Feb 01 14:18:31 virt-273 corosync[3567]: [TOTEM ] A new membership (1.73) was formed. Members joined: 1
> Feb 01 14:18:32 virt-273 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:32 virt-273 pacemakerd[3577]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:18:32 virt-273 pacemakerd[3577]: notice: Starting Pacemaker 2.1.2-2.el9
> Feb 01 14:18:32 virt-273 pacemakerd[3577]: crit: Could not enable Corosync CFG shutdown tracker: CS_ERR_TRY_AGAIN
> Feb 01 14:18:32 virt-273 systemd[1]: pacemaker.service: Main process exited, code=exited, status=76/PROTOCOL
> Feb 01 14:18:32 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:32 virt-273 systemd[1]: pacemaker.service: Scheduled restart job, restart counter is at 1.
> Feb 01 14:18:32 virt-273 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:32 virt-273 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:32 virt-273 pacemakerd[3579]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:18:32 virt-273 pacemakerd[3579]: notice: Starting Pacemaker 2.1.2-2.el9
> Feb 01 14:18:32 virt-273 pacemakerd[3579]: crit: Could not enable Corosync CFG shutdown tracker: CS_ERR_TRY_AGAIN
> Feb 01 14:18:32 virt-273 systemd[1]: pacemaker.service: Main process exited, code=exited, status=76/PROTOCOL
> Feb 01 14:18:32 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Scheduled restart job, restart counter is at 2.
> Feb 01 14:18:33 virt-273 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 pacemakerd[3580]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:18:33 virt-273 pacemakerd[3580]: notice: Starting Pacemaker 2.1.2-2.el9
> Feb 01 14:18:33 virt-273 pacemakerd[3580]: crit: Could not enable Corosync CFG shutdown tracker: CS_ERR_TRY_AGAIN
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Main process exited, code=exited, status=76/PROTOCOL
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Scheduled restart job, restart counter is at 3.
> Feb 01 14:18:33 virt-273 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 pacemakerd[3581]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:18:33 virt-273 pacemakerd[3581]: notice: Starting Pacemaker 2.1.2-2.el9
> Feb 01 14:18:33 virt-273 pacemakerd[3581]: crit: Could not enable Corosync CFG shutdown tracker: CS_ERR_TRY_AGAIN
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Main process exited, code=exited, status=76/PROTOCOL
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Scheduled restart job, restart counter is at 4.
> Feb 01 14:18:33 virt-273 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 pacemakerd[3582]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:18:33 virt-273 pacemakerd[3582]: notice: Starting Pacemaker 2.1.2-2.el9
> Feb 01 14:18:33 virt-273 pacemakerd[3582]: crit: Could not enable Corosync CFG shutdown tracker: CS_ERR_TRY_AGAIN
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Main process exited, code=exited, status=76/PROTOCOL
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Scheduled restart job, restart counter is at 5.
> Feb 01 14:18:33 virt-273 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Start request repeated too quickly.
> Feb 01 14:18:33 virt-273 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Feb 01 14:18:33 virt-273 systemd[1]: Failed to start Pacemaker High Availability Cluster Manager.
> Feb 01 14:18:41 virt-273 corosync[3567]: [QUORUM] This node is within the primary component and will provide service.
> Feb 01 14:18:41 virt-273 corosync[3567]: [QUORUM] Members[2]: 1 2
> Feb 01 14:18:41 virt-273 corosync[3567]: [MAIN ] Completed service synchronization, ready to provide service.

Result: pacemaker.service repeatedly fails with CS_ERR_TRY_AGAIN immediately after each start attempt while corosync is still in the process of establishing quorum (due to the slow heuristics script), until systemd hits its start rate limit and gives up.

after fix (pacemaker-2.1.2-3.el9)
=================================

> [root@virt-007 ~]# journalctl -u corosync -u corosync-qdevice -u pacemaker -f
> Feb 01 14:34:42 virt-007 systemd[1]: Starting Corosync Cluster Engine...
> Feb 01 14:34:42 virt-007 corosync[129907]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
> Feb 01 14:34:42 virt-007 corosync[129907]: [MAIN ] Corosync built-in features: dbus systemd xmlconf vqsim nozzle snmp pie relro bindnow
> Feb 01 14:34:42 virt-007 corosync[129907]: [TOTEM ] Initializing transport (Kronosnet).
> Feb 01 14:34:43 virt-007 corosync[129907]: [TOTEM ] totemknet initialized
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib64/kronosnet/crypto_nss.so
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync configuration map access [0]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QB ] server name: cmap
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync configuration service [1]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QB ] server name: cfg
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QB ] server name: cpg
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync profile loading service [4]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QUORUM] Using quorum provider corosync_votequorum
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QB ] server name: votequorum
> Feb 01 14:34:43 virt-007 corosync[129907]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Feb 01 14:34:43 virt-007 corosync[129907]: [QB ] server name: quorum
> Feb 01 14:34:43 virt-007 corosync[129907]: [TOTEM ] Configuring link 0
> Feb 01 14:34:43 virt-007 corosync[129907]: [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:7, port=5405
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 has no active links
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 has no active links
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:34:43 virt-007 corosync[129907]: [KNET ] host: host: 1 has no active links
> Feb 01 14:34:43 virt-007 corosync[129907]: [QUORUM] Sync members[1]: 2
> Feb 01 14:34:43 virt-007 corosync[129907]: [QUORUM] Sync joined[1]: 2
> Feb 01 14:34:43 virt-007 corosync[129907]: [TOTEM ] A new membership (2.1b) was formed. Members joined: 2
> Feb 01 14:34:43 virt-007 corosync[129907]: [QUORUM] Members[1]: 2
> Feb 01 14:34:43 virt-007 corosync[129907]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 01 14:34:43 virt-007 systemd[1]: Started Corosync Cluster Engine.
> Feb 01 14:34:44 virt-007 corosync[129907]: [KNET ] rx: host: 1 link: 0 is up
> Feb 01 14:34:44 virt-007 corosync[129907]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
> Feb 01 14:34:45 virt-007 corosync[129907]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 1157 to 1365
> Feb 01 14:34:45 virt-007 corosync[129907]: [KNET ] pmtud: Global data MTU changed to: 1365
> Feb 01 14:34:45 virt-007 corosync[129907]: [QUORUM] Sync members[2]: 1 2
> Feb 01 14:34:45 virt-007 corosync[129907]: [QUORUM] Sync joined[1]: 1
> Feb 01 14:34:45 virt-007 corosync[129907]: [TOTEM ] A new membership (1.1f) was formed. Members joined: 1
> Feb 01 14:34:46 virt-007 systemd[1]: Started Pacemaker High Availability Cluster Manager.
> Feb 01 14:34:46 virt-007 pacemakerd[129921]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log
> Feb 01 14:34:46 virt-007 pacemakerd[129921]: notice: Starting Pacemaker 2.1.2-3.el9
> Feb 01 14:34:56 virt-007 corosync[129907]: [QUORUM] This node is within the primary component and will provide service.
> Feb 01 14:34:56 virt-007 corosync[129907]: [QUORUM] Members[2]: 1 2
> Feb 01 14:34:56 virt-007 corosync[129907]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 01 14:35:01 virt-007 pacemakerd[129921]: notice: Pacemaker daemon successfully started and accepting connections

Result: pacemaker.service does not exit with a fatal error while corosync is establishing quorum, but patiently retries.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: pacemaker), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2293