Bug 1466875
| Summary: | [pacemaker/libqb integration] "crm_mon -d" cannot outlive particular pacemaker session, reconnecting to new one | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Jan Pokorný [poki] <jpokorny> |
| Component: | pacemaker | Assignee: | Chris Lumens <clumens> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 8.0 | CC: | cluster-maint, kgaillot, mnovacek, msmazova |
| Target Milestone: | pre-dev-freeze | Keywords: | Reopened, Triaged |
| Target Release: | 8.4 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.0.5-6.el8 | Doc Type: | No Doc Update |
| Doc Text: | This affects few enough users that documentation is not needed, especially since there is no pcs interface. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-18 15:26:41 UTC | Type: | Enhancement |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Pokorný [poki] 2017-06-30 15:39:11 UTC
Furthermore:

```
# lsof -p $(pidof crm_mon)
COMMAND   PID  USER  FD   TYPE  DEVICE             SIZE/OFF  NODE    NAME
[...]
crm_mon  30758 root  DEL  REG   0,17                         599553  /dev/shm/qb-cib_ro-control-30389-30758-20
[...]
crm_mon  30758 root  5u   unix  0xffff88008ae0bc00 0t0       600577  @qb-cib_ro-30389-30758-20-response
crm_mon  30758 root  6u   unix  0xffff88008ae0e000 0t0       600578  @qb-cib_ro-30389-30758-20-event
# kill -0 30389
-bash: kill: (30389) - No such process
```

It looks like the UNIX socket should be long gone at this point, yet it is not.

Observations:
- it actually worked correctly for me several times, but most of the time it does not
- in the failing cases, mon_cib_connection_destroy never gets called
- consequently, client->destroy_fn is never invoked from the mainloop's perspective
- consequently, mainloop_gio_destroy is never called

We need to find a link between a graceful cib daemon shutdown and an attempt to run the connection teardown at least partially through the call stack above. Hopefully there is one.

Due to time constraints, this will not make 7.5.

A reproducer will be added once the solution is settled.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

This is still a priority and is being tracked by an upstream bz. This bz will be reopened once developer time becomes available to address it.

Fix has been merged upstream as of commit 8c51b49.

QA: Reproducer:
1. Start a cluster.
2. Run on any node: `crm_mon --output-to=$ANY_FILE --daemonize`
3. That should create $ANY_FILE and update it with the cluster status every 5 seconds. (You can cause various changes to see a different status.)
4. Restart pacemaker on the same node. Before the fix, crm_mon would no longer update the file with new events; after the fix, it will.
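The reproducer above amounts to checking whether the `--output-to` file keeps being rewritten across a pacemaker restart. A minimal sketch of such a freshness check (the `check_fresh` helper and the threshold value are illustrative, not part of pacemaker):

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pacemaker): report whether the file
# maintained by `crm_mon --daemonize --output-to=FILE` is still refreshed.
# crm_mon rewrites the file on each update (every 5 seconds by default), so
# an mtime much older than that interval suggests the daemonized crm_mon has
# lost its CIB connection -- the symptom of this bug -- or has exited.
check_fresh() {
    file=$1
    max_age=$2
    now=$(date +%s)
    # GNU stat; fails if the file does not exist
    mtime=$(stat -c %Y "$file" 2>/dev/null) || { echo "missing"; return 2; }
    if [ $((now - mtime)) -le "$max_age" ]; then
        echo "fresh"
    else
        echo "stale"
    fi
}
```

After restarting pacemaker, `check_fresh "$ANY_FILE" 30` printing `stale` would indicate the pre-fix behavior; `fresh` indicates crm_mon reconnected and is still updating the file.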
before fix
----------

```
[root@virt-153 ~]# rpm -q pacemaker
pacemaker-2.0.4-6.el8.x86_64
[root@virt-153 ~]# pcs status
Cluster name: STSRHTS19672
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-154 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Mon Feb 15 21:32:00 2021
  * Last change: Mon Feb 15 18:42:42 2021 by root via cibadmin on virt-153
  * 2 nodes configured
  * 2 resource instances configured
Node List:
  * Online: [ virt-153 virt-154 ]
Full List of Resources:
  * fence-virt-153 (stonith:fence_xvm): Started virt-153
  * fence-virt-154 (stonith:fence_xvm): Started virt-154
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

run `crm_mon --daemonize`:

```
[root@virt-153 ~]# crm_mon --output-to=out_before.html --daemonize --output-as=html
```

create resource:

```
[root@virt-153 ~]# pcs resource create dummy ocf:pacemaker:Dummy
```

backup output file:

```
[root@virt-153 ~]# cp out_before.html out_before-old.html
```

stop cluster:

```
[root@virt-153 ~]# pcs cluster stop --all
virt-153: Stopping Cluster (pacemaker)...
virt-154: Stopping Cluster (pacemaker)...
virt-154: Stopping Cluster (corosync)...
virt-153: Stopping Cluster (corosync)...
```

start cluster again:

```
[root@virt-153 ~]# pcs cluster start --all --wait
virt-153: Starting Cluster...
virt-154: Starting Cluster...
Waiting for node(s) to start...
virt-154: Started
virt-153: Started
```

create new resource:

```
[root@virt-153 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
```

backup output file again:

```
[root@virt-153 ~]# cp out_before.html out_before-new.html
```

see diff:

```
[root@virt-153 ~]# diff out_before.html out_before-new.html
26c26
< <span class="bold">Current DC: </span><span>virt-154 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum</span>
---
> <span class="bold">Current DC: </span><span>virt-153 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum</span>
29c29
< <span class="bold">Last updated: </span><span>Mon Feb 15 21:32:01 2021</span>
---
> <span class="bold">Last updated: </span><span>Mon Feb 15 21:32:41 2021</span>
32c32
< <span class="bold">Last change: </span><span>Mon Feb 15 18:42:42 2021 by root via cibadmin on virt-153</span>
---
> <span class="bold">Last change: </span><span>Mon Feb 15 21:32:41 2021 by root via cibadmin on virt-153</span>
35c35
< <li><span>2 resource instances configured</span></li>
---
> <li><span>4 resource instances configured</span></li>
49a50
> <li><span class="rsc-ok">dummy (ocf::pacemaker:Dummy): Started virt-153</span></li>
```

after fix
---------

```
[root@virt-175 ~]# rpm -q pacemaker
pacemaker-2.0.5-6.el8.x86_64
[root@virt-175 ~]# pcs status
Cluster name: STSRHTS6310
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-175 (version 2.0.5-6.el8-ba59be7122) - partition with quorum
  * Last updated: Mon Feb 15 17:42:19 2021
  * Last change: Mon Feb 15 17:38:19 2021 by root via cibadmin on virt-175
  * 2 nodes configured
  * 2 resource instances configured
Node List:
  * Online: [ virt-175 virt-176 ]
Full List of Resources:
  * fence-virt-175 (stonith:fence_xvm): Started virt-175
  * fence-virt-176 (stonith:fence_xvm): Started virt-176
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

run `crm_mon --daemonize`:

```
[root@virt-175 ~]# crm_mon --output-to=out_after.html --daemonize --output-as=html
```

create resource:

```
[root@virt-175 ~]# pcs resource create dummy ocf:pacemaker:Dummy
```

backup output file:

```
[root@virt-175 ~]# cp out_after.html out_after-old.html
```

stop cluster:

```
[root@virt-175 ~]# pcs cluster stop --all
virt-176: Stopping Cluster (pacemaker)...
virt-175: Stopping Cluster (pacemaker)...
virt-176: Stopping Cluster (corosync)...
virt-175: Stopping Cluster (corosync)...
```

start cluster again:

```
[root@virt-175 ~]# pcs cluster start --all --wait
virt-176: Starting Cluster...
virt-175: Starting Cluster...
Waiting for node(s) to start...
virt-176: Started
virt-175: Started
```

create new resource:

```
[root@virt-175 ~]# pcs resource create dummy1 ocf:pacemaker:Dummy
```

backup output file:

```
[root@virt-175 ~]# cp out_after.html out_after-new.html
```

see diff:

```
[root@virt-175 ~]# diff out_after-old.html out_after-new.html
29c29
< <span class="bold">Last updated: </span><span>Mon Feb 15 17:42:23 2021</span>
---
> <span class="bold">Last updated: </span><span>Mon Feb 15 17:43:44 2021</span>
32c32
< <span class="bold">Last change: </span><span>Mon Feb 15 17:42:20 2021 by root via cibadmin on virt-175</span>
---
> <span class="bold">Last change: </span><span>Mon Feb 15 17:43:41 2021 by root via cibadmin on virt-175</span>
35c35
< <li><span>3 resource instances configured</span></li>
---
> <li><span>4 resource instances configured</span></li>
50a51
> <li><span class="rsc-ok">dummy1 (ocf::pacemaker:Dummy): Started virt-176</span></li>
```

I tried several times to reproduce the original issue, but was not successful. Verified as SanityOnly in pacemaker-2.0.5-6.el8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1782