Description of problem:
If systemd fails to start sbd for some reason, it still starts pacemaker. As a result pacemaker is not observed by sbd, the watchdog is possibly not engaged, and the poison pill is not read from disk.

Version-Release number of selected component (if applicable):
sbd-1.3.1-2.el7

How reproducible:

Steps to Reproduce:
1. Start with an sbd config that uses a shared disk.
2. Make sbd wait for msgwait on startup.
   /etc/sysconfig/sbd:
   SBD_DELAY_START=yes
3. Choose a short (5s) startup timeout for the sbd service.
   /etc/systemd/system/sbd.service.d/sbd.conf:
   [Service]
   TimeoutStartSec=5
4. Configure msgwait=10s.
   [root@node2 ~]# sbd -d /dev/vdb dump
   ==Dumping header on disk /dev/vdb
   Header version : 2.1
   UUID : ad2116f8-fa24-431b-a2e0-fa3f373e049e
   Number of slots : 255
   Sector size : 512
   Timeout (watchdog) : 45
   Timeout (allocate) : 2
   Timeout (loop) : 1
   Timeout (msgwait) : 10
   ==Header on disk /dev/vdb is dumped
5. Check that pacemaker, corosync and sbd are down.
6. Start pacemaker (systemctl start pacemaker).

Actual results:
sbd times out while being started by systemd (expected).
pacemaker still comes up (not expected).

Expected results:
pacemaker shouldn't come up if sbd doesn't.

Additional info:
Having /etc/systemd/system/pacemaker.service.d/sbd.conf:
   [Unit]
   Requires=sbd.service
solves the issue. The sbd service being 'PartOf' corosync is not enough to make systemd wait for, or check, the start of sbd before starting pacemaker.

Installing the sbd package shouldn't automatically enable sbd or require it to be enabled, so simply adding this file as is to the sbd package is no solution. Adding templates for both files
   /etc/systemd/system/sbd.service.d/sbd.conf
   /etc/systemd/system/pacemaker.service.d/sbd.conf
with the actual content commented out might be useful, though.

Another, possibly cleaner, solution would be to introduce an additional target, containing corosync and optionally sbd, that is required by pacemaker.
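The workaround drop-in from the additional info can be written with a small helper. This is only a sketch: the function name install_pacemaker_sbd_dropin and its ROOT argument (a staging prefix, empty for the live system) are hypothetical and not part of sbd or pacemaker. The path and the file content are the ones given in the report.

```shell
#!/bin/sh
# Sketch: write the pacemaker drop-in described in the report.
# ROOT is a hypothetical staging prefix; pass "" to write below /etc directly.
install_pacemaker_sbd_dropin() {
    root="${1:-}"
    dir="$root/etc/systemd/system/pacemaker.service.d"
    mkdir -p "$dir" || return 1
    # Requires= makes systemd start sbd.service together with pacemaker.service
    # and fail the pacemaker start job if the sbd start job fails.
    printf '[Unit]\nRequires=sbd.service\n' > "$dir/sbd.conf"
}
```

When this is used on a live system, systemd only picks up the new drop-in after `systemctl daemon-reload`.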
What remains open is how to distribute the target, and the dependencies on it, across the 3 packages corosync, sbd and pacemaker.

The cleanest approach would be to modify all 3, adding the target to corosync:
- Make corosync have an [Install] section that adds it to the new target.
- Make sbd have an [Install] section that adds it to the new target.
- Make pacemaker depend on the new target.
All of this would go directly into the 3 unit files.

It should, however, be possible to achieve the modifications for corosync and pacemaker by having sbd deliver the appropriate snippets. Then sbd would be the only package to be modified, with the drawback of being a little more confusing.
Even simpler fix:

sbd.service:
...
[Install]
...
RequiredBy=pacemaker.service
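With RequiredBy=pacemaker.service in the [Install] section, `systemctl enable sbd` creates a link to the sbd unit in a pacemaker.service.requires/ directory, and that link is what gives pacemaker the dependency. A quick check for the link could look like the sketch below. The function name and the ROOT prefix argument are hypothetical, and the sketch assumes the link lives under /etc/systemd/system, where `systemctl enable` places it by default.

```shell
#!/bin/sh
# Sketch: report whether sbd is wired in as a requirement of pacemaker.
# ROOT is a hypothetical prefix for inspecting a staged tree; "" means the live system.
sbd_required_by_pacemaker() {
    root="${1:-}"
    # -L only tests that the symlink itself exists, not that its target does.
    [ -L "$root/etc/systemd/system/pacemaker.service.requires/sbd.service" ]
}
```

On a node, `sbd_required_by_pacemaker "" && echo linked` would then show whether an installation that was enabled earlier already carries the new dependency.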
(In reply to Klaus Wenninger from comment #2)
> Even simpler fix:
>
> sbd.service:
> ...
> [Install]
> ...
> RequiredBy=pacemaker.service

Sounds good :)

However the target idea is interesting on its own ... perhaps a 'cluster-layer.target' that includes any components below pacemaker in the cluster stack. That might simplify supporting multiple cluster-layer implementations again in the future.
True - we should think the target idea over. But for now I would suggest going with the solution proposed in https://github.com/ClusterLabs/sbd/pull/39, as it solves the issue with a change in just a single package. A solution touching multiple packages would also introduce strict version dependencies between them in order to work properly.
With https://github.com/ClusterLabs/sbd/pull/42 such a conditional reenabling would be added. I was kind of surprised that this isn't being taken care of somehow automatically or that systemd-rpm-scriptlet-macros under /lib/rpm/macros.d/macros.systemd wouldn't at least provide a macro for that purpose.
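A conditional reenable of this kind can be sketched as follows. This shows only the general shape and is not the content of the pull request; the helper name reenable_if_enabled is hypothetical. It relies on nothing beyond `systemctl is-enabled` and `systemctl reenable`.

```shell
#!/bin/sh
# Sketch: refresh the enablement symlinks of a unit, but only if it is already enabled.
reenable_if_enabled() {
    unit="$1"
    # is-enabled exits non-zero for a disabled unit, so installations that
    # never enabled the unit are left untouched.
    if systemctl is-enabled --quiet "$unit"; then
        # reenable is disable followed by enable, which recreates the links
        # from the current [Install] section.
        systemctl reenable "$unit"
    fi
}
```

Called as `reenable_if_enabled sbd.service` after an update, this would pick up a newly added RequiredBy= line without enabling sbd on systems where it was disabled.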
(In reply to Klaus Wenninger from comment #10)

See the controversial discussion on the pull request. Personally I'm not convinced that an automated reenable is that good an idea. The missing update of the dependency links is actually a general issue. I don't even think there is a need to document it, except maybe in a release note that lists the bug fixes. The man page of systemctl is already quite explicit on that topic:

   reenable NAME...
       Reenable one or more unit files, as specified on the command line. This is a combination of disable and enable and is useful to reset the symlinks a unit is enabled with to the defaults configured in the "[Install]" section of the unit file.
Steven: Yes, changed a lot but I guess it still says what I meant. But it leaves the impression that sbd is always needed by pacemaker. Guess if we alter the first sentence a little bit that should clarify the issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0924