Bug 1902208
| Summary: | LVM-activate: Node is fenced during reboot when a cluster-managed VG uses an iSCSI-attached PV | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.3 | CC: | agk, cfeist, cluster-maint, cluster-qe, fdinitto, kgaillot, milind.kulkarni, mjuricek, oalbrigt, obenes, phagara |
| Target Milestone: | rc | Keywords: | EasyFix |
| Target Release: | 8.4 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | resource-agents-4.1.1-79.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1901688 | Environment: | |
| Last Closed: | 2021-05-18 15:12:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Reid Wahl, 2020-11-27 11:07:58 UTC)
Oyvind suggested installing a systemd drop-in "After=blk-availability.service" directive for resource-agents-deps.target during the LVM-activate start operation, as the legacy LVM agent did:
- https://github.com/ClusterLabs/resource-agents/blob/8f7e35455453e8cb355fccf895d7e07b7c64eb30/heartbeat/LVM#L232-L234

This sounded like a good idea, considering the definition of blk-availability, and I think it does put us on the right track. The unit is ordered after iscsi-shutdown.service as well as some other storage presentation services, so it should be versatile enough to prevent issues like the one reported in this bug.

~~~
[Unit]
Description=Availability of block devices
Before=shutdown.target
After=lvm2-activation.service iscsi-shutdown.service iscsi.service iscsid.service fcoe.service rbdmap.service
DefaultDependencies=no
Conflicts=shutdown.target

[Service]
Type=oneshot
ExecStart=/usr/bin/true
ExecStop=/usr/sbin/blkdeactivate -u -l wholevg -m disablequeueing -r wait
RemainAfterExit=yes
~~~

Surprisingly, it didn't work. This is because blk-availability.service is not enabled by default and thus does not start automatically, so it never gets stopped during shutdown.

~~~
[root@fastvm-rhel-8-0-24 ~]# systemctl status blk-availability.service
● blk-availability.service - Availability of block devices
   Loaded: loaded (/usr/lib/systemd/system/blk-availability.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
~~~

I checked my RHEL 7 system and found that blk-availability.service was active there despite being disabled.

~~~
[root@fastvm-rhel-7-6-22 ~]# systemctl status blk-availability.service
● blk-availability.service - Availability of block devices
   Loaded: loaded (/usr/lib/systemd/system/blk-availability.service; disabled; vendor preset: disabled)
   Active: active (exited) since Fri 2020-11-27 16:40:31 PST; 16s ago
...
~~~

As it turns out, this is because multipathd.service is enabled on my RHEL 7 system:

~~~
[root@fastvm-rhel-7-6-22 ~]# systemctl show blk-availability | egrep '(Wanted|Required)By='
WantedBy=multipathd.service
[root@fastvm-rhel-7-6-22 ~]# systemctl is-enabled multipathd
enabled
~~~

So I suspect that the systemd_drop_in() approach for blk-availability works on RHEL 7 **only if** blk-availability.service already gets started as a dependency of another service, like multipathd.service. In other words, **this is probably a bug in the legacy LVM resource agent**, but probably not one that's worth fixing at this point.

Note that adding a "Wants=blk-availability.service" directive as shown below also doesn't work.

~~~
if systemd_is_running; then
    systemd_drop_in "99-LVM-activate-after" "After" \
        "blk-availability.service"
    systemd_drop_in "99-LVM-activate-wants" "Wants" \
        "blk-availability.service"
fi
~~~

This is because the directive is only added **after** pacemaker.service has started and the LVM-activate resource starts.

I think we have two pretty straightforward options here (sketches follow this comment):

(1) Configure an "After=/Wants=" dependency on blk-availability.service **before** pacemaker.service gets started (and thus before the LVM-activate resource agent runs). We could ship an /etc/systemd/system/resource-agents-deps.target.d/99-LVM-activate.conf that includes these directives.

(2) Run `systemctl start blk-availability.service` from within the resource agent at the time when we install the dependencies.

@Oyvind: If this sounds good to you, let me know which of those approaches sounds better (I'm guessing #2, to keep it within the agent). One of us can submit the PR next week.
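For illustration, a minimal sketch of option (1), assuming pacemaker.service is ordered after (and wants) resource-agents-deps.target, as in the shipped unit; the drop-in path and contents are illustrative, not necessarily the final fix:

~~~
# /etc/systemd/system/resource-agents-deps.target.d/99-LVM-activate.conf
# Illustrative drop-in (option 1): order resource-agents-deps.target, and
# therefore pacemaker.service above it, after blk-availability.service,
# and pull blk-availability.service in so it actually starts at boot.
# At shutdown the ordering reverses: Pacemaker stops its resources first,
# and only then do blk-availability's ExecStop (blkdeactivate) and the
# iSCSI teardown run.
[Unit]
After=blk-availability.service
Wants=blk-availability.service
~~~

And a similarly hedged sketch of option (2); systemd_is_running, systemd_drop_in, and ocf_log are existing resource-agents shell helpers, but the exact drop-in name and the placement inside the agent's start path are assumptions:

~~~
# Hypothetical snippet in the LVM-activate start path (option 2): install
# the ordering drop-in as before, then explicitly start
# blk-availability.service so that its ExecStop= (blkdeactivate) is
# guaranteed to run at shutdown even when nothing else pulls it in.
if systemd_is_running; then
    systemd_drop_in "99-LVM-activate" "After" "blk-availability.service"
    systemctl start blk-availability.service >/dev/null 2>&1 || \
        ocf_log warn "Failed to start blk-availability.service"
fi
~~~

Option (2) keeps the behavior self-contained in the agent, which avoids shipping and maintaining a separate drop-in file in the package.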
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (resource-agents bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1736

*** Bug 1972035 has been marked as a duplicate of this bug. ***