Bug 1872378 - [RFE] Provide a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources
Summary: [RFE] Provide a way to add a scsi fencing device to a cluster without requiri...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.5
Assignee: Miroslav Lisik
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On: 1872376
Blocks: 1894575 2023845 2024522 2035332
Reported: 2020-08-25 15:40 UTC by Chris Feist
Modified: 2021-12-23 16:10 UTC (History)

Fixed In Version: pcs-0.10.8-4.el8
Doc Type: Enhancement
Doc Text:
.New `pcs` command to update SCSI fencing device without causing restart of all other resources

Updating a SCSI fencing device with the `pcs stonith update` command causes a restart of all resources running on the same node where the stonith resource was running. The new `pcs stonith update-scsi-devices` command allows you to update SCSI devices without causing a restart of other cluster resources.
Clone Of: 1872376
Clones: 2023845 2035332
Environment:
Last Closed: 2021-11-09 17:33:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
proposed fix + tests (114.38 KB, patch)
2021-07-08 14:18 UTC, Miroslav Lisik
additional fixes + tests (63.73 KB, patch)
2021-07-21 11:20 UTC, Miroslav Lisik


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  4526971         0        None      None    None     2021-01-04 18:31:50 UTC
Red Hat Product Errata             RHSA-2021:4142  0        None      None    None     2021-11-09 17:34:43 UTC

Internal Links: 1991654 1992668

Description Chris Feist 2020-08-25 15:40:49 UTC
+++ This bug was initially created as a clone of Bug #1872376 +++

Provide a way to add a scsi fencing device to a cluster without requiring a restart of all cluster resources

---

We are still in the early stages of planning this, but pcs would need to add keys to a scsi device (or call the resource agent to add the new keys) and then call pacemaker with a special command to update the stonith device without unfencing.

We would need to add a special pcs command to add a scsi device to the scsi fencing stonith agent.

Comment 1 Ken Gaillot 2020-08-25 21:15:23 UTC
See Bug 1872376 for how I'm thinking this might work.

pcs would unfence each node by directly executing the agent, and make a note of the current timestamp. Then it would call a new crm_resource option to generate a hash of the new resource parameters. Then it could make the parameter changes in the CIB, simultaneously updating the op-digest (and potentially op-secure-digest) attributes of the lrm_rsc_op for the fence device's start operation on the node where it's currently active, as well as the #node-unfenced, #digests-all, and #digests-secure node attributes for any node that has them, using the saved timestamp and generated hash.

It's ugly but I can't think of anything better.

Comment 6 Ken Gaillot 2021-01-12 18:37:53 UTC
FYI, the required Pacemaker support has been added upstream and should land in 8.4. Bug 1872376 has an example and the supported feature set.

Comment 11 Miroslav Lisik 2021-07-08 14:18:48 UTC
Created attachment 1799690
proposed fix + tests

Added command:
pcs stonith update-scsi-devices

Test:

# pcs stonith update-scsi-devices scsi-fencing set <device-path1> <device-path2>

The SCSI devices should be unfenced and updated, and no resources should be restarted.

Comment 12 Miroslav Lisik 2021-07-09 07:21:28 UTC
Test:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64

export disk1=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b
export disk2=/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe
export disk3=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4
export disk4=/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
  PR generation=0x23, there are NO registered reservation keys
  PR generation=0xf, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# pcs stonith create scsi-fencing fence_scsi devices="$disk1" pcmk_host_check="static-list" pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action="off" meta provides="unfencing"
[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
  PR generation=0x25, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0xf, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys
  PR generation=0x0, there are NO registered reservation keys
[root@r8-node-01 ~]# for i in $(seq -w 01 04); do pcs resource create d-$i ocf:pacemaker:Dummy; done
[root@r8-node-01 ~]# pcs stonith
  * scsi-fencing        (stonith:fence_scsi):    Started r8-node-01
[root@r8-node-01 ~]# pcs resource
  * d-01        (ocf::pacemaker:Dummy):  Started r8-node-02
  * d-02        (ocf::pacemaker:Dummy):  Started r8-node-01
  * d-03        (ocf::pacemaker:Dummy):  Started r8-node-02
  * d-04        (ocf::pacemaker:Dummy):  Started r8-node-01
[root@r8-node-01 ~]# pcs stonith config
 Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)


### Updating scsi devices (adding 3 new devices):

[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk1 $disk2 $disk3 $disk4

Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced: keys from each node are registered on the devices.

[root@r8-node-01 ~]# pcs stonith config
 Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)


[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul  8 15:22:19 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul  8 15:22:20 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-163.bz2
Jul  8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: Transition 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-163.bz2): Complete
Jul  8 15:22:20 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jul  8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul  8 15:22:39 r8-node-01 kernel: sda: sda1
Jul  8 15:22:39 r8-node-01 kernel: sdd: sdd1
Jul  8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul  8 15:22:40 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul  8 15:22:40 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-164.bz2
Jul  8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Complete
Jul  8 15:22:40 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul  8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul  8 15:22:39 r8-node-02 kernel: sd 6:0:0:0: Parameters changed
Jul  8 15:22:39 r8-node-02 kernel: sda: sda1
Jul  8 15:22:39 r8-node-02 kernel: sdd: sdd1
Jul  8 15:22:40 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)
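
A rough way to confirm from syslog that an update triggered no resource actions is to count the transition summaries whose counters are not all zero. The sketch below is a hypothetical helper (the name `transitions_with_actions` is not part of pcs or Pacemaker). It assumes the `pacemaker-controld` message format shown in the excerpts above, and it assumes that an all-zero summary means the scheduler had nothing to execute.

```shell
# transitions_with_actions: read syslog lines on stdin and print how many
# "Transition N (...)" summaries report at least one non-zero counter.
# Assumption: the summary format matches the excerpts in this report.
transitions_with_actions() {
    grep -E 'pacemaker-controld.*Transition [0-9]+ \(' \
        | grep -c -v -E 'Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0' \
        || true   # grep -c exits non-zero when the count is 0; the count is still printed
}
```

Fed the two log excerpts above, the helper should print 0, because both transition summaries (8 and 9) report only zero counters.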

[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
  PR generation=0x26, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x11, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x2, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x2, 2 registered reservation keys follow:
    0x14080000
    0x14080001
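
To compare key registrations across many devices without reading each block by eye, the `sg_persist` output can be reduced to a key count per device. This is a minimal sketch, assuming the `sg_persist -n -i -k` output format shown above, where each registered key appears on its own line as a hex value; `count_keys` is a hypothetical helper name.

```shell
# count_keys: read `sg_persist -n -i -k -d <device>` output on stdin and print
# the number of registered reservation keys (lines holding only a hex value).
count_keys() {
    grep -c -E '^[[:space:]]*0x[0-9a-fA-F]+[[:space:]]*$' \
        || true   # grep -c exits non-zero on a count of 0; "0" is still printed
}
```

For example, `sg_persist -n -i -k -d "$disk1" | count_keys` should print 2 for a device in the state shown above, and 0 for a device with no registered keys.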

### Updating scsi devices (removing 2 devices):


[root@r8-node-01 ~]# pcs stonith update-scsi-devices scsi-fencing set $disk3 $disk4

Result: the SCSI devices have been updated, no resources have been restarted, and the devices are unfenced: keys from each node are registered on the devices.
NOTE: the command does not undo unfencing on the removed devices, which is consistent with previous cluster behavior.

[root@r8-node-01 ~]# pcs stonith config
 Resource: scsi-fencing (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052f8c6f3de01047c29b72040f4,/dev/disk/by-id/scsi-360014058c228bdd68b1499c89b426063 pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-fencing-monitor-interval-60s)
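
The configured device list can be extracted from the `pcs stonith config` output and printed one device per line, which makes before-and-after comparisons easier. The following is a sketch only: `stonith_devices` is a hypothetical helper, and it assumes the `Attributes: devices=...` layout shown in this report and a single fence_scsi resource (with several stonith resources, their device lists would be merged).

```shell
# stonith_devices: read `pcs stonith config` output on stdin and print the
# value of the devices= attribute, one device path per line, sorted.
# Assumption: the attribute appears unquoted on an "Attributes:" line.
stonith_devices() {
    sed -n 's/^[[:space:]]*Attributes:.*devices=\([^[:space:]]*\).*/\1/p' \
        | tr ',' '\n' \
        | sort
}
```

Saving the output of `pcs stonith config | stonith_devices` before and after an update and comparing the two files with `diff` shows which devices were added or removed.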

[root@r8-node-01 ~]# tail -n 0 -f /var/log/messages
Jul  8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul  8 15:34:48 r8-node-01 pacemaker-fenced[624025]: notice: Added 'scsi-fencing' to device list (1 active device)
Jul  8 15:34:48 r8-node-01 pacemaker-schedulerd[624028]: notice: Calculated transition 10, saving inputs in /var/lib/pacemaker/pengine/pe-input-165.bz2
Jul  8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: Transition 10 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
Jul  8 15:34:48 r8-node-01 pacemaker-controld[624029]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE

[root@r8-node-02 ~]# tail -n 0 -f /var/log/messages
Jul  8 15:34:48 r8-node-02 pacemaker-fenced[577901]: notice: Added 'scsi-fencing' to device list (1 active device)


[root@r8-node-01 ~]# for d in $disk{1..4}; do sg_persist -n -i -k -d $d ; done
  PR generation=0x26, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x11, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x3, 2 registered reservation keys follow:
    0x14080000
    0x14080001
  PR generation=0x3, 2 registered reservation keys follow:
    0x14080000
    0x14080001

### Fence node r8-node-02:

[root@r8-node-01 ~]# pcs stonith fence r8-node-02
Node: r8-node-02 fenced
[root@r8-node-01 ~]# for d in $disk{3,4}; do sg_persist -n -i -k -d $d ; done
  PR generation=0x4, 1 registered reservation key follows:
    0x14080000
  PR generation=0x4, 1 registered reservation key follows:
    0x14080000

Comment 17 Miroslav Lisik 2021-07-21 11:20:26 UTC
Created attachment 1804075
additional fixes + tests

Fixes for 3 special cases.

Modified command:
* pcs stonith update-scsi-devices


Test:

Environment:
A 3-node cluster with configured fence_scsi fencing and resources running on each node.

A)
1. Stop the cluster on one of the cluster nodes
2. Run 'pcs stonith update-scsi-devices' to update the SCSI devices
3. The command succeeds

B)
1. Shut down a cluster node or stop pcsd on it
2. Run 'pcs stonith update-scsi-devices' to update the SCSI devices, first without and then with the --skip-offline option
3. The command fails without --skip-offline and succeeds with it

C)
1. On one cluster node, fail a shared device that is going to be added
2. Run 'pcs stonith update-scsi-devices' to update the SCSI devices
3. The command fails and the devices are not updated

Comment 18 Miroslav Lisik 2021-07-21 11:29:38 UTC
DevTestResults:

[root@r8-node-01 pcs]# rpm -q pcs
pcs-0.10.8-4.el8.x86_64

A)

[root@r8-node-01 pcs]# pcs status pcsd
  r8-node-03: Online
  r8-node-02: Online
  r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
 Online: r8-node-01 r8-node-02
 Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)


[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

B)

[root@r8-node-01 pcs]# pcs status pcsd
  r8-node-03: Offline
  r8-node-02: Online
  r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
 Online: r8-node-01 r8-node-02
 Offline: r8-node-03
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused), use --skip-offline to override
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2 --skip-offline
Warning: Unable to connect to r8-node-03 (Failed to connect to r8-node-03 port 2224: Connection refused)
[root@r8-node-01 ~]# echo $?
0
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b,/dev/disk/by-id/scsi-3600140547721f8ee2774aa8bac6d8ebe pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)

C)

[root@r8-node-01 pcs]# pcs status pcsd
  r8-node-03: Online
  r8-node-02: Online
  r8-node-01: Online
[root@r8-node-01 pcs]# pcs status nodes corosync
Corosync Nodes:
 Online: r8-node-01 r8-node-02 r8-node-03
 Offline:
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)


### Failing $disk2 on r8-node-03:
[root@r8-node-03 ~]# echo offline > /sys/block/sda/device/state


[root@r8-node-01 ~]# pcs stonith update-scsi-devices fence-scsi set $disk1 $disk2
Error: r8-node-03: Unfencing failed:
2021-07-20 19:40:02,530 ERROR: Cannot get registration keys

2021-07-20 19:40:02,531 ERROR: Please use '-h' for usage
Error: Errors have occurred, therefore pcs is unable to continue
[root@r8-node-01 ~]# echo $?
1
[root@r8-node-01 ~]# pcs stonith config
 Resource: fence-scsi (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/scsi-360014052bc36324cf7d4a709a959340b pcmk_host_check=static-list pcmk_host_list="r8-node-01 r8-node-02 r8-node-03" pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-monitor-interval-60s)
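
The three outcomes in cases A) to C) can be told apart in a script by the exit status and the error text. The sketch below is a hypothetical wrapper, not part of pcs: the function name, the `PCS` override variable, and the result labels are assumptions made for illustration. The matched error strings are taken from the outputs shown above and may differ in other pcs versions.

```shell
# PCS names the pcs executable; overriding it (an assumption of this sketch)
# lets the function be exercised without a cluster.
PCS=${PCS:-pcs}

# update_scsi_devices STONITH_ID DEVICE...
# Runs the update and prints one of: updated, offline-node, unfencing-failed, error.
update_scsi_devices() {
    stonith_id=$1
    shift
    if out=$("$PCS" stonith update-scsi-devices "$stonith_id" set "$@" 2>&1); then
        echo "updated"
        return 0
    fi
    case $out in
        *"use --skip-offline to override"*) echo "offline-node" ;;      # case B
        *"Unfencing failed"*)               echo "unfencing-failed" ;;  # case C
        *)                                  echo "error" ;;
    esac
    return 1
}
```

The wrapper deliberately does not retry with --skip-offline on its own; whether skipping an unreachable node is acceptable is left to the operator.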

Comment 24 errata-xmlrpc 2021-11-09 17:33:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4142

