Bug 1901357

Summary: crypt resource agent appears incapable of opening the crypt device itself at resource definition time
Product: Red Hat Enterprise Linux 8
Reporter: Corey Marthaler <cmarthal>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Priority: unspecified
Version: 8.4
CC: agk, cluster-maint, fdinitto, mjuricek, sbradley
Target Milestone: rc
Keywords: Triaged
Target Release: 8.0
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Fixed In Version: resource-agents-4.1.1-80.el8
Last Closed: 2021-05-18 15:12:05 UTC
Type: Bug

Description Corey Marthaler 2020-11-24 22:59:04 UTC
Description of problem:
While going through the current GFS+crypt documentation, I noticed that the 'cryptsetup luksOpen' command is absent. That makes sense: if the cluster is going to manage where this resource runs and when it starts and stops, it needs to do the luksOpen itself.

The only way I could get the GFS+crypt resource working was to clone the HA group containing the shared LVM volume early on, disable stonith, and run luksOpen manually, so that the volume was available and active on all nodes before the crypt resource was defined.

I then set out to see whether, without these manual hacks and without GFS in the picture, I could get an exclusive crypt resource running, and to debug why it fails otherwise.


/tmp/luks_key_file -> host-073:/etc/luks_key_file
/tmp/luks_key_file -> host-092:/etc/luks_key_file
/tmp/luks_key_file -> host-093:/etc/luks_key_file
Creating single VG STSRHTS13085 out of /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
host-073: vgchange --lock-start STSRHTS13085
host-092: vgchange --lock-start STSRHTS13085
host-093: vgchange --lock-start STSRHTS13085

Creating HA striped LV(s) and ext4 filesystems on VG STSRHTS13085
        lvcreate --yes --activate ey --type striped -L 8G -i 2 -n lv1 STSRHTS13085

cryptsetup luksFormat /dev/STSRHTS13085/lv1 --type luks2 --key-file=/etc/luks_key_file
LUKS_UUID=487adddd-fd62-46a8-98d8-98940fe39d7e
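The crypt1 resource defined later is given this bare UUID as encrypted_dev, and the agent's error messages refer to /dev/disk/by-uuid/<UUID>, so the agent evidently expands a UUID into that udev path. A minimal sketch of that expansion, assuming (hypothetically) that any value that is not an absolute path is treated as a UUID; resolve_encrypted_dev is an illustrative name, not the agent's own function:

```shell
#!/bin/sh
# Hypothetical helper: turn the crypt agent's encrypted_dev value into a path.
# Assumption: absolute paths are used as-is; anything else is a LUKS UUID,
# which udev exposes as /dev/disk/by-uuid/<UUID> while the backing LV is active.
resolve_encrypted_dev() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '/dev/disk/by-uuid/%s\n' "$1" ;;
    esac
}

resolve_encrypted_dev 487adddd-fd62-46a8-98d8-98940fe39d7e
# prints /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e
```

On the node where the LV is active, the UUID can be read with `cryptsetup luksUUID /dev/STSRHTS13085/lv1`. On nodes where the exclusively activated LV is inactive, the by-uuid path does not exist at all, which is the "not accessible" condition reported in the logs.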

pcs resource create STSRHTS13085 --group HA_STSRHTS13085 ocf:heartbeat:LVM-activate vgname="STSRHTS13085" activation_mode=exclusive vg_access_mode=lvmlockd

# I didn't want to deal with nodes being fenced:
[root@host-092 ~]# pcs property set stonith-enabled=false

# The cluster is currently healthy, with one LVM resource that is exclusively active on a single node (host-092):
[root@host-092 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-073 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Tue Nov 24 16:23:41 2020
  * Last change:  Tue Nov 24 16:23:24 2020 by root via cibadmin on host-092
  * 3 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Resource Group: HA_STSRHTS13085:
    * STSRHTS13085      (ocf::heartbeat:LVM-activate):   Started host-092

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


# Quick check that this is a valid, active LV with LUKS formatting that can be opened and closed:
[root@host-092 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                  
  lv1             STSRHTS13085  -wi-a-----   8.00g                                                       /dev/sda1(0),/dev/sdb1(0)
  [lvol0_pmspare] rhel_host-092 ewi-------   4.00m                                                       /dev/vda2(0)             
  pool00          rhel_host-092 twi-aotz--  <4.79g               68.51  51.76                            pool00_tdata(0)          
  [pool00_tdata]  rhel_host-092 Twi-ao----  <4.79g                                                       /dev/vda2(1)             
  [pool00_tmeta]  rhel_host-092 ewi-ao----   4.00m                                                       /dev/vda2(1226)          
  root            rhel_host-092 Vwi-aotz--  <4.79g pool00        68.51                                                            
  swap            rhel_host-092 -wi-ao---- 820.00m                                                       /dev/vda2(1227)          
[root@host-092 ~]# cryptsetup luksOpen /dev/STSRHTS13085/lv1 luks_lv1 --key-file=/etc/luks_key_file
[root@host-092 ~]# dmsetup ls
luks_lv1        (253:7)
STSRHTS13085-lv1        (253:6)
[root@host-092 ~]# cryptsetup luksClose luks_lv1 
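Once it is part of the group, the crypt resource has to perform the same open and close that were just done by hand. A sketch of how the resource parameters used below (encrypted_dev, crypt_dev, key_file, crypt_type) map onto cryptsetup calls, assuming the agent's start and stop are roughly equivalent to this; `cryptsetup open --type <type>` is the generic form of `luksOpen`. The CRYPTSETUP variable is a hypothetical hook so the commands can be printed instead of executed:

```shell
#!/bin/sh
# Sketch of what the crypt resource's start and stop boil down to, using the
# same cryptsetup operations that were just run by hand.
# CRYPTSETUP is a hypothetical override, e.g. CRYPTSETUP=echo for a dry run.
crypt_start() {
    # $1 = encrypted device path, $2 = crypt_dev (mapping name),
    # $3 = key_file, $4 = crypt_type (e.g. luks2)
    ${CRYPTSETUP:-cryptsetup} open --type "$4" --key-file "$3" "$1" "$2"
}

crypt_stop() {
    # $1 = crypt_dev; removes the /dev/mapper/<crypt_dev> mapping
    ${CRYPTSETUP:-cryptsetup} close "$1"
}

# Dry run with the values from this report:
CRYPTSETUP=echo
crypt_start /dev/STSRHTS13085/lv1 luks_lv1 /etc/luks_key_file luks2
# prints: open --type luks2 --key-file /etc/luks_key_file /dev/STSRHTS13085/lv1 luks_lv1
crypt_stop luks_lv1
# prints: close luks_lv1
```

The open only succeeds on the node where the exclusive LV is currently active, which is why the crypt resource has to start after, and on the same node as, the LVM-activate resource.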

# So far so good. Now define the crypt resource and add it to the HA group that contains the LVM resource, so that both run on the same node and are activated together:

[root@host-092 ~]# pcs resource create crypt1 --force --group HA_STSRHTS13085 ocf:heartbeat:crypt crypt_dev="luks_lv1" crypt_type=luks2 key_file=/etc/luks_key_file encrypted_dev=487adddd-fd62-46a8-98d8-98940fe39d7e
[root@host-092 ~]# 

# This quickly fails.

# from host-092 log:
Nov 24 16:28:55 host-092 pacemaker-controld[1449112]: notice: Result of probe operation for crypt1 on host-092: not running
Nov 24 16:28:55 host-092 pacemaker-attrd[1449110]: notice: Setting fail-count-crypt1#stop_0[host-073]: (unset) -> INFINITY
Nov 24 16:28:55 host-092 pacemaker-attrd[1449110]: notice: Setting last-failure-crypt1#stop_0[host-073]: (unset) -> 1606256935
Nov 24 16:28:55 host-092 pacemaker-attrd[1449110]: notice: Setting fail-count-crypt1#stop_0[host-093]: (unset) -> INFINITY
Nov 24 16:28:55 host-092 pacemaker-attrd[1449110]: notice: Setting last-failure-crypt1#stop_0[host-093]: (unset) -> 1606256935



[root@host-092 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-073 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Tue Nov 24 16:32:39 2020
  * Last change:  Tue Nov 24 16:28:54 2020 by root via cibadmin on host-092
  * 3 nodes configured
  * 11 resource instances configured (1 BLOCKED from further action due to failure)

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Resource Group: HA_STSRHTS13085:
    * STSRHTS13085      (ocf::heartbeat:LVM-activate):   Started host-092
    * crypt1    (ocf::heartbeat:crypt):  FAILED (blocked) [ host-093 host-073 ]

Failed Resource Actions:
  * crypt1_stop_0 on host-093 'invalid parameter' (2): call=38, status='complete', exitreason='Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible', last-rc-change='2020-11-24 16:28:55 -06:00', queued=0ms, exec=73ms
  * crypt1_stop_0 on host-073 'invalid parameter' (2): call=38, status='complete', exitreason='Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible', last-rc-change='2020-11-24 16:28:55 -06:00', queued=0ms, exec=69ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


# Why is the cluster trying to run the crypt resource on host-073 when the LVM resource is currently active on host-092? If it is going to run crypt1 on host-073, it first needs to relocate the LVM resource, which is part of the same group. Only once the LVM volume is active there can the luksOpen be attempted and the resource brought online on host-073.
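The host-073 log below points at an answer: the scheduler planned to start crypt1 on host-092, and what ran on host-073 and host-093 were probes (crypt1_monitor_0), which Pacemaker runs on every node for a newly created resource regardless of where the group's LVM volume is active. On those nodes the exclusive LV is inactive, so the device is absent, and the agent returned "invalid parameter" (OCF_ERR_ARGS, 2) instead of "not running" (OCF_NOT_RUNNING, 7). Pacemaker treats that as a hard failure, concludes crypt1 is "active on 2 nodes", and issues stops that fail the same way, leaving the resource blocked. A minimal sketch of how an agent could map a missing device to an exit code per action instead, assuming the usual OCF convention that a probe or stop on a node without the resource is not an error; missing_dev_rc is hypothetical and not necessarily what the fix in the pull request does:

```shell
#!/bin/sh
# OCF exit codes (values defined by the OCF resource agent API).
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_NOT_RUNNING=7

# Hypothetical helper: print the exit code the current action ($__OCF_ACTION)
# should return when the encrypted device is NOT present on this node.
missing_dev_rc() {
    case "$__OCF_ACTION" in
        # Probe (monitor_0) or recurring monitor: the exclusive LV is inactive
        # here, so crypt_dev should not be open on this node; "not running".
        monitor) echo "$OCF_NOT_RUNNING" ;;
        # Nothing to close: a stop must succeed, otherwise Pacemaker marks
        # the resource FAILED (blocked).
        stop)    echo "$OCF_SUCCESS" ;;
        # Anything else (e.g. start) keeps the current "invalid parameter".
        *)       echo "$OCF_ERR_ARGS" ;;
    esac
}

# Usage inside an action, with the resolved device path in $dev:
#   [ -b "$dev" ] || exit "$(missing_dev_rc)"
```

With this mapping, the probes on host-073 and host-093 would have reported "not running", and no stop would have been scheduled on those nodes in the first place.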


# From host-073 log:
Nov 24 16:28:54 host-073 pacemaker-controld[760698]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 24 16:28:54 host-073 pacemaker-schedulerd[760697]: notice:  * Start      crypt1             ( host-092 )
Nov 24 16:28:54 host-073 pacemaker-schedulerd[760697]: notice: Calculated transition 530, saving inputs in /var/lib/pacemaker/pengine/pe-input-57.bz2
Nov 24 16:28:54 host-073 pacemaker-controld[760698]: notice: Initiating monitor operation crypt1_monitor_0 on host-093
Nov 24 16:28:54 host-073 pacemaker-controld[760698]: notice: Initiating monitor operation crypt1_monitor_0 on host-092
Nov 24 16:28:54 host-073 pacemaker-controld[760698]: notice: Initiating monitor operation crypt1_monitor_0 locally on host-073
Nov 24 16:28:55 host-073 crypt(crypt1)[1755199]: ERROR: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible
Nov 24 16:28:55 host-073 pacemaker-execd[760695]: notice: crypt1_monitor_0[1755199] error output [ ocf-exit-reason:Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible ]
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Result of probe operation for crypt1 on host-073: invalid parameter
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: host-073-crypt1_monitor_0:37 [ ocf-exit-reason:Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible\n ]
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 530 aborted by operation crypt1_monitor_0 'modify' on host-073: Event failed
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 530 action 11 (crypt1_monitor_0 on host-073): expected 'not running' but got 'invalid parameter'
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 530 action 13 (crypt1_monitor_0 on host-093): expected 'not running' but got 'invalid parameter'
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 530 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-57.bz2): Complete
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for probe of crypt1 on host-093 at Nov 24 16:28:54 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: If it is not possible for crypt1 to run on host-093, see the resource-discovery option for location constraints
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for probe of crypt1 on host-093 at Nov 24 16:28:54 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: If it is not possible for crypt1 to run on host-093, see the resource-discovery option for location constraints
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for probe of crypt1 on host-073 at Nov 24 16:28:54 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: If it is not possible for crypt1 to run on host-073, see the resource-discovery option for location constraints
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for probe of crypt1 on host-073 at Nov 24 16:28:54 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: If it is not possible for crypt1 to run on host-073, see the resource-discovery option for location constraints
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice:  * Recover    crypt1             ( host-093 -> host-092 )
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Calculated transition 531 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-7.bz2
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Initiating stop operation crypt1_stop_0 locally on host-073
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Initiating stop operation crypt1_stop_0 on host-093
Nov 24 16:28:55 host-073 crypt(crypt1)[1755217]: ERROR: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible
Nov 24 16:28:55 host-073 pacemaker-execd[760695]: notice: crypt1_stop_0[1755217] error output [ ocf-exit-reason:Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible ]
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Result of stop operation for crypt1 on host-073: invalid parameter
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: host-073-crypt1_stop_0:38 [ ocf-exit-reason:Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible\n ]
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 531 aborted by operation crypt1_stop_0 'modify' on host-073: Event failed
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 531 action 12 (crypt1_stop_0 on host-073): expected 'ok' but got 'invalid parameter'
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 531 action 8 (crypt1_stop_0 on host-093): expected 'ok' but got 'invalid parameter'
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 531 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-error-7.bz2): Complete
Nov 24 16:28:55 host-073 pacemaker-attrd[760696]: notice: Setting fail-count-crypt1#stop_0[host-073]: (unset) -> INFINITY
Nov 24 16:28:55 host-073 pacemaker-attrd[760696]: notice: Setting last-failure-crypt1#stop_0[host-073]: (unset) -> 1606256935
Nov 24 16:28:55 host-073 pacemaker-attrd[760696]: notice: Setting fail-count-crypt1#stop_0[host-093]: (unset) -> INFINITY
Nov 24 16:28:55 host-073 pacemaker-attrd[760696]: notice: Setting last-failure-crypt1#stop_0[host-093]: (unset) -> 1606256935
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-093 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-093 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-073 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-073 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-073 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-073 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Calculated transition 532 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-8.bz2
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-093 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-093 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-073 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-073 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: No further recovery can be attempted for crypt1 because stop on host-073 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible) was recorded for stop of crypt1 on host-073 at Nov 24 16:28:55 2020
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Preventing crypt1 from restarting on host-073 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/487adddd-fd62-46a8-98d8-98940fe39d7e not accessible)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Forcing crypt1 away from host-073 after 1000000 failures (max=1000000)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: warning: Forcing crypt1 away from host-093 after 1000000 failures (max=1000000)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 24 16:28:55 host-073 pacemaker-schedulerd[760697]: error: Calculated transition 533 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-9.bz2
Nov 24 16:28:55 host-073 pacemaker-controld[760698]: notice: Transition 533 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-9.bz2): Complete
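Most of the log above is repeated scheduler output; the per-node outcomes are in pacemaker-controld's "Result of <operation> operation for <resource> on <node>: <result>" lines. A small filter, written against the line format shown in these excerpts (summarize_results is a hypothetical name, and the resource name is assumed to contain no regex metacharacters), reduces a log to one line per result:

```shell
#!/bin/sh
# Hypothetical filter: print "<node> <operation> <result>" for every
# pacemaker-controld "Result of ... operation for <rsc> on <node>: ..." line
# that mentions the resource given as $1. Reads log lines on stdin.
summarize_results() {
    sed -n "s/.*Result of \([a-z]*\) operation for $1 on \([^:]*\): \(.*\)/\2 \1 \3/p"
}

# Example: summarize_results crypt1 < /var/log/messages
```

In these excerpts each node logs only its own "Result of" lines, so each node's log has to be filtered separately; the DC's view of remote results is in the "expected 'not running' but got 'invalid parameter'" transition lines instead.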



Version-Release number of selected component (if applicable):
resource-agents-4.1.1-74.el8.x86_64


How reproducible:
Every time

Comment 1 Oyvind Albrigtsen 2020-11-25 08:28:40 UTC
Can you add output from starting the crypt resource with "pcs resource debug-start --full"?

Comment 3 Oyvind Albrigtsen 2020-11-26 12:00:38 UTC
https://github.com/ClusterLabs/resource-agents/pull/1587

Comment 6 Corey Marthaler 2020-11-30 18:06:04 UTC
I don't see a difference in behavior with the latest RPM.

I set up a "perfectly healthy" LVM-activate resource and even relocated it, to make sure all was well before defining the crypt resource.

[root@host-073 ~]# rpm -qi resource-agents
Name        : resource-agents
Version     : 4.1.1
Release     : 79.el8
Architecture: x86_64
Install Date: Mon 30 Nov 2020 10:59:38 AM CST
Group       : System Environment/Base
Size        : 1509374
License     : GPLv2+ and LGPLv2+
Signature   : (none)
Source RPM  : resource-agents-4.1.1-79.el8.src.rpm
Build Date  : Mon 30 Nov 2020 04:49:33 AM CST
Build Host  : x86-vm-09.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/ClusterLabs/resource-agents
Summary     : Open Source HA Reusable Cluster Resource Scripts
Description :
A set of scripts to interface with several services to operate in a
High Availability environment for both Pacemaker and rgmanager
service managers.



/tmp/luks_key_file -> host-073:/etc/luks_key_file
/tmp/luks_key_file -> host-092:/etc/luks_key_file
/tmp/luks_key_file -> host-093:/etc/luks_key_file
Creating single VG STSRHTS13085 out of /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
host-073: vgchange --lock-start STSRHTS13085
host-092: vgchange --lock-start STSRHTS13085
host-093: vgchange --lock-start STSRHTS13085

Creating HA striped LV(s) and ext4 filesystems on VG STSRHTS13085
        lvcreate --yes --activate ey --type striped -L 8G -i 2 -n lv1 STSRHTS13085

cryptsetup luksFormat /dev/STSRHTS13085/lv1 --type luks2 --key-file=/etc/luks_key_file
LUKS_UUID=f84b9e75-72c0-43aa-b550-a563b13ae517

pcs resource create STSRHTS13085 --group HA_STSRHTS13085 ocf:heartbeat:LVM-activate vgname="STSRHTS13085" activation_mode=exclusive vg_access_mode=lvmlockd

[root@host-093 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-073 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Mon Nov 30 11:47:13 2020
  * Last change:  Mon Nov 30 11:47:01 2020 by root via cibadmin on host-093
  * 3 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Resource Group: HA_STSRHTS13085:
    * STSRHTS13085      (ocf::heartbeat:LVM-activate):   Started host-093

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@host-093 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                  
  lv1             STSRHTS13085  -wi-a-----   8.00g                                                       /dev/sda1(0),/dev/sdb1(0)

[root@host-093 ~]# pcs resource move HA_STSRHTS13085 host-073
[root@host-093 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-073 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Mon Nov 30 11:49:04 2020
  * Last change:  Mon Nov 30 11:48:53 2020 by root via crm_resource on host-093
  * 3 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Resource Group: HA_STSRHTS13085:
    * STSRHTS13085      (ocf::heartbeat:LVM-activate):   Started host-073

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# LVM resource is now properly active on host-073:
[root@host-073 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                  
  lv1             STSRHTS13085  -wi-a-----   8.00g                                                       /dev/sdd1(0),/dev/sdb1(0)


# Attempt to define the crypt resource:
[root@host-073 ~]# pcs resource create crypt1 --force --group HA_STSRHTS13085 ocf:heartbeat:crypt crypt_dev="luks_lv1" crypt_type=luks2 key_file=/etc/luks_key_file encrypted_dev=f84b9e75-72c0-43aa-b550-a563b13ae517

[root@host-073 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-073 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Mon Nov 30 11:51:31 2020
  * Last change:  Mon Nov 30 11:50:57 2020 by root via cibadmin on host-073
  * 3 nodes configured
  * 11 resource instances configured (1 BLOCKED from further action due to failure)

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Resource Group: HA_STSRHTS13085:
    * STSRHTS13085      (ocf::heartbeat:LVM-activate):   Started host-073
    * crypt1    (ocf::heartbeat:crypt):  FAILED (blocked) [ host-093 host-092 ]

Failed Resource Actions:
  * crypt1_stop_0 on host-093 'invalid parameter' (2): call=41, status='complete', exitreason='Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible', last-rc-change='2020-11-30 11:50:57 -06:00', queued=0ms, exec=75ms
  * crypt1_stop_0 on host-092 'invalid parameter' (2): call=38, status='complete', exitreason='Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible', last-rc-change='2020-11-30 11:50:57 -06:00', queued=0ms, exec=55ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Nov 30 11:48:57 host-073 LVM-activate(STSRHTS13085)[5179]: INFO: STSRHTS13085: activated successfully.
Nov 30 11:48:57 host-073 pacemaker-controld[3415]: notice: Result of start operation for STSRHTS13085 on host-073: ok
Nov 30 11:48:57 host-073 pacemaker-controld[3415]: notice: Initiating monitor operation STSRHTS13085_monitor_30000 locally on host-073
Nov 30 11:48:57 host-073 pacemaker-controld[3415]: notice: Result of monitor operation for STSRHTS13085 on host-073: ok
Nov 30 11:48:57 host-073 pacemaker-controld[3415]: notice: Transition 8 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-8.bz2): Complete
Nov 30 11:48:57 host-073 pacemaker-controld[3415]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Nov 30 11:48:57 host-073 systemd[1]: dnf-makecache.service: Succeeded.
Nov 30 11:48:57 host-073 systemd[1]: Started dnf makecache.
Nov 30 11:49:53 host-073 pcsd[1899]: INFO:tornado.access:200 GET /remote/get_configs?cluster_name=STSRHTS13085 (10.15.105.92) 7.74ms
Nov 30 11:49:53 host-073 restraintd[1714]: *** Current Time: Mon Nov 30 11:49:53 2020 Localwatchdog at:  * Disabled! *
Nov 30 11:50:53 host-073 restraintd[1714]: *** Current Time: Mon Nov 30 11:50:53 2020 Localwatchdog at:  * Disabled! *
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice:  * Start      crypt1             (             host-073 )
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-9.bz2
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Initiating monitor operation crypt1_monitor_0 on host-093
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Initiating monitor operation crypt1_monitor_0 on host-092
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Initiating monitor operation crypt1_monitor_0 locally on host-073
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 9 aborted by operation crypt1_monitor_0 'modify' on host-092: Event failed
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 9 action 12 (crypt1_monitor_0 on host-092): expected 'not running' but got 'invalid parameter'
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 9 action 13 (crypt1_monitor_0 on host-093): expected 'not running' but got 'invalid parameter'
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Result of probe operation for crypt1 on host-073: not running
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 9 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-9.bz2): Complete
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for probe of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: If it is not possible for crypt1 to run on host-093, see the resource-discovery option for location constraints
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for probe of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: If it is not possible for crypt1 to run on host-093, see the resource-discovery option for location constraints
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for probe of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: If it is not possible for crypt1 to run on host-092, see the resource-discovery option for location constraints
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for probe of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: If it is not possible for crypt1 to run on host-092, see the resource-discovery option for location constraints
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice:  * Recover    crypt1             ( host-093 -> host-073 )
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Calculated transition 10 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-0.bz2
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Initiating stop operation crypt1_stop_0 on host-092
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Initiating stop operation crypt1_stop_0 on host-093
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 10 aborted by operation crypt1_stop_0 'modify' on host-092: Event failed
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 10 action 12 (crypt1_stop_0 on host-092): expected 'ok' but got 'invalid parameter'
Nov 30 11:50:57 host-073 pacemaker-attrd[3413]: notice: Setting fail-count-crypt1#stop_0[host-092]: (unset) -> INFINITY
Nov 30 11:50:57 host-073 pacemaker-attrd[3413]: notice: Setting last-failure-crypt1#stop_0[host-092]: (unset) -> 1606758657
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 10 aborted by transient_attributes.2 'create': Transient attribute change
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 10 action 8 (crypt1_stop_0 on host-093): expected 'ok' but got 'invalid parameter'
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 10 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-error-0.bz2): Complete
Nov 30 11:50:57 host-073 pacemaker-attrd[3413]: notice: Setting fail-count-crypt1#stop_0[host-093]: (unset) -> INFINITY
Nov 30 11:50:57 host-073 pacemaker-attrd[3413]: notice: Setting last-failure-crypt1#stop_0[host-093]: (unset) -> 1606758657
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-092 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-092 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Forcing crypt1 away from host-092 after 1000000 failures (max=1000000)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Calculated transition 11 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-1.bz2
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-093 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-093 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-093 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-092 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: No further recovery can be attempted for crypt1 because stop on host-092 failed (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Unexpected result (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible) was recorded for stop of crypt1 on host-092 at Nov 30 11:50:57 2020
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Preventing crypt1 from restarting on host-092 because of hard failure (invalid parameter: Encrypted device /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 not accessible)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Forcing crypt1 away from host-092 after 1000000 failures (max=1000000)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: warning: Forcing crypt1 away from host-093 after 1000000 failures (max=1000000)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Resource crypt1 is active on 2 nodes (attempting recovery)
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Nov 30 11:50:57 host-073 pacemaker-schedulerd[3414]: error: Calculated transition 12 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-2.bz2
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: Transition 12 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-2.bz2): Complete
Nov 30 11:50:57 host-073 pacemaker-controld[3415]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE

Comment 7 Corey Marthaler 2020-11-30 18:16:28 UTC
Also, to double-check that the device is present and can be luksOpen'ed, I ran that command manually on the node where the LVM-activate resource was running, and it worked fine.

[root@host-073 ~]# lvs
  LV     VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1    STSRHTS13085  -wi-a-----   8.00g                                                      
[root@host-073 ~]# ls /dev/STSRHTS13085/lv1 
/dev/STSRHTS13085/lv1

[root@host-073 ~]# cryptsetup luksOpen /dev/STSRHTS13085/lv1 luks_lv1 --key-file=/etc/luks_key_file
[root@host-073 ~]# dmsetup ls
luks_lv1        (253:7)
STSRHTS13085-lv1        (253:6)

I assume the resource agent is basically just running the above open command to present this crypt volume?
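In the log above, the probes (crypt1_monitor_0) on host-092 and host-093, where the LV was not active, came back 'invalid parameter' (Pacemaker's label for OCF exit code 2, OCF_ERR_ARGS) when 'not running' was expected, and the follow-up stops failed the same way. Under the usual OCF conventions, a missing backing device during a monitor/probe means "not running", and a stop of something that cannot be open is a success. A minimal sketch of that check, assuming a hypothetical check_encrypted_dev helper and hand-defined OCF constants instead of sourcing ocf-shellfuncs; it is not necessarily what the eventual upstream fix does:

```shell
#!/bin/bash
# OCF exit codes, defined here (an assumption for self-containment) rather
# than sourced from ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_NOT_RUNNING=7

# check_encrypted_dev ACTION DEVICE   (hypothetical helper)
# Returns the exit code an agent could use for ACTION, depending on whether
# the encrypted backing DEVICE exists on this node (e.g. it may be missing
# because LVM-activate has not activated the LV here).
check_encrypted_dev() {
    local action="$1" dev="$2"

    # Device present (test -b follows /dev/disk/by-uuid symlinks):
    # let the real action proceed.
    if [ -b "$dev" ]; then
        return "$OCF_SUCCESS"
    fi

    case "$action" in
        monitor)
            # Probe or recurring monitor: without the backing device the
            # crypt mapping cannot be active here.
            return "$OCF_NOT_RUNNING"
            ;;
        stop)
            # Nothing can be open on top of a missing device, so stop is
            # trivially successful.
            return "$OCF_SUCCESS"
            ;;
        *)
            # start and anything else genuinely need the device.
            return "$OCF_ERR_ARGS"
            ;;
    esac
}
```

With this split, a probe on a node without the active LV reports 'not running', so Pacemaker neither considers the resource active on several nodes nor schedules stops that cannot succeed.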

Comment 8 Corey Marthaler 2020-11-30 18:28:31 UTC
FWIW, I also tried using the UUID instead of the LVM name, and that worked as well.

[root@host-073 ~]# cryptsetup luksClose luks_lv1

[root@host-073 ~]# cryptsetup luksOpen /dev/disk/by-uuid/f84b9e75-72c0-43aa-b550-a563b13ae517 luks_lv1 --key-file=/etc/luks_key_file
[root@host-073 ~]# dmsetup ls
luks_lv1        (253:7)
STSRHTS13085-lv1        (253:6)

Which raises a question: if we're recommending LVM volumes/resources underneath, and one of their benefits is unified logical volume naming, shouldn't we just use the LVM LV name/path instead of the LUKS UUID?
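Both names above reach the same block device: /dev/STSRHTS13085/lv1 and the /dev/disk/by-uuid/ link for the LUKS header are both symlinks that udev points at the same dm node. A minimal sketch for confirming that on a node, assuming a hypothetical same_block_device helper; `cryptsetup luksUUID` is the real subcommand that prints the LUKS UUID used in the by-uuid name:

```shell
#!/bin/bash
# same_block_device PATH_A PATH_B   (hypothetical helper)
# Succeeds when both paths exist and resolve, through any symlinks, to the
# same final node, e.g. an LV path and its /dev/disk/by-uuid/<LUKS UUID> link.
same_block_device() {
    local a b
    a=$(readlink -f -- "$1") || return 1
    b=$(readlink -f -- "$2") || return 1
    # readlink -f can succeed for a missing last component, so check existence.
    [ -e "$a" ] && [ "$a" = "$b" ]
}

# Example usage on a cluster node with the LV active (not run here):
#   uuid=$(cryptsetup luksUUID /dev/STSRHTS13085/lv1)
#   same_block_device /dev/STSRHTS13085/lv1 "/dev/disk/by-uuid/$uuid" && echo "same device"
```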

Comment 9 Oyvind Albrigtsen 2020-12-01 09:14:40 UTC
https://github.com/ClusterLabs/resource-agents/pull/1593

Comment 10 Corey Marthaler 2020-12-01 20:05:00 UTC
Fixed in the latest RPM, resource-agents-4.1.1-80.el8.x86_64; marking Verified.

/tmp/luks_key_file -> host-073:/etc/luks_key_file
/tmp/luks_key_file -> host-092:/etc/luks_key_file
/tmp/luks_key_file -> host-093:/etc/luks_key_file

Creating single VG STSRHTS13085 out of /dev/sdg1 /dev/sda1 /dev/sdh1 /dev/sde1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1
Creating HA striped LV(s) and gfs2 filesystems on VG STSRHTS13085
        lvcreate --yes --activate sy --type striped -L 8G -i 2 -n lv1 STSRHTS13085
cryptsetup luksFormat /dev/STSRHTS13085/lv1 --type luks2 --key-file=/etc/luks_key_file
LUKS_UUID=2b471d0c-2fde-4a67-99c5-359db40a0f1c
cryptsetup luksOpen /dev/STSRHTS13085/lv1 luks_lv1 --key-file=/etc/luks_key_file
        mkfs.gfs2 -j 3 -J 32 -t STSRHTS13085:STSRHTS13085-lv1 /dev/mapper/luks_lv1 -O

cryptsetup luksClose luks_lv1
pcs resource create STSRHTS13085 --group HA_STSRHTS13085 ocf:heartbeat:LVM-activate vgname="STSRHTS13085" activation_mode=shared vg_access_mode=lvmlockd
pcs resource create crypt1 --force --group HA_STSRHTS13085 ocf:heartbeat:crypt crypt_dev="luks_lv1" crypt_type=luks2 key_file=/etc/luks_key_file encrypted_dev=2b471d0c-2fde-4a67-99c5-359db40a0f1c
pcs resource create fs1 --group HA_STSRHTS13085 Filesystem device="/dev/mapper/luks_lv1" directory="/mnt/fs1" fstype="gfs2" "options=noatime" op monitor interval=10s
pcs resource clone HA_STSRHTS13085
        lvcreate --yes --activate sy --type striped -L 8G -i 2 -n lv2 STSRHTS13085
cryptsetup luksFormat /dev/STSRHTS13085/lv2 --type luks2 --key-file=/etc/luks_key_file
LUKS_UUID=9c8013cf-a7d0-4f1e-a620-8a086a305b2e
cryptsetup luksOpen /dev/STSRHTS13085/lv2 luks_lv2 --key-file=/etc/luks_key_file
        mkfs.gfs2 -j 3 -J 32 -t STSRHTS13085:STSRHTS13085-lv2 /dev/mapper/luks_lv2 -O

cryptsetup luksClose luks_lv2
pcs resource create crypt2 --force --group HA_STSRHTS13085 ocf:heartbeat:crypt crypt_dev="luks_lv2" crypt_type=luks2 key_file=/etc/luks_key_file encrypted_dev=9c8013cf-a7d0-4f1e-a620-8a086a305b2e
pcs resource create fs2 --group HA_STSRHTS13085 Filesystem device="/dev/mapper/luks_lv2" directory="/mnt/fs2" fstype="gfs2" "options=noatime" op monitor interval=10s

pcs constraint order start locking-clone then HA_STSRHTS13085-clone

Running cleanup to fix any potential timing issues during setup
pcs resource cleanup

Checking status of resources on all nodes

[root@host-073 ~]# pcs status
Cluster name: STSRHTS13085
Cluster Summary:
  * Stack: corosync
  * Current DC: host-093 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
  * Last updated: Tue Dec  1 14:00:42 2020
  * Last change:  Tue Dec  1 13:58:05 2020 by root via cibadmin on host-093
  * 3 nodes configured
  * 24 resource instances configured

Node List:
  * Online: [ host-073 host-092 host-093 ]

Full List of Resources:
  * fence-host-073      (stonith:fence_xvm):     Started host-073
  * fence-host-092      (stonith:fence_xvm):     Started host-092
  * fence-host-093      (stonith:fence_xvm):     Started host-093
  * Clone Set: locking-clone [locking]:
    * Started: [ host-073 host-092 host-093 ]
  * Clone Set: HA_STSRHTS13085-clone [HA_STSRHTS13085]:
    * Started: [ host-073 host-092 host-093 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Comment 12 errata-xmlrpc 2021-05-18 15:12:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (resource-agents bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1736