Description of problem:
ocf:pacemaker:controld: the configdir attribute doesn't allow specifying the configfs mount point. /sys/kernel/config is the default, but configdir parameterizes it.

https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/controld#L67:

    <parameter name="configdir" unique="1">
    <longdesc lang="en">
    The location where configfs is or should be mounted
    </longdesc>
    <shortdesc lang="en">Location of configfs</shortdesc>
    <content type="string" default="/sys/kernel/config" />
    </parameter>

The /sys/kernel/config location is hard-coded in at least three places:
- https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/controld#L183
- https://github.com/wferi/dlm/blob/master/dlm_controld/action.c (throughout)
- https://github.com/wferi/dlm/blob/master/init/dlm.init#L33

configdir was parameterized 10 years ago in the following commit:
- https://github.com/ClusterLabs/pacemaker/commit/d7b606c077e38b32936306a98220f4d0575c4ed4

We can trace the first hard-coding of /sys/kernel/config in the resource agent back to the following commit from 5 years ago. (The location and exact invocation have since changed.)
- https://github.com/ClusterLabs/pacemaker/commit/f30a4d4eb08181f556271a796ef8a63f3c003690

It is not clear that there is a valid use case for the configdir attribute. If there is not, then rather than fixing the hard-coding in both the resource agent and dlm, perhaps we can deprecate the attribute.
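To show what "parameterizing" would mean on the resource-agent side: every configfs path would be derived from OCF_RESKEY_configdir (the environment variable through which OCF passes the configdir parameter) instead of being written as a literal /sys/kernel/config. The following is a minimal sketch only. The helper names are hypothetical and do not come from the controld agent. It also does nothing for dlm_controld, which hard-codes the same path on its own.

```shell
# Hypothetical sketch: derive configfs paths from the configdir parameter
# instead of writing /sys/kernel/config as a literal.
# OCF_RESKEY_configdir is how OCF passes the real "configdir" parameter;
# the function names below are made up for illustration.

# Fall back to the metadata default when configdir is unset or empty.
configfs_root() {
    echo "${OCF_RESKEY_configdir:-/sys/kernel/config}"
}

# The DLM subtree lives under whatever configfs root is in use.
dlm_config_dir() {
    echo "$(configfs_root)/dlm"
}

# Succeed if the DLM subtree exists under the chosen root.
dlm_available() {
    [ -d "$(dlm_config_dir)" ]
}
```

Fixing only the agent this way would still leave dlm_controld looking under /sys/kernel/config. That is why making the attribute work end to end also requires changes in dlm.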
If we opt to make the parameter usable instead of deprecating and removing it, then we will need to:
- parameterize the location of the addr_list in the resource agent (easy)
- pass the parameter into dlm_controld, maybe as "CLUSTER_DIR"

--------------------

Version-Release number of selected component (if applicable):
pacemaker-1.1.18-11.el7.x86_64
upstream master

--------------------

How reproducible:
always

--------------------

Steps to Reproduce:
I tried a few different approaches to make the controld resource and the dlm_controld process use the specified mount point for the configdir. None worked.

With /sys/kernel/config still mounted:

# # Show current mount
# mount | grep configfs
configfs on /sys/kernel/config type configfs (rw,relatime)

# # Create custom mount point on all nodes
# mkdir /testmnt

# # Use custom configdir
# pcs resource update dlm configdir=/testmnt
# tail /var/log/messages
...
Apr 17 22:22:39 fastvm-rhel-7-4-22 controld(dlm)[16687]: ERROR: /testmnt/dlm not available
Apr 17 22:22:39 fastvm-rhel-7-4-22 crmd[1154]: notice: Result of start operation for dlm on fastvm-rhel-7-4-22: 5 (not installed)
Apr 17 22:22:39 fastvm-rhel-7-4-22 crmd[1154]: notice: Result of stop operation for dlm on fastvm-rhel-7-4-22: 0 (ok)

# # Add dlm subdirectory on all nodes and try again
# mkdir /testmnt/dlm
# pcs resource cleanup

# # controld resource is now Started, but DLM is still using /sys/kernel/config
# pcs resource show | grep -A1 dlm
 Clone Set: dlm-clone [dlm]
     Started: [ fastvm-rhel-7-4-22 fastvm-rhel-7-4-23 ]
# mount | grep configfs
configfs on /sys/kernel/config type configfs (rw,relatime)
# ls /sys/kernel/config/dlm
cluster
# ls /testmnt/dlm
#

# # Prevent anything from using /sys/kernel/config
# # Disable controld/clvm resources, unmount /sys/kernel/config on both nodes, update configdir, enable resources
# pcs resource disable clvmd
# pcs resource disable dlm
# umount /sys/kernel/config
# pcs resource update dlm configdir=/testmnt
# pcs resource enable dlm
# pcs resource enable clvmd

# # ocf:pacemaker:controld has mounted configfs on /testmnt (the configdir)
# mount | grep config
none on /testmnt type configfs (rw,relatime)

# # However, DLM still depends on /sys/kernel/config being present
# tail /var/log/messages
...
Apr 17 22:39:10 fastvm-rhel-7-4-22 dlm_controld[3164]: 275 dlm_controld 4.0.7 started
Apr 17 22:39:10 fastvm-rhel-7-4-22 dlm_controld[3164]: 275 No /sys/kernel/config/dlm, is the dlm loaded?
Apr 17 22:39:11 fastvm-rhel-7-4-22 crmd[1142]: notice: Result of start operation for dlm on fastvm-rhel-7-4-22: 7 (not running)
Apr 17 22:39:11 fastvm-rhel-7-4-22 crmd[1142]: notice: Result of stop operation for dlm on fastvm-rhel-7-4-22: 0 (ok)

# # Reboot both nodes. configfs is once again mounted on /sys/kernel/config (by a kmod?), but now there is no dlm subdirectory and the controld resource fails to start.
# ls /sys/kernel/config/
# ls /testmnt/dlm
# tail /var/log/messages
...
Apr 17 22:43:22 fastvm-centos-7-4-22 dlm_controld[1388]: 40 dlm_controld 4.0.7 started
Apr 17 22:43:22 fastvm-centos-7-4-22 dlm_controld[1388]: 40 Is dlm missing from kernel? No misc devices found.
Apr 17 22:43:22 fastvm-centos-7-4-22 dlm_controld[1388]: 40 No /sys/kernel/config/dlm, is the dlm loaded?
Apr 17 22:43:23 fastvm-centos-7-4-22 crmd[1148]: notice: Result of start operation for dlm on fastvm-centos-7-4-22: 7 (not running)

--------------------

Actual results:
See "Steps to Reproduce."

--------------------

Expected results:
- configfs mounted on ${OCF_RESKEY_configdir}
- ocf:pacemaker:controld resource in Started state
- dlm_controld process running
- data in ${OCF_RESKEY_configdir}/dlm/cluster

--------------------

Additional info:
N/A
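The two filesystem-level expected results (configfs mounted on the configdir, and data under its dlm/cluster subdirectory) can be checked with a script, so there is no need to read `mount | grep configfs` output by eye. The helper below is hypothetical and not part of pacemaker or dlm. It reads /proc/mounts rather than parsing `mount` output. It does not check the resource state or whether dlm_controld is running.

```shell
# Hypothetical helper: check the filesystem-level "expected results".
#   $1 = configdir to check, e.g. /testmnt
#   $2 = mounts table to read (defaults to /proc/mounts; overridable for testing)
# Returns 0 only if configfs is mounted exactly on $1 AND $1/dlm/cluster exists.
check_configdir_expected() {
    dir=$1
    mounts=${2:-/proc/mounts}

    # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
    # Mount points containing spaces are octal-escaped (\040); not handled here.
    if ! awk -v d="$dir" '$2 == d && $3 == "configfs" { found = 1 } END { exit !found }' "$mounts"; then
        echo "configfs is not mounted on $dir" >&2
        return 1
    fi

    # The report expects DLM data under <configdir>/dlm/cluster.
    if [ ! -d "$dir/dlm/cluster" ]; then
        echo "$dir/dlm/cluster does not exist" >&2
        return 1
    fi
    return 0
}
```

For example, `check_configdir_expected /testmnt` would fail in every attempt shown above. In the attempts that mounted configfs on /testmnt, no cluster directory ever appeared there.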
Disregard CentOS messages. Copy-paste mistake from another trial.
I agree that it is not worth the effort to fully implement this, as demand is nonexistent. My plan for RHEL 7.6 is to have ocf:pacemaker:controld ignore the parameter, always using /sys/kernel/config, and to indicate in the meta-data that it is deprecated and ignored. Then we can remove it in a later RHEL version.

QA: The test is trivial. Check the controld meta-data to see whether it says the configdir parameter is deprecated, and try configuring a controld resource with a nonstandard configdir location (without changing the actual mount point). Before the fix, the resource should fail; after the fix, it should work (using the standard mount point).
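The plan above can be illustrated in the agent's own language: configdir is never read, the standard mount point is always used, and the meta-data marks the parameter as deprecated and ignored. This is a sketch only. It is not the code from the upstream commit, and the variable and function names are hypothetical. Only the longdesc wording is taken from the verified `pcs resource describe` output later in this bug. The other meta-data lines are copied from the original parameter definition as an assumption.

```shell
# Hypothetical sketch of the planned behavior (not the actual upstream change):
# the agent stops reading OCF_RESKEY_configdir and always uses the standard
# configfs mount point.
CONTROLD_CONFIGFS=/sys/kernel/config   # hypothetical variable name

controld_configfs_dir() {
    # OCF_RESKEY_configdir is deliberately not consulted, so any value
    # (including a nonexistent path) has no effect.
    echo "$CONTROLD_CONFIGFS"
}

# Meta-data fragment for the deprecated parameter. The longdesc matches the
# post-fix pcs output; the other lines mirror the original definition.
configdir_metadata() {
    cat <<'EOF'
<parameter name="configdir" unique="1">
<longdesc lang="en">
This parameter is deprecated and ignored
</longdesc>
<shortdesc lang="en">Location of configfs</shortdesc>
<content type="string" default="/sys/kernel/config" />
</parameter>
EOF
}
```

Because nothing reads configdir anymore, a value such as /non/existent/path cannot make the start operation fail. The QA test expects exactly that after the fix.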
fixed upstream by commit b65defeb
before:
=======
> [root@virt-257 ~]# rpm -q pacemaker
> pacemaker-1.1.18-11.el7.x86_64
> [root@virt-257 ~]# pcs resource describe ocf:pacemaker:controld
> ocf:pacemaker:controld - DLM Agent for cluster file systems
>
> This Resource Agent can control the dlm_controld services needed by cluster-aware file systems.
> It assumes that dlm_controld is in your default PATH.
> In most cases, it should be run as an anonymous clone.
>
> Resource options:
>   args: Any additional options to start the dlm_controld service with
>   configdir: The location where configfs is or should be mounted
>   daemon: The daemon to start - supports gfs_controld(.pcmk) and dlm_controld(.pcmk)
>   allow_stonith_disabled: Allow DLM start-up even if STONITH/fencing is disabled in the cluster. Setting this option to true will cause cluster malfunction and hangs on fail-over for DLM clients that
>                           require fencing (such as GFS2, OCFS2, and cLVM2). This option is advanced use only.
>
> Default operations:
>   start: interval=0s timeout=90
>   stop: interval=0s timeout=100
>   monitor: interval=10 start-delay=0 timeout=20
> [root@virt-257 ~]# pcs resource create controld ocf:pacemaker:controld allow_stonith_disabled=true configdir=/non/existent/path
> [root@virt-257 ~]# pcs status
> <snip>
> Full list of resources:
>
>  controld   (ocf::pacemaker:controld):   Stopped
>
> Failed Actions:
> * controld_start_0 on virt-257.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=75, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:05:39 2018', queued=0ms, exec=102ms
> * controld_start_0 on virt-261.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=41, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:05:40 2018', queued=0ms, exec=100ms
> <snip>
> [root@virt-257 ~]# pcs resource update controld configdir=/tmp
> [root@virt-257 ~]# pcs status
> <snip>
> Full list of resources:
>
>  controld   (ocf::pacemaker:controld):   Stopped
>
> Failed Actions:
> * controld_start_0 on virt-257.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=112, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:19:43 2018', queued=1ms, exec=103ms
> * controld_start_0 on virt-261.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=66, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:19:43 2018', queued=0ms, exec=101ms
> <snip>
> [root@virt-257 ~]# mkdir /tmp/dlm
> [root@virt-257 ~]# pcs resource cleanup
> [root@virt-257 ~]# pcs resource
>  controld   (ocf::pacemaker:controld):   Started virt-257.cluster-qe.lab.eng.brq.redhat.com
> [root@virt-257 ~]# umount /sys/kernel/config
> [root@virt-257 ~]# pcs resource restart controld
> Error: Error performing operation: Timer expired
>
> Set 'controld' option: id=controld-meta_attributes-target-role set=controld-meta_attributes name=target-role=stopped
> Waiting for 1 resources to stop:
>   * controld
> Deleted 'controld' option: id=controld-meta_attributes-target-role name=target-role
> Waiting for 1 resources to start again:
>   * controld
> Could not complete restart of controld, 1 resources remaining
>   * controld
> [root@virt-257 ~]# pcs status
> <snip>
> Full list of resources:
>
>  controld   (ocf::pacemaker:controld):   Stopped
>
> Failed Actions:
> * controld_start_0 on virt-257.cluster-qe.lab.eng.brq.redhat.com 'not running' (7): call=123, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:25:26 2018', queued=2ms, exec=1144ms
> * controld_start_0 on virt-261.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=73, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:25:27 2018', queued=0ms, exec=121ms
> <snip>
> [root@virt-257 ~]# mkdir /custom/configfs
> [root@virt-257 ~]# mount -t configfs configfs /custom/configfs
> [root@virt-257 ~]# pcs resource update controld configdir=/custom/configfs
> [root@virt-257 ~]# pcs status
> <snip>
> Full list of resources:
>
>  controld   (ocf::pacemaker:controld):   Stopped
>
> Failed Actions:
> * controld_start_0 on virt-257.cluster-qe.lab.eng.brq.redhat.com 'not running' (7): call=125, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:30:18 2018', queued=1ms, exec=1190ms
> * controld_start_0 on virt-261.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=75, status=complete, exitreason='',
>     last-rc-change='Mon Jul 16 18:30:19 2018', queued=0ms, exec=98ms
> <snip>
> [root@virt-257 ~]# mount -t configfs configfs /sys/kernel/config
> [root@virt-257 ~]# pcs resource update controld configdir=
> [root@virt-257 ~]# pcs resource
>  controld   (ocf::pacemaker:controld):   Started virt-257.cluster-qe.lab.eng.brq.redhat.com

The controld agent returns a "not installed" error when the cluster attempts to start the resource with a nonexistent config directory. With an invalid but existing config directory (/tmp, where no "dlm" subdirectory exists), the same "not installed" error is returned. Creating a fake "dlm" directory in /tmp lets the agent's initial checks pass, and the resource does start successfully, but it uses the wrong configfs path (/sys/kernel/config instead of /tmp). This is easily verified by unmounting /sys/kernel/config and forcing a resource restart: the resource then fails to start with "not running". The same "not running" error is returned when we create a real configfs mount in a nonstandard location. Everything returns to normal once we mount configfs in its standard location and remove the configdir resource agent option.

after:
======
> [root@virt-257 ~]# rpm -q pacemaker
> pacemaker-1.1.19-3.el7.x86_64
> [root@virt-257 ~]# pcs resource describe ocf:pacemaker:controld
> ocf:pacemaker:controld - DLM Agent for cluster file systems
>
> This Resource Agent can control the dlm_controld services needed by cluster-aware file systems.
> It assumes that dlm_controld is in your default PATH.
> In most cases, it should be run as an anonymous clone.
>
> Resource options:
>   args: Any additional options to start the dlm_controld service with
>   configdir: This parameter is deprecated and ignored
>   daemon: The daemon to start - supports gfs_controld(.pcmk) and dlm_controld(.pcmk)
>   allow_stonith_disabled: Allow DLM start-up even if STONITH/fencing is disabled in the cluster. Setting this option to true will cause cluster malfunction and hangs on fail-over for DLM clients that
>                           require fencing (such as GFS2, OCFS2, and cLVM2). This option is advanced use only.
>
> Default operations:
>   start: interval=0s timeout=90
>   stop: interval=0s timeout=100
>   monitor: interval=10 start-delay=0 timeout=20
> [root@virt-257 ~]# pcs resource create controld ocf:pacemaker:controld allow_stonith_disabled=true configdir=/non/existent/path
> [root@virt-257 ~]# pcs resource
>  controld   (ocf::pacemaker:controld):   Started virt-257.cluster-qe.lab.eng.brq.redhat.com

No warning is shown when creating the resource, and no deprecation message appears in the cluster logs: the deprecated configdir option is silently ignored. The controld resource starts successfully using the default configfs path.

Marking verified in 1.1.19-3.el7.
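The meta-data half of the QA test can also be scripted. The sketch below assumes the configdir `<parameter>` element spans several lines, as it does in the agent source quoted at the top of this bug. It uses sed and grep on lines and is not a real XML parser. The function name is hypothetical.

```shell
# Hypothetical QA helper: read resource-agent meta-data XML on stdin and
# succeed if the configdir <parameter> element mentions "deprecated".
# Only the lines from <parameter name="configdir" up to the next
# </parameter> are examined, so other parameters cannot cause a false match.
configdir_is_deprecated() {
    sed -n '/<parameter name="configdir"/,/<\/parameter>/p' |
        grep -qi 'deprecated'
}

# Example (not run here), assuming the usual RHEL 7 agent location and
# the standard OCF "meta-data" action:
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/pacemaker/controld meta-data \
#       | configdir_is_deprecated && echo "configdir is marked deprecated"
```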
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3055