Bug 1870873

Summary: "cibsecret sync" fails if node name is different from hostname

Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.3
Reporter: Ken Gaillot <kgaillot>
Assignee: Ken Gaillot <kgaillot>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, msmazova
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: 8.3
Flags: pm-rhel: mirror+
Hardware: All
OS: All
Fixed In Version: pacemaker-2.0.4-6.el8
Doc Type: No Doc Update
Doc Text: This fix is for a build that has not been released
Type: Bug
Last Closed: 2020-11-04 04:00:53 UTC
Bug Blocks: 1793860

Description Ken Gaillot 2020-08-20 21:47:46 UTC
Description of problem: "cibsecret sync" filters the local node out of the list of all nodes by searching for the output of "uname -n". This causes two issues: if the node name is different from the hostname, local secrets can be removed; and if another node's name is a superset of the local node's name (for example, "node10" vs. "node1"), secrets are not synced to that node.
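
For illustration only (this is not the actual cibsecret code), a plain substring match against the `uname -n` output misbehaves in both of the reported situations:

    # Illustration only -- not the actual cibsecret script.
    # Suppose the cluster node names are "node1" and "node10", the local node
    # is "node1", and `uname -n` prints an FQDN such as "node1.example.com".
    all_nodes="node1 node10"

    # Substring filtering on the hostname removes nothing here, so the local
    # node is treated as a remote peer; and if `uname -n` printed just "node1",
    # the same filter would also drop "node10", which would then never be synced.
    echo "$all_nodes" | tr ' ' '\n' | grep -v "$(uname -n)"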


Version-Release number of selected component (if applicable): 2.0.4-5


How reproducible: consistent


Steps to Reproduce:
1. Configure a cluster with node names different from hostnames.
2. Configure a resource and use cibsecret to make a parameter secret.
3. Run "cibsecret sync".
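
A command-level sketch of the steps above; the resource name, agent, and parameter are hypothetical examples and do not come from this report:

    # Hypothetical reproduction sketch; resource and parameter names are examples.
    # Assumes the corosync node names differ from the `uname -n` output.
    pcs resource create dummy ocf:pacemaker:Dummy   # step 2: any resource will do
    cibsecret set dummy fake 42                     # store the "fake" parameter as a secret
    cibsecret sync                                  # step 3: runs the faulty node filtering
    ls /var/lib/pacemaker/lrm/secrets/dummy         # before the fix, the local secret may be gone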

Actual results: Secrets are removed from the local node.


Expected results: Local secrets are synced to all other nodes.

Comment 1 Ken Gaillot 2020-08-20 21:54:23 UTC
Fixed upstream by commit afca6af
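
Conceptually, the safer behavior is to take the local node name from the cluster itself and compare it exactly rather than as a substring. A minimal sketch of that idea, not the literal upstream change:

    # Minimal sketch of exact-match filtering; not the literal commit afca6af.
    local_node="$(crm_node -n)"                     # node name as known to the cluster
    for node in $(crm_node -l | awk '{print $2}'); do
        [ "$node" = "$local_node" ] && continue     # exact comparison, no substring surprises
        echo "would sync secrets to $node"
    done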

Comment 5 Markéta Smazová 2020-09-23 11:30:58 UTC
before fix
----------
Please see bug 1793860, comment 9 (case 2).

after fix
----------

>   [root@virt-023 ~]# rpm -q pacemaker
>   pacemaker-2.0.4-6.el8.x86_64

>   [root@virt-023 ~]# pcs status
>   Cluster name: STSRHTS25177
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-024 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
>     * Last updated: Thu Sep 10 13:17:44 2020
>     * Last change:  Thu Sep 10 12:45:22 2020 by hacluster via crmd on virt-024
>     * 3 nodes configured
>     * 9 resource instances configured
>
>   Node List:
>     * Online: [ virt-023 virt-024 virt-031 ]
>
>   Full List of Resources:
>     * fence-virt-023	(stonith:fence_xvm):	 Started virt-023
>     * fence-virt-024	(stonith:fence_xvm):	 Started virt-024
>     * fence-virt-031	(stonith:fence_xvm):	 Started virt-031
>     * Clone Set: locking-clone [locking]:
>       * Started: [ virt-023 virt-024 virt-031 ]
>
>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

Check that hostnames are different from cluster node names.

>   [root@virt-023 ~]# uname -n
>   virt-023.cluster-qe.lab.eng.brq.redhat.com

>   [root@virt-024 ~]# uname -n
>   virt-024.cluster-qe.lab.eng.brq.redhat.com

>   [root@virt-031 ~]# uname -n
>   virt-031.cluster-qe.lab.eng.brq.redhat.com

Remove node `virt-031` from the cluster.

>   [root@virt-023 ~]# pcs cluster node remove virt-031
>   Destroying cluster on hosts: 'virt-031'...
>   virt-031: Successfully destroyed cluster
>   Sending updated corosync.conf to nodes...
>   virt-023: Succeeded
>   virt-024: Succeeded
>   virt-023: Corosync configuration reloaded

Put the cluster in maintenance mode.

>   [root@virt-023 ~]# pcs property set maintenance-mode=true
>   [root@virt-023 ~]# echo $?
>   0

Set the `delay` attribute of the `fence-virt-023` stonith resource as a secret.

>   [root@virt-023 ~]# cibsecret set fence-virt-023 delay 10
>   INFO: syncing /var/lib/pacemaker/lrm/secrets/fence-virt-023/delay to  virt-024  ...
>   Set 'fence-virt-023' option: id=fence-virt-023-instance_attributes-delay name=delay value=lrm://

Add node `virt-031` back to the cluster.

>   [root@virt-023 ~]# pcs cluster node add virt-031
>   No addresses specified for host 'virt-031', using 'virt-031'
>   Disabling sbd...
>   virt-031: sbd disabled
>   Sending 'corosync authkey', 'pacemaker authkey' to 'virt-031'
>   virt-031: successful distribution of the file 'corosync authkey'
>   virt-031: successful distribution of the file 'pacemaker authkey'
>   Sending updated corosync.conf to nodes...
>   virt-024: Succeeded
>   virt-031: Succeeded
>   virt-023: Succeeded
>   virt-023: Corosync configuration reloaded

On the new cluster node `virt-031`, start Corosync and verify that it is active.

>   [root@virt-031 ~]# systemctl start corosync.service
>   [root@virt-031 ~]# systemctl is-active corosync.service
>   active

Check the status of the cluster nodes.

>   [root@virt-023 ~]# pcs status nodes
>   Pacemaker Nodes:
>    Online: virt-023 virt-024
>    Standby: virt-031
>    Standby with resource(s) running:
>    Maintenance:
>    Offline:
>   [...]

Run `cibsecret sync` to synchronize the secret file to the new node `virt-031`.

>   [root@virt-023 ~]# cibsecret sync
>   INFO: syncing /var/lib/pacemaker/lrm/secrets to  virt-024 virt-031  ...

Check that the secret files are synchronized across all cluster nodes.

>   [root@virt-023 ~]# ls -l /var/lib/pacemaker/lrm/secrets/fence-virt-023
>   total 8
>   -rw-------. 1 root root  3 Sep 10 13:19 delay
>   -rw-------. 1 root root 33 Sep 10 13:19 delay.sign

>   [root@virt-024 ~]# ls -l /var/lib/pacemaker/lrm/secrets/fence-virt-023
>   total 8
>   -rw-------. 1 root root  3 Sep 10 13:19 delay
>   -rw-------. 1 root root 33 Sep 10 13:19 delay.sign

>   [root@virt-031 ~]# ls -l /var/lib/pacemaker/lrm/secrets/fence-virt-023
>   total 8
>   -rw-------. 1 root root  3 Sep 10 13:19 delay
>   -rw-------. 1 root root 33 Sep 10 13:19 delay.sign

Running `cibsecret get` on the new node `virt-031` does not work until Pacemaker is started on that node.

>   [root@virt-031 ~]# cibsecret get fence-virt-023 delay
>   ERROR: pacemaker not running? cibsecret needs pacemaker

Start cluster services (including Pacemaker) on the new node `virt-031`.

>   [root@virt-023 ~]# pcs cluster start virt-031
>   virt-031: Starting Cluster...

Turn off cluster maintenance mode.

>   [root@virt-023 ~]# pcs property set maintenance-mode=false
>   [root@virt-023 ~]# pcs status
>   Cluster name: STSRHTS25177
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-024 (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
>     * Last updated: Thu Sep 10 13:23:29 2020
>     * Last change:  Thu Sep 10 13:20:42 2020 by hacluster via crmd on virt-024
>     * 3 nodes configured
>     * 9 resource instances configured

>   Node List:
>     * Online: [ virt-023 virt-024 virt-031 ]

>   Full List of Resources:
>     * fence-virt-023	(stonith:fence_xvm):	 Started virt-023
>     * fence-virt-024	(stonith:fence_xvm):	 Started virt-024
>     * fence-virt-031	(stonith:fence_xvm):	 Started virt-031
>     * Clone Set: locking-clone [locking]:
>       * Started: [ virt-023 virt-024 virt-031 ]

>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

Use `cibsecret get` to verify that the `delay` secret value can be displayed on all cluster nodes.

>   [root@virt-024 ~]# cibsecret get fence-virt-023 delay
>   10

>   [root@virt-031 ~]# cibsecret get fence-virt-023 delay
>   10

>   [root@virt-023 ~]# cibsecret get fence-virt-023 delay
>   10


Marking verified in pacemaker-2.0.4-6.el8.

Comment 8 errata-xmlrpc 2020-11-04 04:00:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4804