Bug 1417936 - When deploying a cluster using short hostnames, resources running via pacemaker-remote won't work correctly if the remote's hostname is a FQDN
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.3
Hardware: All
OS: All
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: 7.4
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1420426
 
Reported: 2017-01-31 13:04 UTC by Michele Baldessari
Modified: 2017-08-01 17:54 UTC
CC: 9 users

Fixed In Version: pacemaker-1.1.16-3.el7
Doc Type: Bug Fix
Doc Text:
Prior to this update, if a resource agent used the crm_node command to obtain the node name, the resource agent sometimes received incorrect information if it was running on a Pacemaker remote node. This negatively affected the functionality of resource agents that use the node name. Now, Pacemaker automatically sets an environment variable with the node name, and crm_node uses this variable when it is available. As a result, the described problem no longer occurs.
Clone Of:
: 1420426 (view as bug list)
Environment:
Last Closed: 2017-08-01 17:54:39 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1862 0 normal SHIPPED_LIVE pacemaker bug fix and enhancement update 2017-08-01 18:04:15 UTC

Description Michele Baldessari 2017-01-31 13:04:23 UTC
Description of problem:
(NB: This bug is being filed after a discussion between Andrew, Damien, and me)

In OSP we have deployed pacemaker clusters using short hostnames since the dawn of time, so whether the hostname is set to controller-0 or its FQDN counterpart (controller-0.localdomain), everything works correctly. The problem starts when we add a pacemaker remote resource (for consistency reasons we add it via the short hostname: '/usr/sbin/pcs resource create overcloud-galera-0 remote server=172.17.0.10 reconnect_interval=60 op monitor interval=20') [1].

Now the problem is that most resource agents that make use of the NODENAME environment variable cannot work on the remote node. The main culprit is ocf_local_nodename() in ocf-shellfuncs, where we do:
ocf_local_nodename() {
        # use crm_node -n for pacemaker > 1.1.8
        which pacemakerd > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                local version=$(pacemakerd -$ | grep "Pacemaker .*" | awk '{ print $2 }')
                version=$(echo $version | awk -F- '{ print $1 }')
                ocf_version_cmp "$version" "1.1.8"
                if [ $? -eq 2 ]; then
                        which crm_node > /dev/null 2>&1
                        if [ $? -eq 0 ]; then
                                crm_node -n
                                return
                        fi
                fi
        fi
 
        # otherwise use uname -n
        uname -n
}

I say 'mainly' because the same kind of code can be found in lib/cluster/cluster.c:get_local_node_name() -> get_node_name(0).

The problem is that NODENAME on the remote nodes will be the FQDN, which is not known to pacemaker. So both setting per-node attributes *and* relying on NODENAME as galera does will break.

Andrew suggested a backwards-compatible change: pacemaker would export an additional environment variable (PCMK_NODENAME, maybe?) that always contains the name of the node as known by pacemaker and does not rely on any uname/hostname calls.

Resource agents can then be tweaked to use this variable when it exists, thereby avoiding this issue entirely.

[1] Note that just adding the remote under an FQDN, like '/usr/sbin/pcs resource create overcloud-galera-0.localdomain remote server=172.17.0.10 reconnect_interval=60 op monitor interval=20', won't really solve things. Take the galera example: we have to list all the galera node names in an RA meta-parameter, and we could not sensibly start passing the short hostname for a corosync node but an FQDN for a remote node.

[2] Deploying everything via FQDNs (so both corosync nodes and pacemaker-remote node) is not possible because we would not be able to manage upgrades in any sensible way.

Comment 1 Michele Baldessari 2017-01-31 13:15:06 UTC
Note that we could then simply change ocf_local_nodename() to use this variable when it exists, so that changes to individual RAs won't actually be necessary.
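A minimal sketch of that tweak (assuming the variable ends up being called PCMK_NODENAME, which is only the name floated above, not a confirmed interface):

```shell
ocf_local_nodename() {
    # Prefer the node name exported by pacemaker, if any. PCMK_NODENAME is
    # the name suggested in this report; the variable actually shipped may
    # differ.
    if [ -n "$PCMK_NODENAME" ]; then
        echo "$PCMK_NODENAME"
        return
    fi
    # Fall back to the existing logic: crm_node -n if available, else uname -n.
    if command -v crm_node > /dev/null 2>&1; then
        crm_node -n
        return
    fi
    uname -n
}
```

With this, an RA running under the cluster picks up the cluster's name for the node, while a manual invocation still degrades to today's behavior.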

Comment 3 Andrew Beekhof 2017-01-31 22:41:36 UTC
Thinking more, it might make sense to have crm_node itself look for PCMK_NODENAME so that anyone calling it directly from an agent will be assured of getting the "right" value.

Comment 4 Ken Gaillot 2017-02-01 18:39:12 UTC
Bug 1374175 is essentially the same as this. The proposed solution there is to move some of crm_node's intelligence to the daemons, so that crm_node does not need to be linked against libcrmcluster (which should not be a requirement on remote nodes). crm_node would ask a local daemon what the node name is.

It would be possible to use an environment variable as suggested here instead. The cluster would set the variable when calling the agent, and crm_node (called by ocf_local_nodename) would use it if present. However, the other bug is more general and deals with crm_node called in any context, not just resource agents.

I'll see if I can raise the priority on that, but the development cycle for 7.4 is quite short.

*** This bug has been marked as a duplicate of bug 1374175 ***

Comment 5 Andrew Beekhof 2017-02-01 22:17:43 UTC
I don't think OSP can wait for 7.5.
Due to other constraints, all OSP deployments are affected by this, preventing us from running any agent that uses attrd (galera, etc.) on remote nodes.

Comment 6 Andrew Beekhof 2017-02-06 01:10:09 UTC
[12:08 PM] beekhof@fedora ~/Development/sources/pacemaker/devel ☺ # git diff lib pengine/graph.c tools
diff --git a/lib/common/utils.c b/lib/common/utils.c
index 83072c5..3e3abd3 100644
--- a/lib/common/utils.c
+++ b/lib/common/utils.c
@@ -894,6 +894,8 @@ filter_action_parameters(xmlNode * param_set, const char *version)
         XML_ATTR_ID,
         XML_ATTR_CRM_VERSION,
         XML_LRM_ATTR_OP_DIGEST,
+        XML_LRM_ATTR_TARGET,
+        XML_LRM_ATTR_TARGET_UUID,
     };
 
     gboolean do_delete = FALSE;
diff --git a/pengine/graph.c b/pengine/graph.c
index 569cf6e..81d8355 100644
--- a/pengine/graph.c
+++ b/pengine/graph.c
@@ -948,6 +948,9 @@ action2xml(action_t * action, gboolean as_input, pe_working_set_t *data_set)
         if (router_node) {
             crm_xml_add(action_xml, XML_LRM_ATTR_ROUTER_NODE, router_node->details->uname);
         }
+
+        g_hash_table_insert(action->meta, strdup(XML_LRM_ATTR_TARGET), strdup(action->node->details->uname));
+        g_hash_table_insert(action->meta, strdup(XML_LRM_ATTR_TARGET_UUID), strdup(action->node->details->id));
     }
 
     /* No details if this action is only being listed in the inputs section */
diff --git a/tools/crm_node.c b/tools/crm_node.c
index d927f31..a76e550 100644
--- a/tools/crm_node.c
+++ b/tools/crm_node.c
@@ -951,7 +951,11 @@ main(int argc, char **argv)
     }
 
     if (command == 'n') {
-        fprintf(stdout, "%s\n", get_local_node_name());
+        const char *name = getenv(CRM_META"_"XML_LRM_ATTR_TARGET);
+        if(name == NULL) {
+            name = get_local_node_name();
+        }
+        fprintf(stdout, "%s\n", name);
         crm_exit(pcmk_ok);
 
     } else if (command == 'N') {

Comment 10 Ken Gaillot 2017-02-06 21:12:48 UTC
We can keep the BZs separate -- this one can address the immediate workaround, and Bug 1374175 can address the longer-term fix.

The difference is that the fix here (in Comment 6) only applies when crm_node is run by a resource agent via the cluster. It does not do anything if crm_node (or the resource agent) is run from the command-line.
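That distinction can be sketched in shell. Based on the patch in Comment 6, the variable crm_node checks is CRM_META "_" XML_LRM_ATTR_TARGET, which (assuming the usual definitions "CRM_meta" and "on_node" from pacemaker's headers) expands to CRM_meta_on_node; the helper name below is hypothetical.

```shell
# crm_node_name: mimic the lookup order the patched crm_node -n uses.
# CRM_meta_on_node is present only when the cluster itself invokes the
# resource agent; from an interactive shell it is unset, so the old
# behavior (stood in for here by uname -n) applies.
crm_node_name() {
    if [ -n "$CRM_meta_on_node" ]; then
        echo "$CRM_meta_on_node"   # node name as known to the cluster
    else
        uname -n                   # stand-in for get_local_node_name()
    fi
}
```

So running the agent (or crm_node) by hand on a remote node still yields the local uname, which is exactly why this is a workaround rather than the full fix tracked in Bug 1374175.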

Comment 19 Ken Gaillot 2017-02-08 00:02:47 UTC
Fixed upstream by commit e0eb9e7

Comment 23 michal novacek 2017-05-16 12:12:25 UTC
I have verified with pacemaker-1.1.16-9.el7.x86_64 that a remote node can run a
resource when specified with either a short name or an FQDN.

---

Common setup (two cases):
  1/ create a cluster with a guest node running a resource, cluster nodes and
  guest nodes having short names [1], [2]

  2/ create a cluster with a guest node running a resource, cluster nodes and
  guest nodes having FQDN names [3], [4]

before the patch pacemaker-1.1.16-2.el7.x86_64
==============================================
[root@tardis-03 ~]# pcs resource update R-pool-10-34-69-100 meta \
remote-node=pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com

(wait for pacemaker to recognize that it will not start on either node)

[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 11:35:42 2017
Last change: Tue May 16 11:31:37 2017 by hacluster via crmd on tardis-01.ipv4

3 nodes configured
17 resources configured

Online: [ tardis-01.ipv4 tardis-03.ipv4 ]

Full list of resources:

 fence-tardis-03        (stonith:fence_ipmilan):        Started tardis-01.ipv4
 fence-tardis-01        (stonith:fence_ipmilan):        Started tardis-03.ipv4
 Clone Set: dlm-clone [dlm]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: shared-vg-clone [shared-vg]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: etc-libvirt-clone [etc-libvirt]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: images-clone [images]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
>> R-pool-10-34-69-57     (ocf::heartbeat:VirtualDomain): Stopped
>> dummy  (ocf::heartbeat:Dummy): Started tardis-01.ipv4

Failed Actions:
>>  * pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com_start_0 on tardis-01.ipv4 'unknown error' (1): call=16, status=Timed Out, exitreason='none',
    last-rc-change='Tue May 16 11:31:39 2017', queued=0ms, exec=0ms
>>  * pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com_start_0 on tardis-03.ipv4 'unknown error' (1): call=1, status=Timed Out, exitreason='none',
    last-rc-change='Tue May 16 11:34:04 2017', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

after the patch pacemaker-1.1.16-9.el7.x86_64
=============================================

>> 1) hosts having short names
> update guest node to have fqdn as remote-node identifier

[root@tardis-03 ~]# pcs resource update R-pool-10-34-69-100 meta \
remote-node=pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com

[root@tardis-03 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-9.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 09:30:47 2017
Last change: Tue May 16 09:29:50 2017 by root via cibadmin on tardis-03.ipv4

3 nodes configured
17 resources configured

Online: [ tardis-01.ipv4 tardis-03.ipv4 ]
>> GuestOnline: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com@tardis-03.ipv4 ]

Full list of resources:

 fence-tardis-03        (stonith:fence_ipmilan):        Started tardis-01.ipv4
 fence-tardis-01        (stonith:fence_ipmilan):        Started tardis-01.ipv4
 Clone Set: dlm-clone [dlm]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: shared-vg-clone [shared-vg]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: etc-libvirt-clone [etc-libvirt]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: images-clone [images]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 R-pool-10-34-69-100    (ocf::heartbeat:VirtualDomain): Started tardis-03.ipv4
>> dummy  (ocf::heartbeat:Dummy): Started pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

>> 2) hosts having fully qualified domain names
[root@tardis-01 ~]# pcs resource update R-pool-10-34-69-57 meta \
remote-node=pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com

[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 14:09:49 2017
Last change: Tue May 16 14:09:01 2017 by root via cibadmin on tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com

3 nodes configured
17 resources configured

Online: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
> GuestOnline: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com@tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com ]

Full list of resources:

 fence-tardis-03        (stonith:fence_ipmilan):        Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
 fence-tardis-01        (stonith:fence_ipmilan):        Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
 Clone Set: dlm-clone [dlm]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
     Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
     Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: shared-vg-clone [shared-vg]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: etc-libvirt-clone [etc-libvirt]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: images-clone [images]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 R-pool-10-34-69-57     (ocf::heartbeat:VirtualDomain): Started tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
> dummy  (ocf::heartbeat:Dummy): Started pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

-----

>>> (1) pcs status (hosts having short names)
[root@tardis-03 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-9.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 09:26:29 2017
Last change: Tue May 16 09:25:28 2017 by root via cibadmin on tardis-03.ipv4

3 nodes configured
17 resources configured

Online: [ tardis-01.ipv4 tardis-03.ipv4 ]
GuestOnline: [ pool-10-34-69-100@tardis-03.ipv4 ]

Full list of resources:

 fence-tardis-03        (stonith:fence_ipmilan):        Started tardis-01.ipv4
 fence-tardis-01        (stonith:fence_ipmilan):        Started tardis-01.ipv4
 Clone Set: dlm-clone [dlm]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-100 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
     Stopped: [ pool-10-34-69-100 ]
 Clone Set: shared-vg-clone [shared-vg]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: etc-libvirt-clone [etc-libvirt]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 Clone Set: images-clone [images]
     Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
 R-pool-10-34-69-100    (ocf::heartbeat:VirtualDomain): Started tardis-03.ipv4
 dummy  (ocf::heartbeat:Dummy): Started pool-10-34-69-100

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

>> (2) pcs config (hosts having short names)
[root@tardis-03 ~]# pcs config
Cluster Name: STSRHTS27159
Corosync Nodes:
 tardis-03.ipv4 tardis-01.ipv4
Pacemaker Nodes:
 tardis-01.ipv4 tardis-03.ipv4

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Clone: shared-vg-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: shared-vg (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=false partial_activation=false volgrpname=shared
   Operations: monitor interval=10 timeout=30 (shared-vg-monitor-interval-10)
               start interval=0s timeout=30 (shared-vg-start-interval-0s)
               stop interval=0s timeout=30 (shared-vg-stop-interval-0s)
 Clone: etc-libvirt-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: etc-libvirt (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
   Operations: monitor interval=30s (etc-libvirt-monitor-interval-30s)
               start interval=0s timeout=60 (etc-libvirt-start-interval-0s)
               stop interval=0s timeout=60 (etc-libvirt-stop-interval-0s)
 Clone: images-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: images (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
   Operations: monitor interval=30s (images-monitor-interval-30s)
               start interval=0s timeout=60 (images-start-interval-0s)
               stop interval=0s timeout=60 (images-stop-interval-0s)
 Resource: R-pool-10-34-69-100 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-34-69-100.xml hypervisor=qemu:///system
  Meta Attrs: remote-node=pool-10-34-69-100 
  Utilization: cpu=2 hv_memory=1024
  Operations: monitor interval=10 timeout=30 (R-pool-10-34-69-100-monitor-interval-10)
              start interval=0s timeout=90 (R-pool-10-34-69-100-start-interval-0s)
              stop interval=0s timeout=90 (R-pool-10-34-69-100-stop-interval-0s)
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
              start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)

Stonith Devices:
 Resource: fence-tardis-03 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 ipaddr=tardis-03-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-03
  Operations: monitor interval=60s (fence-tardis-03-monitor-interval-60s)
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=tardis-01-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-01
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: clvmd-clone
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-100--INFINITY)
  Resource: dlm-clone
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-100--INFINITY)
  Resource: etc-libvirt-clone
    Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-etc-libvirt-clone-tardis-03.ipv4-INFINITY)
    Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-etc-libvirt-clone-tardis-01.ipv4-INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-100--INFINITY)
  Resource: images-clone
    Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-images-clone-tardis-03.ipv4-INFINITY)
    Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-images-clone-tardis-01.ipv4-INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-100--INFINITY)
  Resource: shared-vg-clone
    Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-shared-vg-clone-tardis-03.ipv4-INFINITY)
    Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-shared-vg-clone-tardis-01.ipv4-INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-100--INFINITY)
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
  start clvmd-clone then start shared-vg-clone (kind:Mandatory)
  start shared-vg-clone then start etc-libvirt-clone (kind:Mandatory)
  start shared-vg-clone then start images-clone (kind:Mandatory)
  start etc-libvirt-clone then start R-pool-10-34-69-100 (kind:Mandatory)
  start images-clone then start R-pool-10-34-69-100 (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  shared-vg-clone with clvmd-clone (score:INFINITY)
  images-clone with shared-vg-clone (score:INFINITY)
  etc-libvirt-clone with shared-vg-clone (score:INFINITY)
  R-pool-10-34-69-100 with images-clone (score:INFINITY)
  R-pool-10-34-69-100 with etc-libvirt-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS27159
 dc-version: 1.1.16-9.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1494858450
 no-quorum-policy: freeze

Quorum:
  Options:

>> (3) pcs status (hosts having fqdn)
[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 14:05:24 2017
Last change: Tue May 16 14:03:33 2017 by root via cibadmin on tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com

3 nodes configured
17 resources configured

Online: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
GuestOnline: [ pool-10-34-69-57@tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com ]

Full list of resources:

 fence-tardis-03        (stonith:fence_ipmilan):        Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
 fence-tardis-01        (stonith:fence_ipmilan):        Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
 Clone Set: dlm-clone [dlm]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
     Stopped: [ pool-10-34-69-57 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
     Stopped: [ pool-10-34-69-57 ]
 Clone Set: shared-vg-clone [shared-vg]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: etc-libvirt-clone [etc-libvirt]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: images-clone [images]
     Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
 R-pool-10-34-69-57     (ocf::heartbeat:VirtualDomain): Started tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
 dummy  (ocf::heartbeat:Dummy): Started pool-10-34-69-57

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

>> (4) pcs config (hosts having fqdn)
[root@tardis-01 ~]# pcs config
Cluster Name: STSRHTS27159
Corosync Nodes:
 tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Clone: shared-vg-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: shared-vg (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=false partial_activation=false volgrpname=shared
   Operations: monitor interval=10 timeout=30 (shared-vg-monitor-interval-10)
               start interval=0s timeout=30 (shared-vg-start-interval-0s)
               stop interval=0s timeout=30 (shared-vg-stop-interval-0s)
 Clone: etc-libvirt-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: etc-libvirt (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
   Operations: monitor interval=30s (etc-libvirt-monitor-interval-30s)
               start interval=0s timeout=60 (etc-libvirt-start-interval-0s)
               stop interval=0s timeout=60 (etc-libvirt-stop-interval-0s)
 Clone: images-clone
  Meta Attrs: clone-max=2 interleave=true 
  Resource: images (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
   Operations: monitor interval=30s (images-monitor-interval-30s)
               start interval=0s timeout=60 (images-start-interval-0s)
               stop interval=0s timeout=60 (images-stop-interval-0s)
 Resource: R-pool-10-34-69-57 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-34-69-57.xml hypervisor=qemu:///system
  Meta Attrs: remote-node=pool-10-34-69-57 
  Utilization: cpu=2 hv_memory=1024
  Operations: monitor interval=10 timeout=30 (R-pool-10-34-69-57-monitor-interval-10)
              start interval=0s timeout=90 (R-pool-10-34-69-57-start-interval-0s)
              stop interval=0s timeout=90 (R-pool-10-34-69-57-stop-interval-0s)
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
              start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)

Stonith Devices:
 Resource: fence-tardis-03 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 ipaddr=tardis-03-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-03
  Operations: monitor interval=60s (fence-tardis-03-monitor-interval-60s)
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=tardis-01-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-01
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: clvmd-clone
    Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-57--INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-100--INFINITY)
  Resource: dlm-clone
    Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-57--INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-100--INFINITY)
  Resource: etc-libvirt-clone
    Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-etc-libvirt-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-etc-libvirt-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-57--INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-100--INFINITY)
  Resource: images-clone
    Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-images-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-images-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-57--INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-100--INFINITY)
  Resource: shared-vg-clone
    Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-vg-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-vg-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-57--INFINITY)
    Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-100--INFINITY)
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
  start clvmd-clone then start shared-vg-clone (kind:Mandatory)
  start shared-vg-clone then start etc-libvirt-clone (kind:Mandatory)
  start shared-vg-clone then start images-clone (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  shared-vg-clone with clvmd-clone (score:INFINITY)
  images-clone with shared-vg-clone (score:INFINITY)
  etc-libvirt-clone with shared-vg-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS27159
 dc-version: 1.1.16-2.el7-94ff4df
 have-watchdog: false
 no-quorum-policy: freeze

Quorum:
  Options:

Comment 24 errata-xmlrpc 2017-08-01 17:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862

