Bug 1417936
| Summary: | When deploying a cluster using short hostnames, resources running via pacemaker-remote won't work correctly if the remote's hostname is an FQDN | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michele Baldessari <michele> | |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.3 | CC: | abeekhof, cfeist, cluster-maint, dciabrin, fdinitto, jherrman, jkortus, kgaillot, mnovacek | |
| Target Milestone: | rc | Keywords: | Reopened, ZStream | |
| Target Release: | 7.4 | |||
| Hardware: | All | |||
| OS: | All | |||
| Whiteboard: | ||||
| Fixed In Version: | pacemaker-1.1.16-3.el7 | Doc Type: | Bug Fix | |
| Doc Text: | Prior to this update, if a resource agent used the crm_node command to obtain the node name, the resource agent sometimes received incorrect information if it was running on a Pacemaker remote node. This negatively affected the functionality of resource agents that use the node name. Now, Pacemaker automatically sets an environment variable with the node name, and crm_node uses this variable when it is available. As a result, the described problem no longer occurs. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1420426 (view as bug list) | Environment: | ||
| Last Closed: | 2017-08-01 17:54:39 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1420426 | |||
Note that we could then simply have ocf_local_nodename() make use of this variable when it exists, so that changes to individual RAs won't actually be necessary. Thinking more, it might make sense to have crm_node itself look for PCMK_NODENAME, so that anyone calling it directly from an agent will be assured of getting the "right" value.

Bug 1374175 is essentially the same as this. The proposed solution there is to move some of crm_node's intelligence to the daemons, so that crm_node does not need to be linked against libcrmcluster (which should not be a requirement on remote nodes); crm_node would ask a local daemon what the node name is.

It would be possible to use an environment variable as suggested here instead: the cluster would set the variable when calling the agent, and crm_node (called by ocf_local_nodename) would use it if present. However, the other bug is more general and deals with crm_node called in any context, not just resource agents. I'll see if I can raise the priority on that, but the development cycle for 7.4 is quite short.

*** This bug has been marked as a duplicate of bug 1374175 ***

I don't think OSP can wait for 7.5. Due to other constraints, all OSP deployments are affected by this, preventing us from running any agent that uses attrd (galera, etc.) on remote nodes.

[12:08 PM] beekhof@fedora ~/Development/sources/pacemaker/devel ☺ # git diff lib pengine/graph.c tools
diff --git a/lib/common/utils.c b/lib/common/utils.c
index 83072c5..3e3abd3 100644
--- a/lib/common/utils.c
+++ b/lib/common/utils.c
@@ -894,6 +894,8 @@ filter_action_parameters(xmlNode * param_set, const char *version)
XML_ATTR_ID,
XML_ATTR_CRM_VERSION,
XML_LRM_ATTR_OP_DIGEST,
+ XML_LRM_ATTR_TARGET,
+ XML_LRM_ATTR_TARGET_UUID,
};
gboolean do_delete = FALSE;
diff --git a/pengine/graph.c b/pengine/graph.c
index 569cf6e..81d8355 100644
--- a/pengine/graph.c
+++ b/pengine/graph.c
@@ -948,6 +948,9 @@ action2xml(action_t * action, gboolean as_input, pe_working_set_t *data_set)
if (router_node) {
crm_xml_add(action_xml, XML_LRM_ATTR_ROUTER_NODE, router_node->details->uname);
}
+
+ g_hash_table_insert(action->meta, strdup(XML_LRM_ATTR_TARGET), strdup(action->node->details->uname));
+ g_hash_table_insert(action->meta, strdup(XML_LRM_ATTR_TARGET_UUID), strdup(action->node->details->id));
}
/* No details if this action is only being listed in the inputs section */
diff --git a/tools/crm_node.c b/tools/crm_node.c
index d927f31..a76e550 100644
--- a/tools/crm_node.c
+++ b/tools/crm_node.c
@@ -951,7 +951,11 @@ main(int argc, char **argv)
}
if (command == 'n') {
- fprintf(stdout, "%s\n", get_local_node_name());
+ const char *name = getenv(CRM_META"_"XML_LRM_ATTR_TARGET);
+ if(name == NULL) {
+ name = get_local_node_name();
+ }
+ fprintf(stdout, "%s\n", name);
crm_exit(pcmk_ok);
} else if (command == 'N') {
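To make the effect of the crm_node change concrete, here is a small shell sketch of the lookup order the patched tool uses. `CRM_meta_on_node` is the expanded form of `CRM_META"_"XML_LRM_ATTR_TARGET`; the real fallback, get_local_node_name(), is stubbed here with `uname -n`, so this is an illustration rather than the actual implementation:

```shell
# Sketch of the node-name lookup order in the patched crm_node.
# The real fallback, get_local_node_name(), is stubbed with uname -n.
node_name() {
    if [ -n "$CRM_meta_on_node" ]; then
        # Set by the cluster when it invokes a resource agent (Comment 6 patch)
        echo "$CRM_meta_on_node"
    else
        # Command-line invocation: fall back to the local node name
        uname -n
    fi
}

# When the cluster runs the agent, the meta attribute wins:
CRM_meta_on_node="overcloud-galera-0"
node_name        # prints overcloud-galera-0

# Outside the cluster, the old behaviour is unchanged:
unset CRM_meta_on_node
node_name        # prints the local hostname
```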
We can keep the BZs separate -- this one can address the immediate workaround, and Bug 1374175 can address the longer-term fix.

The difference is that the fix here (in Comment 6) only applies when crm_node is run by a resource agent via the cluster. It does not do anything if crm_node (or the resource agent) is run from the command line.

Fixed upstream by commit e0eb9e7
I have verified that a remote node can run a resource when specified with both a
short name and an FQDN, using pacemaker-1.1.16-9.el7.x86_64
---
Common setup (two cases):
1/ create a cluster with a guest node running a resource, cluster nodes and
guest nodes having short names [1], [2]
2/ create a cluster with a guest node running a resource, cluster nodes and
guest nodes having FQDNs [3], [4]
before the patch pacemaker-1.1.16-2.el7.x86_64
==============================================
[root@tardis-03 ~]# pcs resource update R-pool-10-34-69-100 meta \
remote-node=pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com
(wait for pacemaker to recognize that it will not start on either node)
[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 11:35:42 2017
Last change: Tue May 16 11:31:37 2017 by hacluster via crmd on tardis-01.ipv4
3 nodes configured
17 resources configured
Online: [ tardis-01.ipv4 tardis-03.ipv4 ]
Full list of resources:
fence-tardis-03 (stonith:fence_ipmilan): Started tardis-01.ipv4
fence-tardis-01 (stonith:fence_ipmilan): Started tardis-03.ipv4
Clone Set: dlm-clone [dlm]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: clvmd-clone [clvmd]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: shared-vg-clone [shared-vg]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: etc-libvirt-clone [etc-libvirt]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: images-clone [images]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
>> R-pool-10-34-69-57 (ocf::heartbeat:VirtualDomain): Stopped
>> dummy (ocf::heartbeat:Dummy): Started tardis-01.ipv4
Failed Actions:
>> * pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com_start_0 on tardis-01.ipv4 'unknown error' (1): call=16, status=Timed Out, exitreason='none',
last-rc-change='Tue May 16 11:31:39 2017', queued=0ms, exec=0ms
>> * pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com_start_0 on tardis-03.ipv4 'unknown error' (1): call=1, status=Timed Out, exitreason='none',
last-rc-change='Tue May 16 11:34:04 2017', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
after the patch pacemaker-1.1.16-9.el7.x86_64
=============================================
>> 1) hosts having short names
> update guest node to have fqdn as remote-node identifier
[root@tardis-03 ~]# pcs resource update R-pool-10-34-69-100 meta \
remote-node=pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com
[root@tardis-03 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-9.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 09:30:47 2017
Last change: Tue May 16 09:29:50 2017 by root via cibadmin on tardis-03.ipv4
3 nodes configured
17 resources configured
Online: [ tardis-01.ipv4 tardis-03.ipv4 ]
>> GuestOnline: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com ]
Full list of resources:
fence-tardis-03 (stonith:fence_ipmilan): Started tardis-01.ipv4
fence-tardis-01 (stonith:fence_ipmilan): Started tardis-01.ipv4
Clone Set: dlm-clone [dlm]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: clvmd-clone [clvmd]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: shared-vg-clone [shared-vg]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: etc-libvirt-clone [etc-libvirt]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: images-clone [images]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
R-pool-10-34-69-100 (ocf::heartbeat:VirtualDomain): Started tardis-03.ipv4
>> dummy (ocf::heartbeat:Dummy): Started pool-10-34-69-100.cluster-qe.lab.eng.brq.redhat.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
>> 2) hosts having fully qualified domain names
[root@tardis-01 ~]# pcs resource update R-pool-10-34-69-57 meta \
remote-node=pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com
[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 14:09:49 2017
Last change: Tue May 16 14:09:01 2017 by root via cibadmin on tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
3 nodes configured
17 resources configured
Online: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
> GuestOnline: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com.cluster-qe.lab.eng.brq.redhat.com ]
Full list of resources:
fence-tardis-03 (stonith:fence_ipmilan): Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
fence-tardis-01 (stonith:fence_ipmilan): Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
Clone Set: dlm-clone [dlm]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: clvmd-clone [clvmd]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Stopped: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: shared-vg-clone [shared-vg]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: etc-libvirt-clone [etc-libvirt]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: images-clone [images]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
R-pool-10-34-69-57 (ocf::heartbeat:VirtualDomain): Started tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
> dummy (ocf::heartbeat:Dummy): Started pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
-----
>>> (1) pcs status (hosts having short names)
[root@tardis-03 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4 (version 1.1.16-9.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 09:26:29 2017
Last change: Tue May 16 09:25:28 2017 by root via cibadmin on tardis-03.ipv4
3 nodes configured
17 resources configured
Online: [ tardis-01.ipv4 tardis-03.ipv4 ]
GuestOnline: [ pool-10-34-69-100 ]
Full list of resources:
fence-tardis-03 (stonith:fence_ipmilan): Started tardis-01.ipv4
fence-tardis-01 (stonith:fence_ipmilan): Started tardis-01.ipv4
Clone Set: dlm-clone [dlm]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-100 ]
Clone Set: clvmd-clone [clvmd]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Stopped: [ pool-10-34-69-100 ]
Clone Set: shared-vg-clone [shared-vg]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: etc-libvirt-clone [etc-libvirt]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
Clone Set: images-clone [images]
Started: [ tardis-01.ipv4 tardis-03.ipv4 ]
R-pool-10-34-69-100 (ocf::heartbeat:VirtualDomain): Started tardis-03.ipv4
dummy (ocf::heartbeat:Dummy): Started pool-10-34-69-100
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
>> (2) pcs config (hosts having short names)
[root@tardis-03 ~]# pcs config
Cluster Name: STSRHTS27159
Corosync Nodes:
tardis-03.ipv4 tardis-01.ipv4
Pacemaker Nodes:
tardis-01.ipv4 tardis-03.ipv4
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
Clone: shared-vg-clone
Meta Attrs: clone-max=2 interleave=true
Resource: shared-vg (class=ocf provider=heartbeat type=LVM)
Attributes: exclusive=false partial_activation=false volgrpname=shared
Operations: monitor interval=10 timeout=30 (shared-vg-monitor-interval-10)
start interval=0s timeout=30 (shared-vg-start-interval-0s)
stop interval=0s timeout=30 (shared-vg-stop-interval-0s)
Clone: etc-libvirt-clone
Meta Attrs: clone-max=2 interleave=true
Resource: etc-libvirt (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
Operations: monitor interval=30s (etc-libvirt-monitor-interval-30s)
start interval=0s timeout=60 (etc-libvirt-start-interval-0s)
stop interval=0s timeout=60 (etc-libvirt-stop-interval-0s)
Clone: images-clone
Meta Attrs: clone-max=2 interleave=true
Resource: images (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
Operations: monitor interval=30s (images-monitor-interval-30s)
start interval=0s timeout=60 (images-start-interval-0s)
stop interval=0s timeout=60 (images-stop-interval-0s)
Resource: R-pool-10-34-69-100 (class=ocf provider=heartbeat type=VirtualDomain)
Attributes: config=/etc/libvirt/qemu/pool-10-34-69-100.xml hypervisor=qemu:///system
Meta Attrs: remote-node=pool-10-34-69-100
Utilization: cpu=2 hv_memory=1024
Operations: monitor interval=10 timeout=30 (R-pool-10-34-69-100-monitor-interval-10)
start interval=0s timeout=90 (R-pool-10-34-69-100-start-interval-0s)
stop interval=0s timeout=90 (R-pool-10-34-69-100-stop-interval-0s)
Resource: dummy (class=ocf provider=heartbeat type=Dummy)
Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
start interval=0s timeout=20 (dummy-start-interval-0s)
stop interval=0s timeout=20 (dummy-stop-interval-0s)
Stonith Devices:
Resource: fence-tardis-03 (class=stonith type=fence_ipmilan)
Attributes: delay=5 ipaddr=tardis-03-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-03
Operations: monitor interval=60s (fence-tardis-03-monitor-interval-60s)
Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
Attributes: ipaddr=tardis-01-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-01
Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: clvmd-clone
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-100--INFINITY)
Resource: dlm-clone
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-100--INFINITY)
Resource: etc-libvirt-clone
Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-etc-libvirt-clone-tardis-03.ipv4-INFINITY)
Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-etc-libvirt-clone-tardis-01.ipv4-INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-100--INFINITY)
Resource: images-clone
Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-images-clone-tardis-03.ipv4-INFINITY)
Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-images-clone-tardis-01.ipv4-INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-100--INFINITY)
Resource: shared-vg-clone
Enabled on: tardis-03.ipv4 (score:INFINITY) (id:location-shared-vg-clone-tardis-03.ipv4-INFINITY)
Enabled on: tardis-01.ipv4 (score:INFINITY) (id:location-shared-vg-clone-tardis-01.ipv4-INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-100--INFINITY)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory)
start clvmd-clone then start shared-vg-clone (kind:Mandatory)
start shared-vg-clone then start etc-libvirt-clone (kind:Mandatory)
start shared-vg-clone then start images-clone (kind:Mandatory)
start etc-libvirt-clone then start R-pool-10-34-69-100 (kind:Mandatory)
start images-clone then start R-pool-10-34-69-100 (kind:Mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY)
shared-vg-clone with clvmd-clone (score:INFINITY)
images-clone with shared-vg-clone (score:INFINITY)
etc-libvirt-clone with shared-vg-clone (score:INFINITY)
R-pool-10-34-69-100 with images-clone (score:INFINITY)
R-pool-10-34-69-100 with etc-libvirt-clone (score:INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS27159
dc-version: 1.1.16-9.el7-94ff4df
have-watchdog: false
last-lrm-refresh: 1494858450
no-quorum-policy: freeze
Quorum:
Options:
>> (3) pcs status (hosts having fqdn)
[root@tardis-01 ~]# pcs status
Cluster name: STSRHTS27159
Stack: corosync
Current DC: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (version 1.1.16-2.el7-94ff4df) - partition with quorum
Last updated: Tue May 16 14:05:24 2017
Last change: Tue May 16 14:03:33 2017 by root via cibadmin on tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
3 nodes configured
17 resources configured
Online: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
GuestOnline: [ pool-10-34-69-57.cluster-qe.lab.eng.brq.redhat.com ]
Full list of resources:
fence-tardis-03 (stonith:fence_ipmilan): Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
fence-tardis-01 (stonith:fence_ipmilan): Started tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
Clone Set: dlm-clone [dlm]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Stopped: [ pool-10-34-69-57 ]
Clone Set: clvmd-clone [clvmd]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Stopped: [ pool-10-34-69-57 ]
Clone Set: shared-vg-clone [shared-vg]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: etc-libvirt-clone [etc-libvirt]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
Clone Set: images-clone [images]
Started: [ tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com ]
R-pool-10-34-69-57 (ocf::heartbeat:VirtualDomain): Started tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
dummy (ocf::heartbeat:Dummy): Started pool-10-34-69-57
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
>> (4) pcs config (hosts having fqdn)
[root@tardis-01 ~]# pcs config
Cluster Name: STSRHTS27159
Corosync Nodes:
tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
Clone: shared-vg-clone
Meta Attrs: clone-max=2 interleave=true
Resource: shared-vg (class=ocf provider=heartbeat type=LVM)
Attributes: exclusive=false partial_activation=false volgrpname=shared
Operations: monitor interval=10 timeout=30 (shared-vg-monitor-interval-10)
start interval=0s timeout=30 (shared-vg-start-interval-0s)
stop interval=0s timeout=30 (shared-vg-stop-interval-0s)
Clone: etc-libvirt-clone
Meta Attrs: clone-max=2 interleave=true
Resource: etc-libvirt (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
Operations: monitor interval=30s (etc-libvirt-monitor-interval-30s)
start interval=0s timeout=60 (etc-libvirt-start-interval-0s)
stop interval=0s timeout=60 (etc-libvirt-stop-interval-0s)
Clone: images-clone
Meta Attrs: clone-max=2 interleave=true
Resource: images (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
Operations: monitor interval=30s (images-monitor-interval-30s)
start interval=0s timeout=60 (images-start-interval-0s)
stop interval=0s timeout=60 (images-stop-interval-0s)
Resource: R-pool-10-34-69-57 (class=ocf provider=heartbeat type=VirtualDomain)
Attributes: config=/etc/libvirt/qemu/pool-10-34-69-57.xml hypervisor=qemu:///system
Meta Attrs: remote-node=pool-10-34-69-57
Utilization: cpu=2 hv_memory=1024
Operations: monitor interval=10 timeout=30 (R-pool-10-34-69-57-monitor-interval-10)
start interval=0s timeout=90 (R-pool-10-34-69-57-start-interval-0s)
stop interval=0s timeout=90 (R-pool-10-34-69-57-stop-interval-0s)
Resource: dummy (class=ocf provider=heartbeat type=Dummy)
Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
start interval=0s timeout=20 (dummy-start-interval-0s)
stop interval=0s timeout=20 (dummy-stop-interval-0s)
Stonith Devices:
Resource: fence-tardis-03 (class=stonith type=fence_ipmilan)
Attributes: delay=5 ipaddr=tardis-03-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-03
Operations: monitor interval=60s (fence-tardis-03-monitor-interval-60s)
Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
Attributes: ipaddr=tardis-01-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=tardis-01
Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: clvmd-clone
Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-57--INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-69-100--INFINITY)
Resource: dlm-clone
Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-57--INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-69-100--INFINITY)
Resource: etc-libvirt-clone
Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-etc-libvirt-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-etc-libvirt-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-57--INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-etc-libvirt-clone-pool-10-34-69-100--INFINITY)
Resource: images-clone
Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-images-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-images-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-57--INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-images-clone-pool-10-34-69-100--INFINITY)
Resource: shared-vg-clone
Enabled on: tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-vg-clone-tardis-03.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Enabled on: tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-vg-clone-tardis-01.ipv4.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
Disabled on: pool-10-34-69-57 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-57--INFINITY)
Disabled on: pool-10-34-69-100 (score:-INFINITY) (id:location-shared-vg-clone-pool-10-34-69-100--INFINITY)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory)
start clvmd-clone then start shared-vg-clone (kind:Mandatory)
start shared-vg-clone then start etc-libvirt-clone (kind:Mandatory)
start shared-vg-clone then start images-clone (kind:Mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY)
shared-vg-clone with clvmd-clone (score:INFINITY)
images-clone with shared-vg-clone (score:INFINITY)
etc-libvirt-clone with shared-vg-clone (score:INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS27159
dc-version: 1.1.16-2.el7-94ff4df
have-watchdog: false
no-quorum-policy: freeze
Quorum:
Options:
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862
Description of problem:

(NB: This bug is being filed after a discussion with Andrew, Damien and myself.)

In OSP we have deployed the pacemaker cluster using short hostnames since the dawn of time. So no matter whether the hostname is set to controller-0 or its FQDN counterpart (controller-0.localdomain), it all works correctly. The problem starts when we add a pacemaker remote resource (for consistency reasons we add it via the short hostname: '/usr/sbin/pcs resource create overcloud-galera-0 remote server=172.17.0.10 reconnect_interval=60 op monitor interval=20') [1].

Now the problem is that most resource agents that make use of the NODENAME environment variable cannot work on the remote node. The reason for this is mainly ocf_local_nodename() in ocf-shellfuncs, where we do:

    ocf_local_nodename() {
        # use crm_node -n for pacemaker > 1.1.8
        which pacemakerd > /dev/null 2>&1
        if [ $? -eq 0 ]; then
            local version=$(pacemakerd -$ | grep "Pacemaker .*" | awk '{ print $2 }')
            version=$(echo $version | awk -F- '{ print $1 }')
            ocf_version_cmp "$version" "1.1.8"
            if [ $? -eq 2 ]; then
                which crm_node > /dev/null 2>&1
                if [ $? -eq 0 ]; then
                    crm_node -n
                    return
                fi
            fi
        fi
        # otherwise use uname -n
        uname -n
    }

I say 'mainly' because the same kind of code can be found in lib/cluster/cluster.c:get_local_node_name() -> get_node_name(0).

The problem is that NODENAME on the remote nodes will be the FQDN hostname, which is not known to pacemaker. So both setting per-node attributes *and* relying on NODENAME like galera does will break.

Andrew suggested a backwards-compatible change that adds another environment variable, exported by pacemaker, which always contains the name of the node as known by pacemaker (PCMK_NODENAME, maybe?) and does not make use of any uname/hostname calls. Resource agents can then be tweaked to make use of this variable, if it exists, thereby avoiding this issue entirely.
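The agent-side tweak proposed above could look like the following sketch. Note that PCMK_NODENAME is only the variable name suggested in this report (the eventual fix taught crm_node itself to read a cluster-set meta attribute instead), so it is hypothetical here:

```shell
# Hypothetical sketch of the proposed backwards-compatible RA helper.
# PCMK_NODENAME is the name suggested in this report, not a variable
# Pacemaker actually exports.
ocf_local_nodename() {
    if [ -n "$PCMK_NODENAME" ]; then
        # Trust the name the cluster knows this node by
        echo "$PCMK_NODENAME"
        return
    fi
    # Fall back to the existing behaviour
    if command -v crm_node >/dev/null 2>&1; then
        crm_node -n 2>/dev/null && return
    fi
    uname -n
}

PCMK_NODENAME="controller-0"
ocf_local_nodename    # prints controller-0
```

Because the variable is only consulted when set, agents using this helper would behave exactly as before on clusters that do not export it.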
[1] Note that just adding the remote with an FQDN name, like '/usr/sbin/pcs resource create overcloud-galera-0.localdomain remote server=172.17.0.10 reconnect_interval=60 op monitor interval=20', won't really solve things. Take the galera example: we have to pass all the galera node names to the galera RA in a meta-parameter, and we could not really pass the short hostname when it is a corosync node and an FQDN when it is a remote node.

[2] Deploying everything via FQDNs (both corosync nodes and pacemaker-remote nodes) is not possible because we would not be able to manage upgrades in any sensible way.