Bug 1514492
| Summary: | regression in pacemaker-1.1.16-12.el7_4.4.x86_64 / setup with remote-nodes not working anymore | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Renaud Marigny <rmarigny> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | Patrik Hagara <phagara> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 7.4 | CC: | abeekhof, bruno.travouillon, cluster-maint, jruemker, kgaillot, mnovacek, phagara, sbradley |
| Target Milestone: | rc | ||
| Target Release: | 7.6 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-1.1.18-13.el7 | Doc Type: | No Doc Update |
| Doc Text: | The release note for Bug 1489728 should be sufficient to cover the issue here. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-30 07:57:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1609081 | ||
(In reply to Bruno Travouillon from comment #3)
> After some research in the pacemaker.git history, it looks like the patch is
> legit. The monitor action in the log does not deal with standard resource
> monitoring but with probes (one-time monitor operation). There is a location
> property to disable the resource discovery: resource-discovery=never.
>
> With the following change in the configuration, I can't reproduce the issue
> with pacemaker-1.1.16-12.el7_4.4.

Yes, you have it exactly right. This was a long-planned fix for a limitation of guest nodes: they didn't get probed at resource start-up like other nodes. As you saw, the probes are not attempts to start the resource, but attempts to determine its current status. They allow Pacemaker to ensure that resources aren't running where they're not supposed to be, and to properly re-detect resources that have been cleaned up.

Unfortunately, we did not consider cases where users would be relying on the absence of probes. The good news is that the solution you found, setting resource-discovery=never, is exactly the right answer. That tells Pacemaker not to probe the resource in the constraint on that node, and it is intended for situations like this, where the software is not installed on the node, so probing isn't necessary.

I will look into what we can do to prevent people from getting bitten by this, but the new behavior fixes other important scenarios, so it's unlikely we'll revert it. Thanks for discovering, reporting, and investigating this issue.

We will put a release note in 7.5 about the issue, and also for 7.5 (if approved), we will make Pacemaker log a warning the first time any probe fails, like:

warning: Processing failed op monitor for rsc1 on node1: unknown error (1)
warning: If it is not possible for rsc1 to run on node1, see the resource-discovery option for location constraints

This will probably not be backported to 7.4.

QA: Test procedure:

1. Configure a cluster of at least one cluster node and one guest node.
2. Configure a resource that can run on the cluster node, but requires software that isn't installed on the guest node.
3. Configure a location constraint banning the resource from the guest node (omitting the resource-discovery option).
4. Start the cluster.

Using the 1.1.18-6 or earlier packages for 7.5, or the 1.1.16-12.4 or 1.1.16-12.5 packages for 7.4, cluster status will show a failed monitor for the resource on the guest node, and the logs will show a warning about the failed monitor. After the fix here, the behavior will be the same, but there will be an additional log message referring users to the resource-discovery option the first (and only the first) time the monitor fails.

The log message (covered by this bz) will be done for 7.6 due to a tight schedule for 7.5. However, the release note about the issue (covered by Bug 1489728) will be for 7.5.

(In reply to Ken Gaillot from comment #6)
> QA: Test procedure:

Updated ...

> 1. Configure a cluster of at least one cluster node and one guest node.
> 2. Configure a resource that can run on the cluster node, but requires
> software that isn't installed on the guest node.
> 3. Configure a location constraint banning the resource from the guest node
> (omitting the resource-discovery option).
> 4. Start the cluster.
>
> Using the 1.1.18-6 or earlier packages for 7.5, or the 1.1.16-12.4 or
> 1.1.16-12.5 packages for 7.4, cluster status will show a failed monitor for
> the resource on the guest node, and the logs will show a warning about the
> failed monitor. After the fix here, the behavior will be the same, but there
> will be an additional log message referring users to the resource-discovery

The new message will only appear in the DC's logs. It will appear every time the failure is processed; however, it will only be logged for actual failed probes (as opposed to unexpected running/stopped status).
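Whether a probe registers as a failure comes down to the agent's exit status: OCF_NOT_RUNNING (7) is a clean "not running" result, while a generic error such as OCF_ERR_GENERIC (1) is treated as a failed probe. The following is a toy sketch of that convention; the agent and the `toy_monitor` helper are hypothetical illustrations, not any real resource agent:

```shell
#!/bin/sh
# Standard OCF exit codes (toy sketch, not real agent code)
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# A probe is just a one-time monitor. A well-behaved agent reports
# "not running" when its software is absent, rather than a hard error.
toy_monitor() {
    required_bin=$1   # binary the resource depends on (hypothetical)
    if ! command -v "$required_bin" >/dev/null 2>&1; then
        # Software not installed: report "not running", not a failure,
        # so the probe does not show up under "Failed Actions".
        return $OCF_NOT_RUNNING
    fi
    # A real agent would check actual service state here.
    return $OCF_SUCCESS
}

toy_monitor /no/such/binary
echo "rc=$?"   # prints rc=7
```

An agent that instead returned $OCF_ERR_GENERIC in the software-absent case is the kind that triggers the failed-probe warning discussed above.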
The message will be something like:

warning: Processing failed probe of rsc1 on node1: some error here
notice: If it is not possible for rsc1 to run on node1, see the resource-discovery option for location constraints

After investigating further, I realized that most resource agents return "not running" rather than a failure when their respective software is not installed, and thus do not have this problem. I also found that the LVM resource agent supplied with 7.5 no longer returns a failure in this case, either. Thus, to test, it is necessary to use the LVM resource agent from 7.4 (or any agent that can return a failure for probes). The good news for LVM agent users is that an upgrade to 7.5 should fix the issue. I am not aware of any other agents that would return a failure in this situation, but there probably are some.

The log message is upstream as of commit 57800a92

environment: a single-node cluster + one remote node

before:
=======

Installed package versions:
> [root@virt-161 ~]# rpm -q pacemaker
> pacemaker-1.1.18-12.el7.x86_64
> [root@virt-161 ~]# ssh virt-162 rpm -q pacemaker-remote
> pacemaker-remote-1.1.18-12.el7.x86_64

Copy LVM resource agent from RHEL-7.4 (as per comment #13):
> [root@virt-161 ~]# cp LVM-agent-7.4 /usr/lib/ocf/resource.d/heartbeat/LVM
> cp: overwrite ‘/usr/lib/ocf/resource.d/heartbeat/LVM’? y
> [root@virt-161 ~]# scp LVM-agent-7.4 virt-162:/usr/lib/ocf/resource.d/heartbeat/LVM
> LVM-agent-7.4                                 100%   20KB  12.4MB/s   00:00

Create a (local) PV/VG/LV accessible only to the virt-161 cluster node:
> [root@virt-161 ~]# truncate --size 1G loop
> [root@virt-161 ~]# losetup -f loop
> [root@virt-161 ~]# losetup -l
> NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
> /dev/loop0         0      0         0  0 /root/loop
> [root@virt-161 ~]# pvcreate /dev/loop0
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> Physical volume "/dev/loop0" successfully created.
> [root@virt-161 ~]# vgcreate vg_test /dev/loop0
> WARNING: Failed to connect to lvmetad.
> Falling back to device scanning.
> Volume group "vg_test" successfully created
> [root@virt-161 ~]# lvcreate -n lv_test -l +100%free vg_test
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> Logical volume "lv_test" created.

Create LVM cluster resource for the VG:
> [root@virt-161 ~]# pcs resource create vg ocf:heartbeat:LVM volgrpname=vg_test

Create a -INFINITY remote node location constraint for the LVM resource:
> [root@virt-161 ~]# pcs resource ban vg virt-162.cluster-qe.lab.eng.brq.redhat.com
> Warning: Creating location constraint cli-ban-vg-on-virt-162.cluster-qe.lab.eng.brq.redhat.com with a score of -INFINITY for resource vg on node virt-161.cluster-qe.lab.eng.brq.redhat.com.
> This will prevent vg from running on virt-162.cluster-qe.lab.eng.brq.redhat.com until the constraint is removed. This will be the case even if virt-162.cluster-qe.lab.eng.brq.redhat.com is the last node in the cluster.

Restart the cluster and examine cluster status:
> [root@virt-161 ~]# pcs cluster stop --all
> virt-161.cluster-qe.lab.eng.brq.redhat.com: Stopping Cluster (pacemaker)...
> virt-161.cluster-qe.lab.eng.brq.redhat.com: Stopping Cluster (corosync)...
> [root@virt-161 ~]# date
> Thu Aug 16 14:38:29 CEST 2018
> [root@virt-161 ~]# pcs cluster start --all --wait
> virt-161.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster (corosync)...
> virt-161.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster (pacemaker)...
> Waiting for node(s) to start...
> virt-161.cluster-qe.lab.eng.brq.redhat.com: Started
> [root@virt-161 ~]# pcs status
> Cluster name: bzzt
> Stack: corosync
> Current DC: virt-161.cluster-qe.lab.eng.brq.redhat.com (version 1.1.18-12.el7-2b07d5c5a9) - partition with quorum
> Last updated: Thu Aug 16 14:39:14 2018
> Last change: Tue Aug 14 13:12:48 2018 by root via crm_resource on virt-161.cluster-qe.lab.eng.brq.redhat.com
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ virt-161.cluster-qe.lab.eng.brq.redhat.com ]
> RemoteOnline: [ virt-162.cluster-qe.lab.eng.brq.redhat.com ]
>
> Full list of resources:
>
> virt-162.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
> vg (ocf::heartbeat:LVM): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
>
> Failed Actions:
> * vg_monitor_0 on virt-162.cluster-qe.lab.eng.brq.redhat.com 'unknown error' (1): call=2984, status=complete, exitreason='LVM Volume vg_test is not available',
>     last-rc-change='Thu Aug 16 14:39:09 2018', queued=0ms, exec=91ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled

Wait a bit (cluster-recheck-interval, default 15 min) so that the probe on the remote node triggers again, and check DC logs:
> [root@virt-161 ~]# grep pengine: /var/log/cluster/corosync.log | cut -d' ' -f 3,6-
> 14:38:45 pengine: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
> 14:38:45 pengine: info: qb_ipcs_us_publish: server name: pengine
> 14:38:45 pengine: info: main: Starting pengine
> 14:39:08 pengine: warning: unpack_config: Blind faith: not fencing unseen nodes
> 14:39:08 pengine: info: determine_online_status: Node virt-161.cluster-qe.lab.eng.brq.redhat.com is online
> 14:39:08 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:08 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:08 pengine: info: common_print: virt-162.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote):
> Stopped
> 14:39:08 pengine: info: common_print: vg (ocf::heartbeat:LVM): Stopped
> 14:39:08 pengine: info: RecurringOp: Start recurring monitor (60s) for virt-162.cluster-qe.lab.eng.brq.redhat.com on virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:08 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:08 pengine: notice: LogAction: * Start virt-162.cluster-qe.lab.eng.brq.redhat.com ( virt-161.cluster-qe.lab.eng.brq.redhat.com )
> 14:39:08 pengine: notice: LogAction: * Start vg ( virt-161.cluster-qe.lab.eng.brq.redhat.com )
> 14:39:08 pengine: notice: process_pe_message: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-24.bz2
> 14:39:09 pengine: info: determine_online_status: Node virt-161.cluster-qe.lab.eng.brq.redhat.com is online
> 14:39:09 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:39:09 pengine: info: common_print: virt-162.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: common_print: vg (ocf::heartbeat:LVM): Stopped
> 14:39:09 pengine: info: RecurringOp: Start recurring monitor (60s) for virt-162.cluster-qe.lab.eng.brq.redhat.com on virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: LogActions: Leave virt-162.cluster-qe.lab.eng.brq.redhat.com (Started virt-161.cluster-qe.lab.eng.brq.redhat.com)
> 14:39:09 pengine: notice: LogAction: * Start vg ( virt-161.cluster-qe.lab.eng.brq.redhat.com )
> 14:39:09 pengine: notice:
> process_pe_message: Calculated transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-25.bz2
> 14:39:09 pengine: info: determine_online_status: Node virt-161.cluster-qe.lab.eng.brq.redhat.com is online
> 14:39:09 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vg on virt-162.cluster-qe.lab.eng.brq.redhat.com: unknown error (1)
> 14:39:09 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vg on virt-162.cluster-qe.lab.eng.brq.redhat.com: unknown error (1)
> 14:39:09 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:39:09 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:39:09 pengine: info: common_print: virt-162.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: common_print: vg (ocf::heartbeat:LVM): FAILED virt-162.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:39:09 pengine: info: LogActions: Leave virt-162.cluster-qe.lab.eng.brq.redhat.com (Started virt-161.cluster-qe.lab.eng.brq.redhat.com)
> 14:39:09 pengine: notice: LogAction: * Recover vg ( virt-162.cluster-qe.lab.eng.brq.redhat.com -> virt-161.cluster-qe.lab.eng.brq.redhat.com )
> 14:39:09 pengine: notice: process_pe_message: Calculated transition 2, saving inputs in /var/lib/pacemaker/pengine/pe-input-26.bz2
> 14:54:10 pengine: info: determine_online_status: Node virt-161.cluster-qe.lab.eng.brq.redhat.com is online
> 14:54:10 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vg on virt-162.cluster-qe.lab.eng.brq.redhat.com: unknown error (1)
> 14:54:10 pengine: info:
> unpack_node_loop: Node 1 is already processed
> 14:54:10 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:54:10 pengine: info: unpack_node_loop: Node 1 is already processed
> 14:54:10 pengine: info: unpack_node_loop: Node virt-162.cluster-qe.lab.eng.brq.redhat.com is already processed
> 14:54:10 pengine: info: common_print: virt-162.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:54:10 pengine: info: common_print: vg (ocf::heartbeat:LVM): Started virt-161.cluster-qe.lab.eng.brq.redhat.com
> 14:54:10 pengine: info: LogActions: Leave virt-162.cluster-qe.lab.eng.brq.redhat.com (Started virt-161.cluster-qe.lab.eng.brq.redhat.com)
> 14:54:10 pengine: info: LogActions: Leave vg (Started virt-161.cluster-qe.lab.eng.brq.redhat.com)
> 14:54:10 pengine: notice: process_pe_message: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-27.bz2
> [root@virt-161 ~]# grep resource-discovery /var/log/cluster/corosync.log
> [root@virt-161 ~]# echo $?
> 1

Two probe failures logged for the remote node -- first at 14:39 when the cluster was starting, and then at 14:54 when cluster-recheck-interval was reached (and subsequently every 15 min after that). No resource-discovery hint in the logs.

after:
======

> [root@virt-149 ~]# rpm -q pacemaker
> pacemaker-1.1.19-6.el7.x86_64
> [root@virt-149 ~]# ssh virt-150 rpm -q pacemaker-remote
> pacemaker-remote-1.1.19-6.el7.x86_64
> [root@virt-149 ~]# cp LVM-agent-7.4 /usr/lib/ocf/resource.d/heartbeat/LVM
> cp: overwrite ‘/usr/lib/ocf/resource.d/heartbeat/LVM’?
> y
> [root@virt-149 ~]# scp LVM-agent-7.4 virt-150:/usr/lib/ocf/resource.d/heartbeat/LVM
> LVM-agent-7.4                                 100%   20KB  11.3MB/s   00:00
> [root@virt-149 ~]# truncate --size 1G loop
> [root@virt-149 ~]# losetup -f loop
> [root@virt-149 ~]# losetup -l
> NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
> /dev/loop0         0      0         0  0 /root/loop
> [root@virt-149 ~]# pvcreate /dev/loop0
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> Physical volume "/dev/loop0" successfully created.
> [root@virt-149 ~]# vgcreate vg_test /dev/loop0
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> Volume group "vg_test" successfully created
> [root@virt-149 ~]# lvcreate -n lv_test -l +100%free vg_test
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> Logical volume "lv_test" created.
> [root@virt-149 ~]# pcs resource create vg ocf:heartbeat:LVM volgrpname=vg_test
> [root@virt-149 ~]# pcs resource ban vg virt-150.cluster-qe.lab.eng.brq.redhat.com
> Warning: Creating location constraint cli-ban-vg-on-virt-150.cluster-qe.lab.eng.brq.redhat.com with a score of -INFINITY for resource vg on node virt-150.cluster-qe.lab.eng.brq.redhat.com.
> This will prevent vg from running on virt-150.cluster-qe.lab.eng.brq.redhat.com until the constraint is removed. This will be the case even if virt-150.cluster-qe.lab.eng.brq.redhat.com is the last node in the cluster.
> [root@virt-149 ~]# pcs cluster stop --all
> virt-149.cluster-qe.lab.eng.brq.redhat.com: Stopping Cluster (pacemaker)...
> virt-149.cluster-qe.lab.eng.brq.redhat.com: Stopping Cluster (corosync)...
> [root@virt-149 ~]# date
> Thu Aug 16 16:48:14 CEST 2018
> [root@virt-149 ~]# pcs cluster start --all --wait
> virt-149.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster (corosync)...
> virt-149.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster (pacemaker)...
> Waiting for node(s) to start...
> virt-149.cluster-qe.lab.eng.brq.redhat.com: Started
> [root@virt-149 ~]# pcs status
> Cluster name: bzzt
> Stack: corosync
> Current DC: virt-149.cluster-qe.lab.eng.brq.redhat.com (version 1.1.19-6.el7-c3c624ea3d) - partition with quorum
> Last updated: Thu Aug 16 16:50:41 2018
> Last change: Thu Aug 16 16:46:47 2018 by root via crm_resource on virt-149.cluster-qe.lab.eng.brq.redhat.com
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ virt-149.cluster-qe.lab.eng.brq.redhat.com ]
> RemoteOnline: [ virt-150.cluster-qe.lab.eng.brq.redhat.com ]
>
> Full list of resources:
>
> virt-150.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
> vg (ocf::heartbeat:LVM): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
>
> Failed Actions:
> * vg_monitor_0 on virt-150.cluster-qe.lab.eng.brq.redhat.com 'unknown error' (1): call=19, status=complete, exitreason='LVM Volume vg_test is not available',
>     last-rc-change='Thu Aug 16 16:48:48 2018', queued=0ms, exec=65ms
>
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled
> [root@virt-149 ~]# grep pengine: /var/log/cluster/corosync.log | cut -d' ' -f 3,6-
> 16:48:24 pengine: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
> 16:48:24 pengine: info: qb_ipcs_us_publish: server name: pengine
> 16:48:24 pengine: info: main: Starting pengine
> 16:48:47 pengine: warning: unpack_config: Blind faith: not fencing unseen nodes
> 16:48:47 pengine: info: determine_online_status: Node virt-149.cluster-qe.lab.eng.brq.redhat.com is online
> 16:48:47 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:47 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:47 pengine: info: common_print: virt-150.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Stopped
> 16:48:47 pengine: info: common_print: vg (ocf::heartbeat:LVM): Stopped
> 16:48:47 pengine: info: RecurringOp:
> Start recurring monitor (60s) for virt-150.cluster-qe.lab.eng.brq.redhat.com on virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:47 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:47 pengine: notice: LogAction: * Start virt-150.cluster-qe.lab.eng.brq.redhat.com ( virt-149.cluster-qe.lab.eng.brq.redhat.com )
> 16:48:47 pengine: notice: LogAction: * Start vg ( virt-149.cluster-qe.lab.eng.brq.redhat.com )
> 16:48:47 pengine: notice: process_pe_message: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-20.bz2
> 16:48:48 pengine: info: determine_online_status: Node virt-149.cluster-qe.lab.eng.brq.redhat.com is online
> 16:48:48 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:48 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 16:48:48 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:48 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 16:48:48 pengine: info: common_print: virt-150.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:48 pengine: info: common_print: vg (ocf::heartbeat:LVM): Stopped
> 16:48:48 pengine: info: RecurringOp: Start recurring monitor (60s) for virt-150.cluster-qe.lab.eng.brq.redhat.com on virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:48 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:48 pengine: info: LogActions: Leave virt-150.cluster-qe.lab.eng.brq.redhat.com (Started virt-149.cluster-qe.lab.eng.brq.redhat.com)
> 16:48:48 pengine: notice: LogAction: * Start vg ( virt-149.cluster-qe.lab.eng.brq.redhat.com )
> 16:48:48 pengine: notice: process_pe_message: Calculated transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-21.bz2
> 16:48:49 pengine:
> info: determine_online_status: Node virt-149.cluster-qe.lab.eng.brq.redhat.com is online
> 16:48:49 pengine: warning: unpack_rsc_op_failure: Processing failed probe of vg on virt-150.cluster-qe.lab.eng.brq.redhat.com: unknown error | rc=1
> 16:48:49 pengine: notice: unpack_rsc_op_failure: If it is not possible for vg to run on virt-150.cluster-qe.lab.eng.brq.redhat.com, see the resource-discovery option for location constraints
> 16:48:49 pengine: warning: unpack_rsc_op_failure: Processing failed probe of vg on virt-150.cluster-qe.lab.eng.brq.redhat.com: unknown error | rc=1
> 16:48:49 pengine: notice: unpack_rsc_op_failure: If it is not possible for vg to run on virt-150.cluster-qe.lab.eng.brq.redhat.com, see the resource-discovery option for location constraints
> 16:48:49 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:49 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 16:48:49 pengine: info: unpack_node_loop: Node 1 is already processed
> 16:48:49 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 16:48:49 pengine: info: common_print: virt-150.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:49 pengine: info: common_print: vg (ocf::heartbeat:LVM): FAILED virt-150.cluster-qe.lab.eng.brq.redhat.com
> 16:48:49 pengine: info: RecurringOp: Start recurring monitor (10s) for vg on virt-149.cluster-qe.lab.eng.brq.redhat.com
> 16:48:49 pengine: info: LogActions: Leave virt-150.cluster-qe.lab.eng.brq.redhat.com (Started virt-149.cluster-qe.lab.eng.brq.redhat.com)
> 16:48:49 pengine: notice: LogAction: * Recover vg ( virt-150.cluster-qe.lab.eng.brq.redhat.com -> virt-149.cluster-qe.lab.eng.brq.redhat.com )
> 16:48:49 pengine: notice: process_pe_message: Calculated transition 2, saving inputs in /var/lib/pacemaker/pengine/pe-input-22.bz2
> 17:03:50 pengine: info:
> determine_online_status: Node virt-149.cluster-qe.lab.eng.brq.redhat.com is online
> 17:03:50 pengine: warning: unpack_rsc_op_failure: Processing failed probe of vg on virt-150.cluster-qe.lab.eng.brq.redhat.com: unknown error | rc=1
> 17:03:50 pengine: notice: unpack_rsc_op_failure: If it is not possible for vg to run on virt-150.cluster-qe.lab.eng.brq.redhat.com, see the resource-discovery option for location constraints
> 17:03:50 pengine: info: unpack_node_loop: Node 1 is already processed
> 17:03:50 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 17:03:50 pengine: info: unpack_node_loop: Node 1 is already processed
> 17:03:50 pengine: info: unpack_node_loop: Node virt-150.cluster-qe.lab.eng.brq.redhat.com is already processed
> 17:03:50 pengine: info: common_print: virt-150.cluster-qe.lab.eng.brq.redhat.com (ocf::pacemaker:remote): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
> 17:03:50 pengine: info: common_print: vg (ocf::heartbeat:LVM): Started virt-149.cluster-qe.lab.eng.brq.redhat.com
> 17:03:50 pengine: info: LogActions: Leave virt-150.cluster-qe.lab.eng.brq.redhat.com (Started virt-149.cluster-qe.lab.eng.brq.redhat.com)
> 17:03:50 pengine: info: LogActions: Leave vg (Started virt-149.cluster-qe.lab.eng.brq.redhat.com)
> 17:03:50 pengine: notice: process_pe_message: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-23.bz2

Cluster behavior remained the same, except that an additional message is now logged on the DC, pointing the administrator toward the configuration fix:

> If it is not possible for vg to run on virt-150.cluster-qe.lab.eng.brq.redhat.com, see the resource-discovery option for location constraints

Marking verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3055
I have been able to reproduce this issue with the following configuration:

----8<----
[root@support0 ~]# pcs config show
Cluster Name: supportHA
Corosync Nodes:
 support0.lab.local
Pacemaker Nodes:
 support0.lab.local

Resources:
 Resource: vm-cli1 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/cli1.xml
  Meta Attrs: remote-node=cli1
  Utilization: cpu=1 hv_memory=2048
  Operations: stop interval=0s timeout=90 (vm-cli1-stop-interval-0s)
              monitor interval=30s (vm-cli1-monitor-interval-30s)
              start interval=0s timeout=120 (vm-cli1-start-interval-0s)
 Resource: vg1A (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=vg1A
  Operations: start interval=0s timeout=30 (vg1A-start-interval-0s)
              stop interval=0s timeout=30 (vg1A-stop-interval-0s)
              monitor interval=10 timeout=30 (vg1A-monitor-interval-10)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: vg1A
    Constraint: location-vg1A
      Rule: score=-INFINITY  (id:location-vg1A-rule)
        Expression: #kind eq container  (id:location-vg1A-rule-expr)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: supportHA
 dc-version: 1.1.16-12.el7_4.4.debug-94ff4df
 have-watchdog: false
 stonith-enabled: false

Quorum:
  Options:
----8<----

The issue is related to the container trying to monitor the vg1A resource while the location constraint score is -INFINITY. In the corosync.log, we can see the following message:

Nov 17 13:40:03 [28056] support0.lab.local crmd: notice: te_rsc_command: Initiating monitor operation vg1A_monitor_0 locally on cli1 | action 4

This regression was introduced in commit 12d453cc, where the skip of active resource detection on containers was removed.
----8<----
diff --git a/pengine/native.c b/pengine/native.c
index 37cf541..2e40a4c 100644
--- a/pengine/native.c
+++ b/pengine/native.c
@@ -2784,10 +2784,6 @@ native_create_probe(resource_t * rsc, node_t * node, action_t * complete,
 
     if (force == FALSE && is_not_set(data_set->flags, pe_flag_startup_probes)) {
         pe_rsc_trace(rsc, "Skipping active resource detection for %s", rsc->id);
         return FALSE;
-    } else if (force == FALSE && is_container_remote_node(node)) {
-        pe_rsc_trace(rsc, "Skipping active resource detection for %s on container %s",
-                     rsc->id, node->details->id);
-        return FALSE;
     }
 
     if (is_remote_node(node)) {
----8<----

When reverting this change on top of pacemaker-1.1.16-12.el7_4.4, the crmd no longer tries to initiate the monitoring of the vg1A resource.

After some research in the pacemaker.git history, it looks like the patch is legit. The monitor action in the log does not deal with standard resource monitoring but with probes (one-time monitor operation). There is a location property to disable the resource discovery: resource-discovery=never.

With the following change in the configuration, I can't reproduce the issue with pacemaker-1.1.16-12.el7_4.4.

----8<----
# pcs constraint location remove location-vg1A
# pcs constraint location vg1A rule resource-discovery=never score=-INFINITY '#kind' eq container
# pcs constraint location show --full
Location Constraints:
  Resource: vg1A
    Constraint: location-vg1A (resource-discovery=never)
      Rule: score=-INFINITY  (id:location-vg1A-rule)
        Expression: #kind eq container  (id:location-vg1A-rule-expr)
----8<----
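For reference, the resulting constraint in the CIB should look roughly like the following XML sketch (reconstructed from the pcs output above; the element and attribute names follow the Pacemaker CIB schema as I understand it, with resource-discovery set on the rsc_location element, but exact layout may vary by Pacemaker version):

```xml
<rsc_location id="location-vg1A" rsc="vg1A" resource-discovery="never">
  <rule id="location-vg1A-rule" score="-INFINITY">
    <expression id="location-vg1A-rule-expr" attribute="#kind" operation="eq" value="container"/>
  </rule>
</rsc_location>
```

Setting resource-discovery on the constraint itself (rather than per rule) is what suppresses the one-time probe on nodes matched by the rule.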