Bug 1246149 - Allow disabling resource-discovery at resource creation time
Summary: Allow disabling resource-discovery at resource creation time
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-23 14:37 UTC by michal novacek
Modified: 2023-08-10 15:39 UTC
CC: 10 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments
pcs cluster report output (3.87 MB, application/x-bzip)
2015-07-23 14:37 UTC, michal novacek
reproducer commands (3.99 KB, application/x-shellscript)
2015-07-24 14:53 UTC, michal novacek
pcs cluster report output (6.44 MB, application/x-bzip)
2015-07-24 14:54 UTC, michal novacek
added ip_nonlocal_bind to reproducer script (4.42 KB, text/plain)
2015-08-06 09:26 UTC, michal novacek


Links
Red Hat Issue Tracker KCSOPP-1837 (last updated 2023-07-25 19:54:57 UTC)
Red Hat Knowledge Base (Solution) 3553051 (last updated 2018-08-06 15:59:10 UTC)

Description michal novacek 2015-07-23 14:37:17 UTC
Created attachment 1055413 [details]
pcs cluster report output

Description of problem:
I have a strange problem with the haproxy resource agent.

In our automated scenario I set up a systemd:haproxy clone and start (enable) it.
It is configured to start only two instances on a four-node cluster (clone-max=2)
and is restricted by constraints to start on two specific nodes only.

Sometimes the haproxy-clone does not start at all. I then need to run
'pcs resource cleanup haproxy-clone', after which it starts and everything
works fine from then on.

In the cases where haproxy is not started after 'pcs resource enable
haproxy-clone', it can still be started with 'systemctl start haproxy' and with
'pcs resource debug-start haproxy'.


Please have a look at the attached crm report and see whether you can spot
anything related to this problem (because I cannot), and/or help me track it
down.


Version-Release number of selected component (if applicable):
pcs-0.9.137-13.el7_1.3.x86_64
pacemaker-1.1.13-5.el7.x86_64

How reproducible: most of the time

Steps to Reproduce:
1. pcs resource enable haproxy-clone

Actual results: 
    haproxy clone not started unless 'pcs resource cleanup haproxy-clone'

Expected results: 
    haproxy clone started

Comment 2 Chris Feist 2015-07-23 21:54:34 UTC
I'm seeing an error from haproxy about not being able to bind to a socket:

Jul 23 12:01:48 virt-094 haproxy-systemd-wrapper: [ALERT] 203/120148 (16806) : Starting frontend vip: cannot bind socket [10.34.71.198:80]

and then, a few lines later, it looks like the IP address is set up.

Can you add a constraint for the vip to start before haproxy and see if that solves the issue?
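
For reference, such constraints could look roughly like this with pcs (the
resource names vip and haproxy-clone are taken from this report; this is only a
sketch, not the exact configuration used here):

  pcs constraint order start vip then start haproxy-clone
  pcs constraint colocation add haproxy-clone with vip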

Comment 3 Fabio Massimo Di Nitto 2015-07-24 08:40:40 UTC
https://github.com/beekhof/osp-ha-deploy/blob/master/pcmk/lb.scenario#L50

When using haproxy in clone mode, you need to allow haproxy to bind to a non-local IP.
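
For reference, a sketch of the usual way to enable that on each node and
persist it across reboots (the sysctl.d file name is just an example):

  sysctl -w net.ipv4.ip_nonlocal_bind=1
  echo "net.ipv4.ip_nonlocal_bind = 1" > /etc/sysctl.d/90-haproxy-nonlocal-bind.conf
  sysctl --system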

Comment 5 michal novacek 2015-07-24 11:07:28 UTC
Adding the colocation constraint for haproxy-clone and vip correctly starts
haproxy on the node where vip runs, and haproxy moves with vip when vip is
moved. On the other node (where vip is not running), haproxy does not start,
with the 'Cannot bind to socket' message, even though net.ipv4.ip_nonlocal_bind=1
is set.

What would you recommend checking next?

Comment 6 michal novacek 2015-07-24 14:52:27 UTC
I have found a reproducer.

1/ have cluster configured and running (1)
2/ run commands.sh
3/ see haproxy-clone not started
4/ run 'pcs resource cleanup haproxy-clone'
5/ watch haproxy-clone started 


(1)
Cluster Name: STSRHTS20027
Corosync Nodes:
 virt-094 virt-095 virt-096 virt-097 
Pacemaker Nodes:
 virt-094 virt-095 virt-096 virt-097 

Resources: 
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1 
   Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
               stop interval=0s timeout=90 (clvmd-stop-timeout-90)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)

Stonith Devices: 
 Resource: fence-virt-094 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-094 pcmk_host_map=virt-094:virt-094.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-094-monitor-interval-60s)
 Resource: fence-virt-095 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-095 pcmk_host_map=virt-095:virt-095.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-095-monitor-interval-60s)
 Resource: fence-virt-096 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-096 pcmk_host_map=virt-096:virt-096.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-096-monitor-interval-60s)
 Resource: fence-virt-097 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-097 pcmk_host_map=virt-097:virt-097.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-097-monitor-interval-60s)
Fencing Levels: 

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS20027
 dc-version: 1.1.12-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1437746699
 no-quorum-policy: freeze

[root@virt-095 ~]# pcs status
Cluster name: STSRHTS20027
Last updated: Fri Jul 24 16:08:29 2015          Last change: Fri Jul 24 16:07:43 2015
Stack: corosync
Current DC: virt-097 (version 1.1.12-44eb2dd) - partition with quorum
4 nodes and 12 resources configured

Online: [ virt-094 virt-095 virt-096 virt-097 ]

Full list of resources:

 fence-virt-094 (stonith:fence_xvm):    Started virt-094
 fence-virt-095 (stonith:fence_xvm):    Started virt-095
 fence-virt-096 (stonith:fence_xvm):    Started virt-096
 fence-virt-097 (stonith:fence_xvm):    Started virt-097
 Clone Set: dlm-clone [dlm]
     Started: [ virt-094 virt-095 virt-096 virt-097 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-094 virt-095 virt-096 virt-097 ]

PCSD Status:
  virt-094: Online
  virt-095: Online
  virt-096: Online
  virt-097: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Comment 7 michal novacek 2015-07-24 14:53:07 UTC
Created attachment 1055794 [details]
reproducer commands

Comment 8 michal novacek 2015-07-24 14:54:27 UTC
Created attachment 1055795 [details]
pcs cluster report output

Comment 9 Fabio Massimo Di Nitto 2015-07-24 15:39:18 UTC
You really need to fix your sysctl as I mentioned before.

[root@virt-094 ~]# cat /proc/sys/net/ipv4/ip_nonlocal_bind
0

[root@virt-095 ~]# cat /proc/sys/net/ipv4/ip_nonlocal_bind
1

Comment 10 Chris Feist 2015-08-05 22:29:02 UTC
Michal,

Have you had a chance to make the change from comment #9?

Thanks,
Chris

Comment 11 michal novacek 2015-08-06 09:25:30 UTC
Yes, and the problem still stands. I added ip_nonlocal_bind to the new reproducer script so it is clear that I am using it.

I'm not sure, though, that this is a pcs problem; it seems more likely to be a resource-agents issue.

Comment 12 michal novacek 2015-08-06 09:26:44 UTC
Created attachment 1059823 [details]
added ip_nonlocal_bind to reproducer script

Comment 13 michal novacek 2015-08-06 09:35:11 UTC
The following versions of components were used when writing comment #11:

pcs-0.9.137-13.el7_1.3.x86_64
pacemaker-1.1.13-6.el7.x86_64
resource-agents-3.9.5-50.el7.x86_64

Comment 15 Tomas Jelinek 2016-01-28 12:09:14 UTC
I do not think pcs is the right component to blame here, as it merely serves as a tool for creating the CIB and running resource cleanup in pacemaker. Moving to resource-agents for further investigation.

Comment 16 Fabio Massimo Di Nitto 2016-01-28 12:36:58 UTC
Moving to pacemaker. If anything, systemd resources are internal to pcmk.

Comment 17 Ken Gaillot 2016-01-28 18:28:00 UTC
The reproducer script adds the resources, pushes that to the cluster, *then* adds the constraints. This means that the cluster will initially schedule the resources without the constraints, and services might be started on undesired nodes. It would be better to do all the commands in a single file, then push that one file to the cluster (or at least, put any constraints related to a resource in the same file that creates it, so they go into effect immediately).
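
As an illustration of that approach (not the exact reproducer; the file name is
arbitrary, the resource parameters and node names are assumed or taken from
this report, and the exact resource-create syntax depends on the pcs version):

  pcs cluster cib haproxy-cfg.xml
  pcs -f haproxy-cfg.xml resource create vip ocf:heartbeat:IPaddr2 ip=10.34.71.198
  pcs -f haproxy-cfg.xml resource create haproxy systemd:haproxy --clone clone-max=2
  pcs -f haproxy-cfg.xml constraint location haproxy-clone prefers virt-094 virt-095
  pcs -f haproxy-cfg.xml constraint order start vip then start haproxy-clone
  pcs cluster cib-push haproxy-cfg.xml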

I haven't had time to thoroughly analyze the logs yet. If haproxy initially failed to start on all nodes, that would explain the behavior.

Keep in mind that even if constraints forbid a resource from running on a particular node, by default pacemaker will still run a one-time monitor (probe) on the node to ensure that the resource is indeed not running there. So if the software is not installed on that node, the probe can fail and cause problems.

Comment 18 michal novacek 2016-02-01 13:41:26 UTC
(In reply to Ken Gaillot from comment #17)
> The reproducer script adds the resources, pushes that to the cluster, *then*
> adds the constraints. This means that the cluster will initially schedule
> the resources without the constraints, and services might be started on
> undesired nodes. It would be better to do all the commands in a single file,
> then push that one file to the cluster (or at least, put any constraints
> related to a resource in the same file that creates it, so they go into
> effect immediately).
> ...

This might be the problem. We really do push the resources in one batch and then all the constraints for the cluster in another. And yes, haproxy is installed only on the two nodes (out of four) where it is supposed to run according to the constraints.

Unfortunately, this behavior would not be easy to change in our testing framework. Is there another way to do this "the right way"? Something like putting all nodes into standby mode, pushing all resources and constraints, and then taking the nodes out of standby?

> 
> I haven't had time to thoroughly analyze the logs yet. If haproxy initially
> failed to start on all nodes, that would explain the behavior.
> 
> Keep in mind that even if constraints forbid a resource from running on a
> particular node, by default pacemaker will still run a one-time monitor
> (probe) on the node to ensure that the resource is indeed not running there.
> So if the software is not installed on that node, the probe can fail and
> cause problems.

Comment 19 Ken Gaillot 2016-02-01 17:51:28 UTC
Michal,

Your "prefers" constraints are fine, but instead of "avoids", you need the advanced constraint command:

  pcs constraint location add vip-avoids-node2 vip ${nodes[2]} -INFINITY resource-discovery=never

for each resource/node combination ("vip-avoids-node2" is an arbitrary ID). This is identical to "avoids", except that resource-discovery=never disables startup probes on that node. This is desirable when the software is not installed on all nodes.
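
For example (an illustrative sketch only; which resources and nodes need this
depends on where the software is actually installed in your setup):

  for node in virt-096 virt-097; do
      for rsc in vip haproxy-clone; do
          pcs constraint location add ${rsc}-avoids-${node} ${rsc} ${node} \
              -INFINITY resource-discovery=never
      done
  done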

Unfortunately, you still have a problem before the constraint is created. Neither standby mode, maintenance mode, nor disabling the resource at creation will prevent startup probes. I don't know of a way around that; the ideal is really to push the resource creation and constraints together. Of course, you could just cleanup after creating the constraints.

Comment 20 Ken Gaillot 2016-05-16 16:05:12 UTC
As mentioned in Comment 19, modifying the constraints will help, but pushing the resource creation and constraints together is the only way currently to completely avoid the issue. Leaving this BZ open as a feature request to allow disabling resource-discovery at resource creation time, which will not be addressed in the 7.3 timeframe, but will be evaluated for 7.4.

Comment 21 Ken Gaillot 2017-01-10 21:46:42 UTC
This will not be implemented in the 7.4 timeframe

Comment 22 Ken Gaillot 2017-10-09 17:15:41 UTC
Due to time constraints, this will not make 7.5

Comment 24 Ken Gaillot 2019-03-27 16:24:07 UTC
Moving to RHEL 8, as new features will no longer be added to RHEL 7 as of 7.8

