Bug 1444020
| Summary: | Improve SBD Storage Device Timeouts | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Peess <dpeess> |
| Component: | sbd | Assignee: | Klaus Wenninger <kwenning> |
| Status: | CLOSED NOTABUG | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | abeekhof, cfeist, dpeess, jfriesse, jruemker, kwenning, marcel.fischer, mreinke, oalbrigt, sfroemer |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-07-25 16:53:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1468580 | ||
| Bug Blocks: | 1466531 | ||
|
Description
Daniel Peess
2017-04-20 12:44:50 UTC
Actually message-timeout should be set to >2x the watchdog timeout, so sbd should rather enforce this than proceed without any warnings. Anyway, the warning you've observed is coming from the fence-agent, where power-timeout is used as the timeout in the generic part (fencing.py). So it would actually make sense to raise that if configuring higher values for message-timeout. Have you tried to do so? Of course it would be arguable, since we have message-timeout in fence_sbd, to rather use this as a timeout. Thinking it over again, there is of course a kind of chicken-and-egg issue with using message-timeout as a general timeout here, as it is being read from the device and that reading might time out as well. Checking and using the cluster property stonith-watchdog-timeout might make more sense. Maybe just throw a warning and not bail out with an error, as setting that is just strongly recommended but not a hard must.

Hi,
we set power_timeout 180 and the warning disappeared. The current timeout values are these:
Stonith Devices:
Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=onoff power_timeout=200
Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)
and...
[root@bsul0799 ~]# sbd -d /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd dump
==Dumping header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd
Header version : 2.1
UUID : d3ef8e16-3a60-4e93-8b38-a198ccdb25fe
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 60
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 120
==Header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd is dumped
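As a side note, the relationship between these dumped timeouts can be checked mechanically. Below is a small sketch — the `check_sbd_timeouts` helper is hypothetical, not part of sbd — that parses `sbd … dump` output and warns when msgwait is below twice the watchdog timeout, the rule referred to in the description:

```shell
# Hypothetical helper: warn if msgwait < 2 x watchdog in `sbd dump` output.
check_sbd_timeouts() {
  awk '/Timeout \(watchdog\)/ { wd = $NF }
       /Timeout \(msgwait\)/  { mw = $NF }
       END {
         if (mw >= 2 * wd) print "OK: msgwait " mw "s >= 2 x watchdog " wd "s"
         else              print "WARN: msgwait " mw "s < 2 x watchdog " wd "s"
       }'
}

# The values dumped above (watchdog=60, msgwait=120) pass the check:
printf 'Timeout (watchdog) : 60\nTimeout (msgwait) : 120\n' | check_sbd_timeouts
```

In practice you would pipe `sbd -d <device> dump` into the helper instead of the pasted lines.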
and...
[root@bsul0799 ~]# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: multisite
dc-version: 1.1.15-11.el7_3.2-e174ec8
default-action-timeout: 60s
have-watchdog: true
last-lrm-refresh: 1494411936
stonith-enabled: true
stonith-timeout: 300s
stonith-watchdog-timeout: 0
Additionally we set the following:
cat /etc/corosync/corosync.conf
...
token: 110000
consensus: 5000
...
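For reference, corosync's token and consensus values in corosync.conf are given in milliseconds, so the settings above translate as follows (plain arithmetic, no corosync involved):

```shell
# totem timeouts from the corosync.conf excerpt above, in milliseconds
token_ms=110000
consensus_ms=5000
echo "token: $(( token_ms / 1000 ))s, consensus: $(( consensus_ms / 1000 ))s"
```

So with these settings, membership changes can be delayed by well over a minute of token timeout alone.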
As Daniel Peess said in Bug 1449155, we need to disable the fencing resources based on a ping resource to our network gateway.
That's the reason for the high token value: we need to wait for the pingd resource to disable or stop the fence resource.
With all these values it looks like everything is working, but we see some strange wait times:
Jul 04 16:48:10 bsul0799 corosync[1258]: [TOTEM ] A processor failed, forming new configuration.
Jul 04 16:48:15 bsul0799 corosync[1258]: [TOTEM ] A new membership (10.40.221.22:2492) was formed. Members left: 1
Jul 04 16:48:15 bsul0799 corosync[1258]: [TOTEM ] Failed to receive the leave message. failed: 1
Jul 04 16:48:15 bsul0799 corosync[1258]: [QUORUM] Members[1]: 2
Jul 04 16:48:15 bsul0799 corosync[1258]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 04 16:48:15 bsul0799 stonith-ng[1611]: notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 stonith-ng[1611]: notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 pacemakerd[1585]: notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 attrd[1614]: notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 attrd[1614]: notice: Removing all bsul0798a01 attributes for peer loss
Jul 04 16:48:15 bsul0799 attrd[1614]: notice: Lost attribute writer bsul0798a01
Jul 04 16:48:15 bsul0799 attrd[1614]: notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 crmd[1618]: notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 crmd[1618]: warning: Our DC node (bsul0798a01) left the cluster
Jul 04 16:48:15 bsul0799 cib[1610]: notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 cib[1610]: notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 crmd[1618]: notice: State transition S_NOT_DC -> S_ELECTION
Jul 04 16:48:15 bsul0799 crmd[1618]: notice: State transition S_ELECTION -> S_INTEGRATION
Jul 04 16:48:15 bsul0799 crmd[1618]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 04 16:48:15 bsul0799 crmd[1618]: notice: Watchdog may be enabled but stonith-watchdog-timeout is disabled: 0
Jul 04 16:48:16 bsul0799 pengine[1616]: notice: Relying on watchdog integration for fencing
Jul 04 16:48:16 bsul0799 pengine[1616]: warning: Node bsul0798a01 will be fenced because the node is no longer part of the cluster
Jul 04 16:48:16 bsul0799 pengine[1616]: warning: Node bsul0798a01 is unclean
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action halvmvg_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action fsvarwwwhtml_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action vip1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action httpd_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action myservicegateways:1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action myservicegateways:1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action sbd-vglvmha-bsul0799a01_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Action sbd-vglvmha-bsul0799a01_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Scheduling Node bsul0798a01 for STONITH
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Move halvmvg (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Move fsvarwwwhtml (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Move vip1 (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Move httpd (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Stop dummy (bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Stop myservicegateways:1 (bsul0798a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: notice: Stop sbd-vglvmha-bsul0799a01 (bsul0798a01)
Jul 04 16:48:17 bsul0799 pengine[1616]: warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-54.bz2
Jul 04 16:48:17 bsul0799 crmd[1618]: notice: Initiating stop operation dummy_stop_0 locally on bsul0799a01
Jul 04 16:48:17 bsul0799 crmd[1618]: notice: Requesting fencing (reboot) of node bsul0798a01
Jul 04 16:48:17 bsul0799 stonith-ng[1611]: notice: Client crmd.1618.ffe5e286 wants to fence (reboot) 'bsul0798a01' with device '(any)'
Jul 04 16:48:17 bsul0799 stonith-ng[1611]: notice: Requesting peer fencing (reboot) of bsul0798a01
Jul 04 16:48:17 bsul0799 crmd[1618]: notice: Result of stop operation for dummy on bsul0799a01: 0 (ok)
Jul 04 16:48:17 bsul0799 stonith-ng[1611]: notice: sbd-vglvmha-bsul0798a01 can fence (reboot) bsul0798a01: dynamic-list
...
Jul 04 16:52:28 bsul0799 stonith-ng[1611]: notice: Operation 'reboot' [4556] (call 2 from crmd.1618) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: 0 (OK)
Jul 04 16:52:28 bsul0799 stonith-ng[1611]: notice: Operation reboot of bsul0798a01 by bsul0799a01 for crmd.1618: OK
Jul 04 16:52:28 bsul0799 crmd[1618]: notice: Stonith operation 2/37:0:0:ce07d574-a428-4c34-ab75-c25e4b58ebeb: OK (0)
Jul 04 16:52:28 bsul0799 crmd[1618]: notice: Peer bsul0798a01 was terminated (reboot) by bsul0799a01 for bsul0799a01: OK (ref=5efd041a-6e6a-49be-864e-3f7182a92ffe) by client crmd.1618
So it took about 4 minutes to receive an OK message from sbd. Does this have to do with the high token value?
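A back-of-the-envelope check of that 4-minute wait against the msgwait of 120s dumped earlier (later comments in this thread conclude that method=onoff costs two msgwait periods, one for 'off' and one for 'on'):

```shell
# msgwait from the sbd header dumped earlier in this report
msgwait=120
# method=onoff needs one msgwait period for 'off' and one for 'on'
echo "onoff fencing time: ~$(( 2 * msgwait ))s (~$(( 2 * msgwait / 60 )) minutes)"
```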
(In reply to Marcel Fischer from comment #14)
> So it took about 4 minutes to receive an OK message from sbd. Does this
> have to do with the high token value?

In my test I didn't see the time for fencing via sbd lengthened by the token-timeout. My token-timeout is 10s and my msgwait-timeout is 10s as well. With the '-v' option hacked into generate_sbd_command in /usr/sbin/fence_sbd, the log shows just 10s between stonith-ng finding the device and getting the execution OK.

Do you get more delay than the msgwait-timeout with a shorter token-timeout as well?

Jul 4 20:47:32 localhost stonith-ng[17416]: notice: sbd-fencing can fence (reboot) remote_node1: dynamic-list
Jul 4 20:47:32 localhost sbd[18419]: info: main: Verbose mode enabled.
Jul 4 20:47:32 localhost sbd[18419]: info: main: Watchdog enabled.
Jul 4 20:47:32 localhost sbd[18421]: info: main: Verbose mode enabled.
Jul 4 20:47:32 localhost sbd[18421]: info: main: Watchdog enabled.
Jul 4 20:47:32 localhost sbd[18423]: info: main: Verbose mode enabled.
Jul 4 20:47:32 localhost sbd[18423]: info: main: Watchdog enabled.
Jul 4 20:47:32 localhost sbd[18424]: info: sbd_make_realtime: Scheduler priority is now 99
Jul 4 20:47:32 localhost sbd[18424]: info: sbd_memlock: Locked ourselves in memory
Jul 4 20:47:32 localhost sbd[18424]: /dev/vdb: info: slot_msg_wrapper: Delivery process handling /dev/vdb
Jul 4 20:47:32 localhost sbd[18424]: /dev/vdb: info: slot_msg: Device UUID: aea922dc-0da6-4e43-b9e2-5ab550a3f453
Jul 4 20:47:32 localhost sbd[18424]: /dev/vdb: info: slot_lookup: remote_node1 owns slot 2
Jul 4 20:47:32 localhost sbd[18424]: /dev/vdb: info: slot_msg: Writing reset to node slot remote_node1
Jul 4 20:47:32 localhost sbd[18424]: /dev/vdb: info: slot_msg: Messaging delay: 10
Jul 4 20:47:42 localhost sbd[18424]: /dev/vdb: info: slot_msg: reset successfully delivered to remote_node1
Jul 4 20:47:42 localhost sbd[18423]: info: messenger: Process 18424 succeeded.
Jul 4 20:47:42 localhost sbd[18423]: info: messenger: Message successfully delivered.
Jul 4 20:47:42 localhost stonith-ng[17416]: notice: Operation 'reboot' [18412] (call 2 from stonith_admin.18399) for host 'remote_node1' with device 'sbd-fencing' returned: 0 (OK)

(In reply to Klaus Wenninger from comment #15)
> In my test I didn't see the time for fencing via sbd lengthened by the
> token-timeout. My token-timeout is 10s and my msgwait-timeout is 10s as
> well. With the '-v' option hacked into generate_sbd_command in
> /usr/sbin/fence_sbd, the log shows just 10s between stonith-ng finding the
> device and getting the execution OK.
>
> Do you get more delay than the msgwait-timeout with a shorter token-timeout
> as well?

Well, it seems the token-timeout has no influence. Some time ago I changed the method parameter of the sbd device to onoff:

[root@bsul0799 ~]# pcs stonith show sbd-vglvmha-bsul0798a01
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)

I changed this back to "cycle" as configured by Daniel Peess. Now I have around 2 minutes until an OK is returned by sbd.

...
Jul 05 11:42:04 [38583] bsul0799 stonith-ng: info: call_remote_stonith: Total timeout set to 300 for peer's fencing of bsul0798a01 for crmd.38587|id=388e13c4-f549-4c87-ac50-5acbe7fb148d
Jul 05 11:42:04 [38583] bsul0799 stonith-ng: info: call_remote_stonith: Requesting that 'bsul0799a01' perform op 'bsul0798a01 reboot' for crmd.38587 (360s, 0s)
Jul 05 11:42:04 [38583] bsul0799 stonith-ng: notice: can_fence_host_with_device: sbd-vglvmha-bsul0798a01 can fence (reboot) bsul0798a01: dynamic-list
Jul 05 11:42:04 [38583] bsul0799 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'bsul0798a01'
...
Jul 05 11:44:15 [38583] bsul0799 stonith-ng: notice: log_operation: Operation 'reboot' [26596] (call 4 from crmd.38587) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: 0 (OK)

This would match the msgwait-timeout of sbd. Could you tell me the difference between those two methods? And how did you configure this '-v' for fence_sbd? Could you tell me the line in that file?

(In reply to Marcel Fischer from comment #16)
> Could you tell me the difference between those two methods?

For an 'onoff' configuration to make sense, your watchdog would have to be set up to do an off as well instead of rebooting. But as you wouldn't have an 'on' then, the usefulness is a little questionable. If you want to keep pacemaker down on the sbd-fenced node, I would rather go with SBD_STARTMODE=clean.

What you are actually experiencing I have to investigate though ...

> And how did you configure this '-v' for fence_sbd? Could you tell me the
> line in that file?

Around line 114:

'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command, arguments)'

I'll check into making that officially switchable ...

(In reply to Klaus Wenninger from comment #17)
> (In reply to Marcel Fischer from comment #16)
> > Could you tell me the difference between those two methods?
> For an 'onoff' configuration to make sense, your watchdog would have to be
> set up to do an off as well instead of rebooting. But as you wouldn't have
> an 'on' then, the usefulness is a little questionable. If you want to keep
> pacemaker down on the sbd-fenced node, I would rather go with
> SBD_STARTMODE=clean.
>
> What you are actually experiencing I have to investigate though ...

That makes absolute sense. I guess that's the reason for the four-minute wait: just two times msgwait, two minutes for "off" and two minutes for "on".

> > And how did you configure this '-v' for fence_sbd? Could you tell me the
> > line in that file?
>
> Around line 114:
>
> 'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command, arguments)'
>
> I'll check into making that officially switchable ...

Switchable would be great!

One other question: currently we are using a ping resource (pinging the default gateway) to disable the sbd resource. The reason for that is that we don't want fencing from nodes with no working network access. It works quite fine, but configuring timeouts for that is not clear to me.

[root@bsul0799 ~]# pcs resource show myservicegateways-clone
 Clone: myservicegateways-clone
  Resource: myservicegateways (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=20s multiplier=10000 host_list=10.41.92.1
   Operations: start interval=0s timeout=60 (myservicegateways-start-interval-0s)
               stop interval=0s timeout=20 (myservicegateways-stop-interval-0s)
               monitor interval=10 (myservicegateways-monitor-interval-10)

The dampen value is understandable: wait 20s before doing anything.
But additionally it takes about 20 to 30 seconds for the cluster to detect that the ping is not possible anymore. Example:

[root@bsul0798 ~]# iptables -I OUTPUT -d 'X.X.X.X' -p ICMP --icmp-type 8 -j DROP;date
Wed Jul 5 15:29:18 CEST 2017

==> /var/log/cluster/corosync.log <==
Jul 05 15:29:42 [46253] bsul0798 attrd: info: attrd_peer_update: Setting pingd[bsul0798a01]: 10000 -> 0 from bsul0798a01

==> /var/log/cluster/corosync.log <==
Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: Diff: --- 0.211.82 2
Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: Diff: +++ 0.211.83 (null)
Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: + /cib: @num_updates=83
Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pingd']: @value=0
Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=bsul0799a01/attrd/13, version=0.211.83)
...

What's the reason for the wait from 15:29:18 to 15:29:42?

(In reply to Marcel Fischer from comment #18)
> (In reply to Klaus Wenninger from comment #17)
> > For an 'onoff' configuration to make sense, your watchdog would have to
> > be set up to do an off as well instead of rebooting. But as you wouldn't
> > have an 'on' then, the usefulness is a little questionable. If you want
> > to keep pacemaker down on the sbd-fenced node, I would rather go with
> > SBD_STARTMODE=clean.
> >
> > What you are actually experiencing I have to investigate though ...
>
> That makes absolute sense. I guess that's the reason for the four minutes
Just two times msgwait, two minutes for "off" and and two minutes for > "on" Makes sense in a way that what you are saying is probably the reason for the behaviour experienced. But 'onoff' doesn't make sense on a fencing device that is physically lacking the ability to turn anything 'on' again. Have to think over it but it might make sense to remove that mode from the fence-agent. > > > > And how did you configure this '-v' to fence_sbd. Could you tell me the line > > > in this file? > > > > Around line 114: > > > > 'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command, > > arguments)' > > > > I'll check that I get that in officially switchable ... > > switchable would be great! > > One other question, currently we are using ping resource (to ping default > gateway) to disable the sbd resource. The reason for that is, that we dont > want fecing from nodes with no working network access. It works quite fine, > but configuring timeouts for that is not clear to me. > [root@bsul0799 ~]# pcs resource show myservicegateways-clone > Clone: myservicegateways-clone > Resource: myservicegateways (class=ocf provider=pacemaker type=ping) > Attributes: dampen=20s multiplier=10000 host_list=10.41.92.1 > Operations: start interval=0s timeout=60 > (myservicegateways-start-interval-0s) > stop interval=0s timeout=20 > (myservicegateways-stop-interval-0s) > monitor interval=10 (myservicegateways-monitor-interval-10) > > > The dampen value is understable, wait 20s before doing anything. 
> But additionally it takes about 20 to 30 seconds for the cluster to detect
> that the ping is not possible anymore. Example:
>
> [root@bsul0798 ~]# iptables -I OUTPUT -d 'X.X.X.X' -p ICMP --icmp-type 8 -j DROP;date
> Wed Jul 5 15:29:18 CEST 2017
>
> ==> /var/log/cluster/corosync.log <==
> Jul 05 15:29:42 [46253] bsul0798 attrd: info: attrd_peer_update: Setting pingd[bsul0798a01]: 10000 -> 0 from bsul0798a01
>
> ==> /var/log/cluster/corosync.log <==
> Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: Diff: --- 0.211.82 2
> Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: Diff: +++ 0.211.83 (null)
> Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: + /cib: @num_updates=83
> Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pingd']: @value=0
> Jul 05 15:30:02 [46250] bsul0798 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=bsul0799a01/attrd/13, version=0.211.83)
> ...
>
> What's the reason for the wait from 15:29:18 to 15:29:42?

The default timeout is 20s,
dampen is 20s,
the monitor interval is 10s,
which adds up to a delay of 40-50s.

(In reply to Klaus Wenninger from comment #19)
> The default timeout is 20s,
> dampen is 20s,
> the monitor interval is 10s,
> which adds up to a delay of 40-50s.

OK, dampen and monitor-interval are understandable. But where does the default timeout come from? The operations default?

[root@bsul0798 ~]# pcs config
...
Resources Defaults:
 resource-stickiness: 10000
 migration-threshold: 2
Operations Defaults:
 timeout: 60s

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.15-11.el7_3.2-e174ec8
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0
...

It's set to 60s in our case.
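Klaus's 40-50s estimate can be spelled out as simple arithmetic: up to one monitor interval until the failing probe runs, plus the probe's timeout, plus the dampen delay.

```shell
# values from the ping resource discussed above (seconds)
monitor=10   # monitor interval
timeout=20   # per-probe timeout (the 'default timeout' referred to above)
dampen=20    # dampen delay before the attribute change takes effect
echo "detection delay: $(( timeout + dampen ))s to $(( monitor + timeout + dampen ))s"
```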
(In reply to Marcel Fischer from comment #20)
> (In reply to Klaus Wenninger from comment #19)
> > The default timeout is 20s,
> > dampen is 20s,
> > the monitor interval is 10s,
> > which adds up to a delay of 40-50s.
>
> OK, dampen and monitor-interval are understandable. But where does the
> default timeout come from? The operations default?
>
> [root@bsul0798 ~]# pcs config
> ...
> Resources Defaults:
>  resource-stickiness: 10000
>  migration-threshold: 2
> Operations Defaults:
>  timeout: 60s
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: multisite
>  dc-version: 1.1.15-11.el7_3.2-e174ec8
>  default-action-timeout: 60s
>  have-watchdog: true
>  last-lrm-refresh: 1494411936
>  stonith-enabled: true
>  stonith-timeout: 300s
>  stonith-watchdog-timeout: 0
> ...
>
> It's set to 60s in our case.

If host_list is set, the agent divides OCF_RESKEY_CRM_meta_timeout by the number of attempts and gives ping that much time for each attempt, summing up again to OCF_RESKEY_CRM_meta_timeout.

(In reply to Klaus Wenninger from comment #21)
> > It's set to 60s in our case.
>
> If host_list is set, the agent divides OCF_RESKEY_CRM_meta_timeout by the
> number of attempts and gives ping that much time for each attempt, summing
> up again to OCF_RESKEY_CRM_meta_timeout.

Okay, that confuses me a bit, but I will test some different settings to get a feeling for it.

Just one more question regarding the reset triggered by sbd: is it possible to change that to poweroff or halt? Probably with method=onoff, but I would like to have a method with only "off", because as we saw, onoff takes double the msgwait timeout to complete.

(In reply to Marcel Fischer from comment #22)
> Just one more question regarding the reset triggered by sbd: is it possible
> to change that to poweroff or halt? Probably with method=onoff, but I would
> like to have a method with only "off", because as we saw, onoff takes
> double the msgwait timeout to complete.
Unfortunately the kernel UAPI does not seem to offer a standardized way to configure which action the watchdog should perform, so sbd can't do that setting for you based on the configured method. You would have to use proprietary tooling matching your watchdog device, or set it in the BIOS, anyway.

Thus I would suggest keeping it on cycle and configuring the watchdog to actually trigger a shutdown if you prefer that. It probably makes sense to take 'onoff' out of the fence-agent so as not to foster expectations that can't be satisfied.

(In reply to Klaus Wenninger from comment #23)
> Thus I would suggest keeping it on cycle and configuring the watchdog to
> actually trigger a shutdown if you prefer that. It probably makes sense to
> take 'onoff' out of the fence-agent so as not to foster expectations that
> can't be satisfied.

Currently we are using the KVM watchdog (i6300esb).

virsh dumpxml bsul0798
...
<watchdog model='i6300esb' action='poweroff'>
  <alias name='watchdog0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</watchdog>

virsh dumpxml bsul0799
<watchdog model='i6300esb' action='poweroff'>
  <alias name='watchdog0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</watchdog>

We already configured that for "poweroff".

(In reply to Marcel Fischer from comment #24)
> Currently we are using the KVM watchdog (i6300esb).
> virsh dumpxml bsul0798
> ...
> <watchdog model='i6300esb' action='poweroff'>
>   <alias name='watchdog0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
> </watchdog>
> virsh dumpxml bsul0799
> <watchdog model='i6300esb' action='poweroff'>
>   <alias name='watchdog0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
> </watchdog>
>
> We already configured that for "poweroff".

Btw., I have to revise my tip to use the attribute method=cycle while having configured the watchdog to poweroff. As long as the reset/poweroff is really done by the watchdog, that wouldn't make any difference, but when receiving the stonith command via the block device, sbd tries to execute it via sysrq, and thus it is important which command is being sent by pacemaker. Setting stonith-action=poweroff and the fence-agent attribute method=onoff together with the watchdog config from your xml should have the desired effect.

Although when I just tried to test it, I still got a reboot of the fenced node instead of it being just powered off. Let me quickly investigate and come back to you.

(In reply to Klaus Wenninger from comment #25)
> Although when I just tried to test it, I still got a reboot of the fenced
> node instead of it being just powered off. Let me quickly investigate and
> come back to you.

This is a race between a shutdown being triggered via sysrq and a reboot that comes directly after it, for the case that access to sysrq doesn't work properly.

(In reply to Klaus Wenninger from comment #26)
> This is a race between a shutdown being triggered via sysrq and a reboot
> that comes directly after it, for the case that access to sysrq doesn't
> work properly.
> Had fixed that issue upstream already but missed to take it into 7.4.
> See bz1468580

Great thanks, Steffen Froemer gave me an updated rpm with your fix. But now I have a new problem.

==> /var/log/messages <==
Jul 10 10:52:46 bsul0799 stonith-ng[2520]: notice: sbd-vglvmha-bsul0798a01 can fence (poweroff) bsul0798a01: dynamic-list
Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Failed: Unrecognised action 'poweroff'
Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Please use '-h' for usage

==> /var/log/cluster/corosync.log <==
Jul 10 10:52:46 [2520] bsul0799 stonith-ng: warning: log_action: fence_sbd[24239] stderr: [ Failed: Unrecognised action 'poweroff' ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng: warning: log_action: fence_sbd[24239] stderr: [ ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng: warning: log_action: fence_sbd[24239] stderr: [ Please use '-h' for usage ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng: warning: log_action: fence_sbd[24239] stderr: [ ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng: info: internal_stonith_action_execute: Attempt 2 to execute fence_sbd (poweroff). remaining timeout is 300

[root@bsul0799 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.15-11.el7_3.2-e174ec8
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-action: poweroff
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0

[root@bsul0799 ~]# pcs config
...
Stonith Devices:
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
 Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)

Without stonith-action it works better:

pcs property unset stonith-action
...

==> /var/log/messages <==
Jul 10 11:03:18 bsul0798 sbd[3496]: warning: inquisitor_child: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd requested a shutoff
Jul 10 11:03:18 bsul0798 sbd[3496]: emerg: do_exit: Rebooting system: off
[ 1514.207606] i6300esb: Unexpected close, not stopping watchdog!
[ 1514.208969] SysRq : Power Off
[ 1514.209723] sd 2:0:0:2: [sdb] Synchronizing SCSI cache
[ 1514.210957] sd 2:0:0:3: [sda] Synchronizing SCSI cache
Jul 10 11:03:18 bsul0798 kernel: i6300esb: Unexpected close, not stopping watchdog!
Jul 10 11:03:18 bsul0798 kernel: SysRq : Power Off
[ 1514.216329] ACPI: Preparing to enter system sleep state S5
[ 1514.217577] Power down.

But the other node waits 2 x msgwait because of that "off -> on" sequence, in our case around 4 minutes. Could we fix that and remove the "on" part after "off"? As you said, it doesn't make much sense with watchdog devices.

(In reply to Marcel Fischer from comment #27)
> Great thanks, Steffen Froemer gave me an updated rpm with your fix. But now
> I have a new problem.
> ==> /var/log/messages <==
> Jul 10 10:52:46 bsul0799 stonith-ng[2520]: notice: sbd-vglvmha-bsul0798a01 can fence (poweroff) bsul0798a01: dynamic-list
> Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Failed: Unrecognised action 'poweroff'
> Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Please use '-h' for usage

Hi Marcel, try 'shutdown' instead of poweroff:

pcs stonith update sbd-fencing method=onoff
pcs property set stonith-action=shutdown

(In reply to Steffen Froemer from comment #29)
> pcs stonith update sbd-fencing method=onoff
> pcs property set stonith-action=shutdown

Sorry for that - that was a mistake in my email. stonith-action=poweroff is fine, and that is what I've tested.

(In reply to Marcel Fischer from comment #28)
> Without stonith-action it works better:
> pcs property unset stonith-action
>
> ...
>
> ==> /var/log/messages <==
> Jul 10 11:03:18 bsul0798 sbd[3496]: warning: inquisitor_child: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd requested a shutoff
> Jul 10 11:03:18 bsul0798 sbd[3496]: emerg: do_exit: Rebooting system: off
> [ 1514.207606] i6300esb: Unexpected close, not stopping watchdog!
> [ 1514.208969] SysRq : Power Off
> [ 1514.209723] sd 2:0:0:2: [sdb] Synchronizing SCSI cache
> [ 1514.210957] sd 2:0:0:3: [sda] Synchronizing SCSI cache
> Jul 10 11:03:18 bsul0798 kernel: i6300esb: Unexpected close, not stopping watchdog!
> Jul 10 11:03:18 bsul0798 kernel: SysRq : Power Off
> [ 1514.216329] ACPI: Preparing to enter system sleep state S5
> [ 1514.217577] Power down.
>
> But the other node waits 2 x msgwait because of that "off -> on" sequence,
> in our case around 4 minutes. Could we fix that and remove the "on" part
> after "off"? As you said, it doesn't make much sense with watchdog devices.

This behaviour is understandable, as removing stonith-action leads to falling back to the default of reboot.
So the behaviour that pacemaker maps this to a sequence of 'off', 'on' on a device that doesn't support 'reboot' (you set method='onoff') is correct.

We have to get it working with stonith-action=poweroff - which it strangely does in my setup.

The difference might be that I was testing with quite a current upstream-master version of pacemaker, although I consider a difference in behaviour with your older pacemaker quite unlikely. But I'll investigate ...

(In reply to Klaus Wenninger from comment #31)
> Difference might be that I was testing with quite a current
> upstream-master-version of pacemaker. Although I consider this difference in
> behaviour with your older pacemaker quite unlikely. But I'll investigate ...

A misbehaviour of pacemaker in older versions is at least not known to anybody I've talked to ...

I know that your config doesn't show it, but would it be possible that you had pcmk_off_action set at the time when you ran your test? Or that there was some debugging code left in the fence-agent?
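For readers wanting to rule this out in their own cluster, a sketch of how a leftover pcmk_off_action parameter could be checked for (the resource id is taken from the config posted earlier in this bug; substitute your own - if the parameter is unset it simply won't appear in the output):

```shell
# Show all configured attributes of the fence device,
# including any pcmk_*_action overrides:
pcs stonith show sbd-vglvmha-bsul0798a01

# Or query the single parameter directly via crm_resource:
crm_resource --resource sbd-vglvmha-bsul0798a01 --get-parameter pcmk_off_action
```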
Steffen Froemer provided me the newest versions, but still the same error:

[root@bsul0799 ~]# rpm -qi pacemaker
Name        : pacemaker
Version     : 1.1.16
Release     : 8.el7
Architecture: x86_64

[root@bsul0799 ~]# rpm -qi sbd
Name        : sbd
Version     : 1.3.0
Release     : 3.shutdown_issue.0.el7

[root@bsul0799 ~]# rpm -qi fence-agents-sbd
Name        : fence-agents-sbd
Version     : 4.0.11
Release     : 59.el7

==> /var/log/messages <==
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr: [ Failed: Unrecognised action 'poweroff' ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr: [ ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr: [ Please use '-h' for usage ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr: [ ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: error: Operation 'poweroff' [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
Jul 10 14:12:20 bsul0799 stonith-ng[3916]: notice: Couldn't find anyone to fence (poweroff) bsul0798a01 with any device

(In reply to Marcel Fischer from comment #33)
> Steffen Froemer provided me the newest versions, but still the same error:
>
> [root@bsul0799 ~]# rpm -qi pacemaker
> Name        : pacemaker
> Version     : 1.1.16
> Release     : 8.el7
> Architecture: x86_64
>
> [root@bsul0799 ~]# rpm -qi sbd
> Name        : sbd
> Version     : 1.3.0
> Release     : 3.shutdown_issue.0.el7
>
> [root@bsul0799 ~]# rpm -qi fence-agents-sbd
> Name        : fence-agents-sbd
> Version     : 4.0.11
> Release     : 59.el7
>
> ==> /var/log/messages <==
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> [ Failed: Unrecognised action 'poweroff' ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> [ ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> [ Please use '-h' for usage ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> [ ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: error: Operation 'poweroff'
> [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device
> 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]: notice: Couldn't find anyone to
> fence (poweroff) bsul0798a01 with any device

I was assuming that you would first trigger a test-fencing using pcs.

There is definitely a misbehaviour here in what happens when fencing is triggered via pcs versus when pacemaker triggers it. When I let pacemaker trigger fencing with your setting, I get the same issue: poweroff is passed through to the fence-agent, which doesn't understand it.

For now we can categorize it as kind of a documentation issue, although further analysis has to be done and there is definitely some inconsistency beyond documentation.

Anyway, for now you should be able to set stonith-action=off. At least that led to the desired result in my test-setup, even when I let pacemaker trigger the fencing.
(In reply to Klaus Wenninger from comment #34)
> (In reply to Marcel Fischer from comment #33)
> > Steffen Froemer provided me the newest versions, but still the same error:
> >
> > [root@bsul0799 ~]# rpm -qi pacemaker
> > Name        : pacemaker
> > Version     : 1.1.16
> > Release     : 8.el7
> > Architecture: x86_64
> >
> > [root@bsul0799 ~]# rpm -qi sbd
> > Name        : sbd
> > Version     : 1.3.0
> > Release     : 3.shutdown_issue.0.el7
> >
> > [root@bsul0799 ~]# rpm -qi fence-agents-sbd
> > Name        : fence-agents-sbd
> > Version     : 4.0.11
> > Release     : 59.el7
> >
> > ==> /var/log/messages <==
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> > [ Failed: Unrecognised action 'poweroff' ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> > [ ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> > [ Please use '-h' for usage ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: warning: fence_sbd[6442] stderr:
> > [ ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: error: Operation 'poweroff'
> > [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device
> > 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]: notice: Couldn't find anyone to
> > fence (poweroff) bsul0798a01 with any device
>
> Was assuming that you first would trigger a test-fencing using pcs.
> There is definitely a misbehaviour here regarding what it does when fencing
> is triggered via pcs or when pacemaker is triggering fencing.
> When I let pacemaker trigger fencing with your setting I get the same issue
> that poweroff is passed through to the fence-agent not understanding it.
> For now we can categorize it as kind of a documentation-issue although
> further analysis has to be done and there is definitely some inconsistency
> outside documentation.
>
> Anyway for now you should be able to set stonith-action=off.
> At least that lead to the desired result in my test-setup even if I let
> pacemaker trigger the fencing.

stonith-action=off leads to fencing triggered via pcs not working, thus the better workaround is probably:

cluster-property stonith-action=reboot (or empty, as reboot is the default)
fencing-agent attribute pcmk_reboot_action=off

(In reply to Klaus Wenninger from comment #35)
>
> stonith-action=off leads to fencing triggered via pcs not working
>
> thus the better workaround is probably:
>
> cluster-property stonith-action=reboot (or empty as reboot is the default)
> fencing-agent attribute pcmk_reboot_action=off

Yes, with pcmk_reboot_action=off it works.

pcs config
...
Stonith Devices:
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=cycle power_timeout=200 pcmk_reboot_action=off
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
 Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=cycle power_timeout=200 pcmk_reboot_action=off
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.16-8.el7-94ff4df
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-action: reboot
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0

Depending on how the disabling of a fencing-resource is done, bz1474463 can be triggered.
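For reference, the working configuration above could be arrived at with commands along these lines - a sketch only, using the resource ids from this cluster (substitute your own fence devices):

```shell
# Keep the cluster-wide stonith action at its default of reboot ...
pcs property set stonith-action=reboot

# ... and map the 'reboot' request to 'off' on the fence devices
# themselves, so no "on" follows and the survivor doesn't wait 2x
# message-timeout:
pcs stonith update sbd-vglvmha-bsul0798a01 pcmk_reboot_action=off
pcs stonith update sbd-vglvmha-bsul0799a01 pcmk_reboot_action=off
```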
If the device is disabled while it has a queued action, bz1470262 triggers as well.

And it has to be noted that location-rules for fencing-resources may not contain score-attributes or the like, so that the resulting score doesn't depend on anything but the location-rule itself. This is important as the rule is evaluated just once by stonithd and no reevaluation (e.g. one required by a score-attribute having changed) is triggered. The only valid triggers are changes in the resources and constraints sections of the cib. And even this only works properly when configured dynamically if the fixes for bz1474463 (target-role) & bz1454933 (location-rules) are included.

The need for the fixes for bz1474463 & bz1454933 might be worked around (has to be verified):

- if location-rules are not created and deleted but their score is switched e.g. between -INF and INF;
- if target-role is switched between Stopped and Started instead of deleting it for the latter case.
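The target-role half of that unverified workaround could be sketched as follows (the resource id is hypothetical - substitute your fencing resource; the point is that the meta-attribute is flipped in place, never deleted):

```shell
# Disable the fence device by setting target-role explicitly ...
pcs resource meta my-sbd-fencing target-role=Stopped

# ... and re-enable it by switching the value back, rather than
# removing the meta-attribute (which would hit bz1474463):
pcs resource meta my-sbd-fencing target-role=Started
```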
So for reference, before I work on closing this, here are the bugs that are relevant to this discussion or have spawned out of it:

Bug #1454933 - Fencing occurs from a node even if fencing resource is banned from that node
Bug #1470262 - disabling a fencing-device that has queued actions leads to stonithd receiving SIGABRT
Bug #1474463 - fencing-device not properly registered after disable/enable cycle
Bug #1474905 - stonith: dynamic enabling/disabling of stonith resources by rule-constraints
Bug #1413573 - [RFE] qdevice: Include support for heuristics
Bug #1474917 - pcs: Simplify configuration of sbd timeouts in various components
Bug #1470813 - stonith: Be less susceptible to fence-agent internal timeout failures (power_timeout, login_timeout, shell_timeout)
  + Note: This hasn't been discussed here, but I'm including it as it's relevant to the concerns around complexity of timeout configuration. If we work to de-enforce agent-internal timeouts (and sbd's msgwait timeout in turn), then configuration of timeouts around sbd can mostly be done at the pacemaker level - reducing some complexity.

I will follow up outside this bug on the overall discussion of our path forward with this customer. If any points of possible product improvement or concern were not touched on by these bugs, please raise awareness of them. For now, I'm closing this out.