1444020 – Improve SBD Storage Device Timeouts

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1444020 - Improve SBD Storage Device Timeouts

Summary: Improve SBD Storage Device Timeouts

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	sbd
Sub Component:
Version:	7.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Klaus Wenninger
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:	1468580
Blocks:	1466531
TreeView+	depends on / blocked

Reported:	2017-04-20 12:44 UTC by Daniel Peess
Modified:	2020-12-14 08:33 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-07-25 16:53:09 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1474917	1	None	None	None	2021-01-20 06:05:38 UTC

Internal Links: 1474917

Description Daniel Peess 2017-04-20 12:44:50 UTC

Description of problem:
SBD Storage Timeouts are hard to configure properly.
Default timeouts are low for FC SAN DMMPIO environments.

Version-Release number of selected component (if applicable):
sbd-1.3.0-1.el7.x86_64.rpm
fence-agents-sbd-4.0.11-59.el7.x86_64.rpm
fence-agents-common-4.0.11-59.el7.x86_64.rpm

How reproducible:


[root@bsul0798 ~]# sbd -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' create;
Initializing device /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd
Creating version 2.1 header on device 4 (uuid: 6861ea47-db29-410c-ae5a-d3264b094c42)
Initializing 255 slots on device 4
Device /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd is initialized.

## WARNING: These default timeouts will never work in the enterprise world!
## Check and document your own storage layer timeouts!
## And add a good amount of extra time for SAN latency on production workload.

[root@bsul0798 ~]# sbd -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' dump;
==Dumping header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd
Header version     : 2.1
UUID               : 6861ea47-db29-410c-ae5a-d3264b094c42
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd is dumped

## TODO: bug report: for any SBD disk creation msgwait setting above and including 20 seconds,
## i do get this warning after cluster and its SBD STONITH RAs have fully started:
## $ sbd -W -w '/dev/watchdog' -1 '60' -4 '20' -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' create;
## stonith-ng:  warning: log_action:      fence_sbd[26973] stderr: [ power timeout needs to be                 greater then sbd message timeout ]
## this does not happen for values below and including 19 seconds, but it should be possible to use settings higher then the stonith watchdog timeout of 60 seconds without any warnings.

## "Stonith-timeout should be configured greater than msgwait time (for example ST=120sec and MSW=90sec), in order the cluster to consider the "power off" SBD message as delivered."
## https://forums.suse.com/archive/index.php/t-8700.html
## bsul0799 stonith-ng:  warning: log_action:      fence_sbd[19312] stderr: [ power timeout needs to be                 greater then sbd message timeout ]
## bsul0799 stonith-ng:  warning: log_action:      fence_sbd[19332] stderr: [ power timeout needs to be                 greater then sbd message timeout ]
## bsul0799 stonith-ng:  warning: log_action:      fence_sbd[19386] stderr: [ power timeout needs to be                 greater then sbd message timeout ]

Steps to Reproduce:
1. sbd -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' create;
2. sbd -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' dump;
3. sbd -W -w '/dev/watchdog' -1 '60' -4 '20' -d '/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd' create;

Actual results:
fence_sbd[19312] stderr: [ power timeout needs to be greater then sbd message timeout ]

Expected results:
this does not happen for values below and including 19 seconds, but it should be possible to use settings higher then the stonith watchdog timeout of 60 seconds without any warnings.

Additional info:
RHEL-HA 7.3 + all updates 2016-12 + RHEL 7.4 SBD

Comment 3 Klaus Wenninger 2017-04-21 07:21:29 UTC

Actually message-timeout should be set to >2x watchdog timeout.
So sbd should rather enforce this then proceed without any warnings.
Anyway the warning you've observed is coming from the fence-agent where power-timeout is used as timeout in the generic part (fencing.py).
So it would actually make sense to raise that if configuring higher values for mesage-timeout.
Have you tried to do so?
Of couse it would be argueable as we have message-timeout in fence_sbd to use this rather as a timeout.

Comment 4 Klaus Wenninger 2017-04-21 12:28:35 UTC

Thinking over it again there is of course kind of a chicken and egg issue with using message-timeout as a general timeout here as it is being read from the device and that reading might timeout as well.
Checking and using cluster-property stonith-watchdog-timeout might make more sense. Maybe just throw a warning and not bail out with an error as setting that is just strongly recommended but not a hard must.

Comment 14 Marcel Fischer 2017-07-04 15:04:39 UTC

Hi,

we set power_timeout 180 and the warning disappeared. Now currently the timeout values are like these:
Stonith Devices:
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
 Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)

and...

[root@bsul0799 ~]# sbd -d /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd dump
==Dumping header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd
Header version     : 2.1
UUID               : d3ef8e16-3a60-4e93-8b38-a198ccdb25fe
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 60
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 120
==Header on disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd is dumped

and...

[root@bsul0799 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.15-11.el7_3.2-e174ec8
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0

Additionally we set the following:
cat /etc/corosync/corosync.conf
...
    token: 110000
    consensus: 5000
...

As Daniel Peess said in Bug 1449155 we need to disable the fencing resources based on a ping resource to our network gateway.
Thats the reason for the high token value. Because we need to wait for the pingd resource to disable or stop the fence resource.
With all that values, I looks like that everything is working. But we have some strange wait times:

Jul 04 16:48:10 bsul0799 corosync[1258]:  [TOTEM ] A processor failed, forming new configuration.
Jul 04 16:48:15 bsul0799 corosync[1258]:  [TOTEM ] A new membership (10.40.221.22:2492) was formed. Members left: 1
Jul 04 16:48:15 bsul0799 corosync[1258]:  [TOTEM ] Failed to receive the leave message. failed: 1
Jul 04 16:48:15 bsul0799 corosync[1258]:  [QUORUM] Members[1]: 2
Jul 04 16:48:15 bsul0799 corosync[1258]:  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 04 16:48:15 bsul0799 stonith-ng[1611]:   notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 stonith-ng[1611]:   notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 pacemakerd[1585]:   notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 attrd[1614]:   notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 attrd[1614]:   notice: Removing all bsul0798a01 attributes for peer loss
Jul 04 16:48:15 bsul0799 attrd[1614]:   notice: Lost attribute writer bsul0798a01
Jul 04 16:48:15 bsul0799 attrd[1614]:   notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 crmd[1618]:   notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 crmd[1618]:  warning: Our DC node (bsul0798a01) left the cluster
Jul 04 16:48:15 bsul0799 cib[1610]:   notice: Node bsul0798a01 state is now lost
Jul 04 16:48:15 bsul0799 cib[1610]:   notice: Purged 1 peers with id=1 and/or uname=bsul0798a01 from the membership cache
Jul 04 16:48:15 bsul0799 crmd[1618]:   notice: State transition S_NOT_DC -> S_ELECTION
Jul 04 16:48:15 bsul0799 crmd[1618]:   notice: State transition S_ELECTION -> S_INTEGRATION
Jul 04 16:48:15 bsul0799 crmd[1618]:  warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 04 16:48:15 bsul0799 crmd[1618]:   notice: Watchdog may be enabled but stonith-watchdog-timeout is disabled: 0
Jul 04 16:48:16 bsul0799 pengine[1616]:   notice: Relying on watchdog integration for fencing
Jul 04 16:48:16 bsul0799 pengine[1616]:  warning: Node bsul0798a01 will be fenced because the node is no longer part of the cluster
Jul 04 16:48:16 bsul0799 pengine[1616]:  warning: Node bsul0798a01 is unclean
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action halvmvg_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action fsvarwwwhtml_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action vip1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action httpd_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action myservicegateways:1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action myservicegateways:1_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action sbd-vglvmha-bsul0799a01_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Action sbd-vglvmha-bsul0799a01_stop_0 on bsul0798a01 is unrunnable (offline)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Scheduling Node bsul0798a01 for STONITH
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Move    halvmvg        (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Move    fsvarwwwhtml        (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Move    vip1        (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Move    httpd        (Started bsul0798a01 -> bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Stop    dummy        (bsul0799a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Stop    myservicegateways:1        (bsul0798a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:   notice: Stop    sbd-vglvmha-bsul0799a01        (bsul0798a01)
Jul 04 16:48:17 bsul0799 pengine[1616]:  warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-54.bz2
Jul 04 16:48:17 bsul0799 crmd[1618]:   notice: Initiating stop operation dummy_stop_0 locally on bsul0799a01
Jul 04 16:48:17 bsul0799 crmd[1618]:   notice: Requesting fencing (reboot) of node bsul0798a01
Jul 04 16:48:17 bsul0799 stonith-ng[1611]:   notice: Client crmd.1618.ffe5e286 wants to fence (reboot) 'bsul0798a01' with device '(any)'
Jul 04 16:48:17 bsul0799 stonith-ng[1611]:   notice: Requesting peer fencing (reboot) of bsul0798a01
Jul 04 16:48:17 bsul0799 crmd[1618]:   notice: Result of stop operation for dummy on bsul0799a01: 0 (ok)
Jul 04 16:48:17 bsul0799 stonith-ng[1611]:   notice: sbd-vglvmha-bsul0798a01 can fence (reboot) bsul0798a01: dynamic-list
...
Jul 04 16:52:28 bsul0799 stonith-ng[1611]:   notice: Operation 'reboot' [4556] (call 2 from crmd.1618) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: 0 (OK)
Jul 04 16:52:28 bsul0799 stonith-ng[1611]:   notice: Operation reboot of bsul0798a01 by bsul0799a01 for crmd.1618: OK
Jul 04 16:52:28 bsul0799 crmd[1618]:   notice: Stonith operation 2/37:0:0:ce07d574-a428-4c34-ab75-c25e4b58ebeb: OK (0)
Jul 04 16:52:28 bsul0799 crmd[1618]:   notice: Peer bsul0798a01 was terminated (reboot) by bsul0799a01 for bsul0799a01: OK (ref=5efd041a-6e6a-49be-864e-3f7182a92ffe) by client crmd.1618

So it took about 4 Minutes to receive an OK message from sbd. Does this have to do with the high token value?

Comment 15 Klaus Wenninger 2017-07-04 19:01:37 UTC

(In reply to Marcel Fischer from comment #14)
> 
> So it took about 4 Minutes to receive an OK message from sbd. Does this have
> to do with the high token value?

In my test I didn't experience the time for fencing via sbd enlonged by the token-timeout.
My token-timeout is 10s and my msgwait-timeout 10s as well.
With '-v' option hacked into generate_sbd_command in /usr/sbin/fence_sbd the log shows just 10s between stonith-ng finding the device and getting the execution-ok.

Do you get more delay than the msgwait-timeout with a shorter token-timeout as well?

Jul  4 20:47:32 localhost stonith-ng[17416]:  notice: sbd-fencing can fence (reboot) remote_node1: dynamic-list
Jul  4 20:47:32 localhost sbd[18419]:    info: main: Verbose mode enabled.
Jul  4 20:47:32 localhost sbd[18419]:    info: main: Watchdog enabled.
Jul  4 20:47:32 localhost sbd[18421]:    info: main: Verbose mode enabled.
Jul  4 20:47:32 localhost sbd[18421]:    info: main: Watchdog enabled.
Jul  4 20:47:32 localhost sbd[18423]:    info: main: Verbose mode enabled.
Jul  4 20:47:32 localhost sbd[18423]:    info: main: Watchdog enabled.
Jul  4 20:47:32 localhost sbd[18424]:    info: sbd_make_realtime: Scheduler priority is now 99
Jul  4 20:47:32 localhost sbd[18424]:    info: sbd_memlock: Locked ourselves in memory
Jul  4 20:47:32 localhost sbd[18424]:  /dev/vdb:     info: slot_msg_wrapper: Delivery process handling /dev/vdb
Jul  4 20:47:32 localhost sbd[18424]:  /dev/vdb:     info: slot_msg: Device UUID: aea922dc-0da6-4e43-b9e2-5ab550a3f453
Jul  4 20:47:32 localhost sbd[18424]:  /dev/vdb:     info: slot_lookup: remote_node1 owns slot 2
Jul  4 20:47:32 localhost sbd[18424]:  /dev/vdb:     info: slot_msg: Writing reset to node slot remote_node1
Jul  4 20:47:32 localhost sbd[18424]:  /dev/vdb:     info: slot_msg: Messaging delay: 10
Jul  4 20:47:42 localhost sbd[18424]:  /dev/vdb:     info: slot_msg: reset successfully delivered to remote_node1
Jul  4 20:47:42 localhost sbd[18423]:    info: messenger: Process 18424 succeeded.
Jul  4 20:47:42 localhost sbd[18423]:    info: messenger: Message successfully delivered.
Jul  4 20:47:42 localhost stonith-ng[17416]:  notice: Operation 'reboot' [18412] (call 2 from stonith_admin.18399) for host 'remote_node1' with device 'sbd-fencing' returned: 0 (OK)

Comment 16 Marcel Fischer 2017-07-05 10:05:46 UTC

(In reply to Klaus Wenninger from comment #15)
> (In reply to Marcel Fischer from comment #14)
> > 
> > So it took about 4 Minutes to receive an OK message from sbd. Does this have
> > to do with the high token value?
> 
> In my test I didn't experience the time for fencing via sbd enlonged by the
> token-timeout.
> My token-timeout is 10s and my msgwait-timeout 10s as well.
> With '-v' option hacked into generate_sbd_command in /usr/sbin/fence_sbd the
> log shows just 10s between stonith-ng finding the device and getting the
> execution-ok.
> 
> Do you get more delay than the msgwait-timeout with a shorter token-timeout
> as well?

Well it seems token-timeout has no influence. Some time ago I changed the method parameter of the sbd device to onoff:
[root@bsul0799 ~]# pcs stonith show sbd-vglvmha-bsul0798a01
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)

I changed this back to "cycle" as configured by Daniel Peess. Now I have around 2 minutes until an OK is return by sbd.
...
Jul 05 11:42:04 [38583] bsul0799 stonith-ng:     info: call_remote_stonith:     Total timeout set to 300 for peer's fencing of bsul0798a01 for crmd.38587|id=388e13c4-f549-4c87-ac50-5acbe7fb148d
Jul 05 11:42:04 [38583] bsul0799 stonith-ng:     info: call_remote_stonith:     Requesting that 'bsul0799a01' perform op 'bsul0798a01 reboot' for crmd.38587 (360s, 0s)
Jul 05 11:42:04 [38583] bsul0799 stonith-ng:   notice: can_fence_host_with_device:      sbd-vglvmha-bsul0798a01 can fence (reboot) bsul0798a01: dynamic-list
Jul 05 11:42:04 [38583] bsul0799 stonith-ng:     info: stonith_fence_get_devices_cb:    Found 1 matching devices for 'bsul0798a01'
...
Jul 05 11:44:15 [38583] bsul0799 stonith-ng:   notice: log_operation:   Operation 'reboot' [26596] (call 4 from crmd.38587) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: 0 (OK)

This would match with the msgwait-timeout of sbd

Could you tell me the difference between those two methods?
And how did you configure this '-v' to fence_sbd. Could you tell me the line in this file?

Comment 17 Klaus Wenninger 2017-07-05 10:25:42 UTC

(In reply to Marcel Fischer from comment #16)

> 
> Could you tell me the difference between those two methods?

Guess for 'onoff' configuration to make sense your watchdog would have to be set in a way that it does an off as well instead of rebooting. But as you wouldn't have an 'on' then the usefulness is a little questionable. If you want to keep pacemaker down on the sbd-fenced node I would rather go with SBD_STARTMODE=clean
 
What you are actually experiencing I have to investigate though ...

> And how did you configure this '-v' to fence_sbd. Could you tell me the line
> in this file?

Around line 114:

'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command, arguments)'

I'll check that I get that in officially switchable ...

Comment 18 Marcel Fischer 2017-07-05 13:31:38 UTC

(In reply to Klaus Wenninger from comment #17)
> (In reply to Marcel Fischer from comment #16)
> 
> > 
> > Could you tell me the difference between those two methods?
> 
> Guess for 'onoff' configuration to make sense your watchdog would have to be
> set in a way that it does an off as well instead of rebooting. But as you
> wouldn't have an 'on' then the usefulness is a little questionable. If you
> want to keep pacemaker down on the sbd-fenced node I would rather go with
> SBD_STARTMODE=clean
>  
> What you are actually experiencing I have to investigate though ...

That makes absolutly sense. I guess thats the reason for the four minutes wait. Just two times msgwait, two minutes for "off" and and two minutes for "on"

> > And how did you configure this '-v' to fence_sbd. Could you tell me the line
> > in this file?
> 
> Around line 114:
> 
> 'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command,
> arguments)'
> 
> I'll check that I get that in officially switchable ...

switchable would be great!

One other question, currently we are using ping resource (to ping default gateway) to disable the sbd resource. The reason for that is, that we dont want fecing from nodes with no working network access. It works quite fine, but configuring timeouts for that is not clear to me.
[root@bsul0799 ~]# pcs resource show myservicegateways-clone
 Clone: myservicegateways-clone
  Resource: myservicegateways (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=20s multiplier=10000 host_list=10.41.92.1
   Operations: start interval=0s timeout=60 (myservicegateways-start-interval-0s)
               stop interval=0s timeout=20 (myservicegateways-stop-interval-0s)
               monitor interval=10 (myservicegateways-monitor-interval-10)


The dampen value is understable, wait 20s before doing anything.
But additionally it takes about 20 to 30 seconds for the cluster to detect that the ping is not possible anymore
Example:
[root@bsul0798 ~]# iptables -I OUTPUT -d 'X.X.X.X' -p ICMP --icmp-type 8 -j DROP;date
Wed Jul  5 15:29:18 CEST 2017
==> /var/log/cluster/corosync.log <==
Jul 05 15:29:42 [46253] bsul0798      attrd:     info: attrd_peer_update:       Setting pingd[bsul0798a01]: 10000 -> 0 from bsul0798a01
==> /var/log/cluster/corosync.log <==
Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  Diff: --- 0.211.82 2
Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  Diff: +++ 0.211.83 (null)
Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  +  /cib:  @num_updates=83
Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  +  /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pingd']:  @value=0
Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_process_request:     Completed cib_modify operation for section status: OK (rc=0, origin=bsul0799a01/attrd/13, version=0.211.83)
...

Whats the reason for the wait fron 15:29:18 to 15:29:42?

Comment 19 Klaus Wenninger 2017-07-05 13:59:54 UTC

(In reply to Marcel Fischer from comment #18)
> (In reply to Klaus Wenninger from comment #17)
> > (In reply to Marcel Fischer from comment #16)
> > 
> > > 
> > > Could you tell me the difference between those two methods?
> > 
> > Guess for 'onoff' configuration to make sense your watchdog would have to be
> > set in a way that it does an off as well instead of rebooting. But as you
> > wouldn't have an 'on' then the usefulness is a little questionable. If you
> > want to keep pacemaker down on the sbd-fenced node I would rather go with
> > SBD_STARTMODE=clean
> >  
> > What you are actually experiencing I have to investigate though ...
> 
> That makes absolutly sense. I guess thats the reason for the four minutes
> wait. Just two times msgwait, two minutes for "off" and and two minutes for
> "on"

Makes sense in a way that what you are saying is probably the reason for the behaviour experienced.
But 'onoff' doesn't make sense on a fencing device that is physically lacking the ability to turn anything 'on' again.
Have to think over it but it might make sense to remove that mode from the fence-agent.

> 
> > > And how did you configure this '-v' to fence_sbd. Could you tell me the line
> > > in this file?
> > 
> > Around line 114:
> > 
> > 'cmd += " %s %s" % (command, arguments)' --> 'cmd += " %s %s -v" % (command,
> > arguments)'
> > 
> > I'll check that I get that in officially switchable ...
> 
> switchable would be great!
> 
> One other question, currently we are using ping resource (to ping default
> gateway) to disable the sbd resource. The reason for that is, that we dont
> want fecing from nodes with no working network access. It works quite fine,
> but configuring timeouts for that is not clear to me.
> [root@bsul0799 ~]# pcs resource show myservicegateways-clone
>  Clone: myservicegateways-clone
>   Resource: myservicegateways (class=ocf provider=pacemaker type=ping)
>    Attributes: dampen=20s multiplier=10000 host_list=10.41.92.1
>    Operations: start interval=0s timeout=60
> (myservicegateways-start-interval-0s)
>                stop interval=0s timeout=20
> (myservicegateways-stop-interval-0s)
>                monitor interval=10 (myservicegateways-monitor-interval-10)
> 
> 
> The dampen value is understable, wait 20s before doing anything.
> But additionally it takes about 20 to 30 seconds for the cluster to detect
> that the ping is not possible anymore
> Example:
> [root@bsul0798 ~]# iptables -I OUTPUT -d 'X.X.X.X' -p ICMP --icmp-type 8 -j
> DROP;date
> Wed Jul  5 15:29:18 CEST 2017
> ==> /var/log/cluster/corosync.log <==
> Jul 05 15:29:42 [46253] bsul0798      attrd:     info: attrd_peer_update:   
> Setting pingd[bsul0798a01]: 10000 -> 0 from bsul0798a01
> ==> /var/log/cluster/corosync.log <==
> Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op: 
> Diff: --- 0.211.82 2
> Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op: 
> Diff: +++ 0.211.83 (null)
> Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  + 
> /cib:  @num_updates=83
> Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_perform_op:  + 
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/
> instance_attributes[@id='status-1']/nvpair[@id='status-1-pingd']:  @value=0
> Jul 05 15:30:02 [46250] bsul0798        cib:     info: cib_process_request: 
> Completed cib_modify operation for section status: OK (rc=0,
> origin=bsul0799a01/attrd/13, version=0.211.83)
> ...
> 
> Whats the reason for the wait fron 15:29:18 to 15:29:42?

default timeout is 20s,
dampen is 20s,
monitor-interval is 10s
equals to a delay of 40-50s.

Comment 20 Marcel Fischer 2017-07-05 14:44:44 UTC

(In reply to Klaus Wenninger from comment #19)
> default timeout is 20s,
> dampen is 20s,
> monitor-interval is 10s
> equals to a delay of 40-50s.

Ok dampen and monitor-interval are understandable. But where comes the default timeout? Operations default?
[root@bsul0798 ~]# pcs config
...
Resources Defaults:
 resource-stickiness: 10000
 migration-threshold: 2
Operations Defaults:
 timeout: 60s

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.15-11.el7_3.2-e174ec8
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0
...

Its set to 60s in our case.

Comment 21 Klaus Wenninger 2017-07-05 14:55:17 UTC

(In reply to Marcel Fischer from comment #20)
> (In reply to Klaus Wenninger from comment #19)
> > default timeout is 20s,
> > dampen is 20s,
> > monitor-interval is 10s
> > equals to a delay of 40-50s.
> 
> Ok dampen and monitor-interval are understandable. But where comes the
> default timeout? Operations default?
> [root@bsul0798 ~]# pcs config
> ...
> Resources Defaults:
>  resource-stickiness: 10000
>  migration-threshold: 2
> Operations Defaults:
>  timeout: 60s
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: multisite
>  dc-version: 1.1.15-11.el7_3.2-e174ec8
>  default-action-timeout: 60s
>  have-watchdog: true
>  last-lrm-refresh: 1494411936
>  stonith-enabled: true
>  stonith-timeout: 300s
>  stonith-watchdog-timeout: 0
> ...
> 
> Its set to 60s in our case.

If host_list is set it devides OCF_RESKEY_CRM_meta_timeout by the number of attempts and gives ping that much time for each attempt summing up again to OCF_RESKEY_CRM_meta_timeout.

Comment 22 Marcel Fischer 2017-07-06 14:18:12 UTC

(In reply to Klaus Wenninger from comment #21)
> > 
> > Its set to 60s in our case.
> 
> If host_list is set it devides OCF_RESKEY_CRM_meta_timeout by the number of
> attempts and gives ping that much time for each attempt summing up again to
> OCF_RESKEY_CRM_meta_timeout.

Okay confuses me a bit, but I will test some different settings to get a feeling for that.

Just one more question regarding the reset triggered by sbd. Is it possible to change that to poweroff or halt? Probably with method=onoff, but I like to have an method only with "off". Because as we saw, onoff takes the double msgwait timeout to complete.

Comment 23 Klaus Wenninger 2017-07-06 16:16:21 UTC

(In reply to Marcel Fischer from comment #22)

> Just one more question regarding the reset triggered by sbd. Is it possible
> to change that to poweroff or halt? Probably with method=onoff, but I like
> to have an method only with "off". Because as we saw, onoff takes the double
> msgwait timeout to complete.

Unfortunately the kernel UAPI seems not to offer a possibility to configure the watchdog in a standardized way regarding which action it should perform.
Thus sbd can't do that setting for you based on the method configured.
So you would have to go with proprietary tooling matching your watchdog-device or do it in the BIOS anyway.
Thus I would suggest to keep it on cycle and configure the watchdog to actually trigger a shutdown if you prefer that.
Probably it makes sense to take the 'onoff' out of the fence-agent to not foster expectations that can't be satisfied.

Comment 24 Marcel Fischer 2017-07-07 07:17:25 UTC

(In reply to Klaus Wenninger from comment #23)
> Thus I would suggest to keep it on cycle and configure the watchdog to
> actually trigger a shutdown if you prefer that.
> Probably it makes sense to take the 'onoff' out of the fence-agent to not
> foster expectations that can't be satisfied.

Currently we are using KVM watchdog (i6300esb).
virsh dumpxml bsul0798
...
    <watchdog model='i6300esb' action='poweroff'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </watchdog>
virsh dumpxml bsul0799
    <watchdog model='i6300esb' action='poweroff'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </watchdog>

We already configured that for "poweroff".

Comment 25 Klaus Wenninger 2017-07-07 11:33:42 UTC

(In reply to Marcel Fischer from comment #24)
> (In reply to Klaus Wenninger from comment #23)
> > Thus I would suggest to keep it on cycle and configure the watchdog to
> > actually trigger a shutdown if you prefer that.
> > Probably it makes sense to take the 'onoff' out of the fence-agent to not
> > foster expectations that can't be satisfied.
> 
> Currently we are using KVM watchdog (i6300esb).
> virsh dumpxml bsul0798
> ...
>     <watchdog model='i6300esb' action='poweroff'>
>       <alias name='watchdog0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x08'
> function='0x0'/>
>     </watchdog>
> virsh dumpxml bsul0799
>     <watchdog model='i6300esb' action='poweroff'>
>       <alias name='watchdog0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x08'
> function='0x0'/>
>     </watchdog>
> 
> We already configured that for "poweroff".

Btw. I have to revise my tip to use attribute method=cycle while having configure the watchdog to poweroff.
As long as the reset/poweroff is really done by the watchdog that wouldn't make any difference but when receiving the stonith-command via block-device sbd tries to do that via sysrq and thus it is important which command is being sent by pacemaker.

adding stonith-action=poweroff
and having the fence-agent attribute method=onoff

together with the watchdog config from your xml should have the desired effect.
Although when I just tried to test it I still got a reboot of the fenced node instead of it being just powered off. Let me quickly investigate and come back to you.

Comment 26 Klaus Wenninger 2017-07-07 12:56:44 UTC

(In reply to Klaus Wenninger from comment #25)

> Although when I just tried to test it I still got a reboot of the fenced
> node instead of it being just powered off. Let me quickly investigate and
> come back to you.

This is a race between a shutdown being triggered via sysrq and a reboot that comes directly after that for the case access to sysrq wouldn't work properly.
Had fixed that issue upstream already but missed to take it into 7.4.
See bz1468580

Comment 27 Marcel Fischer 2017-07-10 08:56:12 UTC

(In reply to Klaus Wenninger from comment #26)
> (In reply to Klaus Wenninger from comment #25)
> 
> > Although when I just tried to test it I still got a reboot of the fenced
> > node instead of it being just powered off. Let me quickly investigate and
> > come back to you.
> 
> This is a race between a shutdown being triggered via sysrq and a reboot
> that comes directly after that for the case access to sysrq wouldn't work
> properly.
> Had fixed that issue upstream already but missed to take it into 7.4.
> See bz1468580

Great thanks, Steffen Froemer gave me an updated rpm with your fix. But now I have a new problem.

==> /var/log/messages <==
Jul 10 10:52:46 bsul0799 stonith-ng[2520]:   notice: sbd-vglvmha-bsul0798a01 can fence (poweroff) bsul0798a01: dynamic-list
Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Failed: Unrecognised action 'poweroff'
Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Please use '-h' for usage

==> /var/log/cluster/corosync.log <==
Jul 10 10:52:46 [2520] bsul0799 stonith-ng:  warning: log_action:       fence_sbd[24239] stderr: [ Failed: Unrecognised action 'poweroff' ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng:  warning: log_action:       fence_sbd[24239] stderr: [  ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng:  warning: log_action:       fence_sbd[24239] stderr: [ Please use '-h' for usage ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng:  warning: log_action:       fence_sbd[24239] stderr: [  ]
Jul 10 10:52:46 [2520] bsul0799 stonith-ng:     info: internal_stonith_action_execute:  Attempt 2 to execute fence_sbd (poweroff). remaining timeout is 300

[root@bsul0799 ~]# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.15-11.el7_3.2-e174ec8
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-action: poweroff
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0

[root@bsul0799 ~]# pcs config
...
Stonith Devices:
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
 Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=onoff power_timeout=200
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)

Comment 28 Marcel Fischer 2017-07-10 09:09:21 UTC

Without stonith-action it works better:
pcs property unset stonith-action

...

==> /var/log/messages <==
Jul 10 11:03:18 bsul0798 sbd[3496]: warning: inquisitor_child: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd requested a shutoff
Jul 10 11:03:18 bsul0798 sbd[3496]:   emerg: do_exit: Rebooting system: off
[ 1514.207606] i6300esb: Unexpected close, not stopping watchdog!
[ 1514.208969] SysRq : Power Off
[ 1514.209723] sd 2:0:0:2: [sdb] Synchronizing SCSI cache
[ 1514.210957] sd 2:0:0:3: [sda] Synchronizing SCSI cache
Jul 10 11:03:18 bsul0798 kernel: i6300esb: Unexpected close, not stopping watchdog!
Jul 10 11:03:18 bsul0798 kernel: SysRq : Power Off
[ 1514.216329] ACPI: Preparing to enter system sleep state S5
[ 1514.217577] Power down.

But the other node waits 2xmsg timeout because of that "off -> on" stuff. In our case around 4 minutes.

Could we fix that and remove the "on" part after "off". As you said, it doesn't make much sense with watchdog devices.

Comment 29 Steffen Froemer 2017-07-10 09:33:09 UTC

(In reply to Marcel Fischer from comment #27)

> Great thanks, Steffen Froemer gave me an updated rpm with your fix. But now
> I have a new problem.
> 
> ==> /var/log/messages <==
> Jul 10 10:52:46 bsul0799 stonith-ng[2520]:   notice: sbd-vglvmha-bsul0798a01
> can fence (poweroff) bsul0798a01: dynamic-list
> Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Failed: Unrecognised action
> 'poweroff'
> Jul 10 10:52:46 bsul0799 fence_sbd[24239]: Please use '-h' for usage

Hi Marcel, try 'shutdown' instead of poweroff.

pcs stonith update sbd-fencing method=onoff
pcs property set stonith-action=shutdown

Comment 30 Klaus Wenninger 2017-07-10 09:36:51 UTC

(In reply to Steffen Froemer from comment #29)

> 
> pcs stonith update sbd-fencing method=onoff
> pcs property set stonith-action=shutdown

Sorry for that - was a misstake in my email.
stonith-action=poweroff is fine and that is what I've tested

Comment 31 Klaus Wenninger 2017-07-10 09:41:49 UTC

(In reply to Marcel Fischer from comment #28)
> Without stonith-action it works better:
> pcs property unset stonith-action
> 
> ...
> 
> ==> /var/log/messages <==
> Jul 10 11:03:18 bsul0798 sbd[3496]: warning: inquisitor_child:
> /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd requested a shutoff
> Jul 10 11:03:18 bsul0798 sbd[3496]:   emerg: do_exit: Rebooting system: off
> [ 1514.207606] i6300esb: Unexpected close, not stopping watchdog!
> [ 1514.208969] SysRq : Power Off
> [ 1514.209723] sd 2:0:0:2: [sdb] Synchronizing SCSI cache
> [ 1514.210957] sd 2:0:0:3: [sda] Synchronizing SCSI cache
> Jul 10 11:03:18 bsul0798 kernel: i6300esb: Unexpected close, not stopping
> watchdog!
> Jul 10 11:03:18 bsul0798 kernel: SysRq : Power Off
> [ 1514.216329] ACPI: Preparing to enter system sleep state S5
> [ 1514.217577] Power down.
> 
> But the other node waits 2xmsg timeout because of that "off -> on" stuff. In
> our case around 4 minutes.
> 
> Could we fix that and remove the "on" part after "off". As you said, it
> doesn't make much sense with watchdog devices.

This behaviour is understandable as removing the stonith-action leads to falling back to the default of reboot.
So the behaviour that pacemaker tries to map that to a sequence of 'off', 'on' on a device that doesn't support 'reboot' (you set method='onoff') is correct.
We have to get it working with stonith-action=poweroff - which it strangely does in my setup.
Difference might be that I was testing with quite a current upstream-master-version of pacemaker. Although I consider this difference in behaviour with your older pacemaker quite unlikely. But I'll investigate ...

Comment 32 Klaus Wenninger 2017-07-10 11:28:51 UTC

(In reply to Klaus Wenninger from comment #31)

> Difference might be that I was testing with quite a current
> upstream-master-version of pacemaker. Although I consider this difference in
> behaviour with your older pacemaker quite unlikely. But I'll investigate ...

a misbehaviour of pacemaker in older versions is at least not known to anybody I've talked to ...
I know that your config doesn't show it but would it be possible that you had pcmk_off_action set at the time when you did run your test?
Or there was some debugging-code left in the fence-agent?

Comment 33 Marcel Fischer 2017-07-10 12:23:41 UTC

Steffen Froemer provided me the newest versions, but still the same error:

[root@bsul0799 ~]# rpm -qi pacemaker
Name        : pacemaker
Version     : 1.1.16
Release     : 8.el7
Architecture: x86_64

[root@bsul0799 ~]# rpm -qi sbd
Name        : sbd
Version     : 1.3.0
Release     : 3.shutdown_issue.0.el7

[root@bsul0799 ~]# rpm -qi fence-agents-sbd
Name        : fence-agents-sbd
Version     : 4.0.11
Release     : 59.el7


==> /var/log/messages <==
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr: [ Failed: Unrecognised action 'poweroff' ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr: [  ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr: [ Please use '-h' for usage ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr: [  ]
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:    error: Operation 'poweroff' [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
Jul 10 14:12:20 bsul0799 stonith-ng[3916]:   notice: Couldn't find anyone to fence (poweroff) bsul0798a01 with any device

Comment 34 Klaus Wenninger 2017-07-10 16:12:06 UTC

(In reply to Marcel Fischer from comment #33)
> Steffen Froemer provided me the newest versions, but still the same error:
> 
> [root@bsul0799 ~]# rpm -qi pacemaker
> Name        : pacemaker
> Version     : 1.1.16
> Release     : 8.el7
> Architecture: x86_64
> 
> [root@bsul0799 ~]# rpm -qi sbd
> Name        : sbd
> Version     : 1.3.0
> Release     : 3.shutdown_issue.0.el7
> 
> [root@bsul0799 ~]# rpm -qi fence-agents-sbd
> Name        : fence-agents-sbd
> Version     : 4.0.11
> Release     : 59.el7
> 
> 
> ==> /var/log/messages <==
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> [ Failed: Unrecognised action 'poweroff' ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> [  ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> [ Please use '-h' for usage ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> [  ]
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:    error: Operation 'poweroff'
> [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device
> 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
> Jul 10 14:12:20 bsul0799 stonith-ng[3916]:   notice: Couldn't find anyone to
> fence (poweroff) bsul0798a01 with any device

Was assuming that you first would trigger a test-fencing using pcs.
There is definitely a misbehaviour here regarding what it does when fencing is triggered via pcs or when pacemaker is triggering fencing.
When I let pacemaker trigger fencing with your setting I get the same issue that poweroff is passed through to the fence-agent not understanding it.
For now we can categorize it as kind of a documentation-issue although further analysis has to be done and there is definitely some inconsistency outside documentation.

Anyway for now you should be able to set stonith-action=off.
At least that lead to the desired result in my test-setup even if I let pacemaker trigger the fencing.

Comment 35 Klaus Wenninger 2017-07-10 17:15:15 UTC

(In reply to Klaus Wenninger from comment #34)
> (In reply to Marcel Fischer from comment #33)
> > Steffen Froemer provided me the newest versions, but still the same error:
> > 
> > [root@bsul0799 ~]# rpm -qi pacemaker
> > Name        : pacemaker
> > Version     : 1.1.16
> > Release     : 8.el7
> > Architecture: x86_64
> > 
> > [root@bsul0799 ~]# rpm -qi sbd
> > Name        : sbd
> > Version     : 1.3.0
> > Release     : 3.shutdown_issue.0.el7
> > 
> > [root@bsul0799 ~]# rpm -qi fence-agents-sbd
> > Name        : fence-agents-sbd
> > Version     : 4.0.11
> > Release     : 59.el7
> > 
> > 
> > ==> /var/log/messages <==
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> > [ Failed: Unrecognised action 'poweroff' ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> > [  ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> > [ Please use '-h' for usage ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:  warning: fence_sbd[6442] stderr:
> > [  ]
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:    error: Operation 'poweroff'
> > [6442] (call 12 from crmd.3920) for host 'bsul0798a01' with device
> > 'sbd-vglvmha-bsul0798a01' returned: -95 (Operation not supported)
> > Jul 10 14:12:20 bsul0799 stonith-ng[3916]:   notice: Couldn't find anyone to
> > fence (poweroff) bsul0798a01 with any device
> 
> Was assuming that you first would trigger a test-fencing using pcs.
> There is definitely a misbehaviour here regarding what it does when fencing
> is triggered via pcs or when pacemaker is triggering fencing.
> When I let pacemaker trigger fencing with your setting I get the same issue
> that poweroff is passed through to the fence-agent not understanding it.
> For now we can categorize it as kind of a documentation-issue although
> further analysis has to be done and there is definitely some inconsistency
> outside documentation.
> 
> Anyway for now you should be able to set stonith-action=off.
> At least that lead to the desired result in my test-setup even if I let
> pacemaker trigger the fencing.

stonith-action=off leads to fencing triggered via pcs not working

thus the better workaround is probably:

cluster-property stonith-action=reboot (or empty as reboot this is the default)
fencing-agent attribute pcmk_reboot_action=off

Comment 36 Marcel Fischer 2017-07-11 09:53:34 UTC

(In reply to Klaus Wenninger from comment #35)
> 
> stonith-action=off leads to fencing triggered via pcs not working
> 
> thus the better workaround is probably:
> 
> cluster-property stonith-action=reboot (or empty as reboot this is the
> default)
> fencing-agent attribute pcmk_reboot_action=off

Yes, with pcmk_reboot_action=off it works.
pcs config
...
Stonith Devices:
 Resource: sbd-vglvmha-bsul0798a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=10 port=bsul0798a01 plug=bsul0798a01 method=cycle power_timeout=200 pcmk_reboot_action=off
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0798a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0798a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0798a01-monitor-interval-60s)
 Resource: sbd-vglvmha-bsul0799a01 (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_sdd delay=5 port=bsul0799a01 plug=bsul0799a01 method=cycle power_timeout=200 pcmk_reboot_action=off
  Operations: start interval=0s timeout=20s (sbd-vglvmha-bsul0799a01-start-interval-0s)
              stop interval=0s timeout=60s (sbd-vglvmha-bsul0799a01-stop-interval-0s)
              monitor interval=60s (sbd-vglvmha-bsul0799a01-monitor-interval-60s)
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: multisite
 dc-version: 1.1.16-8.el7-94ff4df
 default-action-timeout: 60s
 have-watchdog: true
 last-lrm-refresh: 1494411936
 stonith-action: reboot
 stonith-enabled: true
 stonith-timeout: 300s
 stonith-watchdog-timeout: 0

Comment 39 Klaus Wenninger 2017-07-24 17:26:28 UTC

Depending on how the disabling of a fencing-resource is done bz1474463 can be triggered.
If the device is disabled with a queued action bz1470262 triggers as well.
And it has to be noted that location-rules for fencing-resources may not contain score attributes or alike so that the resulting score doesn't depend on anything but the location-rule itself.
This is important as the rule is just evaluated once by stonithd and no reevaluation e.g. required by a score-attribute having changed is triggered.
Valid triggers are just changes in the resources and in the constraints section of the cib.
And even this is just working properly when configured dynamically if the fixes for bz1474463 (target-role) & bz1454933 (location-rules) are included.

The need for the fixes for bz1474463 & bz1454933 might be worked around (has to be verified): 
If location-rules are not created and deleted but score is switched e.g. between -INF and INF.
If target-role is switched between stopped and started instead of deleting it for the latter case.

Comment 41 John Ruemker 2017-07-25 16:53:09 UTC

Because this bug was a sort-of catch-all for the discussion of this customer's needs and concerns, I am trying to move individual points of concern to appropriate bugs that can be focused on that singular aspect.  We'll continue to open new reports as anything else comes up.

So for reference, before I work on closing this, here are the bugs that are relevant to this discussion or have spawned out of it:

Bug #1454933 - Fencing occurs from a node even if fencing resource is banned from that node
Bug #1470262 - disabling a fencing-device that has queued actions leads to stonithd receiving SIGABRT
Bug #1474463 - fencing-device not properly registered after disable/enable cycle
Bug #1474905 - stonith: dynamic enabling/disabling of stonith resources by rule-constraints
Bug #1413573 - [RFE] qdevice: Include support for heuristics
Bug #1474917 - pcs: Simplify configuration of sbd timeouts in various components
Bug #1470813 - stonith: Be less susceptible to fence-agent internal timeout failures (power_timeout, login_timeout, shell_timeout)
    + Note: This hasn't been discussed here, but I'm including it as its relevant to the concerns around complexity of timeout configuration.  If we work to de-enforce agent-internal timeout (and sbd's msgwait timeout in turn), then configuration of timeouts around sbd can be mostly done at the pacemaker level - reducing some complexity.

I will follow up outside this bug on overall discussion of our path forward with this customer.  

If any points of possible product improvement or concern were not touched on by these bugs, please raise awareness of them.  For now, I'm closing this out.

Note You need to log in before you can comment on or make changes to this bug.