RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where each "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 1014978 - fence-agents should have a sane default for the reboot operation
Summary: fence-agents should have a sane default for the reboot operation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pacemaker
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-10-03 09:05 UTC by Jaroslav Kortus
Modified: 2015-02-25 15:13 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-25 15:13:38 UTC
Target Upstream Version:
Embargoed:
mgrac: needinfo+




Links
System: Red Hat Bugzilla | ID: 996850 | Private: 0 | Priority: urgent | Status: CLOSED | Summary: Unfence at cluster startup with fence_scsi | Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 996850

Description Jaroslav Kortus 2013-10-03 09:05:17 UTC
Description of problem:
Pacemaker has its own default for pcmk_reboot_action, which is "reboot". It always calls the agent with that value as the action, which is not always optimal.

By default, each agent performs whatever action is needed to get a node fenced. Most of the time this is the same as "reboot", but for some agents it is different.

The only agents I found to behave differently are fence_scsi and fence_brocade.

If Pacemaker used the agents' defaults (i.e. left the action undefined), these agents would work without any pcmk_reboot_action configuration, just as they did with cman.
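A minimal Python model of the difference described above, under stated assumptions: fence agents read their options from stdin as one `name=value` pair per line, Pacemaker always supplies `action=` (its pcmk_reboot_action, "reboot" by default), and an agent that receives no action falls back to its own default. The entries in `AGENT_DEFAULTS` are assumptions for illustration (that fence_scsi and fence_brocade default to "off" is assumed, not read from their source), and `build_stdin` and `effective_action` are hypothetical helpers, not Pacemaker or fence-agents code.

```python
"""Hypothetical model of who decides the fencing action. Not real Pacemaker code."""

# Assumed per-agent default actions, for illustration only.
AGENT_DEFAULTS = {
    "fence_ipmilan": "reboot",
    "fence_scsi": "off",
    "fence_brocade": "off",
}


def build_stdin(params, action=None):
    """Render agent options as 'name=value' lines; omit 'action' when it is None."""
    lines = []
    if action is not None:
        lines.append(f"action={action}")
    lines.extend(f"{name}={value}" for name, value in params.items())
    return "\n".join(lines) + "\n"


def effective_action(agent, stdin_text):
    """Action the agent would perform: the one it was given, else its assumed default."""
    options = dict(
        line.split("=", 1) for line in stdin_text.splitlines() if "=" in line
    )
    return options.get("action", AGENT_DEFAULTS[agent])


if __name__ == "__main__":
    # Illustrative option values only.
    params = {"devices": "/dev/sdb", "nodename": "node1"}
    # Current behaviour: Pacemaker always passes its pcmk_reboot_action ("reboot").
    print(effective_action("fence_scsi", build_stdin(params, action="reboot")))
    # Behaviour requested in this bug: no action passed, so the agent default applies.
    print(effective_action("fence_scsi", build_stdin(params)))
```

Run as a script, this prints "reboot" and then "off": the same agent ends up doing different things depending only on whether the caller passes an action.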

Version-Release number of selected component (if applicable):
pacemaker-1.1.10-12.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. configure fence_scsi without any pcmk_reboot_action
2. fence the node
3. see the agent fail and the keys remain registered

Actual results:
pcmk_reboot_action is necessary for certain agents

Expected results:
no pcmk_reboot_action is necessary; Pacemaker uses the agents' own defaults (i.e. no action parameter)

Additional info:

Comment 3 Andrew Beekhof 2014-01-14 05:19:04 UTC
This was fixed in 1.1.10-10

Comment 4 Andrew Beekhof 2014-01-14 05:19:56 UTC
Dammit, wrong bug

Comment 5 Andrew Beekhof 2014-01-15 06:28:23 UTC
I disagree here.

Allowing each agent to have its own default results in inconsistent and non-obvious behaviour (as well as increasing the number of places to check for the value being used).

The two agents should be doing something sane for 'reboot' instead.

There are a number of agents that fake "reboot" by sending "off" followed by "on". Don't they report success as long as "off" succeeds? This wouldn't seem much different.

We would also be unable to push such a change upstream, because changing the Pacemaker defaults would cause compatibility issues with other tools and agents.
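The question above can be made concrete with a hypothetical Python sketch. It assumes, as the comment suggests but does not confirm, that an agent emulating "reboot" reports success once "off" succeeds, and it applies the same rule to a fabric-style agent that has no "on", so that its "reboot" degrades to "off". The `Agent` class and its callables are illustrative stand-ins, not the fence-agents library.

```python
"""Hypothetical sketch: a 'reboot' built from 'off' and an optional 'on'."""

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    off: Callable[[], bool]
    # Fabric-style agents have no usable 'on' here (assumption for illustration).
    on: Optional[Callable[[], bool]] = None
    log: list = field(default_factory=list)

    def reboot(self) -> bool:
        """Fence with 'off', then try 'on'; success depends only on 'off'."""
        if not self.off():
            self.log.append("off failed")
            return False
        if self.on is None:
            self.log.append("no 'on' action; treating reboot as off")
        elif not self.on():
            self.log.append("off succeeded but on failed; node is still fenced")
        return True


if __name__ == "__main__":
    power = Agent("power-switch", off=lambda: True, on=lambda: False)
    fabric = Agent("fabric", off=lambda: True)
    print(power.reboot(), power.log)
    print(fabric.reboot(), fabric.log)
```

Under this rule both agents report a successful "reboot", which is the sense in which treating "reboot" as "off" for a fabric agent "wouldn't seem much different".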

Comment 6 Fabio Massimo Di Nitto 2014-01-15 08:00:52 UTC
Marek,

fence_brocade has been rewritten since this bug was filed, and fence_scsi has to be ported/fixed for rhel7.

Can we address those issues without possibly introducing regressions against currently deployed setups in RHEL6?

Otherwise we will need to document it for 6, and I'd like to see it fixed properly in 7 with a consistent reboot action across all agents.

Comment 7 Marek Grac 2014-01-22 14:44:20 UTC
@Fabio,

Those bugs should be fixed now. But we do not want to have the action 'reboot' everywhere (and we never had it), because fabric fence agents (e.g. switches, scsi, ..., plus kdump, which is a special case as it has no 'on') cannot have a reboot action.

--

possible solutions:

1) add this information to the fence agents' XML metadata

2) make the cluster aware that the order of default actions is reboot, then off; the cluster already reads <actions> from the XML, so this should not be a problem.

I prefer the first option, because it is not cluster-specific.
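Option 1 could look like the following Python sketch, which reads a fence agent's metadata with the standard library's `xml.etree.ElementTree`. The `<resource-agent>`, `<actions>` and `<action name=...>` layout follows the metadata that fence agents print; the `default="1"` attribute, the agent name `fence_example`, and the `declared_default` helper are hypothetical, invented only to show where such information could live.

```python
"""Hypothetical sketch of option 1: the agent's metadata names its default action."""

import xml.etree.ElementTree as ET

# Abbreviated metadata for a fabric-style agent. The default="1" attribute is
# hypothetical; it illustrates option 1 and is not part of the real format.
METADATA = """\
<resource-agent name="fence_example" shortdesc="Example fabric fence agent">
  <actions>
    <action name="on"/>
    <action name="off" default="1"/>
    <action name="status"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""


def declared_default(metadata_xml):
    """Return the action flagged default="1" under <actions>, or None if none is."""
    root = ET.fromstring(metadata_xml)
    for action in root.iter("action"):
        if action.get("default") == "1":
            return action.get("name")
    return None


if __name__ == "__main__":
    print(declared_default(METADATA))  # -> off
```

Because the default would be declared by the agent itself, any consumer of the metadata, not only the cluster, could read it; that matches the reason given for preferring this option.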

Comment 8 Marek Grac 2014-04-07 13:40:58 UTC
There is another option, which I did not mention, that is already available and supported: when no action is specified, the agent's default action is used, which is either reboot or off depending on the fence agent.

Is this acceptable?

Comment 9 Jaroslav Kortus 2014-04-07 19:33:47 UTC
Probably not, as we need a sane default for the "reboot" action (see the summary and the history of this bug). Pacemaker always calls it with reboot and expects the agent to handle that.

Comment 10 David Vossel 2014-04-22 15:04:56 UTC
(In reply to Jaroslav Kortus from comment #9)
> Probably not, as we need a sane default for the "reboot" action (see the
> summary and the history of this bug). Pacemaker always calls it with reboot
> and expects the agent to handle that.

If the agent's metadata XML does not advertise 'reboot', we'll fall back to 'off'.

https://github.com/ClusterLabs/pacemaker/commit/8383a38a478ed6473ff2179596335ed4de583cfa

I put a big warning message in there so we'd know the 'reboot' to 'off' substitution took place because some agent didn't support 'reboot'. This seems like the path of least resistance, in that it allows these couple of agents to work properly with the current Pacemaker defaults. If people want the warnings to go away, they can change the default operation to 'off'.

-- Vossel
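The commit above changes Pacemaker itself (written in C). The Python sketch below only models the behaviour described in this comment: check the agent's advertised `<action>` names and substitute 'off' for 'reboot', with a warning, when 'reboot' is missing. The function names, logger name and warning text are hypothetical and are not taken from the commit.

```python
"""Hypothetical model of the reboot -> off fallback; not Pacemaker's actual code."""

import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("fencing-sketch")


def advertised_actions(metadata_xml):
    """Names of all <action> elements in the agent's metadata."""
    root = ET.fromstring(metadata_xml)
    return {a.get("name") for a in root.iter("action")}


def resolve_action(requested, metadata_xml, agent):
    """Substitute 'off' for 'reboot' when the agent does not advertise 'reboot'."""
    if requested == "reboot" and "reboot" not in advertised_actions(metadata_xml):
        log.warning("%s does not advertise 'reboot'; using 'off' instead", agent)
        return "off"
    return requested


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    fabric = ("<resource-agent><actions><action name='on'/>"
              "<action name='off'/></actions></resource-agent>")
    print(resolve_action("reboot", fabric, "fence_scsi"))
```

In this model a request that is already 'off' passes through unchanged and logs nothing, which is presumably how configuring the reboot action as 'off' (for example via pcmk_reboot_action) would silence the warning.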

