Bug 1194301
Summary: | HA | Current fencing configuration(ipmilan default command) only shuts down the fenced host and does not bring it up. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Leonid Natapov <lnatapov> |
Component: | rhosp-director | Assignee: | Chris Jones <chjones> |
Status: | CLOSED DUPLICATE | QA Contact: | Udi Shkalim <ushkalim> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.0 (Kilo) | CC: | abeekhof, dhill, fdinitto, jcoufal, jguiditt, lnatapov, mburns, michele, morazi, oblaut, rhel-osp-director-maint, rhos-maint, royoung, srevivo, tshefi, ushkalim |
Target Milestone: | ga | Keywords: | FutureFeature, InstallerIntegration, UserExperience |
Target Release: | 11.0 (Ocata) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-01-11 04:47:58 UTC | Type: | Bug |
Bug Depends On: | 1242422 | ||
Bug Blocks: |
Description
Leonid Natapov
2015-02-19 14:10:33 UTC
Just to make it clear: when I was using GA with RHEL 7, there were no problems with fencing. Now I am using A1 with RHEL 7.1.

Currently (and since at least April 2014), the command we run to create the ipmilan stonith device is:

    /usr/sbin/pcs stonith create stonith-ipmilan-${real_address} fence_ipmilan ${pcmk_host_list_chunk} ipaddr=${real_address} ${username_chunk} ${password_chunk} ${lanplus_chunk} op monitor interval=${interval}

Andrew, can you provide any insight on this? Was there a change between RHEL 7.0 and 7.1 that could cause what Leonid is seeing? Is the setting he mentions reasonable? Is it something we want the user to be able to tweak, even if it is reasonable as a default?

Jason: No change, just some devices are evidently slower. In general there is no penalty for having a longer timeout; we'll continue as soon as the fencing completes. One could even argue that the agent itself should be a little more generous.

This bug happens on another setup here in QA. Fencing only shuts down the cluster node and it stays down. RHEL 7.1. Jason, do you know when we are going to fix it?

(In reply to Leonid Natapov from comment #9)
> This bug happens on another setup here in QA. Fencing only shuts down the cluster
> node and it stays down. RHEL 7.1. Jason, do you know when we are going to
> fix it?

There is currently no target release, and I do not know what the pacemaker team recommends as a proper setting. Andrew, could you clarify your previous response so I know what to fix? If this is to make any release, my guess would be A4.
'fence_ipmilan -o metadata' shows the following option:

    <parameter name="power_timeout" unique="0" required="0">
      <getopt mixed="--power-timeout=[seconds]" />
      <content type="string" default="20" />
      <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
    </parameter>

and

    <parameter name="retry_on" unique="0" required="0">
      <getopt mixed="--retry-on=[attempts]" />
      <content type="string" default="1" />
      <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
    </parameter>

which correspond to the options mentioned in the description. I'd suggest adding the following parameters to the command for creating the fencing device in pcs:

    power_timeout=60 retry_on=3

Reproduces on the latest rhel-osp-director puddle. Moving the bug to ospd, since the problem is still there.

Hi Leonid, can you clarify a bit what you mean by "reproduced in director"? (I ask because osp-d does not configure fencing out of the box.) Can you share the following:
- The CIB (pcs cluster cib) of this OSP-d cluster?
- How did you configure fencing, and how are you trying to fence the node?
- Can we get /var/log/pacemaker.log from all three controllers?
thanks, Michele

I configured it manually according to this guide: https://docs.google.com/document/d/10FPwRba6aJ4PzXLw7FKR77mV5PwKpSOiEUwGAyc1o18/edit

So I tried to reproduce this to no avail (aka fencing does a correct reboot). This might simply be due to your baremetal needing a bit of tuning because IPMI is slow to respond. Have you tried the suggestion Andrew gave in comment 12? If you still have this environment around, can you ping me online (bandini) and I'll take a look?

(In reply to Michele Baldessari from comment #18)
> So I tried to reproduce this to no avail (aka fencing does a correct reboot).
>
> This might simply be due to your baremetal needing a bit of tuning because IPMI
> is slow to respond. Have you tried the suggestion Andrew gave in comment
> 12?
>
> If you still have this environment around, can you ping me online (bandini)
> and I'll take a look?

Sure, will ping you. You can also find me on IRC (Lesik).

Would updating ipmitool to the same version as BZ 1269523 help fix this issue?

(In reply to David Hill from comment #23)
> Would updating ipmitool to the same version as BZ 1269523 help fix this
> issue?

I have tested with ipmitool-1.8.15-7.el7.x86_64 and the problem still exists.

Has the suggestion in comment 12 been tried? What was the outcome?

Can we get some feedback on the suggestion in comment #12 please?

(In reply to Andrew Beekhof from comment #27)
> Can we get some feedback on the suggestion in comment #12 please?

Hey Andrew. Sorry for the late response. It has been tried and it solved the problem.

Moving to OSP11; this depends on auto-fencing configuration.

This bug serves mostly as input for auto-fencing configuration. Chris, we need to make sure that we use some sane timeout defaults when configuring devices, or allow overrides as we go.

*** This bug has been marked as a duplicate of bug 1242422 ***
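For reference, the fix reported as working in this bug (adding power_timeout=60 and retry_on=3 to the stonith creation command) could be sketched as below. This is a dry-run illustration only: it prints the resulting pcs command rather than executing it. The function name build_stonith_cmd, its argument order, and the example address, interval, and host name are hypothetical. Any extra arguments stand in for the host-list, username, password, and lanplus chunks of the original template and are passed through unchanged.

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs command instead of running it.
# power_timeout=60 and retry_on=3 are the values suggested in this bug;
# the function name and argument order are hypothetical.
build_stonith_cmd() {
    real_address=$1   # IPMI address of the node to fence
    interval=$2       # monitor interval, e.g. 60s
    shift 2
    # Remaining arguments stand in for the host-list, username, password
    # and lanplus chunks of the original template; passed through as-is.
    cmd="/usr/sbin/pcs stonith create stonith-ipmilan-${real_address} fence_ipmilan"
    cmd="${cmd} $* ipaddr=${real_address} power_timeout=60 retry_on=3"
    cmd="${cmd} op monitor interval=${interval}"
    printf '%s\n' "$cmd"
}

# Example with placeholder values (documentation address, made-up host name).
build_stonith_cmd 192.0.2.10 60s pcmk_host_list=controller-0
```

Printing the command first makes it easy to review the longer timeout and retry count before applying them to a real cluster. Whether 60 seconds and 3 retries suit a given BMC is hardware-dependent, as the discussion above notes.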