Bug 435955 - fence_bladecenter fails when blade is not physically present
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: fence
Version: 4
Hardware: i386 Linux
Priority: low
Severity: medium
Assigned To: Jim Parsons
QA Contact: Cluster QE
Reported: 2008-03-04 12:07 EST by Bryn M. Reeves
Modified: 2010-10-22 19:00 EDT
Fixed In Version: RHBA-2008-0801
Doc Type: Bug Fix
Last Closed: 2008-07-25 15:16:56 EDT
Attachments
patch for fence_bladecenter (412 bytes, patch)
2008-03-05 07:29 EST, David Juran

Description Bryn M. Reeves 2008-03-04 12:07:48 EST
+++ This bug was initially created as a clone of Bug #248006 +++


Description of problem:
If a blade is not present (e.g. removed for maintenance), fence_bladecenter
cannot check its power state because the bay is reported as empty. I think it
is something simple to fix for those versed in Perl. Normally the fence agent
only runs against a blade that is present. If the blade is removed while the
cluster is running, you run into this issue.

My case below. Blade #3 is a good node. Blade #2 was removed. The fence does not
work with the blade removed.

system> env -T system:blade[3]
OK
system:blade[3]> power -state
On
system:blade[3]> env -T system:blade[2]
The target bay is empty. 
system:blade[3]> env -T system:blade[1]
OK
system:blade[1]>


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Bring up cluster on two nodes
2. Physically remove blade running the service
3. Fencing fails, as shown in the log

Actual Results:
 The clustered service does not fail over to the standby node

Expected Results:
The clustered service should fail over. The fence agent should detect that the
fenced node is no longer present in the BladeCenter instead of hanging.

Additional info:
Got this from James Parsons on the RHCS mailing list:

I believe this is what you want to happen...if state cannot be checked, fenced
keeps trying. How could you determine it was safe to stop without persisting
some value like the number of fence tries, and trying to reason out whether it
was safe to stop? This will not happen if you remove the blade from the cluster
before physically removing it. It is a snap to do this with one of the UIs, if
you are not prejudiced against UIs :).

Also, removing the node from cluster membership before jerking it out of the
rack tells rgmanager to move any services off of it - rather than having to
depend on heartbeat failure to make this happen.

That said, if the blade catches fire and a cage IT guy notices and jerks it
quick, (using his IT Oven Mitt, of course) it is silly for fenced to keep
incessantly trying when the thing no longer even exists. Perhaps the correct
solution would be to have the fence_bladecenter report success if the
bladecenter admin unit reports that 'no status is available' for a particular
blade - obviously if the thing is not there, it should be safe to say it is
fenced :)

If this addresses your situation (I think it does), now would be a REALLY good
time to file a ticket requesting this behavior - like today! I'll post a fixed
version to the ticket when it is ready.

Thanks to Lon for discussing this with me...;)

Regards,

-Jim
Comment 2 RHEL Product and Program Management 2008-03-04 12:28:27 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 David Juran 2008-03-05 07:29:37 EST
Created attachment 296878
patch for fence_bladecenter

Unfortunately I'm not very well versed in Perl, but that didn't stop me from
throwing this quick hack together (-: Just like James Parsons suggests, it
exits with success if the blade is no longer present.
Comment 6 errata-xmlrpc 2008-07-25 15:16:56 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0801.html
