Bug 228420 - GFS Filesystem Failure
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: GFS-kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium  Severity: medium
Assigned To: Kiersten (Kerri) Anderson
QA Contact: Dean Jansa
Depends On:
Blocks:
 
Reported: 2007-02-12 19:33 EST by Phillip Short
Modified: 2010-01-11 22:22 EST
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-10-19 14:38:44 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

  None
Description Phillip Short 2007-02-12 19:33:58 EST
Description of problem: The cluster consists of one lock server and two GFS nodes.
One of the nodes and the lock server appeared to have communication issues,
although no problems existed on the network and the other node had no trouble
communicating with the lock server. The lock server attempted to fence the
problem node out of the cluster, but the fence operation failed.
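The expire-then-fence sequence described here (and visible in the logs below) can be
summarised with a short conceptual sketch. This is not lock_gulmd source; the
three-miss threshold and the function names are illustrative assumptions drawn
from the log output quoted further down, and fence_node is the agent the lock
server reports invoking.

  # Conceptual sketch of the GULM expire-then-fence flow described above.
  # Not lock_gulmd source; the three-miss threshold and the names are
  # illustrative assumptions taken from the log output quoted below.
  import subprocess

  MAX_MISSED_HEARTBEATS = 3  # the logs show expiry after mb:3

  def on_missed_heartbeat(node, missed):
      print(f"{node} missed a heartbeat (mb:{missed})")
      if missed >= MAX_MISSED_HEARTBEATS:
          expire_and_fence(node)

  def expire_and_fence(node):
      print(f"Client ({node}) expired")
      # The lock server forks "fence_node <node>"; in this bug the fence
      # attempt failed, leaving the node in need of a manual reset.
      try:
          ok = subprocess.run(["fence_node", node]).returncode == 0
      except FileNotFoundError:
          ok = False  # fence agent not installed on this machine
      if not ok:
          print(f"Fencing of {node} failed")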


Version-Release number of selected component (if applicable): 
GFS-devel-6.0.2.30-0
rh-gfs-en-6.0-4
GFS-modules-smp-6.0.2.27-0.1
GFS-modules-smp-6.0.2.30-0
GFS-modules-smp-6.0.2.27-0
GFS-6.0.2.30-0


How reproducible: Has only occurred once


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info: Output from the server logs.
The lock server reported these errors:
au04qws060apor2 lock_gulmd_core[13528]: au04qdb020apor2 missed a heartbeat
(time:1168621522519197 mb:1)
au04qws060apor2 lock_gulmd_core[13528]: au04qdb020apor2 missed a heartbeat
(time:1168621537557378 mb:2)
au04qws060apor2 lock_gulmd_core[13528]: au04qdb020apor2 missed a heartbeat
(time:1168621552595480 mb:3)
au04qws060apor2 lock_gulmd_core[13528]: Client (au04qdb020apor2) expired
au04qws060apor2 lock_gulmd_core[13528]: Forked [32478] fence_node
au04qdb020apor2 with a 0 pause.
au04qws060apor2 lock_gulmd_core[32478]: Gonna exec fence_node au04qdb020apor2
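The time values in these messages look like microseconds since the Unix epoch (an
assumption based on their magnitude). Decoding them under that assumption shows
the three misses were spaced roughly 15 seconds apart:

  # Decode the heartbeat timestamps quoted above, assuming they are
  # microseconds since the Unix epoch (an assumption based on magnitude).
  from datetime import datetime, timezone

  times_us = [1168621522519197, 1168621537557378, 1168621552595480]

  for t in times_us:
      print(datetime.fromtimestamp(t / 1_000_000, tz=timezone.utc).isoformat())

  # Gaps between consecutive misses, in seconds (roughly 15 s each).
  print([(b - a) / 1_000_000 for a, b in zip(times_us, times_us[1:])])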

The node reported these errors:
 au04qdb020apor2 lock_gulmd_core[9713]: EOF on xdr
(au04qws060apor2.infra.bhdc.mebs.ihost.com:10.193.101.218 idx:1 fd:5)
 au04qdb020apor2 lock_gulmd_core[9713]: In core_io.c:425 (6.0.2.30) death by:
Lost connection to SLM Master (au04qws060apor2.infra.bhdc.mebs.ihost.com),
stopping. node reset required to re-activate cluster operations.
 au04qdb020apor2 lock_gulmd_LTPX[9720]: ERROR [ltpx_io.c:613] XDR error
-32:Broken pipe sending to lt000
 au04qdb020apor2 lock_gulmd_LTPX[9720]: ERROR [ltpx_io.c:1005] Got a -32:Broken
pipe trying to send packet to Master 0 on
0x4746532047076f726170726f6400090700000000001ab72800
 au04qdb020apor2 lock_gulmd_LTPX[9720]: EOF on xdr (_ core _:0.0.0.0 idx:1 fd:5)
 au04qdb020apor2 lock_gulmd_LTPX[9720]: In ltpx_io.c:332 (6.0.2.30) death by:
Lost connection to core, cannot continue. node reset required to re-activate
cluster operations.
 au04qdb020apor2 kernel: lock_gulm: ERROR Got an error in gulm_res_recvd err: -71
 au04qdb020apor2 kernel: lock_gulm: ERROR gulm_LT_recver err -104
 au04qdb020apor2 kernel: lock_gulm: ERROR Got a -111 trying to login to
lock_gulmd.  Is it running?
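The negative numbers in these messages match standard Linux errno values (EPIPE,
EPROTO, ECONNRESET, ECONNREFUSED), all consistent with the node having lost its
connection to the lock_gulmd daemons. A quick way to decode them:

  # Map the negative error codes quoted above onto Linux errno names.
  import errno
  import os

  for code in (-32, -71, -104, -111):
      name = errno.errorcode.get(-code, "unknown")
      print(f"{code}: {name} ({os.strerror(-code)})")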

The change log for the latest version of GFS lists two bugs that have been fixed,
but I cannot access the information for those bugs in Bugzilla.
Comment 1 RHEL Product and Program Management 2007-10-19 14:38:44 EDT
This bug is filed against RHEL 3, which is in its maintenance phase.
During the maintenance phase, only security errata and select mission-critical
bug fixes will be released for enterprise products. Since this bug does not
meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
