Bug 63176 - clustat and cluadmin segfault when cluster stopped.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: clumanager
Version: 2.1
Hardware: i686 Linux
Priority: medium
Severity: low
Assigned To: Lon Hohberger
Reported: 2002-04-10 16:44 EDT by John Flanagan
Modified: 2008-05-01 11:38 EDT (History)
Doc Type: Bug Fix
Last Closed: 2002-05-01 14:27:57 EDT


Description John Flanagan 2002-04-10 16:44:40 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
When the cluster is stopped, clustat and cluadmin both segfault and errors
appear in /var/log/messages.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. service cluster stop
2. clustat or cluadmin

Actual Results:  A 3-second delay, and then:

Segmentation fault

Expected Results:  I would have expected a message indicating that the cluster
was not operational or running or somesuch.

Additional info:

Here are the errors that appeared in /var/log/messages:

Apr 10 16:30:58 clue clustat[11103]: <err> Unable to open /dev/raw/raw1 read/write.
Apr 10 16:30:58 clue clustat[11103]: <crit> initSharedFD: unable to validate partition /dev/raw/raw1. Configuration error?
Apr 10 16:30:58 clue clustat[11103]: <err> Unable to open /dev/raw/raw1 read/write.
Apr 10 16:30:58 clue clustat[11103]: <crit> initSharedFD: unable to validate partition /dev/raw/raw1. Configuration error?
Apr 10 16:30:58 clue clustat[11103]: <err> Unable to open /dev/raw/raw1 read/write.
Apr 10 16:30:58 clue clustat[11103]: <crit> initSharedFD: unable to validate partition /dev/raw/raw1. Configuration error?
Apr 10 16:30:58 clue clustat[11103]: <err> readNetBlock: bad ret -1 from diskRawReadShadow
Apr 10 16:30:58 clue clustat[11103]: <err> getNetBlockData: IO error reading quorum partition.
Apr 10 16:30:58 clue clustat[11103]: <err> msg_svc_init: Unable to read session_id.
Apr 10 16:30:58 clue clustat[11103]: <err> msg_open: unable to initialize msg subsystem.
Apr 10 16:30:58 clue clustat[11103]: <crit> _clu_write_lock: bad return from lockWrite, ret = -1
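
For illustration only: the log starts with a single root failure (the raw device cannot be opened read/write) that then cascades up through the quorum, messaging, and lock layers. The sketch below shows one way a status tool could check that device up front and report a readable message. It is not the clumanager code; the function name sketch_check_shared_device and the message wording are assumptions, and the only real calls used are POSIX open(), close(), and strerror().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of clumanager): verify that the shared
 * device at `path` can be opened read/write before any cluster state is
 * read from it. Returns 0 on success, -1 on failure after writing a
 * human-readable message to `err`.
 */
int sketch_check_shared_device(const char *path, FILE *err)
{
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        /* Report the failure instead of continuing with an unusable device. */
        fprintf(err, "Unable to open %s read/write: %s\n",
                path, strerror(errno));
        return -1;
    }

    close(fd);
    return 0;
}
```

A caller would run this check first and exit with a non-zero status when it returns -1, rather than proceeding to the lock and messaging layers.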
Comment 1 Mike McLean 2002-04-11 09:27:19 EDT
What version of clumanager is this?  I'm not seeing this behavior with
clumanager-1.0.9-1
Comment 2 John Flanagan 2002-04-11 10:15:50 EDT
rpm -q clumanager
clumanager-1.0.9-1

I have crucial new information about this incident.  After speaking with the
user of this cluster, it turns out that they had unloaded the QLogic driver
module from the kernel [rmmod qla2x00], which removed access to the shared
storage [QUORUM DEVICE!!!].

So this is certainly a very rare occurrence.  However, we should probably
print an additional error message rather than segfaulting.  That may not be
reasonable given the unusual circumstances.

So, to reproduce this:

service cluster stop
rmmod qla2x00 [or whatever shared storage driver module you have]
clustat

I'm bumping the severity down to low given the fringe case of this bug.

John
Comment 3 Tim Burke 2002-04-23 08:24:17 EDT
This is a Winchell-ism in the clulib error path (or lack thereof).
Comment 4 Lon Hohberger 2002-05-01 14:27:52 EDT
Implemented error paths for the lock calls back up the call chain.  They now
produce errors, but don't segfault.  This is probably closer to the expected
behavior.
Comment 5 Lon Hohberger 2002-05-07 16:41:22 EDT
The same thing happens on machines which are not cluster members at all (and
hence have no shared storage).  The cause was that clu_lock() could never
return an error condition; on failure it would effectively raise(SIGSEGV)
instead.

Fixed in current pool.
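
For illustration only: the pattern described in comments 4 and 5 (a lock call that returns an error the caller can check, instead of crashing) can be sketched as below. This is a minimal sketch under stated assumptions, not the actual clulib fix. The names sketch_shared_state, sketch_lock, sketch_unlock, and sketch_report_status are hypothetical, and a NULL state pointer stands in for "shared storage could not be initialized".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the shared cluster state; not a clulib type. */
struct sketch_shared_state {
    int lock_held;
};

/*
 * Hypothetical lock call. Returns 0 on success, or -1 with errno set when
 * the shared state is unavailable (NULL) or the lock is already held.
 * The point is that failure is reported to the caller, never fatal here.
 */
int sketch_lock(struct sketch_shared_state *st)
{
    if (st == NULL) {
        errno = ENODEV;   /* assumption: models missing shared storage */
        return -1;
    }
    if (st->lock_held) {
        errno = EBUSY;
        return -1;
    }
    st->lock_held = 1;
    return 0;
}

/* Release the lock taken by sketch_lock(); safe to call with NULL. */
void sketch_unlock(struct sketch_shared_state *st)
{
    if (st != NULL)
        st->lock_held = 0;
}

/*
 * Hypothetical caller in the style of a status tool: checks the lock
 * result and prints a message instead of using state it could not lock.
 * Returns 0 on success and 1 on failure.
 */
int sketch_report_status(struct sketch_shared_state *st, FILE *out)
{
    if (sketch_lock(st) != 0) {
        fprintf(out, "Unable to lock cluster state: %s\n", strerror(errno));
        return 1;
    }
    fprintf(out, "Cluster state locked and read.\n");
    sketch_unlock(st);
    return 0;
}
```

With this shape, a machine with no shared storage gets an error message and a non-zero return from sketch_report_status() rather than a crash.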
