Bug 624606

Summary: kernel: fs.sh[20430]: segfault at 0000000000000008 rip 0000000000432098 rsp 0000007fbfffda70 error 4
Product: [Retired] Red Hat Cluster Suite
Reporter: Gupteswar <guptmishra>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED WORKSFORME
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: low
Version: 4
CC: cluster-maint, edamato, jwest
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-10-18 21:04:43 UTC
Attachments:
Description                            Flags
Cluster configuration file fs.sh       none
Local kernel and system message        none
cluster.conf file                      none
logs before segfault occur at system   none
OS logs                                none

Description Gupteswar 2010-08-17 08:29:32 UTC
Description of problem: We got a segfault error in the cluster configuration file fs.sh. After that, the DB listener service was disconnected from the SAP application. As a result, we were disconnected from the DB and the CI, and the cluster immediately restarted the DB service.


Version-Release number of selected component (if applicable):

Linux led 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

DB machine: version 10.2.0.2

CI machine: SAP ECC 6.0 - 2005

How reproducible: The automated restart of the CI and DB by the cluster resolves the issue, but there is data loss at both the SAP and the DB level.


Steps to Reproduce:
1. The OS cluster automatically restarted the DB services.

Actual results:
Running work processes in SAP and the DB are aborted, resulting in data loss.


Expected results: A database crash may leave the DB in an inconsistent state.


Additional info: We have two machines, one for the DB and the other for the Central Instance, with OS clustering between them.

SAP version: ECC 6.0; DB: 10.2.0.4; OS: RHEL 4.

Comment 1 Gupteswar 2010-08-17 08:30:31 UTC
Created attachment 439069 [details]
Cluster configuration file fs.sh

Comment 2 Gupteswar 2010-08-17 08:35:00 UTC
Created attachment 439070 [details]
Local kernel and system message

Comment 3 Perry Myers 2010-08-17 12:36:00 UTC
*** Bug 624509 has been marked as a duplicate of this bug. ***

Comment 4 Perry Myers 2010-08-17 12:36:56 UTC
Hi Gupteswar, just repeating the support message over here, thanks.

Thank you for taking the time to enter a bug report with us. We do appreciate
the feedback and look to use reports such as this to guide our efforts at
improving our products. That being said, this bug tracking system is not a
mechanism for getting support, and as such we are not able to make any
guarantees as to the timeliness or suitability of a resolution. If this issue
is critical or in any way time sensitive, please raise a ticket through your
regular Red Hat support channels to make certain that it gets the proper
attention and prioritization to assure a timely resolution.  

For information on how to contact the Red Hat production support team, please
see:
https://www.redhat.com/support/process/production/#howto

Comment 5 Gupteswar 2010-08-17 13:13:48 UTC
No, we can wait; it is not an issue.

Comment 6 Lon Hohberger 2010-08-23 15:00:06 UTC
I need:

- cluster.conf
- rgmanager version (rpm -q rgmanager)
- logs -before- the segfault
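
A minimal sketch of how the three items above might be gathered on a cluster node. It assumes the default locations /etc/cluster/cluster.conf and /var/log/messages (both are assumptions; neither path is stated in this report). The helper name logs_before_segfault is hypothetical; it prints a log file up to and including the first kernel segfault line.

```shell
# Hypothetical helper: print every line of a log file up to and including
# the first line that contains a kernel segfault report ("segfault at").
logs_before_segfault() {
    awk '{ print } /segfault at/ { exit }' "$1"
}

# Possible usage on the affected node (paths are assumptions, see above):
#   rpm -q rgmanager
#   cp /etc/cluster/cluster.conf /tmp/cluster.conf.copy
#   logs_before_segfault /var/log/messages > /tmp/messages-before-segfault.txt
```

If the log has been rotated since the crash, the same helper can be pointed at the rotated file instead.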

Comment 7 Gupteswar 2010-08-25 04:43:57 UTC
Created attachment 440831 [details]
cluster.conf file

Comment 8 Gupteswar 2010-08-25 04:44:58 UTC
Created attachment 440832 [details]
logs before segfault occur at system

Comment 9 Gupteswar 2010-08-25 04:49:18 UTC
Hi

Here is the requested information:

* cluster.conf: attached.
* System messages: attached.

rgmanager version: rgmanager-1.9.54-1 (x86_64)


Regards
Gupteswar Prasad Mishra
9910511955

Comment 10 Gupteswar 2010-09-04 17:12:18 UTC
Waiting for your earliest reply.

Comment 13 Lon Hohberger 2010-09-08 14:46:38 UTC
Ok ... so, the cluster.conf is okay; you might want to add nodeid attributes to the clusternode tags.

The <multicast> children of the <clusternode> tags do nothing on RHEL5 and should be removed (the only one that matters is the one that is a child of the <cman> tag).

Apart from that, the config file passes all my sanity checks.

fs.sh is a shell script -- I'm not sure how it can segfault randomly unless there is an issue in bash itself or the system has run out of memory.  Unfortunately, the logs didn't show what I was looking for - that is, which fs.sh instance segfaulted, specifically.

The thing is that after the service recovery, everything seems to work fine again, so it's probably not specific to one or two of your <fs> lines in cluster.conf.  This really makes me think there was a low-memory situation which caused this.  Has this happened again since the initial problem?
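
Both open questions (which fs.sh process crashed, and whether memory ran low) could be looked at in the system log. The following is only a sketch: the segfault line format is the one shown in this bug's summary, while the out-of-memory patterns ("oom-killer", "Out of Memory") and the /var/log/messages path are assumptions about what the kernel and syslog write on this system. Both function names are hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and print "<process> <pid>"
# for each kernel segfault report, e.g. "fs.sh 20430".
segfault_proc() {
    sed -n 's/.*kernel: \([^[]*\)\[\([0-9]*\)]: segfault at.*/\1 \2/p'
}

# Hypothetical helper: print log lines that look like out-of-memory events.
# The two patterns are assumptions and may not match every kernel version.
oom_lines() {
    grep -i -E 'oom-killer|out of memory' "$1"
}

# Possible usage (log path is an assumption):
#   segfault_proc < /var/log/messages
#   oom_lines /var/log/messages
```

The PID only identifies the crashed process; relating it to a particular <fs> resource would still need the rgmanager log entries from around the same time.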

What version of bash do you have installed?

You should be running at least bash-3.2-21 - there was an issue with recycled PIDs in prior releases.

Comment 14 Gupteswar 2010-09-09 06:35:45 UTC
Dear Lon Hohberger,

Our current bash version is bash-3.0-19.2.


Regards
Gupteswar Prasad Mishra
9910511955

Comment 15 Gupteswar 2010-09-16 09:31:31 UTC
Hi

Our instance went down again. I am providing the OS logs for it, covering 2 hours before and 2 hours after the outage.

Regards
Gupteswar 
9910511955

Comment 16 Gupteswar 2010-09-16 09:33:42 UTC
Created attachment 447693 [details]
OS logs

Comment 17 Gupteswar 2010-09-20 13:23:26 UTC
Hi

Kindly help on this issue.

Regards
Gupteswar Prasad Mishra
9910511955

Comment 18 Lon Hohberger 2010-09-20 19:20:59 UTC
Since bash 3.0-19.2, there have been several updates to the bash package, some of which resolve segmentation faults.

Here is the most recent erratum for bash for Red Hat Enterprise Linux 4:

https://rhn.redhat.com/errata/RHBA-2008-0938.html

You really need to run 'up2date bash' before we can continue.  fs.sh is a script handled by bash.  The script does not crash; bash crashes.

So, we need to eliminate the known-fixed problems before we can isolate any new ones.
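
The point that bash, not the script, is what crashes can be checked from the script's first line. A minimal sketch, assuming the resource agent is installed as /usr/share/cluster/fs.sh (an assumption; the path is not given in this report). The helper name script_interpreter is hypothetical.

```shell
# Hypothetical helper: print the interpreter named on a script's "#!" line,
# without any interpreter arguments. Prints nothing if there is no "#!" line.
script_interpreter() {
    head -n 1 "$1" | sed -n 's/^#! *\([^ ]*\).*/\1/p'
}

# Possible usage (script path is an assumption):
#   interp=$(script_interpreter /usr/share/cluster/fs.sh)
#   rpm -qf "$interp"   # package that owns the interpreter
#   rpm -q bash         # installed version before the update
#   up2date bash        # update as requested above
#   rpm -q bash         # installed version after the update
```

Recording the output of rpm -q bash before and after the update makes it clear which package version any later segfault was observed with.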

Comment 19 Lon Hohberger 2010-09-20 19:21:47 UTC
Oops --

My apologies, it's this one:

https://rhn.redhat.com/errata/RHBA-2009-1599.html

Comment 20 Perry Myers 2010-09-21 20:05:15 UTC
(In reply to comment #17)
> Hi
> 
> Kindly help on this issue.

Please file a ticket with support as indicated in comment #4, otherwise we can't properly track your issue.

Thanks.

Comment 21 Lon Hohberger 2010-10-18 21:04:43 UTC
Closing; I haven't been able to reproduce this issue based on the current version of bash & rgmanager.