Bug 624606 - kernel: fs.sh[20430]: segfault at 0000000000000008 rip 0000000000432098 rsp 0000007fbfffda70 error 4
Status: CLOSED WORKSFORME
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware: All Linux
Priority: low
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 624509
Reported: 2010-08-17 04:29 EDT by Gupteswar
Modified: 2016-04-26 15:56 EDT
Doc Type: Bug Fix
Last Closed: 2010-10-18 17:04:43 EDT

Attachments
Cluster configuration file fs.sh (27.14 KB, text/plain)
2010-08-17 04:30 EDT, Gupteswar
Local kernel and system message (11.52 KB, text/plain)
2010-08-17 04:35 EDT, Gupteswar
cluster.conf file (8.89 KB, text/plain)
2010-08-25 00:43 EDT, Gupteswar
logs before segfault occur at system (21.26 KB, text/plain)
2010-08-25 00:44 EDT, Gupteswar
OS logs (3.03 KB, application/x-zip-compressed)
2010-09-16 05:33 EDT, Gupteswar

Description Gupteswar 2010-08-17 04:29:32 EDT
Description of problem: We got a segfault error in the cluster configuration file fs.sh. After that, the DB listener services were disconnected from the SAP application. As a result, we were disconnected from the DB and the CI, which led the cluster to restart the DB service immediately.


Version-Release number of selected component (if applicable):

Linux led 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

DB machine: version 10.2.0.2

CI machine: SAP ECC 6.0 - 2005.

How reproducible: The automated restart of the CI and DB by the cluster resolves the issue, but there is data loss at both the SAP level and the DB level.


Steps to Reproduce:
1. The OS cluster automatically restarted the DB services.

  
Actual results:
Work processes running through SAP and the DB are aborted, resulting in data loss.


Expected results: A database crash may leave the DB in an inconsistent state.


Additional info: We have two machines, one for the DB and the other for the Central Instance, with OS clustering between them.

SAP version: ECC 6.0; DB: 10.2.0.4; OS: RHEL 4.
Comment 1 Gupteswar 2010-08-17 04:30:31 EDT
Created attachment 439069
Cluster configuration file fs.sh
Comment 2 Gupteswar 2010-08-17 04:35:00 EDT
Created attachment 439070
Local kernel and system message
Comment 3 Perry Myers 2010-08-17 08:36:00 EDT
*** Bug 624509 has been marked as a duplicate of this bug. ***
Comment 4 Perry Myers 2010-08-17 08:36:56 EDT
Hi Gupteswar, just repeating the support message over here, thanks.

Thank you for taking the time to enter a bug report with us. We do appreciate
the feedback and look to use reports such as this to guide our efforts at
improving our products. That being said, this bug tracking system is not a
mechanism for getting support, and as such we are not able to make any
guarantees as to the timeliness or suitability of a resolution. If this issue
is critical or in any way time sensitive, please raise a ticket through your
regular Red Hat support channels to make certain that it gets the proper
attention and prioritization to assure a timely resolution.  

For information on how to contact the Red Hat production support team, please
see:
https://www.redhat.com/support/process/production/#howto
Comment 5 Gupteswar 2010-08-17 09:13:48 EDT
No, we can wait; it is not an issue.
Comment 6 Lon Hohberger 2010-08-23 11:00:06 EDT
I need:

- cluster.conf
- rgmanager version (rpm -q rgmanager)
- logs -before- the segfault
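
The request for logs from before the segfault can be narrowed down mechanically. As a minimal sketch, assuming the syslog output is already available as a list of lines, the hypothetical helper `lines_before_segfault` below returns the lines that precede the first kernel segfault message. The placeholder lines in the sample are not real log output; only the final line is taken from this bug's title.

```python
def lines_before_segfault(log_lines, context=50):
    """Return up to `context` lines that precede the first segfault message."""
    for index, line in enumerate(log_lines):
        if "segfault at" in line:
            start = max(0, index - context)
            return log_lines[start:index]
    return []  # no segfault message found


# Placeholder lines stand in for real syslog content (assumption);
# the last entry is the message quoted in the bug title.
sample = [
    "placeholder log line 1",
    "placeholder log line 2",
    "kernel: fs.sh[20430]: segfault at 0000000000000008 "
    "rip 0000000000432098 rsp 0000007fbfffda70 error 4",
]

for line in lines_before_segfault(sample, context=2):
    print(line)
```

A larger `context` value gives more history at the cost of more noise; the default of 50 lines is an arbitrary choice.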
Comment 7 Gupteswar 2010-08-25 00:43:57 EDT
Created attachment 440831
cluster.conf file
Comment 8 Gupteswar 2010-08-25 00:44:58 EDT
Created attachment 440832
logs before segfault occur at system
Comment 9 Gupteswar 2010-08-25 00:49:18 EDT
Hi

Here it is:

* cluster.conf attached.
* System messages attached.

rgmanager version: rgmanager-1.9.54-1 (x86_64)


Regards
Gupteswar Prasad Mishra
9910511955
Comment 10 Gupteswar 2010-09-04 13:12:18 EDT
Waiting for your earliest reply.
Comment 13 Lon Hohberger 2010-09-08 10:46:38 EDT
Ok ... so, the cluster.conf is okay; you might want to add nodeid attributes to the clusternode tags.

The <multicast> children of the <clusternode> tags do nothing on RHEL5 and should be removed (the only one that matters is the one that is a child of the <cman> tag).

Apart from that, the config file passes all my sanity checks.
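
The two cluster.conf remarks above can be checked with a short script. This is only a sketch: the sample configuration is hypothetical and heavily trimmed (node names, addresses, and attribute values are made up), and `review_clusternodes` is an illustrative helper, not part of rgmanager.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed cluster.conf; all names and addresses are made up.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node-a" votes="1">
      <multicast addr="239.0.0.1" interface="eth0"/>
    </clusternode>
    <clusternode name="node-b" votes="1" nodeid="2"/>
  </clusternodes>
  <cman>
    <multicast addr="239.0.0.1"/>
  </cman>
</cluster>
"""


def review_clusternodes(conf_text):
    """List clusternode tags without nodeid or with a <multicast> child."""
    root = ET.fromstring(conf_text)
    findings = []
    for node in root.iter("clusternode"):
        name = node.get("name", "<unnamed>")
        if node.get("nodeid") is None:
            findings.append(f"{name}: no nodeid attribute")
        # Only direct children are checked; <cman>/<multicast> is left alone.
        if node.find("multicast") is not None:
            findings.append(f"{name}: has a <multicast> child")
    return findings


for finding in review_clusternodes(SAMPLE_CONF):
    print(finding)
```

The `<multicast>` element under `<cman>` is deliberately not reported, since that is the one described above as the only one that matters.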

fs.sh is a shell script -- I'm not sure how it can segfault randomly unless there is an issue in bash itself or the system has run out of memory.  Unfortunately, the logs didn't show what I was looking for - that is, which fs.sh instance segfaulted, specifically.
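
The kernel message in the bug title does carry a little information of its own. The sketch below uses a hypothetical helper, not an rgmanager tool, to pull out the PID in brackets and decode the x86 page-fault error code, assuming the kernel prints that code in hexadecimal. For this report it indicates a user-mode read of an unmapped page at address 0x8, which is consistent with a near-NULL pointer dereference in the interpreter. The PID alone does not say which <fs> resource was involved.

```python
import re

SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse a 2.6-era x86_64 segfault message into its fields."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    err = int(match.group("err"), 16)  # assumed to be hexadecimal
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "address": int(match.group("addr"), 16),
        "page_present": bool(err & 1),  # bit 0: protection fault vs. unmapped
        "write": bool(err & 2),         # bit 1: write vs. read access
        "user_mode": bool(err & 4),     # bit 2: user vs. kernel mode
    }


# The message quoted in the bug title.
LINE = ("kernel: fs.sh[20430]: segfault at 0000000000000008 "
        "rip 0000000000432098 rsp 0000007fbfffda70 error 4")

info = decode_segfault(LINE)
print(info)
```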

The thing is that after the service recovery, everything seems to work fine again - so it's probably not specific to one or two of your <fs> lines in cluster.conf.  This really makes me think there was a low memory situation which caused this.  Has this happened again since the initial problem?

What version of bash do you have installed?

You should be running at least bash-3.2-21 - there was an issue with recycled PIDs in prior releases.
Comment 14 Gupteswar 2010-09-09 02:35:45 EDT
Dear Lon Hohberger,

The bash version currently installed is bash-3.0-19.2.


Regards
Gupteswar Prasad Mishra
9910511955
Comment 15 Gupteswar 2010-09-16 05:31:31 EDT
Hi

Our instance went down again. I am providing the OS logs for it, covering 2 hours before and 2 hours after the outage.

Regards
Gupteswar 
9910511955
Comment 16 Gupteswar 2010-09-16 05:33:42 EDT
Created attachment 447693
OS logs
Comment 17 Gupteswar 2010-09-20 09:23:26 EDT
Hi

Kindly help on this issue.

Regards
Gupteswar Prasad Mishra
9910511955
Comment 18 Lon Hohberger 2010-09-20 15:20:59 EDT
Since bash 3.0-19.2, there have been several updates to the bash package, some of which resolve segmentation faults.

Here is the most recent erratum for bash for Red Hat Enterprise Linux 4:

https://rhn.redhat.com/errata/RHBA-2008-0938.html

You really need to run 'up2date bash' before we can continue.  fs.sh is a script handled by bash.  The script does not crash; bash crashes.

So, we need to eliminate the known-fixed problems before we can isolate any new ones.
Comment 19 Lon Hohberger 2010-09-20 15:21:47 EDT
Oops --

My apologies, it's this one:

https://rhn.redhat.com/errata/RHBA-2009-1599.html
Comment 20 Perry Myers 2010-09-21 16:05:15 EDT
(In reply to comment #17)
> Hi
> 
> Kindly help on this issue.

Please file a ticket with support as indicated in comment #4, otherwise we can't properly track your issue.

Thanks.
Comment 21 Lon Hohberger 2010-10-18 17:04:43 EDT
Closing; I haven't been able to reproduce this issue based on the current version of bash & rgmanager.