Bug 961209 - Glusterd crash when sosreport is run multiple times
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Kaushal
QA Contact: Sachidananda Urs
Depends On:
Blocks:
Reported: 2013-05-09 02:41 EDT by Sachidananda Urs
Modified: 2013-09-23 18:43 EDT
CC: 3 users

See Also:
Fixed In Version: glusterfs-3.4.0.12rhs-beta3-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-23 18:39:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
glusterd log and cli history (42.54 MB, application/x-tar)
2013-05-09 02:55 EDT, Sachidananda Urs

Description Sachidananda Urs 2013-05-09 02:41:51 EDT
Description of problem:


Version-Release number of selected component (if applicable):

glusterfs 3.4.0.4rhs built on May  7 2013 09:35:04


Steps to Reproduce:
1. Run sosreport multiple times in a loop
2. Bring down one of the servers (Pull the plug)
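Step 1 can be scripted. The following is a minimal sketch, not the reporter's actual loop: it runs a command repeatedly and samples glusterd's resident memory after each run, which makes steady growth toward an OOM kill easy to spot. The helpers `run_in_loop` and `glusterd_rss_kib` are hypothetical, `sosreport --batch` (non-interactive mode) is assumed as the command, and the sampler assumes procps `ps` is available on the server. Step 2 (pulling the plug) remains manual.

```python
import subprocess


def glusterd_rss_kib():
    """Hypothetical sampler: return the summed resident set size (KiB) of
    processes named 'glusterd', or None if none are running.
    Assumes procps `ps` is installed."""
    result = subprocess.run(
        ["ps", "-o", "rss=", "-C", "glusterd"],
        capture_output=True,
        text=True,
    )
    values = [int(v) for v in result.stdout.split()]
    return sum(values) if values else None


def run_in_loop(cmd, iterations, sample=glusterd_rss_kib):
    """Hypothetical helper: run `cmd` `iterations` times back to back and
    record one memory sample after each run."""
    samples = []
    for _ in range(iterations):
        # check=False: keep looping even if a single run fails.
        subprocess.run(cmd, check=False)
        samples.append(sample())
    return samples
```

On a test node, `run_in_loop(["sosreport", "--batch"], 10)` would return ten RSS readings, one per sosreport run.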

  
Actual results:


Expected results:


Additional info:

[root@tex ~]# gluster volume info
 
Volume Name: bb
Type: Distributed-Replicate
Volume ID: 7c26fe1e-9b7e-48fa-81ed-6db4bd23b6e8
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tex.lab.eng.blr.redhat.com:/mnt/store/bb
Brick2: wingo.lab.eng.blr.redhat.com:/mnt/store/bb
Brick3: van.lab.eng.blr.redhat.com:/mnt/store/bb
Brick4: mater.lab.eng.blr.redhat.com:/mnt/store/bb

On Brick1 (tex), glusterd was OOM-killed.

Uploading logs from all four nodes.
Comment 2 Sachidananda Urs 2013-05-09 02:55:05 EDT
Created attachment 745555 [details]
glusterd log and cli history
Comment 4 Kaushal 2013-05-22 06:06:11 EDT
There were several issues with glusterd's usage of the syncop framework that were causing problems. These issues could lead to crashes and, possibly, to high memory usage resulting in an OOM kill.

Krishnan has fixed these issues upstream and has backported the fixes downstream.

I'm now trying to see if the problem reported in this bug can still be reproduced with the fixes applied.

Will update the bug once I'm done testing.
Comment 5 Kaushal 2013-07-11 02:40:55 EDT
The following patches have been available since the glusterfs-3.4.0.9 builds and should fix the OOM kills and crashes:
 7503237 syncop: synctask shouldn't yawn, it could miss a 'wake
 2b525e1 syncop: Remove task from syncbarrier's waitq before 'wake
 3496933 syncop: Update synctask state appropriately
Please verify whether this still happens with the latest packages.
Moving to ON_QA for verification.
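The first patch title suggests a lost-wakeup race: if a task clears its "woken" state right before going to sleep, a wake delivered in that window is thrown away and the task may never resume. The sketch below is a generic Python illustration of that hazard and the usual guard, not the glusterfs syncop code (which is written in C); the `Waiter` class and its method names are hypothetical, and reading the patch title this way is an interpretation.

```python
import threading


class Waiter:
    """Hypothetical wake flag shared between a sleeping task and a waker."""

    def __init__(self):
        self._cond = threading.Condition()
        self._woken = False

    def wake(self):
        # Record the wake under the lock so a later waiter can observe it.
        with self._cond:
            self._woken = True
            self._cond.notify()

    def yawn_then_wait(self, timeout):
        # BUGGY: resetting the flag just before sleeping discards a wake
        # that was already delivered, so this call times out (returns False).
        with self._cond:
            self._woken = False
            return self._cond.wait_for(lambda: self._woken, timeout)

    def wait(self, timeout):
        # SAFE: check the flag first and consume it only after seeing it set,
        # so a wake that arrived before the wait is still honored.
        with self._cond:
            woken = self._cond.wait_for(lambda: self._woken, timeout)
            if woken:
                self._woken = False
            return woken
```

Calling `wake()` and then `wait(0.1)` returns `True` immediately, whereas `wake()` followed by `yawn_then_wait(0.1)` returns `False` after the timeout expires.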
Comment 6 Sachidananda Urs 2013-07-17 01:14:27 EDT
Verified on Beta-3; I no longer see the crash. Resolving.
Comment 7 Scott Haines 2013-09-23 18:39:42 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
Comment 8 Scott Haines 2013-09-23 18:43:47 EDT
