Bug 961209 - Glusterd crash when sosreport is run multiple times
Summary: Glusterd crash when sosreport is run multiple times
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kaushal
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-05-09 06:41 UTC by Sachidananda Urs
Modified: 2013-09-23 22:43 UTC (History)
3 users

Fixed In Version: glusterfs-3.4.0.12rhs-beta3-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:39:42 UTC
Embargoed:


Attachments (Terms of Use)
glusterd log and cli history (42.54 MB, application/x-tar)
2013-05-09 06:55 UTC, Sachidananda Urs
no flags Details

Description Sachidananda Urs 2013-05-09 06:41:51 UTC
Description of problem:

glusterd crashes (is OOM-killed) when sosreport is run repeatedly on the cluster nodes.

Version-Release number of selected component (if applicable):

glusterfs 3.4.0.4rhs built on May  7 2013 09:35:04


Steps to Reproduce:
1. Run sosreport multiple times in a loop
2. Bring down one of the servers (Pull the plug)

  
Actual results:

glusterd on Brick1 was OOM-killed.

Expected results:

glusterd remains running; repeated sosreport runs should not cause crashes or excessive memory usage.

Additional info:

[root@tex ~]# gluster volume info
 
Volume Name: bb
Type: Distributed-Replicate
Volume ID: 7c26fe1e-9b7e-48fa-81ed-6db4bd23b6e8
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tex.lab.eng.blr.redhat.com:/mnt/store/bb
Brick2: wingo.lab.eng.blr.redhat.com:/mnt/store/bb
Brick3: van.lab.eng.blr.redhat.com:/mnt/store/bb
Brick4: mater.lab.eng.blr.redhat.com:/mnt/store/bb

On Brick1, glusterd was oom killed.

Uploading logs from all four nodes.

Comment 2 Sachidananda Urs 2013-05-09 06:55:05 UTC
Created attachment 745555 [details]
glusterd log and cli history

Comment 4 Kaushal 2013-05-22 10:06:11 UTC
There were several issues with glusterd's usage of the syncop framework. These could lead to crashes and to high memory usage, which in turn could trigger the OOM killer.

The issues have been fixed upstream by Krishnan and back-ported to downstream by him.

I'm now trying to see if the problem reported in this bug can still be reproduced with the fixes applied.

Will update the bug once I'm done testing.

Comment 5 Kaushal 2013-07-11 06:40:55 UTC
The following patches,
 7503237 syncop: synctask shouldn't yawn, it could miss a 'wake'
 2b525e1 syncop: Remove task from syncbarrier's waitq before 'wake'
 3496933 syncop: Update synctask state appropriately
have been available since the glusterfs-3.4.0.9 builds and should fix the OOM kills and crashes.
Please verify if this still happens with the latest packages.
Moving to ON_QA for verification.

Comment 6 Sachidananda Urs 2013-07-17 05:14:27 UTC
Verified on Beta-3, I do not see the crash anymore. Resolving.

Comment 7 Scott Haines 2013-09-23 22:39:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

