961209 – Glusterd crash when sosreport is run multiple times

Summary: Glusterd crash when sosreport is run multiple times

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	2.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Kaushal
QA Contact:	Sachidananda Urs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-05-09 06:41 UTC by Sachidananda Urs
Modified:	2013-09-23 22:43 UTC
CC List:	3 users
Fixed In Version:	glusterfs-3.4.0.12rhs-beta3-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-23 22:39:42 UTC
Embargoed:
Dependent Products:

Attachments
glusterd log and cli history (42.54 MB, application/x-tar) 2013-05-09 06:55 UTC, Sachidananda Urs

Description Sachidananda Urs 2013-05-09 06:41:51 UTC

Description of problem:


Version-Release number of selected component (if applicable):

glusterfs 3.4.0.4rhs built on May  7 2013 09:35:04


Steps to Reproduce:
1. Run sosreport multiple times in a loop
2. Bring down one of the servers (pull the plug)
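
A minimal sketch of step 1, for anyone trying to reproduce this. The helper name run_repeatedly and the run count of 20 are assumptions, since the report does not say how many iterations were used. --batch is sosreport's non-interactive mode. Step 2 (pulling the plug on a server) is not scripted here.

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times in a row.
# Usage: run_repeatedly COUNT COMMAND [ARGS...]
run_repeatedly() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        # Continue with the next run even if this one fails.
        "$@" || echo "run $i exited with status $?" >&2
        i=$((i + 1))
    done
}

# Example invocation (the count of 20 is arbitrary, not from the report):
#   run_repeatedly 20 sosreport --batch
```

While the loop runs, watching glusterd's memory usage on each node should show whether it grows between runs.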

  
Actual results:


Expected results:


Additional info:

[root@tex ~]# gluster volume info
 
Volume Name: bb
Type: Distributed-Replicate
Volume ID: 7c26fe1e-9b7e-48fa-81ed-6db4bd23b6e8
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tex.lab.eng.blr.redhat.com:/mnt/store/bb
Brick2: wingo.lab.eng.blr.redhat.com:/mnt/store/bb
Brick3: van.lab.eng.blr.redhat.com:/mnt/store/bb
Brick4: mater.lab.eng.blr.redhat.com:/mnt/store/bb

On the Brick1 node (tex), glusterd was OOM-killed.

Uploading logs from all four nodes.

Comment 2 Sachidananda Urs 2013-05-09 06:55:05 UTC

Created attachment 745555
glusterd log and cli history

Comment 4 Kaushal 2013-05-22 10:06:11 UTC

There were several issues with glusterd's use of the syncop framework. These issues could lead to crashes, and possibly to high memory usage ending in an OOM kill.

Krishnan has fixed these issues upstream and has backported the fixes downstream.

I'm now checking whether the problem reported in this bug can still be reproduced with the fixes applied.

I will update the bug once testing is done.

Comment 5 Kaushal 2013-07-11 06:40:55 UTC

The following patches have been available since the glusterfs-3.4.0.9 builds and should have fixed the OOM kills and crashes:
 7503237 syncop: synctask shouldn't yawn, it could miss a 'wake
 2b525e1 syncop: Remove task from syncbarrier's waitq before 'wake
 3496933 syncop: Update synctask state appropriately
Please verify whether the problem still occurs with the latest packages.
Moving to ON_QA for verification.

Comment 6 Sachidananda Urs 2013-07-17 05:14:27 UTC

Verified on Beta-3; I no longer see the crash. Resolving.

Comment 7 Scott Haines 2013-09-23 22:39:42 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
