Description of problem:

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.4rhs built on May 7 2013 09:35:04

Steps to Reproduce:
1. Run sosreport multiple times in a loop
2. Bring down one of the servers (Pull the plug)

Actual results:

Expected results:

Additional info:

[root@tex ~]# gluster volume info

Volume Name: bb
Type: Distributed-Replicate
Volume ID: 7c26fe1e-9b7e-48fa-81ed-6db4bd23b6e8
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tex.lab.eng.blr.redhat.com:/mnt/store/bb
Brick2: wingo.lab.eng.blr.redhat.com:/mnt/store/bb
Brick3: van.lab.eng.blr.redhat.com:/mnt/store/bb
Brick4: mater.lab.eng.blr.redhat.com:/mnt/store/bb

On Brick1, glusterd was oom killed. Uploading logs from all four nodes.
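For anyone retrying step 1, a minimal sketch of the loop follows. The helper name run_repeatedly, the iteration count, and the use of --batch (to keep sosreport non-interactive) are assumptions for illustration; the report does not say how the loop was originally run. Step 2 (pulling the plug on a peer) still has to be done by hand while the loop is running.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command N times
# in a row, continuing past failures, and print how many runs failed.
run_repeatedly() {
    count=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Discard the command's output; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Assumed invocation; the iteration count of 20 is arbitrary:
#   run_repeatedly 20 sosreport --batch
# While this runs, power off one of the peer servers, then check whether
# glusterd on the remaining nodes is still alive.
```

The helper only counts failures; whether glusterd was OOM-killed still has to be confirmed from the kernel and glusterd logs.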
Created attachment 745555
glusterd log and cli history
There were several issues with glusterd's use of the syncop framework. These issues could lead to crashes, and possibly to high memory usage ending in an OOM kill. Krishnan has fixed them upstream and backported the fixes downstream. I'm now checking whether the problem reported in this bug can still be reproduced with the fixes applied. Will update the bug once I'm done testing.
The following patches,

7503237 syncop: synctask shouldn't yawn, it could miss a 'wake
2b525e1 syncop: Remove task from syncbarrier's waitq before 'wake
3496933 syncop: Update synctask state appropriately

have been available since glusterfs-3.4.0.9 builds, which should have fixed the issues of OOM-kills and crashes. Please verify if this still happens with the latest packages. Moving to ON_QA for verification.
Verified on Beta-3; I no longer see the crash. Resolving.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html