Red Hat Bugzilla – Bug 961209
Glusterd crash when sosreport is run multiple times
Last modified: 2013-09-23 18:43:47 EDT
Description of problem:
Version-Release number of selected component (if applicable):
glusterfs 184.108.40.206rhs built on May 7 2013 09:35:04
Steps to Reproduce:
1. Run sosreport multiple times in a loop
2. Bring down one of the servers (Pull the plug)
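Step 1 above can be sketched as a small shell helper. This is only an assumed reproducer, not the script used by the reporter: the function name run_in_loop, the iteration count, and the use of sosreport's --batch (non-interactive) option are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command
# COUNT times in a row and stop at the first failing iteration.
run_in_loop() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "iteration $i: $*"
        # Run the command exactly as given; abort the loop if it fails.
        "$@" || { echo "command failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $count iterations"
}

# Assumed invocation for this bug (the iteration count is arbitrary):
# run_in_loop 10 sosreport --batch
```

Checking whether glusterd is still running between iterations, for example with pidof glusterd, would show at which iteration the daemon goes away.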
[root@tex ~]# gluster volume info
Volume Name: bb
Volume ID: 7c26fe1e-9b7e-48fa-81ed-6db4bd23b6e8
Number of Bricks: 2 x 2 = 4
On Brick1, glusterd was OOM-killed.
Uploading logs from all four nodes.
Created attachment 745555 [details]
glusterd log and cli history
There were several issues with glusterd's use of the syncop framework. These issues could lead to crashes, and possibly to high memory usage ending in an OOM kill.
Krishnan has fixed the issues upstream and has backported the fixes downstream.
I'm now trying to see if the problem reported in this bug can still be reproduced with the fixes applied.
Will update the bug once I'm done testing.
The following patches
7503237 syncop: synctask shouldn't yawn, it could miss a 'wake
2b525e1 syncop: Remove task from syncbarrier's waitq before 'wake
3496933 syncop: Update synctask state appropriately
have been available since the glusterfs-220.127.116.11 builds and should have fixed the OOM kills and crashes.
Please verify whether this still happens with the latest packages.
Moving to ON_QA for verification.
Verified on Beta-3; I no longer see the crash. Resolving.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.