Red Hat Bugzilla – Bug 851175
Avoid volume status for getting statedumps in sosreport
Last modified: 2013-09-23 18:33:09 EDT
Description of problem:
Instead of using the gluster CLI statedump command to take the statedump, send the USR1 signal to the glusterfs processes (e.g. killall -USR1 glusterfs glusterfsd).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Also avoid using gluster volume status commands while gathering the sosreports, and include the generated statedump files in the sosreport.
The problem is that when sosreport is executed across the cluster, the volume status and statedump commands run in parallel. Since the first command entered makes glusterd lock the entire cluster, the commands executed on the other glusterd instances fail, and the information is not gathered. To avoid this parallel command execution, information collection via the gluster CLI (and thus via glusterd) should be avoided: the gluster volume status commands should not be part of the sosreport. Instead, the USR1 signal should be sent to all the glusterfs processes (killall -USR1 glusterfs glusterfsd), which collects the statedumps without locking the cluster.
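The signal-based approach described above can be sketched as a small Python helper (hypothetical names, not the actual sos plugin code) that discovers the gluster processes via /proc and signals them directly, so no glusterd cluster lock is ever taken:

```python
import os
import signal

def find_pids(comm):
    """Return the PIDs of all processes whose /proc/<pid>/comm matches comm."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/comm" % entry) as f:
                if f.read().strip() == comm:
                    pids.append(int(entry))
        except IOError:
            # Process exited while we were scanning; skip it.
            continue
    return pids

def trigger_statedumps():
    """Send SIGUSR1 to every glusterfs/glusterfsd process, mimicking
    `killall -USR1 glusterfs glusterfsd` without involving glusterd."""
    for name in ("glusterfs", "glusterfsd"):
        for pid in find_pids(name):
            os.kill(pid, signal.SIGUSR1)
```

Compared with shelling out to killall, enumerating /proc gives the caller the exact PID list, which the discussion below notes is needed anyway to construct the statedump file names for collection and cleanup.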
diff -pruN old_gluster.py new_gluster.py
--- old_gluster.py 2012-08-31 05:11:37.633936947 -0400
+++ new_gluster.py 2012-08-31 05:23:49.898666408 -0400
@@ -12,6 +12,7 @@
## along with this program; if not, write to the Free Software
## Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@@ -71,16 +72,10 @@ class gluster(sos.plugintools.PluginBase
volume_file = self.collectOutputNow("/usr/sbin/gluster volume info",
- for volname in self.get_volume_names(volume_file):
- self.collectExtOutput("gluster volume statedump %s" % volname)
- self.collectExtOutput("gluster volume statedump %s nfs" % volname)
- self.collectExtOutput("gluster volume status %s detail" % volname)
- self.collectExtOutput("gluster volume status %s clients" % volname)
- self.collectExtOutput("gluster volume status %s mem" % volname)
- self.collectExtOutput("gluster volume status %s callpool" % volname)
- self.collectExtOutput("gluster volume status %s inode" % volname)
- self.collectExtOutput("gluster volume status %s fd" % volname)
+ self.collectExtOutput("killall -USR1 glusterfs glusterfsd")
+ # arbitrary timeout for statedump to complete
self.collectExtOutput("gluster volume status")
# collect this last as some of the other actions create log entries
The above diff works, and the statedump files are now copied into the generated sosreport. Please let us know your comments on it.
Thanks for the patch proposal - removing the gluster commands is fine and a trivial change that we can make.
Some comments on the statedump changes:
Grabbing files from /tmp with globs is something we try very hard to avoid, since we have no simple way to clean up afterwards and may pull in extremely large files that we do not want to collect.
I.e. every run of sosreport will dump some new files in /tmp, and if, for example, there was a huge file at /tmp/database.dump, we would drag that in, potentially greatly inflating the report size (something that generates customer complaints).
From the sos side it would be preferable if there was a way for a gluster command to generate this content on stdout. Alternatively, we would need to add code to determine all the PIDs of the gluster processes, send the signals directly to those processes, and then construct the exact file names to be collected and later removed.
Adding a sleep(10) is also a hack that we cannot really accept: since we have around 100 modules now, allowing this behaviour could leave the script with very unpredictable run times. Again, this has been a major source of customer and GSS annoyance in the past and was a major element of the upstream work done for the 2.0 release for RHEL6. A better option would be to poll or monitor for the file creation; at least this gives us the minimum delay rather than forcing a 10s wait each time.
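The polling approach suggested here could look like the following sketch. The function names are hypothetical; the DUMP_END_TIME completion marker is the one the later committed patch checks for (gluster writes it as the final line of a finished statedump):

```python
import os
import time

def _dump_complete(path):
    """A statedump is complete once its last line carries DUMP_END_TIME."""
    with open(path) as f:
        lines = f.read().splitlines()
    return bool(lines) and "DUMP_END_TIME" in lines[-1]

def wait_for_statedumps(dump_dir, timeout=10.0, interval=0.5):
    """Poll dump_dir until every statedump file present is complete,
    or the timeout expires.  Returns the list of dump file paths,
    or an empty list on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        paths = [os.path.join(dump_dir, f) for f in os.listdir(dump_dir)]
        if paths and all(_dump_complete(p) for p in paths):
            return paths
        time.sleep(interval)
    return []  # timed out; caller decides whether to collect partial dumps
```

This keeps the timeout as an upper bound only: on a healthy system the wait ends as soon as the dumps finish, rather than always costing the full 10 seconds.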
Btw, is there a reason that this bug is filed against the gluster component rather than sos? Sachidananda already filed a bug against sos for this where we'd been discussing possible solutions:
If we had a means to reset the altered statedump dir parameter then we could use that. Another alternative would be to permit a per-statedump override of this path.
Thanks for the feedback. The complexity is understood. Changes are being made to dump the glusterfs statedumps into a separate directory instead of /tmp. The patch has to be accepted in the glusterfs repository; once that is done, I will send the changes as per your feedback.
Thanks - that would be great and a big help to us. I'll talk to Niels de Vos about trying to get access to a testing system with these changes. This would allow us to have a couple of versions of the support staged and then select one depending on what we have available as the release deadline draws near.
Bryn, wanted to check if the changes proposed here by Raghavendra Bhat can go in RHS 2.0 Update2 release?
There's a bug in the postproc method of the patch committed to branch rhs-2.0-rhel-6 in e73a1e1. The current code unconditionally tries to remove files that do not exist on non-gluster hosts:
71 def postproc(self):
72 for dirs in os.listdir(self.statedump_dir):
73 os.remove(self.statedump_dir + '/' + dirs);
This triggers an exception when run on RHEL or other products: in normal mode it is silently logged, but with --debug it causes a trap into the Python debugger:
Running plugins. Please wait ...
Completed [54/54] ...
Traceback (most recent call last):
File "/usr/sbin/sosreport", line 23, in <module>
File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 835, in sosreport
OSError: [Errno 2] No such file or directory: '/tmp/glusterfs-statedumps'
-> for dirs in os.listdir(self.statedump_dir):
The indentation is also a bit odd (4 characters at the first level, 5 within the function).
I've added this patch to the rhel-6.4 branch to fix this:
diff -up sos-2.2/sos/plugins/gluster.py.orig sos-2.2/sos/plugins/gluster.py
--- sos-2.2/sos/plugins/gluster.py.orig 2012-10-18 17:58:33.466134685 +0100
+++ sos-2.2/sos/plugins/gluster.py 2012-10-18 17:59:49.861201893 +0100
@@ -69,11 +69,15 @@ class gluster(sos.plugintools.PluginBase
ret = string.count (last_line, 'DUMP_END_TIME');
- for dirs in os.listdir(self.statedump_dir):
- os.remove(self.statedump_dir + '/' + dirs);
+ if os.path.exists(self.statedump_dir):
+     for dirs in os.listdir(self.statedump_dir):
+         os.remove(self.statedump_dir + '/' + dirs)
self.collectExtOutput("/usr/sbin/gluster peer status")
Let me know if you would like a new bug for this. It would be good to get this fixed so that the two branches stay in-sync.
Sure, I will make this change and do one more brew build. I can use this bug itself for the commit.
But I wonder how the '/tmp/glusterfs-statedumps' directory was not found. It is created by the gluster plugin for the sosreport itself (it is created in the make_preparations function, which is executed before postproc).
For 2.0.z this bug should be VERIFIED (as per bug 849546)
> --- Comment #15 from Sachidananda Urs <firstname.lastname@example.org> ---
> After updating with newer version of sosreport package, I see the bug is fixed.
> Now the glusterdumps are included in the sosreport archive.
Will mark it MODIFIED now, and will keep the bug open for RHEL, because we haven't yet fixed it in the RHEL branch.
In reply to comment #11: setup() (which creates the directory) is only invoked if checkenabled() returns true. The postproc() method is called unconditionally, so the module needs to be able to handle the fact that it was not enabled.
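A defensive version of the cleanup (a sketch, not the committed plugin code) simply guards on the directory's existence, so a postproc()-style pass is safe even on hosts where the plugin never ran setup():

```python
import os

# Path the plugin's setup phase would create; on non-gluster hosts it
# never exists, which is exactly the case the guard handles.
STATEDUMP_DIR = "/tmp/glusterfs-statedumps"

def cleanup_statedumps(statedump_dir=STATEDUMP_DIR):
    """Remove collected statedump files, tolerating hosts where the
    plugin was never enabled and the directory was never created."""
    if not os.path.isdir(statedump_dir):
        return  # nothing to clean up on non-gluster hosts
    for name in os.listdir(statedump_dir):
        os.remove(os.path.join(statedump_dir, name))
```

The guard turns the OSError seen in the traceback above into a silent no-op while leaving the gluster-host behaviour unchanged.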
Tested with new sosreport package: http://download.devel.redhat.com/brewroot/packages/sos/2.2/17.2.el6rhs/noarch/sos-2.2-17.2.el6rhs.noarch.rpm
Now the sosreport puts the gluster-related statedumps and other information in a gluster-specific directory under the sosreport archive instead of /tmp.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.