851175 – Avoid volume status for getting statedumps in sosreport

Bug 851175 - Avoid volume status for getting statedumps in sosreport

Summary: Avoid volume status for getting statedumps in sosreport

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Raghavendra Bhat
QA Contact:	Sachidananda Urs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	858469
TreeView+	depends on / blocked

Reported:	2012-08-23 12:14 UTC by Raghavendra Bhat
Modified:	2013-09-23 22:33 UTC (History)
CC List:	7 users (show)
Fixed In Version:	sos-2.2_17.2.el6rhs
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	858469 (view as bug list)
Environment:
Last Closed:	2013-09-23 22:33:09 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Raghavendra Bhat 2012-08-23 12:14:01 UTC

Description of problem:
Instead of using gluster cli command for statedump to take the statedump, send the USR1 signal to glusterfs processes (may be killall -USR1 glusterfs glusterfsd).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Raghavendra Bhat 2012-08-27 06:08:16 UTC

Also avoid using gluster volume status commands while getting the sosreports. And also include the statedump files generated in the sosreport.

Comment 3 Raghavendra Bhat 2012-08-27 09:49:26 UTC

The problem is when sosreport is executed in the cluster, then volume status commands and statedump commands will be executed parallely. But since the 1st command entered makes glusterd lock entire cluster, the commands executed on the other glusterds will fail thus failing to gather the information. So to avoid parallel command execution, information collection via gluster cli should be avoided (thus not involving glusterd). To avoid gluster cli volume status commands should not be part of the sosreport. Instead USR1 signal should be sent to all the glusterfs processes (killall -USR1 glusterfs glusterfsd) which collects the statedump without locking the cluster.

Comment 4 Raghavendra Bhat 2012-08-31 09:26:58 UTC

diff -pruN old_gluster.py new_gluster.py 
--- old_gluster.py	2012-08-31 05:11:37.633936947 -0400
+++ new_gluster.py	2012-08-31 05:23:49.898666408 -0400
@@ -12,6 +12,7 @@
 ## along with this program; if not, write to the Free Software
 ## Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 
+import time
 import os.path
 import sos.plugintools
 
@@ -71,16 +72,10 @@ class gluster(sos.plugintools.PluginBase
         volume_file = self.collectOutputNow("/usr/sbin/gluster volume info",
                         "gluster_volume_info")
         if volume_file:
-            for volname in self.get_volume_names(volume_file):
-                self.collectExtOutput("gluster volume statedump %s" % volname)
-                self.collectExtOutput("gluster volume statedump %s nfs" % volname)
-                self.collectExtOutput("gluster volume status %s detail" % volname)
-                self.collectExtOutput("gluster volume status %s clients" % volname)
-                self.collectExtOutput("gluster volume status %s mem" % volname)
-                self.collectExtOutput("gluster volume status %s callpool" % volname)
-                self.collectExtOutput("gluster volume status %s inode" % volname)
-                self.collectExtOutput("gluster volume status %s fd" % volname)
-
+            self.collectExtOutput("killall -USR1 glusterfs glusterfsd")
+            # arbitrary timeout for statedump to complete
+            time.sleep(10)
+            self.addCopySpec("/tmp/*.dump*")
         self.collectExtOutput("gluster volume status")
         # collect this last as some of the other actions create log entries
         self.addCopySpec("/var/log/glusterfs")

===============================================================================

The above diff works and the statedump files are now copied in the sosreport generated. Please let us know the comments on it.

Comment 5 Bryn M. Reeves 2012-08-31 09:56:13 UTC

Thanks for the patch proposal - removing the gluster commands is fine and a trivial change that we can make.

Some comments on the statedump changes:

Grabbing files from /tmp with globs is something we try very hard to avoid since we have no simple way to clean up afterwards and may pull in extremely large files that we do not want to collect.

I.e. every run of sosreport will dump some new files in /tmp and if for example there was a huge file at /tmp/database.dump we would drag that in potentially greatly inflating the report size (something that generates customer complaint).

From the sos side it would be preferable if there was a way for a gluster command to generate this content on stdout. Alternately we would need to add code to determine all the PIDs for glusterd processes, send the signals directly to those processes and then construct exact file names to be collected and then removed.

Adding a sleep(10) is also a hack that we cannot really do - since we have around 100 modules now if we allow this behaviour we could end up with very unpredictable run times for the script. Again this has been a major source of customer and GSS annoyance in the past and was a major element of the work done upstream for the 2.0 release that was done for RHEL6. A better option would be to poll or monitor for the file creation - at least this gives us the minimum delay rather than forcing a 10s wait each time.

Btw, is there a reason that this bug is filed against the gluster component rather than sos? Sachidananda already filed a bug against sos for this where we'd been discussing possible solutions:

bug 849546

If we had a means to re-set the altered statedump dir parameter then we could use that. Another alternative would be to permit a per-statedump override of this path.

Comment 6 Raghavendra Bhat 2012-08-31 10:11:19 UTC

Thanks for the feedback. The complexity is understood. For dumping the statedumps of glusterfs in a separate directory instead of tmp, changes are being done. The patch has to accepted in glusterfs repository. Once that is done, will be sending the changes as per your feed back.

Comment 7 Bryn M. Reeves 2012-08-31 10:18:14 UTC

Thanks - that would be great and a big help to us. I'll talk to Niels de Vos about trying to get access to a testing system with these changes. This would allow us to have a couple of versions of the support staged and then select one depending on what we have available as the release deadline draws near.

Comment 8 Amar Tumballi 2012-09-04 19:01:39 UTC

Bryn, wanted to check if the changes proposed here by Raghavendra Bhat can go in RHS 2.0 Update2 release?

Comment 9 Bryn M. Reeves 2012-10-18 17:08:06 UTC

There's a bug in the postproc method of the patch committed to branch rhs-2.0-rhel-6 in e73a1e1. The current code unconditionally tries to remove files that do not exist on non-gluster hosts:

 71     def postproc(self):
 72          for dirs in os.listdir(self.statedump_dir):
 73              os.remove(self.statedump_dir + '/' + dirs);
 74 
 75          os.rmdir(self.statedump_dir);
 76          os.unlink('/tmp/glusterdump.options');

This triggers an exception when run on RHEL or other products - in normal mode it is silently logged but with --debug it causes a trap to the python debugger:

  Running plugins. Please wait ...

  Completed [54/54] ...
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 23, in <module>
    sosreport(sys.argv[1:])
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 835, in sosreport
    plug.postproc()
OSError: [Errno 2] No such file or directory: '/tmp/glusterfs-statedumps'

> /usr/lib/python2.6/site-packages/sos/plugins/gluster.py(72)postproc()
-> for dirs in os.listdir(self.statedump_dir):
(Pdb) exit

The indentation is also a bit odd (4 chars + 5 chars within the fn).

I've added this patch to the rhel-6.4 branch to fix this:

diff -up sos-2.2/sos/plugins/gluster.py.orig sos-2.2/sos/plugins/gluster.py
--- sos-2.2/sos/plugins/gluster.py.orig 2012-10-18 17:58:33.466134685 +0100
+++ sos-2.2/sos/plugins/gluster.py      2012-10-18 17:59:49.861201893 +0100
@@ -69,11 +69,15 @@ class gluster(sos.plugintools.PluginBase
                 ret = string.count (last_line, 'DUMP_END_TIME');

     def postproc(self):
-         for dirs in os.listdir(self.statedump_dir):
-             os.remove(self.statedump_dir + '/' + dirs);
-
-         os.rmdir(self.statedump_dir);
-         os.unlink('/tmp/glusterdump.options');
+        if not os.path.exists(self.statedump_dir):
+            return
+        try:
+            for dirs in os.listdir(self.statedump_dir):
+                os.remove(os.path.join(self.statedump_dir,dirs));
+            os.rmdir(self.statedump_dir);
+            os.unlink('/tmp/glusterdump.options');
+        except:
+            pass

     def setup(self):
         self.collectExtOutput("/usr/sbin/gluster peer status")

Let me know if you would like a new bug for this. It would be good to get this fixed so that the two branches stay in-sync.

Comment 10 Amar Tumballi 2012-10-18 17:46:20 UTC

sure, will make this change and make one more brewbuild. I can use this bug itself for the commit.

Comment 11 Raghavendra Bhat 2012-10-22 06:28:23 UTC

But I wonder how '/tmp/glusterfs-statedumps' directory was not found. It is created by the gluster plugin for the sosreport itself. (Its created in the function make_preparations which is executed before postproc).

Comment 12 Amar Tumballi 2012-10-23 07:58:08 UTC

For 2.0.z this bug should be VERIFIED (as per bug 849546)

> https://bugzilla.redhat.com/show_bug.cgi?id=849546
> 
> --- Comment #15 from Sachidananda Urs <sac> ---
> After updating with newer version of sosreport package, I see the bug is fixed.
> Now the glusterdumps are included in the sosreport archive.
> 

Will mark it MODIFIED now, and will keep the bug for RHEL open... because we havn't yet fixed it in RHEL branch.

Comment 13 Bryn M. Reeves 2012-10-23 09:25:39 UTC

In reply to comment #11 setup() (which creates the directory) is only invoked if checkenabled() returns true. The postproc() method is called unconditionally so the module needs to be able to handle the fact that it is not enabled.

Comment 15 Sachidananda Urs 2013-01-07 13:39:07 UTC

Tested with new sosreport package: http://download.devel.redhat.com/brewroot/packages/sos/2.2/17.2.el6rhs/noarch/sos-2.2-17.2.el6rhs.noarch.rpm 

Now the sosreport puts gluster related statedumps and other information in gluster specific directory instead of tmp/ under sosreport.

Comment 16 Scott Haines 2013-09-23 22:33:09 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Note You need to log in before you can comment on or make changes to this bug.