Bug 371251 - sos hangs when running with a xen kernel where xend has not been started
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: sos
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Assigned To: Adam Stokes
Depends On:
Blocks: 409971 481166
Reported: 2007-11-08 09:44 EST by Navid Sheikhol-Eslami
Modified: 2010-10-22 16:09 EDT

Doc Type: Bug Fix
Clones: 481166
Last Closed: 2009-01-20 16:41:55 EST

Attachments:
First patch - Check is_xenstored_running() before looking at /sys/hypervisor/uuid (1.63 KB, patch)
2008-01-21 03:44 EST, Steve
Second patch - if is_xenstored_running() look at /sys/hypervisor/uuid else /var/lib/xenstored/tdb (1.79 KB, patch)
2008-01-21 03:46 EST, Steve

Description Navid Sheikhol-Eslami 2007-11-08 09:44:04 EST
Description of problem:

SoS hangs when collecting data for the "xen" plugin when running on a Xen kernel
where the "xend" daemon has not yet been started.

This is because the xen plugin is trying to read "/sys/hypervisor/uuid", which
results in a hung read() operation. This is the same result as simply doing a
"cat /sys/hypervisor/uuid" at this stage.

The easiest solution to this is to manually disable the xen plugin by using the
"-n" option of SoS as follows:

  sosreport -n xen

Alternatively, starting (and even stopping) the "xend" service once should make
/sys/hypervisor/uuid readable, so sos no longer hangs while trying to read it.

Version-Release number of selected component (if applicable):

sos-1.7-9.1.el5

How reproducible:

"sosreport -o xen" or more simply "cat /sys/hypervisor/uuid"

Actual results:

Read hangs.

Expected results:

The meta-file contents are read and operations continue as usual.

Additional info:
Comment 1 Navid Sheikhol-Eslami 2007-11-08 09:48:28 EST
Previous versions of sos were not collecting /sys/hypervisor/uuid and therefore
are not affected.

Please note that this is not a bug in sos itself. However, sos should be able to
handle this kind of situation gracefully by using a timeout.
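
The timeout idea above can be sketched roughly as follows. This is only an illustration in modern Python 3, not the sos plugin API of the time (which targeted Python 2.4); the helper name `read_with_timeout` and its default timeout are hypothetical. It assumes the blocked read can simply be abandoned: a read stuck in uninterruptible kernel sleep may still delay process exit.

```python
import threading


def read_with_timeout(path, timeout=5.0):
    """Return the contents of path, or None if the read does not finish in time.

    Hypothetical helper: the read runs in a daemon thread so that a read()
    that never returns (as reported for /sys/hypervisor/uuid) cannot block
    the caller for longer than `timeout` seconds.
    """
    result = {}

    def _reader():
        try:
            with open(path) as f:
                result["data"] = f.read()
        except OSError as exc:
            # Hand the error back to the caller instead of losing it.
            result["error"] = exc

    worker = threading.Thread(target=_reader, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # Still blocked: give up on this file and let collection continue.
        return None
    if "error" in result:
        raise result["error"]
    return result["data"]
```

A caller would treat a `None` result as "skip this file" and carry on with the remaining plugins, for example `read_with_timeout("/sys/hypervisor/uuid", timeout=10)`.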
Comment 2 Steve 2008-01-21 03:36:44 EST
From the attached Issue tracker report:

-----------------------------------------------------
Description of Problem:
If we run a command such as "cat /sys/hypervisor/uuid" before starting xenstored,
the command hangs. Because a read of /sys/hypervisor/uuid goes through xenbus, the
command waits for a response from xenstored. sosreport reads /sys/hypervisor/uuid,
so if we run sosreport without first running "service xend start", sosreport hangs.

How reproducible:
 Always

Steps to Reproduce:
 1. chkconfig xend off
 2. reboot
 3. sosreport

Actual Results:
 sosreport hangs.

Expected Results:
If xenstored is not running, sosreport should avoid reading
/sys/hypervisor/uuid.  And /sys/hypervisor/uuid should be created after starting
xenstored.

Summary of actions taken to resolve issue:
If the command hangs, running "service xend start" in another console lets it
return.

Location of diagnostic data:
/* UUID */

static ssize_t uuid_show(struct hyp_sysfs_attr *attr, char *buffer)
{
        char *vm, *val;
        int ret;

        vm = xenbus_read(XBT_NIL, "vm", "", NULL);
        if (IS_ERR(vm))
                return PTR_ERR(vm);
        val = xenbus_read(XBT_NIL, vm, "uuid", NULL);
        kfree(vm);
        if (IS_ERR(val))
                return PTR_ERR(val);
        ret = sprintf(buffer, "%s\n", val);
        kfree(val);
        return ret;
}

HYPERVISOR_ATTR_RO(uuid);

Hardware configuration:
   Model: PRIMERGY TX200 S3
   CPU Info:  Xeon(R) CPU E5310 1.60GHz
   Memory Info: 8GB

Business Impact:
    Our middleware runs sosreport; it starts from /etc/rc3.d/S95xxxx (before
xend). If the middleware gets no response from sosreport, the rc scripts hang
and the system cannot finish booting. Our customers would think they cannot
boot the system.

 Fix Target: RHEL5.2
 errata Request: Yes
 Hotfix Request: No

Additional Info:
 - The issue should also occur on RHEL 5.0 Xen.
 - I have attached the sosreport of the system taken after starting xend.
-----------------------------------------------------
Comment 3 Steve 2008-01-21 03:43:14 EST
There are two patches provided by the customer as possible fixes. I haven't
tried them out yet, but they appear pretty straightforward. Attaching them both
for review.

- steve
Comment 4 Steve 2008-01-21 03:44:58 EST
Created attachment 292343 [details]
First patch - Check is_xenstored_running() before looking at /sys/hypervisor/uuid
Comment 5 Steve 2008-01-21 03:46:51 EST
Created attachment 292344 [details]
Second patch - if is_xenstored_running() look at /sys/hypervisor/uuid else /var/lib/xenstored/tdb
Comment 6 RHEL Product and Program Management 2008-01-21 03:56:19 EST
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 8 RHEL Product and Program Management 2008-06-02 16:29:00 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 15 Issue Tracker 2008-11-16 20:04:13 EST
Hi,

Following is a comment from FJ:
---
Hi,

The patch included in sos-1.7.9.10 is wrong.
You should remove the "xenstore-ls" line.
Please check my patch.

diff -uNrp lib/sos/plugins/xen.py.orig lib/sos/plugins/xen.py
--- lib/sos/plugins/xen.py.orig 2008-11-14 21:19:22.000000000 +0900
+++ lib/sos/plugins/xen.py      2008-11-14 21:19:45.000000000 +0900
@@ -68,7 +68,6 @@ class xen(sos.plugintools.PluginBase):
            # default of dom0, collect lots of system information
            self.addCopySpec("/var/log/xen")
            self.addCopySpec("/etc/xen")
-            self.collectExtOutput("/usr/bin/xenstore-ls")
            self.collectExtOutput("/usr/sbin/xm dmesg")
            self.collectExtOutput("/usr/sbin/xm info")
            self.collectExtOutput("/usr/sbin/xm list")


Best Regards,

Akio Takebe 
---

Best Regards,
M Oshiro


This event sent from IssueTracker by moshiro@redhat.com 
 issue 144875
Comment 16 Adam Stokes 2008-11-17 09:38:17 EST
This is the latest patch :

--- /usr/lib/python2.4/site-packages/sos/plugins/xen.py.orig	2008-01-08 10:22:46.000000000 +0900
+++ /usr/lib/python2.4/site-packages/sos/plugins/xen.py	2008-01-08 11:20:41.000000000 +0900
@@ -38,6 +38,11 @@
             return False
         return True
 
+    def is_running_xenstored(self):
+        xs_pid = os.popen("pidof xenstored").read()
+        xs_pidnum = re.split('\n$',xs_pid)[0]
+        return xs_pidnum.isdigit()
+
     def domCollectProc(self):
         self.addCopySpec("/proc/xen/balloon")
         self.addCopySpec("/proc/xen/capabilities")
@@ -63,12 +68,21 @@
             # default of dom0, collect lots of system information
             self.addCopySpec("/var/log/xen")
             self.addCopySpec("/etc/xen")
-            self.collectExtOutput("/usr/bin/xenstore-ls")
             self.collectExtOutput("/usr/sbin/xm dmesg")
             self.collectExtOutput("/usr/sbin/xm info")
             self.collectExtOutput("/usr/sbin/brctl show")
             self.domCollectProc()
-            self.addCopySpec("/sys/hypervisor")
+            self.addCopySpec("/sys/hypervisor/version")
+            self.addCopySpec("/sys/hypervisor/compilation")
+            self.addCopySpec("/sys/hypervisor/properties")
+            self.addCopySpec("/sys/hypervisor/type")
+            if is_xenstored_running(): 
+                self.addCopySpec("/sys/hypervisor/uuid")
+                self.collectExtOutput("/usr/bin/xenstore-ls")
+            else:
+                # we need tdb instead of xenstore-ls if cannot get it.
+                self.addCopySpec("/var/lib/xenstored/tdb")
+                
             # FIXME: we *might* want to collect things in /sys/bus/xen*,
             # /sys/class/xen*, /sys/devices/xen*, /sys/modules/blk*,
             # /sys/modules/net*, but I've never heard of them actually being


Not sure why you want xenstore-ls removed entirely.

Thanks,
Adam
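
Note that the patch above defines the helper as the method is_running_xenstored() but calls a bare is_xenstored_running(), which would raise a NameError in Python when that branch runs. A minimal sketch of the intended logic, written as standalone Python 3 rather than against the Python 2.4 plugin API: the function names, the `pidof` parameter, and the returned dictionary layout are hypothetical, and only the paths and commands come from the patch.

```python
import subprocess


def is_xenstored_running(pidof="pidof"):
    """Return True if `pidof xenstored` reports at least one PID.

    Relies on the exit status and on any PID being printed, so it also
    works when pidof lists several space-separated PIDs.
    """
    try:
        proc = subprocess.run(
            [pidof, "xenstored"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No pidof binary available: treat xenstored as not running.
        return False
    return proc.returncode == 0 and bool(proc.stdout.split())


def xenstore_targets(running):
    """Pick what to collect, mirroring the if/else branch in the patch."""
    if running:
        # Both of these talk to xenstored, so they are only safe when it is up.
        return {
            "copy": ["/sys/hypervisor/uuid"],
            "commands": ["/usr/bin/xenstore-ls"],
        }
    # Fall back to the on-disk database instead of querying the daemon.
    return {"copy": ["/var/lib/xenstored/tdb"], "commands": []}
```

With this split, `xenstore_targets(is_xenstored_running())` never schedules xenstore-ls or a read of /sys/hypervisor/uuid unless the daemon was found.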
Comment 17 Issue Tracker 2008-11-18 04:36:40 EST
Dear Adam-san,

Could you please add Fujitsu Confidential Group to bz#371251 asap? 

Best Regards,
M Oshiro

Internal Status set to 'Waiting on SEG'

Comment 18 Issue Tracker 2008-11-18 04:41:07 EST
Dear Adam-san,

Following comments are from FJ:
---
Event posted 11-17-2008 11:52pm JST by asakai 	
Hi,

The patch is correct.
xenstore-ls also accesses xenstored through xenbus.
If xenstored is not running, we must not use xenstore-ls.

Thanks,

Akio Takebe
---   

---
Event posted 11-18-2008 08:51am JST by asakai 	
Hi,

Just FYI, the patch included in sos-1.7-9.13.el5.src.rpm is below.
The following patch is wrong.
We need to remove the "xenstore-ls" line if xenstored is not running.

BTW, could you relay my comments to BZ371251, Oshiro-san?

# cat ../../SOURCES/sos-xend-no-hang.patch
diff -up sos-1.7/lib/sos/plugins/xen.py.stokes sos-1.7/lib/sos/plugins/xen.py
--- sos-1.7/lib/sos/plugins/xen.py.stokes 2008-09-18 11:14:29.000000000 -0400
+++ sos-1.7/lib/sos/plugins/xen.py 2008-09-18 11:16:36.000000000 -0400
@@ -38,6 +38,11 @@ class xen(sos.plugintools.PluginBase):
            return False
        return True

+    def is_running_xenstored(self):
+        xs_pid = os.popen("pidof xenstored").read()
+        xs_pidnum = re.split('\n$',xs_pid)[0]
+        return xs_pidnum.isdigit()
+
    def domCollectProc(self):
        self.addCopySpec("/proc/xen/balloon")
        self.addCopySpec("/proc/xen/capabilities")
@@ -68,7 +73,17 @@ class xen(sos.plugintools.PluginBase):
            self.collectExtOutput("/usr/sbin/xm info")
            self.collectExtOutput("/usr/sbin/brctl show")
            self.domCollectProc()
-            self.addCopySpec("/sys/hypervisor")
+            self.addCopySpec("/sys/hypervisor/version")
+            self.addCopySpec("/sys/hypervisor/compilation")
+            self.addCopySpec("/sys/hypervisor/properties")
+            self.addCopySpec("/sys/hypervisor/type")
+            if is_xenstored_running():
+                self.addCopySpec("/sys/hypervisor/uuid")
+                self.collectExtOutput("/usr/bin/xenstore-ls")
+            else:
+                # we need tdb instead of xenstore-ls if cannot get it.
+                self.addCopySpec("/var/lib/xenstored/tdb")
+
            # FIXME: we *might* want to collect things in /sys/bus/xen*,
            # /sys/class/xen*, /sys/devices/xen*, /sys/modules/blk*,
            # /sys/modules/net*, but I've never heard of them actually being

Best Regards,

Akio Takebe
---
   
---
Event posted 11-18-2008 05:01pm JST by asakai 	
Hi, Oshiro-san

Could you link this IT to the BZ?
We cannot write to the BZ directly.
My comments are not reflected immediately.

I'll be out of the office from tomorrow,
so I want to send my comments as soon as possible.
I'm sorry for the inconvenience.

Thanks,
Akio Takebe 
---

Best Regards,
M Oshiro



Comment 22 errata-xmlrpc 2009-01-20 16:41:55 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0171.html
