Description of problem:
SoS hangs when collecting data for the "xen" plugin on a Xen kernel where the "xend" daemon has not yet been started. The xen plugin tries to read "/sys/hypervisor/uuid", and that read() call hangs. Running "cat /sys/hypervisor/uuid" at this stage has the same result.

The easiest workaround is to disable the xen plugin manually with the "-n" option of SoS:

sosreport -n xen

Alternatively, starting (and even stopping) the "xend" service once should make /sys/hypervisor/uuid readable, so that sos no longer hangs while trying to read it.

Version-Release number of selected component (if applicable):
sos-1.7-9.1.el5

How reproducible:
"sosreport -o xen" or, more simply, "cat /sys/hypervisor/uuid"

Actual results:
The read hangs.

Expected results:
The meta-file contents are read and operations continue as usual.
Previous versions of sos did not collect /sys/hypervisor/uuid and are therefore not affected. Note that this is not a bug in sos itself; however, sos should handle this kind of situation gracefully, for example by using a timeout.
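The timeout idea can be sketched as follows, assuming Python 3: the read runs in a daemon thread and the caller waits at most `timeout` seconds. A thread blocked inside read() cannot be cancelled, so on timeout it is simply abandoned. `read_with_timeout` is a hypothetical helper, not an existing sos API; sos 1.7 itself ran under Python 2.4, where this sketch would need adjusting.

```python
import threading


def read_with_timeout(path, timeout=5.0):
    """Read a file that may block forever (e.g. a xenbus-backed sysfs file).

    Returns the file contents, or None if the read failed or did not
    finish within `timeout` seconds.
    """
    result = {}

    def worker():
        try:
            with open(path) as f:
                result["data"] = f.read()
        except (IOError, OSError):
            result["data"] = None  # missing or unreadable file

    t = threading.Thread(target=worker)
    t.daemon = True  # a hung reader must not keep the process alive at exit
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None  # still blocked in open()/read(); give up on this file
    return result.get("data")
```

A plugin could then call `read_with_timeout("/sys/hypervisor/uuid")` and skip the file on `None` instead of hanging the whole report. Each timed-out file leaks one blocked thread, which is acceptable for a one-shot reporting tool.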
From the attached Issue Tracker report:
-----------------------------------------------------
Description of Problem:
If we run "cat /sys/hypervisor/uuid" before starting xenstored, the command hangs. Reading /sys/hypervisor/uuid goes through xenbus, so the command waits for a response from xenstored. sosreport reads /sys/hypervisor/uuid, so if we run sosreport without first running "service xend start", sosreport hangs.

How reproducible:
Always

Steps to Reproduce:
1. chkconfig xend off
2. reboot
3. sosreport

Actual Results:
sosreport hangs.

Expected Results:
If xenstored is not running, sosreport should avoid reading /sys/hypervisor/uuid. Also, /sys/hypervisor/uuid should be created after xenstored is started.

Summary of actions taken to resolve issue:
If the command hangs, running "service xend start" from another console makes it return.

Location of diagnostic data:

    /* UUID */

    static ssize_t uuid_show(struct hyp_sysfs_attr *attr, char *buffer)
    {
        char *vm, *val;
        int ret;

        vm = xenbus_read(XBT_NIL, "vm", "", NULL);
        if (IS_ERR(vm))
            return PTR_ERR(vm);
        val = xenbus_read(XBT_NIL, vm, "uuid", NULL);
        kfree(vm);
        if (IS_ERR(val))
            return PTR_ERR(val);
        ret = sprintf(buffer, "%s\n", val);
        kfree(val);
        return ret;
    }

    HYPERVISOR_ATTR_RO(uuid);

Hardware configuration:
Model: PRIMERGY TX200 S3
CPU Info: Xeon(R) CPU E5310 1.60GHz
Memory Info: 8GB

Business Impact:
Our middleware (MW) uses sosreport and starts it from /etc/rc3.d/S95xxxx (before xend). If the MW gets no response from sosreport, the RC scripts hang and the system cannot boot up completely. Our customer would think the system cannot boot.

Fix Target: RHEL5.2
Errata Request: Yes
Hotfix Request: No

Additional Info:
- The issue probably also occurs on RHEL5.0 Xen.
- I have attached the sosreport of the system, taken after starting xend.
-----------------------------------------------------
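The expected behaviour above (skip /sys/hypervisor/uuid when xenstored is not running) first needs a way to detect xenstored. A minimal sketch, assuming Python 3 on Linux: instead of shelling out to pidof, it scans the "Name:" line of /proc/<pid>/status, a field that is also present on 2.6.18-based RHEL 5 kernels. `is_process_running` and its `proc_root` parameter are hypothetical; `proc_root` exists only so the helper can be tested against a fake /proc tree.

```python
import os


def is_process_running(name, proc_root="/proc"):
    """Return True if some process under proc_root is named `name`."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as f:
                for line in f:
                    if line.startswith("Name:"):
                        parts = line.split(None, 1)
                        if len(parts) == 2 and parts[1].strip() == name:
                            return True
                        break  # only the Name: line matters
        except (IOError, OSError):
            continue  # the process may have exited during the scan
    return False
```

A guard such as `if is_process_running("xenstored"):` would then wrap the read of /sys/hypervisor/uuid. The check is inherently racy (xenstored could stop between the check and the read), so a timeout remains a useful second safeguard.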
The customer has provided two patches as possible fixes. I haven't tried them yet, but they look straightforward. Attaching both for review. - steve
Created attachment 292343
First patch - Check is_xenstored_running() before looking at /sys/hypervisor/uuid
Created attachment 292344
Second patch - If is_xenstored_running(), collect /sys/hypervisor/uuid; otherwise collect /var/lib/xenstored/tdb
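The selection logic of the second patch (together with the xenstore-ls gating added in the later revision quoted in this report) can be written as a pure function, which makes both branches easy to check. This is a hypothetical sketch: `plan_hypervisor_collection` is not a sos function, and it returns paths and commands rather than calling addCopySpec()/collectExtOutput().

```python
# Attributes the patches collect unconditionally; they do not go through xenbus.
HYPERVISOR_ATTRS = ["version", "compilation", "properties", "type"]


def plan_hypervisor_collection(xenstored_running):
    """Return (files_to_copy, commands_to_run) for a dom0 host."""
    files = ["/sys/hypervisor/" + name for name in HYPERVISOR_ATTRS]
    commands = []
    if xenstored_running:
        # Both of these talk to xenstored via xenbus and block without it.
        files.append("/sys/hypervisor/uuid")
        commands.append("/usr/bin/xenstore-ls")
    else:
        # Fall back to xenstored's on-disk database.
        files.append("/var/lib/xenstored/tdb")
    return files, commands
```

Keeping the decision separate from the collection calls means the "no xenstored" path can be exercised without a Xen host.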
This request was evaluated by Red Hat Product Management for inclusion, but this component is not scheduled to be updated in the current Red Hat Enterprise Linux release. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?".
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Hi,

The following is a comment from FJ:
---
Hi,

The patch included in sos-1.7.9.10 is wrong. You should remove the "xenstore-ls" line. Please check my patch.

diff -uNrp lib/sos/plugins/xen.py.orig lib/sos/plugins/xen.py
--- lib/sos/plugins/xen.py.orig 2008-11-14 21:19:22.000000000 +0900
+++ lib/sos/plugins/xen.py 2008-11-14 21:19:45.000000000 +0900
@@ -68,7 +68,6 @@ class xen(sos.plugintools.PluginBase):
             # default of dom0, collect lots of system information
             self.addCopySpec("/var/log/xen")
             self.addCopySpec("/etc/xen")
-            self.collectExtOutput("/usr/bin/xenstore-ls")
             self.collectExtOutput("/usr/sbin/xm dmesg")
             self.collectExtOutput("/usr/sbin/xm info")
             self.collectExtOutput("/usr/sbin/xm list")

Best Regards,
Akio Takebe
---

Best Regards,
M Oshiro

This event sent from IssueTracker by moshiro
issue 144875
This is the latest patch:

--- /usr/lib/python2.4/site-packages/sos/plugins/xen.py.orig 2008-01-08 10:22:46.000000000 +0900
+++ /usr/lib/python2.4/site-packages/sos/plugins/xen.py 2008-01-08 11:20:41.000000000 +0900
@@ -38,6 +38,11 @@
                 return False
         return True
 
+    def is_running_xenstored(self):
+        xs_pid = os.popen("pidof xenstored").read()
+        xs_pidnum = re.split('\n$',xs_pid)[0]
+        return xs_pidnum.isdigit()
+
     def domCollectProc(self):
         self.addCopySpec("/proc/xen/balloon")
         self.addCopySpec("/proc/xen/capabilities")
@@ -63,12 +68,21 @@
             # default of dom0, collect lots of system information
             self.addCopySpec("/var/log/xen")
             self.addCopySpec("/etc/xen")
-            self.collectExtOutput("/usr/bin/xenstore-ls")
             self.collectExtOutput("/usr/sbin/xm dmesg")
             self.collectExtOutput("/usr/sbin/xm info")
             self.collectExtOutput("/usr/sbin/brctl show")
             self.domCollectProc()
-            self.addCopySpec("/sys/hypervisor")
+            self.addCopySpec("/sys/hypervisor/version")
+            self.addCopySpec("/sys/hypervisor/compilation")
+            self.addCopySpec("/sys/hypervisor/properties")
+            self.addCopySpec("/sys/hypervisor/type")
+            if is_xenstored_running():
+                self.addCopySpec("/sys/hypervisor/uuid")
+                self.collectExtOutput("/usr/bin/xenstore-ls")
+            else:
+                # we need tdb instead of xenstore-ls if cannot get it.
+                self.addCopySpec("/var/lib/xenstored/tdb")
+
             # FIXME: we *might* want to collect things in /sys/bus/xen*,
             #        /sys/class/xen*, /sys/devices/xen*, /sys/modules/blk*,
             #        /sys/modules/net*, but I've never heard of them actually being

Not sure why you want xenstore-ls removed entirely.

Thanks,
Adam
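Two details of the patch as quoted are worth checking. First, the helper is defined as is_running_xenstored(self) but called as is_xenstored_running() without "self.", so unless a module-level function of that name exists elsewhere in xen.py, the dom0 branch would raise NameError at that point. Second, taking the first line of pidof output and calling isdigit() returns False when pidof prints more than one PID (for example "1234 5678"). A minimal sketch of a more tolerant check, assuming Python 3 and a pidof binary on PATH; both function names are hypothetical.

```python
import subprocess


def pidof_reports_process(output):
    """Interpret the stdout of `pidof <name>`.

    pidof prints zero or more space-separated PIDs. Splitting on
    whitespace accepts both "1234\n" and "1234 5678\n".
    """
    pids = output.split()
    return len(pids) > 0 and all(p.isdigit() for p in pids)


def is_xenstored_running():
    """Hypothetical standalone variant of the helper in the patch."""
    try:
        proc = subprocess.Popen(["pidof", "xenstored"],
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
        out, _ = proc.communicate()
    except OSError:
        return False  # pidof is not installed or not executable
    return pidof_reports_process(out)
```

Whichever name is chosen, the definition and the call site in setup() must match, and a method must be called as self.<name>().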
Dear Adam-san,

Could you please add the Fujitsu Confidential Group to bz#371251 as soon as possible?

Best Regards,
M Oshiro

Internal Status set to 'Waiting on SEG'
Dear Adam-san,

The following comments are from FJ:
---
Event posted 11-17-2008 11:52pm JST by asakai

Hi,

The patch is correct. xenstore-ls also accesses xenstored through xenbus, so if xenstored is not running, we must not use xenstore-ls.

Thanks,
Akio Takebe
---
---
Event posted 11-18-2008 08:51am JST by asakai

Hi,

Just FYI, the patch included in sos-1.7-9.13.el5.src.rpm is below. This patch is wrong: we need to remove the "xenstore-ls" line when xenstored is not running.

BTW, could you reflect my comments in BZ371251, Oshiro-san?

# cat ../../SOURCES/sos-xend-no-hang.patch
diff -up sos-1.7/lib/sos/plugins/xen.py.stokes sos-1.7/lib/sos/plugins/xen.py
--- sos-1.7/lib/sos/plugins/xen.py.stokes 2008-09-18 11:14:29.000000000 -0400
+++ sos-1.7/lib/sos/plugins/xen.py 2008-09-18 11:16:36.000000000 -0400
@@ -38,6 +38,11 @@ class xen(sos.plugintools.PluginBase):
                 return False
         return True
 
+    def is_running_xenstored(self):
+        xs_pid = os.popen("pidof xenstored").read()
+        xs_pidnum = re.split('\n$',xs_pid)[0]
+        return xs_pidnum.isdigit()
+
     def domCollectProc(self):
         self.addCopySpec("/proc/xen/balloon")
         self.addCopySpec("/proc/xen/capabilities")
@@ -68,7 +73,17 @@ class xen(sos.plugintools.PluginBase):
             self.collectExtOutput("/usr/sbin/xm info")
             self.collectExtOutput("/usr/sbin/brctl show")
             self.domCollectProc()
-            self.addCopySpec("/sys/hypervisor")
+            self.addCopySpec("/sys/hypervisor/version")
+            self.addCopySpec("/sys/hypervisor/compilation")
+            self.addCopySpec("/sys/hypervisor/properties")
+            self.addCopySpec("/sys/hypervisor/type")
+            if is_xenstored_running():
+                self.addCopySpec("/sys/hypervisor/uuid")
+                self.collectExtOutput("/usr/bin/xenstore-ls")
+            else:
+                # we need tdb instead of xenstore-ls if cannot get it.
+                self.addCopySpec("/var/lib/xenstored/tdb")
+
             # FIXME: we *might* want to collect things in /sys/bus/xen*,
             #        /sys/class/xen*, /sys/devices/xen*, /sys/modules/blk*,
             #        /sys/modules/net*, but I've never heard of them actually being

Best Regards,
Akio Takebe
---
---
Event posted 11-18-2008 05:01pm JST by asakai

Hi, Oshiro-san,

Could you link this IT to the BZ? We cannot write to the BZ directly, so my comments are not reflected immediately. I'll be out of the office from tomorrow, so I want to send my comments as soon as possible. I'm sorry for the inconvenience.

Thanks,
Akio Takebe
---

Best Regards,
M Oshiro
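FJ's point applies to any external command that talks to xenstored: xenstore-ls also goes through xenbus, so collectExtOutput("/usr/bin/xenstore-ls") can block just like the read of /sys/hypervisor/uuid. Besides skipping it when xenstored is down, a timeout around the command is a cheap extra safeguard. A minimal sketch, assuming Python 3.5+ for subprocess.run(timeout=...); the sos 1.7 plugin quoted here runs under Python 2.4, so this is illustrative only, and `run_with_timeout` is a hypothetical helper, not part of sos. Note that subprocess.run() kills the child on timeout and then waits for it, so a child stuck in an uninterruptible kernel wait could still block that final wait.

```python
import subprocess


def run_with_timeout(argv, timeout=10.0):
    """Run an external command; return its stdout, or None on timeout/failure."""
    try:
        proc = subprocess.run(argv,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL,
                              universal_newlines=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run() has already killed the child
    except OSError:
        return None  # command not found or not executable
    return proc.stdout
```

For example, `run_with_timeout(["/usr/bin/xenstore-ls"], timeout=10.0)` returns None instead of hanging the report if xenstored never answers.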
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0171.html