Description of problem:
Attempting to read from /sys/hypervisor/uuid in Dom0 will hang indefinitely if xenstored hasn't been launched.

Version-Release number of selected component (if applicable):
uname -r: 2.6.18-1.2747.el5xen
kernel-xen-2.6.18-1.2747.el5
Confirmed same behaviour on 32- and 64-bit kernels.

How reproducible:
Always, provided xend has *never* been started since boot.

Steps to Reproduce:
1. chkconfig xend off
2. Reboot Dom0 in kernel-xen
3. # cat /sys/hypervisor/uuid

Actual results:
Hangs indefinitely.

Expected results:
Prints 0000000000000000000000000000000000 (16 of them)

Additional info:
SysRq+t shows the following trace:

Jan 29 07:07:47 dhcp-5-234 kernel: cat  D ffff88004c4c3d98  0  388  344 (NOTLB)
Jan 29 07:07:47 dhcp-5-234 kernel: ffff88004c4c3d98 ffff88004c4c3d38 0000014f00000002 0000000000000008
Jan 29 07:07:47 dhcp-5-234 kernel: ffff88007406a820 ffffffff804b9a00 00000000000d403f ffff88007406aa08
Jan 29 07:07:47 dhcp-5-234 kernel: ffff88007406a820 ffffffffffffffff
Jan 29 07:07:47 dhcp-5-234 kernel: Call Trace:
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff80219798>] vsnprintf+0x33b/0x59e
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff803926bd>] read_reply+0x85/0xf5
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff80294dbe>] autoremove_wake_function+0x0/0x2e
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff8030512d>] inode_has_perm+0x56/0x63
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff803928a5>] xs_talkv+0xba/0x176
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff80392ab1>] xs_single+0x3e/0x43
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff80392d64>] xenbus_read+0x3c/0x53
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff8038f0b9>] uuid_show+0x1e/0x7c
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff802e8f4c>] sysfs_read_file+0xa5/0x13f
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff8020b3ae>] vfs_read+0xcb/0x171
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff802116b4>] sys_read+0x45/0x6e
Jan 29 07:07:47 dhcp-5-234 kernel: [<ffffffff8025c65d>] tracesys+0xa7/0xb2
Possible work-around: read /sys/hypervisor/uuid only if xenstored is running; check with `kill -0 $(cat /var/run/xenstore.pid)` or equivalent.
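A minimal sketch of that guard as a shell helper; the pid-file path and the overall shape are assumptions taken from this comment, not a tested RHEL 5 recipe:

```shell
#!/bin/sh
# Sketch of the suggested work-around: only touch /sys/hypervisor/uuid
# when xenstored is believed to be running. PIDFILE location is assumed.
PIDFILE=${PIDFILE:-/var/run/xenstore.pid}

safe_read_uuid() {
    # kill -0 sends no signal; it only tests that the process exists.
    if [ -r "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        cat /sys/hypervisor/uuid
    else
        echo "xenstored not running; skipping uuid read" >&2
        return 1
    fi
}

safe_read_uuid || true   # fall through instead of hanging
```

Note the remaining race: xenstored can die between the `kill -0` check and the read, so this shrinks the window but does not close it.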
Ick. At that point, I'd almost just go with using the BIOS stuff we need for fully-virt to pick up UUIDs. I've got emails out on that to see if all we need is to have xenstore running for that to show up in HAL... it may be a lot more robust to go off of that.
Proposing this as a blocker for RHEL5 GA: either we fix this issue, or bz #224494 becomes a RHEL5 blocker with some work-around.
Aside from the option mentioned by Markus in comment #1, the other possible workaround is to look for the UUID in the SMBIOS / dmidecode data first, only trying the /sys/hypervisor/uuid file if SMBIOS data is not present. A baremetal or Dom0 kernel/host will typically always have SMBIOS data available, so by preferring SMBIOS you should minimise the chances of hitting this bug.
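That ordering can be sketched as follows. dmidecode's `-s system-uuid` selector is a real dmidecode option, but the fallback logic and error handling here are illustrative only, not a tested RHEL 5 script:

```shell
#!/bin/sh
# Prefer the SMBIOS UUID; fall back to /sys/hypervisor/uuid only when
# no SMBIOS data is available.
get_uuid() {
    # dmidecode needs root; any failure is treated as "no SMBIOS data".
    uuid=$(dmidecode -s system-uuid 2>/dev/null)
    if [ -n "$uuid" ]; then
        echo "$uuid"
    elif [ -r /sys/hypervisor/uuid ]; then
        # Only reached without SMBIOS data; this read can still hang
        # if xenstored has never been started (the bug at hand).
        cat /sys/hypervisor/uuid
    else
        return 1
    fi
}
```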
Bret is working on this fix in bug 224494 in rhn_register. Moved this bugzilla to 5.1. Note that depending on Bret's fix, this issue may be mute.
Created attachment 147079 [details]
Crude but simple patch to avoid the hang

The read hangs because communication with xenstored blocks: sending the request can block (ring buffer already filled up), and receiving the reply will block.

This is a crude but simple patch to fail the read with EAGAIN right away unless xenstored is known to have started. The read will still hang if xenstored starts okay, but later dies for whatever reason. Note that stopping xend does not kill xenstored. Death of xenstored loses all xenstore contents, which makes Xen quite unhappy.
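From userspace, the patched behaviour described here would look roughly like this to a consumer: the read fails fast instead of hanging, so callers can fall back gracefully. A sketch assuming the patch is applied (function name and messages are illustrative):

```shell
#!/bin/sh
# With the patch, reading the file before xenstored has started is
# expected to fail quickly (EAGAIN) rather than block forever.
read_uuid_fast() {
    if uuid=$(cat /sys/hypervisor/uuid 2>/dev/null); then
        echo "uuid: $uuid"
    else
        echo "uuid unavailable (xenstored not started yet?)" >&2
        return 1
    fi
}
read_uuid_fast || true
```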
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Comment #5 says `this issue may be mute'. It blocks bz#224494, which is now CLOSED. This suggests to me that it is indeed moot (and doesn't block). Is it moot or not? I'm not sure who can answer this, and I'm making this a NEEDINFO from Pete just because bz#224494 is assigned to him.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
This bug lingered in comment #8's NEEDINFO for months, until comment #9 switched it back to ASSIGNED without answering my question. Instead of setting NEEDINFO again, I now simply assume that the issue is indeed moot and resolve it WONTFIX. If it is not, please reopen the bug.
Not moot, have just hit this on a production server with patches up to 2010-03-17 10:25 GMT applied.
Hi Markus,

I am facing the same issue on RHEL 5. It is generating unwanted processes in the background, as follows:

396 cat /sys/hypervisor/uuid   (repeated)
668 awk -v progname=/etc/cron.hourly/mcelog.cron progname { print progname ":\n" progname=""; }   (repeated)
-----------------
After some time the server hangs because of this. Kindly let me know: will installing this patch resolve the issue? Please tell me how to install this patch on RHEL 5.

Thanks,
Umesh

(In reply to comment #6)
> Created an attachment (id=147079) [details]
> Crude but simple patch to avoid the hang
>
> The read hangs because communication with xenstored blocks. Sending the
> request can block (ring buffer filled up already), receiving the reply will
> block.
>
> This is a crude but simple patch to fail the read with EAGAIN right away unless
> xenstored is known to have started. The read will still hang if xenstored
> starts okay, but later dies for whatever reason. Note that stopping xend does
> not kill xenstored. Death of xenstored loses all xenstore contents, which
> makes Xen quite unhappy.
(In reply to comment #12)
> Hi,
> I am facing the same issue on RHEL 5, which is generating unwanted
> processes in the background, as follows:
> 396 cat /sys/hypervisor/uuid   (repeated)
> 668 awk -v progname=/etc/cron.hourly/mcelog.cron progname { print progname ":\n" progname=""; }   (repeated)

1036 /bin/bash /usr/bin/run-parts /etc/cron.hourly   (repeated)
1588 crond   (repeated)

> ---------------------------------------------------
> After some time the server hangs because of this.

Services & memory usage will rise because of these services.

> Kindly let me know: will installing this patch resolve the issue?

My server is running in cluster mode.

> Please tell me how to install this patch on RHEL 5.
> Thanks
> Umesh
> (In reply to comment #6)
> > Created an attachment (id=147079) [details] [details]
> > Crude but simple patch to avoid the hang
> > [...]