Receiving an error message while running a xen guest that states, "BUG: soft lockup detected on CPU#0!" This seems to occur a lot when running 'yum -y update' on both the xen host and guest, both of which are running Fedora Core 5 Test 2. The error occurs at other times as well, but is virtually 100% reproducible for me if I run yum on both the host and the guest at the same time. The console output from one attempt to run yum is attached to this bug report. The guest console does not respond for several seconds upon receiving this message, but does return and continue exactly where it left off without any problems. Nothing is recorded in /var/log/dmesg when these errors occur.
Created attachment 124800 [details] Xen guest console showing error messages
Additional Info: This problem has existed in every version of the hypervisor/guest kernels I have used, up to and including kernel-xen-hypervisor-2.6.15-1.1955_FC5 / kernel-xen-guest-2.6.15-1.1955_FC5. I am also not sure if the system's low specs could be contributing to the frequency of the error messages. The system is a Dell Inspiron 8000 laptop with a PIII 700 MHz processor and 512 MB of RAM.
This still occurs on FC5T3 host/guests. I have been running through some different scenarios to try to determine what is causing this. It seems that the guest locks up when the host starts the line that reads:

developmen: ################################################## 4303/4303

It looks as though, any time I am connected to the xen console (using 'xm console fc5t3xen') and try to run yum on both simultaneously, this happens. It does not matter if it is two tabs in a Gnome Terminal, two separate Gnome Terminals, two ssh sessions to the host in which the guest console is connected, a Gnome session open on the desktop with an ssh session to the host holding the guest console open, etc. It basically occurs any time I try to run yum from both while using the xen console to initiate the process for the guest.

However, if I open an ssh session to each individually (or at least an ssh session to the guest) and run yum from both without touching the xen guest console, then everything appears to work perfectly normally. Because of that, I tend to believe that the guest is having a problem updating its console while the host is updating one of its terminal sessions.
These bugs are being closed since a large number of updates have been released after the FC5 test1 and test2 releases. Kindly update your system by running yum update as the root user, or try out the third and final test version of FC5, being released shortly, and verify whether the bugs are still present on the system. Reopen or file new bug reports as appropriate after confirming the presence of this issue. Thanks
Read previous comment. This still occurs with FC5T3 guest and host on a clean install.
This is almost certainly because of the raised priority dom0 has over domU. It may well be worthwhile modifying HV scheduler defaults to deal with this case.
From the xen-devel mailing list. It may be an idea to have defaults like this in our hypervisor...

Date: Tue, 21 Feb 2006 11:01:06 +1100
From: James Harper
Subject: RE: [Xen-devel] dom0 starves guests off CPU

I found this too when doing a compile in dom0. Search the archives for a thread titled 'Performance problems' from January this year. Something like:

xm sched-sedf <domID> 0 0 0 1 1

was suggested there and it works for me!
http://lists.xensource.com/archives/html/xen-devel/2006-02/msg00720.html for the recent xen-devel thread.
xm sched-sedf 0 0 0 0 1 1 seems to have made the two play much more nicely together. While compiling and running yum on the host I was able to run yum from the guest console and not once suffer one of these error messages.
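For reference, a minimal dry-run sketch of this workaround. The argument meanings are my reading of xm's SEDF interface, and the `echo` dry run is an illustrative assumption; on a real host you would drop the `echo` and run the command directly.

```shell
# Dry-run sketch of the manual workaround above; remove `echo` on a real host.
# xm sched-sedf arguments: <dom> <period> <slice> <latency> <extratime> <weight>.
# period/slice/latency of 0 with extratime=1 and weight=1 removes the domain's
# guaranteed CPU reservation, so dom0 can no longer starve the guests.
dom=0   # dom0 here, as in this comment; other comments apply it to a domU <domID>
echo xm sched-sedf "$dom" 0 0 0 1 1
```

With `dom=0` this prints the exact command used in this comment; substituting a guest's domain ID reproduces the `<domID>` variant suggested on xen-devel.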
*** Bug 185081 has been marked as a duplicate of this bug. ***
This should be fixed now in kernel-xen0-2.6.15-1.2054_FC5. Please verify.
Appears to be working well. Can't force a soft lockup message out of my system. :)
Created attachment 126269 [details] Errors still occurring with 2054 kernel under high CPU load
I hate to do this after saying that everything was good. I downloaded the BOINC client (http://boinc.berkeley.edu) last night to play with, because I was bored, and left it running all night. When I woke up in the morning, the console connected to my xen guest had all the errors in the attachment sitting on it. The problem seems to have improved in that, when I catch the error message occurring, the guest is immediately responsive again, not like in the past where it would hang for several seconds, possibly even minutes. Whatever is catching the condition and spitting out the error message seems almost too sensitive. Previously, even if I did not see the message, I would know that something was wrong because the guest would lock up and become unresponsive. Now I would have no idea there was a problem, save for the error message...
What if you re-run the above load all night with the manual dom0 workaround? xm sched-sedf <domID> 0 0 0 1 1
Running the manual workaround seems to prevent the errors from occurring altogether. Without it, it still seems more difficult than with previous kernels to produce the errors, but still possible.
*** Bug 186049 has been marked as a duplicate of this bug. ***
There was a spec file problem which prevented the updated scheduler defaults from taking effect in 1.2054; we are preparing an update kernel to fix this.
I'm also seeing this, on a lowly PIII 450 with 768 MB RAM, with no real load on either dom0 or domU. FC5 final, yum updated. Not too much to add, but I didn't see a way to get on the cc list without adding a comment. :-/
This should be fixed with the latest kernels on fedora-updates-testing (currently at 2.6.16-1.2069_FC5), can you confirm? Thanks.
Still happening with 2069 installed on the host and guest. Attached a copy of my console output from the guest.
Created attachment 126892 [details] Console output with Xen0/XenU 2069 installed This is the output from a console after installing the 2069 kernel from Fedora Test Updates.
What if you try the manual setting again: xm sched-sedf <domID> 0 0 0 1 1
I entered the manual fix after getting the errors (I received about 5 error messages in about an hour of testing before entering the manual setting) and once again got no error messages with the manual setting.
Created attachment 127197 [details] Updated patch I've reworked the hypervisor patch to correctly simulate the effect of the manual workaround. I haven't been able to reproduce the problem with the existing patch, but heavy testing with this new patch has not revealed any new problems. Not sure when Xen will be enabled in the rawhide kernel rpm, so you may want to try this patch with an srpm build -- just drop this patch on top of the file xen-sched-sedf.patch in SOURCES.
I applied the patch and recompiled, and I have been running for about 3 days now with the host kept as close to 100% CPU as I could manage the entire time. I performed various tasks with the guest at various times and was unable to produce the errors. It looks like this patch is working well.
Thanks for the testing.