Bug 466630
| Summary: | processes in rawhide guest hang in schedule_timeout (problem with timer interrupts?) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Layton <jlayton> |
| Component: | xen | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.3 | CC: | clalance, mathieu-acct, steved, xen-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-07-31 11:10:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jeff Layton
2008-10-11 20:48:36 UTC
BTW, I have a core dump from the guest in this state if it's helpful. I also seem to be able to reproduce this within a day or so of running the reproducer for the CIFS problem. Let me know if you need other info or access to the dom0 and domU in order to troubleshoot this.

Collecting a bit more data while I have the guest in this state. Doing:

```
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745073, 353702315}) = 0
Sat Oct 11 13:11:13 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745074, 225444917}) = 0
Sat Oct 11 13:11:14 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745070, 174429480}) = 0
Sat Oct 11 13:11:10 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745070, 706055296}) = 0
Sat Oct 11 13:11:10 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745071, 144712846}) = 0
Sat Oct 11 13:11:11 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745071, 506527926}) = 0
Sat Oct 11 13:11:11 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745071, 894165569}) = 0
Sat Oct 11 13:11:11 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745072, 218262996}) = 0
Sat Oct 11 13:11:12 EDT 2008
[root@dhcp231-229 ~]# strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {1223745072, 517668451}) = 0
Sat Oct 11 13:11:12 EDT 2008
```

...so here the seconds field is jumping backward and forward all over the place. Unfortunately, I can't install debuginfo right now so that I can look at xtime directly. I'll make it a point to get debuginfo installed before I reproduce this again, so we might be able to gather more info.

Booted without hpet=disable and the problem seems to occur much more quickly.
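As an aside, the backwards jumps in the strace output above are easy to flag mechanically. This is a minimal sketch (not part of the original report; the helper name is my own) that replays the sampled `(seconds, nanoseconds)` pairs from CLOCK_REALTIME and reports where the clock stepped backwards:

```python
def detect_backward_jumps(samples):
    """Given (sec, nsec) CLOCK_REALTIME samples taken in order,
    return the indices at which time went backwards."""
    jumps = []
    for i in range(1, len(samples)):
        # Tuple comparison orders by seconds first, then nanoseconds.
        if samples[i] < samples[i - 1]:
            jumps.append(i)
    return jumps

# The first few samples from the strace output above:
samples = [
    (1223745073, 353702315),
    (1223745074, 225444917),
    (1223745070, 174429480),  # the clock steps back ~4 seconds here
    (1223745070, 706055296),
    (1223745071, 144712846),
]
print(detect_backward_jumps(samples))  # → [2]
```

A sane CLOCK_REALTIME can legitimately be stepped (e.g. by NTP), but repeated multi-second back-and-forth jumps like these point at broken timekeeping underneath.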
I did:

```
# echo t > /proc/sysrq-trigger
```

multiple times and got the jiffies value for each CPU each time. It never changed:

```
[root@dhcp231-229 ~]# echo t > /proc/sysrq-trigger
[root@dhcp231-229 ~]# dmesg | grep jiffies
  .jiffies                       : 4300384685
  .jiffies                       : 4300384685
[root@dhcp231-229 ~]# echo t > /proc/sysrq-trigger
[root@dhcp231-229 ~]# dmesg | grep jiffies
  .jiffies                       : 4300384685
  .jiffies                       : 4300384685
[root@dhcp231-229 ~]# echo t > /proc/sysrq-trigger
[root@dhcp231-229 ~]# dmesg | grep jiffies
  .jiffies                       : 4300384685
  .jiffies                       : 4300384685
```

...though maybe this is expected with a tickless kernel. I'm going to try booting with nohz=0 and see whether it works around the problem at all...

Booting the guest with nohz=0 allowed the reproducer to run overnight without a hang. So whatever this problem is, it does seem to be related to using a tickless guest kernel. That said, the guest seems to have lost ~13 minutes of wallclock time in the ~7.5 hours since I booted it:

```
dom0$ date
Sun Oct 12 06:51:09 EDT 2008

domU# date
Sun Oct 12 06:38:05 EDT 2008
```

I should also mention that the dom0 is running 2.6.18-117.el5.jtltest.51xen. This is basically a -117.el5 kernel with a few extra patches (mostly CIFS and NFS stuff).

Hand-wavy hypothesis: the guest is occasionally losing timer interrupts. On a "normal" guest this still mostly works, but manifests itself as clock drift vs. the dom0. On a tickless guest, we rely on a particular interrupt both to wake up processes and to reset the timer for the next interrupt. Losing that interrupt is more critical, and processes end up stuck when timers can't pop.

Sounds like maybe part of the hardware emulation that is needed to make tickless work right is not doing the right thing. Jeff, does "xm dmesg" in dom0 show any error messages around the time your tickless FV guest gets stuck?

I don't think so. Here's a snip of the tail of "xm dmesg".
Note that I've reproduced this problem several times in the domU since rebooting the dom0:

```
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) Freed 100kB init memory.
(XEN) ioapic_guest_write: apic=0, pin=4, old_irq=4, new_irq=4
(XEN) ioapic_guest_write: old_entry=000009f1, new_entry=000109f1
(XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ!
```

Still, I've set up the reproducer again and will verify whether any new messages pop up once the problem shows up.

Confirmed. I was able to reproduce the lockup, and no messages popped up in "xm dmesg". The interesting bit is that the host ran for a *long* time booted with no extra kernel parms and it didn't lock up. I then booted it with hpet=disable and it reproduced after just a little while. Yesterday, I was able to reproduce it faster when booted without that parm. Very inconsistent, unfortunately :( Any other info you'd like me to gather while the guest is in this state?

The guest ran the test overnight without the hpet=disable parm on the kernel command line. I'm fairly sure that I did see it lock up before with that parameter set, so I'm not sure why it's not doing so now. In any case, I'm rerunning the test now with hpet=disable set:

```
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
```

I've had more consistent results reproducing it this way. Don mentioned booting the guest with "divider=10". Once I reconfirm to myself that I can reproduce the problem, I'll retry it with that.

Was able to reproduce the lockup yesterday after about 4 hours of the reproducer running. I just booted the guest with divider=10, but I don't see any indication that the parameter was recognized. Still, rerunning the test now...
I was also able to reproduce the hang today with divider=10. The complete kernel command line is:

```
ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 hpet=disable divider=10
```

The guest kernel is 2.6.27-3.fc10.x86_64.debug. Based on a cursory look at the code, I sort of suspect that the tick divider isn't in rawhide kernels, but that's not really an area I'm that familiar with.

I've still not been able to reproduce this again without the hpet disabled. I'm going to rerun the reproducer tonight and try that again.

The guest ran the reproducer overnight using the hpet timer:

```
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
```

...so either I was mistaken that I had booted the guest with the hpet timer before, or it's just harder to reproduce there and I got lucky. Either way, it's pretty consistently reproducible within 8 hours or so when the guest uses the acpi_pm timer.

Ran the reproducer overnight last night and the box was hung this morning. This time I was using the hpet timer.
I have a core if it'll be helpful:

```
crash> p saved_command_line
saved_command_line = $10 = 0xffff8800015f3580 "ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200"

crash> bt
PID: 22188  TASK: ffff880018cc8000  CPU: 1   COMMAND: "umount.cifs"
 #0 [ffff880003681d88] trace_hardirqs_on at ffffffff81067af0
 #1 [ffff880003681da0] trace_hardirqs_on_caller at ffffffff81067abf
 #2 [ffff880003681dc0] trace_hardirqs_on at ffffffff81067af0
 #3 [ffff880003681df0] __mod_timer at ffffffff8104fb44
 #4 [ffff880003681e40] schedule_timeout at ffffffff8138a01e
 #5 [ffff880003681eb0] schedule_timeout_uninterruptible at ffffffff8138a072
 #6 [ffff880003681ec0] msleep at ffffffff8104fb6f
 #7 [ffff880003681ed0] cifs_umount_begin at ffffffffa01d44bb
 #8 [ffff880003681ef0] sys_umount at ffffffff810ea2d9
 #9 [ffff880003681f80] system_call_fastpath at ffffffff8101139a
    RIP: 000000300fae9ab7  RSP: 00007fff685e4b78  RFLAGS: 00010206
    RAX: 00000000000000a6  RBX: ffffffff8101139a  RCX: 4f4800662d003173
    RDX: 00000000ff534d42  RSI: 0000000000000001  RDI: 00007fff685e6968
    RBP: 00007fff685e4c50   R8: fefefefefefefeff   R9: 0000000000000001
    R10: 00007fff685e4900  R11: 0000000000000206  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fff685e4d20  R15: 0000000000400ac0
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
```

...a lot of other processes are stuck in schedule_timeout with similar stack traces. The guest kernel is 2.6.27-13.fc10.x86_64.debug.

This is a bit of a long shot, but it wouldn't hurt to try out my test RPMs to see whether this is a duplicate of bugs 480689 and 449346. This bug could be caused by a combination of two main factors:

- while doing disk IO, one VCPU of an HVM guest can miss timer ticks
- Xen did not re-deliver those missed timer ticks later on, causing clock skew between VCPUs inside an HVM guest

Both of these issues should be resolved by the backport of the AIO disk handling code and the upstream Xen 'no missed-tick accounting' timer code.
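To illustrate why a lost tick is so much worse for a tickless guest (as hypothesized earlier in this bug), here is a toy simulation of my own, not from the report: a periodic timer keeps firing on its own even if one delivery is dropped, while a one-shot (tickless) timer is only re-armed by its own interrupt handler, so one lost delivery means no further wakeups at all. All names here are hypothetical illustration:

```python
def delivered_wakeups(one_shot, lost, horizon):
    """Simulate timer wakeups delivered to a guest over `horizon` ticks.
    `lost` is a set of tick numbers the hypervisor fails to deliver."""
    delivered = []
    armed = 0  # next tick the one-shot timer is programmed for
    for t in range(horizon):
        if one_shot:
            if t == armed:
                if t in lost:
                    # Nobody re-arms the timer: the guest stops getting
                    # wakeups, and sleepers in schedule_timeout never pop.
                    return delivered
                delivered.append(t)
                armed = t + 1  # the handler re-arms for the next tick
        else:
            # Periodic hardware fires every tick regardless of what the
            # guest saw last time; one lost tick is just one lost tick.
            if t not in lost:
                delivered.append(t)
    return delivered

# Losing tick 3: the periodic guest drifts by one tick,
# the tickless guest stops waking up entirely.
print(len(delivered_wakeups(False, {3}, 10)))  # → 9
print(len(delivered_wakeups(True,  {3}, 10)))  # → 3
```

This is consistent with the symptoms above: with nohz=0 the guest merely drifts (~13 minutes over ~7.5 hours), while the tickless guest ends up with processes stuck in schedule_timeout.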
Please test the test RPMs from http://people.redhat.com/riel/.xenaiotime/ and let us know whether those (experimental!) test packages resolve the issue.

Oh goodness... I had forgotten about this bug...

My rawhide guests are currently pretty stable with the hpet timer, but I'm booting them with nohz=off. I'll plan to do some testing today with a tickless guest and see how it behaves.

(In reply to comment #16)
> Oh goodness...I had forgotten about this bug...
>
> My rawhide guests are currently pretty stable with the hpet timer, but I'm
> booting them with nohz=off. I'll plan to do some testing today with a tickless
> guest and see how it behaves.

Heh, no problem. I was just going through the list, looking at what we need to do for 5.5, and I came across it. Thanks for doing the additional testing.

Chris Lalancette

I stress tested my rawhide guest the other day for several hours and never saw a hang. The clock seems to drift a lot more without the nohz=off option, but it didn't hang on me.

I suggest we go ahead and close this as fixed, and I'll plan to reopen it if the problem resurfaces.

(In reply to comment #18)
> I stress tested my rawhide guest the other day for several hours and never saw
> a hang. The clock seems to drift a lot more without the nohz=off option, but it
> didn't hang on me.
>
> I suggest we go ahead and close this as being fixed, and I'll plan to reopen it
> if the problem resurfaces.

OK, great, thanks a lot Jeff. If you want to open a bug about the nohz time drift problem, feel free to do so.

Chris Lalancette

This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).