Description of problem:
After updating to a 2.6.35.12 kernel, my KVM virtual machine (guest) hangs at random times.

Version-Release number of selected component (if applicable):
crashing guest:
kernel-PAE-2.6.35.12-88.fc14.i686
kernel-PAE-2.6.35.12-90.fc14.i686
host:
kernel-2.6.34.8-68.fc13.x86_64
qemu-kvm-0.13.0-1.fc13.x86_64

How reproducible:
approx. 5 crashes so far, on 2 virtual machines

Steps to Reproduce:
Not sure how to reproduce this. For me it is enough to boot the guest and wait a few hours.

Actual results:
The system is dead: no ping to the machine, the virtual VGA console is blank, and the serial console is dead too, but after one hang I saw this on the serial console output:

[96758.097381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
[96758.097381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500 task.ti=dbbf8000)
[96758.097381] Stack:
[96758.097381] Call Trace:
[96758.097381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00 31 c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90 f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d
[96823.595381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
[96823.595381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500 task.ti=dbbf8000)
[96823.595381] Stack:
[96823.595381] Call Trace:
[96823.595381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00 31 c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90 f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d

Expected results:
no crash

Additional info:
What else can I do to debug this? Right now one virtual machine has been running for 2 days, but the second one died twice yesterday. After booting back into 2.6.35.11 everything works well again; both stable 2.6.35.12 kernels die.
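For anyone trying to catch the next lockup: a minimal sketch of keeping an eye on the guest's serial console from the host, assuming a libvirt-managed guest (the name "www" below is only a placeholder) that already has a serial device defined in its domain XML:

# on the host: attach to the guest's serial console and leave it running,
# e.g. inside screen/tmux, so any oops output is not lost
virsh console www

# in the guest: make the kernel log to ttyS0 as well as the VGA console by
# appending this to the kernel line in /boot/grub/grub.conf:
#   console=tty0 console=ttyS0,115200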
(In reply to comment #1)
> [96758.097381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
> [96758.097381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500
> task.ti=dbbf8000)
> [96758.097381] Stack:
> [96758.097381] Call Trace:
> [96758.097381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00
> 31 c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90
> f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d
> [96823.595381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
> [96823.595381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500
> task.ti=dbbf8000)
> [96823.595381] Stack:
> [96823.595381] Call Trace:
> [96823.595381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00
> 31
> c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90
> f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d
>

Boot the kernel with the "ignore_loglevel" kernel option to get the entire trace, and try to catch it on the serial console again.
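For reference, a minimal sketch of what that looks like with grub legacy on Fedora 14; the kernel version and root= value below are placeholders, the only point is that the option ends up on the guest's kernel command line:

# /boot/grub/grub.conf in the guest, kernel line with the extra options appended:
#   kernel /vmlinuz-2.6.35.12-90.fc14.i686.PAE ro root=/dev/mapper/vg-root console=tty0 console=ttyS0,115200 ignore_loglevel

# after the next boot, confirm the options took effect:
cat /proc/cmdline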
> Boot the kernel with the "ignore_loglevel" kernel option to get the entire
> trace, and try to catch it on the serial console again.

Nothing was logged this time; the server was simply dead. Maybe the previous message had nothing to do with this bug. How can I collect more information?
Next crash. When trying to strace the qemu-kvm process, only this is displayed repeatedly:

timer_gettime(0x2, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0x2, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
timer_gettime(0x2, {it_interval={0, 0}, it_value={0, 210462}}) = 0
select(19, [4 7 10 14 15 17 18], [], [], {1, 0}) = 1 (in [15], left {0, 999997})
read(15, "\1\0\0\0\0\0\0\0", 512) = 8
select(19, [4 7 10 14 15 17 18], [], [], {1, 0}) = 1 (in [17], left {0, 999917})
read(17, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4a55a0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f78adbfe4a0}, 8) = 0
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
read(17, 0x7fffc0ba65f0, 128) = -1 EAGAIN (Resource temporarily unavailable)

ls -l /proc/PID/fd:

lr-x------. 1 qemu qemu 64 máj 3 12:41 0 -> /dev/null
l-wx------. 1 qemu qemu 64 máj 3 12:41 1 -> /var/log/libvirt/qemu/stats.log
lrwx------. 1 qemu qemu 64 máj 3 12:41 10 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 11 -> /dev/dm-70
lrwx------. 1 qemu qemu 64 máj 3 12:41 12 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 máj 3 12:41 13 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 máj 3 12:41 14 -> socket:[7424220]
lrwx------. 1 qemu qemu 64 máj 3 12:41 15 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 16 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 17 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 18 -> socket:[8167931]
l-wx------. 1 qemu qemu 64 máj 3 12:41 2 -> /var/log/libvirt/qemu/stats.log
lrwx------. 1 qemu qemu 64 máj 3 12:41 3 -> socket:[7424152]
lrwx------. 1 qemu qemu 64 máj 3 12:41 4 -> /dev/ptmx
lrwx------. 1 qemu qemu 64 máj 3 12:41 5 -> /dev/kvm
lrwx------. 1 qemu qemu 64 máj 3 12:41 6 -> anon_inode:kvm-vm
lrwx------. 1 qemu qemu 64 máj 3 12:41 63 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 64 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 65 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 66 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 7 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 8 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 9 -> /dev/dm-0

How can I collect more data?
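For completeness, roughly the host-side commands behind the output above; the PID is that of the hung guest's qemu-kvm process and differs on every run:

ps -C qemu-kvm -o pid,args      # one way to find the qemu-kvm PID of the hung guest
strace -p <PID>                 # attach and watch the repeating syscall loop
ls -l /proc/<PID>/fd            # map the fd numbers seen in the strace output

Attaching gdb to the same PID (gdb -p <PID>, then "thread apply all bt") might also show where the vcpu threads are stuck, if that would help.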
Can you see if this still happens in 2.6.35.13-91?
4 days of uptime with this new kernel, so it looks like this was already fixed. Closing the bug until it happens again.

[root@www ~]# uptime
 12:02:11 up 4 days, 3:28, 1 user, load average: 0.17, 0.23, 0.28
[root@www ~]# uname -a
Linux www.upjs.sk 2.6.35.13-91.fc14.i686 #1 SMP Tue May 3 13:36:36 UTC 2011 i686 i686 i386 GNU/Linux
It looks like I had tested it with a non-PAE kernel: the standard 32-bit non-PAE kernel works, but the guest hangs again after switching back to PAE. This kernel died twice today:

Linux www.upjs.sk 2.6.35.13-91.fc14.i686.PAE #1 SMP Tue May 3 13:29:55 UTC 2011 i686 i686 i386 GNU/Linux

Reopening the bug. How can I collect more information?
It looks like this is definitely a PAE problem. All non-PAE 32-bit and 64-bit kernels work well.

My host system: 2.6.34.8-68.fc13.x86_64 (qemu virtualization)

Last working guest kernel:
2.6.35.11-83.fc14.i686.PAE

Problematic PAE kernels:
kernel-PAE-2.6.35.12-90.fc14.i686
kernel-PAE-2.6.35.13-91.fc14.i686

What changed in kernel-PAE-2.6.35.12?
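For what it's worth, the package changelogs should list what went into each build; a quick way to compare them in the guest, assuming both builds are still installed:

rpm -q --changelog kernel-PAE-2.6.35.11-83.fc14 > /tmp/changelog-83
rpm -q --changelog kernel-PAE-2.6.35.12-90.fc14 > /tmp/changelog-90
diff -u /tmp/changelog-83 /tmp/changelog-90 | less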
Apparently this is caused by:

http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=4981d01eada5354d81c8929d5b2836829ba3df7b

The above went into 2.6.35.12, but it looks like this commit is also needed:

http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=a79e53d85683c6dd9f99c90511028adc2043031f

That one was silently dropped because it needs backporting to 2.6.35.
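In case anyone wants to double-check this locally, a rough sketch; it assumes a clone of a git tree carrying the 2.6.35.y stable tags, and the grep patterns are only my reading of the two commit subjects:

# the PAE TLB-flush change should show up as a backport between .11 and .12
git log --oneline v2.6.35.11..v2.6.35.12 | grep -i "PGD"

# the follow-up fix ("x86/mm: Fix pgd_lock deadlock") should turn up nothing
# if it is indeed missing from the stable series
git log --oneline v2.6.35.11..v2.6.35.13 | grep -i "pgd_lock"

# the missing patch itself can be pulled straight from the commitdiff_plain URL above:
wget -O a79e53d8.patch 'http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=a79e53d85683c6dd9f99c90511028adc2043031f'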
Any progress with this bug in current fc14 kernels?
Created attachment 506192 [details]
Backport of commit a79e53d85683c6dd9f99c90511028adc2043031f
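Until this lands in an official build, a rough sketch of test-building a guest kernel with the attached backport applied; the file and path names here are illustrative only, and -p1 assumes the patch is in the usual upstream format:

cd linux-2.6.35.13                                   # a 2.6.35.13 source tree
patch -p1 --dry-run < ../backport-a79e53d8.patch     # check that it applies cleanly
patch -p1 < ../backport-a79e53d8.patch
cp /boot/config-2.6.35.13-91.fc14.i686.PAE .config   # reuse the distro PAE config
make oldconfig && make -j4 bzImage modules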
Ping? Still nothing? Can I help?
kernel-2.6.35.14-95.fc14 is out. Is this problem solved?
This patch was not included because I forgot to change the bug status to show that a patch was available. Sorry about that. The fix will be in 2.6.35.14-96.
kernel-2.6.35.14-96.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14
Package kernel-2.6.35.14-96.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.35.14-96.fc14'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14
then log in and leave karma (feedback).
kernel-2.6.35.14-96.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.