Bug 699684
| Summary: | System freeze with 2.6.35.12-*.fc14.i686.PAE |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | kernel |
| Version: | 14 |
| Hardware: | i686 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | unspecified |
| Priority: | unspecified |
| Keywords: | Reopened |
| Reporter: | Jan ONDREJ <ondrejj> |
| Assignee: | Kernel Maintainer List <kernel-maint> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Fixed In Version: | kernel-2.6.35.14-96.fc14 |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-09-06 23:58:57 UTC |

Description
Jan ONDREJ
2011-04-26 11:12:59 UTC
(In reply to comment #1)
> [96758.097381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
> [96758.097381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500 task.ti=dbbf8000)
> [96758.097381] Stack:
> [96758.097381] Call Trace:
> [96758.097381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00 31 c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90 f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d
> [96823.595381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]
> [96823.595381] Process httpd (pid: 32030, ti=dbbf8000 task=f50fe500 task.ti=dbbf8000)
> [96823.595381] Stack:
> [96823.595381] Call Trace:
> [96823.595381] Code: 83 fa 03 74 05 83 fa 13 75 0f 8d 54 01 18 b9 02 00 00 00 31 c0 89 d7 f3 ab b8 01 00 00 00 5f 5d c3 55 89 e5 0f 1f 44 00 00 eb 02 <f3> 90 f6 00 01 75 f9 5d c3 55 89 e5 53 0f 1f 44 00 00 89 c3 8d
>
> Boot the kernel with the "ignore_loglevel" kernel option to get the entire
> trace, and try to catch it on the serial console again.
Nothing was logged this time; the server was simply dead.
Maybe the previous message had nothing to do with this bug.
How can I collect more information?
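
For anyone triaging a saved serial console log, the watchdog lines quoted above can be pulled out mechanically. This is only a minimal Python sketch and not part of the original report: the function name `find_soft_lockups` is hypothetical, and the regular expression is an assumption modelled on the two "BUG: soft lockup" lines in this comment, so other kernel versions may word the message differently.

```python
import re
from typing import Dict, List

# Assumed message shape, copied from the log quoted in this report:
# "[96758.097381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]"
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]\n]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text: str) -> List[Dict[str, object]]:
    """Return one dict per soft lockup message found in log_text."""
    events = []
    for match in SOFT_LOCKUP.finditer(log_text):
        events.append(
            {
                "timestamp": float(match.group("ts")),   # seconds since boot
                "cpu": int(match.group("cpu")),
                "stuck_seconds": int(match.group("secs")),
                "comm": match.group("comm"),             # process name
                "pid": int(match.group("pid")),
            }
        )
    return events


if __name__ == "__main__":
    sample = (
        "[96758.097381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]\n"
        "[96823.595381] BUG: soft lockup - CPU#3 stuck for 61s! [httpd:32030]\n"
    )
    for event in find_soft_lockups(sample):
        print(event)
```

Grouping the returned events by `cpu` and `pid` would show whether the same task keeps tripping the watchdog, as it does in the two messages above.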
Next crash. When trying to strace the qemu-kvm process, only this is displayed repeatedly:

timer_gettime(0x2, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0x2, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
timer_gettime(0x2, {it_interval={0, 0}, it_value={0, 210462}}) = 0
select(19, [4 7 10 14 15 17 18], [], [], {1, 0}) = 1 (in [15], left {0, 999997})
read(15, "\1\0\0\0\0\0\0\0", 512) = 8
select(19, [4 7 10 14 15 17 18], [], [], {1, 0}) = 1 (in [17], left {0, 999917})
read(17, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4a55a0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f78adbfe4a0}, 8) = 0
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
read(17, 0x7fffc0ba65f0, 128) = -1 EAGAIN (Resource temporarily unavailable)

ls -l /proc/PID/fd:

lr-x------. 1 qemu qemu 64 máj 3 12:41 0 -> /dev/null
l-wx------. 1 qemu qemu 64 máj 3 12:41 1 -> /var/log/libvirt/qemu/stats.log
lrwx------. 1 qemu qemu 64 máj 3 12:41 10 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 11 -> /dev/dm-70
lrwx------. 1 qemu qemu 64 máj 3 12:41 12 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 máj 3 12:41 13 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 máj 3 12:41 14 -> socket:[7424220]
lrwx------. 1 qemu qemu 64 máj 3 12:41 15 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 16 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 17 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 18 -> socket:[8167931]
l-wx------. 1 qemu qemu 64 máj 3 12:41 2 -> /var/log/libvirt/qemu/stats.log
lrwx------. 1 qemu qemu 64 máj 3 12:41 3 -> socket:[7424152]
lrwx------. 1 qemu qemu 64 máj 3 12:41 4 -> /dev/ptmx
lrwx------. 1 qemu qemu 64 máj 3 12:41 5 -> /dev/kvm
lrwx------. 1 qemu qemu 64 máj 3 12:41 6 -> anon_inode:kvm-vm
lrwx------. 1 qemu qemu 64 máj 3 12:41 63 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 64 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 65 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 66 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 máj 3 12:41 7 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 8 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 máj 3 12:41 9 -> /dev/dm-0

How to collect more data?

Can you see if this still happens in 2.6.35.13-91 ?

4 days of uptime with this new kernel, so it looks like this was already fixed. Closing the bug until it happens again.

[root@www ~]# uptime
12:02:11 up 4 days, 3:28, 1 user, load average: 0.17, 0.23, 0.28
[root@www ~]# uname -a
Linux www.upjs.sk 2.6.35.13-91.fc14.i686 #1 SMP Tue May 3 13:36:36 UTC 2011 i686 i686 i386 GNU/Linux

It looks like I tested it with the non-PAE kernel. It works with the standard 32-bit non-PAE kernel, but hangs again after switching to PAE. This kernel died 2 times today:

Linux www.upjs.sk 2.6.35.13-91.fc14.i686.PAE #1 SMP Tue May 3 13:29:55 UTC 2011 i686 i686 i386 GNU/Linux

Reopening bug. How to collect more information?

Looks like this is definitely a PAE problem. All non-PAE 32-bit and 64-bit kernels work well.

My host system: 2.6.34.8-68.fc13.x86_64 (qemu virtualization)
Last working guest kernel: 2.6.35.11-83.fc14.i686.PAE
Problematic PAE kernels:
kernel-PAE-2.6.35.12-90.fc14.i686
kernel-PAE-2.6.35.13-91.fc14.i686

What changed in kernel-PAE-2.6.35.12 ?

Apparently this is caused by:
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=4981d01eada5354d81c8929d5b2836829ba3df7b

The above went in 2.6.35.12, but it looks like this commit is also needed:
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=a79e53d85683c6dd9f99c90511028adc2043031f

That one got silently dropped because it needs backporting to 2.6.35.

Any progress with this bug in current fc14 kernels?

Created attachment 506192
Backport of commit a79e53d85683c6dd9f99c90511028adc2043031f

Ping? Still nothing? Can I help?

kernel-2.6.35.14-95.fc14 is out. Is this problem solved?

This patch was not included because I forgot to change the bug status to show that a patch was available. Sorry about that. The fix will be in 2.6.35.14-96.

kernel-2.6.35.14-96.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14

Package kernel-2.6.35.14-96.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.35.14-96.fc14'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14
then log in and leave karma (feedback).

kernel-2.6.35.14-96.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
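
The version history in this report can be turned into a quick check of whether a given Fedora 14 guest kernel falls in the affected range. The following is a rough Python sketch under stated assumptions, not an official tool: the names `parse_release` and `is_affected` are hypothetical, the release string is assumed to look like the `uname` output shown in this report, and every PAE build from 2.6.35.12-90 up to (but not including) 2.6.35.14-96 is assumed to be affected, which this report confirms only for the builds it names.

```python
import re
from typing import Optional, Tuple

# Boundaries taken from this bug report (assumed to bracket the whole range):
# 2.6.35.12-90 is the first PAE build reported to hang,
# 2.6.35.14-96 is the build listed under "Fixed In Version".
FIRST_BAD = ((2, 6, 35, 12), 90)
FIRST_FIXED = ((2, 6, 35, 14), 96)

# Matches e.g. "2.6.35.13-91.fc14.i686.PAE" or "2.6.35.13-91.fc14.i686".
RELEASE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)\.fc14\.i686(\.PAE)?$")


def parse_release(release: str) -> Optional[Tuple[Tuple[int, ...], int, bool]]:
    """Split a release string into (version tuple, build number, is_pae)."""
    match = RELEASE.match(release)
    if match is None:
        return None  # not a Fedora 14 i686 kernel release string
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2)), match.group(3) is not None


def is_affected(release: str) -> bool:
    """True if the release is a PAE build inside the assumed affected range."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    version, build, is_pae = parsed
    return is_pae and FIRST_BAD <= (version, build) < FIRST_FIXED


if __name__ == "__main__":
    for name in (
        "2.6.35.11-83.fc14.i686.PAE",
        "2.6.35.13-91.fc14.i686.PAE",
        "2.6.35.13-91.fc14.i686",
        "2.6.35.14-96.fc14.i686.PAE",
    ):
        print(name, is_affected(name))
```

On a running guest, the string returned by Python's `platform.release()` could be passed to `is_affected` directly.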