Bug 681133
Summary: | RHEL 5.6 32bit SMP guest hang at boot up |
---|---|
Product: | Red Hat Enterprise Linux 6 |
Reporter: | Joy Pu <ypu> |
Component: | kernel |
Assignee: | Zachary Amsden <zamsden> |
Status: | CLOSED ERRATA |
QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high |
Priority: | high |
Version: | 6.1 |
CC: | arozansk, gcosta, jjarvis, jkachuck, khong, michen, mkenneth, tburke, virt-maint, zamsden |
Target Milestone: | rc |
Keywords: | TestBlocker, Triaged |
Hardware: | Unspecified |
OS: | Unspecified |
Fixed In Version: | kernel-2.6.32-130.el6 |
Doc Type: | Bug Fix |
Last Closed: | 2011-05-23 20:40:34 UTC |
Bug Blocks: | 580951, 684385 |
Description
Joy Pu
2011-03-01 08:21:33 UTC
Created attachment 481551 [details]
Whole log for kvm trace
I also see this problem on some machines without the install process. The 32-bit RHEL 5.6 guest hangs while booting on a RHEL 6.1 host, but not on a RHEL 5.6 host, so I set the priority to high.

Please help R&D analyze this; it's too hard to drill down into the issue with so many parameters. Please drop -kernel, -initrd, -qxl and the other devices and test whether it works. If it does, bisect the devices to find the faulty one.

(In reply to comment #4)
> Please help rnd analyze this - it's too hard to drill down the issue with so
> many parameters. Please drop the -kernel -initrd, -qxl and other devices and
> test if it works. If it does, bisect the devices to know who is the faulty one.

It seems the key parameter is -smp. I tried the command line below both with -smp 2,cores=1,threads=1,sockets=2 and without -smp. The guest hangs only when -smp 2,cores=1,threads=1,sockets=2 is used.

command line:

    /usr/auto/test/autotest-devel/client/tests/kvm/qemu -name 'vm1' -chardev socket,id=human_monitor_PulE,path=/tmp/monitor-humanmonitor1-20110301-110905-SWMy,server,nowait -mon chardev=human_monitor_PulE,mode=readline -chardev socket,id=serial_nYLc,path=/tmp/serial-20110301-110905-SWMy,server,nowait -device isa-serial,chardev=serial_nYLc -drive file='/usr/auto/test/autotest-devel/client/tests/kvm/images/RHEL-Server-5.6-32.qcow2',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,format=qcow2,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device rtl8139,netdev=idkrV1T7,mac=9a:2e:3f:52:5b:9e,id=ndev00idkrV1T7,bus=pci.0,addr=0x3 -netdev tap,id=idkrV1T7,ifname='t0-110905-SWMy',script='/usr/auto/test/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2046 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

I also used gdb to get the thread info for the hanging guest, interrupting the process once the guest had already hung.
Here are the results:

    (gdb) c
    Continuing.
    [Thread 0x7f07fedfe700 (LWP 17315) exited]
    [New Thread 0x7f07fedfe700 (LWP 17321)]
    [Thread 0x7f07fedfe700 (LWP 17321) exited]
    ^C
    Program received signal SIGINT, Interrupt.
    0x0000003ae10de923 in select () from /lib64/libc.so.6
    (gdb) bt
    #0 0x0000003ae10de923 in select () from /lib64/libc.so.6
    #1 0x000000000040b8d0 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4421
    #2 0x000000000042b35a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2164
    #3 0x000000000040eeb5 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4638
    #4 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6852
    (gdb) info thread
      4 Thread 0x7f0805b54700 (LWP 17302) 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
      3 Thread 0x7f0805150700 (LWP 17303) 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
    * 1 Thread 0x7f0805d7a940 (LWP 17284) 0x0000003ae10de923 in select () from /lib64/libc.so.6
    (gdb) thread 3
    [Switching to thread 3 (Thread 0x7f0805150700 (LWP 17303))]#0 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
    (gdb) bt
    #0 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
    #1 0x000000000042cf4f in kvm_run (env=0x1824010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:927
    #2 0x000000000042d3d9 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1663
    #3 0x000000000042e11f in kvm_main_loop_cpu (_env=0x1824010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1931
    #4 ap_main_loop (_env=0x1824010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1981
    #5 0x0000003ae18077e1 in start_thread () from /lib64/libpthread.so.0
    #6 0x0000003ae10e5dcd in clone () from /lib64/libc.so.6
    (gdb) thread 4
    [Switching to thread 4 (Thread 0x7f0805b54700 (LWP 17302))]#0 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
    (gdb) bt
    #0 0x0000003ae10dde87 in ioctl () from /lib64/libc.so.6
    #1 0x000000000042cf4f in kvm_run (env=0x180ae70) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:927
    #2 0x000000000042d3d9 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1663
    #3 0x000000000042e11f in kvm_main_loop_cpu (_env=0x180ae70) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1931
    #4 ap_main_loop (_env=0x180ae70) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1981
    #5 0x0000003ae18077e1 in start_thread () from /lib64/libpthread.so.0
    #6 0x0000003ae10e5dcd in clone () from /lib64/libc.so.6
    (gdb)

We have an upstream report of a similar problem: 32-bit SMP guests hang, and the hang bisected to a patch of mine. It's very likely related. I looked at the RHEL6 logic and it looks like my patches went in fine. To rule this out or confirm it as the suspect, I will prepare a brew build without my patches.

*** Bug 688951 has been marked as a duplicate of this bug. ***

=== In Red Hat Customer Portal Case 00434734 ===
--- Comment by IBM bug, proxy on 3/23/2011 2:31 AM ---
------- Comment From santwana.samantray.com 2011-03-23 02:28 EDT-------
Hi All, I verified this issue on RHEL6.1 Beta (k.v-2.6.32-122.el6), and the issue is still reproducible. While installing a RHEL5.6 32-bit guest using qemu-kvm, the installation halts. During installation of a RHEL5.6 64-bit guest, this issue isn't noticed. Thanks, Santwana

=== In Red Hat Customer Portal Case 00434734 ===
--- Comment by IBM bug, proxy on 3/24/2011 8:21 AM ---
------- Comment From santwana.samantray.com 2011-03-24 08:17 EDT-------
Hello, after a lot of trials using qemu-kvm as well as virt-manager, I conclude that the issue is related to vcpus. When assigning just a single cpu to the guest, installation works fine using qemu-kvm as well as virt-manager. On allocating vcpu > 1, the installation gets stuck. SELinux is in Permissive mode.
Installation also works fine with a single cpu when using an NFS-mounted path for the ISO. The host is an 8-processor Intel(R) Xeon(R) system. This issue isn't noticed in a 64-bit guest, i.e. installation works fine even with vcpus > 1. Thanks, Santwana

I've posted a patch which should at least fix the regression, although I'm not completely satisfied that even that solves the entire problem. We may have a more subtle bug lying underneath this that just got exposed.

Created attachment 487983 [details]
Created an attachment for dmesg, /var/log/messages and sosreport of the host, also xml file of the guest
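
Both the reporter's -smp experiment and the IBM trials point at the guest vCPU count as the trigger. When triaging similar reports it can help to normalize the -smp value before comparing runs. The following is only a minimal sketch: `parse_smp`, `is_consistent` and `is_smp` are hypothetical helper names, not part of QEMU, and the parsing is deliberately simplified (real QEMU accepts more options and fills in defaults differently).

```python
def parse_smp(arg):
    """Parse a QEMU -smp value such as '2,cores=1,threads=1,sockets=2'.

    Simplified illustration: every value is assumed to be an integer.
    """
    fields = arg.split(",")
    topology = {}
    # The first field may be a bare vCPU count ('2') instead of 'cpus=2'.
    if fields and "=" not in fields[0]:
        topology["cpus"] = int(fields[0])
        fields = fields[1:]
    for field in fields:
        key, _, value = field.partition("=")
        topology[key] = int(value)
    return topology


def is_consistent(topology):
    """True when sockets * cores * threads matches the vCPU count."""
    product = (topology.get("sockets", 1)
               * topology.get("cores", 1)
               * topology.get("threads", 1))
    return product == topology.get("cpus", product)


def is_smp(topology):
    """True when the guest gets more than one vCPU (the failing case here)."""
    return topology.get("cpus", 1) > 1


if __name__ == "__main__":
    smp = parse_smp("2,cores=1,threads=1,sockets=2")
    print(smp, is_consistent(smp), is_smp(smp))
```

With the value from the failing command line, the topology is internally consistent (2 sockets x 1 core x 1 thread = 2 vCPUs), so the hang is not explained by a malformed -smp argument.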
------- Comment From santwana.samantray.com 2011-04-05 08:47 EDT-------
Hello Redhat, this issue is still reproducible with kernel version 2.6.32-125.el6.x86_64.

Yes, the bug is still there; the patch has not been applied.

------- Comment From markwiz.com 2011-04-06 12:48 EDT-------
This bug blocks around 9% of our tests. If it is not included in Snap2, there is no way we can complete our testing before the End of Partner testing date. Is there a place we can get the kernel with the patch applied so we can test it earlier?

Created attachment 490339 [details]
Patch fixing the problem
I've attached a patch which fixes the problem. The patch is fairly trivial, but it is apparently waiting on some ack or flags to be set before being included in the release.

Patch(es) available on kernel-2.6.32-130.el6

This fix is approved and planned for inclusion in snapshot 3.

Can reproduce with kernel-2.6.32-128.el6, and kernel-2.6.32-130.el6 works well.

------- Comment From santwana.samantray.com 2011-04-08 02:08 EDT-------
Hello Redhat, I verified this issue on the Snap2 kernel (2.6.32-128.el6.x86_64) after applying the patch, and the RHEL5.6 32-bit guest installation works fine with vcpus > 1.

------- Comment From pradeepkumars.com 2011-04-08 13:02 EDT-------
*** Bug 71382 has been marked as a duplicate of this bug. ***

------- Comment From pradeepkumars.com 2011-04-09 23:24 EDT-------
*** Bug 71434 has been marked as a duplicate of this bug. ***

Moving to VERIFIED based on comment #25 and comment #26.

------- Comment From santwana.samantray.com 2011-04-20 02:59 EDT-------
Hello Redhat, I verified this issue with RHEL6.1-Snap3 (k.v-2.6.32-130.el6), and the issue seems to be fixed. We can install the RHEL5.6 32-bit guest with smp > 1. Thanks for your support, Santwana

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
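
The comments above report the hang on stock builds 122, 125 and 128 and the fix in kernel-2.6.32-130.el6. A host's kernel string could be checked against that build number roughly as follows. This is a sketch under stated assumptions: `el6_build_number` and `has_fix` are hypothetical names, the regular expression only understands 2.6.32-based el6 version strings, and a manually patched older kernel (as in the Snap2 verification) would not be recognized.

```python
import re

# Taken from "Fixed In Version: kernel-2.6.32-130.el6" in this report.
FIXED_BUILD = 130


def el6_build_number(version):
    """Extract the build number from e.g. 'kernel-2.6.32-128.el6.x86_64'."""
    match = re.search(r"2\.6\.32-(\d+)(?:\.\d+)*\.el6", version)
    if match is None:
        raise ValueError("not a 2.6.32 el6 kernel version: %r" % version)
    return int(match.group(1))


def has_fix(version):
    """True when a stock el6 kernel build is at or past the fixed build."""
    return el6_build_number(version) >= FIXED_BUILD


if __name__ == "__main__":
    for v in ("k.v-2.6.32-122.el6", "2.6.32-128.el6.x86_64",
              "kernel-2.6.32-130.el6"):
        print(v, has_fix(v))
```

On a running host, the output of `uname -r` could be passed to `has_fix` in the same way.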