Bug 587150
Summary: | RHEL-5.5-i386 guest hangs during boot | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Amos Kong <akong>
Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr>
Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high | Priority: | high
Version: | 5.5 | CC: | ailan, gcosta, llim, ndai, tburke
Target Milestone: | rc | Keywords: | Regression, TestBlocker
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2010-06-29 13:45:26 UTC
Bug Blocks: | 562808 | |
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

This bug could not be reproduced with older qemu-kvm builds (0.12.1.2-2.37, 0.12.1.2-2.38, 0.12.1.2-2.39) using host kernel 2.6.32-19.el6.x86_64 and guest kernel 2.6.18-189.el5PAE.

I can't reproduce it with guest kernel 2.6.18-194.el5PAE. Upgrade your host to the latest kernel and, if the hang is still reproducible, try removing "rhgb quiet" from the guest kernel command line to see more clearly where it hangs.

I've tested on other machines with different configurations: the problem (the guest hangs during boot) is easy to reproduce on AMD machines when guest_mem_size equals host_mem_size. The test results (snapshots and serial output) are attached.

guest kernel: 2.6.18-189.el5PAE / 2.6.18-194.el5PAE / 2.6.18-196.el5.i686
host kernel: 2.6.32-26.el6.x86_64
`# rpm -qa |grep qemu`
qemu-img-0.12.1.2-2.51.el6.x86_64
gpxe-roms-qemu-0.9.7-6.3.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.51.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.51.el6.x86_64
qemu-kvm-0.12.1.2-2.51.el6.x86_64

top output (the qemu-kvm process is consuming too much CPU):
Tasks: 113 total, 2 running, 111 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.0%us, 1.5%sy, 0.0%ni, 80.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3858732k total, 581388k used, 3277344k free, 83176k buffers
Swap: 6094840k total, 0k used, 6094840k free, 179020k cached

PID | USER | PR | NI | VIRT | RES | SHR | S | %CPU | %MEM | TIME+ | COMMAND
---|---|---|---|---|---|---|---|---|---|---|---
7699 | root | 40 | 0 | 4616m | 105m | 4304 | R | 193.8 | 2.8 | 1:46.35 | qemu-kvm
7771 | root | 40 | 0 | 14932 | 1040 | 780 | R | 1.9 | 0.0 | 0:00.01 | top
1 | root | 40 | 0 | 19236 | 1404 | 1140 | S | 0.0 | 0.0 | 0:00.80 | init
2 | root | 40 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.01 | kthreadd
3 | root | RT | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.01 | migration/0
4 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | ksoftirqd/0
5 | root | RT | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | watchdog/0
6 | root | RT | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.01 | migration/1
7 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | ksoftirqd/1
8 | root | RT | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | watchdog/1
9 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.04 | events/0
10 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.33 | events/1
11 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | cpuset
12 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | khelper
13 | root | 20 | 0 | 0 | 0 | 0 | S | 0.0 | 0.0 | 0:00.00 | netns

Created attachment 414734
test results with three kinds of guest kernels
The bug can be reproduced.
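Two guest kernel command-line changes come up in this report: removing `rhgb quiet` so boot messages stay visible, and adding `noapic` (the workaround noted in the original description). In a RHEL 5 guest both live on the `kernel` lines of the GRUB legacy config, /boot/grub/grub.conf. A minimal sketch, assuming the usual layout where each boot entry has one `kernel` line; `edit_kernel_args` is a hypothetical helper that prints the edited config to stdout and never modifies the input file itself.

```shell
# Hypothetical helper: rewrite the "kernel" lines of a GRUB legacy config,
# dropping the "rhgb" and "quiet" options and appending any extra options
# passed as $2 (for example "noapic"). Output goes to stdout only.
edit_kernel_args() {
    awk -v extra="$2" '
    $1 == "kernel" {
        # Keep the original indentation of the kernel line.
        match($0, /^[ \t]*/)
        indent = substr($0, 1, RLENGTH)
        line = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "rhgb" || $i == "quiet")
                continue
            line = (line == "") ? $i : (line " " $i)
        }
        if (extra != "")
            line = line " " extra
        print indent line
        next
    }
    # Every other line (title, root, initrd, ...) is copied unchanged.
    { print }
    ' "$1"
}
```

Usage: `edit_kernel_args /boot/grub/grub.conf noapic > /tmp/grub.conf.new`, then review the result before copying it over the original. Writing to a separate file keeps a known-good config around in case the guest no longer boots.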
(In reply to comment #6)
> I've tested on other machines with different configuration, this problem(guest
> hangs when boot up) is easy to be reproduced on AMD machines when
> guest_mem_size equals host_mem_size.

That is the config I am trying to reproduce the hang with.

>
> _Attached_ the test results(snapshot and serial output)
>
> guest kernel: 2.6.18-189.el5PAE / 2.6.18-194.el5PAE / 2.6.18-196.el5.i686
> host kernel: 2.6.32-26.el6.x86_64
> `# rpm -qa |grep qemu`
> qemu-img-0.12.1.2-2.51.el6.x86_64
> gpxe-roms-qemu-0.9.7-6.3.el6.noarch
> qemu-kvm-debuginfo-0.12.1.2-2.51.el6.x86_64
> qemu-kvm-tools-0.12.1.2-2.51.el6.x86_64
> qemu-kvm-0.12.1.2-2.51.el6.x86_64

Try updating the host to the latest RHEL6 kernel and qemu-kvm; I am using the head of the RHEL6 git tree. Also attach the output of `x86info -r` from your host.

>
> ----
> TOP result(qemu-kvm process took too much cpu resource):
> Tasks: 113 total, 2 running, 111 sleeping, 0 stopped, 0 zombie
> Cpu(s): 18.0%us, 1.5%sy, 0.0%ni, 80.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 3858732k total, 581388k used, 3277344k free, 83176k buffers
> Swap: 6094840k total, 0k used, 6094840k free, 179020k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 7699 root 40 0 4616m 105m 4304 R 193.8 2.8 1:46.35 qemu-kvm

What is the output of kvm_stat while qemu hangs? While qemu is hung like that, run `mount -t debugfs debugfs /sys/kernel/debug` and `echo kvm > /sys/kernel/debug/tracing/set_event`. Wait a little, then run `cat /sys/kernel/debug/tracing/trace > /tmp/trace` and attach /tmp/trace here.
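The trace-capture steps requested above can be wrapped in one function so they always run in the same order. A minimal sketch, assuming it is run as root on the host while the guest is hung; `collect_kvm_trace` is a hypothetical name, and its 10-second default is an assumed stand-in for "wait a little", which the thread leaves unspecified. Unlike the original steps, it also switches the kvm events off again once the trace is saved.

```shell
# Hypothetical helper: capture KVM tracepoints from ftrace.
# $1 = debugfs mount point (default /sys/kernel/debug)
# $2 = output file (default /tmp/trace)
# $3 = seconds to let events accumulate (default 10, an assumption)
collect_kvm_trace() {
    dbg=${1:-/sys/kernel/debug}
    out=${2:-/tmp/trace}
    secs=${3:-10}

    # Mount debugfs unless its tracing directory is already visible.
    if [ ! -d "$dbg/tracing" ]; then
        mount -t debugfs debugfs "$dbg" || return 1
    fi

    # Enable all events of the kvm subsystem, as in "echo kvm > set_event".
    echo kvm > "$dbg/tracing/set_event" || return 1
    sleep "$secs"

    # Save the trace buffer using an absolute path under the mount point.
    cat "$dbg/tracing/trace" > "$out" || return 1

    # Writing an empty line to set_event disables all events again.
    echo > "$dbg/tracing/set_event"
}
```

Usage: `collect_kvm_trace /sys/kernel/debug /tmp/trace 30`, then attach /tmp/trace to the bug.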
host kernel: 2.6.32-26.el6.x86_64
guest kernel: 2.6.18-196.el5.i686 / 2.6.18-194.el5PAE

`# rpm -qa |grep qemu`
qemu-img-0.12.1.2-2.54.el6.x86_64
gpxe-roms-qemu-0.9.7-6.2.el6.noarch
qemu-kvm-tools-0.12.1.2-2.54.el6.x86_64
qemu-kvm-0.12.1.2-2.54.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.54.el6.x86_64

Command line:
`# qemu-kvm -drive file=~/RHEL-Server-5.5-32.qcow2,if=ide,cache=none,boot=on -net nic,vlan=0,model=e1000,macaddr=00:30:09:8A:35:c7 -net tap,vlan=0,ifname=e1000_0_6001,script=/root/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 4096 -smp 2 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0 -serial unix:/tmp/serial-20100518-164314-5RfN,server,nowait`

`[root@amd-5400b-4-3 ~]# x86info -r`
x86info v1.25. Dave Jones 2001-2009
Feedback to <davej>.
Found 2 CPUs

CPU #1
EFamily: 0 EModel: 6 Family: 15 Model: 107 Stepping: 2
CPU Model: Athlon 64 X2 Dual-Core (BH-G2)
Processor name string: AMD Athlon(tm) Dual Core Processor 5400B
SVM: revision 1, 64 ASIDs, lbrVirt
Address Size: 48 bits virtual, 40 bits physical
The physical package has 2 of 2 possible cores implemented.

eax in | eax | ebx | ecx | edx
---|---|---|---|---
0x00000000 | 00000001 | 68747541 | 444d4163 | 69746e65
0x00000001 | 00060fb2 | 00020800 | 00002001 | 178bfbff
0x80000000 | 80000018 | 68747541 | 444d4163 | 69746e65
0x80000001 | 00060fb2 | 00000edd | 0000011f | ebd3fbff
0x80000002 | 20444d41 | 6c687441 | 74286e6f | 4420296d
0x80000003 | 206c6175 | 65726f43 | 6f725020 | 73736563
0x80000004 | 3520726f | 42303034 | 00000000 | 00000000
0x80000005 | ff08ff08 | ff20ff20 | 40020140 | 40020140
0x80000006 | 00000000 | 42004200 | 02008140 | 00000000
0x80000007 | 00000000 | 00000000 | 00000000 | 0000007f
0x80000008 | 00003028 | 00000000 | 00000001 | 00000000
0x80000009 | 00000000 | 00000000 | 00000000 | 00000000
0x8000000a | 00000001 | 00000040 | 00000000 | 00000002
0x8000000b to 0x80000018 (each) | 00000000 | 00000000 | 00000000 | 00000000

CPU #2
EFamily: 0 EModel: 6 Family: 15 Model: 107 Stepping: 2
CPU Model: Athlon 64 X2 Dual-Core (BH-G2)
Processor name string: AMD Athlon(tm) Dual Core Processor 5400B
SVM: revision 1, 64 ASIDs, lbrVirt
Address Size: 48 bits virtual, 40 bits physical
The physical package has 2 of 2 possible cores implemented.

eax in | eax | ebx | ecx | edx
---|---|---|---|---
0x00000000 | 00000001 | 68747541 | 444d4163 | 69746e65
0x00000001 | 00060fb2 | 01020800 | 00002001 | 178bfbff
0x80000000 | 80000018 | 68747541 | 444d4163 | 69746e65
0x80000001 | 00060fb2 | 00000edd | 0000011f | ebd3fbff
0x80000002 | 20444d41 | 6c687441 | 74286e6f | 4420296d
0x80000003 | 206c6175 | 65726f43 | 6f725020 | 73736563
0x80000004 | 3520726f | 42303034 | 00000000 | 00000000
0x80000005 | ff08ff08 | ff20ff20 | 40020140 | 40020140
0x80000006 | 00000000 | 42004200 | 02008140 | 00000000
0x80000007 | 00000000 | 00000000 | 00000000 | 0000007f
0x80000008 | 00003028 | 00000000 | 00000001 | 00000000
0x80000009 | 00000000 | 00000000 | 00000000 | 00000000
0x8000000a | 00000001 | 00000040 | 00000000 | 00000002
0x8000000b to 0x80000018 (each) | 00000000 | 00000000 | 00000000 | 00000000

`[root@amd-5400b-4-3 ~]# kvm_stat -1` (guest kernel: 2.6.18-196.el5.i686)
`[root@amd-5400b-4-3 ~]# kvm_stat -1` (guest kernel: 2.6.18-194.el5PAE)

Counter | -196.el5.i686 total | -196.el5.i686 delta | -194.el5PAE total | -194.el5PAE delta
---|---|---|---|---
efer_reload | 0 | 0 | 0 | 0
exits | 2803661 | 7741 | 5786431 | 7650
fpu_reload | 1921 | 0 | 2713 | 0
halt_exits | 78941 | 0 | 82610 | 0
halt_wakeup | 15575 | 0 | 14805 | 0
host_state_reload | 585914 | 1974 | 624029 | 1963
hypercalls | 0 | 0 | 0 | 0
insn_emulation | 1456228 | 1001 | 3739774 | 1002
insn_emulation_fail | 0 | 0 | 0 | 0
invlpg | 171166 | 0 | 1078507 | 0
io_exits | 296742 | 0 | 450944 | 0
irq_exits | 804339 | 6739 | 541870 | 6646
irq_injections | 1309640 | 3771 | 2490139 | 4631
irq_window | 0 | 0 | 0 | 0
largepages | 0 | 0 | 0 | 0
mmio_exits | 29584 | 0 | 51952 | 0
mmu_cache_miss | 5842 | 0 | 39790 | 0
mmu_flooded | 2858 | 0 | 12133 | 0
mmu_pde_zapped | 4476 | 0 | 17212 | 0
mmu_pte_updated | 13256 | 0 | 0 | 0
mmu_pte_write | 29577 | 0 | 43470 | 0
mmu_recycled | 0 | 0 | 0 | 0
mmu_shadow_zapped | 8003 | 0 | 50846 | 0
mmu_unsync | 38 | 0 | 30 | 0
nmi_injections | 0 | 0 | 0 | 0
nmi_window | 0 | 0 | 0 | 0
pf_fixed | 159136 | 0 | 817038 | 0
pf_guest | 19617 | 0 | 174093 | 0
remote_tlb_flush | 35112 | 0 | 468282 | 0
request_irq | 0 | 0 | 0 | 0
signal_exits | 1 | 0 | 1 | 0
tlb_flush | 189320 | 0 | 1141779 | 0

Created attachment 414781
debugfs output when guest hung
Attached is the KVM debugfs output.
It contains two text files:
the first is the output when the guest uses the 2.6.18-196.el5.i686 kernel;
the second is the output when the guest uses the 2.6.18-194.el5PAE kernel.
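Each `kvm_stat -1` line collected in this bug is a counter name followed by two numbers, assumed here to be the running total and the change during kvm_stat's sampling interval. To see which counters are still moving while the guest is hung, that output can be filtered on the last column. A minimal sketch; `moving_counters` is a hypothetical helper, and the three-column `name total delta` layout is an assumption taken from the output above.

```shell
# Hypothetical filter for "kvm_stat -1" output read on stdin.
# Each line is "<counter> <total> <delta>"; print the counters whose
# delta is non-zero, largest delta first.
moving_counters() {
    awk 'NF == 3 && $3 + 0 > 0 { print $1, $3 }' | sort -k2,2nr
}
```

Usage: `kvm_stat -1 | moving_counters`. Applied to either run above, only exits, irq_exits, irq_injections, host_state_reload and insn_emulation have a non-zero delta.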
How long do you wait before you declare the guest hung? Also, can you try with a virtio disk (if=virtio)?

I always wait 12 minutes; the longest I have waited is about 12 hours. The bug can also be reproduced with a virtio block device, but the guest does not always hang at the same point (screendump attached).

Created attachment 415589
hang screendump using virtio block device
The bug can still be reproduced with the latest kernel and qemu-kvm:
guest kernel: kernel-2.6.18-200.el5.i686
host kernel: 2.6.32-30.el6.x86_64
`# rpm -qa |grep qemu`
gpxe-roms-qemu-0.9.7-6.3.el6.noarch
qemu-kvm-tools-0.12.1.2-2.68.el6.x86_64
qemu-img-0.12.1.2-2.68.el6.x86_64
qemu-kvm-0.12.1.2-2.68.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.68.el6.x86_64

Can you try 2.6.32-32.el6.x86_64, please? Actually, try the latest one (2.6.32-33 as of now, I think).

No need to test anything for now. The patch I want you to test is not in -33 yet.

Please test this kernel RPM: https://brewweb.devel.redhat.com/getfile?taskID=2515192&name=kernel-2.6.32-33.el6glebirr.x86_64.rpm (https://brewweb.devel.redhat.com/taskinfo?taskID=2515192)

Hi Gleb,

I tested with three guest kernels and three host kernels; it seems to be a guest kernel bug rather than a host kernel bug.

qemu-kvm: 0.12.1.2-2.77.el6.x86_64

Guest kernel | Host 2.6.32-25.el6 | Host 2.6.32-33.el6 | Host 2.6.32-33.el6glebirr
---|---|---|---
kernel-2.6.18-196.el5.i686 | Fail | Fail | Fail
kernel-2.6.18-200.el5.i686 | Fail | Fail | Fail
kernel-2.6.18-203.el5.i686 | Pass | Pass | Pass

(In reply to comment #19)
> Hi gleb,
>
> I tested with three kinds of guest kernel and host kernel, it seems not a host
> kernel bug, but guest kernel bug.

Interesting. I'll check what changed between 200 and 203. Can you check with guest kernel kernel-2.6.18-200.el5.i686 and qemu-kvm-0.12.1.2-2.84.el6, using the "-cpu qemu64,+sse2,-kvmclock" option instead of your current -cpu option?
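Trying the `-kvmclock` variant changes only the `-cpu` argument of the reporter's command line. A minimal sketch that keeps the other flags from the command line quoted earlier in this bug and takes the `-cpu` value as a parameter; `repro_args` is a hypothetical helper, the `-serial` socket option is left out because its path differs per run, and the disk image is assumed to sit in $HOME as in `~/RHEL-Server-5.5-32.qcow2`. The image path, MAC address, tap name and ifup script are specific to the reporter's setup.

```shell
# Hypothetical helper: print the reporter's qemu-kvm arguments one per
# line, with the -cpu value taken from $1 (default "qemu64,+sse2").
repro_args() {
    cpu=${1:-qemu64,+sse2}
    set -- \
        -drive "file=$HOME/RHEL-Server-5.5-32.qcow2,if=ide,cache=none,boot=on" \
        -net nic,vlan=0,model=e1000,macaddr=00:30:09:8A:35:c7 \
        -net tap,vlan=0,ifname=e1000_0_6001,script=/root/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no \
        -m 4096 -smp 2 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet \
        -cpu "$cpu" -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0
    printf '%s\n' "$@"
}
```

Usage: `qemu-kvm $(repro_args qemu64,+sse2,-kvmclock)` for the requested test, or `qemu-kvm $(repro_args)` for the original setting. This relies on word splitting, which is only safe while no argument (including $HOME) contains whitespace.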
(In reply to comment #21)
> Can you check with guest kernel kernel-2.6.18-200.el5.i686 and
> qemu-kvm-0.12.1.2-2.84.el6 with "-cpu qemu64,+sse2,-kvmclock" option instead
> of your -cpu option.

The bug could not be reproduced.
guest kernel: kernel-2.6.18-200.el5.i686
host kernel: 2.6.32-33.el6
qemu-kvm: qemu-kvm-0.12.1.2-2.84.el6

This is a guest kernel bug. It has to be moved to RHEL5.

*** This bug has been marked as a duplicate of bug 570824 ***
Created attachment 410021
screen dump of rhel55-32guest

Description of problem:
When a RHEL5.5-i386 (SMP) guest boots, it always hangs at the beginning of boot, while the guest's status is still reported as running. If 'noapic' is added to the kernel command line, the guest boots successfully.

Version-Release number of selected component (if applicable):
host kernel: 2.6.32-19.el6.x86_64
guest kernel: 2.6.18-189.el5PAE
`# rpm -qa |grep qemu`
qemu-kvm-tools-0.12.1.2-2.41.el6.x86_64
qemu-kvm-0.12.1.2-2.41.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.41.el6.x86_64
gpxe-roms-qemu-0.9.7-6.2.el6.noarch
qemu-img-0.12.1.2-2.41.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot an SMP RHEL-5.5-i386 guest.

Actual results:
The guest cannot finish booting.

Expected results:
The guest boots successfully.

Additional info:
1. Command line:
`qemu-kvm -name 'vm1' -monitor tcp:0:6001,server,nowait -drive file=./RHEL-Server-5.5-32-virtio.qcow2,if=ide,cache=none,boot=on -net nic,vlan=0,model=virtio,macaddr=00:A9:7C:6C:2b:f6 -net tap,vlan=0,ifname=virtio_0_6001,script=/etc/qemu-ifup-switch,downscript=no -m 4096 -smp 2 -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0 -serial unix:/tmp/serial-20100429-140754-aE31,server,nowait`
2. Monitor session:
`# nc 10.66.83.170 6001`
QEMU 0.12.1 monitor - type 'help' for more information
(qemu) info status
info status
VM status: running
3. Host cpuinfo:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping : 2
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch
bogomips : 2004.33
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps
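Both the x86info dump in this bug (leaf 0x00000001, EAX = 00060fb2) and the host cpuinfo above report family 15, model 107, stepping 2. The sketch below shows how those numbers are derived from the raw CPUID processor signature. It is a minimal sketch assuming AMD's rule that the extended family and extended model fields are only combined in when the base family is 0xF; `decode_cpuid_sig` is a hypothetical helper name.

```shell
# Hypothetical helper: decode a CPUID leaf 1 EAX value (given as hex
# without the 0x prefix) into family, model and stepping, AMD style.
decode_cpuid_sig() {
    sig=$((0x$1))
    stepping=$(( sig & 0xf ))
    base_model=$(( (sig >> 4) & 0xf ))
    base_family=$(( (sig >> 8) & 0xf ))
    ext_model=$(( (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))
    family=$base_family
    model=$base_model
    # Extended fields only apply when the base family is 0xF.
    if [ "$base_family" -eq 15 ]; then
        family=$(( base_family + ext_family ))
        model=$(( (ext_model << 4) | base_model ))
    fi
    echo "family=$family model=$model stepping=$stepping"
}
```

Usage: `decode_cpuid_sig 00060fb2` prints `family=15 model=107 stepping=2`, matching x86info's "EFamily: 0 EModel: 6 Family: 15 Model: 107 Stepping: 2".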