Bug 1363679
Summary:          RHEL guest hangs with kernel-irqchip=off and smp>1
Product:          Red Hat Enterprise Linux 7
Component:        qemu-kvm-rhev
Version:          7.3
Hardware:         x86_64
OS:               Linux
Status:           CLOSED ERRATA
Severity:         medium
Priority:         unspecified
Reporter:         Peter Xu <peterx>
Assignee:         Eduardo Habkost <ehabkost>
QA Contact:       Chao Yang <chayang>
CC:               chayang, drjones, ehabkost, huding, juzhang, mrezanin, virt-maint, yduan
Target Milestone: rc
Fixed In Version: qemu-kvm-rhev-2.6.0-22.el7
Last Closed:      2016-11-07 21:28:14 UTC
Type:             Bug
Description
Peter Xu
2016-08-03 10:45:28 UTC
The guest hang is likely due to the primary cpu waiting for the secondaries to send it IPIs, and the secondaries waiting for the primary to acknowledge those IPIs, but the IPIs aren't going anywhere because of something wrong with the switch to the QEMU-emulated interrupt controller.

I just tried to reproduce on Fedora 24 using kvm-unit-tests:

qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel x86/smptest.flat -smp 2

It didn't reproduce for me. Does that unit test succeed on RHEL? If so, then my guess about the guest kernel hang may not be correct.

(In reply to Andrew Jones from comment #2)
> The guest hang is likely due to the primary cpu waiting for secondaries to
> send it IPIs, and the secondaries waiting for the primary to acknowledge
> those IPIs, but IPIs aren't going anywhere because of something wrong with
> the switch to the QEMU emulated interrupt controller.
>
> I just tried to reproduce on Fedora 24 using kvm-unit-tests
>
> qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel
> x86/smptest.flat -smp 2
>
> It didn't reproduce for me. Does that unit test succeed on RHEL? If so, then
> my guess about the guest kernel hang may not be correct.

Hi, Andrew,

Thanks for the update. I don't know whether smptest.flat can reproduce it; IIRC even upstream kernels cannot trigger this bug, only RHEL kernels. However, I need some more tests to verify this, maybe tomorrow or the day after.

(In reply to Peter Xu from comment #3)
> Thanks for the update. I don't know whether smptest.flat can reproduce it,
> IIRC even upstream kernels cannot trigger this bug, only RHEL kernels.
> However I need some more tests to verify this, maybe tomorrow or day after
> that.

You don't need an upstream kernel to run kvm-unit-tests.
Just do the following steps on the RHEL host that you used to report this bug:

git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
cd kvm-unit-tests
./configure
make -j

x86/run x86/smptest.flat -smp 2  # this runs it with kernel_irqchip=on (the default) and should work

qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel x86/smptest.flat -smp 2  # this hopefully hangs, reproducing the bug

(In reply to Andrew Jones from comment #4)
> (In reply to Peter Xu from comment #3)
> > Thanks for the update. I don't know whether smptest.flat can reproduce it,
> > IIRC even upstream kernels cannot trigger this bug, only RHEL kernels.
> > However I need some more tests to verify this, maybe tomorrow or day after
> > that.
>
> You don't need an upstream kernel to run kvm-unit-tests. Just do the
> following steps on the RHEL host that you used to report this bug
>
> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
> cd kvm-unit-tests
> ./configure
> make -j
>
> x86/run x86/smptest.flat -smp 2 # this runs it with kernel_irqchip=on (the
> default) and should work
>
> qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel
> x86/smptest.flat -smp 2 # this hopefully hangs, reproducing the bug

That host survived the above test (I ran it several times). I suspect this bug (not sure whether it is related to my root fs on the guest; let's say there is a bug somewhere) is related to guest kernel behavior during boot. I never reproduced it on upstream kernels (I was using 4.6.3), but I can reproduce it within several rounds with RHEL kernels (I tried more than one version, including 3.10.0-483.el7.x86_64).

(In reply to Peter Xu from comment #5)
> That host survived the above test (I did it for several times).

I didn't expect a problem with the host.
Comment 0 says the problem is a guest hang. So did the test PASS? Does it PASS no matter how many vcpus you configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?

> I suspect this bug (not sure whether this is related to my root fs on the
> guest, let's say there is a bug somewhere) is related to guest kernel
> behavior during boot.

As I said in comment 2, a guest hang after switching the interrupt controller emulation from the kernel to userspace is quite likely due to waiting on IPIs. kvm-unit-tests/x86/smptest.c specifically tests that idea without needing a whole Linux kernel for the guest. But, if the unit test is PASSing, then I guess you should check what the Linux guest is doing using virsh dump and the crash utility.

> I never reproduced it on upstream kernels (I was using
> 4.6.3), but I can reproduce it in several rounds if with RHEL kernels (I
> tried more than one version, also 3.10.0-483.el7.x86_64).

I presume you're talking about the guest kernel version? It doesn't reproduce 100% with the RHEL kernels?

(In reply to Andrew Jones from comment #6)
> (In reply to Peter Xu from comment #5)
> > That host survived the above test (I did it for several times).
>
> I didn't expect a problem with the host. comment 0 says the problem is a
> guest hang. So did the test PASS? Does it PASS no matter how many vcpus you
> configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?

I was trying with 4 vcpus. Yes, it passed the tests.

> > I suspect this bug (not sure whether this is related to my root fs on the
> > guest, let's say there is a bug somewhere) is related to guest kernel
> > behavior during boot.
>
> As I said in comment 2, a guest hang after switching the interrupt
> controller emulator from the kernel to userspace is quite likely due to
> waiting on IPIs. kvm-unit-tests/x86/smptest.c specifically tests that idea
> without needing a whole Linux kernel for the guest.
> But, if the unit test is
> PASSing, then I guess you should check to see what the Linux guest is doing
> by using virsh dump and the crash utility.

Yes, that's in my plan as well. I will first try downloading debuginfo for that guest kernel, and see whether I can remote-gdb into it using the vmlinux matching the guest kernel.

> > I never reproduced it on upstream kernels (I was using
> > 4.6.3), but I can reproduce it in several rounds if with RHEL kernels (I
> > tried more than one version, also 3.10.0-483.el7.x86_64).
>
> I presume you're talking about the guest kernel version? It doesn't
> reproduce 100% with the RHEL kernels?

Right. This bug only happens with:

- downstream qemu-kvm-rhev
- downstream RHEL kernel in the guest

Or, put another way, I do *NOT* encounter the bug with the following configurations:

- upstream QEMU (latest master) + RHEL kernel in the guest
- downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)

Btw, it does not reproduce 100%. As I mentioned in comment 0, it's actually 10% (yes, 10%, not 100%; no zero missing). In most cases I can reproduce it within several rounds of system_reset.

(In reply to Peter Xu from comment #7)
> (In reply to Andrew Jones from comment #6)
> > (In reply to Peter Xu from comment #5)
> > > That host survived the above test (I did it for several times).
> >
> > I didn't expect a problem with the host. comment 0 says the problem is a
> > guest hang. So did the test PASS? Does it PASS no matter how many vcpus you
> > configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?
>
> I was trying using 4 vcpus. Yes it passed the tests.

I re-tested using 64 vcpus for 10 rounds. All passed.
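The dump-and-inspect workflow suggested above (virsh dump plus the crash utility) can be sketched roughly as below. This is a dry-run sketch that only prints the commands it would run; the domain name, dump path, and vmlinux path are placeholders, not values taken from this bug:

```shell
# Dry-run sketch of the suggested dump-and-inspect workflow.
# DOMAIN, DUMP, and VMLINUX are placeholders.
DOMAIN=rhel7-guest
DUMP=/var/tmp/guest-hang.core
VMLINUX=/usr/lib/debug/lib/modules/3.10.0-478.el7.x86_64/vmlinux

# Grab a memory-only dump of the hung guest via libvirt...
DUMP_CMD="virsh dump --memory-only $DOMAIN $DUMP"
# ...then open it with crash against the matching debuginfo vmlinux;
# inside crash, "log" shows the guest kernel log and "bt -a" the
# per-cpu backtraces, as used in the comments that follow.
CRASH_CMD="crash $VMLINUX $DUMP"

echo "$DUMP_CMD"
echo "$CRASH_CMD"
```

The QEMU monitor command dump-guest-memory -z (used in the next comment) is an alternative to virsh dump when the guest is not managed by libvirt.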
Use dump-guest-memory -z to dump the guest image when it hangs, then open it with crash:

crash> log
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-478.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Sat Jul 23 10:11:44 EDT 2016
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-478.el7.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root quiet nosplash intel_iommu=on
...
[    1.242104] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.328461] type=1130 audit(1470312326.182:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    1.405764] type=1130 audit(1470312326.259:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    1.467845] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    1.481766] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[    1.587173] tsc: Refined TSC clocksource calibration: 2793.513 MHz
[    1.688399] systemd-journald[116]: Received SIGTERM from PID 1 (systemd).
[   61.698222] INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 3, t=60003 jiffies, g=-52, c=-53, q=0)
[   61.699279] Task dump for CPU 2:
[   61.699281] systemd  R  running task  0  1  0  0x00000008
[   61.699284]  ffff880005dc3a88 ffffffffa00fbd4d ffff8800071f0000 ffff880005dc3a68
[   61.699287]  ffff880000805340 0000000000000000 ffff8800074b3800 ffff88000765e048
[   61.699289]  ffff880007706180 ffff880005dc3ab0 ffffffff81454a3a ffff88000765e000
[   61.699291] Call Trace:
[   61.699306]  [<ffffffffa00fbd4d>] ? ata_scsi_queuecmd+0x2d/0x420 [libata]
[   61.699313]  [<ffffffff81454a3a>] scsi_dispatch_cmd+0xaa/0x230
[   61.699316]  [<ffffffff8145dbc1>] scsi_request_fn+0x501/0x770
[   61.699322]  [<ffffffff812e4683>] __blk_run_queue+0x33/0x40
[   61.699325]  [<ffffffff812e93d3>] blk_queue_bio+0x343/0x3a0
[   61.699327]  [<ffffffff812e44b2>] generic_make_request+0xe2/0x130
[   61.699329]  [<ffffffff812e4571>] submit_bio+0x71/0x150
[   61.699334]  [<ffffffff8123109d>] ? bio_alloc_bioset+0x1fd/0x350
[   61.699336]  [<ffffffff8122c1b3>] _submit_bh+0x143/0x210
[   61.699338]  [<ffffffff8122c4a9>] ll_rw_block+0xa9/0xb0
[   61.699340]  [<ffffffff8122c97d>] __breadahead+0x3d/0x70
[   61.699348]  [<ffffffffa01b967a>] __ext4_get_inode_loc+0x39a/0x3e0 [ext4]
[   61.699353]  [<ffffffffa01bb3a6>] ext4_iget+0x96/0xae0 [ext4]
[   61.699358]  [<ffffffffa01bbe25>] ext4_iget_normal+0x35/0x40 [ext4]
[   61.699364]  [<ffffffffa01c5e57>] ext4_lookup+0x107/0x170 [ext4]
[   61.699367]  [<ffffffff81201add>] lookup_real+0x1d/0x50
[   61.699370]  [<ffffffff81202452>] __lookup_hash+0x42/0x60
[   61.699373]  [<ffffffff81206438>] filename_create+0x98/0x180
[   61.699376]  [<ffffffff811d9915>] ? kmem_cache_alloc+0x35/0x1e0
[   61.699378]  [<ffffffff8120824f>] ? getname_flags+0x4f/0x1a0
[   61.699380]  [<ffffffff812082c4>] ? getname_flags+0xc4/0x1a0
[   61.699383]  [<ffffffff81208571>] user_path_create+0x41/0x60
[   61.699384]  [<ffffffff81209886>] SyS_mkdirat+0x46/0xe0
[   61.699386]  [<ffffffff81209939>] SyS_mkdir+0x19/0x20
[   61.699389]  [<ffffffff81695749>] system_call_fastpath+0x16/0x1b

It seems the I/O for the rootfs is stuck.
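For the QEMU side of the picture, thread backtraces like the dump that follows can be captured by attaching gdb to the running qemu-kvm process. A dry-run sketch that only prints the command; the PID is a placeholder:

```shell
# Dry-run sketch: capture all QEMU thread backtraces non-interactively.
# QEMU_PID is a placeholder; use the PID of the hung qemu-kvm process.
QEMU_PID=953
GDB_CMD="gdb -p $QEMU_PID -batch -ex 'thread apply all bt'"
echo "$GDB_CMD"
```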
QEMU thread dump:

(gdb) thread apply all bt

Thread 7 (Thread 0x7fffdf7fe700 (LWP 966)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x55555663d300) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7fffdffff700 (LWP 963)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556617350) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7fffe49b4700 (LWP 962)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556604900) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7fffe51b5700 (LWP 959)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556591dc0) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7fffe7f4a700 (LWP 957)):
#0  0x00007ffff1903ff9 in syscall () at /lib64/libc.so.6
#1  0x0000555555a2d118 in qemu_event_wait (val=<optimized out>, ev=<optimized out>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:292
#2  0x0000555555a2d118 in qemu_event_wait (ev=ev@entry=0x55555644a644 <rcu_call_ready_event>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:399
#3  0x0000555555a3b5ce in call_rcu_thread (opaque=<optimized out>) at /root/git/rh/qemu-kvm/util/rcu.c:250
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7f6cc00 (LWP 953)):
#0  0x00007ffff18fe3f1 in ppoll () at /lib64/libc.so.6
#1  0x00005555559a01e9 in qemu_poll_ns (__ss=0x0, __timeout=0x7fffffffd740, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x00005555559a01e9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2785487) at /root/git/rh/qemu-kvm/qemu-timer.c:325
#3  0x000055555599fc0a in main_loop_wait (timeout=2785487) at /root/git/rh/qemu-kvm/main-loop.c:252
#4  0x000055555599fc0a in main_loop_wait (nonblocking=<optimized out>) at /root/git/rh/qemu-kvm/main-loop.c:506
#5  0x00005555557499b5 in main () at /root/git/rh/qemu-kvm/vl.c:1939
#6  0x00005555557499b5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/git/rh/qemu-kvm/vl.c:4674

Uploaded guest memory dump at:
http://file.nay.redhat.com/zhexu/bugs/2016-08-04-rhel-guest-hang-with-irqchip-off/

(In reply to Peter Xu from comment #7)
> Or say, I will *NOT* encounter the bug with following configurations:
>
> - upstream QEMU (latest master) + RHEL kernels in guest

This is interesting. What about upstream QEMU v2.5? If that works, then we should look for RHEL patches that we've applied. If it doesn't work, then we can reverse-bisect upstream QEMU.

> - downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)

Also interesting, but it's harder to bisect/debug the guest kernel than QEMU.

> Btw, it does not reproduce 100%. I mentioned it in comment 0, actually it's
> 10% (yes, it's 10% not 100%, and no zero missing..).

Ah yes, sorry, my eyes automatically added an extra zero when I first read that. A non-100% reproducer makes everything more tricky. Bisecting may not be a reliable path forward.

(For anyone who is interested in the per-cpu bt dump but too lazy to download the vmlinux):

crash> bt -a
PID: 44     TASK: ffff880005fece70  CPU: 0   COMMAND: "kworker/0:1"
 [exception RIP: native_halt+5]
    RIP: ffffffff8105fec5  RSP: ffff8800055539d8  RFLAGS: 00000046
    RAX: 0000000000000de2  RBX: ffff880007c0f380  RCX: 0000000000000de4
    RDX: 0000000000000de2  RSI: 0000000000000de4  RDI: ffff8800071aca98
    RBP: ffff8800055539d8   R8: 0000000000000286   R9: ffff8800074b30d8
    R10: ffff880007056540  R11: 0000000000000005  R12: 0000000000000082
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff8800071aec00
    CS: 0010  SS: 0018
 #0 [ffff8800055539e0] kvm_lock_spinning at ffffffff8105fa2a
 #1 [ffff880005553a10] __raw_callee_save_kvm_lock_spinning at ffffffff8105edd5
 #2 [ffff880005553a60] _raw_spin_lock_irqsave at ffffffff8168cc63
 #3 [ffff880005553a80] scsi_prep_fn at ffffffff8145bb5b
 #4 [ffff880005553ab8] scsi_dispatch_cmd at ffffffff81454a3a
 #5 [ffff880005553ae0] scsi_request_fn at ffffffff8145dbc1
 #6 [ffff880005553b48] __blk_run_queue at ffffffff812e4683
 #7 [ffff880005553b60] blk_execute_rq_nowait at ffffffff812ed375
 #8 [ffff880005553ba0] blk_execute_rq at ffffffff812ed4cb
 #9 [ffff880005553c58] scsi_execute at ffffffff8145a933
#10 [ffff880005553c98] scsi_execute_req_flags at ffffffff8145c27e
#11 [ffff880005553d00] sr_check_events at ffffffffa016a76c [sr_mod]
#12 [ffff880005553d70] cdrom_check_events at ffffffffa0159058 [cdrom]
#13 [ffff880005553d88] sr_block_check_events at ffffffffa016ac19 [sr_mod]
#14 [ffff880005553da8] disk_check_events at ffffffff812f7f3b
#15 [ffff880005553e10] disk_events_workfn at ffffffff812f80a6
#16 [ffff880005553e20] process_one_work at ffffffff810a622b
#17 [ffff880005553e68] worker_thread at ffffffff810a7066
#18 [ffff880005553ec8] kthread at ffffffff810ae81f
#19 [ffff880005553f50] ret_from_fork at ffffffff81695698

PID: 0      TASK: ffff880005e23ec0  CPU: 1   COMMAND: "swapper/1"
 [exception RIP: native_safe_halt+6]
    RIP: ffffffff8105feb6  RSP: ffff880005e3fe98  RFLAGS: 00000286
    RAX: 00000000ffffffed  RBX: ffff880005e3c000  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff880005e3fe98   R8: 0000000000000000   R9: 0000000000000028
    R10: 0000000000000028  R11: ffff880005eb9028  R12: 0000000000000001
    R13: ffff880005e3c000  R14: ffff880005e3c000  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffff880005e3fea0] default_idle at ffffffff8103459f
 #1 [ffff880005e3fec0] arch_cpu_idle at ffffffff81034ee6
 #2 [ffff880005e3fed0] cpu_startup_entry at ffffffff810e5dc5
 #3 [ffff880005e3ff28] start_secondary at ffffffff8104eeaa

PID: 1      TASK: ffff880005d58000  CPU: 2   COMMAND: "systemd"
 [exception RIP: native_halt+5]
    RIP: ffffffff8105fec5  RSP: ffff880005dc39b0  RFLAGS: 00000046
    RAX: 0000000000000de0  RBX: ffff880007d0f380  RCX: 0000000000000de2
    RDX: 0000000000000de0  RSI: 0000000000000de2  RDI: ffff8800071aca98
    RBP: ffff880005dc39b0   R8: 0000000000000286   R9: ffff8800074b38d8
    R10: ffff880000805340  R11: ffff880000805340  R12: 0000000000000082
    R13: 0000000000000002  R14: 0000000000000000  R15: ffff880007706180
    CS: 0010  SS: 0018
 #0 [ffff880005dc39b8] kvm_lock_spinning at ffffffff8105fa2a
 #1 [ffff880005dc39e8] __raw_callee_save_kvm_lock_spinning at ffffffff8105edd5
 #2 [ffff880005dc3a38] _raw_spin_lock_irqsave at ffffffff8168cc63
 #3 [ffff880005dc3a90] scsi_dispatch_cmd at ffffffff81454a3a
 #4 [ffff880005dc3ab8] scsi_request_fn at ffffffff8145dbc1
 #5 [ffff880005dc3b20] __blk_run_queue at ffffffff812e4683
 #6 [ffff880005dc3b38] blk_queue_bio at ffffffff812e93d3
 #7 [ffff880005dc3b80] generic_make_request at ffffffff812e44b2
 #8 [ffff880005dc3bb8] submit_bio at ffffffff812e4571
 #9 [ffff880005dc3c10] _submit_bh at ffffffff8122c1b3
#10 [ffff880005dc3c40] ll_rw_block at ffffffff8122c4a9
#11 [ffff880005dc3c68] __breadahead at ffffffff8122c97d
#12 [ffff880005dc3c88] __ext4_get_inode_loc at ffffffffa01b967a [ext4]
#13 [ffff880005dc3cf0] ext4_iget at ffffffffa01bb3a6 [ext4]
#14 [ffff880005dc3d80] ext4_iget_normal at ffffffffa01bbe25 [ext4]
#15 [ffff880005dc3d90] ext4_lookup at ffffffffa01c5e57 [ext4]
#16 [ffff880005dc3dc8] lookup_real at ffffffff81201add
#17 [ffff880005dc3de8] __lookup_hash at ffffffff81202452
#18 [ffff880005dc3e18] filename_create at ffffffff81206438
#19 [ffff880005dc3ee8] user_path_create at ffffffff81208571
#20 [ffff880005dc3f18] sys_mkdirat at ffffffff81209886
#21 [ffff880005dc3f70] sys_mkdir at ffffffff81209939
#22 [ffff880005dc3f80] system_call_fastpath at ffffffff81695749
    RIP: 00007f337ab60057  RSP: 00007fff20c7b188  RFLAGS: 00010246
    RAX: 0000000000000053  RBX: ffffffff81695749  RCX: 00007f337ab706ba
    RDX: 0000000000000000  RSI: 00000000000001ed  RDI: 00007fff20c7b2f0
    RBP: 00007fff20c7b2f0   R8: 000000000000cf10   R9: 0000000000000004
    R10: 00007fff20c7b2c0  R11: 0000000000000246  R12: ffffffff81209939
    R13: ffff880005dc3f78  R14: 00007fff20c7c410  R15: 00007fff20c7b260
    ORIG_RAX: 0000000000000053  CS: 0033  SS: 002b

PID: 0      TASK: ffff880005e25e20  CPU: 3   COMMAND: "swapper/3"
 [exception RIP: native_safe_halt+6]
    RIP: ffffffff8105feb6  RSP: ffff880005e87e98  RFLAGS: 00000286
    RAX: 00000000ffffffed  RBX: ffff880005e84000  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff880005e87e98   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000003
    R13: ffff880005e84000  R14: ffff880005e84000  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffff880005e87ea0] default_idle at ffffffff8103459f
 #1 [ffff880005e87ec0] arch_cpu_idle at ffffffff81034ee6
 #2 [ffff880005e87ed0] cpu_startup_entry at ffffffff810e5dc5
 #3 [ffff880005e87f28] start_secondary at ffffffff8104eeaa

(In reply to Andrew Jones from comment #11)
> (In reply to Peter Xu from comment #7)
> > Or say, I will *NOT* encounter the bug with following configurations:
> >
> > - upstream QEMU (latest master) + RHEL kernels in guest
>
> This is interesting. What about upstream QEMU v2.5? If that works then we
> should look for RHEL patches that we've applied. If it doesn't work then we
> can reverse bisect upstream QEMU.
>
> > - downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)
>
> Also interesting, but harder to bisect/debug the guest kernel than QEMU.
>
> > Btw, it does not reproduce 100%. I mentioned it in comment 0, actually it's
> > 10% (yes, it's 10% not 100%, and no zero missing..).
>
> Ah yes, sorry, my eyes automatically added an extra zero when I first read
> that. A non-100% reproducer makes everything more tricky. Bisecting may not
> be a reliable path forward.

Maybe I should write "20%" next time to avoid this. :) Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!

Comment 12 shows that two vcpus are waiting for the same lock (RDI is the same). I doubt we have a locking problem, though; I think it's more likely that a third vcpu was holding that lock when those two arrived, and that the third one then released it and kicked the next in line (we're using pvticketlocks).
Kicking is done with an interrupt, which, for whatever reason, isn't getting delivered when using kernel_irqchip=off. I hope reverse-bisecting QEMU works, since I think that should be the shortest path to root-causing this.

(In reply to Peter Xu from comment #13)
> Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!

Oops, I meant v2.6, qemu-kvm-rhev's upstream base. No need to try v2.5.

(In reply to Andrew Jones from comment #15)
> (In reply to Peter Xu from comment #13)
> > Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!
>
> Oops, I meant v2.6, qemu-kvm-rhev's upstream base. No need to try v2.5.

v2.6 does not encounter this bug. A bisection shows that this bug was introduced in commit:

commit caaa2b731cf73681fe49ec96c2b817b22335679c
Author: Miroslav Rezanina <mrezanin>
Date:   Sun Dec 14 18:32:18 2014 +0100

    Add RHEL 7 machine types

    This commit adds all changes related to machine types applied in
    qemu-kvm-rhev-2.1.2-16.el7.

Further, I narrowed the scope down to the target-i386/cpu.c changes in the above commit. To see them:

git diff caaa2b73~..caaa2b73 target-i386/cpu.c

Mirek, do you have any clue with the above info?

(In reply to Peter Xu from comment #16)
> And further, I shrinked the scope to target-i386/cpu.c changes in above
> commit. To see them:
>
> git diff caaa2b73~..caaa2b73 target-i386/cpu.c
>
> Mirek, do you have any clue with above info?

This changes the default CPUID flags for the default CPU model (qemu64) and enables extra CPU features. Probably it triggers an existing bug related to some of those flags. We need to check whether enabling each of those flags on upstream QEMU also triggers the bug.

My bet is on kvm-pv-unhalt: it probably requires kernel-irqchip to be enabled to work, but QEMU enables it in CPUID anyway. Probably "-cpu qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to test it).
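The flag-toggling test proposed above can be sketched as below: take a command line on which the hang already reproduces and flip only the suspect CPUID bit. This is a dry-run sketch that only prints the command lines; the disk image path, -smp count, and -m size are placeholders, not the exact reproducer from this bug:

```shell
# Dry-run sketch: same reproducer command line, toggling only kvm_pv_unhalt.
# The disk image path, -smp count, and -m size are placeholders.
BASE="qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -smp 4 -m 2G rhel7.qcow2"
WITH_FLAG="$BASE -cpu qemu64,+kvm_pv_unhalt"    # per the hypothesis above: may hang intermittently
WITHOUT_FLAG="$BASE -cpu qemu64,-kvm_pv_unhalt" # per the hypothesis above: should boot reliably
echo "$WITH_FLAG"
echo "$WITHOUT_FLAG"
```

Since the hang only reproduces in roughly 10% of boots, each variant would need many boot/system_reset rounds before the difference is meaningful.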
(In reply to Eduardo Habkost from comment #18)
> Probably "-cpu
> qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to
> test it).

I mean "-cpu qemu64,+kvm_pv_unhalt".

(In reply to Eduardo Habkost from comment #18)
> My bet is on kvm-pv-unhalt: it probably requires kernel-irqchip to be
> enabled, to work, but QEMU enables it on CPUID anyway. Probably "-cpu
> qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to
> test it).

I couldn't reproduce it on my system yet. But considering that the bug may be hard to reproduce if too many variables change, it's probably easier to do the opposite: start with a case where the bug is already reproducible, and append "-cpu qemu64,-kvm_pv_unhalt" to the command line. Looking at the KVM_PV_UNHALT implementation in KVM, I can't tell whether it is really supposed to work without kernel_irqchip or not.

Eduardo is working on this bz, so changing "assigned to" to Eduardo for better tracking. Thanks!

Fix included in qemu-kvm-rhev-2.6.0-22.el7

Reproduced with:
qemu-kvm-rhev: 2.6.0-18.el7.x86_64
kernel: 3.10.0-475.el7.x86_64 (both host and guest)

How reproducible: 30%

Steps to Reproduce:
1. Boot guest.
2. See grub, select specific RHEL kernel.
3. Kernel starts to load.

Actual results:
Guest hangs during kernel boot, before reaching a prompt.

*********************************************************

Verified with:
qemu-kvm-rhev: 2.6.0-22.el7.x86_64
kernel: 3.10.0-475.el7.x86_64 (both host and guest)

Actual results:
Guest boots successfully. Command line is as attachment "CMD".

Created attachment 1198122 [details]
CMD
(In reply to yduan from comment #24)
> Verified with:
> qemu-kvm-rhev: 2.6.0-22.el7.x86_64
> kernel: 3.10.0-475.el7.x86_64 (both host and guest)
>
> Actual results:
> Guest boots successfully.

I tried around 20 times and didn't hit the issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html