Bug 1363679

Summary: RHEL guest hangs with kernel-irqchip=off and smp>1
Product: Red Hat Enterprise Linux 7 Reporter: Peter Xu <peterx>
Component: qemu-kvm-rhev    Assignee: Eduardo Habkost <ehabkost>
Status: CLOSED ERRATA QA Contact: Chao Yang <chayang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.3    CC: chayang, drjones, ehabkost, huding, juzhang, mrezanin, virt-maint, yduan
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-22.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-07 21:28:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments: CMD (flags: none)

Description Peter Xu 2016-08-03 10:45:28 UTC
Description of problem:

Boot RHEL guest using the following parameters:

bin=qemu-system-x86_64
$bin -machine pc,kernel-irqchip=off \
     -smp 4 -enable-kvm -monitor stdio \
     /var/lib/libvirt/images/vm1.qcow2

The guest hung during boot (before reaching the login prompt).

Version-Release number of selected component (if applicable):

qemu-kvm-rhev: 2.6.0-18.el7.x86_64
kernel: 3.10.0-475.el7.x86_64 (both host and guest)

How reproducible:

10%

Steps to Reproduce:
1. Boot guest.
2. See grub, select specific RHEL kernel
3. Kernel starts to load

Actual results:

Guest hangs during kernel boot, before reaching a prompt.

Expected results:

Guest boots successfully.

Additional info:

None.

Comment 2 Andrew Jones 2016-08-03 11:47:30 UTC
The guest hang is likely due to the primary cpu waiting for secondaries to send it IPIs, and the secondaries waiting for the primary to acknowledge those IPIs, but IPIs aren't going anywhere because of something wrong with the switch to the QEMU emulated interrupt controller.

I just tried to reproduce on Fedora 24 using kvm-unit-tests

qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel x86/smptest.flat -smp 2

It didn't reproduce for me. Does that unit test succeed on RHEL? If so, then my guess about the guest kernel hang may not be correct.

Comment 3 Peter Xu 2016-08-03 12:41:09 UTC
(In reply to Andrew Jones from comment #2)
> The guest hang is likely due to the primary cpu waiting for secondaries to
> send it IPIs, and the secondaries waiting for the primary to acknowledge
> those IPIs, but IPIs aren't going anywhere because of something wrong with
> the switch to the QEMU emulated interrupt controller.
> 
> I just tried to reproduce on Fedora 24 using kvm-unit-tests
> 
> qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel
> x86/smptest.flat -smp 2
> 
> It didn't reproduce for me. Does that unit test succeed on RHEL? If so, then
> my guess about the guest kernel hang may not be correct.

Hi, Andrew, 

Thanks for the update. I don't know whether smptest.flat can reproduce it; IIRC even upstream kernels cannot trigger this bug, only RHEL kernels. However, I need to run some more tests to verify this, maybe tomorrow or the day after.

Comment 4 Andrew Jones 2016-08-03 13:00:11 UTC
(In reply to Peter Xu from comment #3)
> Thanks for the update. I don't know whether smptest.flat can reproduce it;
> IIRC even upstream kernels cannot trigger this bug, only RHEL kernels.
> However, I need to run some more tests to verify this, maybe tomorrow or the
> day after.

You don't need an upstream kernel to run kvm-unit-tests. Just do the following steps on the RHEL host that you used to report this bug

git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
cd kvm-unit-tests
./configure
make -j

x86/run x86/smptest.flat -smp 2 # this runs it with kernel_irqchip=on (the default) and should work

qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel x86/smptest.flat -smp 2 # this hopefully hangs, reproducing the bug

Comment 5 Peter Xu 2016-08-04 09:25:12 UTC
(In reply to Andrew Jones from comment #4)
> (In reply to Peter Xu from comment #3)
> > Thanks for the update. I don't know whether smptest.flat can reproduce it;
> > IIRC even upstream kernels cannot trigger this bug, only RHEL kernels.
> > However, I need to run some more tests to verify this, maybe tomorrow or the
> > day after.
> 
> You don't need an upstream kernel to run kvm-unit-tests. Just do the
> following steps on the RHEL host that you used to report this bug
> 
> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
> cd kvm-unit-tests
> ./configure
> make -j
> 
> x86/run x86/smptest.flat -smp 2 # this runs it with kernel_irqchip=on (the
> default) and should work
> 
> qemu-kvm -machine pc,accel=kvm,kernel_irqchip=off -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -kernel
> x86/smptest.flat -smp 2 # this hopefully hangs, reproducing the bug

That host survived the above test (I ran it several times).

I suspect this bug (not sure whether this is related to my root fs on the guest, let's say there is a bug somewhere) is related to guest kernel behavior during boot. I never reproduced it on upstream kernels (I was using 4.6.3), but I can reproduce it within several rounds with RHEL kernels (I tried more than one version, including 3.10.0-483.el7.x86_64).

Comment 6 Andrew Jones 2016-08-04 09:51:08 UTC
(In reply to Peter Xu from comment #5)
> That host survived the above test (I ran it several times).
> 

I didn't expect a problem with the host. comment 0 says the problem is a guest hang. So did the test PASS? Does it PASS no matter how many vcpus you configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?

> I suspect this bug (not sure whether this is related to my root fs on the
> guest, let's say there is a bug somewhere) is related to guest kernel
> behavior during boot.

As I said in comment 2, a guest hang after switching the interrupt controller emulator from the kernel to userspace is quite likely due to waiting on IPIs. kvm-unit-tests/x86/smptest.c specifically tests that idea without needing a whole Linux kernel for the guest. But, if the unit test is PASSing, then I guess you should check to see what the Linux guest is doing by using virsh dump and the crash utility.
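
A rough sketch of those two steps (assuming the guest is libvirt-managed; the domain name and debuginfo vmlinux path below are illustrative):

virsh dump --memory-only --format kdump-zlib vm1 /tmp/vm1.core
crash /usr/lib/debug/lib/modules/<guest-kernel-version>/vmlinux /tmp/vm1.core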

> I never reproduced it on upstream kernels (I was using
> 4.6.3), but I can reproduce it within several rounds with RHEL kernels (I
> tried more than one version, including 3.10.0-483.el7.x86_64).

I presume you're talking about the guest kernel version? It doesn't reproduce 100% with the RHEL kernels?

Comment 7 Peter Xu 2016-08-04 10:10:18 UTC
(In reply to Andrew Jones from comment #6)
> (In reply to Peter Xu from comment #5)
> > That host survived the above test (I did it for several times).
> > 
> 
> I didn't expect a problem with the host. comment 0 says the problem is a
> guest hang. So did the test PASS? Does it PASS no matter how many vcpus you
> configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?

I was using 4 vcpus. Yes, it passed the tests.

> 
> > I suspect this bug (not sure whether this is related to my root fs on the
> > guest, let's say there is a bug somewhere) is related to guest kernel
> > behavior during boot.
> 
> As I said in comment 2, a guest hang after switching the interrupt
> controller emulator from the kernel to userspace is quite likely due to
> waiting on IPIs. kvm-unit-tests/x86/smptest.c specifically tests that idea
> without needing a whole Linux kernel for the guest. But, if the unit test is
> PASSing, then I guess you should check to see what the Linux guest is doing
> by using virsh dump and the crash utility.

Yes, that's in my plan as well. I will first try to download the debuginfo for that guest kernel, and see whether I can remote-gdb into it using the vmlinux matching the guest kernel.
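
Roughly something like the following (the gdb stub port is QEMU's default; the debuginfo vmlinux path is illustrative):

# same command line as in comment 0, plus -s (short for -gdb tcp::1234)
$bin -machine pc,kernel-irqchip=off \
     -smp 4 -enable-kvm -monitor stdio -s \
     /var/lib/libvirt/images/vm1.qcow2

# once the guest hangs, from another terminal:
gdb /usr/lib/debug/lib/modules/<guest-kernel-version>/vmlinux \
    -ex 'target remote localhost:1234'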

> 
> > I never reproduced it on upstream kernels (I was using
> > 4.6.3), but I can reproduce it within several rounds with RHEL kernels (I
> > tried more than one version, including 3.10.0-483.el7.x86_64).
> 
> I presume you're talking about the guest kernel version? It doesn't
> reproduce 100% with the RHEL kernels?

Right. This bug only happens with:

- downstream qemu-kvm-rhev
- downstream RHEL kernel in guest

In other words, I will *NOT* encounter the bug with the following configurations:

- upstream QEMU (latest master) + RHEL kernels in guest
- downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)

Btw, it does not reproduce 100%. As I mentioned in comment 0, it's actually 10% (yes, 10%, not 100%, no missing zero). In most cases, I can reproduce it within several rounds of system_reset.
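
(That is, repeatedly issuing the reset at the HMP prompt opened by -monitor stdio:)

(qemu) system_reset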

Comment 8 Peter Xu 2016-08-04 10:15:00 UTC
(In reply to Peter Xu from comment #7)
> (In reply to Andrew Jones from comment #6)
> > (In reply to Peter Xu from comment #5)
> > > That host survived the above test (I ran it several times).
> > > 
> > 
> > I didn't expect a problem with the host. comment 0 says the problem is a
> > guest hang. So did the test PASS? Does it PASS no matter how many vcpus you
> > configure, i.e. -smp $NR_VCPUS, where 2 <= $NR_VCPUS <= $NR_PCPUS?
> 
> I was using 4 vcpus. Yes, it passed the tests.

I re-tested using 64 vcpus for 10 rounds. All passed.

Comment 9 Peter Xu 2016-08-04 12:36:14 UTC
Used dump-guest-memory -z to dump the guest memory at the hang, then opened the dump with crash:
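
(A sketch of those two steps; the dump file name and debuginfo vmlinux path are illustrative:)

(qemu) dump-guest-memory -z /tmp/vm1-hang.dump
crash /usr/lib/debug/lib/modules/3.10.0-478.el7.x86_64/vmlinux /tmp/vm1-hang.dump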

crash> log
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-478.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Sat Jul 23 10:11:44 EDT 2016
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-478.el7.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root quiet nosplash intel_iommu=on

...

[    1.242104] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.328461] type=1130 audit(1470312326.182:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    1.405764] type=1130 audit(1470312326.259:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    1.467845] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    1.481766] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[    1.587173] tsc: Refined TSC clocksource calibration: 2793.513 MHz
[    1.688399] systemd-journald[116]: Received SIGTERM from PID 1 (systemd).
[   61.698222] INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 3, t=60003 jiffies, g=-52, c=-53, q=0)
[   61.699279] Task dump for CPU 2:
[   61.699281] systemd         R  running task        0     1      0 0x00000008
[   61.699284]  ffff880005dc3a88 ffffffffa00fbd4d ffff8800071f0000 ffff880005dc3a68
[   61.699287]  ffff880000805340 0000000000000000 ffff8800074b3800 ffff88000765e048
[   61.699289]  ffff880007706180 ffff880005dc3ab0 ffffffff81454a3a ffff88000765e000
[   61.699291] Call Trace:
[   61.699306]  [<ffffffffa00fbd4d>] ? ata_scsi_queuecmd+0x2d/0x420 [libata]
[   61.699313]  [<ffffffff81454a3a>] scsi_dispatch_cmd+0xaa/0x230
[   61.699316]  [<ffffffff8145dbc1>] scsi_request_fn+0x501/0x770
[   61.699322]  [<ffffffff812e4683>] __blk_run_queue+0x33/0x40
[   61.699325]  [<ffffffff812e93d3>] blk_queue_bio+0x343/0x3a0
[   61.699327]  [<ffffffff812e44b2>] generic_make_request+0xe2/0x130
[   61.699329]  [<ffffffff812e4571>] submit_bio+0x71/0x150
[   61.699334]  [<ffffffff8123109d>] ? bio_alloc_bioset+0x1fd/0x350
[   61.699336]  [<ffffffff8122c1b3>] _submit_bh+0x143/0x210
[   61.699338]  [<ffffffff8122c4a9>] ll_rw_block+0xa9/0xb0
[   61.699340]  [<ffffffff8122c97d>] __breadahead+0x3d/0x70
[   61.699348]  [<ffffffffa01b967a>] __ext4_get_inode_loc+0x39a/0x3e0 [ext4]
[   61.699353]  [<ffffffffa01bb3a6>] ext4_iget+0x96/0xae0 [ext4]
[   61.699358]  [<ffffffffa01bbe25>] ext4_iget_normal+0x35/0x40 [ext4]
[   61.699364]  [<ffffffffa01c5e57>] ext4_lookup+0x107/0x170 [ext4]
[   61.699367]  [<ffffffff81201add>] lookup_real+0x1d/0x50
[   61.699370]  [<ffffffff81202452>] __lookup_hash+0x42/0x60
[   61.699373]  [<ffffffff81206438>] filename_create+0x98/0x180
[   61.699376]  [<ffffffff811d9915>] ? kmem_cache_alloc+0x35/0x1e0
[   61.699378]  [<ffffffff8120824f>] ? getname_flags+0x4f/0x1a0
[   61.699380]  [<ffffffff812082c4>] ? getname_flags+0xc4/0x1a0
[   61.699383]  [<ffffffff81208571>] user_path_create+0x41/0x60
[   61.699384]  [<ffffffff81209886>] SyS_mkdirat+0x46/0xe0
[   61.699386]  [<ffffffff81209939>] SyS_mkdir+0x19/0x20
[   61.699389]  [<ffffffff81695749>] system_call_fastpath+0x16/0x1b

It seems that I/O for the root fs is stuck.
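
(A thread dump like the one below can be collected by attaching gdb to the running QEMU process, e.g. with an illustrative lookup:)

gdb -p $(pidof qemu-system-x86_64)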

QEMU thread dump:

(gdb) thread apply all bt
Thread 7 (Thread 0x7fffdf7fe700 (LWP 966)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>)
    at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x55555663d300) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7fffdffff700 (LWP 963)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>)
    at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556617350) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7fffe49b4700 (LWP 962)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>)
    at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556604900) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7fffe51b5700 (LWP 959)):
#0  0x00007ffff60ddbd0 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a2ce09 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555602c1e0 <qemu_global_mutex>)
    at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:123
#2  0x000055555577d61f in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /root/git/rh/qemu-kvm/cpus.c:1042
#3  0x000055555577d61f in qemu_kvm_cpu_thread_fn (arg=0x555556591dc0) at /root/git/rh/qemu-kvm/cpus.c:1081
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7fffe7f4a700 (LWP 957)):
#0  0x00007ffff1903ff9 in syscall () at /lib64/libc.so.6
#1  0x0000555555a2d118 in qemu_event_wait (val=<optimized out>, ev=<optimized out>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:292
#2  0x0000555555a2d118 in qemu_event_wait (ev=ev@entry=0x55555644a644 <rcu_call_ready_event>) at /root/git/rh/qemu-kvm/util/qemu-thread-posix.c:399
#3  0x0000555555a3b5ce in call_rcu_thread (opaque=<optimized out>) at /root/git/rh/qemu-kvm/util/rcu.c:250
#4  0x00007ffff60d85ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007ffff1909ead in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7f6cc00 (LWP 953)):
#0  0x00007ffff18fe3f1 in ppoll () at /lib64/libc.so.6
#1  0x00005555559a01e9 in qemu_poll_ns (__ss=0x0, __timeout=0x7fffffffd740, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x00005555559a01e9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2785487) at /root/git/rh/qemu-kvm/qemu-timer.c:325
#3  0x000055555599fc0a in main_loop_wait (timeout=2785487) at /root/git/rh/qemu-kvm/main-loop.c:252
#4  0x000055555599fc0a in main_loop_wait (nonblocking=<optimized out>) at /root/git/rh/qemu-kvm/main-loop.c:506
#5  0x00005555557499b5 in main () at /root/git/rh/qemu-kvm/vl.c:1939
#6  0x00005555557499b5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/git/rh/qemu-kvm/vl.c:4674

Comment 10 Peter Xu 2016-08-04 12:45:46 UTC
Uploaded guest memory dump at:

http://file.nay.redhat.com/zhexu/bugs/2016-08-04-rhel-guest-hang-with-irqchip-off/

Comment 11 Andrew Jones 2016-08-04 13:14:38 UTC
(In reply to Peter Xu from comment #7)
> In other words, I will *NOT* encounter the bug with the following configurations:
> 
> - upstream QEMU (latest master) + RHEL kernels in guest

This is interesting. What about upstream QEMU v2.5? If that works then we should look for RHEL patches that we've applied. If it doesn't work then we can reverse bisect upstream QEMU.

> - downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)

Also interesting, but harder to bisect/debug the guest kernel than QEMU.

> 
> Btw, it does not reproduce 100%. As I mentioned in comment 0, it's actually
> 10% (yes, 10%, not 100%, no missing zero).

Ah yes, sorry, my eyes automatically added an extra zero when I first read that. A non-100% reproducer makes everything more tricky. Bisecting may not be a reliable path forward.

Comment 12 Peter Xu 2016-08-05 02:02:34 UTC
(for anyone who is interested in the cpu bt dump but too lazy to download the vmlinux):

crash> bt -a
PID: 44     TASK: ffff880005fece70  CPU: 0   COMMAND: "kworker/0:1"
    [exception RIP: native_halt+5]
    RIP: ffffffff8105fec5  RSP: ffff8800055539d8  RFLAGS: 00000046
    RAX: 0000000000000de2  RBX: ffff880007c0f380  RCX: 0000000000000de4
    RDX: 0000000000000de2  RSI: 0000000000000de4  RDI: ffff8800071aca98
    RBP: ffff8800055539d8   R8: 0000000000000286   R9: ffff8800074b30d8
    R10: ffff880007056540  R11: 0000000000000005  R12: 0000000000000082
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff8800071aec00
    CS: 0010  SS: 0018
 #0 [ffff8800055539e0] kvm_lock_spinning at ffffffff8105fa2a
 #1 [ffff880005553a10] __raw_callee_save_kvm_lock_spinning at ffffffff8105edd5
 #2 [ffff880005553a60] _raw_spin_lock_irqsave at ffffffff8168cc63
 #3 [ffff880005553a80] scsi_prep_fn at ffffffff8145bb5b
 #4 [ffff880005553ab8] scsi_dispatch_cmd at ffffffff81454a3a
 #5 [ffff880005553ae0] scsi_request_fn at ffffffff8145dbc1
 #6 [ffff880005553b48] __blk_run_queue at ffffffff812e4683
 #7 [ffff880005553b60] blk_execute_rq_nowait at ffffffff812ed375
 #8 [ffff880005553ba0] blk_execute_rq at ffffffff812ed4cb
 #9 [ffff880005553c58] scsi_execute at ffffffff8145a933
#10 [ffff880005553c98] scsi_execute_req_flags at ffffffff8145c27e
#11 [ffff880005553d00] sr_check_events at ffffffffa016a76c [sr_mod]
#12 [ffff880005553d70] cdrom_check_events at ffffffffa0159058 [cdrom]
#13 [ffff880005553d88] sr_block_check_events at ffffffffa016ac19 [sr_mod]
#14 [ffff880005553da8] disk_check_events at ffffffff812f7f3b
#15 [ffff880005553e10] disk_events_workfn at ffffffff812f80a6
#16 [ffff880005553e20] process_one_work at ffffffff810a622b
#17 [ffff880005553e68] worker_thread at ffffffff810a7066
#18 [ffff880005553ec8] kthread at ffffffff810ae81f
#19 [ffff880005553f50] ret_from_fork at ffffffff81695698

PID: 0      TASK: ffff880005e23ec0  CPU: 1   COMMAND: "swapper/1"
    [exception RIP: native_safe_halt+6]
    RIP: ffffffff8105feb6  RSP: ffff880005e3fe98  RFLAGS: 00000286
    RAX: 00000000ffffffed  RBX: ffff880005e3c000  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff880005e3fe98   R8: 0000000000000000   R9: 0000000000000028
    R10: 0000000000000028  R11: ffff880005eb9028  R12: 0000000000000001
    R13: ffff880005e3c000  R14: ffff880005e3c000  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffff880005e3fea0] default_idle at ffffffff8103459f
 #1 [ffff880005e3fec0] arch_cpu_idle at ffffffff81034ee6
 #2 [ffff880005e3fed0] cpu_startup_entry at ffffffff810e5dc5
 #3 [ffff880005e3ff28] start_secondary at ffffffff8104eeaa

PID: 1      TASK: ffff880005d58000  CPU: 2   COMMAND: "systemd"
    [exception RIP: native_halt+5]
    RIP: ffffffff8105fec5  RSP: ffff880005dc39b0  RFLAGS: 00000046
    RAX: 0000000000000de0  RBX: ffff880007d0f380  RCX: 0000000000000de2
    RDX: 0000000000000de0  RSI: 0000000000000de2  RDI: ffff8800071aca98
    RBP: ffff880005dc39b0   R8: 0000000000000286   R9: ffff8800074b38d8
    R10: ffff880000805340  R11: ffff880000805340  R12: 0000000000000082
    R13: 0000000000000002  R14: 0000000000000000  R15: ffff880007706180
    CS: 0010  SS: 0018
 #0 [ffff880005dc39b8] kvm_lock_spinning at ffffffff8105fa2a
 #1 [ffff880005dc39e8] __raw_callee_save_kvm_lock_spinning at ffffffff8105edd5
 #2 [ffff880005dc3a38] _raw_spin_lock_irqsave at ffffffff8168cc63
 #3 [ffff880005dc3a90] scsi_dispatch_cmd at ffffffff81454a3a
 #4 [ffff880005dc3ab8] scsi_request_fn at ffffffff8145dbc1
 #5 [ffff880005dc3b20] __blk_run_queue at ffffffff812e4683
 #6 [ffff880005dc3b38] blk_queue_bio at ffffffff812e93d3
 #7 [ffff880005dc3b80] generic_make_request at ffffffff812e44b2
 #8 [ffff880005dc3bb8] submit_bio at ffffffff812e4571
 #9 [ffff880005dc3c10] _submit_bh at ffffffff8122c1b3
#10 [ffff880005dc3c40] ll_rw_block at ffffffff8122c4a9
#11 [ffff880005dc3c68] __breadahead at ffffffff8122c97d
#12 [ffff880005dc3c88] __ext4_get_inode_loc at ffffffffa01b967a [ext4]
#13 [ffff880005dc3cf0] ext4_iget at ffffffffa01bb3a6 [ext4]
#14 [ffff880005dc3d80] ext4_iget_normal at ffffffffa01bbe25 [ext4]
#15 [ffff880005dc3d90] ext4_lookup at ffffffffa01c5e57 [ext4]
#16 [ffff880005dc3dc8] lookup_real at ffffffff81201add
#17 [ffff880005dc3de8] __lookup_hash at ffffffff81202452
#18 [ffff880005dc3e18] filename_create at ffffffff81206438
#19 [ffff880005dc3ee8] user_path_create at ffffffff81208571
#20 [ffff880005dc3f18] sys_mkdirat at ffffffff81209886
#21 [ffff880005dc3f70] sys_mkdir at ffffffff81209939
#22 [ffff880005dc3f80] system_call_fastpath at ffffffff81695749
    RIP: 00007f337ab60057  RSP: 00007fff20c7b188  RFLAGS: 00010246
    RAX: 0000000000000053  RBX: ffffffff81695749  RCX: 00007f337ab706ba
    RDX: 0000000000000000  RSI: 00000000000001ed  RDI: 00007fff20c7b2f0
    RBP: 00007fff20c7b2f0   R8: 000000000000cf10   R9: 0000000000000004
    R10: 00007fff20c7b2c0  R11: 0000000000000246  R12: ffffffff81209939
    R13: ffff880005dc3f78  R14: 00007fff20c7c410  R15: 00007fff20c7b260
    ORIG_RAX: 0000000000000053  CS: 0033  SS: 002b

PID: 0      TASK: ffff880005e25e20  CPU: 3   COMMAND: "swapper/3"
    [exception RIP: native_safe_halt+6]
    RIP: ffffffff8105feb6  RSP: ffff880005e87e98  RFLAGS: 00000286
    RAX: 00000000ffffffed  RBX: ffff880005e84000  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff880005e87e98   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000003
    R13: ffff880005e84000  R14: ffff880005e84000  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffff880005e87ea0] default_idle at ffffffff8103459f
 #1 [ffff880005e87ec0] arch_cpu_idle at ffffffff81034ee6
 #2 [ffff880005e87ed0] cpu_startup_entry at ffffffff810e5dc5
 #3 [ffff880005e87f28] start_secondary at ffffffff8104eeaa

Comment 13 Peter Xu 2016-08-05 02:09:48 UTC
(In reply to Andrew Jones from comment #11)
> (In reply to Peter Xu from comment #7)
>  > In other words, I will *NOT* encounter the bug with the following configurations:
> > 
> > - upstream QEMU (latest master) + RHEL kernels in guest
> 
> This is interesting. What about upstream QEMU v2.5? If that works then we
> should look for RHEL patches that we've applied. If it doesn't work then we
> can reverse bisect upstream QEMU.
> 
> > - downstream QEMU (latest qemu-kvm-rhev) + upstream kernel (4.6.3)
> 
> Also interesting, but harder to bisect/debug the guest kernel than QEMU.
> 
> > 
> > Btw, it does not reproduce 100%. As I mentioned in comment 0, it's actually
> > 10% (yes, 10%, not 100%, no missing zero).
> 
> Ah yes, sorry, my eyes automatically added an extra zero when I first read
> that. A non-100% reproducer makes everything more tricky. Bisecting may not
> be a reliable path forward.

Maybe I should write "20%" next time to avoid this. :)

Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!

Comment 14 Andrew Jones 2016-08-05 07:34:50 UTC
comment 12 shows that two vcpus are waiting for the same lock (RDI is the same). I doubt we have a locking problem, though; I think it's more likely that a third vcpu was holding that lock when those two arrived, then that third one released it and kicked the next in line (we're using pvticketlocks). Kicking is done with an interrupt, which, for whatever reason, isn't getting delivered when using kernel_irqchip=off.

I hope reverse bisecting qemu works, since I think that should be the shortest path to root causing this.
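
In case it helps, a rough outline of such a bisection (commit ids are placeholders; for a reverse bisection, i.e. hunting for the commit that made it work, the good/bad labels are simply swapped; and since the reproducer is only ~10%, each step needs several boot attempts before a commit can be called good):

git bisect start
git bisect bad <known-bad-commit>
git bisect good <known-good-commit>
# at each step: build, boot the guest several times; mark "bad" on the first
# hang, "good" only after all boots succeed
git bisect reset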

Comment 15 Andrew Jones 2016-08-05 07:40:10 UTC
(In reply to Peter Xu from comment #13)
> Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!

Oops, I meant v2.6, qemu-kvm-rhev's upstream base. No need to try v2.5.

Comment 16 Peter Xu 2016-08-10 07:03:40 UTC
(In reply to Andrew Jones from comment #15)
> (In reply to Peter Xu from comment #13)
> > Let me try v2.5.0 then to see whether it's possible for a bisection. Thanks!
> 
> Oops, I meant v2.6, qemu-kvm-rhev's upstream base. No need to try v2.5.

v2.6 does not exhibit this bug.

A bisection shows that this bug was introduced by the following commit:

commit caaa2b731cf73681fe49ec96c2b817b22335679c
Author: Miroslav Rezanina <mrezanin>
Date:   Sun Dec 14 18:32:18 2014 +0100

    Add RHEL 7 machine types
    
    This commit adds all changes related to machine types applied in
    qemu-kvm-rhev-2.1.2-16.el7.

Further, I narrowed the scope down to the target-i386/cpu.c changes in the above commit. To see them:

    git diff caaa2b73~..caaa2b73 target-i386/cpu.c

Mirek, do you have any clue based on the above info?

Comment 18 Eduardo Habkost 2016-08-10 12:17:49 UTC
(In reply to Peter Xu from comment #16)
> Further, I narrowed the scope down to the target-i386/cpu.c changes in the
> above commit. To see them:
> 
>     git diff caaa2b73~..caaa2b73 target-i386/cpu.c
> 
> Mirek, do you have any clue based on the above info?

This changes the default CPUID flags for the default CPU model (qemu64), enabling extra CPU features. Probably it triggers an existing bug related to some of those flags. We need to check if enabling each of those flags on upstream QEMU also triggers the bug.

My bet is on kvm-pv-unhalt: it probably requires kernel-irqchip to be enabled in order to work, but QEMU enables it in CPUID anyway. Probably "-cpu qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to test it).

Comment 19 Eduardo Habkost 2016-08-10 12:26:10 UTC
(In reply to Eduardo Habkost from comment #18)
> Probably "-cpu
> qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to
> test it).

I mean "-cpu qemu64,+kvm_pv_unhalt".

Comment 20 Eduardo Habkost 2016-08-10 13:21:58 UTC
(In reply to Eduardo Habkost from comment #18)
> My bet is on kvm-pv-unhalt: it probably requires kernel-irqchip to be
> enabled in order to work, but QEMU enables it in CPUID anyway. Probably "-cpu
> qemu64,+kvm-pv-unhalt" will trigger the bug upstream, too (I am going to
> test it).

I haven't been able to reproduce it on my system yet.

But considering that the bug may be hard to reproduce if too many variables change, it's probably easier to do the opposite: start with a case where the bug is already reproducible, and append "-cpu qemu64,-kvm_pv_unhalt" to the command-line.
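
For instance, taking the reporter's command line from comment 0 and masking the flag off (a sketch, untested here):

$bin -machine pc,kernel-irqchip=off \
     -cpu qemu64,-kvm_pv_unhalt \
     -smp 4 -enable-kvm -monitor stdio \
     /var/lib/libvirt/images/vm1.qcow2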

By looking at the KVM_PV_UNHALT implementation in KVM, I can't tell if it is really supposed to work without kernel_irqchip, or not.

Comment 21 Peter Xu 2016-08-16 00:35:32 UTC
Eduardo is working on this bz, so changing "assigned to" to Eduardo for better tracking. Thanks!

Comment 22 Miroslav Rezanina 2016-08-22 18:26:33 UTC
Fix included in qemu-kvm-rhev-2.6.0-22.el7

Comment 24 yduan 2016-09-06 08:15:06 UTC
Reproduced with:
qemu-kvm-rhev: 2.6.0-18.el7.x86_64
kernel: 3.10.0-475.el7.x86_64 (both host and guest)

How reproducible:
30%

Steps to Reproduce:
1. Boot guest.
2. See grub, select specific RHEL kernel
3. Kernel starts to load

Actual results:
Guest hangs during kernel boot, before reaching a prompt.

*********************************************************

Verified with:
qemu-kvm-rhev: 2.6.0-22.el7.x86_64
kernel: 3.10.0-475.el7.x86_64 (both host and guest)

Actual results:
Guest boots successfully.


The command line is in attachment "CMD".

Comment 25 yduan 2016-09-06 08:15:40 UTC
Created attachment 1198122 [details]
CMD

Comment 26 yduan 2016-09-07 01:35:24 UTC
(In reply to yduan from comment #24)
> Verified with:
> qemu-kvm-rhev: 2.6.0-22.el7.x86_64
> kernel: 3.10.0-475.el7.x86_64 (both host and guest)
> 
> Actual results:
> Guest boots successfully.
> 

I tried around 20 times and didn't hit the issue.

Comment 29 errata-xmlrpc 2016-11-07 21:28:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html