Bug 664635

Summary: Entering screen to copy/scrollback mode caused guest deadlock
Product: Red Hat Enterprise Linux 5 Reporter: Amos Kong <akong>
Component: kvmAssignee: Gerd Hoffmann <kraxel>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: low Docs Contact:
Priority: low    
Version: 5.6CC: ailan, ehabkost, mkenneth, mzhan, tburke, virt-maint
Target Milestone: rcKeywords: Upstream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 665227 670145 (view as bug list) Environment:
Last Closed: 2011-01-17 10:57:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580946, 665227    
Attachments:
Description Flags
strace output none

Description Amos Kong 2010-12-21 03:14:48 UTC
Description of problem:
I used screen to manage my terminals, I connected guest serial by nc _before_ guest completely start up, when I entered screen to copy/scrollback mode, guest deadlocked.
If I recoverred to normall mode, guest also got back.


Version-Release number of selected component (if applicable):
host kernel: 2.6.18-236.el5
# rpm -qa |grep kvm
kmod-kvm-debug-83-223.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-223.el5
etherboot-roms-kvm-5.4.4-13.el5
kvm-tools-83-223.el5
kmod-kvm-83-223.el5
kvm-qemu-img-83-223.el5
kvm-83-223.el5

How reproducible:
100%

Steps to Reproduce:
1. boot up a guest 
# qemu-kvm -name 'vm2' -monitor unix:'/tmp/monitor-humanmonitor1-20101213-111527-JNsP1',server,nowait -serial unix:'/tmp/serial-20101213-111527-JNsP1',server,nowait -drive file='/home/devel/autotest-akong/client/tests/kvm/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=virtio,media=disk,cache=none,snapshot=on,boot=on,format=qcow2 -net nic,vlan=0,model=virtio,macaddr='9a:72:41:21:4c:15' -net tap,vlan=0,ifname='t0-111527-JNsP1',script='/home/devel/autotest-akong/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu qemu64,+sse2 -soundhw ac97 -vnc :2 -rtc-td-hack -M rhel5.6.0 -usbdevice tablet -no-kvm-pit-reinjection

2. connect guest monitor and check guest status
# nc -U /tmp/monitor-humanmonitor1-20101213-111527-JNsP1
(qemu) info status
VM status: running

2. use 'nc' to connect guest serial before guest completely start up
screen) # nc -U /tmp/serial-20101213-111527-JNsP1

3. set screen to enter copy/scrollback mode
screen)   ctrl + a + [

4. connect guest monitor and check guest status
# nc -U /tmp/monitor-humanmonitor1-20101213-111527-JNsP1
  
Actual results:
guest deadlocked.

Expected results:
guest boot up successfully

Additional info:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff815fc000
0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000340b608e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x000000340b608cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004ff62e in kvm_mutex_lock () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:1306
#4  0x00000000004097aa in qemu_select (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.h:202
#5  main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4042
#6  0x00000000005005fa in kvm_main_loop () at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:596
#7  0x000000000040e795 in main_loop (argc=29, argv=0x7fff815c4238, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:4105
#8  main (argc=29, argv=0x7fff815c4238, envp=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/vl.c:6545
(gdb) info threads
  5 Thread 0x40e70940 (LWP 26784)  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
  4 Thread 0x41f0b940 (LWP 26785)  0x000000340b60d89b in write () from /lib64/libpthread.so.0
  3 Thread 0x4290c940 (LWP 26786)  0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x4330d940 (LWP 26787)  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
* 1 Thread 0x2b3c5755f150 (LWP 26775)  0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) thread 2
[Switching to thread 2 (Thread 0x4330d940 (LWP 26787))]#0  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
(gdb) bt
#0  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
#1  0x000000340aa317fd in sigwaitinfo () from /lib64/libc.so.6
#2  0x000000000041a991 in sigwait_compat (opaque=<value optimized out>) at compatfd.c:38
#3  0x000000340b60673d in start_thread () from /lib64/libpthread.so.0
#4  0x000000340aad40cd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x4290c940 (LWP 26786))]#0  0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000340b60d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000340b608e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x000000340b608cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004fff1e in post_kvm_run (opaque=0x0, data=0x80) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:160
#4  0x000000000052bc54 in post_kvm_run (kvm=0x1895e7e0, env=0x1913e010) at libkvm.c:874
#5  0x000000000052be0b in kvm_run (kvm=0x1895e7e0, vcpu=1, env=0x1913e010) at libkvm.c:920
#6  0x0000000000500759 in kvm_cpu_exec (env=0x0) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:206
#7  0x00000000005009e3 in kvm_main_loop_cpu (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:402
#8  ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443
#9  0x000000340b60673d in start_thread () from /lib64/libpthread.so.0
#10 0x000000340aad40cd in clone () from /lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x41f0b940 (LWP 26785))]#0  0x000000340b60d89b in write () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000340b60d89b in write () from /lib64/libpthread.so.0
#1  0x0000000000470f09 in unix_write (fd=23, buf=<value optimized out>, len1=1) at qemu-char.c:477
#2  send_all (fd=23, buf=<value optimized out>, len1=1) at qemu-char.c:493
#3  0x000000000043a6a2 in serial_xmit (opaque=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/serial.c:311
#4  0x000000000043ab31 in serial_ioport_write (opaque=0x191559a0, addr=<value optimized out>, val=99) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/serial.c:366
#5  0x0000000000500018 in kvm_outb (opaque=<value optimized out>, addr=22948, data=1 '\001') at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:684
#6  0x000000000052b73f in handle_io (kvm=0x1895e7e0, run=0x2aaaaaaac000, vcpu=0) at libkvm.c:738
#7  0x000000000052c029 in kvm_run (kvm=0x1895e7e0, vcpu=0, env=0x19124950) at libkvm.c:967
#8  0x0000000000500759 in kvm_cpu_exec (env=0x1) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:206
#9  0x00000000005009e3 in kvm_main_loop_cpu (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:402
#10 ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443
#11 0x000000340b60673d in start_thread () from /lib64/libpthread.so.0
#12 0x000000340aad40cd in clone () from /lib64/libc.so.6
(gdb) thread 5 
[Switching to thread 5 (Thread 0x40e70940 (LWP 26784))]#0  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
(gdb) bt
#0  0x000000340aa31744 in do_sigwaitinfo () from /lib64/libc.so.6
#1  0x000000340aa317fd in sigwaitinfo () from /lib64/libc.so.6
#2  0x000000000041a991 in sigwait_compat (opaque=<value optimized out>) at compatfd.c:38
#3  0x000000340b60673d in start_thread () from /lib64/libpthread.so.0
#4  0x000000340aad40cd in clone () from /lib64/libc.so.6

Comment 1 Amos Kong 2010-12-21 03:24:22 UTC
# qemu-kvm ... -serial stdio ...

When I redirected serial output to stdio, and entered screen to copy/scrollback mode, guest also deadlocked.

===>  After recoverred screen to normal mode, guest outputted:
BUG: soft lockup - CPU#1 stuck for 84s! [kauditd:793]
Modules linked in: ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core sg ext4 mbcache jbd2 sr_mod cdrom virtio_blk virtio_net pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: speedstep_lib]
CPU 1:
Modules linked in: ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core sg ext4 mbcache jbd2 sr_mod cdrom virtio_blk virtio_net pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: speedstep_lib]
Pid: 793, comm: kauditd Not tainted 2.6.32-71.7.1.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff8107de86>]  [<ffffffff8107de86>] run_timer_softirq+0x176/0x340
RSP: 0018:ffff880028223e50  EFLAGS: 00000246
RAX: ffff880028223e90 RBX: ffff880028223ed0 RCX: 0000000000000430
RDX: ffff880028223e90 RSI: dead000000200200 RDI: 0000000000000000
RBP: ffffffff81013c93 R08: ffff88013b7d4450 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000400 R12: ffff880028223dd0
R13: ffffffff81c9a9d0 R14: ffff88013b7d4000 R15: ffffffff814cfdac
FS:  0000000000000000(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f67d35ffa60 CR3: 00000001391d0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>  [<ffffffff8102f52d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81073c77>] ? __do_softirq+0xb7/0x1e0
 [<ffffffff81095af0>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff810142cc>] ? call_softirq+0x1c/0x30
 [<ffffffff81015f35>] ? do_softirq+0x65/0xa0
 [<ffffffff81073a75>] ? irq_exit+0x85/0x90
 [<ffffffff814cfdb1>] ? smp_apic_timer_interrupt+0x71/0x9c
 [<ffffffff81013c93>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8106c611>] ? vprintk+0x1d1/0x4f0
 [<ffffffff81059db2>] ? finish_task_switch+0x42/0xd0
 [<ffffffff814c7fd3>] ? printk+0x41/0x46
 [<ffffffff810cad5b>] ? audit_printk_skb+0x6b/0x70
 [<ffffffff810cae48>] ? kauditd_thread+0xe8/0x180
 [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
 [<ffffffff810cad60>] ? kauditd_thread+0x0/0x180
 [<ffffffff810919d6>] ? kthread+0x96/0xa0
 [<ffffffff810141ca>] ? child_rip+0xa/0x20
 [<ffffffff81091940>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20

Comment 2 Amos Kong 2010-12-21 03:32:32 UTC
Bug can be reproduced in UPSTREAM, so add this keyword.

qemu-kvm: commit 53b6d3d5c2522e881c8d194f122de3114f6f76eb
    date:   Thu Dec 2 12:15:20 2010 -0200

Comment 3 Amos Kong 2010-12-21 05:26:02 UTC
Created attachment 469908 [details]
strace output

....
....
read(21, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x407a90, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x340b60eb10}, 8) = 0
write(5, "\0", 1)                       = 1
read(21, 0x7fffb2b87580, 128)           = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {962736, 763682502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 763707502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 763728502}) = 0
futex(0xb44f20, FUTEX_WAKE_PRIVATE, 1)  = 1
select(25, [4 6 10 14 15 19 21 23 24], [], [], {1, 0}) = 2 (in [4 24], left {1, 0})
recvfrom(24, "\3\1\0\0\0\0\2\320\1\220", 4096, 0, NULL, NULL) = 10
read(4, "\0", 512)                      = 1
read(4, 0x7fffb2b87400, 512)            = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {962736, 795117502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795138502}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795241502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795264502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795286502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795307502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795341502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795362502}) = 0
clock_gettime(CLOCK_MONOTONIC, {962736, 795382502}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 36000}}) = 0
futex(0xb44f20, FUTEX_WAKE_PRIVATE, 1)  = 1
select(25, [4 6 10 14 15 19 21 23 24], [], [], {1, 0}) = 1 (in [21], left {0, 999000})
futex(0xb44f20, FUTEX_WAIT_PRIVATE, 2, NULL

Comment 8 Min Zhan 2010-12-22 10:35:34 UTC
I have checked with rhel5u5 and rhel6 guest from libvirt aspect.

For rhel5u5 guest, it works well whenever in unix socket or pty guest serial type.

But for rhel6 guest, the guest will be hang in unix socket serial type and pty serial type after several times of guest reboot.

Detailed steps:
1 Start a guest (rhel6) with serial type is unix socket

...
 <serial type="unix">
      <source mode="bind" path="/tmp/foo"/>
      <target port="0"/>
    </serial>
...

2. Open virt-manager to monitor this guest

3. use 'nc' to connect guest serial before guest completely start up
screen) # nc -U /tmp/foo

4. set screen to enter copy/scrollback mode
screen)   ctrl + a + [

5. reboot the guest for several times and check guest status in virt-manager

Actual result: the guest is hang, then I ctrl+c to close the nc, the guest will be resumed and works well.

Comment 12 Eduardo Habkost 2011-01-17 10:57:29 UTC
Won't fix on RHEL-5. Programs reading from the serial output shouldn't stop reading the data (like screen does) or buffers will fill up and qemu process will block.

We could make qemu not block when writing to serial port (and drop data if needed), but it is not something we would implement in RHEL-5.

I cloned this bug for RHEL-6.1: bug #670145.