Bug 637703

Summary: guest hang at hdc mount during boot
Product: Red Hat Enterprise Linux 6 Reporter: Suqin Huang <shuang>
Component: qemu-kvmAssignee: Amit Shah <amit.shah>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 6.1CC: mkenneth, tburke, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-13 05:32:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Suqin Huang 2010-09-27 07:42:44 UTC
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.113.el6_0.1.x86_64

How reproducible:
1/10

Steps to Reproduce:
1. boot guest and update kernel to the latest one
/usr/libexec/qemu-kvm -drive file=/RHEL-Server-5.5-64-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idfA2Ni5,id=ndev00idfA2Ni5,mac=02:30:1F:F3:e7:93,bus=pci.0,addr=0x3 -netdev tap,id=idfA2Ni5,ifname=virtio_0_8000,script=/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 2 -cpu cpu64-rhel6,+x2apic -spice port=8000,disable-ticketing -vga qxl -rtc base=utc,clock=host,driftfix=none -M rhel6.0.0 -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

2.
3.
  
Actual results:
guest hang

Expected results:


Additional info:
1. top
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie
Cpu0  : 20.3%us,  0.7%sy,  0.0%ni, 79.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 80.0%us,  0.0%sy,  0.0%ni, 20.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3857316k total,   444920k used,  3412396k free,     5776k buffers
Swap:  6094840k total,        0k used,  6094840k free,    64812k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23061 root      20   0 2572m 109m 3200 R 99.5  2.9   2:48.15 qemu
23052 root      20   0 2572m 109m 3200 S  0.0  2.9   0:06.01 qemu
23064 root      20   0 2572m 109m 3200 S  0.0  2.9   0:02.89 qemu
23067 root      20   0 2572m 109m 3200 S  0.0  2.9   0:00.44 qemu


2. kvm_stat
kvm statistics

 efer_reload                  0       0
 exits                  2578941    4100
 fpu_reload                2027       0
 halt_exits               72440       0
 halt_wakeup              24358       0
 host_state_reload       713888     124
 hypercalls                   0       0
 insn_emulation         1265154    1002
 insn_emulation_fail          0       0
 invlpg                    1712       0
 io_exits                551966       0
 irq_exits               646089    3098
 irq_injections          271607    1002
 irq_window                   0       0
 largepages                  10       0
 mmio_exits               68486       0
 mmu_cache_miss            1580       0
 mmu_flooded                608       0
 mmu_pde_zapped            2240       0
 mmu_pte_updated              0       0
 mmu_pte_write            52273       0
                                               

3. serial

2010-09-27 15:04:58: usb 1-1: configuration #1 chosen from 1 choice
2010-09-27 15:04:58: input: QEMU 0.12.1 QEMU USB Tablet as /class/input/input2
2010-09-27 15:04:58: input: USB HID v0.01 Pointer [QEMU 0.12.1 QEMU USB Tablet] on usb-0000:00:01.2-1
2010-09-27 15:04:58: device-mapper: uevent: version 1.0.3
2010-09-27 15:04:58: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
2010-09-27 15:04:58: device-mapper: dm-raid45: initialized v0.2594l
2010-09-27 15:05:19: kjournald starting.  Commit interval 5 seconds
2010-09-27 15:05:19: EXT3-fs: mounted filesystem with ordered data mode.
2010-09-27 15:05:19: SELinux:  Disabled at runtime.
2010-09-27 15:05:19: type=1404 audit(1285571119.235:2): selinux=0 auid=4294967295 ses=4294967295

4. host
processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 107
model name	: AMD Athlon(tm) Dual Core Processor 4450B
stepping	: 2
cpu MHz		: 1000.000

flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv

Comment 2 Suqin Huang 2010-09-27 08:21:46 UTC
5. guest rhel5-64 (update to kernel-2.6.18-223.el5)

Comment 3 Suqin Huang 2010-09-27 09:52:27 UTC
(gdb) bt
#0  0x00007fab7a74350d in read () from /lib64/libpthread.so.0
#1  0x00007fab7a2a946a in ?? () from /usr/lib64/libspice-server.so.0
#2  0x00000000004722bd in qxl_detach (d=0x21197c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:461
#3  0x0000000000472d79 in qxl_reset (d=0x21197c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:544
#4  0x0000000000474627 in qxl_display_resize (ds=0x172a970) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:821
#5  0x000000000049f906 in dpy_resize (ds=0x172a970, width=<value optimized out>, height=<value optimized out>) at console.h:218
#6  qemu_console_resize (ds=0x172a970, width=<value optimized out>, height=<value optimized out>) at console.c:1441
#7  0x000000000044428d in vga_draw_text (opaque=0x2119a58) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vga.c:1318
#8  vga_update_display (opaque=0x2119a58) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vga.c:1932
#9  0x000000000049eb38 in vga_hw_screen_dump (filename=<value optimized out>) at console.c:182
#10 0x0000000000417829 in handle_user_command (mon=0x1775570, cmdline=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:3960
#11 0x000000000041787a in monitor_command_cb (mon=0x1775570, cmdline=<value optimized out>, opaque=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4506
#12 0x000000000049e4db in readline_handle_byte (rs=0x218c0e0, ch=<value optimized out>) at readline.c:369
#13 0x00000000004178ec in monitor_read (opaque=<value optimized out>, buf=0x7fff481d5770 "\n", size=1) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4492
#14 0x00000000004b6d8a in qemu_chr_read (opaque=0x16bd140) at qemu-char.c:154
#15 tcp_chr_read (opaque=0x16bd140) at qemu-char.c:2072
#16 0x000000000040b4af in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#17 0x0000000000428c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2133
#18 0x000000000040e5cb in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4444
#19 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6601

Comment 4 Dor Laor 2010-10-03 11:01:40 UTC
How do you know it is hdc mount?
Can you please try w/o spice?
Does a rhel5.5 guest hangs as well? (you tried 5.6)

Comment 5 Suqin Huang 2010-10-08 02:01:01 UTC
from the successful case, the next step is hdc mount

Comment 6 Amit Shah 2010-10-08 07:39:13 UTC
(In reply to comment #5)
> from the successful case, the next step is hdc mount

What do you mean by 'hdc mount'?  From your command line, looks like you just have one virtio disk.  From the guest serial log, doesn't look like there's any guest bug.

Comment 7 Suqin Huang 2010-10-08 07:53:42 UTC
1. can reproduce with rhel5.5 guest (kernel 189)

2. 
2010-09-27 11:45:15: SCSI subsystem initialized
2010-09-27 11:45:15: usb 1-1: configuration #1 chosen from 1 choice
2010-09-27 11:45:15: device-mapper: uevent: version 1.0.3
2010-09-27 11:45:15: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
2010-09-27 11:45:15: input: QEMU 0.12.1 QEMU USB Tablet as /class/input/input2
2010-09-27 11:45:15: input: USB HID v0.01 Pointer [QEMU 0.12.1 QEMU USB Tablet] on usb-0000:00:01.2-1
2010-09-27 11:45:16: device-mapper: dm-raid45: initialized v0.2594l
2010-09-27 11:45:36: kjournald starting.  Commit interval 5 seconds
2010-09-27 11:45:36: EXT3-fs: mounted filesystem with ordered data mode.
2010-09-27 11:45:36: SELinux:  Disabled at runtime.
2010-09-27 11:45:36: type=1404 audit(1285559136.614:2): selinux=0 auid=4294967295 ses=4294967295

========> stop here



2010-09-27 12:05:30: hdc: drive_cmd: status=0x41 { DriveReady Error }
2010-09-27 12:05:30: hdc: drive_cmd: error=0x04 { AbortedCommand }
2010-09-27 12:05:30: ide: failed opcode was: 0xec
2010-09-27 12:05:59: audit(1285560357.771:23): audit_pid=0 old=1968 by auid=4294967295
2010-09-27 12:06:10: Removing netfilter NETLINK layer.
2010-09-27 12:06:16: type=1111 audit(1285560375.002:24): user pid=3777 uid=0 auid=4294967295 msg='changing system time: exe="/sbin/hwclock" (hostname=?, addr=?, terminal=console res=success)'
2010-09-27 12:06:23: md: stopping all md devices.
2010-09-27 12:06:24: Power down.
2010-09-27 12:06:24: acpi_power_off called
2010-09-27 12:06:24: (Process terminated with status 0)

Comment 8 Amit Shah 2010-10-08 08:06:44 UTC
1. What is hdc?  I don't see any block device in your command line except the virtio-block.  Is it a cdrom?

2. When you say 'stop here', does it mean the guest freezes for some time and then wakes up again?

3. Which kernel versions worked fine?  Can you find a single version where this behaviour started to appear?

Comment 9 Suqin Huang 2010-10-11 10:32:24 UTC
1. repeat 50 times can reproduce with spice qxl, can not reproduce with vnc, seems relate to qxl
2. from the gdb info, it's similar with bug 617119

(gdb) info threads 
  4 Thread 0x7f97eab38710 (LWP 2860)  0x000000390ce0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  3 Thread 0x7f97ea137710 (LWP 2862)  0x000000390ce0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f97e949b710 (LWP 2874)  0x000000390ce0e034 in __lll_lock_wait () from /lib64/libpthread.so.0
* 1 Thread 0x7f97eae3b800 (LWP 2848)  0x000000390ce0e50d in read () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000390ce0e50d in read () from /lib64/libpthread.so.0
#1  0x0000003918a1046a in read (qxl_worker=0x2a24e90) at /usr/include/bits/unistd.h:45
#2  receive_data (qxl_worker=0x2a24e90) at red_worker.h:117
#3  read_message (qxl_worker=0x2a24e90) at red_worker.h:130
#4  qxl_worker_detach (qxl_worker=0x2a24e90) at red_dispatcher.c:233
#5  0x00000000004722bd in qxl_detach (d=0x34137c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:461
#6  0x0000000000472d79 in qxl_reset (d=0x34137c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:544
#7  0x0000000000474627 in qxl_display_resize (ds=0x2a24970) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:821
#8  0x000000000049f906 in dpy_resize (ds=0x2a24970, width=<value optimized out>, height=<value optimized out>) at console.h:218
#9  qemu_console_resize (ds=0x2a24970, width=<value optimized out>, height=<value optimized out>) at console.c:1441
#10 0x000000000044428d in vga_draw_text (opaque=0x3413a58) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vga.c:1318
#11 vga_update_display (opaque=0x3413a58) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vga.c:1932
#12 0x000000000049eb38 in vga_hw_screen_dump (filename=<value optimized out>) at console.c:182
#13 0x0000000000417829 in handle_user_command (mon=0x2a65560, cmdline=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:3960
#14 0x000000000041787a in monitor_command_cb (mon=0x2a65560, cmdline=<value optimized out>, opaque=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4506
#15 0x000000000049e4db in readline_handle_byte (rs=0x34860e0, ch=<value optimized out>) at readline.c:369
#16 0x00000000004178ec in monitor_read (opaque=<value optimized out>, buf=0x7fff12b7d980 "\n", size=1) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4492
#17 0x00000000004b6d8a in qemu_chr_read (opaque=0x2987120) at qemu-char.c:154
#18 tcp_chr_read (opaque=0x2987120) at qemu-char.c:2072
#19 0x000000000040b4af in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#20 0x0000000000428c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2133
#21 0x000000000040e5cb in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4444
#22 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6601
(gdb) frame 13
#13 0x0000000000417829 in handle_user_command (mon=0x2a65560, cmdline=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:3960
3960	        cmd->mhandler.cmd(mon, qdict);
(gdb) p cmd
$4 = (const mon_cmd_t *) 0x580b98
(gdb) p *cmd
$5 = {name = 0x57e093 "screendump", args_type = 0x57e09e "filename:F", params = 0x57d8c7 "filename", help = 0x57f668 "save screen into PPM image 'filename'", user_print = 0, 
  mhandler = {info = 0x411f60 <do_screen_dump>, info_new = 0x411f60 <do_screen_dump>, info_async = 0x411f60 <do_screen_dump>, cmd = 0x411f60 <do_screen_dump>, 
    cmd_new = 0x411f60 <do_screen_dump>, cmd_async = 0x411f60 <do_screen_dump>}, async = 0}
(gdb)

Comment 10 Amit Shah 2010-10-11 10:37:09 UTC
Can you try qemu-kvm-0.12.1.2-2.114.el6 which fixes the bug you mentioned?

Also, please give answers to questions in comment 8 if the build above does not fix the issue.

Comment 11 Suqin Huang 2010-10-12 10:05:58 UTC
repeat 60 times, can not reproduce in qemu-kvm-0.12.1.2-2.114.el6

Comment 12 Suqin Huang 2010-10-13 02:36:26 UTC
(In reply to comment #8)
> 1. What is hdc?  I don't see any block device in your command line except the
> virtio-block.  Is it a cdrom?

2010-10-12 15:49:45:     ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:pio, hdb:pio
2010-10-12 15:49:45:     ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:pio, hdd:pio
2010-10-12 15:49:46: hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
2010-10-12 15:49:47: ide1 at 0x170-0x177,0x376 on irq 15
2010-10-12 15:49:47: ide-floppy driver 0.99.newide
2010-10-12 15:49:47: usbcore: registered new driver hiddev
2010-10-12 15:49:47: usbcore: registered new driver usbhid

Comment 13 Amit Shah 2010-10-13 05:32:44 UTC
OK, thanks.  Marking as a dup.

*** This bug has been marked as a duplicate of bug 617119 ***