Bug 965396 - turn down tap link off under hmp, rhel6.4 guest boot stalled at "starting certmonger" after “system-reset”
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Vlad Yasevich
QA Contact: Virtualization Bugs
Docs Contact:
Depends On:
Blocks:
Reported: 2013-05-21 03:28 EDT by Qian Guo
Modified: 2014-10-14 02:49 EDT
CC List: 10 users

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.425.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 02:49:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
tty1 can not login (73.86 KB, image/jpeg)
2013-05-21 03:28 EDT, Qian Guo

Description Qian Guo 2013-05-21 03:28:04 EDT
Created attachment 750873
tty1 can not login

Description of problem:
Boot a RHEL 6.4 (with GUI) guest with vhost=on, disable the tap link via HMP, and then run "system_reset". The guest boot stalls at "starting certmonger" on tty1, but the system can still be accessed from other console terminals, such as tty2 ... ttyS0.

Version-Release number of selected component (if applicable):
Host kernel:
# uname -r
2.6.32-380.el6.x86_64

# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.369.el6.x86_64


Guest kernel:
# uname -r
2.6.32-358.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a RHEL 6.4 guest with vhost=on:
# /usr/libexec/qemu-kvm -cpu Penryn -m 2G -smp 2,sockets=1,cores=2,threads=1 -M pc -enable-kvm -name rhel6u4 -nodefaults -nodefconfig -vga std -monitor stdio -drive file=/home/qiguo1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :20 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-netpci0,mac=54:52:1b:36:1a:02

2. After boot-up, disable the tap link from HMP:
(qemu) set_link hostnet0 off
3. Reset the guest from HMP:
(qemu) system_reset

Actual results:
Guest boot stalled at "starting certmonger".

Expected results:
Guest can boot up successfully on tty1

Additional info:
1. After "sendkey ctrl-alt-f2", the system can be accessed via tty2; login from the serial console also works.
2. This only happens with vhost=on.
Comment 2 Qian Guo 2013-05-21 03:39:21 EDT
Tested several times; the guest produced a call trace. The log is listed below:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
PGD 37b1a067 PUD 37d6c067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/block/vda/uevent
CPU 0 
Modules linked in: uinput microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 1103, comm: lvm Not tainted 2.6.32-358.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff811fba6d>]  [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
RSP: 0018:ffff8800377ebde8  EFLAGS: 00010286
RAX: ffff8800377ea000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000001b RDI: ffffffff81adc000
RBP: ffff8800377ebe38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff880037361000
R13: 0000000000000000 R14: ffff8800377ebe48 R15: ffff880037361000
FS:  00007f92beb517a0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000037373000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvm (pid: 1103, threadinfo ffff8800377ea000, task ffff88007a475500)
Stack:
 00000000ffffff9c 0000000000000000 ffff8800377ebe18 ffffffff8118f141
<d> ffff880037988000 ffff88007b1d5480 ffff8800377ebf28 ffff8800377ebe48
<d> 00007ffff89f21b0 0000000000000400 ffff8800377ebf18 ffffffff8118e3af
Call Trace:
 [<ffffffff8118f141>] ? path_put+0x31/0x40
 [<ffffffff8118e3af>] generic_readlink+0x4f/0xc0
 [<ffffffff8122556b>] ? dentry_has_perm+0x5b/0x80
 [<ffffffff8119c441>] ? touch_atime+0x71/0x1a0
 [<ffffffff81186680>] sys_readlinkat+0xb0/0xc0
 [<ffffffff811866ab>] sys_readlink+0x1b/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 83 98 00 00 00 48 c7 c7 00 c0 ad 81 4d 89 fc 48 8b 58 08 4c 8b 68 50 e8 43 31 31 00 <48> 83 7b 08 00 48 89 d9 74 4b 66 0f 1f 84 00 00 00 00 00 49 8b 
RIP  [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
 RSP <ffff8800377ebde8>
CR2: 0000000000000008
---[ end trace f2df9dae3fd9c69a ]---
Kernel panic - not syncing: Fatal exception
Pid: 1103, comm: lvm Tainted: G      D    ---------------    2.6.32-358.el6.x86_64 #1
Call Trace:
 [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
 [<ffffffff815111f4>] ? oops_end+0xe4/0x100
 [<ffffffff81046bfb>] ? no_context+0xfb/0x260
 [<ffffffff81046e85>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81046fae>] ? bad_area+0x4e/0x60
 [<ffffffff81047760>] ? __do_page_fault+0x3d0/0x480
 [<ffffffff8112bae3>] ? __alloc_pages_nodemask+0x113/0x8d0
 [<ffffffff811902ff>] ? do_lookup+0x9f/0x230
 [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff815104d5>] ? page_fault+0x25/0x30
 [<ffffffff811fba6d>] ? sysfs_follow_link+0x6d/0x1d0
 [<ffffffff811fba6d>] ? sysfs_follow_link+0x6d/0x1d0
 [<ffffffff8118f141>] ? path_put+0x31/0x40
 [<ffffffff8118e3af>] ? generic_readlink+0x4f/0xc0
 [<ffffffff8122556b>] ? dentry_has_perm+0x5b/0x80
 [<ffffffff8119c441>] ? touch_atime+0x71/0x1a0
 [<ffffffff81186680>] ? sys_readlinkat+0xb0/0xc0
 [<ffffffff811866ab>] ? sys_readlink+0x1b/0x20
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
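For reference, the faulting address can be read directly off the oops: the bytes marked `<48>` in the Code line, `48 83 7b 08 00`, decode to `cmpq $0x0,0x8(%rbx)`, and RBX is 0, so the access lands at 0 + 0x8, which matches CR2. The C sketch below reproduces only that arithmetic. `struct example_obj` and `effective_address()` are hypothetical stand-ins; the report does not identify which kernel structure sysfs_follow_link() was reading through the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: any structure whose second field sits at offset 8.
 * The real type dereferenced in sysfs_follow_link() is not known here. */
struct example_obj {
    uint64_t first;  /* offset 0 */
    void    *second; /* offset 8 (first field is 8 bytes) */
};

static_assert(offsetof(struct example_obj, second) == 8,
              "second field is expected at offset 8");

/* Address the CPU touches for "cmpq $0x0,0x8(%rbx)": base register plus
 * displacement. Plain integer arithmetic, so nothing is dereferenced. */
static uint64_t effective_address(uint64_t base_reg, uint64_t disp)
{
    return base_reg + disp;
}
```

With RBX = 0 and a displacement of `offsetof(struct example_obj, second)`, `effective_address()` yields 0x8, the same value the oops reports in CR2.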
Comment 3 Vlad Yasevich 2013-08-14 12:47:16 EDT
I can't reproduce the crash or the hang, but the VM takes much longer to come up since it is waiting for dhcp to return the address on a link whose backend is down.

Currently, qemu does not change the state of the virtio link when the tap is brought down. It only sends a notification, and the notification has no effect.
The code comment says this is intentional, because some disconnected devices can still communicate. However, that is not the case when vhost is used.

It might make sense for virtio to disable the link when vhost is enabled.
Currently testing this implementation.
Comment 5 Miroslav Rezanina 2014-04-29 02:01:56 EDT
Fix included in qemu-kvm-0.12.1.2-2.425.el6
Comment 7 Qian Guo 2014-06-23 03:14:44 EDT
Reproduced this bug with qemu-kvm-0.12.1.2-2.398.el6

Steps:
1. Boot a RHEL 6.4 GA guest:
#  /usr/libexec/qemu-kvm -cpu Penryn -m 2G -smp 2,sockets=1,cores=2,threads=1 -M pc -enable-kvm -name rhel6u4 -nodefaults -nodefconfig -vga std -monitor stdio -drive file=/home/rhel6u4/qiguo1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :20 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-netpci0,mac=54:52:1b:36:1a:02

2. After the guest boots up, bring down the tap link from HMP:
(qemu) set_link hostnet0 off
3. Reboot the guest.


Result: During boot, the guest spends a long time waiting for DHCP to return an address on a link whose backend is down.

So this bug is reproduced.

Verified this bug with qemu-kvm-0.12.1.2-2.428.el6.x86_64.

Steps as above

Result: During boot, when the guest fails to get an address, it quickly continues booting instead of waiting.
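The faster boot is consistent with the guest now seeing no carrier on the interface. As a hedged illustration, assuming the guest's boot-time network scripts skip DHCP when the carrier is known to be down (this report does not say which guest component makes that decision), the check can be sketched in C against the standard `/sys/class/net/<dev>/carrier` attribute. `read_carrier()` and `should_run_dhcp()` are hypothetical helpers, not part of any RHEL package.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a sysfs-style carrier attribute such as /sys/class/net/eth0/carrier.
 * Returns 1 if the carrier is up, 0 if it is down, and -1 if the state is
 * unknown (file missing, or the read fails, as it does while the interface
 * is administratively down). */
static int read_carrier(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f);
    fclose(f);
    if (c == '1')
        return 1;
    if (c == '0')
        return 0;
    return -1;
}

/* Hypothetical policy: skip DHCP only when the carrier is known to be down;
 * an unknown state falls back to trying DHCP. */
static int should_run_dhcp(int carrier)
{
    return carrier != 0;
}
```

A caller would use something like `should_run_dhcp(read_carrier("/sys/class/net/eth0/carrier"))`; the interface name `eth0` is an assumption about the guest.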

So this bug is fixed and verified by qemu-kvm-0.12.1.2-2.428.el6.x86_64
Comment 9 errata-xmlrpc 2014-10-14 02:49:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html
