Bug 965396
Summary: | turn down tap link off under hmp, rhel6.4 guest boot stalled at "starting certmonger" after “system-reset” | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Qian Guo <qiguo> |
Component: | qemu-kvm | Assignee: | Vlad Yasevich <vyasevic> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 6.5 | CC: | bsarathy, chayang, jasowang, juzhang, michen, mkenneth, qzhang, rbalakri, rhod, virt-maint |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-0.12.1.2-2.425.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-10-14 06:49:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |

Tested several times and got a guest call trace. The log is listed below:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
PGD 37b1a067 PUD 37d6c067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/block/vda/uevent
CPU 0
Modules linked in: uinput microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 1103, comm: lvm Not tainted 2.6.32-358.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff811fba6d>] [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
RSP: 0018:ffff8800377ebde8 EFLAGS: 00010286
RAX: ffff8800377ea000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000001b RDI: ffffffff81adc000
RBP: ffff8800377ebe38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff880037361000
R13: 0000000000000000 R14: ffff8800377ebe48 R15: ffff880037361000
FS: 00007f92beb517a0(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000037373000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvm (pid: 1103, threadinfo ffff8800377ea000, task ffff88007a475500)
Stack:
 00000000ffffff9c 0000000000000000 ffff8800377ebe18 ffffffff8118f141
<d> ffff880037988000 ffff88007b1d5480 ffff8800377ebf28 ffff8800377ebe48
<d> 00007ffff89f21b0 0000000000000400 ffff8800377ebf18 ffffffff8118e3af
Call Trace:
 [<ffffffff8118f141>] ? path_put+0x31/0x40
 [<ffffffff8118e3af>] generic_readlink+0x4f/0xc0
 [<ffffffff8122556b>] ? dentry_has_perm+0x5b/0x80
 [<ffffffff8119c441>] ? touch_atime+0x71/0x1a0
 [<ffffffff81186680>] sys_readlinkat+0xb0/0xc0
 [<ffffffff811866ab>] sys_readlink+0x1b/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 83 98 00 00 00 48 c7 c7 00 c0 ad 81 4d 89 fc 48 8b 58 08 4c 8b 68 50 e8 43 31 31 00 <48> 83 7b 08 00 48 89 d9 74 4b 66 0f 1f 84 00 00 00 00 00 49 8b
RIP [<ffffffff811fba6d>] sysfs_follow_link+0x6d/0x1d0
 RSP <ffff8800377ebde8>
CR2: 0000000000000008
---[ end trace f2df9dae3fd9c69a ]---
Kernel panic - not syncing: Fatal exception
Pid: 1103, comm: lvm Tainted: G D --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
 [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
 [<ffffffff815111f4>] ? oops_end+0xe4/0x100
 [<ffffffff81046bfb>] ? no_context+0xfb/0x260
 [<ffffffff81046e85>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81046fae>] ? bad_area+0x4e/0x60
 [<ffffffff81047760>] ? __do_page_fault+0x3d0/0x480
 [<ffffffff8112bae3>] ? __alloc_pages_nodemask+0x113/0x8d0
 [<ffffffff811902ff>] ? do_lookup+0x9f/0x230
 [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff815104d5>] ? page_fault+0x25/0x30
 [<ffffffff811fba6d>] ? sysfs_follow_link+0x6d/0x1d0
 [<ffffffff811fba6d>] ? sysfs_follow_link+0x6d/0x1d0
 [<ffffffff8118f141>] ? path_put+0x31/0x40
 [<ffffffff8118e3af>] ? generic_readlink+0x4f/0xc0
 [<ffffffff8122556b>] ? dentry_has_perm+0x5b/0x80
 [<ffffffff8119c441>] ? touch_atime+0x71/0x1a0
 [<ffffffff81186680>] ? sys_readlinkat+0xb0/0xc0
 [<ffffffff811866ab>] ? sys_readlink+0x1b/0x20
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

I can't reproduce the crash or the hang, but the VM takes much longer to come up because it is waiting for DHCP to return an address on a link whose backend is down.

Currently qemu doesn't change the state of the virtio link when the tap is brought down. It notifies the device, but the notification doesn't do anything. The comment says that this is on purpose, because some disconnected devices can still communicate. However, that is not the case when vhost is used. It might make sense for virtio to disable the link when vhost is enabled. I am currently testing this implementation.

Fix included in qemu-kvm-0.12.1.2-2.425.el6

Reproduced this bug with qemu-kvm-0.12.1.2-2.398.el6.

Steps:
1. Boot a RHEL 6.4 GA guest:
# /usr/libexec/qemu-kvm -cpu Penryn -m 2G -smp 2,sockets=1,cores=2,threads=1 -M pc -enable-kvm -name rhel6u4 -nodefaults -nodefconfig -vga std -monitor stdio -drive file=/home/rhel6u4/qiguo1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :20 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-netpci0,mac=54:52:1b:36:1a:02
2. After the guest boots, turn down the tap link under hmp:
(qemu) set_link hostnet0 off
3. Reboot the guest.

Result: During boot, the guest spends a long time waiting for DHCP to return an address on a link whose backend is down. So this bug is reproduced.

Verified this bug with qemu-kvm-0.12.1.2-2.428.el6.x86_64.

Steps: as above.

Result: While booting, when the guest fails to get an address it continues booting quickly instead of waiting. So this bug is fixed and verified with qemu-kvm-0.12.1.2-2.428.el6.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2014-1490.html
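
The verification above infers the fix from boot time. A more direct check might be to read the link state from inside the guest after `set_link hostnet0 off`. The following is a minimal sketch, not part of the original report: the function name `link_state`, the `SYSFS_NET` override, and the guest interface name `eth0` are assumptions for illustration, and it assumes the fixed qemu-kvm propagates link-down to the guest as the developer comment proposes.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): report the carrier state
# the guest sees for a network interface.
# SYSFS_NET is an assumption of this sketch so the sysfs root can be
# overridden; it defaults to the real /sys/class/net.
link_state() {
    iface=$1
    carrier_file="${SYSFS_NET:-/sys/class/net}/$iface/carrier"
    # Reading "carrier" fails if the interface does not exist or is
    # administratively down, so a failed read is reported as "unknown".
    carrier=$(cat "$carrier_file" 2>/dev/null) || carrier=""
    case "$carrier" in
        1) echo up ;;
        0) echo down ;;
        *) echo unknown ;;
    esac
}

# Example (run inside the guest; eth0 is an assumed interface name):
#   link_state eth0
```

If the link state is propagated, the helper would be expected to print `down` while the tap link is off and `up` again after `set_link hostnet0 on`.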
Created attachment 750873
tty1 can not login

Description of problem:
Boot a RHEL 6.4 (w/ GUI) guest w/ vhost=on, disable the tap link via hmp, and then do "system_reset". The guest boot stalls at "starting certmonger" (tty1), but the system can still be accessed via other console terminals, like tty2 ... ttyS0.

Version-Release number of selected component (if applicable):
host kernel:
# uname -r
2.6.32-380.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.369.el6.x86_64
Guest kernel:
# uname -r
2.6.32-358.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a RHEL 6.4 guest w/ vhost=on:
# /usr/libexec/qemu-kvm -cpu Penryn -m 2G -smp 2,sockets=1,cores=2,threads=1 -M pc -enable-kvm -name rhel6u4 -nodefaults -nodefconfig -vga std -monitor stdio -drive file=/home/qiguo1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :20 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-netpci0,mac=54:52:1b:36:1a:02
2. After boot up, disable the tap link under hmp:
(qemu) set_link hostnet0 off
3. Reset the guest under hmp:
(qemu) system_reset

Actual results:
Guest boot stalls at "starting certmonger".

Expected results:
Guest boots up successfully on tty1.

Additional info:
1. After sendkey ctrl-alt-f2, the system can be accessed via tty2, and login from the serial console works too.
2. Only happens w/ vhost=on.
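
Steps 2 and 3 could be scripted rather than typed at the monitor prompt. The sketch below is an illustration only and rests on assumptions that are not in the original report: qemu-kvm is started with `-monitor unix:/tmp/hmp.sock,server,nowait` instead of `-monitor stdio`, `socat` is installed on the host, and the socket path and the function names `repro_commands` and `run_repro` are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes the guest was started with
#   -monitor unix:/tmp/hmp.sock,server,nowait
# in place of "-monitor stdio", and that socat is available.
HMP_SOCK=${HMP_SOCK:-/tmp/hmp.sock}   # hypothetical socket path

# Print the HMP commands from steps 2 and 3, one per line.
# The netdev id defaults to hostnet0, as used in the report.
repro_commands() {
    netdev=${1:-hostnet0}
    printf 'set_link %s off\n' "$netdev"
    printf 'system_reset\n'
}

# Send the commands to the monitor socket once the guest has booted.
run_repro() {
    repro_commands "$1" | socat - "UNIX-CONNECT:$HMP_SOCK"
}

# Example (after the guest has finished booting):
#   run_repro hostnet0
```

Keeping the command list in its own function makes it possible to inspect what will be sent before touching a running guest.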