Bug 980072

Summary: quit src qemu-kvm (ovs-ifdown script) cause src host kernel panic after migrate
Product: Red Hat Enterprise Linux 7
Reporter: zhonglinzhang <zhzhang>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: acathrow, chayang, eparis, hhuang, jasowang, juzhang, lersek, michen, mst, qiguo, qzhang, virt-maint, xfu, xwei, zhzhang
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-07-12 13:20:38 UTC
Type: Bug

Description zhonglinzhang 2013-07-01 11:02:37 UTC
Description of problem:
  Boot qemu-kvm with the ovs-ifdown script as the tap downscript, then quit the src qemu-kvm after migration. This causes a kernel panic on the src host.
  Boot qemu-kvm without the ovs-ifdown script and the src host does not panic, so the ovs-ifdown script seems to be the root cause of the src host kernel panic.

Version-Release number of selected component (if applicable):
host and guest kernel version:
# uname -r
3.10.0-0.rc6.62.el7.x86_64
qemu-kvm version:
qemu-kvm-1.5.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

1. boot src guest:
/usr/libexec/qemu-kvm -M pc-i440fx-1.5 -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=2 -name network-test -uuid 389d06a7-ed31-4fae-baf4-87bcb9b5596e -rtc base=utc,clock=host,driftfix=slew -k en-us -boot menu=on -vnc :1 -spice disable-ticketing,port=5931 -vga cirrus -monitor stdio -qmp tcp:0:4444,server,nowait -device virtio-scsi-pci,bus=pci.0,id=scsi0,addr=0x5 -drive file=/mnt/rhel7cp5.qcow3,if=none,id=drive-system-disk,format=qcow2,snapshot=off,aio=native,werror=stop,rerror=stop -device scsi-hd,bus=scsi0.0,drive=drive-system-disk,id=system-disk,bootindex=1 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:13:10:20 -netdev tap,id=hostnet0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown

2. boot dst guest:
/usr/libexec/qemu-kvm -M pc-i440fx-1.5 -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=2 -name network-test -uuid 389d06a7-ed31-4fae-baf4-87bcb9b5596e -rtc base=utc,clock=host,driftfix=slew -k en-us -boot menu=on -vnc :1 -spice disable-ticketing,port=5931 -vga cirrus -monitor stdio -qmp tcp:0:4444,server,nowait -device virtio-scsi-pci,bus=pci.0,id=scsi0,addr=0x5 -drive file=/mnt/rhel7cp5.qcow3,if=none,id=drive-system-disk,format=qcow2,snapshot=off,aio=native,werror=stop,rerror=stop -device scsi-hd,bus=scsi0.0,drive=drive-system-disk,id=system-disk,bootindex=1 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:13:10:20 -netdev tap,id=hostnet0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -incoming tcp:0:5888
 
3. migrate from the src host to the dst host
   after the migration finishes, quit the src qemu-kvm.
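
The report does not give the exact monitor commands used for step 3. A minimal sketch, assuming the migration is started from the src HMP monitor (the `-monitor stdio` console) and that the dst guest listens on the port given by `-incoming tcp:0:5888`. The helper name `migrate_cmd` and the host name `dst-host` are hypothetical:

```shell
# Hypothetical helper: print the HMP command that starts a detached
# migration to the given destination host and port.
migrate_cmd() {
    printf 'migrate -d tcp:%s:%s\n' "$1" "$2"
}

# Type the printed line at the src (qemu) prompt, wait until
# "info migrate" reports "Migration status: completed", then type "quit".
migrate_cmd dst-host 5888
```

The `quit` is deliberately left as a manual step, since the panic is reported only when the src qemu-kvm exits after the migration has completed.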

Actual results:
The src host kernel panics after the src qemu-kvm quits. Output of the crash utility on the resulting vmcore:

crash: cannot determine thread return address
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-0.rc6.62.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013.07.01-10:29:51/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Mon Jul  1 18:29:49 2013
      UPTIME: 00:12:31
LOAD AVERAGE: 0.48, 0.31, 0.21
       TASKS: 236
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-0.rc6.62.el7.x86_64
     VERSION: #1 SMP Sun Jun 16 16:37:24 EDT 2013
     MACHINE: x86_64  (3392 Mhz)
      MEMORY: 7.9 GB
       PANIC: "Oops: 0003 [#1] SMP " (check log for details)
         PID: 2275
     COMMAND: "qemu-kvm"
        TASK: ffff880187f78000  [THREAD_INFO: ffff88020b2e8000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 2275   TASK: ffff880187f78000  CPU: 2   COMMAND: "qemu-kvm"
 #0 [ffff88020b2e99b8] machine_kexec at ffffffff8103ce72
 #1 [ffff88020b2e9a08] crash_kexec at ffffffff810c9903
 #2 [ffff88020b2e9ad0] oops_end at ffffffff816055c0
 #3 [ffff88020b2e9af8] no_context at ffffffff815f7d1c
 #4 [ffff88020b2e9b40] __bad_area_nosemaphore at ffffffff815f7d9c
 #5 [ffff88020b2e9b88] bad_area_nosemaphore at ffffffff815f7f08
 #6 [ffff88020b2e9b98] __do_page_fault at ffffffff8160818e
 #7 [ffff88020b2e9c90] do_page_fault at ffffffff8160838e
 #8 [ffff88020b2e9ca0] page_fault at ffffffff81604a18
    [exception RIP: anon_vma_chain_link+18]
    RIP: ffffffff81165122  RSP: ffff88020b2e9d58  RFLAGS: 00010246
    RAX: ffff880158652988  RBX: 00007ff6d4000000  RCX: ffff88020b2e9fd8
    RDX: ffff880158652980  RSI: 00007ff6d4000000  RDI: ffff8801cab160b8
    RBP: ffff88020b2e9d68   R8: 0000000000017360   R9: ffffffff81166b19
    R10: ffff88021edd0d80  R11: 000000000000000e  R12: ffff880158652980
    R13: ffff880158652980  R14: ffff880158652980  R15: 00007ff6d4000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88020b2e9d70] anon_vma_clone at ffffffff81166b52
#10 [ffff88020b2e9db8] anon_vma_fork at ffffffff8116723e
#11 [ffff88020b2e9df0] dup_mm at ffffffff8105ce06
#12 [ffff88020b2e9e60] copy_process at ffffffff8105dc0c
#13 [ffff88020b2e9ed8] do_fork at ffffffff8105e71d
#14 [ffff88020b2e9f38] sys_clone at ffffffff8105ea36
#15 [ffff88020b2e9f48] stub_clone at ffffffff8160cd39
    RIP: 00007ff80e7396cc  RSP: 00007fffc65a8970  RFLAGS: 00000246
    RAX: 0000000000000038  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000001200011
    RBP: 00007fffc65a89f0   R8: 00000000000008e3   R9: 0000000000000000
    R10: 00007ff812d4ecd0  R11: 0000000000000246  R12: 00007fffc65a8970
    R13: 00007fffc65a8990  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b


Expected results:
The src host should keep working normally after the src qemu-kvm quits.

Additional info:

Comment 2 Eric Paris 2013-07-03 14:55:35 UTC
Possibly a duplicate of bug 976789.

Comment 3 Eric Paris 2013-07-03 15:06:44 UTC
*** Bug 980418 has been marked as a duplicate of this bug. ***

Comment 4 Eric Paris 2013-07-03 15:07:20 UTC
*** Bug 977227 has been marked as a duplicate of this bug. ***

Comment 5 jason wang 2013-07-04 03:09:31 UTC
Looks similar to what was reported upstream:

http://lkml.org/lkml/2013/7/2/499

Just to check, does disabling zerocopy help?
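
For reference, a sketch of how zerocopy transmit could be turned off for this check, assuming the host's vhost_net module exposes the `experimental_zcopytx` parameter. The parameter is read at module load time, so the module has to be reloaded while no guest is using vhost. The helper name `zcopytx_option` and the modprobe.d file name are hypothetical:

```shell
# Hypothetical helper: print a modprobe.d line that sets vhost_net
# zerocopy TX (0 = disabled, 1 = enabled).
zcopytx_option() {
    printf 'options vhost_net experimental_zcopytx=%s\n' "$1"
}

# As root, one might persist the setting and reload the module, e.g.:
#   zcopytx_option 0 > /etc/modprobe.d/vhost-net.conf
#   modprobe -r vhost_net && modprobe vhost_net
# The active value can then be read back from
#   /sys/module/vhost_net/parameters/experimental_zcopytx
zcopytx_option 0
```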

Comment 6 Hai Huang 2013-07-09 19:15:45 UTC
Please see Jason's comments (https://bugzilla.redhat.com/show_bug.cgi?id=980072#c5) above.

Also, since this BZ reports a host panic on the src, would you please
provide a kernel crash dump? The backtrace from the crash utility alone
is insufficient to diagnose the problem. Thank you.

Comment 8 Hai Huang 2013-07-12 13:20:38 UTC

*** This bug has been marked as a duplicate of bug 953706 ***

Comment 9 Laszlo Ersek 2013-07-12 14:23:09 UTC
(In reply to Hai Huang from comment #8)
> 
> *** This bug has been marked as a duplicate of bug 953706 ***

I agree. To confirm, any one of the following steps could be tried:

- Changing all VM NICs from virtio-net-pci to something else should make the crash go away.

- Keeping virtio-net-pci, but changing its backend implementation from vhost to userspace (i.e. qemu) should also make the crash go away. See bug 643050 comment 2 for how to do this.

- On both src and dst host, install the kernel linked in bug 953706 comment 15 and retry. No crash should occur.
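
For the command lines in this report, the second option can be sketched as follows, assuming that setting `vhost=off` on the tap netdev is enough to move virtio-net to the userspace (qemu) backend. Bug 643050 comment 2 may describe a different method. The helper name `netdev_userspace` is hypothetical:

```shell
# Hypothetical helper: rewrite a -netdev argument so that the tap backend
# uses the userspace (qemu) virtio-net implementation instead of vhost.
netdev_userspace() {
    printf '%s\n' "$1" | sed 's/vhost=on/vhost=off/'
}

# Applied to the -netdev value from the reproducer; pass the printed value
# to -netdev on both the src and the dst command line.
netdev_userspace 'tap,id=hostnet0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown'
```

The rest of the command line, including the ovs-ifup and ovs-ifdown scripts, stays unchanged, so a run that no longer panics would point at vhost rather than at the downscript.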