Bug 980418

Summary: After the first migration, relaunching qemu-kvm on the src host panics the src host kernel (only hit with vhost=on)
Product: Red Hat Enterprise Linux 7 Reporter: Qian Guo <qiguo>
Component: qemu-kvm Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: acathrow, chayang, eparis, juzhang, michen, qzhang, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-03 15:06:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
vmcore-dmesg file | none
split the vmcore to 4 files, this is the 1st one: vmcore_part_aa | none
split the vmcore to 4 files, this is the 2nd one: vmcore_part_ab | none
split the vmcore to 4 files, this is the 3rd one: vmcore_part_ac | none
split the vmcore to 4 files, this is the last one: vmcore_part_ad | none

Description Qian Guo 2013-07-02 10:35:38 UTC
Created attachment 767679 [details]
vmcore-dmesg file

Description of problem:
Migrate a guest from the src host to the dst host. After the migration finishes, quit the src qemu-kvm and relaunch it. After repeating this a few times (I hit this bug after 3 iterations), the src host kernel panics.

Version-Release number of selected component (if applicable):
host kernel 
# uname -r
3.10.0-0.rc6.62.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.5.0-2.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Launch qemu-kvm on one host (src) and in listening mode on another host (dst). The src qemu-kvm command line is:
/usr/libexec/qemu-kvm -cpu Penryn -enable-kvm -m 2048 -smp 4,sockets=1,cores=4,threads=1 -name rhel6u3c2 -drive file=/mnt/rhel7/rhel7.qcow2,if=none,id=drive-scsi0-disk0,format=qcow2,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0,addr=0x4 -device scsi-hd,scsi-id=0,lun=0,bus=scsi0.0,drive=drive-scsi0-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet0,mac=54:52:1b:35:3c:18,id=test -device virtio-balloon-pci,id=balloon0 -vnc :10 -vga std -boot menu=on -monitor stdio

2. Migrate the guest from the src host to the dst host.

3. After the migration, quit qemu-kvm on the src host, then launch it again on the src host with the same command line as in step 1.

4. Migrate again, then repeat step 3.
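
The steps above can be sketched as a dry-run script. This is only an illustration under assumptions: the report does not say how the migration was started, so the `-incoming tcp:0:PORT` option on dst and the `migrate -d tcp:HOST:PORT` monitor command are assumed here, and DST_HOST, MIG_PORT and ITERATIONS are hypothetical placeholders. The script prints what to run and does not start qemu-kvm itself.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction loop. Nothing is launched; the
# commands are only printed. DST_HOST, MIG_PORT and ITERATIONS are
# hypothetical placeholders, not values taken from this report.
DST_HOST=${DST_HOST:-dst.example.com}
MIG_PORT=${MIG_PORT:-4444}
ITERATIONS=${ITERATIONS:-3}
VHOST=${VHOST:-on}

# Print the src qemu-kvm command line from step 1 on a single line.
# $1 selects vhost=on or vhost=off for the tap netdev.
src_cmdline() {
    vhost=${1:-on}
    echo "/usr/libexec/qemu-kvm -cpu Penryn -enable-kvm -m 2048" \
         "-smp 4,sockets=1,cores=4,threads=1 -name rhel6u3c2" \
         "-drive file=/mnt/rhel7/rhel7.qcow2,if=none,id=drive-scsi0-disk0,format=qcow2,werror=stop,rerror=stop" \
         "-device virtio-scsi-pci,id=scsi0,addr=0x4" \
         "-device scsi-hd,scsi-id=0,lun=0,bus=scsi0.0,drive=drive-scsi0-disk0,id=virtio-disk0" \
         "-netdev tap,id=hostnet0,script=/etc/qemu-ifup,vhost=$vhost" \
         "-device virtio-net-pci,netdev=hostnet0,mac=54:52:1b:35:3c:18,id=test" \
         "-device virtio-balloon-pci,id=balloon0" \
         "-vnc :10 -vga std -boot menu=on -monitor stdio"
}

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    echo "[$i] on dst:         $(src_cmdline "$VHOST") -incoming tcp:0:$MIG_PORT"
    echo "[$i] on src:         $(src_cmdline "$VHOST")"
    echo "[$i] in src monitor: migrate -d tcp:$DST_HOST:$MIG_PORT"
    echo "[$i] in src monitor: quit (once 'info migrate' shows the migration completed)"
    i=$((i + 1))
done
```

Running it with VHOST=off prints the control case without vhost, which is useful because the panic was only seen with vhost=on.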

Actual results:
The src host hits a kernel panic.

# cat /var/crash/~~/vmcore-dmesg.txt
...
[    4.262769] BUG: unable to handle kernel paging request at 000000305d424000
[    4.271324] IP: [<ffffffff81165122>] anon_vma_chain_link+0x12/0x40
[    4.279095] PGD 210bd9067 PUD 156d26067 PMD 156c3a067 PTE 800000020d5e6025
[    4.287650] Oops: 0003 [#1] SMP 
[    4.292493] Modules linked in: vhost_net macvtap macvlan tun rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd sunrpc dns_resolver fscache ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bnep bluetooth openvswitch bridge stp llc sg mperf coretemp kvm_intel kvm crc32_pclmul crc32c_intel iTCO_wdt iTCO_vendor_support ghash_clmulni_intel lpc_ich snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep hp_wmi sparse_keymap rfkill snd_seq snd_seq_device e1000e mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore i2c_i801 microcode serio_raw pcspkr ptp pps_core wmi tpm_infineon uinput xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif
[    0.076435]  i915 i2c_algo_bit ahci drm_kms_helper libahci drm libata i2c_core video dm_mirror dm_region_hash dm_log dm_mod
[    0.088552] CPU: 0 PID: 1971 Comm: bash Not tainted 3.10.0-0.rc6.62.el7.x86_64 #1
[    0.098013] Hardware name: Hewlett-Packard HP Compaq Elite 8300 MT/3397, BIOS K01 v02.05 05/07/2012
[    0.109079] task: ffff880209509640 ti: ffff880156c00000 task.ti: ffff880156c00000
[    0.118583] RIP: 0010:[<ffffffff81165122>]  [<ffffffff81165122>] anon_vma_chain_link+0x12/0x40
[    0.129281] RSP: 0018:ffff880156c01d58  EFLAGS: 00010246
[    0.136631] RAX: ffff8801fe5f70c8 RBX: 000000305d424000 RCX: ffff880156c01fd8
[    0.145844] RDX: ffff8801fe5f70c0 RSI: 000000305d424000 RDI: ffff880156c2d508
[    0.155051] RBP: ffff880156c01d68 R08: 0000000000017360 R09: ffffffff81166b19
[    0.164263] R10: 0000000000000021 R11: ffff880156f6afa8 R12: ffff8801fe5f70c0
[    0.173510] R13: ffff8801fe5f70c0 R14: ffff8801fe5f70c0 R15: 000000305d424000
[    0.182752] FS:  00007f1d4e71e740(0000) GS:ffff88021ea00000(0000) knlGS:0000000000000000
[    0.192990] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.200882] CR2: 000000305d424000 CR3: 0000000204eef000 CR4: 00000000001407e0
[    0.210203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.219538] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.228873] Stack:
[    0.233044]  ffff8801fe5f70c0 ffff880204c584c0 ffff880156c01db0 ffffffff81166b52
[    0.242738]  ffff8801fe317f90 ffff880156c2d508 ffff8801fe317f18 0000000000000000
[    0.252440]  ffff8801fe317f18 ffff880156c2d508 ffff880156c2d508 ffff880156c01de8
[    0.262117] Call Trace:
[    0.266707]  [<ffffffff81166b52>] anon_vma_clone+0x82/0x140
[    0.274450]  [<ffffffff8116723e>] anon_vma_fork+0x2e/0x100
[    0.282126]  [<ffffffff8105ce06>] dup_mm+0x276/0x670
[    0.289258]  [<ffffffff8105dc0c>] copy_process.part.25+0x9dc/0x13f0
[    0.297694]  [<ffffffff8105e71d>] do_fork+0xad/0x340
[    0.304845]  [<ffffffff811b6350>] ? get_unused_fd_flags+0x30/0x40
[    0.313129]  [<ffffffff8105ea36>] SyS_clone+0x16/0x20
[    0.320367]  [<ffffffff8160cd39>] stub_clone+0x69/0x90
[    0.327678]  [<ffffffff8160c9d9>] ? system_call_fastpath+0x16/0x1b
...
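
As an aid for reading the "Oops: 0003" line in the dmesg output above, here is a minimal sketch that decodes an x86 page-fault error code using the architectural bit layout (bit 0 present, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch). The function name decode_pf_error is made up for illustration. The register dump also shows that the faulting address in CR2 (000000305d424000) is the same value held in RBX, RSI and R15.

```shell
#!/bin/sh
# Decode an x86 page-fault error code given as hex digits, as printed
# after "Oops:" in the kernel log. decode_pf_error is a hypothetical
# helper name used only for this sketch.
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then
        echo "protection violation (page was present)"
    else
        echo "page not present"
    fi
    if [ $((code & 2)) -ne 0 ]; then
        echo "write access"
    else
        echo "read access"
    fi
    if [ $((code & 4)) -ne 0 ]; then
        echo "user mode"
    else
        echo "kernel mode"
    fi
    if [ $((code & 8)) -ne 0 ]; then
        echo "reserved bit set in a paging-structure entry"
    fi
    if [ $((code & 16)) -ne 0 ]; then
        echo "instruction fetch"
    fi
}

decode_pf_error 0003
```

For 0003 this reports a protection violation on a present page, a write access, and kernel mode.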

Debugging the vmcore file with crash:
crash> bt
PID: 1971   TASK: ffff880209509640  CPU: 0   COMMAND: "bash"
 #0 [ffff880156c019b8] machine_kexec at ffffffff8103ce72
 #1 [ffff880156c01a08] crash_kexec at ffffffff810c9903
 #2 [ffff880156c01ad0] oops_end at ffffffff816055c0
 #3 [ffff880156c01af8] no_context at ffffffff815f7d1c
 #4 [ffff880156c01b40] __bad_area_nosemaphore at ffffffff815f7d9c
 #5 [ffff880156c01b88] bad_area_nosemaphore at ffffffff815f7f08
 #6 [ffff880156c01b98] __do_page_fault at ffffffff8160818e
 #7 [ffff880156c01c90] do_page_fault at ffffffff8160838e
 #8 [ffff880156c01ca0] page_fault at ffffffff81604a18
    [exception RIP: anon_vma_chain_link+18]
    RIP: ffffffff81165122  RSP: ffff880156c01d58  RFLAGS: 00010246
    RAX: ffff8801fe5f70c8  RBX: 000000305d424000  RCX: ffff880156c01fd8
    RDX: ffff8801fe5f70c0  RSI: 000000305d424000  RDI: ffff880156c2d508
    RBP: ffff880156c01d68   R8: 0000000000017360   R9: ffffffff81166b19
    R10: 0000000000000021  R11: ffff880156f6afa8  R12: ffff8801fe5f70c0
    R13: ffff8801fe5f70c0  R14: ffff8801fe5f70c0  R15: 000000305d424000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880156c01d70] anon_vma_clone at ffffffff81166b52
#10 [ffff880156c01db8] anon_vma_fork at ffffffff8116723e
#11 [ffff880156c01df0] dup_mm at ffffffff8105ce06
#12 [ffff880156c01e60] copy_process at ffffffff8105dc0c
#13 [ffff880156c01ed8] do_fork at ffffffff8105e71d
#14 [ffff880156c01f38] sys_clone at ffffffff8105ea36
#15 [ffff880156c01f48] stub_clone at ffffffff8160cd39
    RIP: 0000003f7f0bc6cc  RSP: 00007fff30d0e7e0  RFLAGS: 00000246
    RAX: 0000000000000038  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000001200011
    RBP: 00007fff30d0e820   R8: 0000000000000000   R9: 0000000000000000
    R10: 00007f1d4e71ea10  R11: 0000000000000246  R12: 00007fff30d0e7e0
    R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b

Expected results:
The src host should keep running without a kernel panic.

Additional info:

Comment 2 Qian Guo 2013-07-02 11:00:04 UTC
Created attachment 767682 [details]
split the vmcore to 4 files, this is the 1st one: vmcore_part_aa

Comment 3 Qian Guo 2013-07-02 11:03:29 UTC
Created attachment 767685 [details]
split the vmcore to 4 files, this is the 2nd one: vmcore_part_ab

Comment 4 Qian Guo 2013-07-02 11:06:16 UTC
Created attachment 767686 [details]
split the vmcore to 4 files, this is the 3rd one: vmcore_part_ac

Comment 5 Qian Guo 2013-07-02 11:08:43 UTC
Created attachment 767687 [details]
split the vmcore to 4 files, this is the last one: vmcore_part_ad
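
To analyze the dump, the four parts have to be joined again. The following is a minimal sketch, assuming the parts were produced with split(1) (the aa to ad suffixes suggest this) and were saved in the current directory under the names from the attachment descriptions. The function name join_vmcore is made up for illustration.

```shell
#!/bin/sh
# Reassemble the vmcore from the four attached parts, in suffix order.
# Assumes the parts came from split(1) and sit in the current directory
# under the attachment names. join_vmcore is a hypothetical helper name.
join_vmcore() {
    out=${1:-vmcore}
    # Refuse to build a truncated dump if any part is missing.
    for part in vmcore_part_aa vmcore_part_ab vmcore_part_ac vmcore_part_ad; do
        if [ ! -f "$part" ]; then
            echo "missing $part" >&2
            return 1
        fi
    done
    cat vmcore_part_aa vmcore_part_ab vmcore_part_ac vmcore_part_ad > "$out"
}
```

After `join_vmcore vmcore`, the result can be opened with the crash utility together with the vmlinux from the kernel-debuginfo package matching 3.10.0-0.rc6.62.el7.x86_64.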

Comment 6 Qian Guo 2013-07-02 11:10:48 UTC
After some testing, I found that this bug is related to vhost: if the guest is launched without vhost=on, the bug is not hit. I have changed the title accordingly.

Comment 7 Eric Paris 2013-07-03 14:55:28 UTC
Possibly a dup of bug 976789.

Comment 8 Eric Paris 2013-07-03 15:03:54 UTC
Since this is a RHEL7 bug, I'm going to mark it as a dup of bug 980072, but it likely has the same root cause as bug 976789.

Comment 9 Eric Paris 2013-07-03 15:06:44 UTC

*** This bug has been marked as a duplicate of bug 980072 ***

Comment 10 Qian Guo 2013-07-11 10:24:44 UTC
Tested this with macvtap: when the src host boots qemu with vhost=on and the guest is then migrated, the src host hits a kernel panic with the same vmcore as in this bug.