Bug 1369641

Summary: Boot guest with 'kernel-irqchip=split', 'intremap=true' and e1000, guest fails to get ip and call trace occurs
Product: Red Hat Enterprise Linux 7
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm-rhev
Assignee: Peter Xu <peterx>
Status: CLOSED ERRATA
QA Contact: Pei Zhang <pezhang>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, hhuang, ilmostro7, imammedo, juzhang, knoel, michen, mrezanin, peterx, virt-maint, xfu, xiywang
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.8.0-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:34:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Pei Zhang 2016-08-24 05:10:40 UTC
Description of problem:
When a guest is booted with 'kernel-irqchip=split' and 'intremap=true', it fails to get an IP address, the e1000 driver logs 'Reset adapter', and a call trace occurs.

Version-Release number of selected component (if applicable):
Host:
3.10.0-495.rt56.397.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64

Guest:
qemu-kvm-rhev-2.6.0-22.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot guest with 'kernel-irqchip=split' and 'intremap=true'
# /usr/libexec/qemu-kvm -name rhel7.3 -M q35,kernel-irqchip=split \
-device intel-iommu,intremap=true \
-cpu IvyBridge -m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-netdev tap,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61 \
-spice port=5901,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-monitor stdio \
-device ahci,id=ahci0 \
-drive file=/home/pezhang/rhel7.3.qcow2,format=qcow2,if=none,id=drive-system-disk,werror=stop,rerror=stop \
-device ide-drive,bus=ahci0.0,drive=drive-system-disk,id=system-disk,bootindex=1 \
-serial unix:/tmp/socket,server,nowait

2. Try to get an IP address in the guest. This fails with 'Reset adapter' and a call trace occurs.
# dhclient
[  106.983848] e1000 0000:00:02.0 enp0s2: Reset adapter

# ifconfig
enp0s2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 12:54:00:5c:88:61  txqueuelen 1000  (Ethernet)
        RX packets 547  bytes 41190 (40.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
[   97.747019] IPv6: ADDRCONF(NETDEV_UP): virbr0-nic: link is not ready
[  106.983736] ------------[ cut here ]------------
[  106.983743] WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x28e/0x2a0()
[  106.983744] NETDEV WATCHDOG: enp0s2 (e1000): transmit queue 0 timed out
[    4.838035] intel_powerclamp: No package C-state available
[  106.983747] Modules linked in:
[  106.983748]  xt_CHECKSUM
[  106.983749]  ipt_MASQUERADE
[  106.983749]  nf_nat_masquerade_ipv4
[  106.983750]  tun
[  106.983750]  ipt_REJECT
[  106.983751]  nf_reject_ipv4
[  106.983751]  ip6t_rpfilter
[  106.983752]  ip6t_REJECT
[  106.983752]  nf_reject_ipv6
[  106.983753]  xt_conntrack
[  106.983753]  ip_set
[  106.983754]  nfnetlink
[  106.983755]  ebtable_nat
[  106.983755]  ebtable_broute
[  106.983755]  bridge
[  106.983756]  stp
[  106.983756]  llc
[  106.983757]  ip6table_nat
[  106.983757]  nf_conntrack_ipv6
[  106.983758]  nf_defrag_ipv6
[  106.983758]  nf_nat_ipv6
[  106.983758]  ip6table_mangle
[  106.983759]  ip6table_security
[  106.983759]  ip6table_raw
[  106.983760]  iptable_nat
[  106.983760]  nf_conntrack_ipv4
[  106.983761]  nf_defrag_ipv4
[  106.983761]  nf_nat_ipv4
[  106.983761]  nf_nat
[  106.983762]  nf_conntrack
[  106.983762]  iptable_mangle
[  106.983763]  iptable_security
[  106.983763]  iptable_raw
[  106.983764]  ebtable_filter
[  106.983764]  ebtables
[  106.983765]  ip6table_filter
[  106.983765]  ip6_tables
[  106.983766]  iptable_filter
[  106.983766]  iosf_mbi
[  106.983767]  crc32_pclmul
[  106.983767]  ghash_clmulni_intel
[  106.983768]  iTCO_wdt
[  106.983768]  iTCO_vendor_support
[  106.983769]  ppdev
[  106.983770]  aesni_intel
[  106.983770]  lrw
[  106.983771]  gf128mul
[  106.983771]  glue_helper
[  106.983772]  ablk_helper
[  106.983772]  cryptd
[  106.983773]  sg
[  106.983773]  pcspkr
[  106.983773]  lpc_ich
[  106.983774]  i2c_i801
[  106.983774]  parport_pc
[  106.983775]  parport
[  106.983775]  nfsd
[  106.983776]  auth_rpcgss
[  106.983776]  nfs_acl
[  106.983777]  lockd
[  106.983777]  grace
[  106.983778]  sunrpc
[  106.983778]  ip_tables
[  106.983779]  xfs
[  106.983779]  libcrc32c
[  106.983780]  sd_mod
[  106.983780]  crc_t10dif
[  106.983780]  crct10dif_generic
[  106.983781]  bochs_drm
[  106.983781]  drm_kms_helper
[  106.983782]  syscopyarea
[  106.983782]  ahci
[  106.983782]  libahci
[  106.983783]  sysfillrect
[  106.983783]  crct10dif_pclmul
[  106.983784]  crct10dif_common
[  106.983784]  sysimgblt
[  106.983785]  fb_sys_fops
[  106.983785]  ttm
[  106.983786]  crc32c_intel
[  106.983786]  serio_raw
[  106.983787]  libata
[  106.983787]  e1000
[  106.983792]  drm
[  106.983793]  i2c_core
[  106.983794]  dm_mirror
[  106.983794]  dm_region_hash
[  106.983795]  dm_log
[  106.983795]  dm_mod

[  106.983797] CPU: 0 PID: 4 Comm: ktimersoftd/0 Not tainted 3.10.0-495.rt56.397.el7.x86_64 #1
[  106.983798] Hardware name: Red Hat KVM, BIOS 1.9.1-4.el7 04/01/2014
[  106.983799]  ffff880176737c90
[  106.983800]  00000000c25e1265
[  106.983800]  ffff880176737c48
[  106.983800]  ffffffff81678f96

[  106.983801]  ffff880176737c80
[  106.983802]  ffffffff81079160
[  106.983802]  0000000000000000
[  106.983802]  ffff880174922000

[  106.983803]  ffff88007fa9a480
[  106.983803]  0000000000000001
[  106.983803]  0000000000000000
[  106.983804]  ffff880176737ce8

[  106.983804] Call Trace:

[  106.983809]  [<ffffffff81678f96>] dump_stack+0x19/0x1b

[  106.983813]  [<ffffffff81079160>] warn_slowpath_common+0x70/0xc0

[  106.983815]  [<ffffffff8107920c>] warn_slowpath_fmt+0x5c/0x80

[  106.983818]  [<ffffffff81585d7e>] dev_watchdog+0x28e/0x2a0

[  106.983820]  [<ffffffff81585af0>] ? pfifo_fast_init+0x80/0x80

[  106.983822]  [<ffffffff8108cec6>] call_timer_fn+0x36/0x180

[  106.983824]  [<ffffffff8108d1af>] run_timer_softirq+0x19f/0x330

[  106.983825]  [<ffffffff81585af0>] ? pfifo_fast_init+0x80/0x80

[  106.983827]  [<ffffffff810830b7>] do_current_softirqs+0x247/0x470

[  106.983829]  [<ffffffff810833ca>] run_ksoftirqd+0x3a/0x70

[  106.983831]  [<ffffffff810af502>] smpboot_thread_fn+0x202/0x2d0

[  106.983833]  [<ffffffff810af300>] ? lg_double_unlock+0x40/0x40

[  106.983835]  [<ffffffff810a6091>] kthread+0xc1/0xd0

[  106.983836]  [<ffffffff810a5fd0>] ? kthread_worker_fn+0x170/0x170

[  106.983839]  [<ffffffff81687398>] ret_from_fork+0x58/0x90

[  106.983841]  [<ffffffff810a5fd0>] ? kthread_worker_fn+0x170/0x170
[  106.983842] ---[ end trace 0000000000000002 ]---
[  106.983848] e1000 0000:00:02.0 enp0s2: Reset adapter
[  115.293585] xor: automatically using best checksumming function:
[  115.302676]    avx       : 22124.000 MB/sec
[  115.321680] raid6: sse2x1   gen()  7371 MB/s
[  115.338676] raid6: sse2x2   gen()  9273 MB/s
[  115.355678] raid6: sse2x4   gen() 10730 MB/s
[  115.355679] raid6: using algorithm sse2x4 gen() (10730 MB/s)
[  115.355680] raid6: using ssse3x2 recovery algorithm
[  115.375500] Btrfs loaded
[  115.385542] fuse init (API version 7.22)
[  116.200968] nr_pdflush_threads exported in /proc is scheduled for removal
[  118.033862] warning: `turbostat' uses 32-bit capabilities (legacy support in use)
[  173.641583] IPv6: ADDRCONF(NETDEV_UP): enp0s2: link is not ready

Actual results:
The guest fails to get an IP address and a call trace appears in dmesg.
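
The symptom can be checked mechanically rather than by reading dmesg by eye. The following is a minimal sketch, assuming the two messages quoted in the steps above are the ones of interest; the function name `has_tx_timeout` is hypothetical and not part of any tool.

```shell
# has_tx_timeout: read kernel log text on stdin and succeed if it contains
# either of the e1000 symptoms quoted in this report.
has_tx_timeout() {
    grep -Eq 'NETDEV WATCHDOG: .*transmit queue [0-9]+ timed out|Reset adapter'
}

# Typical use inside the guest:
#   dmesg | has_tx_timeout && echo "e1000 TX timeout symptom present"
```

The function only matches log text, so it can also be run against a saved dmesg file on the host.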

Expected results:
The guest network interface should get an IP address and no errors should be logged.

Additional info:
1. Without 'kernel-irqchip=split' and 'intremap=true', guest network works well.

2. This bug was found while verifying bug [1].
[1] Bug 1358653 - [RFE] Interrupt remapping support for Intel vIOMMUs

Comment 2 Pei Zhang 2016-08-25 06:38:34 UTC
As Peter suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1358653#c6,

after adding '-global ioapic.version=0x20', the network works well. So this may not be a bug.
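
For readers reproducing this, the workaround amounts to adding one option to the command line from the Description. The sketch below only assembles and prints an abbreviated command line and does not start QEMU; the variable names are illustrative, and the disk, display, monitor and serial options from the Description are left out for brevity.

```shell
# Abbreviated from the Description; add the remaining options as needed.
QEMU_BIN=/usr/libexec/qemu-kvm
QEMU_ARGS="-M q35,kernel-irqchip=split -device intel-iommu,intremap=true"
QEMU_ARGS="$QEMU_ARGS -netdev tap,id=hostnet0"
QEMU_ARGS="$QEMU_ARGS -device e1000,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61"

# Workaround suggested in bug 1358653 comment 6.
WORKAROUND="-global ioapic.version=0x20"

# Print the resulting command line for inspection instead of running it.
QEMU_CMD="$QEMU_BIN $QEMU_ARGS $WORKAROUND"
printf '%s\n' "$QEMU_CMD"
```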

Comment 3 Peter Xu 2016-08-30 13:29:33 UTC
(In reply to Pei Zhang from comment #2)
> As Peter suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1358653#c6,
> 
> after adding '-global ioapic.version=0x20', the network works well. So this
> may not be a bug.

Pei,

I think so. Shall we change it to NOTABUG?

Comment 4 Pei Zhang 2016-08-31 00:27:35 UTC
(In reply to Peter Xu from comment #3)
> (In reply to Pei Zhang from comment #2)
> > As Peter suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1358653#c6,
> > 
> > after adding '-global ioapic.version=0x20', the network works well. So this
> > may not be a bug.
> 
> Pei,
> 
> I think so. Shall we change it to NOTABUG?

OK. Closing this bug as 'NOTABUG' per comment 2 and comment 3.

Comment 5 Igor Mammedov 2016-09-22 07:42:48 UTC
*** Bug 1378140 has been marked as a duplicate of this bug. ***

Comment 6 Igor Mammedov 2016-09-22 07:48:22 UTC
Peter,

Shall we reopen this bug and fix it upstream as well?

I think that emulated HW should work by default, without the need to specify some obscure parameter on the CLI. If that is not possible to fix, then QEMU should at least warn the user that the device won't be usable in the specified configuration.

Comment 7 Peter Xu 2016-09-22 08:26:26 UTC
Hi, Igor,

(In reply to Igor Mammedov from comment #6)
> Peter,
> 
> Shall we reopen this bug and fix it upstream as well?
> 
> I think that emulated HW should work by default, without the need to specify
> some obscure parameter on the CLI. If that is not possible to fix, then QEMU
> should at least warn the user that the device won't be usable in the
> specified configuration.

The problem is that it only happens on some kernels, and we can never know which kernel the guest is running. (This is possibly an upstream kernel bug in IR, which is on my todo list, but that is another story.)

But I agree with you that this is awkward. The best solution is to make it work by default. Now that QEMU 2.7 has been released, maybe it's time to post a patch for QEMU 2.8 upstream (and for rhev 7.4 as well).

If you think this is the right way to go, please just OPEN it and I'll handle the rest.

Thanks.

Comment 10 ilmostro7 2017-03-27 17:20:45 UTC
I believe this just occurred on my RHEL 7.3 system with a Fedora 25 guest, though there is no current RHEV subscription on the system. It might be related to the network adapter having been used in passthrough mode.

Comment 12 Pei Zhang 2017-05-08 08:30:01 UTC
Verification:

Versions:
3.10.0-663.rt56.582.el7.x86_64/3.10.0-663.el7.x86_64
qemu-kvm-rhev-2.9.0-3.el7.x86_64


Steps:
Same as in the Description.
- The e1000 network device can get an IP address.
- No call trace appears on the host.
- No other errors occur when rebooting or shutting down the VM.

So this bug has been fixed. Moving to "VERIFIED".
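
The first verification point can be scripted. This is a rough sketch, assuming the interface name enp0s2 from the Description and the one-line output format of `ip -o -4 addr show`; the helper name `iface_has_ipv4` is hypothetical.

```shell
# iface_has_ipv4 IFACE: read `ip -o -4 addr show` output on stdin and
# succeed if the named interface has at least one IPv4 ("inet") address.
iface_has_ipv4() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { found = 1 } END { exit !found }'
}

# Typical use inside the guest after running dhclient:
#   ip -o -4 addr show | iface_has_ipv4 enp0s2 && echo "enp0s2 has an address"
```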

Comment 14 errata-xmlrpc 2017-08-01 23:34:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
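
To tell whether a host already carries the fix, the installed package version can be compared with the Fixed In Version field (qemu-kvm-rhev-2.8.0-1). A minimal sketch follows, assuming GNU `sort -V` is available; its ordering only approximates RPM's own version comparison, and the helper name `version_ge` is hypothetical.

```shell
# version_ge A B: succeed if version string A is greater than or equal to B
# under GNU `sort -V` ordering (an approximation of RPM ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use on the host:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' qemu-kvm-rhev)
#   version_ge "$installed" 2.8.0-1 && echo "fix should be included"
```

With the versions mentioned in this report, 2.6.0-22.el7 compares as older than 2.8.0-1 and 2.9.0-3.el7 as newer.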
