Bug 1064516
Summary: | macvtap bridge guest can't receive unicast
---|---
Product: | [Fedora] Fedora
Component: | kernel
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
Version: | 20
Hardware: | Unspecified
OS: | Unspecified
Reporter: | David Carlson <thecubic>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | adam, airlied, ajax, aquini, arozansk, atomlin, berrange, bskeggs, chrisw, clalancette, dallan, davej, dwmw2, eparis, gansalmon, hdegoede, igeorgex, itamar, ivecera, jarodwilson, jforbes, jglisse, jogreene, jonathan, josef, j, jwboyer, kernel-maint, kmcmartin, laine, libvirt-maint, madhu.chinakonda, mael.lavault, m.a.young, mchehab, mjg59, mkletzan, nhorman, paul.f.fee, quintela, steved, thecubic, veillard, virt-maint, yangyudevel
Keywords: | Reopened
Fixed In Version: | kernel-3.14.7-100.fc19
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2014-06-13 22:49:56 UTC
Bug Blocks: | 1076597, 1204588

Description
David Carlson
2014-02-12 18:35:08 UTC

Someone on the libvirt-users list reported the "macvtap not working" problem, although they hadn't tried turning on promiscuous mode - I just replied to them recommending they try that to see if it's the same problem you're reporting.

In the meantime, I'm using macvtap on an F20 system with both the updates-testing repo and the fedora-virt-preview repo enabled, and it is working with no problems. You may want to try setting enabled=1 in /etc/yum.repos.d/fedora-updates-testing.repo, followed by a yum update and reboot, to see if a newer kernel solves the problem. If that doesn't help, get:

http://fedorapeople.org/groups/virt/virt-preview/fedora-virt-preview.repo

and put it in /etc/yum.repos.d, then again yum update and reboot. If that doesn't help, then at least we've determined that it's a difference (from my setup) in hardware or configuration, rather than a difference in the code that you're running.

Also, Bug 1040315 was just pointed out to me on IRC - if your physical device is using the e1000e driver, this may be the source of your problem. The bug described in Bug 1040315 was apparently introduced in kernel-3.12. If you are using the e1000e driver, and you're feeling adventurous, I've found one 3.11 kernel built for F20 that you might want to try to see if it fixes the problem:

http://koji.fedoraproject.org/koji/buildinfo?buildID=483008

Same problem observed on my F20 machine.
Kernel: 3.13.5-200.fc20.x86_64
Physical NIC driver: e1000e
Workaround:
* Stop VM
* On host: $ sudo ifconfig <nic> promisc
* Start VM
* DHCP in VM now works as expected.
How can I help here? Do you need any more info?

Moving to kernel as it's most probably the same issue as in Bug 1040315.

I'm not sure this is a duplicate of the bug you mentioned. That seems to be for a very specific piece of hardware, not everything covered by e1000e. John, any ideas on this one?

Can you give the output of lspci -nvv? If we are on e1000e hardware, a couple of patches are out for this from Intel. Sounds much like it. If not, these won't help.

This is in 3.14 rc5:
96dee02 e1000e: Fix SHRA register access for 82579

This is really new, on 3.15 rc6:
b3e5bf1 e1000e: Failure to write SHRA turns on PROMISC mode

I'd try the later kernel if you can.

Created attachment 902126
Output of "lspci -nn"
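
The host-side part of the workaround reported above can be sketched as a small script. This is a sketch under stated assumptions, not something posted in the thread: it uses `ip link` in place of the `ifconfig <nic> promisc` command quoted above, the interface name `em1` is a placeholder, and the helper names `has_promisc` and `enable_promisc` are hypothetical.

```shell
#!/bin/sh
# Sketch of the host-side workaround from the comments above.
# "em1" is a placeholder interface name; the helper names are hypothetical.

# has_promisc LINE: succeed if an "ip -o link show" line lists PROMISC in its flags.
has_promisc() {
    case "$1" in
        *"<"*PROMISC*">"*) return 0 ;;
        *) return 1 ;;
    esac
}

# enable_promisc NIC: turn promiscuous mode on unless it is already set (needs root).
enable_promisc() {
    if has_promisc "$(ip -o link show dev "$1")"; then
        echo "$1: already promiscuous"
    else
        ip link set dev "$1" promisc on
    fi
}

# Example (run as root with the guest stopped, then start the guest):
#   enable_promisc em1
```

The setting does not survive a reboot, so it would have to be reapplied until a fixed kernel is installed.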

This bug is still present on my system; the workaround of setting promiscuous mode on the host interface is still required for correct network access from the guest VM.

I'm currently running kernel 3.14.4-200.fc20.x86_64.

My machine has two NICs: Intel 82579LM and Broadcom BCM5721. The "lspci -nvv" output for the Intel NIC is:

    00:19.0 0200: 8086:1502 (rev 04)
        Subsystem: 1028:047e
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at f1600000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f1680000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 5040 [size=32]
        Capabilities: [c8] Power Management version 2
            Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
            Address: 00000000fee00338  Data: 0000
        Capabilities: [e0] PCI Advanced Features
            AFCap: TP+ FLR+
            AFCtrl: FLR-
            AFStatus: TP-
        Kernel driver in use: e1000e
        Kernel modules: e1000e

Complete lspci output is attached above.

This patch isn't helpful if you're using the Broadcom device. The 82579 Intel device is one of the ones in question, yes. I assume that's em1? It usually is, but first please let's confirm that (ethtool -i em1). That should show the e1000e device. If it is, read on.

I re-checked the patch and believe it's applicable to your Intel hardware. The fix you need is:
b3e5bf1 e1000e: Failure to write SHRA turns on PROMISC mode

This is really new, on 3.15 rc6 (should be in rawhide). Give that a try and let me know.

Yes, it's my Intel NIC that's having problems.

I'm running Fedora 20. Is it safe(ish) to take the rawhide kernel (currently 3.15.0-0.rc8.git1.2.fc21) and install it on F20 for the purpose of testing the fix? I would then revert to the F20 kernel after collecting results.

Good news: testing with kernel-3.15.0-0.rc8.git1.2.fc21.x86_64 shows the bug is fixed in that kernel. The guest VM can bring up its NIC using DHCP without having to set promiscuous mode on the host.

You made my day. Glad to hear that's the case. I'll close this then.
Glad I fixed something today. ;)

(In reply to John Greene from comment #11)
> This patch isn't helpful if you're using the Broadcom device. The 82579 Intel
> device is one of the ones in question, yes. I assume that's em1? It usually
> is, but first please let's confirm that (ethtool -i em1). That should show
> the e1000e device. If it is, read on.
>
> I re-checked the patch and believe it's applicable to your Intel hardware.
> The fix you need is:
> b3e5bf1 e1000e: Failure to write SHRA turns on PROMISC mode
>
> This is really new, on 3.15 rc6 (should be in rawhide). Give that a try and
> let me know.

Erm... that sha1sum doesn't exist in Linus' tree:

    [jwboyer@vader linux]$ git log b3e5bf1
    fatal: ambiguous argument 'b3e5bf1': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'
    [jwboyer@vader linux]$

But the summary matches one that is in 3.15-rc1:

    [jwboyer@vader linux]$ git log --pretty=oneline -1 96dee024ca4799d6d21588951240035c21ba1c67
    96dee024ca4799d6d21588951240035c21ba1c67 e1000e: Fix SHRA register access for 82579
    [jwboyer@vader linux]$ git describe --contains 96dee024ca4799d6d21588951240035c21ba1c67
    v3.15-rc1~113^2~204^2~3
    [jwboyer@vader linux]$

So while confusing, it seems the fix you highlighted is still in 3.15. Shouldn't it go back to stable?

(In reply to Josh Boyer from comment #15)
> But the summary matches one that is in 3.15-rc1:

Sorry, I should have said "...summary is similar to one..."

Still confused whether this is actually the upstream fix you're referring to, or if you have some other fix out there that isn't in Linus' tree yet.

(In reply to Josh Boyer from comment #16)
> Still confused if this is actually the upstream fix you're referring to, or if
> you have some other fix out there that isn't in Linus' tree yet.

Oh. Actually, I'm not confused now on the commit. The commit you pointed to is in linux-next:

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/net/ethernet/intel/e1000e?id=b3e5bf1ff32cbc58c56675498565020460c683cd

but it isn't in Linus' tree for 3.15.

So... is the commit I highlighted the one that fixes this issue? Not sure what in rawhide is causing things to function correctly for Paul, but it sure isn't the fix you pointed to.

Reopening until we get it sorted.

Yes, that's right: it is the fix.
It is in the net-next tree:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

    commit b3e5bf1ff32cbc58c56675498565020460c683cd
    Author: David Ertman <davidx.m.ertman>
    Date:   Tue May 6 03:50:17 2014 +0000

        e1000e: Failure to write SHRA turns on PROMISC mode

        Previously, the check to turn on promiscuous mode only took into account
        the total number of SHared Receive Address (SHRA) registers and if the
        request was for a register within that range. It is possible that the
        Management Engine might have locked a number of SHRA and not allowed a
        new address to be written to the requested register.

        Add a function to determine the number of unlocked SHRA registers. Then
        determine if the number of registers available is sufficient for our
        needs, if not then return -ENOMEM so that UNICAST PROMISC mode is
        activated.

        Since the method by which ME claims SHRA registers is non-deterministic,
        also add a return value to the function attempting to write an address
        to a SHRA, and return a -E1000_ERR_CONFIG if the write fails. The error
        will be passed up the function chain and allow the driver to also set
        UNICAST PROMISC when this happens.

        Cc: Vlad Yasevich <vyasevic>
        Signed-off-by: Dave Ertman <davidx.m.ertman>
        Tested-by: Aaron Brown <aaron.f.brown>
        Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher>

This is the fix from Intel. We work closely with them to take their stuff, and try to get these things pretty quickly.

Does that clear it up? Perhaps the version in Linus' tree will be different; in net-next it's listed as 3.15:
v3.15-rc6-1089-gb3e5bf1

But perhaps it may slide to the next merge window of Linus' tree.
Does that help?

It helps, with the exception of why Paul says the current rawhide kernel works fine. The current rawhide kernel doesn't have that patch, so...

Yeah, if so, a definite puzzler. I have no answer without a deeper dig.

Ah wait, I think I know.

Check this (which is in 3.14):
96dee02 e1000e: Fix SHRA register access for 82579

This patch came first and did work on some versions of the 82579, but we found it wasn't a total fix for all flavors of e1000e.
So we collaborated with Intel to produce the second patch; the discussion was in
https://bugzilla.redhat.com/show_bug.cgi?id=1040315

This later patch should encompass the RAR issues for the whole family. If not, I am clueless as to why.
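
The question running through the comments above, namely which release actually contains a given commit, can be checked mechanically with `git describe --contains`, as was done by hand in comment #15. The following is a minimal sketch, assuming a local clone of the tree in question; the helper names `strip_describe_suffix` and `first_tag_containing` are hypothetical, and the repository path is whatever the caller passes in.

```shell
#!/bin/sh
# Sketch: find a tag that contains a commit, as done by hand in the thread.
# The helper names are hypothetical; REPO is a path to a local clone.

# strip_describe_suffix DESC: reduce "v3.15-rc1~113^2~204^2~3" to "v3.15-rc1".
strip_describe_suffix() {
    printf '%s\n' "${1%%[~^]*}"
}

# first_tag_containing REPO COMMIT: print a tag that contains COMMIT, or fail
# if the commit is unknown to REPO or is not contained in any tag.
first_tag_containing() {
    desc=$(git -C "$1" describe --contains "$2" 2>/dev/null) || return 1
    strip_describe_suffix "$desc"
}

# Demo on the describe output quoted in the thread; prints "v3.15-rc1".
strip_describe_suffix 'v3.15-rc1~113^2~204^2~3'
```

Run against a clone of Linus' tree, `first_tag_containing` would fail for a commit that exists only in net-next or linux-next, which is the situation the thread describes for b3e5bf1 at the time.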

(In reply to John Greene from comment #21)
> Ah wait, I think I know.
>
> Check this (which is in 3.14):
> 96dee02 e1000e: Fix SHRA register access for 82579

That's the patch I pointed to in comment #15. It's not in 3.14 though, it's only in 3.15-rc1. Whatever git tree you are using is really, really not Linus' tree.

> This patch came first and did work on some versions of the 82579, but we
> found it wasn't a total fix for all flavors of e1000e.
> So we collaborated with Intel to produce the second patch; the discussion was in
> https://bugzilla.redhat.com/show_bug.cgi?id=1040315
>
> This later patch should encompass the RAR issues for the whole family. If
> not, I am clueless as to why.

OK. So basically, both of those patches should probably get backported to stable. 96dee02 needs to go back to 3.14.y and b3e5bf1 should probably go to 3.14.y and 3.15.y.

Yes, agreed. If practical, keep them together on a version, but I understand that may not be possible.
But hopefully this thread will help those that come behind understand the history. Mostly me coming back to it too!

Will the fix appear in the F20 kernel, either via a bump to 3.15 or by backporting the fix to 3.14?

I'm willing to test if necessary.

I backported both of them to F20. They will be in the next kernel build.

kernel-3.14.7-200.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.14.7-200.fc20

kernel-3.14.7-100.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.14.7-100.fc19

Confirmed that kernel-3.14.7-200.fc20 fixes this bug. Thanks.

Package kernel-3.14.7-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.14.7-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-7313/kernel-3.14.7-200.fc20
then log in and leave karma (feedback).

kernel-3.14.7-200.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.

kernel-3.14.7-100.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

I have the same issue in Fedora 21 with kernel 3.17.6-300.fc21.x86_64.
My ethernet card: Qualcomm Atheros AR8161 Gigabit Ethernet (rev 08)

Current CentOS 7 with kernel 3.10.0-123.13.2.el7.x86_64 seems to have this issue too.
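
For Fedora hosts with e1000e hardware, a quick way to tell whether a kernel is at or after the first build carrying the backports (3.14.7, per the update notices above) is to compare version strings. This is only a rough sketch: it assumes GNU `sort -V`, the helper names `version_ge` and `at_or_after_fix_build` are hypothetical, and a version number alone does not prove the patches are present. It says nothing about distributions with their own backports or about the other NICs mentioned in the last two comments.

```shell
#!/bin/sh
# Sketch: is a kernel release at or after the first fixed Fedora build (3.14.7)?
# Relies on GNU "sort -V"; only the upstream version part is compared.
# Helper names are hypothetical.

# version_ge A B: succeed if version A sorts at or after version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# at_or_after_fix_build RELEASE: strip the "-200.fc20.x86_64" style suffix,
# then compare the remaining upstream version against 3.14.7.
at_or_after_fix_build() {
    version_ge "${1%%-*}" 3.14.7
}

# Demo on releases mentioned in the thread:
for rel in 3.13.5-200.fc20.x86_64 3.14.4-200.fc20.x86_64 3.14.7-200.fc20.x86_64; do
    if at_or_after_fix_build "$rel"; then
        echo "$rel: at or after 3.14.7"
    else
        echo "$rel: older than 3.14.7"
    fi
done
```

On a live system the check would be `at_or_after_fix_build "$(uname -r)"`.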