Bug 1067126

Summary:

Virt-manager doesn't configure bridge for VM

Product:

Red Hat Enterprise Linux 7

Reporter:

Martin <mholec>

Component:

kernel

Assignee:

John Greene <jogreene>

Status:

CLOSED DUPLICATE

QA Contact:

zenghui.shi <zshi>

Severity:

urgent

Docs Contact:

Priority:

medium

Version:

7.0

CC:

bruce.w.allan, codong, davidx.m.ertman, dyuan, fkluknav, gscrivan, hyao, jasowang, jaster, jkoten, jogreene, john.ronciak, juzhang, kernel-qe, knoel, kzhang, lilu, michele, mprivozn, mst, mzhan, ohudlick, peterm, qiguo, toracat, tpelka, tzheng, vyasevic, zshi

Target Milestone:

Keywords:

Regression, Reopened, TestBlocker

Target Release:

7.0

Flags:

jogreene: needinfo-

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-01-23 13:34:43 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1083244, 1113511

Attachments:

Description	Flags
screenshot	none
journalctl log	none
Test patch	none
Test patch 2 - with return value added	none
Test Patch 3	none
Test Patch 4	none

Description Martin 2014-02-19 18:44:25 UTC

Description of problem:
Virt-manager doesn't configure bridge for VM.

Version-Release number of selected component (if applicable):
virt-manager-0.10.0-16.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Show virtual hardware details - NIC ...
2. Set "Source mode" to bridge. (Source device should already be ethernet).
3. Hit "Apply" and try to boot from PXE.

Actual results:
Can't boot from PXE. Bridge interface is in UNKNOWN state.

Expected results:
Working bridge networking.

Additional info:
brctl show doesn't list any bridge.

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f0:de:f1:6b:1d:b8 brd ff:ff:ff:ff:ff:ff
    inet 10.34.130.132/23 brd 10.34.131.255 scope global dynamic enp0s25
       valid_lft 85472sec preferred_lft 85472sec
    inet6 fe80::f2de:f1ff:fe6b:1db8/64 scope link 
       valid_lft forever preferred_lft forever
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:24:d7:cd:a6:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.200.138.23/22 brd 10.200.139.255 scope global dynamic wlp3s0
       valid_lft 5535sec preferred_lft 5535sec
    inet6 fe80::224:d7ff:fecd:a6d0/64 scope link 
       valid_lft forever preferred_lft forever
9: macvtap0@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 52:54:00:fa:cb:07 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fefa:cb07/64 scope link 
       valid_lft forever preferred_lft forever

Comment 1 Martin 2014-02-19 18:51:17 UTC

Regression - This is working on RHEL 6.x

TestBlocker - This breaks QE workflow when installing VMs from PXE.

Comment 3 Martin 2014-02-19 18:54:44 UTC

journalctl:

Feb 19 19:53:54 dhcp130-132.brq.redhat.com NetworkManager[1138]: <info> (macvtap0): link connected
Feb 19 19:53:54 dhcp130-132.brq.redhat.com NetworkManager[1138]: <info> (macvtap0): carrier is ON
Feb 19 19:53:54 dhcp130-132.brq.redhat.com NetworkManager[1138]: <info> (macvtap0): new Macvlan device (driver: 'unknown' ifindex: 11)
Feb 19 19:53:54 dhcp130-132.brq.redhat.com NetworkManager[1138]: <info> (macvtap0): exported as /org/freedesktop/NetworkManager/Devices/10
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:54 dhcp130-132.brq.redhat.com systemd[1]: Starting Virtual Machine qemu-RHEL7-Beta.
Feb 19 19:53:54 dhcp130-132.brq.redhat.com systemd-machined[5512]: New machine qemu-RHEL7-Beta.
Feb 19 19:53:54 dhcp130-132.brq.redhat.com systemd[1]: Started Virtual Machine qemu-RHEL7-Beta.
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:54 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: User record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com libvirtd[1688]: Group record for user '107' was not found: No such file or directory
Feb 19 19:53:55 dhcp130-132.brq.redhat.com kvm[9399]: 1 guest now active
Feb 19 19:53:56 dhcp130-132.brq.redhat.com avahi-daemon[952]: Registering new address record for fe80::5054:ff:fefa:cb07 on macvtap0.*.

Comment 5 tingting zheng 2014-02-25 02:45:00 UTC

Tested with:
virt-manager-0.10.0-16.el7.noarch

1.Create a new guest from virt-manager.
2.In "Step 1" choose "Network Boot(PXE)".
3.In "Step 2" choose "OS type" and "Version".
4 In "Step 5" check the "Customize configuration before install" checkbox to customize the network settings later. Click "Finish"
5.Click "Virtual network interface" (icon of two blue monitors) => "Source device" set to "Host device em1:macvtap" (or eth0 instead of em1, or whatever you normally use to access the network with PXE server). 
6.Set "Source mode" as "Bridge".
7.Apply changes, begin installation, successful PXE boot.

It seems on my machine macvtap0@em1 interface is UNKNOW state,but I can boot guest from pxe successfully.

On the host:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 52:54:00:e7:8d:e4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 500
    link/ether 52:54:00:e7:8d:e4 brd ff:ff:ff:ff:ff:ff
8: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:25:64:a6:fe:df brd ff:ff:ff:ff:ff:ff
    inet 10.66.5.145/22 brd 10.66.7.255 scope global dynamic em1
       valid_lft 50633sec preferred_lft 50633sec
    inet6 fe80::225:64ff:fea6:fedf/64 scope link 
       valid_lft forever preferred_lft forever
32: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN qlen 500
    link/ether fe:54:00:16:3e:a0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe16:3ea0/64 scope link 
       valid_lft forever preferred_lft forever
33: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN qlen 500
    link/ether fe:54:00:08:38:d6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe08:38d6/64 scope link 
       valid_lft forever preferred_lft forever
34: macvtap0@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 52:54:00:ee:0e:6d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feee:e6d/64 scope link 
       valid_lft forever preferred_lft forever

Comment 6 tingting zheng 2014-02-25 02:52:23 UTC

Refer to comment 5,on my enviroment,the macvtap network with bridge mode did work.
Is there anything I misunderstand or would you pls try again with a new environment?thanks.

Comment 7 Giuseppe Scrivano 2014-02-25 17:56:29 UTC

I've followed the step in comment 5 and I've noticed some UDP bad checksums (trough tcpdump -vvv):

192.168.125.96.1024 > localhost.tftp: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0 
18:49:55.555196 IP (tos 0x0, ttl 64, id 26210, offset 0, flags [none], proto UDP (17), length 55)
localhost.34118 > 192.168.125.96.1024: [bad udp cksum 0x7be7 -> 0x9e0a!] UDP, length 27
18:49:55.591686 IP (tos 0x0, ttl 64, id 26211, offset 0, flags [none], proto UDP (17), length 55)
localhost.59917 > 192.168.125.96.1024: [bad udp cksum 0x7be7 -> 0x3943!] UDP, length 27
18:49:55.887819 IP (tos 0x0, ttl 64, id 26212, offset 0, flags [none], proto UDP (17), length 55)

Martin, can you please check if it is the same problem you are having?

The following command seems to fix the problem here:
 
$ sudo ethtool --offload  virbr0  tx off

After that, I can use PXE without problems.

Comment 8 Giuseppe Scrivano 2014-02-26 14:07:18 UTC

any update on this?

Comment 9 Giuseppe Scrivano 2014-02-27 15:37:27 UTC

I am going to close this as I couldn't reproduce it here.

Comment 10 Martin 2014-03-03 16:05:06 UTC

I have installed clean RHEL-7.0-20140226.0.

I went through steps in comment #5. I still can't boot from PXE and I don't see any virbr or vnet interfaces, have tried reproducing this on clean installed RHEL7?

Interfaces on the host after starting VM:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f0:de:f1:6b:1d:b8 brd ff:ff:ff:ff:ff:ff
    inet 10.34.131.127/23 brd 10.34.131.255 scope global dynamic enp0s25
       valid_lft 77638sec preferred_lft 77638sec
    inet6 2620:52:0:2282:f2de:f1ff:fe6b:1db8/64 scope global dynamic 
       valid_lft 2591998sec preferred_lft 604798sec
    inet6 fe80::f2de:f1ff:fe6b:1db8/64 scope link 
       valid_lft forever preferred_lft forever
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:24:d7:cd:a6:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.200.136.206/22 brd 10.200.139.255 scope global dynamic wlp3s0
       valid_lft 5145sec preferred_lft 5145sec
    inet6 fe80::224:d7ff:fecd:a6d0/64 scope link 
       valid_lft forever preferred_lft forever
7: macvtap0@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 52:54:00:f5:8d:7c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fef5:8d7c/64 scope link 
       valid_lft forever preferred_lft forever

Comment 11 Martin 2014-03-03 16:07:13 UTC

Created attachment 870010 [details]
screenshot

My network configuration in virt-manager.

Comment 13 Giuseppe Scrivano 2014-03-03 16:16:28 UTC

have you tried tcpdump (as in comment 7) to check if something is going wrong on the network?

Comment 14 Martin 2014-03-03 16:20:29 UTC

When I run tcpdump -vvv something interesting, I can suddenly boot from PXE. But it's only temporary, when I interrupt tcpdump, I can't run PXE again.

I can't run: "sudo ethtool --offload  virbr0  tx off" Because I don't have this interface, see above.

Comment 15 Giuseppe Scrivano 2014-03-03 16:31:44 UTC

does it make a difference if you try that command replacing virbr0 with the interface you use to connect to the network (enp0s25/wlp3s0) ?

Comment 16 Martin 2014-03-03 17:11:19 UTC

No change.

# ethtool --offload  enp0s25 tx off
Actual changes:
tx-checksumming: off
	tx-checksum-ip-generic: off
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]

Comment 17 Giuseppe Scrivano 2014-03-04 15:10:30 UTC

Comment 14 kind of confirm that it is a problem with your network.  Can you also try "rx off" to ethtool?

Comment 18 Martin 2014-03-04 16:02:43 UTC

Created attachment 870491 [details]
journalctl log

I've run "ethtool --offload  enp0s25 rx off" as suggested by Giuseppe. After that, I lost connection and was unable to get IP using Network Manager applet. I had to run dhclient manually from console.

Comment 19 Michal Privoznik 2014-03-04 17:53:05 UTC

It seems like this issue has been reported earlier:

https://bugzilla.redhat.com/show_bug.cgi?id=844006

However, this seems like a kernel issue to me. I mean, whenever a macvtap device is created, the lower interface needs to be put into promisc mode (it's gonna be receiving packets with different MAC addresses and it cannot drop them). Although, this kind of promisc mode is not exposed to userspace as this e-mail suggests:

http://www.spinics.net/lists/netdev/msg174700.html

Comment 20 Vlad Yasevich 2014-03-04 18:42:00 UTC

Can you please provide the following information:

1) Output of 'ethtool -i enp0s25'
2) Output of 'bridge fdb show dev enp0s25'
3) Output of 'ip -d link show dev macvtap0'

Thanks
-vlad

Comment 21 Vlad Yasevich 2014-03-04 18:54:05 UTC

Also, please provide output of lspci.

At this point, this sounds like it might be a duplicate of Bug 1040315, but
we need lspci data to verify.

Thanks
-vlad

Comment 22 Martin 2014-03-05 10:05:48 UTC

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM67 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
05:00.0 System peripheral: Ricoh Co Ltd MMC/SD Host Controller (rev 07)
0d:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)

Comment 23 Martin 2014-03-05 10:10:18 UTC

# ethtool -i enp0s25
driver: e1000e
version: 2.3.2-k
firmware-version: 0.13-3
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# bridge fdb show dev enp0s25
52:54:00:f5:8d:7c self permanent
01:00:5e:00:00:01 self permanent
33:33:00:00:00:01 self permanent
33:33:ff:6b:1d:b8 self permanent
01:00:5e:00:00:fb self permanent
33:33:ff:f5:8d:7c self permanent

# ip -d link show dev macvtap0
5: macvtap0@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 500
    link/ether 52:54:00:f5:8d:7c brd ff:ff:ff:ff:ff:ff promiscuity 0 
    macvtap  mode vepa

Comment 24 Vlad Yasevich 2014-03-05 14:20:42 UTC

(In reply to Martin Holec from comment #22)

> 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network
> Connection (rev 04)

Yes, you have a nic that was recently broken in rhel7 as well as upstream.
This is already known issue.  See Bug 1040315.

Feel free to close at duplicate.
-vlad

Comment 25 Giuseppe Scrivano 2014-03-05 15:15:32 UTC


*** This bug has been marked as a duplicate of bug 1040315 ***

Comment 26 Jaroslav Aster 2014-04-07 14:14:24 UTC

I'm reopening this bug because it seems to be still broken. I tested it on my laptop Lenovo T440s with this ethernet card:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM (rev 04)

When I tried to boot virtual machine from pxe, it did not work, after I run tcpdump on host machine on physical ethernet interface, it started to work, but only if promiscuous mode was enabled.

I tried server kernels: 110, 116, 117, 119
and virt-manager: virt-manager-0.10.0-20.el7.

Comment 27 Vlad Yasevich 2014-04-07 14:47:02 UTC

Could you try this test:

On you host create and bring up macvlan device:
# ip l a link enp0s25 type macvlan
# ip l s macvlan0 up
# dhclient -1 macvlan0

If this doesn't work, then there are still driver issues.
If this works, we'll have to debug.

-vlad

Comment 32 Vlad Yasevich 2014-04-07 20:16:09 UTC

(In reply to Jaroslav Aster from comment #26)
> I'm reopening this bug because it seems to be still broken. I tested it on
> my laptop Lenovo T440s with this ethernet card:
> 
> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM
> (rev 04)
> 
> When I tried to boot virtual machine from pxe, it did not work, after I run
> tcpdump on host machine on physical ethernet interface, it started to work,
> but only if promiscuous mode was enabled.
> 
> I tried server kernels: 110, 116, 117, 119
> and virt-manager: virt-manager-0.10.0-20.el7.

Could you provide the PCI IDs for this board also (lspci -n).  We'll need to isolate the version that might be impacted.

I just retested with the board from the original report and the problem is fixed.
I seems likely that other board versions were also effected.

Thanks
-vlad

Comment 33 Jaroslav Aster 2014-04-08 07:25:42 UTC

[root@newton ~]# lspci -n
00:00.0 0600: 8086:0a04 (rev 0b)
00:02.0 0300: 8086:0a16 (rev 0b)
00:03.0 0403: 8086:0a0c (rev 0b)
00:14.0 0c03: 8086:9c31 (rev 04)
00:16.0 0780: 8086:9c3a (rev 04)
00:16.3 0700: 8086:9c3d (rev 04)
00:19.0 0200: 8086:155a (rev 04)
00:1b.0 0403: 8086:9c20 (rev 04)
00:1c.0 0604: 8086:9c1a (rev e4)
00:1c.1 0604: 8086:9c14 (rev e4)
00:1d.0 0c03: 8086:9c26 (rev 04)
00:1f.0 0601: 8086:9c43 (rev 04)
00:1f.2 0106: 8086:9c03 (rev 04)
00:1f.3 0c05: 8086:9c22 (rev 04)
02:00.0 ff00: 10ec:5227 (rev 01)
03:00.0 0280: 8086:08b2 (rev 83)

We tested it on ethernet card from original report too and I agree, it seems to be fixed for this type of card. Kernel and kernel driver was the same, version 3.10.0-119.el7.x86_64.

I agree, it seems to be driver bug.

Comment 35 John Greene 2014-04-08 14:41:53 UTC

Right now, I'm running the e1000e code to determine which variants of e1000e may still be at risk to an incomplete change that Bug 1040315 apparently did fix with 82579.  That patch was confined to the 82579.  

Working on the assumption right now this is something along the same lines as that patch, just need propagating more widely in the other family members with the the e1000e driver.  That may not be the case, but it's a good start.

I need QA guys to please quickly get an assessment of how may different flavors of e1000e are affected by the problem?  Let's try the test on as many variant in beaker as possible before 1PM call with vendor please. I'll take a look at code during that time.


It seems at this point from my read that the following are at issue:


82579 seems ok.


So far I see issues with these: 
From Bug 1040315:
Intel Corporation Ethernet Connection I217-LM (rev 05) (PCI ID  8086:153a)

From this Bug:
Intel Corporation Ethernet Connection I218-LM (rev 04) 


Please update this with anything I've missed.

Comment 36 Vlad Yasevich 2014-04-08 15:05:18 UTC

Looking at the driver code, the special handling for Manageability locked
addresses happens only for PCH2 and PCH-LPT variants.  The originally reported
82579 is a PCH2 chip.  The I217 and I218 are PCP-LPT variants.

My guess is that Intel Manageability is locking some registers in the PCH-LPT
variant as well and we fail to program the registers for macvlan.

Something of note is that for I217 and I218 cards, the number of programmable
receive registers is defined as:
#define E1000_PCH_LPT_RAR_ENTRIES       12      /* RAR[0], SHRA[0-10] */

where as the 82579 definition has been changed to:
#define E1000_PCH2_RAR_ENTRIES  5       /* RAR[0], SHRA[0-3] */

and accounts for registers reserved for manageability.

-vlad

Comment 37 John Greene 2014-04-08 16:11:24 UTC

(In reply to Vlad Yasevich from comment #36)
> Looking at the driver code, the special handling for Manageability locked
> addresses happens only for PCH2 and PCH-LPT variants.  The originally
> reported
> 82579 is a PCH2 chip.  The I217 and I218 are PCP-LPT variants.
> 
> My guess is that Intel Manageability is locking some registers in the PCH-LPT
> variant as well and we fail to program the registers for macvlan.
> 
> Something of note is that for I217 and I218 cards, the number of programmable
> receive registers is defined as:
> #define E1000_PCH_LPT_RAR_ENTRIES       12      /* RAR[0], SHRA[0-10] */
> 
> where as the 82579 definition has been changed to:
> #define E1000_PCH2_RAR_ENTRIES  5       /* RAR[0], SHRA[0-3] */
> 
> and accounts for registers reserved for manageability.
> 
> -vlad

Vlad,
I checked the same code and think you are right.  Furthermore, I have a theory I'm working now based on debug of previous issue   I217 and I218 are PCP-LPT variants have 12 registers true, 1 unicast register for this device MAC and 11 shared.  The piece of code in 
e1000_rar_set_pch_lpt()
{
...
        if (index < hw->mac.rar_entry_count) {
                wlock_mac = er32(FWSM) & E1000_FWSM_WLOCK_MAC_MASK;
                wlock_mac >>= E1000_FWSM_WLOCK_MAC_SHIFT;

                /* Check if all SHRAR registers are locked */
                if (wlock_mac == 1)
                        goto out;

                if ((wlock_mac == 0) || (index <= wlock_mac)) {
                                         ^^^^^^^^^^^^^^^^^^^^
                        s32 ret_val;
...
/* go program a RAR with a MAC, success for the VM dhcp */

else
/* error can't do it msg, no dhcp */


Debugging the previous issues shows that when macvtap loads, the requests for a RAR register is always made at the end of the list, i.e. for this device it'll be 11 (MAX - 1).
Highlighted section of code above show that wlock_mac value being not 1 or non-zero assumes that firmware owned shared RAR regs will be allocated top down also. So if firmware is using this value, it will fail.  
Upper layers will retry the same location about 10 times, then quit. It should changes to another location.
So either we fix the driver or the macvtap requestor.

Again, this is my working theory, not proven yet.

Comment 38 Vlad Yasevich 2014-04-08 16:29:01 UTC

John

(In reply to John Greene from comment #37)
> (In reply to Vlad Yasevich from comment #36)
> > Looking at the driver code, the special handling for Manageability locked
> > addresses happens only for PCH2 and PCH-LPT variants.  The originally
> > reported
> > 82579 is a PCH2 chip.  The I217 and I218 are PCP-LPT variants.
> > 
> > My guess is that Intel Manageability is locking some registers in the PCH-LPT
> > variant as well and we fail to program the registers for macvlan.
> > 
> > Something of note is that for I217 and I218 cards, the number of programmable
> > receive registers is defined as:
> > #define E1000_PCH_LPT_RAR_ENTRIES       12      /* RAR[0], SHRA[0-10] */
> > 
> > where as the 82579 definition has been changed to:
> > #define E1000_PCH2_RAR_ENTRIES  5       /* RAR[0], SHRA[0-3] */
> > 
> > and accounts for registers reserved for manageability.
> > 
> > -vlad
> 
> Vlad,
> I checked the same code and think you are right.  Furthermore, I have a
> theory I'm working now based on debug of previous issue   I217 and I218 are
> PCP-LPT variants have 12 registers true, 1 unicast register for this device
> MAC and 11 shared.  The piece of code in 
> e1000_rar_set_pch_lpt()
> {
> ...
>         if (index < hw->mac.rar_entry_count) {
>                 wlock_mac = er32(FWSM) & E1000_FWSM_WLOCK_MAC_MASK;
>                 wlock_mac >>= E1000_FWSM_WLOCK_MAC_SHIFT;
> 
>                 /* Check if all SHRAR registers are locked */
>                 if (wlock_mac == 1)
>                         goto out;
> 
>                 if ((wlock_mac == 0) || (index <= wlock_mac)) {
>                                          ^^^^^^^^^^^^^^^^^^^^
>                         s32 ret_val;
> ...
> /* go program a RAR with a MAC, success for the VM dhcp */
> 
> else
> /* error can't do it msg, no dhcp */
> 
> 
> Debugging the previous issues shows that when macvtap loads, the requests
> for a RAR register is always made at the end of the list, i.e. for this
> device it'll be 11 (MAX - 1).
> Highlighted section of code above show that wlock_mac value being not 1 or
> non-zero assumes that firmware owned shared RAR regs will be allocated top
> down also. So if firmware is using this value, it will fail.  
> Upper layers will retry the same location about 10 times, then quit. It
> should changes to another location.

I don't see upper layers retrying the same location.  The upper layer will jump
to the next address and try to write it to the next location.

Here is an interesting test.  Create 11 macvlan devices.  Then bring them up
one by one and try to get the address.  The one where it starts working will
let you know which offset finally started working.

-vlad

> So either we fix the driver or the macvtap requestor.
> 
> Again, this is my working theory, not proven yet.

Comment 39 Jaroslav Aster 2014-04-09 07:59:47 UTC

(In reply to Vlad Yasevich from comment #38)
> Here is an interesting test.  Create 11 macvlan devices.  Then bring them up
> one by one and try to get the address.  The one where it starts working will
> let you know which offset finally started working.

Hi,

I run this test:

#!/bin/sh

for i in $(seq 0 10); do
        ip l add link enp0s25 type macvlan
done

for i in $(seq 0 10); do
        ip l set dev macvlan$i up
done

for i in $(seq 0 10); do
        iface="macvlan${i}"
        pid_file="/var/run/dhclient-$iface.pid"

        dhclient -1 "$iface" -pf "$pid_file"
        ip a s dev "$iface" | grep -q 'inet ' && echo "${iface} succeed"
        kill $(cat "$pid_file") 2>/dev/null
done

for i in $(seq 0 10); do
        ip l del macvlan$i
done

and I got this results.

macvlan7 succeed
macvlan8 succeed
macvlan9 succeed
macvlan10 succeed

So, first 7 devices failed, others succeed.

Comment 40 John Greene 2014-04-09 12:49:04 UTC

(In reply to Jaroslav Aster from comment #39)
> (In reply to Vlad Yasevich from comment #38)
> > Here is an interesting test.  Create 11 macvlan devices.  Then bring them up
> > one by one and try to get the address.  The one where it starts working will
> > let you know which offset finally started working.
> 
> Hi,
> 
> I run this test:
> 
> #!/bin/sh
> 
> for i in $(seq 0 10); do
>         ip l add link enp0s25 type macvlan
> done
> 
> for i in $(seq 0 10); do
>         ip l set dev macvlan$i up
> done
> 
> for i in $(seq 0 10); do
>         iface="macvlan${i}"
>         pid_file="/var/run/dhclient-$iface.pid"
> 
>         dhclient -1 "$iface" -pf "$pid_file"
>         ip a s dev "$iface" | grep -q 'inet ' && echo "${iface} succeed"
>         kill $(cat "$pid_file") 2>/dev/null
> done
> 
> for i in $(seq 0 10); do
>         ip l del macvlan$i
> done
> 
> and I got this results.
> 
> macvlan7 succeed
> macvlan8 succeed
> macvlan9 succeed
> macvlan10 succeed
> 
> So, first 7 devices failed, others succeed.

Excellent information. We can confirm the bug is a RAR issue with this and one more small piece.  Run this cmd, the repeat the test please.

Clean start, then 
sudo echo -n 'module e1000e +p' > /sys/kernel/debug/dynamic_debug/control 

run the above script.
Then check the dmesg for this output.  

e_dbg("Failed to write receive address at index %d\n", index);

Given the above, I'd expect to see these for the six or seven failures.  This will either confirm the issue or point us elsewhere.

sudo echo -n 'module e1000e -p' > /sys/kernel/debug/dynamic_debug/control 
will turn off the e1000e debug prints when you are done.

Comment 41 Jaroslav Aster 2014-04-09 13:30:33 UTC

(In reply to John Greene from comment #40)

> Excellent information. We can confirm the bug is a RAR issue with this and
> one more small piece.  Run this cmd, the repeat the test please.
> 
> Clean start, then 
> sudo echo -n 'module e1000e +p' > /sys/kernel/debug/dynamic_debug/control 
> 
> run the above script.
> Then check the dmesg for this output.  
> 
> e_dbg("Failed to write receive address at index %d\n", index);
> 
> Given the above, I'd expect to see these for the six or seven failures. 
> This will either confirm the issue or point us elsewhere.
> 
> sudo echo -n 'module e1000e -p' > /sys/kernel/debug/dynamic_debug/control 
> will turn off the e1000e debug prints when you are done.

Hi,

I run test again and I turned on debug informations. My script had the same results and there were 931 lines like line bellow in dmesg.

[197019.349809] e1000e 0000:00:19.0 enp0s25: Failed to write receive address at index 11

Index number was not the same in each line and it had range from 5 to 11.

 # dmesg | grep 'Failed to write' | cut -d ' ' -f 12 | sort | uniq -c
    157 10
    169 11
     97 5
    109 6
    121 7
    133 8
    145 9

Comment 42 Vlad Yasevich 2014-04-09 13:39:01 UTC

After looking at the data sheet for I218 device, I am really surprised that any
of the devices above worked.  It appears that for I217 and I218, the Intel Management layer can actually reserve all of the SHRAR registers and that
would completely preclude macvlans from working.

There really needs to be something in the driver that should dynamically limit
the number of available registers based on what's reserved by manageability.
Then, if the number of addresses desired in the unicast filter is larger then
available limit, promiscuous mode needs to be enabled.

This would also allow us to start programming the registers at the proper index.

-vlad

Comment 43 John Greene 2014-04-09 14:36:47 UTC

(In reply to Vlad Yasevich from comment #42)
> After looking at the data sheet for I218 device, I am really surprised that
> any
> of the devices above worked.  It appears that for I217 and I218, the Intel
> Management layer can actually reserve all of the SHRAR registers and that
> would completely preclude macvlans from working.
> 
> There really needs to be something in the driver that should dynamically
> limit
> the number of available registers based on what's reserved by manageability.
> Then, if the number of addresses desired in the unicast filter is larger then
> available limit, promiscuous mode needs to be enabled.
> 
> This would also allow us to start programming the registers at the proper
> index.
> 
> -vlad

Two problems I see with your approach (although I agree it's root cause).  
The driver doesn't "see" the allocation by manageability, it doesn't use the driver code best I can can tell.  Access to those registers is just refused by the FWSM check of them being locked.  Without a single point allocation, we got a problem.

Second, is the opposite side issue: if we have no RAR and just open promisc mode to allow it all it works.   But when stuff shuts down that may free RARs, we can't really ever leave promisc mode without possibly killing net access to something.  At least not without implementing a software RAR table to supplement hardware; that seems a slippery slope.

Lemme check the latest upstream Intel code  I got a couple days back for any changes to this.  Doubtful, but worth a look to see.

Does this system have manageability support?  dmidecode should tell you.  Haven't tried this but if it can be disabled in the BIOS or elsewhere somehow, wonder if this goes away.

Comment 44 Vlad Yasevich 2014-04-09 15:13:52 UTC

(In reply to John Greene from comment #43)
> (In reply to Vlad Yasevich from comment #42)
> > After looking at the data sheet for I218 device, I am really surprised that
> > any
> > of the devices above worked.  It appears that for I217 and I218, the Intel
> > Management layer can actually reserve all of the SHRAR registers and that
> > would completely preclude macvlans from working.
> > 
> > There really needs to be something in the driver that should dynamically
> > limit
> > the number of available registers based on what's reserved by manageability.
> > Then, if the number of addresses desired in the unicast filter is larger then
> > available limit, promiscuous mode needs to be enabled.
> > 
> > This would also allow us to start programming the registers at the proper
> > index.
> > 
> > -vlad
> 
> Two problems I see with your approach (although I agree it's root cause).  
> The driver doesn't "see" the allocation by manageability, it doesn't use the
> driver code best I can can tell.  Access to those registers is just refused
> by the FWSM check of them being locked.  Without a single point allocation,
> we got a problem.

You are right in that there is no single access point between kernel and IME; however, from just the kernel side, there is one and that the rar_set methods.
We already have custom methods for manageability enabled cards. Those methods
can simply the FWSM register to see which index they can start programming at
and if they have more data to write then available register, they enter
promisc mode.

> 
> Second, is the opposite side issue: if we have no RAR and just open promisc
> mode to allow it all it works.   But when stuff shuts down that may free
> RARs, we can't really ever leave promisc mode without possibly killing net
> access to something.  At least not without implementing a software RAR table
> to supplement hardware; that seems a slippery slope.

We already keep a software version as a device unicast filter list and already support turning on promisc when the software list exceeds HW table.

This also brings up another question.  Can manageability layer overwirite SHRAR
programming?  If yes, anything we do at the driver is really pointless.
If no, then we should OK.  If manageability layer releases resources, we'll
pick them up next time RAR table needs to change due to software list changes.

> 
> Lemme check the latest upstream Intel code  I got a couple days back for any
> changes to this.  Doubtful, but worth a look to see.
> 
> Does this system have manageability support?  dmidecode should tell you. 
> Haven't tried this but if it can be disabled in the BIOS or elsewhere
> somehow, wonder if this goes away.

That's a question for Jaroslav.  I currently don't have a system with this HW.

-vlad

Comment 45 John Greene 2014-04-09 18:00:25 UTC

I checked latest intel out of tree driver, we have the same code here as e1000e 3.0.4.  No joy there.

No upstream patches (linus/net-next/stable) of interest either, so either we craft a fix, or work with Intel.

Comment 46 Jaroslav Aster 2014-04-10 07:16:48 UTC

(In reply to Vlad Yasevich from comment #44)
> > 
> > Lemme check the latest upstream Intel code  I got a couple days back for any
> > changes to this.  Doubtful, but worth a look to see.
> > 
> > Does this system have manageability support?  dmidecode should tell you. 
> > Haven't tried this but if it can be disabled in the BIOS or elsewhere
> > somehow, wonder if this goes away.
> 
> That's a question for Jaroslav.  I currently don't have a system with this
> HW.

Hi,

I think not, but I don't know what is manageability support exactly. I run this command, and I got nothing.

# dmidecode |grep -i manage

If you want me to do something more, please give me a note.

Comment 47 John Greene 2014-04-10 14:15:52 UTC

(In reply to Jaroslav Aster from comment #46)
> (In reply to Vlad Yasevich from comment #44)
> > > 
> > > Lemme check the latest upstream Intel code  I got a couple days back for any
> > > changes to this.  Doubtful, but worth a look to see.
> > > 
> > > Does this system have manageability support?  dmidecode should tell you. 
> > > Haven't tried this but if it can be disabled in the BIOS or elsewhere
> > > somehow, wonder if this goes away.
> > 
> > That's a question for Jaroslav.  I currently don't have a system with this
> > HW.
> 
> Hi,
> 
> I think not, but I don't know what is manageability support exactly. I run
> this command, and I got nothing.
> 
> # dmidecode |grep -i manage
> 
> If you want me to do something more, please give me a note.

dmidecode will show the presence of management as a BIOS extension: the system here looks like this when dumped with dmidecode.

Handle 0x0054, DMI type 221, 26 bytes
OEM-specific Type
        Header and Data:
                DD 1A 54 00 03 01 00 00 07 01 00 00 02 00 09 00
                00 13 00 03 04 09 00 00 CD 04
        Strings:
                Reference Code - ME 9.0
                MEBx version
                ME Firmware Version
                5MB SKU  

Does the failing system has this too?

As an aside: my little knowledge here is from an Intel paper: 
An Introduction to Intel® Active Management Technology
(Intel® AMT).

Comment 48 Jaroslav Aster 2014-04-10 15:30:17 UTC

(In reply to John Greene from comment #47)
> (In reply to Jaroslav Aster from comment #46)
> > (In reply to Vlad Yasevich from comment #44)
> > > > 
> > > > Lemme check the latest upstream Intel code  I got a couple days back for any
> > > > changes to this.  Doubtful, but worth a look to see.
> > > > 
> > > > Does this system have manageability support?  dmidecode should tell you. 
> > > > Haven't tried this but if it can be disabled in the BIOS or elsewhere
> > > > somehow, wonder if this goes away.
> > > 
> > > That's a question for Jaroslav.  I currently don't have a system with this
> > > HW.
> > 
> > Hi,
> > 
> > I think not, but I don't know what is manageability support exactly. I run
> > this command, and I got nothing.
> > 
> > # dmidecode |grep -i manage
> > 
> > If you want me to do something more, please give me a note.
> 
> dmidecode will show the presence of management as a BIOS extension: the
> system here looks like this when dumped with dmidecode.
> 
> Handle 0x0054, DMI type 221, 26 bytes
> OEM-specific Type
>         Header and Data:
>                 DD 1A 54 00 03 01 00 00 07 01 00 00 02 00 09 00
>                 00 13 00 03 04 09 00 00 CD 04
>         Strings:
>                 Reference Code - ME 9.0
>                 MEBx version
>                 ME Firmware Version
>                 5MB SKU  
> 
> Does the failing system has this too?
> 
> As an aside: my little knowledge here is from an Intel paper: 
> An Introduction to Intel® Active Management Technology
> (Intel® AMT).

No, it has not. It has several 'OEM-specific Type' structures in dmidecode's outputs, but nothing with these strings. Hardware is notebook Lenovo t440s.

Comment 49 Vlad Yasevich 2014-04-10 15:56:35 UTC

Created attachment 885015 [details]
Test patch

Jaroslav

Can you give this brew build a try:
https://brewweb.devel.redhat.com/taskinfo?taskID=7325535

I've also attached a test patch to the BZ, just as an idea of what I was talking
about.  I was able to create a 10 macvlans with this patch and they all worked.

Thanks
-vlad

Comment 50 John Greene 2014-04-10 19:17:15 UTC

(In reply to Vlad Yasevich from comment #49)
> Created attachment 885015 [details]
> Test patch
> 
> Jaroslav
> 
> Can you give this brew build a try:
> https://brewweb.devel.redhat.com/taskinfo?taskID=7325535
> 
> I've also attached a test patch to the BZ, just as an idea of what I was
> talking
> about.  I was able to create a 10 macvlans with this patch and they all
> worked.
> 
> Thanks
> -vlad
Nice work Vlad!    
This patch will probably fix the 217 and 218 e1000e flavors assuming we understand the behaviour of management in full. The other part of this is that trying to understand what other variants  might be subject to this issue.

Jburke was helping make a new automated tests as well.

Bug 1086285   Support macvtap-based interfaces in virt install test 

Anybody find other e1000e with similar problems as yet manually?

Comment 51 Vlad Yasevich 2014-04-10 20:00:58 UTC

I think that ME engine is black box and only Intel folks really understand it.
I didn't test that patch extensively and I think there might be an off-by-1
error when management is consuming registers, but I didn't have much time/inclination to prove it. :)

Feel free to post upstream to get feedback (you can add my:
Signed-off-by: Vlad Yasevich <vyasevic>).

-vlad

Comment 52 John Greene 2014-04-10 20:12:00 UTC

(In reply to Vlad Yasevich from comment #51)
> I think that ME engine is black box and only Intel folks really understand
> it.
> I didn't test that patch extensively and I think there might be an off-by-1
> error when management is consuming registers, but I didn't have much
> time/inclination to prove it. :)
> 
> Feel free to post upstream to get feedback (you can add my:
> Signed-off-by: Vlad Yasevich <vyasevic>).
> 
> -vlad

Agreed on the black box notion.  Not sure when or where the allocations come and leave either, making me unsure how to safely account for them.

Happy to post.. will keep you in the loop.

Comment 53 Jaroslav Aster 2014-04-11 06:35:13 UTC

(In reply to Vlad Yasevich from comment #49)
> Created attachment 885015 [details]
> Test patch
> 
> Jaroslav
> 
> Can you give this brew build a try:
> https://brewweb.devel.redhat.com/taskinfo?taskID=7325535
> 
> I've also attached a test patch to the BZ, just as an idea of what I was
> talking
> about.  I was able to create a 10 macvlans with this patch and they all
> worked.

Good job, it works. I run my test and all eleven macvlan interfaces succeeded.

# ./test-driver.sh 
macvlan0 succeed
macvlan1 succeed
macvlan2 succeed
macvlan3 succeed
macvlan4 succeed
macvlan5 succeed
macvlan6 succeed
macvlan7 succeed
macvlan8 succeed
macvlan9 succeed
macvlan10 succeed

Then, I run virt-manager and it worked too :-).

Comment 54 John Greene 2014-04-11 14:46:38 UTC

(In reply to Jaroslav Aster from comment #53)
> (In reply to Vlad Yasevich from comment #49)
> > Created attachment 885015 [details]
> > Test patch
> > 
> > Jaroslav
> > 
> > Can you give this brew build a try:
> > https://brewweb.devel.redhat.com/taskinfo?taskID=7325535
> > 
> > I've also attached a test patch to the BZ, just as an idea of what I was
> > talking
> > about.  I was able to create a 10 macvlans with this patch and they all
> > worked.
> 
> Good job, it works. I run my test and all eleven macvlan interfaces
> succeeded.
> 
> # ./test-driver.sh 
> macvlan0 succeed
> macvlan1 succeed
> macvlan2 succeed
> macvlan3 succeed
> macvlan4 succeed
> macvlan5 succeed
> macvlan6 succeed
> macvlan7 succeed
> macvlan8 succeed
> macvlan9 succeed
> macvlan10 succeed
> 
> Then, I run virt-manager and it worked too :-).

Any further testing before I post upstream you guys can suggest?  Or any comment on implementation?  Looks ok to me..

Comment 55 Vlad Yasevich 2014-04-11 15:08:40 UTC

Can you double check the index that we get from ME and what we use to program the registers.  I took a guess at its usage...
I don't want to turn on promisc on the card if we still have available
registers.

-vlad

Comment 56 John Greene 2014-04-23 20:42:23 UTC

I've posted the test patch to Intel for review, as it touches some files they wish to review.  Will followup to get it upstream.

Comment 57 Dave Ertman 2014-04-23 21:31:29 UTC

(In reply to John Greene from comment #56)
> I've posted the test patch to Intel for review, as it touches some files
> they wish to review.  Will followup to get it upstream.

I have the patch and am reviewing it.  Please allow us to push this patch upstream and give you credit for it, as the files it touches require us to maintain ownership of changes so as to avoid GPL violations.

Thanks,
Dave Ertman

Comment 58 John Greene 2014-04-24 13:49:41 UTC

(In reply to Dave Ertman from comment #57)
> (In reply to John Greene from comment #56)
> > I've posted the test patch to Intel for review, as it touches some files
> > they wish to review.  Will followup to get it upstream.
> 
> I have the patch and am reviewing it.  Please allow us to push this patch
> upstream and give you credit for it, as the files it touches require us to
> maintain ownership of changes so as to avoid GPL violations.
Dave,

Thanks!  But credit is not due me, I'm just passing it along.  I'll be doing the backport to RHEL when it's upstream.  Please credit my talented co-worker:

>Feel free to post upstream to get feedback (you can add my:
>Signed-off-by: Vlad Yasevich <vyasevic>).

John

Comment 60 Dave Ertman 2014-04-28 16:33:20 UTC

(In reply to John Greene from comment #58)
> (In reply to Dave Ertman from comment #57)
> > (In reply to John Greene from comment #56)
> > > I've posted the test patch to Intel for review, as it touches some files
> > > they wish to review.  Will followup to get it upstream.
> > 
> > I have the patch and am reviewing it.  Please allow us to push this patch
> > upstream and give you credit for it, as the files it touches require us to
> > maintain ownership of changes so as to avoid GPL violations.
> Dave,
> 
> Thanks!  But credit is not due me, I'm just passing it along.  I'll be doing
> the backport to RHEL when it's upstream.  Please credit my talented
> co-worker:
> 
> >Feel free to post upstream to get feedback (you can add my:
> >Signed-off-by: Vlad Yasevich <vyasevic>).
> 
> John

This patch impacts quite a few of our parts (client and 1Gig server), and also impacts other drivers.  We are evaluating other approaches to see if we can lessen the impact.

Thanks for your patience!

Comment 61 Dave Ertman 2014-04-28 18:46:22 UTC

Created attachment 890558 [details]
Test patch 2 - with return value added

Please evaluate whether this approach will still fix your issue.

Comment 62 Vlad Yasevich 2014-04-28 19:59:57 UTC

Yes, technically this still fixes the issue, but at the expense of turning on promiscuous mode much sooner then necessary most of the time.  With this patch,
my tests have worked, but the interface was made promiscuous with a single macvlan configured.  None of the SHRARs could be used in this case.

-vlad

Comment 63 John Greene 2014-04-29 13:44:50 UTC

(In reply to Vlad Yasevich from comment #62)
> Yes, technically this still fixes the issue, but at the expense of turning
> on promiscuous mode much sooner then necessary most of the time.  With this
> patch,
> my tests have worked, but the interface was made promiscuous with a single
> macvlan configured.  None of the SHRARs could be used in this case.
> 
> -vlad

Vlad, Dave,

I tend to agree with Vlad.  The issue with who keeps up with what regs are in use between the driver/OS and ME needs something to keep us from stepping on each other and still use the available hardware resources.  ME is black box here.  Does it have it's own notion of what indexes are in use separate from driver?  It seems to.

Comment 64 Dave Ertman 2014-05-01 19:47:02 UTC

Created attachment 891627 [details]
Test Patch 3

This patch encorporates both methodologies for approaching this problem.  Both the implementation of rar_get_count, and adding of a return value to rar_set for LPT devices.

The redundant check of FWSM using the E1000_FWSM_WLOCK_MAC_MASK serves two purposes:

1) serves as a double check for the possible timing issue of ME grabbing a SHRA between check and use (Still trying to find ME folks that can answer on this - and it could change without alerting the driver).

2) maintains functionality for drivers that do not want to utilize rar_get_count.

please let me know what you think of this approach.

Comment 65 Vlad Yasevich 2014-05-01 20:49:16 UTC

(In reply to Dave Ertman from comment #64)
> Created attachment 891627 [details]
> Test Patch 3
> 
> This patch encorporates both methodologies for approaching this problem. 
> Both the implementation of rar_get_count, and adding of a return value to
> rar_set for LPT devices.
> 
> The redundant check of FWSM using the E1000_FWSM_WLOCK_MAC_MASK serves two
> purposes:
> 
> 1) serves as a double check for the possible timing issue of ME grabbing a
> SHRA between check and use (Still trying to find ME folks that can answer on
> this - and it could change without alerting the driver).
> 
> 2) maintains functionality for drivers that do not want to utilize
> rar_get_count.
> 
> please let me know what you think of this approach.

I agree with this approach.  I was debating whether to remove redundant checks when I wrote the initial patch and I didn't consider point 1) above.

The one question that still worries me a little is whether ME checks if a SHRAR is available before writing to it?  It would be bad if ME can over-write values
that the kernel wrote.

Thanks
-vlad

Comment 66 John Greene 2014-05-01 21:09:34 UTC

And another one I think Vlad sensed in C51:

if e1000_rar_get_count_pch_lpt() returns 0 (entries full) at
https://bugzilla.redhat.com/attachment.cgi?id=891627&action=diff#a/drivers/net/ethernet/intel/e1000e/ich8lan.c_sec5

then e1000e_write_uc_addr_list() blindly decrements to zero to -1 number, possibly causing issues at line 3320 and beyond in that function in netdev.c


Isn't an issue with other family, just ich8lan.c 

Easy fix would be ..
        if ( rar_entries == 0)
                return -ENOMEM;

        --rar_entries;

of similar..

Comment 67 Dave Ertman 2014-05-01 21:30:32 UTC

(In reply to John Greene from comment #66)
> And another one I think Vlad sensed in C51:
> 
> if e1000_rar_get_count_pch_lpt() returns 0 (entries full) at
> https://bugzilla.redhat.com/attachment.cgi?id=891627&action=diff#a/drivers/
> net/ethernet/intel/e1000e/ich8lan.c_sec5
> 
> then e1000e_write_uc_addr_list() blindly decrements to zero to -1 number,
> possibly causing issues at line 3320 and beyond in that function in netdev.c
> 
> 
> Isn't an issue with other family, just ich8lan.c 
> 
> Easy fix would be ..
>         if ( rar_entries == 0)
>                 return -ENOMEM;
> 
>         --rar_entries;
> 
> of similar..

Actually, now that I look specifically at that part.  The rar_entry_count counts RAR[0] as an address register.  Even with a wlock value of 1, RAR[0] is open (as it has to be).  This is why the value for E1000_PCH_LPT_RAR_ENTRIES is 12 instead of 11.

e1000e_write_ic_addr_list() accounts for this (rar_entries--).  I need to take a quick run through the logic and account for this issue.  I will be submitting a new patch if necessary.  I will use the same basic logic that we agreed on so far, but I might have to tweak it to account for RAR[0].

Comment 68 Dave Ertman 2014-05-01 22:00:52 UTC

Created attachment 891643 [details]
Test Patch 4

That was an excellent catch John!  Since e1000e_write_uc_addr_list() is used by multiple parts, we need to keep its logic intact.  This means that e1000_rar_get_count_pch_lpt() should never return zero (since RAR[0] is always available).   Since the Write Lock MAC Adresses bits in FWSM only account for SHRA registers, the return value needs to look like this:

wlock 0 - return hw->mac.rar_entry_count
        (defined as E1000_PCH_LPT_RAR_ENTRIES or 12)
wlock 1 - return 1
        (only RAR[0]) available
wlock default - return [wlock + 1]
        (number of SHRA unlocked + 1 for RAR[0])

Comment 69 Dave Ertman 2014-05-01 22:03:09 UTC

(In reply to Dave Ertman from comment #68)
> Created attachment 891643 [details]
> Test Patch 4
> 
> That was an excellent catch John!  Since e1000e_write_uc_addr_list() is used
> by multiple parts, we need to keep its logic intact.  This means that
> e1000_rar_get_count_pch_lpt() should never return zero (since RAR[0] is
> always available).   Since the Write Lock MAC Adresses bits in FWSM only
> account for SHRA registers, the return value needs to look like this:
> 
> wlock 0 - return hw->mac.rar_entry_count
>         (defined as E1000_PCH_LPT_RAR_ENTRIES or 12)
> wlock 1 - return 1
>         (only RAR[0]) available
> wlock default - return [wlock + 1]
>         (number of SHRA unlocked + 1 for RAR[0])

I forgot to adjust the comments in e1000_rar_get_count_pch_lpt(), will do so before I submit upstream if this is acceptable.

Comment 70 John Greene 2014-05-02 13:17:17 UTC

(In reply to Dave Ertman from comment #69)
> (In reply to Dave Ertman from comment #68)
> > Created attachment 891643 [details]
> > Test Patch 4
> > 
> > That was an excellent catch John!  Since e1000e_write_uc_addr_list() is used
> > by multiple parts, we need to keep its logic intact.  This means that
> > e1000_rar_get_count_pch_lpt() should never return zero (since RAR[0] is
> > always available).   Since the Write Lock MAC Adresses bits in FWSM only
> > account for SHRA registers, the return value needs to look like this:
> > 
> > wlock 0 - return hw->mac.rar_entry_count
> >         (defined as E1000_PCH_LPT_RAR_ENTRIES or 12)
> > wlock 1 - return 1
> >         (only RAR[0]) available
> > wlock default - return [wlock + 1]
> >         (number of SHRA unlocked + 1 for RAR[0])
> 
> I forgot to adjust the comments in e1000_rar_get_count_pch_lpt(), will do so
> before I submit upstream if this is acceptable.

Thanks Dave.  Good idea: the comments were in need of more clarity in regards to what wlock means and the above sure helps.  I was wondering: wlock value means the number registers available or number in use?   
Must we cycle through them looking at the enable bit to determine which are available or do we assume we fill bottom up or top down, first not enabled?  

Debugging one of these episodes, I noted it seemed both driver and the ME alloc'ed SHAR[max] downward, which caused issues. 

A short note in the source as to what the driver should AVOID doing would be great help. It's not obvious that ME is involvled otherwise.  
 
Getting it right now will rid us of a class of BZs.  Thanks for your efforts!

Comment 71 Dave Ertman 2014-05-02 17:19:41 UTC

(In reply to John Greene from comment #70)
> (In reply to Dave Ertman from comment #69)
> > (In reply to Dave Ertman from comment #68)
> > > Created attachment 891643 [details]
> > > Test Patch 4
> > > 
> > > That was an excellent catch John!  Since e1000e_write_uc_addr_list() is used
> > > by multiple parts, we need to keep its logic intact.  This means that
> > > e1000_rar_get_count_pch_lpt() should never return zero (since RAR[0] is
> > > always available).   Since the Write Lock MAC Adresses bits in FWSM only
> > > account for SHRA registers, the return value needs to look like this:
> > > 
> > > wlock 0 - return hw->mac.rar_entry_count
> > >         (defined as E1000_PCH_LPT_RAR_ENTRIES or 12)
> > > wlock 1 - return 1
> > >         (only RAR[0]) available
> > > wlock default - return [wlock + 1]
> > >         (number of SHRA unlocked + 1 for RAR[0])
> > 
> > I forgot to adjust the comments in e1000_rar_get_count_pch_lpt(), will do so
> > before I submit upstream if this is acceptable.
> 
> Thanks Dave.  Good idea: the comments were in need of more clarity in
> regards to what wlock means and the above sure helps.  I was wondering:
> wlock value means the number registers available or number in use?   
> Must we cycle through them looking at the enable bit to determine which are
> available or do we assume we fill bottom up or top down, first not enabled?  
> 
> Debugging one of these episodes, I noted it seemed both driver and the ME
> alloc'ed SHAR[max] downward, which caused issues. 
> 
> A short note in the source as to what the driver should AVOID doing would be
> great help. It's not obvious that ME is involvled otherwise.  
>  
> Getting it right now will rid us of a class of BZs.  Thanks for your efforts!

As far as what wlock means... the spec has that written in typical HW speak (i.e. clear as mud):

FWSM-7:9   SHRA that are WR locked
------     ----------------
000        None of the SHRA are locked
001        0,1,2,3,4,5,6,7,8,9,10
010        2,3,4,5,6,7,8,9,10
011        3,4,5,6,7,8,9,10
100        4,5,6,7,8,9,10
101        5,6,7,8,9,10
110        6,7,8,9,10
111        7,8,9,10

So, on a platform with ME actively taking SHRA registers, the largest number of SHRA that will be free is 7 (0-6).

The driver does fill top down, ME also fills top down (at least on an i217 part), if there are free SHRA, they are the lower indices SHRA.  But, with this change to rar_entries, the driver will start its top down filling with the highest *free* SHRA, as long the number of free SHRA is >= netdev_uc_count(netdev).

So now, if the driver needs, for example, 3 SHRA and wlock is 101, it will use the last 3 *free* SHRA (4,3, and 2).

The reason for writing them top down is that there was a bug where register writes would be combined if they were seen as being in contiguous memory (e.g. the SHRA registers).  I believe that this is a compiler issue, but I only have it second hand, as that was found before my time on e1000e started.

This gives me an idea for a change to the e1000e_write_uc_addr_list() function that might help all parts "dodge" ME locked registers: 
If we need 3 SHRA, just take the bottom 3, but start with the third (2, 1, 0), and then zero out what we believe to be the remaining SHRA (e.g. 11..3).  

I will need to connect with an ME resource before I do this though, as I cannot seem to find documentation on how ME grabs SHRA on other parts (i.e. how the wlock works on other parts) and there is no guarantee that making this change wouldn't make it worse for other parts.

Comment 72 John Greene 2014-05-02 18:15:47 UTC

Dave, excellent work on getting this.  Not intuitive is it..?
An issue (as I see it at least): these things get alloc'ed 1 at a time, now way of knowing upfront I need 3 total.  RAR0 always for unicast data, any multicast usage (which I am not sure of the use of hardware to assist.. After that, the macvlan come along later (on demand even) that ask for them.  

Does that cause you a problem?  Wish your suggestion was more usable (not a problem to do that) but they will still come in 1 at a time.  Any problem with that on your end?

Comment 73 John Greene 2014-05-13 15:17:58 UTC

Hi Dave,

Are you waiting on us for results for test patch 4 here or something else? Been distracted with other stuff. But haven't seen this upstream as yet.

Would like to get this in a release upcoming if possible.

Comment 74 Dave Ertman 2014-05-13 15:25:44 UTC

(In reply to John Greene from comment #73)
> Hi Dave,
> 
> Are you waiting on us for results for test patch 4 here or something else?
> Been distracted with other stuff. But haven't seen this upstream as yet.
> 
> Would like to get this in a release upcoming if possible.

This patch was submitted to our internal queue on 2014-05-05, it is currently assigned to a Validation Engineer who is looking for a good way to finalize his validation of the patch.  He has done quite a bit of work on it and I think he is just about done.  

I will point him to this bugzilla to see if anything here might complete his testing.

Comment 75 John Greene 2014-05-19 20:27:48 UTC

(In reply to Dave Ertman from comment #74)
> (In reply to John Greene from comment #73)
> > Hi Dave,
> > 
> > Are you waiting on us for results for test patch 4 here or something else?
> > Been distracted with other stuff. But haven't seen this upstream as yet.
> > 
> > Would like to get this in a release upcoming if possible.
> 
> This patch was submitted to our internal queue on 2014-05-05, it is
> currently assigned to a Validation Engineer who is looking for a good way to
> finalize his validation of the patch.  He has done quite a bit of work on it
> and I think he is just about done.  
> 
> I will point him to this bugzilla to see if anything here might complete his
> testing.

Thanks for the update Dave.. I'm awaiting it now.

Comment 76 John Greene 2014-05-27 15:26:02 UTC

Hi Dave, 

I checked upstream for this this morning and didn't see it.  Can you or someone at Intel check on the expected release time of this patch for me?  I'm on a deadline to get an OS update out by end of the month and I REALLY want this one if I can get it.  Is this going to make it out this week?  I need to plan for this one way or other.

Thanks.

Comment 77 John Greene 2014-05-27 16:08:55 UTC

(In reply to John Greene from comment #76)
> Hi Dave, 
> 
> I checked upstream for this this morning and didn't see it.  Can you or
> someone at Intel check on the expected release time of this patch for me? 
> I'm on a deadline to get an OS update out by end of the month and I REALLY
> want this one if I can get it.  Is this going to make it out this week?  I
> need to plan for this one way or other.
> 
> Thanks.

Oops, I just found it..never mind Dave.

http://patchwork.ozlabs.org/patch/352824/

Comment 82 John Greene 2014-07-07 18:52:41 UTC

*** Bug 1115562 has been marked as a duplicate of this bug. ***