Bug 481283

Summary: [RHEL5.3] Original ether's status is keeping PROMISC MULTICAST mode
Product: Red Hat Enterprise Linux 5 Reporter: Flavio Leitner <fleitner>
Component: kernelAssignee: Neil Horman <nhorman>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.3CC: dzickus, emcnabb, h-itani, ltroan, qcai, riek, syeghiay, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 481292 (view as bug list) Environment:
Last Closed: 2009-09-02 08:10:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 483701    
Attachments:
Description Flags
5c15bdec5c38f4ccf73ef2585fc80a6164de9554.patch
none
8c979c26a0f093c13290320edda799d8335e50ae.patch
none
4417da668c0021903464f92db278ddae348e0299.patch
none
patch by customer based on kernel-2.6.18-92.1.6.el5.ia64
none
patch to track and undo promisc count on vlans
none
vlan-with-fixes-1.patch
none
vlan patch with gflags fixed none

Description Flavio Leitner 2009-01-23 12:33:32 UTC
Description of problem:
Though  removing vlan ether device,
original ether's status is keeping PROMISC MULTICAST mode.

How reproducible:
Always

Steps to Reproduce:
1.Create new vlan ether.
# vconfig etadd eth2 100
# ifconfig eth2.100
eth2.100  Link encap:Ethernet  HWaddr 00:00:87:E2:D4:02
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
# cat /sys/class/net/eth2/flags
0x1003

2.Change new ether's MAC address.
# ifconfig eth2.100 hw ether 22.:22:22:44:44:44
# ifconfig eth2.100
eth2.100  Link encap:Ethernet  HWaddr 22:22:22:44:44:44  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

3.Check eth2's status.
# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:00:87:E2:D4:02  
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1 <<<
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:80100000-80120000
# cat /sys/class/net/eth2.100/flags
0x1002
# cat /sys/calass/net/eth2/flags
0x1103  <<< PROMISC MULTICAST BIT ON

4.Delete vlan ether
# vconfig rem eth2.100
# ifconfig eth2.
eth2      Link encap:Ethernet  HWaddr 00:00:87:E2:D4:02  
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1 << STILL PROMISC MULTICAST
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:80100000-80120000
# cat /proc/sys/class/net/bond/eth2/flags
0x1103 <<<
# ifconfig eth2 -rpromisc
# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:00:87:E2:D4:02  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1  <<< RELEASE PROMISC MULTICAST
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:80100000-80120000

# cat /sys/class/net/eth2/flags
0x1103         <<< STILL PROMISC MULTICAST

Actual results:
Not release PROMISC MULTICAST

Expected results:
Release PROMISC MULTICAST

Environment:

e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
802.1Q VLAN Support v1.8 Ben Greear <greearb>


Additional info:
Our problem have already fixed on current upstream kernel.

--- snip ---
[VLAN]: Fix MAC address handling

The VLAN MAC address handling is broken in multiple ways. When the address
differs when setting it, the real device is put in promiscous mode twice,
but never taken out again. Additionally it doesn't resync when the real
device's address is changed and needlessly puts it in promiscous mode when
the vlan device is still down.

Fix by moving address handling to vlan_dev_open/vlan_dev_stop and properly
deal with address changes in the device notifier. Also switch to
dev_unicast_add (which needs the exact same handling).

Since the set_mac_address handler is identical to the generic ethernet one
with these changes, kill it and use ether_setup().
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8c979c26a0f093c13290320edda799d8335e50ae

Comment 1 Flavio Leitner 2009-01-23 12:35:41 UTC
 4. State specific action requested of SEG
   I have looked in the git. Below is upstream patch requested by the vendor:
     8c979c26a0f093c13290320edda799d8335e50ae
   Apparently this may need some more patches as:
     (vlan_group_get_device)
     5c15bdec5c38f4ccf73ef2585fc80a6164de9554
     (dev_unicast_delete/dev_unicast_add)
     4417da668c0021903464f92db278ddae348e0299
   
Issue escalated to Support Engineering Group by: tumeya.

Comment 2 Flavio Leitner 2009-01-23 12:37:53 UTC
Created attachment 329816 [details]
5c15bdec5c38f4ccf73ef2585fc80a6164de9554.patch

Comment 3 Flavio Leitner 2009-01-23 12:38:31 UTC
Created attachment 329817 [details]
8c979c26a0f093c13290320edda799d8335e50ae.patch

Comment 4 Flavio Leitner 2009-01-23 12:38:56 UTC
Created attachment 329818 [details]
4417da668c0021903464f92db278ddae348e0299.patch

Comment 5 Flavio Leitner 2009-01-23 12:41:36 UTC
Created attachment 329819 [details]
patch by customer based on kernel-2.6.18-92.1.6.el5.ia64

Comment 6 RHEL Program Management 2009-02-16 15:25:23 UTC
Updating PM score.

Comment 8 Neil Horman 2009-05-16 17:08:14 UTC
I'll try this myself on monday, but just in case you already have test results, is the problem reproducible upstream (or on an F10/F11 system)?

Comment 9 Neil Horman 2009-05-18 17:19:12 UTC
I'm going to try this on an upstream kernel to see if there is already a fix in existance that we can use.  I should note however, that the state change on the promisc bit is a bit suspect, but the multicast bit should not have any relevance to this problem.  All the multicast bit indicates is that the interface can send and recieve multicast frames, and for hardware that supports it, that flag should be set automatically, and only cleared administratively .

Comment 10 Neil Horman 2009-05-18 18:49:38 UTC
Created attachment 344499 [details]
patch to track and undo promisc count on vlans

Ok, looks like upstream fixed this via  a pretty big rewrite on how we handle unicast lists.  Can you try this patch, it seems somewhat more consice to me.  I've not tested it yet, but it should work just fine.

Also, I noticed in the reprouder given, the customer tried to undo the promisc flag by setting -rpromisc.  Not sure if that was just a typo, but it should be -promisc instead.  Either way, if lots of vlans are being used, it would still be broken, as the promisc count will be off (you would have to issue that command several times to clear the flag)

Comment 11 Flavio Leitner 2009-05-20 01:20:54 UTC
Created attachment 344731 [details]
vlan-with-fixes-1.patch

I did try fedora 10 running 2.6.10-rc6 and it worked fine. Actually, because of vlan_dev_set_mac_address() uses dev_unicast_*() it doesn't set promisc anymore.

I also tried -128.el5 and could see the problem happening then I applied your patch with two extra changes. One is a simple typo and another is moving  'VLAN_DEV_INFO(dev)->promisc_count * -1;' to inside of if clause, otherwise dev is NULL and it would oops.
...
 	struct net_device *dev = NULL;
 	int ret;
-
+	int undo_count = VLAN_DEV_INFO(dev)->promisc_count * -1;
 
 	dev = dev_get_by_name(vlan_IF_name);
...
modified patch is attached in case you need to check something.

Anyway, it didn't work because on -128.el5 does:
vlan_dev_set_mac_address()
  /* Increment our in-use promiscuity counter */
  dev_set_promiscuity(VLAN_DEV_INFO(dev)->real_dev, 1); <-- increments 1
  VLAN_DEV_INFO(dev)->promisc_count++;                  <-- patch counts that
  flgs |= IFF_PROMISC;
  dev_change_flags(VLAN_DEV_INFO(dev)->real_dev, flgs);
  which does:
        if ((flags ^ dev->gflags) & IFF_PROMISC) {
                int inc = (flags & IFF_PROMISC) ? +1 : -1;
                dev->gflags ^= IFF_PROMISC;
                dev_set_promiscuity(dev, inc);   <--- increments another one.
        }

so, the counter ends with 2. Removing vlan device leaves the counter with 1.
Flavio

Comment 12 Neil Horman 2009-05-20 02:16:19 UTC
Ok, so you identified a spot I missed, thats great.  Did you test with the obvious fix for the missed spot, and did it work? :)

Comment 13 Flavio Leitner 2009-05-21 22:12:55 UTC
It will work regarding with dev->flags but I'm not finding how it will work with dev->gflags.

vlan_ioctl_handler()
   unregister_vlan_device()
      dev_set_promiscuity() <-- flags only.

Another thing, changing macaddr will set promisc, then if you change the macaddr back to the real_dev's macaddr, it will leave promisc flag set on both flags and gflags too. See vlan_dev_set_mac_address().

Flavio

Comment 14 Neil Horman 2009-05-22 00:32:54 UTC
yes, you found another problem.

You took the time to analyze it.  Thank you, again now, please take the extra 5 minutes to fix it.  You should just need to check the promiscuity value of the physical dev after chaning its promiscuity and change gflags if it returns to zero.

Comment 15 Flavio Leitner 2009-05-25 21:14:17 UTC
Created attachment 345357 [details]
vlan patch with gflags fixed

Another patch.

On vlan_dev_set_mac_address() if there is no IFF_PROMISC in flags, call dev_change_flags() and set gflags if needed. Otherwise assume flags and gflags are okay and just increment one.

On unregister_vlan_device() undo vlan promisc changes and if flags has left without IFF_PROMISC takes it out from gflags too.

works here.
Flavio

Comment 16 Neil Horman 2009-05-26 10:41:43 UTC
Thank you, yes.  This looks good to me.  I'll post it shortly.   Thank you for fixing up those missing bits.

Comment 19 RHEL Program Management 2009-05-28 14:10:25 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 32 Don Zickus 2009-07-07 15:04:56 UTC
in kernel-2.6.18-157.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 37 errata-xmlrpc 2009-09-02 08:10:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html