Bug 520722 - [Cisco 5.6 bug] netdev interfaces fail to come up on rmmod/modprobe of driver
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: udev
Version: 5.4
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.6
Assigned To: Harald Hoyer
QA Contact: qe-baseos-daemons
Keywords: OtherQA, Reopened
Depends On:
Blocks: 1049888 1080020 1093117
Reported: 2009-09-01 20:32 EDT by Vasanthy Kolluri
Modified: 2014-06-09 10:35 EDT

Doc Type: Bug Fix
Clone Of:
: 1093117
Last Closed: 2014-06-06 09:22:54 EDT

Attachments
Sample manifestation with enic driver. Provided the udevmonitor and ifconfig logs (5.54 KB, text/plain)
2009-09-01 20:32 EDT, Vasanthy Kolluri
initscripts patch for network interface renaming (1.09 KB, patch)
2010-07-01 06:50 EDT, Harald Hoyer

Description Vasanthy Kolluri 2009-09-01 20:32:32 EDT
Created attachment 359459
Sample manifestation with enic driver. Provided the udevmonitor and ifconfig logs.

Description of problem:
Some of the netdev interfaces fail to come up on rmmod/modprobe of the driver.

Version-Release number of selected component (if applicable):
udev-095-14.21.el5

How reproducible:
Reproducible when the ifcfg-ethx interface names do not match the kernel-assigned interface names. Seen consistently with a large number of interfaces.


Steps to Reproduce:
1. Configure, say, 5 netdev interfaces. Load the driver.
2. By default, the /etc/sysconfig/network-scripts/ifcfg-ethx files (ifcfg-eth0 to ifcfg-eth4) are created. All interfaces come up.
3. Now rename each ifcfg-ethx file to ifcfg-eth(x+1). Also change the DEVICE name in each file accordingly.
4. rmmod <driver>; modprobe <driver>
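
Step 3 can be scripted. The following is a minimal sketch in Python 3, meant to be run against a scratch copy of the network-scripts directory rather than on a live RHEL 5 host. The helper name shift_ifcfg_names is made up for illustration, and the sketch assumes each file carries a single DEVICE= line.

```python
import re
from pathlib import Path

# Matches only the per-interface files, e.g. ifcfg-eth0, ifcfg-eth12.
IFCFG_RE = re.compile(r"^ifcfg-eth(\d+)$")


def shift_ifcfg_names(script_dir):
    """Rename ifcfg-ethN to ifcfg-eth(N+1) and update DEVICE= to match.

    Hypothetical helper for reproducing step 3; everything else in the
    file (HWADDR, ONBOOT, ...) is left untouched.
    """
    script_dir = Path(script_dir)
    found = []
    for path in script_dir.iterdir():
        match = IFCFG_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))

    renamed = []
    # Highest index first, so no file is written over one still pending.
    for index, path in sorted(found, key=lambda item: item[0], reverse=True):
        new_name = f"eth{index + 1}"
        text = re.sub(r"(?m)^DEVICE=.*$", f"DEVICE={new_name}", path.read_text())
        target = script_dir / f"ifcfg-{new_name}"
        target.write_text(text)
        path.unlink()
        renamed.append((path.name, target.name))
    return renamed
```

After running it, the HWADDR that used to sit in ifcfg-eth0 is found in ifcfg-eth1, which is the mismatch with the kernel-assigned names that the reproduction relies on.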
  
Actual results:
Some of the interfaces fail to come up.
"$ ifconfig | grep eth" doesn't show all interfaces.
Exception: If the system is rebooted with the renamed ifcfg-ethx files, all interfaces come up.
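
Plain ifconfig (without -a) lists only interfaces that are up, so the missing ones can also be identified directly from sysfs. A minimal sketch in Python 3, assuming the standard /sys/class/net/<name>/flags attribute and the IFF_UP bit (0x1). The function name and its parameters are made up for illustration.

```python
from pathlib import Path

IFF_UP = 0x1  # interface flag bit for "administratively up"


def down_interfaces(sysfs_net="/sys/class/net", prefix="eth"):
    """Return the names of matching interfaces whose IFF_UP flag is clear."""
    down = []
    for entry in sorted(Path(sysfs_net).iterdir()):
        if not entry.name.startswith(prefix):
            continue
        flags_file = entry / "flags"
        if not flags_file.is_file():
            continue  # not a network device directory
        # The attribute holds a hex string such as "0x1003".
        flags = int(flags_file.read_text().strip(), 16)
        if not flags & IFF_UP:
            down.append(entry.name)
    return down
```

Calling down_interfaces() after the rmmod/modprobe cycle would list the interfaces that were renamed but never brought up.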


Expected results:
All the interfaces come up.
"$ ifconfig | grep eth" shows all interfaces.


Additional info:

We did some analysis. Our understanding of the actions udev takes when the kernel queues a UEVENT for any netdev interface is as follows:
  -- Runs /lib/udev/rename_device, passing the kernel-assigned name (ethx).
  -- Renames the interface (based on HWADDR in sysfs) as per the ifcfg-eth* scripts. Say it is renamed to ethy.
    --- Does an ioctl to the kernel to change the interface name.
    --- This renaming could recursively trigger more rename calls until distinct names are given to the interfaces.
    --- If ethy is already given to another interface, that interface is renamed to ethz. If ethz is already given to another interface, that one is also renamed. This goes on until all interfaces get unique names. However, udev is unaware of all the other renames that happened and only brings up the interface ethy.
    --- This could leave some of the interfaces not coming up. The whole problem arises because the kernel queues the udev events based on the kernel-assigned names.

Say the kernel queues udev events for ethx and ethy. When udev handles the first event, ethx becomes ethy, ethy becomes ethz, and ethy comes up. When the second event for ethy is being handled, udev thinks that ethy is already up based on sysfs data and takes no action. This leaves ethz down.
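
The sequence above can be replayed with a toy model. This is only a sketch in Python 3 of the behaviour as described in this report, not of udev or rename_device themselves: the class and function names are invented, MAC addresses are stand-in strings, and configurations that form a rename cycle are not handled.

```python
class Iface:
    """One network device; the MAC stays fixed while the name may change."""

    def __init__(self, mac, name):
        self.mac = mac
        self.name = name
        self.up = False


def rename(by_name, iface, wanted):
    """Give iface its configured name, first moving any current holder away."""
    target = wanted.get(iface.mac, iface.name)
    if target == iface.name:
        return
    holder = by_name.get(target)
    if holder is not None:
        rename(by_name, holder, wanted)  # the recursive rename from the analysis
    if target in by_name:
        raise ValueError(f"{target} is still taken")
    del by_name[iface.name]
    iface.name = target
    by_name[target] = iface


def replay_add_events(macs, wanted):
    """Replay one 'add' event per kernel-assigned name; return name -> up."""
    by_name = {}
    for i, mac in enumerate(macs):
        by_name[f"eth{i}"] = Iface(mac, f"eth{i}")
    queued = list(by_name)  # events are keyed by the kernel-assigned names
    for event_name in queued:
        iface = by_name.get(event_name)  # lookup by name, after earlier renames
        if iface is None or iface.up:
            continue  # looks as if it has been handled already
        rename(by_name, iface, wanted)
        iface.up = True  # only the device found for this event is brought up
    return {name: iface.up for name, iface in sorted(by_name.items())}
```

With two interfaces, replay_add_events(["aa", "bb"], {"aa": "eth1", "bb": "eth2"}) returns {'eth1': True, 'eth2': False}: the second event finds the already-up device now called eth1, and the device renamed to eth2 is never brought up.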

As mentioned earlier, all the interfaces come up on a reboot. It's unclear how this is possible.
Comment 1 Harald Hoyer 2009-09-02 05:51:04 EDT
Reassigning to component initscripts, because that is where the renaming happens.
Comment 2 Bill Nottingham 2009-09-02 13:20:13 EDT
Do you have any udev rules with the device names as well?
Comment 3 Vasanthy Kolluri 2009-09-02 13:27:42 EDT
I have used the udev default configuration only. I haven't changed any udev rules.
Comment 5 RHEL Product and Program Management 2009-11-06 14:28:17 EST
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 6 Andrius Benokraitis 2009-11-08 23:24:55 EST
Requesting an exception for consideration for 5.5 based on partner comments.
Comment 8 Andrius Benokraitis 2009-11-24 15:21:06 EST
Cisco: At the present time, Red Hat is *not* planning on updating the initscripts package for RHEL 5.5. Once I hear the final word, I will have to defer this to RHEL 5.6.
Comment 10 Andrius Benokraitis 2009-11-25 15:54:36 EST
Deferring to RHEL 5.6.
Comment 11 Andrius Benokraitis 2009-11-25 15:56:59 EST
Vasanthy - have you tested this in RHEL 6.0 Alpha 2?
Comment 12 Vasanthy Kolluri 2009-12-15 16:55:39 EST
I haven't tried this in RHEL 6.0 Alpha2
Comment 13 Vasanthy Kolluri 2010-02-15 21:48:26 EST
I have tested this on RHEL6.0 Alpha3 with 56 netdev interfaces.

Results:
All the interfaces come up on host boot. Two issues seen:

1) Interface naming isn't clean.
One of them got "_rename" instead of an ethx name. However, on driver reload and restarting the network service, all interfaces come up with ethx names.
2) kudzu isn't running on this distro. There is no renaming involved at all.
   So the actual test case isn't tested.

Need clarifications:
udev in RHEL 5.4 handles device discovery dynamically and does renaming based on the /etc/sysconfig/network-scripts/ifcfg-ethx files created by kudzu.

In RHEL 6.0, what I see is that the kudzu service doesn't run by default. NetworkManager takes care of bringing up the interfaces.

Why is kudzu not present? Is it just with this Alpha version?

Also, I couldn't find any documentation for RHEL 6.0 on redhat.com. Could you give us some pointers?

Thanks
Vasanthy
Comment 14 Bill Nottingham 2010-02-16 11:00:15 EST
kudzu is no longer included. NetworkManager is the default in the Alpha; this will change in later milestones, based on the product.
Comment 15 Vasanthy Kolluri 2010-03-05 14:50:52 EST
So is this not going to be fixed for RHEL5.6 at all?

In any case, I am going to file another bugzilla for RHEL6.0 as there are issues involved with interface naming.
Comment 16 Bill Nottingham 2010-03-05 14:57:24 EST
(In reply to comment #15)
> So is this not going to be fixed for RHEL5.6 at all?

No, it's not going to be fixed for 5.5. 5.6 has not been decided.

> In any case, I am going to file another bugzilla for RHEL6.0 as there are
> issues involved with interface naming.    

RHEL 6.0 is rather different in this area; please file separate bugs for any RHEL 6 issues, as they are almost certainly different, and there would not be any common fixes.
Comment 17 Bill Nottingham 2010-03-05 15:10:36 EST
I just attempted to reproduce this in a virtual machine with four interfaces that use a single driver. I was unable to reproduce. Do you have any hints for how to make this appear other than possibly 'add more interfaces'?
Comment 18 Vasanthy Kolluri 2010-03-05 15:18:27 EST
I reproduced this with 5 interfaces. Complete details of the experiment are in the attachment. Let me know if you need any clarification.
Comment 19 Bill Nottingham 2010-03-05 17:16:55 EST
OK, I've got it now; what I thought you were saying was that the interface wasn't renamed properly. While it is named properly, the net.hotplug script isn't being run right. Assigning to udev; this is not an initscripts issue.

(The reason it 'works' on reboot is that network bringup on reboot isn't done via the net.hotplug script.)
Comment 20 Bill Nottingham 2010-03-05 17:24:13 EST
To describe it better, we have 60-net.rules that has:

ACTION=="add", SUBSYSTEM=="net", IMPORT{program}="/lib/udev/rename_device"

SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/net.hotplug"

- a module is loaded for multiple devices
- the first rule is run for each device, causing the rename and INTERFACE
  to be set to the new name
- something in udev's internal bookkeeping gets confused when devices get renamed, and the second rule only gets run on some subset of interfaces
Comment 21 Scott Feldman 2010-03-05 19:34:56 EST
Thank you Bill for keeping this one going.  It's a real concern for our 10G NIC where it's not uncommon to present 50+ eths to the host, and if the ifcfg-ethx files aren't exactly right with matching HWADDR entries, in the correct order, we're exposed to this renaming bug.
Comment 22 Vasanthy Kolluri 2010-03-08 17:49:01 EST
Bill,

I would like to add to your comments.

We did some analysis and here's what's going on:
-- When there are recursive rename operations, net.hotplug is called only for the interface involved in the first rename operation.

Say there are two interfaces, ethx and ethy. The ifcfg-ethx scripts are set up such that ethx has to be renamed to ethy and ethy to ethz.

On loading the driver, the kernel queues udev events for ethx and ethy. When udev handles the first event, ethx becomes ethy and ethy becomes ethz (recursive renaming). But only ethy is brought up.
When the second event for ethy is being handled, udev mistakenly thinks that ethy is already up based on sysfs data and takes no action. This leaves ethz down.

I have explained this in detail in the "Additional info" section of the bug description.
Comment 23 Harald Hoyer 2010-06-25 09:48:47 EDT
Another "fix" would be to move the interface name out of the "eth" namespace.

So, you would name them "net[0-9]+" in the ifcfg-* files instead of "eth[0-9]+", resulting in no clash and _rename.
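
A quick way to spot configured names that could still collide is to check them against the kernel's naming pattern. This is a small sketch in Python 3, assuming the kernel hands out names of the form ethN for these devices. The function name is made up for illustration.

```python
import re

# Names the kernel itself may assign to Ethernet devices (assumed pattern).
KERNEL_NAME_RE = re.compile(r"^eth\d+$")


def names_that_may_clash(configured_names):
    """Return the configured names that live in the kernel's eth namespace."""
    return [name for name in configured_names if KERNEL_NAME_RE.match(name)]
```

For a configuration that uses only names such as net0 or net12 the list is empty, so no rename can land on a name the kernel has already given to another interface.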
Comment 24 Harald Hoyer 2010-06-30 06:21:06 EDT
The rename operation could emit a "change" UEVENT for every renamed interface.
Comment 25 Harald Hoyer 2010-07-01 06:50:47 EDT
Created attachment 428260 [details]
initscripts patch for network interface renaming

Maybe someone can test this untested patch. It would send a "change" event for the other interfaces, which get renamed by rename_device.
Comment 27 Andrius Benokraitis 2010-07-08 01:28:24 EDT
Harald - can you spin a package so that Cisco can test?
Comment 28 Vasanthy Kolluri 2010-08-03 19:27:16 EDT
Follow up of Andrius's question: Is there a package with patch that we can test?
Comment 30 Phil Knirsch 2010-09-07 08:01:13 EDT
Any luck with testing yet?

Thanks & regards, Phil
Comment 31 Vasanthy Kolluri 2010-09-07 14:31:17 EDT
Can you point me to a latest RHEL5.6 ISO that I can try this against? I had contacted Andrius for this info and waiting for his reply.
Comment 32 Andrius Benokraitis 2010-09-07 14:55:02 EDT
(In reply to comment #31)
> Can you point me to a latest RHEL5.6 ISO that I can try this against? I had
> contacted Andrius for this info and waiting for his reply.

There are no 5.6 ISOs yet. Please apply this on top of 5.5 GA.
Comment 33 Vasanthy Kolluri 2010-09-08 14:26:28 EDT
I verified this with the RHEL 5.5 kernel + initscripts-8.45.30.0.rename-1.el5 package. The issue is still there. I did the exact same experiment as in the problem description.
Comment 34 Andrius Benokraitis 2010-09-08 15:50:00 EDT
(In reply to comment #32)
> (In reply to comment #31)
> > Can you point me to a latest RHEL5.6 ISO that I can try this against? I had
> > contacted Andrius for this info and waiting for his reply.
> 
> There are no 5.6 ISOs yet. Please apply this on top of 5.5 GA.

Harald - is this the right process?
Comment 35 Harald Hoyer 2010-09-09 04:20:51 EDT
(In reply to comment #34)
> (In reply to comment #32)
> > (In reply to comment #31)
> > > Can you point me to a latest RHEL5.6 ISO that I can try this against? I had
> > > contacted Andrius for this info and waiting for his reply.
> > 
> > There are no 5.6 ISOs yet. Please apply this on top of 5.5 GA.
> 
> Harald - is this the right process?

That's ok, and if the new package does not resolve the problem, then I have to think of another solution.
Comment 36 Harald Hoyer 2010-09-21 10:58:16 EDT
Anyway, do we have a real problem here? If someone modifies the ifcfg and does rmmod/modprobe, then he/she can do a manual ifup also...
Comment 37 Vasanthy Kolluri 2010-09-23 17:34:16 EDT
It's udev's job to do the renaming and bring up the interfaces. It's not doing 
that right and leaving some of them down. That doesn't mean manual ifup is the way to go.
Comment 38 Harald Hoyer 2010-11-05 12:02:49 EDT
(In reply to comment #37)
> It's udev's job to do the renaming and bring up the interfaces. It's not doing 
> that right and leaving some of them down. That doesn't mean manual ifup is the
> way to go.

It's not udev's job... udev just provides the event and eventually renames the interface at the user's request per configuration.

$ rpm -qf /lib/udev/rules.d/60-net.rules 
initscripts-9.21-5.fc15.x86_64
Comment 39 Andrius Benokraitis 2010-11-19 23:56:32 EST
I'm still puzzled as to what to do here - based on comments from Harald, this isn't a udev bugzilla.
Comment 40 Vasanthy Kolluri 2011-03-09 14:01:30 EST
I somehow missed Andrius's closing note. I hadn't realized this bug had been closed.

Harald says that "If someone modifies the ifcfg and does rmmod/modprobe, then he/she can do a manual ifup also..."

udev does the renaming but calls ifup on only some of the interfaces. The issue that I'm pointing at is exactly that - ifup isn't called on the rest of the interfaces.

Harald, can we have a conf call to discuss this? It's been there for a long time now and I believe a phone conversation should put us all on the same page. I also know you haven't been able to reproduce it. I could reproduce and demonstrate it for you.

Let me know a couple of convenient times. I could set up a webex invite.

Thanks in advance.
Comment 41 Harald Hoyer 2011-03-15 06:09:43 EDT
(In reply to comment #40)
> I somehow missed Andrius's closing note. I hadn't realized this bug had been
> closed.
> 
> Harald says that "If someone modifies the ifcfg and does rmmod/modprobe, then
> he/she can do a manual ifup also..."
> 
> udev does the renaming but calls ifup on only some of the interfaces. The
> issue that I'm pointing at is exactly that - ifup isn't called on the rest of
> the interfaces.
> 
> Harald, can we have a conf call to discuss this? It's been there for a long
> time now and I believe a phone conversation should put us all on the same
> page. I also know you haven't been able to reproduce it. I could reproduce and
> demonstrate it for you.
> 
> Let me know a couple of convenient times. I could set up a webex invite.
> 
> Thanks in advance.

Yes, we can. Sure!
Comment 42 Vasanthy Kolluri 2011-03-16 12:49:38 EDT
Thanks Harald. Please pick a 1 hr slot on 17th Mar(Thu) between 11a.m.- 6 p.m. I'll schedule a webex.
Comment 43 Harald Hoyer 2011-03-17 04:21:12 EDT
(In reply to comment #42)
> Thanks Harald. Please pick a 1 hr slot on 17th Mar(Thu) between 11a.m.- 6 p.m.
> I'll schedule a webex.

which timezone?
Comment 44 Harald Hoyer 2011-03-17 04:22:46 EDT
(In reply to comment #43)
> (In reply to comment #42)
> > Thanks Harald. Please pick a 1 hr slot on 17th Mar(Thu) between 11a.m.- 6 p.m.
> > I'll schedule a webex.
> 
> which timezone?

If it is EDT, I would prefer 11a.m.
Comment 45 Harald Hoyer 2011-03-18 14:25:49 EDT
A possible fix would be to fix the key in the udev database to be the network interface index instead of the name. see comment #22
Comment 46 Harald Hoyer 2011-03-18 14:42:13 EDT
(In reply to comment #45)
> A possible fix would be to fix the key in the udev database to be the network
> interface index instead of the name. see comment #22

as it is now for udev >= 165
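
Why the choice of key matters can be shown with a toy comparison. This is a sketch in Python 3 and not udev's database code: the function name is invented, the interface indexes are arbitrary, and the rename cascade is simplified to "every ethN becomes eth(N+1) while the first event is handled".

```python
def interfaces_left_down(count, key):
    """Replay queued events, matching devices by `key` ('name' or 'ifindex')."""
    devices = [
        {"ifindex": i + 2, "name": f"eth{i}", "up": False} for i in range(count)
    ]
    # Events are queued before any rename and keep the identifiers of that moment.
    events = [{"ifindex": d["ifindex"], "name": d["name"]} for d in devices]

    cascade_done = False
    for event in events:
        device = next((d for d in devices if d[key] == event[key]), None)
        if device is None or device["up"]:
            continue  # nothing found, or it looks already handled
        if not cascade_done:
            # Handling the first event renames every interface (simplified).
            for d in devices:
                d["name"] = f"eth{int(d['name'][3:]) + 1}"
            cascade_done = True
        device["up"] = True
    return [d["name"] for d in devices if not d["up"]]
```

In this model interfaces_left_down(3, "name") returns ['eth3'], while interfaces_left_down(3, "ifindex") returns an empty list, because the index of a device does not change when it is renamed.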
Comment 47 Vasanthy Kolluri 2011-03-18 18:22:25 EDT
Harald, Thanks for the webex conversation and reopening this bug.
Comment 50 Vasanthy Kolluri 2011-05-19 20:50:22 EDT
Harald,

Is there an update on this? Which RHEL release do you plan to get this in?
Comment 51 Harald Hoyer 2011-05-20 06:45:01 EDT
RHEL 5.8 was proposed and has to be acknowledged first by management.
Comment 52 Vasanthy Kolluri 2011-05-20 13:27:14 EDT
Thanks Harald.
Comment 55 RHEL Product and Program Management 2011-09-22 20:27:16 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 58 Harald Hoyer 2014-04-30 12:10:37 EDT
I improved the udev network interface renaming.

The real fix though has to go in rename_device of initscripts. It has to stop renaming interfaces on its own. Because of that, udev does not know about the renaming.

Also 60-net.rules needs this added:
ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="?*", NAME="$env{INTERFACE}"
Comment 60 Harald Hoyer 2014-06-06 09:22:54 EDT
As stated in comment 58: the real "fix" has to go in initscripts.

Nevertheless, one thing should be clear:
"rmmod" is _not_ really supported, neither from the kernel side, nor in userspace.

So, given that a reboot would fix the issue, I am closing this bug as CANTFIX.
