Bug 855107 - udev race condition -event loop when hotplug is enabled and vlan interface is put down and up
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: initscripts
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: David Kaspar // Dee'Kej
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Duplicates: 952538
Depends On:
Blocks: 1075802 1159926 1172231 1269194 1356047 1356056
 
Reported: 2012-09-06 17:40 UTC by Milos Vyletel
Modified: 2019-06-13 07:52 UTC (History)

Fixed In Version:
Doc Type: Release Note
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-31 13:57:57 UTC
Target Upstream Version:
Embargoed:


Attachments
eth0.280 config (85 bytes, application/octet-stream)
2012-09-06 17:40 UTC, Milos Vyletel
no flags
eth0 config (72 bytes, application/octet-stream)
2012-09-06 17:40 UTC, Milos Vyletel
no flags
patch proposal (387 bytes, patch)
2012-12-13 22:18 UTC, Milos Vyletel
no flags
patch (387 bytes, patch)
2012-12-13 22:21 UTC, Milos Vyletel
deekej: review-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 351323 0 None None None 2016-10-29 14:58:02 UTC

Description Milos Vyletel 2012-09-06 17:40:07 UTC
Created attachment 610440 [details]
eth0.280 config

Description of problem:
We've discovered a race condition: when ifdown and ifup are called on a VLAN interface without any delay, udevd ends up in an endless loop. Udevd itself does not seem to be the problem; it just does whatever the kernel tells it to. The problem is the interaction between initscripts and the udev rules. Here's what's going on on the server:

ifdown eth0.280          |
KERNEL remove event      |   ifup eth0.280
UDEV remove event        |   KERNEL add event
net.hotplug calls ifdown |   UDEV add event
KERNEL remove event      |   net.hotplug calls ifup
UDEV remove event        |   KERNEL add event
...                      |   UDEV add event
                         |   ...

I have not seen any race when using a physical interface or a bridge. This seems to be isolated to VLAN interfaces. Disabling hotplug (HOTPLUG=no) for VLAN interfaces eliminates the race condition, as does putting sleep 1 between ifdown and ifup to allow net.hotplug to finish before ifup is called again.

I was trying to find a fix, but I'm not really sure what the proper fix is. I was thinking about adding some kind of locking to the if{up,down} scripts to limit execution to one instance at a time. But this may be a bit too complicated, and maybe just defaulting to HOTPLUG=no for VLANs would be sufficient.
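
For reference, the HOTPLUG=no workaround is a one-line change in the VLAN's ifcfg file. The fragment below is only an illustrative sketch: the attached eth0.280 config is not reproduced in this report, so every key other than HOTPLUG is an assumed, typical value rather than the reporter's actual setting.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0.280
# Illustrative values only; the attached config is not shown in this report.
DEVICE=eth0.280
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
# Workaround from this report: stop net.hotplug from calling ifup/ifdown
# for this interface in response to udev events.
HOTPLUG=no
```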

Version-Release number of selected component (if applicable):
kernel-2.6.32-220.el6.x86_64
initscripts-9.03.27-1.el6.x86_64
udev-147-2.40.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. create vlan interface (see attached ifcfg-eth0(.280))
2. service network restart
3. ifdown eth0.280; ifup eth0.280
4. udevadm monitor (to see the actual udev loop)
  
Actual results:
(while running udevadm monitor in background, HOTPLUG=yes (default))
KERNEL[1346945864.946898] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.946925] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945864.947003] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.947080] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945864.947195] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945864.974109] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945864.974233] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.974252] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086871] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945865.086900] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086924] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945866.102280] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137424] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137546] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.137564] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945866.212688] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1346945866.213016] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945866.213045] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945866.213064] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945866.213077] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.350869] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.350994] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.351101] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945867.115054] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945867.150201] add      /devices/virtual/net/eth0.280 (net)
<snip>
loop continues until udevd is killed
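
A quick way to tell a loop from a normal down/up cycle is to count the add and remove events for the interface node in a saved `udevadm monitor` log. The helper below is a hypothetical sketch (the name `count_net_events` is not part of udev or initscripts) and assumes the log has the line layout shown above.

```shell
# count_net_events DEVICE
# Reads `udevadm monitor` output on stdin and prints one line per
# "<source> <action> <count>" for events on the device node itself
# (events for its queues/rx-0 and queues/tx-0 children are ignored).
count_net_events() {
    awk -v dev="$1" '
        /^(KERNEL|UDEV)/ {
            # "KERNEL[ts]" is one field, "UDEV  [ts]" is two, so take the
            # action and devpath by counting from the end of the line.
            src = $1
            sub(/\[.*/, "", src)
            action = $(NF - 2)
            n = split($(NF - 1), parts, "/")
            if (parts[n] == dev)
                count[src " " action]++
        }
        END {
            for (key in count)
                print key, count[key]
        }
    ' | sort
}
```

Assuming the monitor output was saved to a file (for example a hypothetical /tmp/udev.log), `count_net_events eth0.280 < /tmp/udev.log` prints the totals. In the HOTPLUG=no trace under "Expected results", each of the four source/action combinations appears exactly once; counts that keep growing indicate the loop.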

Expected results:
(while running udevadm monitor in background, HOTPLUG=no)
[root@localhost network-scripts]# ifdown eth0.280 && ifup eth0.280
KERNEL[1346945676.096862] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.096921] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945676.097102] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945676.097348] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945676.097383] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945676.123261] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136736] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136764] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.136784] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945676.169711] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1346945676.169972] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945676.170128] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)

Additional info:

Comment 2 Milos Vyletel 2012-09-06 17:40:59 UTC
Created attachment 610441 [details]
eth0 config

Comment 3 Milos Vyletel 2012-09-07 13:02:24 UTC
Forgot to mention hardware specs:

System Information
        Manufacturer: HP
        Product Name: ProLiant BL460c G1
BIOS Information
        Vendor: HP
        Version: I15
        Release Date: 10/25/2010

[root@localhost ~]# ethtool -i eth0
driver: bnx2
version: 2.1.11
firmware-version: bc 4.4.1
bus-info: 0000:03:00.0
[root@localhost ~]# modinfo bnx2
filename:       /lib/modules/2.6.32-220.el6.x86_64/kernel/drivers/net/bnx2.ko
firmware:       bnx2/bnx2-rv2p-09ax-6.0.17.fw
firmware:       bnx2/bnx2-rv2p-09-6.0.17.fw
firmware:       bnx2/bnx2-mips-09-6.2.1a.fw
firmware:       bnx2/bnx2-rv2p-06-6.0.15.fw
firmware:       bnx2/bnx2-mips-06-6.2.1.fw
version:        2.1.11
license:        GPL
description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author:         Michael Chan <mchan>
srcversion:     61BD2699C6587068253C2BB
alias:          pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias:          pci:v000014E4d00001639sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias:          pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:
vermagic:       2.6.32-220.el6.x86_64 SMP mod_unload modversions
parm:           disable_msi:Disable Message Signaled Interrupt (MSI) (int)

Not sure if it's important but I'm including it just in case. If you have more questions don't hesitate to ask.

Comment 4 Lukáš Nykrýn 2012-09-10 13:29:52 UTC
Thanks for the report, I was able to reproduce this.
The main issue here is that if we just call ifup eth0.280, ifup is started twice:
ifup eth0.280 -> kernel event -> udev reaction -> net.hotplug -> ifup eth0.280 (which is definitely bad behavior), and the same thing happens with ifdown.

I don't think locking would help, so we have two options:
1) Ignore hotplug's calls of ifup and ifdown for VLANs (but I am not sure whether this will break something)
2) Reassign this to kernel or maybe udev; they might be able to solve this better at their level.

Comment 5 Milos Vyletel 2012-09-10 14:07:23 UTC
I've tried option 1) and it did not work:

# Ethernet 802.1Q VLAN support
-if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ]; then
+if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ] && [ -z "$IN_HOTPLUG" ]; then

Not only did the VLAN end up in the down state, I could still see one unnecessary round of kernel/udev events, although they do not loop forever. Also, as you've said, it may actually break even more things...

[root@localhost ~]# ifdown eth0.280; ifup eth0.280
KERNEL[1347284812.857702] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.857771] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.857884] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284812.857980] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284812.858008] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.885215] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1347284812.885254] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.885270] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.000878] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284813.000930] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.001009] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.016270] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.100786] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.100987] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284814.101012] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1347284814.101143] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1347284814.101160] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284814.202761] remove   /devices/virtual/net/eth0.280 (net)


Having said that I'm fine with reassigning to kernel/udev if you think they are the ones that should be fixing it. However I personally think that initscripts are still responsible. Kernel/udev may have limited ways of knowing if the ifup/ifdown was the trigger for the event they received. In the end it's your call.

Comment 6 Milos Vyletel 2012-12-13 22:18:23 UTC
Created attachment 663210 [details]
patch proposal

Wait for udev to process all current events before exiting. This fixes the race condition we've been having with VLAN interfaces. All comments are appreciated.
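
The attached patch itself is not reproduced in this report. As a rough sketch of the idea only, "wait for udev before exiting" can be expressed with `udevadm settle`. The helper name `wait_for_udev` and the `UDEV_SETTLE_TIMEOUT` variable are hypothetical, added here so the timeout is not fixed in the script; they are not part of initscripts.

```shell
# wait_for_udev
# Blocks until udev has processed its current event queue, or until the
# timeout expires. UDEV_SETTLE_TIMEOUT is an assumed, illustrative knob;
# it defaults to 5 seconds.
wait_for_udev() {
    settle_timeout="${UDEV_SETTLE_TIMEOUT:-5}"

    # Nothing to wait for if udevadm is not available.
    command -v udevadm >/dev/null 2>&1 || return 0

    udevadm settle --timeout="$settle_timeout"
}
```

A script such as ifup or ifdown would call `wait_for_udev` as its last step, so that a following ifup does not start while events from the previous ifdown are still being handled.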

Comment 7 Milos Vyletel 2012-12-13 22:20:33 UTC
Comment on attachment 663210 [details]
patch proposal

swapped filenames in diff

Comment 8 Milos Vyletel 2012-12-13 22:21:19 UTC
Created attachment 663211 [details]
patch

Comment 9 Milos Vyletel 2013-01-30 19:18:20 UTC
Hi, any update? Has anyone had time to look at the proposed patch?

Comment 10 Lukáš Nykrýn 2013-01-31 08:30:18 UTC
This patch looks quite sane. We will consider including it in the next release.

Comment 11 Milos Vyletel 2013-01-31 13:25:04 UTC
Great. Thanks.

Comment 12 Harald Hoyer 2013-03-18 10:39:01 UTC
udevadm settle --timeout=5

So, timeout=5 is hardcoded? I don't think that is a good idea.

Comment 13 Milos Vyletel 2013-03-18 11:41:14 UTC
Fair enough. I don't like that hardcoded value either, but I could not come up with a better solution. What do you suggest?

Comment 14 Harald Hoyer 2013-03-20 10:39:42 UTC
And I also think that ifup/ifdown should somehow take a file lock (see flock(1) for shell).

Concurrently operating on routing tables, interface settings, etc. does not seem to be a way to get consistent settings.
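
To make the flock(1) suggestion concrete, here is a minimal sketch of a per-interface lock wrapper using the file-descriptor form described in the flock(1) man page. The function name `with_iface_lock`, the lock file naming, and the `IFACE_LOCK_DIR` and `IFACE_LOCK_WAIT` variables are all hypothetical; nothing like this exists in initscripts as far as this report shows.

```shell
# with_iface_lock IFACE CMD [ARGS...]
# Runs CMD while holding an exclusive lock tied to IFACE, so that two
# operations on the same interface cannot overlap.
with_iface_lock() {
    iface="$1"
    shift
    lockfile="${IFACE_LOCK_DIR:-/var/lock}/ifupdown-${iface}.lock"

    (
        # Wait up to IFACE_LOCK_WAIT seconds (default 10) for the lock
        # held on file descriptor 9; give up if it cannot be acquired.
        flock -w "${IFACE_LOCK_WAIT:-10}" 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}
```

Usage would look like `with_iface_lock eth0.280 ifup eth0.280`. The lock is released when the subshell exits and file descriptor 9 is closed. It only serializes callers that go through the wrapper.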

Comment 15 pletisan 2014-07-02 15:03:31 UTC
I can confirm this is happening on Red Hat Enterprise Linux Server release 6.4 (Santiago), x86_64 arch.

Sleeping for 1 second between ifdown and ifup works around the issue.
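
As a small sketch of that workaround, the delay can be wrapped in a helper. The name `restart_iface` is hypothetical, and the one-second default simply mirrors the value reported to work in this bug.

```shell
# restart_iface IFACE [DELAY_SECONDS]
# Takes the interface down, pauses, then brings it back up. The pause
# gives net.hotplug time to finish before ifup is called again.
restart_iface() {
    ifdown "$1"
    sleep "${2:-1}"
    ifup "$1"
}
```

For example, `restart_iface eth0.280` replaces a bare `ifdown eth0.280; ifup eth0.280`.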

Comment 16 David Kaspar // Dee'Kej 2016-10-29 14:56:17 UTC
*** Bug 952538 has been marked as a duplicate of this bug. ***

Comment 18 David Kaspar // Dee'Kej 2016-10-31 13:57:57 UTC
Lukas thinks this BZ has already been fixed:
https://github.com/fedora-sysv/initscripts/commit/0c78d0c

The locking mechanism for ifup/ifdown is a nice-to-have feature, but it would still not work correctly if someone called different networking subscripts manually, or from some other non-RHEL scripts.

Therefore, I'm closing this BZ as WONTFIX. In case anyone still faces this issue, please use the workaround:
> HOTPLUG=no
as mentioned in comment #0.

Best regards,

David

Comment 19 David Kaspar // Dee'Kej 2016-11-30 09:47:06 UTC
*** Bug 1398326 has been marked as a duplicate of this bug. ***

