Bug 855107 - udev race condition -event loop when hotplug is enabled and vlan interface is put down and up
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: initscripts
Version: 6.2
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: David Kaspar [Dee'Kej]
QA Contact: qe-baseos-daemons
Duplicates: 952538
Depends On:
Blocks: 1075802 1159926 1172231 1269194 1356047 1356056
Reported: 2012-09-06 13:40 EDT by Milos Vyletel
Modified: 2016-11-30 04:47 EST
CC List: 10 users

Doc Type: Release Note
Last Closed: 2016-10-31 09:57:57 EDT
Type: Bug


Attachments
eth0.280 config (85 bytes, application/octet-stream), 2012-09-06 13:40 EDT, Milos Vyletel
eth0 config (72 bytes, application/octet-stream), 2012-09-06 13:40 EDT, Milos Vyletel
patch proposal (387 bytes, patch), 2012-12-13 17:18 EST, Milos Vyletel
patch (387 bytes, patch), 2012-12-13 17:21 EST, Milos Vyletel, flags: dkaspar: review-


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 351323 None None None 2016-10-29 10:58 EDT

Description Milos Vyletel 2012-09-06 13:40:07 EDT
Created attachment 610440
eth0.280 config

Description of problem:
We've discovered a race condition: when ifdown and ifup are called on a VLAN interface without any delay between them, udevd ends up in an endless loop. udevd itself does not seem to be the problem; it just does whatever the kernel tells it to. The problem lies in the interaction between initscripts and the udev rules. Here is what happens on the server:

ifdown eth0.280          |
KERNEL remove event      |   ifup eth0.280
UDEV remove event        |   KERNEL add event
net.hotplug calls ifdown |   UDEV add event
KERNEL remove event      |   net.hotplug calls ifup
UDEV remove event        |   KERNEL add event
...                      |   UDEV add event
                         |   ...

I have not seen any race when using a physical interface or a bridge; this seems to be isolated to VLAN interfaces. Disabling hotplug (HOTPLUG=no) for VLAN interfaces eliminates the race condition, as does putting sleep 1 between ifdown and ifup to allow net.hotplug to finish before ifup is called again.
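
The HOTPLUG=no workaround amounts to one line in the interface's ifcfg file. A minimal sketch of a helper that sets it idempotently follows; the function name set_hotplug_no is hypothetical, and the path in the usage line assumes the usual /etc/sysconfig/network-scripts location (GNU sed is assumed for -i).

```shell
# Hypothetical helper: force HOTPLUG=no in an ifcfg file.
# Rewrites an existing HOTPLUG= line, or appends one if none exists.
set_hotplug_no() {
    cfg=$1
    if grep -q '^HOTPLUG=' "$cfg"; then
        # Replace whatever value is currently set.
        sed -i 's/^HOTPLUG=.*/HOTPLUG=no/' "$cfg"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c1 "$cfg")" ]; then
            echo >> "$cfg"
        fi
        echo 'HOTPLUG=no' >> "$cfg"
    fi
}
```

Usage, assuming the standard config location: set_hotplug_no /etc/sysconfig/network-scripts/ifcfg-eth0.280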

I was trying to find a fix, but I'm not really sure what the proper fix is. I was thinking about adding some kind of locking to the if{up,down} scripts so that only one instance runs at a time. But this may be a bit too complicated, and maybe simply defaulting to HOTPLUG=no for VLANs would be sufficient.

Version-Release number of selected component (if applicable):
kernel-2.6.32-220.el6.x86_64
initscripts-9.03.27-1.el6.x86_64
udev-147-2.40.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. create vlan interface (see attached ifcfg-eth0(.280))
2. service network restart
3. ifdown eth0.280; ifup eth0.280
4. udevadm monitor (to see the actual udev loop)
  
Actual results:
(while running udevadm monitor in background, HOTPLUG=yes (default))
KERNEL[1346945864.946898] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.946925] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945864.947003] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.947080] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945864.947195] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945864.974109] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945864.974233] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.974252] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086871] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945865.086900] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086924] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945866.102280] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137424] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137546] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.137564] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945866.212688] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1346945866.213016] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945866.213045] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945866.213064] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945866.213077] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.350869] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.350994] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.351101] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945867.115054] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945867.150201] add      /devices/virtual/net/eth0.280 (net)
<snip>
loop continues until udevd is killed

Expected results:
(while running udevadm monitor in background, HOTPLUG=no)
[root@localhost network-scripts]# ifdown eth0.280 && ifup eth0.280
KERNEL[1346945676.096862] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.096921] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945676.097102] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1346945676.097348] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945676.097383] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945676.123261] remove   /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136736] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136764] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.136784] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1346945676.169711] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1346945676.169972] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1346945676.170128] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
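
To tell the looping case from the expected single cycle without reading the log by eye, one option is to count how often the kernel removes the VLAN net device in a saved udevadm monitor log. This is a minimal sketch; the function name count_net_removes is hypothetical, and the /devices/virtual/net/<iface> path is taken from the logs above.

```shell
# Hypothetical helper: read udevadm monitor output on stdin and print the
# number of KERNEL "remove" events for the net device itself (queue
# sub-devices such as .../queues/rx-0 are not counted).
count_net_removes() {
    iface=$1
    awk -v path="/devices/virtual/net/$iface" '
        $1 ~ /^KERNEL/ && $2 == "remove" && $3 == path { n++ }
        END { print n + 0 }
    '
}
```

With the monitor output saved to a file (monitor.log is an assumed name), run count_net_removes eth0.280 < monitor.log. A single ifdown/ifup pair should yield one such event, as in the expected results; the looping log contains several.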

Additional info:
Comment 2 Milos Vyletel 2012-09-06 13:40:59 EDT
Created attachment 610441
eth0 config
Comment 3 Milos Vyletel 2012-09-07 09:02:24 EDT
Forgot to mention hardware specs:

System Information
        Manufacturer: HP
        Product Name: ProLiant BL460c G1
BIOS Information
        Vendor: HP
        Version: I15
        Release Date: 10/25/2010

[root@localhost ~]# ethtool -i eth0
driver: bnx2
version: 2.1.11
firmware-version: bc 4.4.1
bus-info: 0000:03:00.0
[root@localhost ~]# modinfo bnx2
filename:       /lib/modules/2.6.32-220.el6.x86_64/kernel/drivers/net/bnx2.ko
firmware:       bnx2/bnx2-rv2p-09ax-6.0.17.fw
firmware:       bnx2/bnx2-rv2p-09-6.0.17.fw
firmware:       bnx2/bnx2-mips-09-6.2.1a.fw
firmware:       bnx2/bnx2-rv2p-06-6.0.15.fw
firmware:       bnx2/bnx2-mips-06-6.2.1.fw
version:        2.1.11
license:        GPL
description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author:         Michael Chan <mchan@broadcom.com>
srcversion:     61BD2699C6587068253C2BB
alias:          pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias:          pci:v000014E4d00001639sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias:          pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:
vermagic:       2.6.32-220.el6.x86_64 SMP mod_unload modversions
parm:           disable_msi:Disable Message Signaled Interrupt (MSI) (int)

Not sure if it's important but I'm including it just in case. If you have more questions don't hesitate to ask.
Comment 4 Lukáš Nykrýn 2012-09-10 09:29:52 EDT
Thanks for the report, I was able to reproduce this.
The main issue here is that if we just call ifup eth0.280, ifup is started twice:
ifup eth0.280 -> kernel event -> udev reaction -> net.hotplug -> ifup eth0.280 (which is definitely bad behavior), and the same thing happens with ifdown.

I don't think that locking would help, so we have two options:
1) Ignore hotplug's calls of ifup and ifdown for VLANs (but I am not sure whether this would break something)
2) Reassign this to kernel or maybe udev; they might be able to solve this better at their level.
Comment 5 Milos Vyletel 2012-09-10 10:07:23 EDT
I've tried option 1) and it did not work:

# Ethernet 802.1Q VLAN support
-if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ]; then
+if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ] && [ -z "$IN_HOTPLUG" ]; then

Not only did the VLAN end up in the down state, I could still see one unnecessary round of kernel/udev events, although they no longer loop forever. Also, as you said, it may actually break even more things...

[root@localhost ~]# ifdown eth0.280; ifup eth0.280
KERNEL[1347284812.857702] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.857771] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.857884] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284812.857980] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284812.858008] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.885215] add      /devices/virtual/net/eth0.280 (net)
KERNEL[1347284812.885254] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.885270] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.000878] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284813.000930] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.001009] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.016270] remove   /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.100786] add      /devices/virtual/net/eth0.280 (net)
UDEV  [1347284814.100987] add      /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284814.101012] add      /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1347284814.101143] remove   /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV  [1347284814.101160] remove   /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV  [1347284814.202761] remove   /devices/virtual/net/eth0.280 (net)


Having said that, I'm fine with reassigning to kernel/udev if you think they are the ones who should fix it. However, I personally think that initscripts is still responsible. Kernel/udev may have limited ways of knowing whether ifup/ifdown was the trigger for the event they received. In the end it's your call.
Comment 6 Milos Vyletel 2012-12-13 17:18:23 EST
Created attachment 663210
patch proposal

Wait for udev to process all current events before exiting. This fixes race condition we've been having with vlan interfaces. All comments are appreciated.
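
The same idea of waiting for udev to drain its event queue can also be tried from the caller's side. The following is a minimal sketch and not the attached patch: the function name bounce_vlan and the SETTLE_TIMEOUT variable are made up for illustration, and the default of 5 seconds is only an assumption.

```shell
# Hypothetical wrapper: take an interface down, wait for udev to finish
# processing queued events (including hotplug handlers), then bring it up.
bounce_vlan() {
    iface=$1
    timeout=${SETTLE_TIMEOUT:-5}   # seconds; assumed default, override as needed

    ifdown "$iface" || return 1

    # udevadm settle blocks until the udev event queue is empty or the
    # timeout expires; a timeout is reported but does not abort the ifup.
    udevadm settle --timeout="$timeout" ||
        echo "warning: udev queue not empty after ${timeout}s" >&2

    ifup "$iface"
}
```

Usage: bounce_vlan eth0.280, or SETTLE_TIMEOUT=10 bounce_vlan eth0.280 to wait longer.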
Comment 7 Milos Vyletel 2012-12-13 17:20:33 EST
Comment on attachment 663210
patch proposal

swapped filenames in diff
Comment 8 Milos Vyletel 2012-12-13 17:21:19 EST
Created attachment 663211
patch
Comment 9 Milos Vyletel 2013-01-30 14:18:20 EST
Hi, any update? Has anyone had time to look at the proposed patch?
Comment 10 Lukáš Nykrýn 2013-01-31 03:30:18 EST
This patch looks quite sane. We will consider including it in the next release.
Comment 11 Milos Vyletel 2013-01-31 08:25:04 EST
Great. Thanks.
Comment 12 Harald Hoyer 2013-03-18 06:39:01 EDT
udevadm settle --timeout=5

So, timeout=5 is hardcoded? I don't think that is a good idea.
Comment 13 Milos Vyletel 2013-03-18 07:41:14 EDT
Fair enough. I don't like that hardcoded value either, but I could not come up with a better solution. What do you suggest?
Comment 14 Harald Hoyer 2013-03-20 06:39:42 EDT
I also think that ifup/ifdown should take some kind of file lock (see flock(1) for shell).

Concurrently operating on routing tables, interface settings, etc. does not seem to be a way to get consistent settings.
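
For illustration, per-interface locking with flock(1) might look like the minimal sketch below. The function name with_iface_lock, the lock file naming, and the LOCK_DIR and LOCK_WAIT variables are all hypothetical, and whether serializing ifup/ifdown this way actually stops the event loop is not established in this report.

```shell
# Hypothetical wrapper: run a command while holding an exclusive lock
# that is specific to one interface.
with_iface_lock() {
    iface=$1
    shift
    lockfile="${LOCK_DIR:-/var/lock}/ifupdown-$iface.lock"
    (
        # Wait up to LOCK_WAIT seconds for the lock on file descriptor 9.
        flock -w "${LOCK_WAIT:-10}" 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}
```

Usage: with_iface_lock eth0.280 ifdown eth0.280, followed by with_iface_lock eth0.280 ifup eth0.280. The lock is released when the subshell exits and descriptor 9 is closed.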
Comment 15 pletisan 2014-07-02 11:03:31 EDT
I can confirm this is happening on Red Hat Enterprise Linux Server release 6.4 (Santiago), x86_64 arch.

Sleeping for 1 second between ifdown and ifup works around the issue.
Comment 16 David Kaspar [Dee'Kej] 2016-10-29 10:56:17 EDT
*** Bug 952538 has been marked as a duplicate of this bug. ***
Comment 18 David Kaspar [Dee'Kej] 2016-10-31 09:57:57 EDT
According to Lukas, this BZ has already been fixed:
https://github.com/fedora-sysv/initscripts/commit/0c78d0c

The locking mechanism for ifup/ifdown is a nice-to-have feature, but it would still not work correctly if someone called other networking subscripts manually, or from some non-RHEL scripts.

Therefore, I'm closing this BZ as WONTFIX. In case anyone still faces this issue, please use the workaround:
> HOTPLUG=no
as mentioned in comment #0.

Best regards,

David
Comment 19 David Kaspar [Dee'Kej] 2016-11-30 04:47:06 EST
*** Bug 1398326 has been marked as a duplicate of this bug. ***
