Bug 855107
| Field | Value |
|---|---|
| Summary | udev race condition - event loop when hotplug is enabled and vlan interface is put down and up |
| Product | Red Hat Enterprise Linux 6 |
| Component | initscripts |
| Version | 6.2 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | medium |
| Reporter | Milos Vyletel <milos.vyletel> |
| Assignee | David Kaspar // Dee'Kej <deekej> |
| QA Contact | qe-baseos-daemons |
| CC | a15y87, deekej, harald, joseph.keller, jrieden, jzhenyon, milos.vyletel, pletisan, psedlak, vlad |
| Target Milestone | rc |
| Doc Type | Release Note |
| Type | Bug |
| Last Closed | 2016-10-31 13:57:57 UTC |
| Bug Blocks | 1075802, 1159926, 1172231, 1269194, 1356047, 1356056 |

Created attachment 610441 [details]
eth0 config
Forgot to mention hardware specs:

System Information
    Manufacturer: HP
    Product Name: ProLiant BL460c G1
BIOS Information
    Vendor: HP
    Version: I15
    Release Date: 10/25/2010

[root@localhost ~]# ethtool -i eth0
driver: bnx2
version: 2.1.11
firmware-version: bc 4.4.1
bus-info: 0000:03:00.0

[root@localhost ~]# modinfo bnx2
filename: /lib/modules/2.6.32-220.el6.x86_64/kernel/drivers/net/bnx2.ko
firmware: bnx2/bnx2-rv2p-09ax-6.0.17.fw
firmware: bnx2/bnx2-rv2p-09-6.0.17.fw
firmware: bnx2/bnx2-mips-09-6.2.1a.fw
firmware: bnx2/bnx2-rv2p-06-6.0.15.fw
firmware: bnx2/bnx2-mips-06-6.2.1.fw
version: 2.1.11
license: GPL
description: Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author: Michael Chan <mchan>
srcversion: 61BD2699C6587068253C2BB
alias: pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias: pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias: pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias: pci:v000014E4d00001639sv*sd*bc*sc*i*
alias: pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias: pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias: pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias: pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias: pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias: pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias: pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:
vermagic: 2.6.32-220.el6.x86_64 SMP mod_unload modversions
parm: disable_msi:Disable Message Signaled Interrupt (MSI) (int)

Not sure if it's important but I'm including it just in case. If you have more questions don't hesitate to ask.

Thanks for the report, I was able to reproduce this. The main issue here is that if we just call ifup eth0.280, ifup is started twice: ifup eth0.280 -> kernel event -> udev reaction -> net.hotplug -> ifup eth0.280 (which is definitely bad behavior), and the same thing happens with ifdown.
I don't think that some locking would help, so we have two options:
1) Ignore hotplug's calls of ifup and ifdown for VLANs (but I am not sure whether this would break something).
2) Reassign this to kernel or maybe udev; they might be able to solve this better at their level.

I've tried option 1) and it did not work:

     # Ethernet 802.1Q VLAN support
    -if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ]; then
    +if [ "${VLAN}" = "yes" ] && [ "$ISALIAS" = "no" ] && [ -z "$IN_HOTPLUG" ]; then

Not only did the VLAN end up in the down state, I could still see some unnecessary kernel/udev events, although they do not loop forever. Also, as you've said, it may actually break even more things...

[root@localhost ~]# ifdown eth0.280; ifup eth0.280
KERNEL[1347284812.857702] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.857771] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.857884] remove /devices/virtual/net/eth0.280 (net)
UDEV [1347284812.857980] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1347284812.858008] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284812.885215] add /devices/virtual/net/eth0.280 (net)
KERNEL[1347284812.885254] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284812.885270] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.000878] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1347284813.000930] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1347284813.001009] remove /devices/virtual/net/eth0.280 (net)
UDEV [1347284814.016270] remove /devices/virtual/net/eth0.280 (net)
UDEV [1347284814.100786] add /devices/virtual/net/eth0.280 (net)
UDEV [1347284814.100987] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1347284814.101012] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1347284814.101143] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1347284814.101160] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1347284814.202761] remove /devices/virtual/net/eth0.280 (net)

Having said that, I'm fine with reassigning to kernel/udev if you think they are the ones that should be fixing it. However, I personally think that initscripts are still responsible. Kernel/udev may have limited ways of knowing whether ifup/ifdown was the trigger for the event they received. In the end it's your call.

Created attachment 663210 [details]
patch proposal
Wait for udev to process all current events before exiting. This fixes the race condition we've been having with VLAN interfaces. All comments are appreciated.
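The attached patch itself is not reproduced on this page. As a rough sketch of the idea only, assuming `udevadm settle` is what does the waiting (the review further down quotes `udevadm settle --timeout=5`), a wrapper could look like the following. The names `wait_for_udev`, `restart_iface`, `UDEV_SETTLE_TIMEOUT` and `UDEVADM` are hypothetical and are not taken from initscripts; making the timeout a variable is one possible answer to the hardcoded-value concern, not what the patch does.

```shell
# Sketch only: hypothetical helper names, not the contents of attachment 663211.
# Timeout in seconds; overridable from the environment instead of hardcoded.
UDEV_SETTLE_TIMEOUT="${UDEV_SETTLE_TIMEOUT:-5}"
# Command used to talk to udev; overridable so the helper can be stubbed out.
UDEVADM="${UDEVADM:-udevadm}"

# Block until udev has processed its current event queue or the timeout expires.
wait_for_udev() {
    # Do nothing if udevadm is not available on this system.
    command -v "$UDEVADM" >/dev/null 2>&1 || return 0
    "$UDEVADM" settle --timeout="$UDEV_SETTLE_TIMEOUT"
}

# Take an interface down and up again, letting udev finish in between.
restart_iface() {
    ifdown "$1"
    wait_for_udev
    ifup "$1"
}
```

With these definitions, `restart_iface eth0.280` replaces the back-to-back `ifdown eth0.280; ifup eth0.280` from the reproduction steps.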
Comment on attachment 663210 [details]
patch proposal
swapped filenames in diff
Created attachment 663211 [details]
patch
Hi, any update? Has anyone had time to look at the proposed patch?

This patch looks quite sane. We will consider including it in the next release.

Great. Thanks.

udevadm settle --timeout=5

So, timeout=5 is hardcoded? I don't think that is a good idea.

Fair enough. I don't like that hardcoded value either but could not come up with a better solution. What do you suggest?

And I also think that ifup/ifdown should somehow take a file lock (see flock(1) for shell). Concurrently operating on routing tables, interface settings, etc. does not seem to be a way to get consistent settings.

I can confirm this is happening on Red Hat Enterprise Linux Server release 6.4 (Santiago), x86_64 arch. Sleeping for 1 second between ifdown and ifup works around the issue.

*** Bug 952538 has been marked as a duplicate of this bug. ***

According to Lukas, this BZ has already been fixed: https://github.com/fedora-sysv/initscripts/commit/0c78d0c

The locking mechanism for ifup/ifdown is a nice-to-have feature, but it would still not work correctly if someone called different networking subscripts manually, or from some other non-RHEL scripts. Therefore, I'm closing this BZ as WONTFIX.

In case anyone still faces this issue, please use the workaround:
> HOTPLUG=no
as mentioned in comment #0.

Best regards,
David

*** Bug 1398326 has been marked as a duplicate of this bug. ***
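The flock(1) suggestion raised in the discussion was never implemented in this bug. For illustration only, a per-interface lock around ifup/ifdown could be sketched as below. The function name `with_iface_lock`, the variable `IFLOCK_DIR` and the lock file naming are assumptions made for this example; as the closing comment notes, a lock like this would not cover subscripts that are called directly.

```shell
# Sketch only: hypothetical wrapper, not part of initscripts.
# Directory holding the lock files; /var/lock is an assumed default.
IFLOCK_DIR="${IFLOCK_DIR:-/var/lock}"

# Usage: with_iface_lock <interface> <command> [args...]
# Runs the command while holding an exclusive lock for that interface,
# so two invocations for the same interface cannot overlap.
with_iface_lock() {
    _iface=$1
    shift
    (
        # File descriptor 9 is opened on the lock file by the redirection below;
        # flock blocks here until the exclusive lock is granted.
        flock -x 9 || exit 1
        "$@"
    ) 9>"$IFLOCK_DIR/ifupdown-$_iface.lock"
}
```

A call such as `with_iface_lock eth0.280 ifup eth0.280` returns the exit status of the wrapped command, and the lock is released when the subshell exits.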
Created attachment 610440 [details]
eth0.280 config

Description of problem:
We've discovered a race condition: when ifdown and ifup are called on a VLAN interface without any delay, udevd ends up in an endless loop. Udevd itself does not seem to be the problem. It just does whatever the kernel tells it to. The problem lies in the interaction of initscripts and udev rules. Here's what's going on on the server:

ifdown eth0.280            |
KERNEL remove event        | ifup eth0.280
UDEV remove event          | KERNEL add event
net.hotplug calls ifdown   | UDEV add event
KERNEL remove event        | net.hotplug calls ifup
UDEV remove event          | KERNEL add event
...                        | UDEV add event
                           | ...

I have not seen any race when using a physical interface or a bridge. This seems to be a problem isolated to VLAN interfaces. Disabling hotplug (HOTPLUG=no) for VLAN interfaces eliminates this race condition, as does putting sleep 1 between ifdown and ifup to allow net.hotplug to finish before ifup is called again.

I was trying to find a fix but I'm not really sure what the proper fix is. I was thinking about adding some kind of locking to the if{up,down} scripts to limit execution to one instance at a time. But this may be a bit too complicated, and maybe simply defaulting to HOTPLUG=no for VLANs would be sufficient.

Version-Release number of selected component (if applicable):
kernel-2.6.32-220.el6.x86_64
initscripts-9.03.27-1.el6.x86_64
udev-147-2.40.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. create vlan interface (see attached ifcfg-eth0(.280))
2. service network restart
3. ifdown eth0.280; ifup eth0.280
4. udevadm monitor (to see the actual udev loop)

Actual results: (while running udevadm monitor in background, HOTPLUG=yes (default))
KERNEL[1346945864.946898] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.946925] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1346945864.947003] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.947080] remove /devices/virtual/net/eth0.280 (net)
UDEV [1346945864.947195] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945864.974109] add /devices/virtual/net/eth0.280 (net)
KERNEL[1346945864.974233] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945864.974252] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086871] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945865.086900] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945865.086924] remove /devices/virtual/net/eth0.280 (net)
UDEV [1346945866.102280] remove /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137424] add /devices/virtual/net/eth0.280 (net)
KERNEL[1346945866.137546] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.137564] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1346945866.212688] add /devices/virtual/net/eth0.280 (net)
UDEV [1346945866.213016] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1346945866.213045] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1346945866.213064] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1346945866.213077] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.350869] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945866.350994] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945866.351101] remove /devices/virtual/net/eth0.280 (net)
UDEV [1346945867.115054] remove /devices/virtual/net/eth0.280 (net)
KERNEL[1346945867.150201] add /devices/virtual/net/eth0.280 (net)
<snip>
loop continues until udevd is killed

Expected results: (while running udevadm monitor in background, HOTPLUG=no)
[root@localhost network-scripts]# ifdown eth0.280 && ifup eth0.280
KERNEL[1346945676.096862] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.096921] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
KERNEL[1346945676.097102] remove /devices/virtual/net/eth0.280 (net)
UDEV [1346945676.097348] remove /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1346945676.097383] remove /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1346945676.123261] remove /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136736] add /devices/virtual/net/eth0.280 (net)
KERNEL[1346945676.136764] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
KERNEL[1346945676.136784] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)
UDEV [1346945676.169711] add /devices/virtual/net/eth0.280 (net)
UDEV [1346945676.169972] add /devices/virtual/net/eth0.280/queues/rx-0 (queues)
UDEV [1346945676.170128] add /devices/virtual/net/eth0.280/queues/tx-0 (queues)

Additional info:
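The two captures in this report differ in how often the kernel re-adds the VLAN device after a single ifdown/ifup. A small sketch for checking a saved `udevadm monitor` capture is given here; the function name `count_kernel_adds` is hypothetical, and the field positions assume the line layout shown in the logs of this report.

```shell
# Sketch only: hypothetical helper for inspecting saved udevadm monitor output.
# Usage: count_kernel_adds <devpath> < capture.log
# Prints how many KERNEL "add" events were logged for exactly that device path.
count_kernel_adds() {
    awk -v dev="$1" '
        # KERNEL lines look like: KERNEL[timestamp] action devpath (subsystem)
        $1 ~ /^KERNEL\[/ && $2 == "add" && $3 == dev { n++ }
        END { print n + 0 }
    '
}
```

For a single `ifdown eth0.280 && ifup eth0.280`, the HOTPLUG=no capture contains one KERNEL add event for /devices/virtual/net/eth0.280, while the looping capture contains several before the snip.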