Bug 498089 - Strange race involving udevd, udevtrigger, and kernel causing misconfigured eth devices
Status: CLOSED DUPLICATE of bug 471657
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Bill Nottingham
QA Contact: Red Hat Kernel QE team
Reported: 2009-04-28 16:06 EDT by Casey Dahlin
Modified: 2014-06-18 04:46 EDT
Doc Type: Bug Fix
Last Closed: 2009-05-13 18:30:13 EDT

Description Casey Dahlin 2009-04-28 16:06:00 EDT
When adding a single Ethernet card with 4 ports (reproducible with 2 or more) to an existing system and rebooting, the devices show up as eth1, eth2, eth3, and __tmp1234567 (where 1234567 is a random number; eth0 is the pre-existing built-in NIC).

I can get the devices to show up correctly by:
1) commenting out /sbin/start_udev in /etc/rc.sysinit
2) booting into single-user mode
3) running udevd &
4) running disown
5) waiting a bit (not necessary?)
6) running udevtrigger

This presents the expected interface names eth1-eth4. ANY attempt I've made to script steps 3-6 has resulted in the error reappearing. ONLY the manual process yields the desired results.

I've looked at the struct net_device for the poorly-named port with crash while the system was exhibiting the issue. All instances of the device name were __tmp1234567 (the name field of the struct itself, and the k_name and name fields of the child kobject).
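
A quick way to check for this symptom without crash is to look for leftover temporary names among the interfaces the kernel exposes. The following is a minimal sketch, assuming Python 3 is available and that interface names are visible under /sys/class/net; the function names are illustrative and not part of udev or any Red Hat tool.

```python
import os


def find_temp_names(names, prefix="__tmp"):
    """Return the interface names that still carry a temporary rename prefix."""
    return sorted(n for n in names if n.startswith(prefix))


def list_interfaces(sysfs_dir="/sys/class/net"):
    """List interface names from sysfs; returns [] if the directory is missing."""
    try:
        return os.listdir(sysfs_dir)
    except OSError:
        return []


if __name__ == "__main__":
    # Report every interface that was left with a temporary name.
    for name in find_temp_names(list_interfaces()):
        print("leftover temporary name:", name)
```

An empty result only means no interface currently has a `__tmp` name; it does not show that the race did not occur.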

Version-Release number of selected component (if applicable):
kernel 2.6.18-128.el5
udev 095-14.19.el5

How reproducible:
Always
Comment 1 Bill Nottingham 2009-04-30 12:09:01 EDT
What's the network configuration beforehand? (ifcfg files,etc.)
Comment 2 Casey Dahlin 2009-04-30 12:45:17 EDT
It was first observed on a fresh install immediately after firstboot.

For my reproduction, I actually disabled kudzu prior to installing the card, so it has no ifcfg scripts for the new ports (I do have scripts for lo and eth0).
Comment 3 Bill Nottingham 2009-04-30 13:04:45 EDT
If kudzu was disabled before adding the card, weird things *can* happen with respect to device names.

What likely happened is that the new 4 ports were originally enumerated as eth0 -> eth3. Then the prior eth0 came up as eth4; it was renamed (per the configuration) by the udev helper as eth0, with the 'new' eth0 getting a temporary name, as there was no configuration for it.

In this case, it's kudzu that notices the temporary name (once all adapters have been added), and renames it back to something in the ethX namespace, and writes a config file for it.
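
The sequence described in this comment can be illustrated with a toy model. This is only a sketch of the renaming logic as explained above, not the actual udev helper or kudzu code; the function names, the MAC placeholders, and the temporary-name counter are all assumptions made for illustration.

```python
import itertools


def rename_with_displacement(devices, mac, wanted, tmp_ids):
    """devices maps interface name -> MAC. Give the device with `mac` the
    name `wanted`; any other device holding `wanted` gets a temporary name."""
    current = next(name for name, m in devices.items() if m == mac)
    if current == wanted:
        return
    if wanted in devices:
        # The configured name is taken: push its holder to __tmpN.
        devices["__tmp%d" % next(tmp_ids)] = devices.pop(wanted)
    devices[wanted] = devices.pop(current)


def settle_temp_names(devices):
    """Model of the later cleanup pass: move each temporary name to the
    lowest free ethN."""
    for name in sorted(n for n in devices if n.startswith("__tmp")):
        n = 0
        while "eth%d" % n in devices:
            n += 1
        devices["eth%d" % n] = devices.pop(name)


if __name__ == "__main__":
    # New card enumerated first as eth0..eth3; the onboard NIC lands on eth4.
    devices = {"eth0": "new-a", "eth1": "new-b", "eth2": "new-c",
               "eth3": "new-d", "eth4": "onboard"}
    rename_with_displacement(devices, "onboard", "eth0", itertools.count(1000))
    print(sorted(devices))   # one new port now has a __tmp name
    settle_temp_names(devices)
    print(sorted(devices))   # the cleanup pass restores an ethN name
```

If the cleanup pass never runs, the displaced port keeps its temporary name, which matches the symptom reported here.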
Comment 4 Casey Dahlin 2009-04-30 14:20:17 EDT
In the customer's case kudzu did no such thing. The device kept the temp name and was not configured.
Comment 5 Bill Nottingham 2009-04-30 14:25:26 EDT
Ah, ok; I was reading from comment #2 that kudzu was disabled in the customer's case as well; if it wasn't, that changes things.
Comment 6 Casey Dahlin 2009-04-30 14:30:23 EDT
I should add that by copying the config script from eth3 to eth4 and updating it to match the MAC address of the __tmp interface, the customer was able to get the system working normally.
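
The workaround amounts to cloning an ifcfg file and changing two keys. A minimal sketch of that edit, assuming the file uses the DEVICE and HWADDR keys seen in the customer's ifcfg files; the function name and the sample values are hypothetical, and writing the result to /etc/sysconfig/network-scripts is left to the administrator.

```python
def clone_ifcfg(text, device, hwaddr):
    """Return a copy of an ifcfg file's text with DEVICE and HWADDR replaced.
    Comment lines and all other keys are passed through unchanged."""
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "DEVICE":
            line = "DEVICE=" + device
        elif key == "HWADDR":
            line = "HWADDR=" + hwaddr
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Placeholder contents standing in for the existing eth3 config.
    source = "DEVICE=eth3\nONBOOT=yes\nHWADDR=00:00:00:00:00:01\n"
    print(clone_ifcfg(source, "eth4", "00:07:43:05:97:3D"), end="")
```

This only works around the symptom; it does not address why the interface kept its temporary name.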
Comment 7 Bill Nottingham 2009-04-30 14:37:51 EDT
Can you attach modprobe.conf, /etc/sysconfig/hwconf & ifcfg-* from the customer's system after they initially experienced the error?
Comment 8 Casey Dahlin 2009-05-04 16:48:34 EDT
From the customer:

The following was taken from a fresh install of RHEL5.2 w/ the Chelsio card.

__tmp7656812 Link encap:Ethernet  HWaddr 00:07:43:05:97:3D

inet addr:9.37.176.65  Bcast:9.37.191.255  Mask:255.255.240.0
RX packets:707 errors:0 dropped:0 overruns:0 frame:0
RX bytes:173625 (169.5 KiB)  TX bytes:9084 (8.8 KiB)

RX bytes:171997 (167.9 KiB)  TX bytes:0 (0.0 b)

# cat /etc/modprobe.conf
alias eth0 ehea
alias eth1 ehea
alias scsi_hostadapter ipr
alias eth2 cxgb3

# cat /etc/sysconfig/hwconf
-
class: OTHER
bus: USB
detached: 0
desc: "Linux 2.6.18-92.el5 ehci_hcd EHCI Host Controller"
usbclass: 9
usbsubclass: 0
usbprotocol: 0
usbbus: 1
usblevel: 0
usbport: 0
usbdev: 1
vendorId: 0000
deviceId: 0000
usbmfr: Linux 2.6.18-92.el5 ehci_hcd
usbprod: EHCI Host Controller
-
class: OTHER
bus: USB
detached: 0
desc: "Linux 2.6.18-92.el5 ohci_hcd OHCI Host Controller"
usbclass: 9
usbsubclass: 0
usbprotocol: 0
usbbus: 2
usblevel: 0
usbport: 0
usbdev: 1
vendorId: 0000
deviceId: 0000
usbmfr: Linux 2.6.18-92.el5 ohci_hcd
usbprod: OHCI Host Controller
-
class: OTHER
bus: USB
detached: 0
desc: "Linux 2.6.18-92.el5 ohci_hcd OHCI Host Controller"
usbclass: 9
usbsubclass: 0
usbprotocol: 0
usbbus: 3
usblevel: 0
usbport: 0
usbdev: 1
vendorId: 0000
deviceId: 0000
usbmfr: Linux 2.6.18-92.el5 ohci_hcd
usbprod: OHCI Host Controller
-
class: OTHER
bus: SCSI
detached: 0
device: sg1
desc: "IBM VSBPD1BB   SAS"
host: 0
id: 0
channel: 8
lun: 0
-
class: OTHER
bus: SCSI
detached: 0
device: sg2
desc: "IBM 57D0001SISIOA"
host: 0
id: 255
channel: 255
lun: 255
-
class: NETWORK
bus: PCI
detached: 0
device: eth2
desc: "Chelsio Communications Inc T320 10GbE Dual Port Protocol Engine Ethernet Adapter"
network.hwaddr: 00:07:43:05:97:3c
vendorId: 1425
deviceId: 0031
subVendorId: 1425
subDeviceId: 0001
pciType: 1
pcidom:    3
pcibus:  1
pcidev:  0
pcifn:  0
-
class: VIDEO
bus: PCI
detached: 0
device: fb0
desc: "ATI Technologies Inc ES1000"
video.xdriver: radeon
vendorId: 1002
deviceId: 515e
subVendorId: 1002
subDeviceId: 515e
pciType: 1
pcidom:    2
pcibus:  0
pcidev:  1
pcifn:  0
-
class: HD
bus: SCSI
detached: 0
device: sda
desc: "IBM-ESXS ST973402SS"
host: 0
id: 0
channel: 2
lun: 0
-
class: RAID
bus: PCI
detached: 0
driver: ipr
desc: "IBM Obsidian chipset SCSI controller"
vendorId: 1014
deviceId: 02bd
subVendorId: 1014
subDeviceId: 02c1
pciType: 1
pcidom:    0
pcibus:  0
pcidev:  1
pcifn:  0
-
class: KEYBOARD
bus: KEYBOARD
detached: 0
device: hvc0
desc: "pSeries LPAR console"
-
class: USB
bus: PCI
detached: 0
driver: ehci-hcd
desc: "NEC Corporation USB 2.0"
vendorId: 1033
deviceId: 00e0
subVendorId: 1033
subDeviceId: 00e0
pciType: 1
pcidom:    1
pcibus:  0
pcidev:  1
pcifn:  2
-
class: USB
bus: PCI
detached: 0
driver: ohci-hcd
desc: "NEC Corporation USB"
vendorId: 1033
deviceId: 0035
subVendorId: 1033
subDeviceId: 0035
pciType: 1
pcidom:    1
pcibus:  0
pcidev:  1
pcifn:  1
-
class: USB
bus: PCI
detached: 0
driver: ohci-hcd
desc: "NEC Corporation USB"
vendorId: 1033
deviceId: 0035
subVendorId: 1033
subDeviceId: 0035
pciType: 1
pcidom:    1
pcibus:  0
pcidev:  1
pcifn:  0

# cat /etc/sysconfig/network-scripts/ifcfg-*
BOOTPROTO=static
HWADDR=00:1A:64:D8:19:0A
IPADDR=9.37.176.65
HWADDR=00:1A:64:D8:19:0B
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
NAME=loopback
[root@rhel5 ~]#

[root@rhel5 ~]# ethtool -i __tmp7656812
firmware-version: T 5.0.0 TP 1.1.0
[root@rhel5 ~]# ethtool -i eth0
version: EHEA_0076-05
[root@rhel5 ~]# ethtool -i eth1
version: EHEA_0076-05
[root@rhel5 ~]# ethtool -i eth2
driver: cxgb3
firmware-version: T 5.0.0 TP 1.1.0
[root@rhel5 ~]#

driver: ehea
firmware-version:
bus-info:

driver: cxgb3
version: 1.0-ko
bus-info: 0003:01:00.0

driver: cxgb3
version: 1.0-ko
bus-info: 0003:01:00.0

driver: ehea
firmware-version:
bus-info:

# Chelsio Communications Inc T320 10GbE Dual Port Protocol Engine Ethernet Adapter
ONBOOT=yes
BOOTPROTO=dhcp

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

BROADCAST MULTICAST  MTU:1500  Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

# PORT - 1 IBM Host Ethernet Adapter
DEVICE=eth0
NETMASK=255.255.240.0
ONBOOT=yes

DEVICE=eth1

DEVICE=eth2
ONBOOT=yes
HWADDR=00:07:43:05:97:3c

# PORT - 2 IBM Host Ethernet Adapter
ONBOOT=no

# ifconfig -a
RX packets:0 errors:0 dropped:0 overruns:0 frame:0

BROADCAST MULTICAST  MTU:1500  Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

BROADCAST MULTICAST  MTU:1500  Metric:1
TX packets:49 errors:0 dropped:0 overruns:0 carrier:0

RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)

sit0      Link encap:IPv6-in-IPv4
NOARP  MTU:1480  Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Benny Rayner - brayner@us.ibm.com

collisions:0 txqueuelen:1000
Interrupt:33 Memory:7fe7e000-7fe7efff

eth0      Link encap:Ethernet  HWaddr 00:1A:64:D8:19:0A
inet6 addr: fe80::21a:64ff:fed8:190a/64 Scope:Link
collisions:0 txqueuelen:1000

eth1      Link encap:Ethernet  HWaddr 00:1A:64:D8:19:0B
collisions:0 txqueuelen:1000

eth2      Link encap:Ethernet  HWaddr 00:07:43:05:97:3C
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
collisions:0 txqueuelen:1000
Interrupt:33 Memory:7fe7e000-7fe7efff

lo        Link encap:Local Loopback
inet addr:127.0.0.1  Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING  MTU:16436  Metric:1
collisions:0 txqueuelen:0

#
Comment 9 Bill Nottingham 2009-05-13 18:30:13 EDT
Aha. "Chelsio" is the magic word here. I'm marking this as a duplicate of a prior bug.

*** This bug has been marked as a duplicate of bug 471657 ***
Comment 10 Casey Dahlin 2009-05-14 10:07:11 EDT
I reproduced the exact same symptoms with a non-Chelsio card. lspci:

04:04.0 Ethernet Controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
04:05.0 Ethernet Controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
04:06.0 Ethernet Controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
04:07.0 Ethernet Controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)