Bug 1264024 - no network on xen guests: Error: Connection activation failed: No suitable device found for this connection.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: Unspecified Unspecified
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Thomas Haller
QA Contact: Desktop QE
Keywords: Regression
Depends On:
Blocks:
Reported: 2015-09-17 06:41 EDT by Jan Stancek
Modified: 2015-11-19 06:04 EST (History)

See Also:
Fixed In Version: NetworkManager-1.0.6-11.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 06:04:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:
/var/log/messages from xen pv guest (RHEL-7.2-20150904.0) (128.73 KB, text/plain)
2015-09-17 06:43 EDT, Jan Stancek

Description Jan Stancek 2015-09-17 06:41:33 EDT
Description of problem:
RHEL-7.2-20150904.0 PV and HVM guests on Xen (RHEL5-Server-U11 x86_64) have no network after boot.

# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network)
   Active: failed (Result: exit-code) since Thu 2015-09-17 09:14:06 EDT; 2h 44min left
     Docs: man:systemd-sysv-generator(8)
  Process: 683 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)

Sep 17 09:14:06 dhcp47-156.lab.bos.redhat.com network[683]: [FAILED]
Sep 17 09:14:06 dhcp47-156.lab.bos.redhat.com systemd[1]: network.service: control process exited, code=exited status=1
Sep 17 09:14:06 dhcp47-156.lab.bos.redhat.com systemd[1]: Failed to start LSB: Bring up/down networking.
Sep 17 09:14:06 dhcp47-156.lab.bos.redhat.com systemd[1]: Unit network.service entered failed state.
Sep 17 09:14:06 dhcp47-156.lab.bos.redhat.com systemd[1]: network.service failed.

After tracing "ifup" I get:
+ is_nm_handling eth0
+ grep -q '^\(eth0:connected\)\|\(eth0:connecting.*\)$'
+ LANG=C
+ nmcli -t --fields device,state dev status
+ nmcli con up uuid afee9fbc-adb0-4c30-ba68-3a2357e7e9ae
Error: Connection activation failed: No suitable device found for this connection.
+ exit 4
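
For readers following the trace: the is_nm_handling step pipes the terse nmcli device status into grep and only counts the interface as handled if it is connected or connecting. Below is a rough Python sketch of that check, not the actual ifup shell code. The function name mirrors the trace, but the sample status lines are assumptions, since the report does not show what nmcli printed at this point.

```python
import re


def is_nm_handling(device: str, status_output: str) -> bool:
    """Mimic the grep from the ifup trace on `nmcli -t --fields device,state dev status` output.

    The pattern keeps the same alternation as the traced grep:
    '^(dev:connected)' OR '(dev:connecting.*)$'.
    """
    dev = re.escape(device)
    pattern = re.compile(rf"^({dev}:connected)|({dev}:connecting.*)$")
    return any(pattern.search(line) for line in status_output.splitlines())


if __name__ == "__main__":
    # Hypothetical terse nmcli output, one "device:state" pair per line.
    print(is_nm_handling("eth0", "eth0:disconnected\nlo:unmanaged"))
    print(is_nm_handling("eth0", "eth0:connected\nlo:unmanaged"))
```

When the check fails, as in the trace, ifup goes on to call "nmcli con up", which is where the error surfaces.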

And this is repeatable with nmcli:
# nmcli con up uuid afee9fbc-adb0-4c30-ba68-3a2357e7e9ae
Error: Connection activation failed: No suitable device found for this connection.

The problem seems to be related to the fact that NM can't tell which device this connection belongs to:
# nmcli c
NAME  UUID                                  TYPE            DEVICE 
eth0  afee9fbc-adb0-4c30-ba68-3a2357e7e9ae  802-3-ethernet  --     

# ethtool  -i eth0
driver: vif
version: 
firmware-version: 
bus-info: vif-0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

I enabled DEBUG logging for NM (full log will be attached), and the problem appears to originate in ethtool_get():
Sep 17 09:01:52 dhcp47-156 NetworkManager[614]: <debug> [1442494912.079108] [platform/nm-platform.c:2896] log_link(): platform: signal: link changed: 2: eth0 <DOWN;broadcast,multicast> mtu 1500 arp 1 ethernet init addrgenmode eui64 addr 52:56:00:00:00:31 driver vif
Sep 17 09:01:52 dhcp47-156 NetworkManager[614]: <debug> [1442494912.079274] [platform/nm-platform-utils.c:69] ethtool_get(): ethtool: Request failed: Operation not supported
Sep 17 09:01:52 dhcp47-156 NetworkManager[614]: <debug> [1442494912.086196] [platform/nm-platform.c:2896] log_link(): platform: signal: link changed: 1: lo <UP,LOWER_UP;loopback,up,running,lowerup> mtu 65536 arp 772 loopback init addrgenmode eui64 addr 00:00:00:00:00:00 driver unknown

Glancing over strace logs, my guess would be it's some call related to "read permanent MAC address":
[pid  2315] 05:15:32.797626 sendto(5, "<30>Sep 17 05:15:32 NetworkManager[2315]: <debug> [1442481332.797601] [devices/nm-device.c:9107] constructed(): [0x7ff5b2acb6a0] (eth0): read permanent MAC address 00:00:00:00:00:00", 181, MSG_NOSIGNAL, NULL, 0) = 181
[pid  2315] 05:15:32.798173 sendto(5, "<30>Sep 17 05:15:32 NetworkManager[2315]: <info>  (eth0): link connected", 72, MSG_NOSIGNAL, NULL, 0) = 72
[pid  2315] 05:15:32.798736 access("/sys/class/net/eth0", F_OK) = 0
[pid  2315] 05:15:32.798828 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 16
[pid  2315] 05:15:32.798908 ioctl(16, SIOCETHTOOL, 0x7ffd4f395510) = -1 EOPNOTSUPP (Operation not supported)

Version-Release number of selected component (if applicable):
NetworkManager-1.0.6-6.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. install RHEL-7.2-20150904.0 PV and HVM guest on RHEL5-Server-U11
2. after installation guests have no network

Actual results:
no network on xen guests: Error: Connection activation failed: No suitable device found for this connection.

Expected results:
network on guests is up and running

Additional info:
"dhclient eth0" will fix the network
RHEL7.1GA compose does not have this problem
Comment 1 Jan Stancek 2015-09-17 06:43:36 EDT
Created attachment 1074394 [details]
/var/log/messages from xen pv guest (RHEL-7.2-20150904.0)
Comment 4 Jirka Klimes 2015-09-17 10:54:08 EDT
The eth0 connection has 802-3-ethernet.mac-address set to 52:56:00:00:00:01. This says that the connection only applies to a device with such MAC.

The problem is that the eth0 interface does not report a permanent MAC; 00:00:00:00:00:00 is returned instead (even though the MAC is visible with ip link).

# ethtool -P eth0
Permanent address: 00:00:00:00:00:00

In the NM code, the call chain is:
validate_activation_request()
  -> nm_manager_get_best_device_for_connection()
     -> nm_device_check_connection_available()
        -> check_connection_available() in nm-device-ethernet.c
            perm_hw_addr = nm_device_get_permanent_hw_address (device);
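
To illustrate why the activation fails, here is a simplified Python model of the comparison at the end of that call chain. It is not the NM C code: the function name and the exact matching rules are assumptions for illustration only. The MAC values are the ones quoted in this comment.

```python
from typing import Optional


def connection_available(connection_mac: Optional[str], device_perm_hw_addr: str) -> bool:
    """Simplified model: a connection pinned to a MAC only fits a device reporting that MAC."""
    if connection_mac is None:
        # No 802-3-ethernet.mac-address restriction: any ethernet device is acceptable.
        return True
    return connection_mac.strip().lower() == device_perm_hw_addr.strip().lower()


if __name__ == "__main__":
    # The connection is pinned to 52:56:00:00:00:01, but the device reports an all-zero
    # permanent address, so the two never match and no suitable device is found.
    print(connection_available("52:56:00:00:00:01", "00:00:00:00:00:00"))
```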
Comment 5 Jirka Klimes 2015-09-17 11:04:22 EDT
(In reply to Jan Stancek from comment #0)
> Description of problem:
> Glancing over strace logs, my guess would be it's some call related to "read
> permanent MAC address":
> [pid  2315] 05:15:32.797626 sendto(5, "<30>Sep 17 05:15:32
> NetworkManager[2315]: <debug> [1442481332.797601] [devices/nm-device.c:9107]
> constructed(): [0x7ff5b2acb6a0] (eth0): read permanent MAC address
> 00:00:00:00:00:00", 181, MSG_NOSIGNAL, NULL, 0) = 181
> [pid  2315] 05:15:32.798173 sendto(5, "<30>Sep 17 05:15:32
> NetworkManager[2315]: <info>  (eth0): link connected", 72, MSG_NOSIGNAL,
> NULL, 0) = 72
> [pid  2315] 05:15:32.798736 access("/sys/class/net/eth0", F_OK) = 0
> [pid  2315] 05:15:32.798828 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 16
> [pid  2315] 05:15:32.798908 ioctl(16, SIOCETHTOOL, 0x7ffd4f395510) = -1
> EOPNOTSUPP (Operation not supported)
> 

This shows that the virtual 'vif' driver does not support this ethtool request. Maybe we should fall back to reading '/sys/class/net/<interface>/address' directly in such cases.
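
A minimal sketch of such a sysfs read, in Python rather than NM's C. The helper name and the sysfs_root parameter are hypothetical (the parameter only exists so the function can be pointed at a test directory). Note that this file holds the interface's current address.

```python
from pathlib import Path


def read_current_mac(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the contents of <sysfs_root>/<iface>/address with whitespace stripped."""
    return (Path(sysfs_root) / iface / "address").read_text().strip()


if __name__ == "__main__":
    # "lo" exists on any Linux system; on other platforms this path will be missing.
    try:
        print(read_current_mac("lo"))
    except OSError as exc:
        print(f"could not read address: {exc}")
```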
Comment 6 Dan Williams 2015-09-17 12:34:54 EDT
(In reply to Jirka Klimes from comment #5)
> (In reply to Jan Stancek from comment #0)
> > Description of problem:
> > Glancing over strace logs, my guess would be it's some call related to "read
> > permanent MAC address":
> > [pid  2315] 05:15:32.797626 sendto(5, "<30>Sep 17 05:15:32
> > NetworkManager[2315]: <debug> [1442481332.797601] [devices/nm-device.c:9107]
> > constructed(): [0x7ff5b2acb6a0] (eth0): read permanent MAC address
> > 00:00:00:00:00:00", 181, MSG_NOSIGNAL, NULL, 0) = 181
> > [pid  2315] 05:15:32.798173 sendto(5, "<30>Sep 17 05:15:32
> > NetworkManager[2315]: <info>  (eth0): link connected", 72, MSG_NOSIGNAL,
> > NULL, 0) = 72
> > [pid  2315] 05:15:32.798736 access("/sys/class/net/eth0", F_OK) = 0
> > [pid  2315] 05:15:32.798828 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 16
> > [pid  2315] 05:15:32.798908 ioctl(16, SIOCETHTOOL, 0x7ffd4f395510) = -1
> > EOPNOTSUPP (Operation not supported)
> > 
> 
> This shows that the virtual 'vif' driver does not support ethtool. Maybe, we
> should fallback to reading '/sys/class/net/<interface>/address' directly in
> such cases.

If the device doesn't have a permanent address, then we can just use the device's current hardware address though?  /sys/class/net/xxx/address is just whatever the current address is, and I thought that was already handled internally by copying it over in nm-device.c::setup().
Comment 7 Thomas Haller 2015-09-17 18:28:43 EDT
(In reply to Jirka Klimes from comment #5)
> (In reply to Jan Stancek from comment #0)
> > Description of problem:
> > Glancing over strace logs, my guess would be it's some call related to "read
> > permanent MAC address":
> > [pid  2315] 05:15:32.797626 sendto(5, "<30>Sep 17 05:15:32
> > NetworkManager[2315]: <debug> [1442481332.797601] [devices/nm-device.c:9107]
> > constructed(): [0x7ff5b2acb6a0] (eth0): read permanent MAC address
> > 00:00:00:00:00:00", 181, MSG_NOSIGNAL, NULL, 0) = 181
> > [pid  2315] 05:15:32.798173 sendto(5, "<30>Sep 17 05:15:32
> > NetworkManager[2315]: <info>  (eth0): link connected", 72, MSG_NOSIGNAL,
> > NULL, 0) = 72
> > [pid  2315] 05:15:32.798736 access("/sys/class/net/eth0", F_OK) = 0
> > [pid  2315] 05:15:32.798828 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 16
> > [pid  2315] 05:15:32.798908 ioctl(16, SIOCETHTOOL, 0x7ffd4f395510) = -1
> > EOPNOTSUPP (Operation not supported)
> > 
> 
> This shows that the virtual 'vif' driver does not support ethtool. Maybe, we
> should fallback to reading '/sys/class/net/<interface>/address' directly in
> such cases.

/sys/class/net/<interface>/address does not return the permanent hardware address but the current address.


It seems that the bug is that we accept a permanent address of 00:00:00:00:00:00:

(eth0): hardware address now 52:56:00:00:00:31
(eth0): read initial MAC address 52:56:00:00:00:31
(eth0): read permanent MAC address 00:00:00:00:00:00

We should reject 00:00:00:00:00:00, and then we would properly fall back to the current hwaddr: http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/devices/nm-device.c?id=f05b42e6df432838f571a73fd917e1117c4c59a7#n9095
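
The idea can be sketched as follows. This is an illustrative Python model, not the patch on the branch; both function names are made up for this example, and the addresses are the ones from the log lines above.

```python
from typing import Optional


def is_zero_mac(mac: str) -> bool:
    """True if every octet of a colon-separated MAC is zero."""
    return all(int(octet, 16) == 0 for octet in mac.split(":"))


def effective_hw_address(permanent: Optional[str], current: str) -> str:
    """Use the permanent address unless it is missing or all zeros; otherwise use the current one."""
    if permanent is None or is_zero_mac(permanent):
        return current
    return permanent


if __name__ == "__main__":
    # Values from the debug log: the bogus permanent address is ignored in favour of
    # the current hardware address.
    print(effective_hw_address("00:00:00:00:00:00", "52:56:00:00:00:31"))
```

With the all-zero address rejected, a connection pinned to the guest's real MAC can match the device again.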


Please review patch on branch th/platform-permanent-hwaddr-rh1264024
Comment 8 Jan Stancek 2015-09-18 05:08:24 EDT
(In reply to Thomas Haller from comment #7)
> 
> Please review patch on branch th/platform-permanent-hwaddr-rh1264024

Thomas, I applied the two patches from this branch to NetworkManager-1.0.6-6.el7 and I can confirm that guest now has network when it boots.
Comment 10 Beniamino Galvani 2015-09-18 07:29:26 EDT
LGTM
Comment 11 Thomas Haller 2015-09-18 08:02:45 EDT
(In reply to Thomas Haller from comment #7)

> Please review patch on branch th/platform-permanent-hwaddr-rh1264024

Patch merged upstream.


master: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=b6459ace2f30cf935d9ea4b8da0e31e8fff4b9a8

nm-1-0: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=52923b6c73b0888178a1142f79083c66b17aecdc
Comment 19 G Crowe 2015-11-08 04:23:18 EST
Note that this affects Fedora 23 also.

Also, the workaround given in the initial post does not work when DHCP is not used. I have managed to get mine to work by simply commenting out the HWADDR line in ifcfg-eth0.
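
For anyone wanting to script that edit, a rough Python sketch follows. The helper name and the sample ifcfg contents are assumptions for illustration; the function only transforms text and leaves reading and writing the actual file to the caller.

```python
import re


def comment_out_hwaddr(ifcfg_text: str) -> str:
    """Prefix every active HWADDR= line with '#', leaving all other lines untouched."""
    # [ \t]* rather than \s* so the match cannot swallow a preceding blank line.
    return re.sub(r"^([ \t]*HWADDR=)", r"#\1", ifcfg_text, flags=re.MULTILINE)


if __name__ == "__main__":
    # Hypothetical ifcfg-eth0 contents.
    sample = "DEVICE=eth0\nHWADDR=52:56:00:00:00:01\nONBOOT=yes\n"
    print(comment_out_hwaddr(sample), end="")
```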
Comment 20 errata-xmlrpc 2015-11-19 06:04:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2315.html
