Bug 838907 - NetworkManager - setting up of bonding device fails
NetworkManager - setting up of bonding device fails
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: NetworkManager (Show other bugs)
17
Unspecified Unspecified
unspecified Severity high
: ---
: ---
Assigned To: Dan Williams
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-10 08:17 EDT by Narendra K
Modified: 2013-07-31 19:36 EDT (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-31 19:36:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
/var/log/messages with debugging enabled (255.97 KB, text/plain)
2012-07-10 09:06 EDT, Narendra K
no flags Details

  None (edit)
Description Narendra K 2012-07-10 08:17:42 EDT
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 Narendra K 2012-07-10 08:33:19 EDT
Hi, sorry. It seems like i hit enter before filling in all the details. I am updating all the details here.

Description of problem:

It looks like there is an issue with setting up of bonding devices in NetworkManager from Fedora 17. 


Version-Release number of selected component (if applicable):

# rpm -qi NetworkManager
Name        : NetworkManager
Epoch       : 1
Version     : 0.9.4.0
Release     : 7.git20120403.fc17
Architecture: x86_64

Kernel Version:

# uname -r
3.3.4-5.fc17.x86_64

How reproducible: Always.


Steps to Reproduce:

1. Setup configuration files create bonding device as follows -

# cat ifcfg-bond0
DEVICE="bond0"
ONBOOT="yes"
BOOTPROTO="dhcp"
BONDING_OPTS="miimon=100 mode=balance-alb"
BONDING_MASTER="yes"
NM_CONTROLLED="yes"

# cat ifcfg-em3
#NETBOOT="yes"
#HWADDR="d4:ae:52:81:cc:6c"
BOOTPROTO="none"
DEVICE="em3"
ONBOOT="yes"
MASTER="bond0"
SLAVE="yes"
TYPE=Ethernet
NM_CONTROLLED="yes"

2. Restart NetworkManager:

systemctl restart NetworkManager.service


Actual results:

NetworkManager does not set up bond0 correctly.

# ip addr show dev bond0
21: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Observe that 
- the MAC id of bond0 is not set correctly. 
- The slave as configured in ifcfg-em3 is not enslaved
# cat /sys/class/net/bond0/bonding/slaves
<nothing>

Expected results:

NM sets up bond0 correctly as configured in ifcfg-bond0 and ifcfg-em3.

Additional info:

It seems like the NM in Fedora 17 has support to create bonding, but the GUI (connection editor ?) does not yet support it. But as i understand, NM can understand ifcfg-bond0 and set up bonding. But it seems like there is an issue.
Comment 2 Narendra K 2012-07-10 09:04:16 EDT
To add to observations, the em3 which is designated as a slave device in (ifcfg-em3) is brought up independently and has an IP address.

# ip addr show dev em3
12: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d4:ae:52:81:cc:6c brd ff:ff:ff:ff:ff:ff
    inet 172.16.66.54/18 brd 172.16.127.255 scope global em3
    inet6 fe80::d6ae:52ff:fe81:cc6c/64 scope link
       valid_lft forever preferred_lft forever


Attaching the /var/log/messages file. Please let me know if any additional information/ log files are required.
Comment 3 Narendra K 2012-07-10 09:06:13 EDT
Created attachment 597330 [details]
/var/log/messages with debugging enabled
Comment 4 Dan Williams 2012-07-20 16:40:45 EDT
It looks like the bond isn't actually enslaving anything, and thus DHCP fails because of course there's no slave transport.  What is the expected setup here?  What devices are supposed to be slaves of bond0?

NM is also creating "default wired" connections for all the interfaces that don't have configuration, which is causing em3 to be started on its own, not as a slave of bond0.  That shouldn't be happening if ifcfg-em3 is set up correctly (barring bugs in NM of course).

Actually, looking back through the logs we have:

Jul 10 18:25:39 r720-fedora17 NetworkManager[2971]:    ifcfg-rh:     error: No IP configuration allowed for bonding slave

so ifcfg-em3 is ignored because the config is invalid.  If you remove the  IP configuration details from ifcfg-em3, do things work as expected?
Comment 5 Dan Williams 2012-07-20 16:41:47 EDT
(follow-up question: are there actually cases where an IP configuration on a slave is actually valid when it's enslaved to a bond?  Upstream kernel network developers said "no", so we went with that...)
Comment 6 Narendra K 2012-07-23 05:40:39 EDT
(In reply to comment #5)
> (follow-up question: are there actually cases where an IP configuration on a
> slave is actually valid when it's enslaved to a bond?  Upstream kernel
> network developers said "no", so we went with that...)

Hi Dan,

Thank you for looking into this.

I am not aware of any configuration where IP configuration is valid when it is enslaved to bond0. And that was not the expected config/behavior i was trying to set up.
Comment 7 Narendra K 2012-07-23 07:46:25 EDT
(In reply to comment #4)
> It looks like the bond isn't actually enslaving anything, and thus DHCP
> fails because of course there's no slave transport.  What is the expected
> setup here?  What devices are supposed to be slaves of bond0?
> 

The expected setup is 

master device -> bond0
slave device  -> em3

# cat ifcfg-bond0
DEVICE="bond0"
ONBOOT="yes"
BOOTPROTO="dhcp"
BONDING_OPTS="miimon=100 mode=balance-tlb"
BONDING_MASTER="yes"
NM_CONTROLLED="yes"

# cat ifcfg-em3
#NETBOOT="yes"
#HWADDR="d4:ae:52:81:cc:6c"
BOOTPROTO="none"
DEVICE="em3"
ONBOOT="yes"
MASTER="bond0"
SLAVE="yes"
TYPE=Ethernet
NM_CONTROLLED="yes"

> NM is also creating "default wired" connections for all the interfaces that
> don't have configuration, which is causing em3 to be started on its own, not
> as a slave of bond0.  That shouldn't be happening if ifcfg-em3 is set up
> correctly (barring bugs in NM of course).
> 
> Actually, looking back through the logs we have:
> 
> Jul 10 18:25:39 r720-fedora17 NetworkManager[2971]:    ifcfg-rh:     error:
> No IP configuration allowed for bonding slave
> 
> so ifcfg-em3 is ignored because the config is invalid.  If you remove the 
> IP configuration details from ifcfg-em3, do things work as expected?

Thank you pointing out that. Just curious here though, the BOOTPROTO="none" and ONBOOT="yes" are considered as IP configuration directives though the tradidtional ifup method uses these in the slave configuration.
Comment 8 Narendra K 2012-07-23 08:01:24 EDT
(In reply to comment #4)

[I hit enter before completely typing the details. sorry.

> 
> Actually, looking back through the logs we have:
> 
> Jul 10 18:25:39 r720-fedora17 NetworkManager[2971]:    ifcfg-rh:     error:
> No IP configuration allowed for bonding slave
> 
> so ifcfg-em3 is ignored because the config is invalid.  If you remove the 
> IP configuration details from ifcfg-em3, do things work as expected?

I added another slave 'em4' just so that we have two slaves. I commented 'BOOTPROTO' and 'ONBOOT' fields from ifcfg-em3 and ifcfg-em4.

- above warnings went away. 
- now the slave devices 'em3' and 'em4' do not have IP address.
- but the slaves were not enslaved

observe that the message 'Jul 10 18:25:39 r720-fedora17 NetworkManager[2971]:    ifcfg-rh:     error: No IP configuration allowed for bonding slave' is not seen anymore.


Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-em4 ...
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh:     read connection 'System em4'
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-em3 ...
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh:     read connection 'System em3'
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-bond0 ...
Jul 23 17:24:40 r720-fedora17 NetworkManager[3450]:    ifcfg-rh:     read connection 'Bond bond0'

Here is the configuration files for 'bond0', 'em3' and 'em4'.

# cat ifcfg-bond0
DEVICE="bond0"
BOOTPROTO="dhcp"
ONBOOT="yes"
NM_CONTROLLED="yes"
BONDING_MASTER="yes"
BONDING_OPTS="miimon=50 mode=balance-tlb"

# cat ifcfg-em3
#BOOTPROTO="none" <-------- commented
#BOOTPROTO="dhcp" <-------- commented
DEVICE="em3"
#ONBOOT="yes"     <-------- commented
MASTER="bond0"
SLAVE="yes"
TYPE=Ethernet
NM_CONTROLLED="yes"
HWADDR="d4:ae:52:81:cc:6c"


# cat ifcfg-em4
#BOOTPROTO="none"  <-------- commented
#BOOTPROTO="dhcp"  <-------- commented
DEVICE="em4"
#ONBOOT="yes"      <-------- commented
MASTER="bond0"
SLAVE="yes"
TYPE=Ethernet
NM_CONTROLLED="yes"
HWADDR="d4:ae:52:81:cc:6e"


# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding


Hi Dan,

Please let me if there is anything missing from the configuration files. Or if there is any additional configuration required.
Comment 9 Narendra K 2012-07-23 10:42:15 EDT
Hi Dan,

While exploring the messages from '/var/log/messages' i observed the following messages -

Jul 23 17:25:32 r720-fedora17 NetworkManager[3450]: <info> (bond0): taking down device.
Jul 23 17:25:32 r720-fedora17 NetworkManager[3450]: <info> (bond0): bringing up device.

They seem to be coming from 'src/nm-device-bond.c'

real_act_stage1_prepare
   nm_device_hw_take_down
   nm_system_apply_bonding_config
   nm_device_hw_bring_up

I was expecting the slaves to be enslaved somewhere in 'nm_system_apply_bonding_config', but could not find any code related to enslaving the slaves. But it does seem to be applying 'miimon' and 'mode' here.

I could be missing some thing here as i am still trying to get familiar with the NM code/implementation. It would be great if you could provide few pointers on how the bonding set up function call sequence is expected to be.
Comment 10 Dan Winship 2012-07-23 13:01:29 EDT
fixed upstream to just ignore the IP configuration in bond slave ifcfg files, since that's what the scripts do.

leaving open for other issues
Comment 11 Narendra K 2012-07-24 09:54:56 EDT
(In reply to comment #10)
> fixed upstream to just ignore the IP configuration in bond slave ifcfg
> files, since that's what the scripts do.
> 
> leaving open for other issues

Hi, thank you.

The issue of NM not setting up bonding is still seen with 0.9.4 in Fedora 17.
Comment 12 Narendra K 2012-07-24 10:22:19 EDT
1. While trying to figure out the sequence of function calls to create bonding, i had few prints in the function 

'real_act_stage1_prepare' from 'src/nm-device.c'. Interestingly, i see that the prints only show 'bond0' and i do not see 'em3' and 'em4' hitting this function.

The following code block looked interesting -

 if (master_ac && NM_IS_ACT_REQUEST (master_ac)) {
                /* FIXME: handle VPNs here too */

                master = NM_DEVICE (nm_act_request_get_device (NM_ACT_REQUEST (master_ac)));
                g_assert (master);
                               if (!nm_device_enslave_slave (master, self)) {
                        ret = NM_ACT_STAGE_RETURN_FAILURE;
                        nm_log_info (LOGD_DEVICE, " **naren** in real_act_stage1_prepare, 'nm_device_enslave_slave' failed", nm_device_get_iface (self));
                }
        }

It seems like this is the code block which enslaves the slaves to bonding device. For that to succeed, this function should be called for both 'em3' and 'em4'. Then the above code block retrieves their master device and issues 'nm_device_enslave_slave (master, self)' for each slave. But it seems like due to some issue, this function is not called for 'em3' and 'em4'.

2. Also, the following messages in syslog look interesting -

[ifcfg-rh did not complain about the config files]

Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): carrier is OFF
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): new Ethernet device (driver: 'bnx2x' ifindex: 5)
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): exported as /org/freedesktop/NetworkManager/Devices/3
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): now managed
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em3): bringing up device.

[...]

Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): carrier is OFF
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): new Ethernet device (driver: 'bnx2x' ifindex: 6)
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): exported as /org/freedesktop/NetworkManager/Devices/4
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): now managed
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): bringing up device.
Jul 24 17:50:04 r720-fedora17 kernel: [89951.566452] bnx2x 0000:01:00.3: em4: using MSI-X  IRQs: sp 211  fp[0] 213 ... fp[14] 227
Jul 24 17:50:04 r720-fedora17 kernel: [89951.767905] ADDRCONF(NETDEV_UP): em4: link is not ready
Jul 24 17:50:04 r720-fedora17 NetworkManager[15677]: <info> (em4): preparing device.

[...]

Observe the following messages related to 'em3' and 'em4' showing that the activation is 'waiting on dependency 'Bond Bond0'


Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Auto-activating connection 'Bond bond0'.
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) starting connection 'Bond bond0'
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 1 of 5 (Device Prepare) scheduled...
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Auto-activating connection 'System em3'.
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (em3) starting connection 'System em3'
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (em3): device state change: disconnected -> prepare (reason 'none') [30 40 0]

<-------------
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (em3) connection 'System em3' waiting on dependency 'Bond bond0'
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Auto-activating connection 'System em4'.
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (em4) starting connection 'System em4'
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (em4): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (em4) connection 'System em4' waiting on dependency 'Bond bond0'
<------------------------

Prints i put in 
<info> Activation (bond0) Stage 1 of 5 (Device Prepare) started...

Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0)  in real_act_stage1_prepare...
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0)  in real_act_stage1_prepare, 'req' is true
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0)  in real_act_stage1_prepare,'master_ac' is false
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0) in real_act_stage1_prepare,NM_IS_ACT_REQUEST (master_ac) is false

I expected to see this sequence for 'em3' and 'em4' also. Meanwhile the 'bond0' device is goint through the Activation stage 1 to 5

Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 2 of 5 (Device Configure) scheduled...
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 1 of 5 (Device Prepare) complete.
Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 2 of 5 (Device Configure) starting...

Now observe that the DHCP is launched on bond0 and it times out. And also NM reports that activation of dependency for em3 failed.

Jul 24 17:50:08 r720-fedora17 NetworkManager[15677]: <info> (bond0): DHCPv4 state changed nbi -> preinit
Jul 24 17:50:08 r720-fedora17 dhclient[15689]: Listening on LPF/bond0/00:00:00:00:00:00
Jul 24 17:50:08 r720-fedora17 dhclient[15689]: Sending on   LPF/bond0/00:00:00:00:00:00
Jul 24 17:50:08 r720-fedora17 dhclient[15689]: Sending on   Socket/fallback
Jul 24 17:50:08 r720-fedora17 dhclient[15689]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 6 (xid=0x84b17d2)


2<------------------
Jul 24 17:50:39 r720-fedora17 dhclient[15689]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 13 (xid=0x84b17d2)
Jul 24 17:50:52 r720-fedora17 dhclient[15689]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 17 (xid=0x84b17d2)
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> (bond0): DHCPv4 request timed out.
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (bond0): canceled DHCP transaction, DHCP client pid 15689
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 4 of 5 (IPv4 Configure Timeout) scheduled...
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 4 of 5 (IPv4 Configure Timeout) started...
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (bond0): device state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> Activation (em3) connection 'System em3' dependency failed
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> Activation (em4) connection 'System em4' dependency failed
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> Activation (bond0) failed.
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) Stage 4 of 5 (IPv4 Configure Timeout) complete.
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (em3): device state change: prepare -> failed (reason 'dependency-failed') [40 120 50]
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> Activation (em3) failed.
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (em4): device state change: prepare -> failed (reason 'dependency-failed') [40 120 50]
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <warn> Activation (em4) failed.
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (bond0): device state change: failed -> disconnected (reason 'none') [120 30 0]
Jul 24 17:50:53 r720-fedora17 NetworkManager[15677]: <info> (bond0): deactivating device (reason 'none') [0]
Jul 24 17:50:56 r720-fedora17 NetworkManager[15677]: <info> Auto-activating connection 'Bond bond0'.
Jul 24 17:50:56 r720-fedora17 NetworkManager[15677]: <info> Activation (bond0) starting connection 'Bond bond0'
2<-------------------------

From the messages it seems like the enslaving of slaves is dependent on 'bond0' getting activated and slaves seem to be waiting for some kind of notification about it. But the operation times out because for the bond0 to be activated it requires a slave so that DHCPv4 could succeed. The sequence would not succeed.
Comment 13 Narendra K 2012-07-24 15:17:57 EDT
Continuing further on lines of observations made in comment #12 and to confirm the observations, i made the following change and got NM to enslave both 'em3' and 'em4'.

@@ -3254,7 +3278,7 @@ nm_device_activate (NMDevice *self, NMActRequest *req, GError **error)
                                                                            G_CALLBACK (act_dep_result_cb),
                                                                            self);
                        priv->act_dep_timeout_id = g_timeout_add_seconds (60, act_dep_timeout_cb, self);
-                       break;
+//                     break;
                default:
                        g_warn_if_reached ();
                        /* fall through */

The change in 'nm_device_activate' in 'src/nm-device.c' allows the slave device activation code path to fall through and hit the switch case

case NM_ACT_REQUEST_DEP_RESULT_READY:
                        nm_device_activate_schedule_stage1_device_prepare (self);
                        break;
                }

And eventuall 'real_act_stage1_prepare' is called where the 'em3' and 'em4' are enslaved.

See the messages from syslog -

Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) connection 'System em4' waiting on dependency 'Bond bond0'
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: (nm-device.c:3283):nm_device_activate: code should not be reached
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 1 of 5 (Device Prepare) scheduled...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 1 of 5 (Device Prepare) started...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info>  (em4)  in real_act_stage1_prepare...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info>  (em4)  in real_act_stage1_prepare, 'req' is true
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info>  (em4) in real_act_stage1_prepare,'master_ac' is true
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info>  (em4) in real_act_stage1_prepare,NM_IS_ACT_REQUEST (master_ac) is true
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> in real_act_stage1_prepare, calling nm_device_enslave_slave
Jul 25 00:18:15 r720-fedora17 dbus[1375]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): taking down device.
Jul 25 00:18:15 r720-fedora17 dbus-daemon[1375]: dbus[1375]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Jul 25 00:18:15 r720-fedora17 dbus-daemon[1375]: dbus[1375]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 25 00:18:15 r720-fedora17 dbus[1375]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 25 00:18:15 r720-fedora17 kernel: [113208.408560] bnx2x 0000:01:00.3: em4: using MSI-X  IRQs: sp 211  fp[0] 213 ... fp[14] 227


<------------
Jul 25 00:18:15 r720-fedora17 kernel: [113208.611319] bonding: bond0: enslaving em4 as an active interface with a down link.
<-----------


Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 2 of 5 (Device Configure) scheduled...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 1 of 5 (Device Prepare) complete.
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): carrier now OFF (device state 40, deferring action for 4 seconds)
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 2 of 5 (Device Configure) starting...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): device state change: prepare -> config (reason 'none') [40 50 0]
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 2 of 5 (Device Configure) successful.
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 3 of 5 (IP Configure Start) scheduled.
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 2 of 5 (Device Configure) complete.
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 3 of 5 (IP Configure Start) started...
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): device state change: config -> ip-config (reason 'none') [50 70 0]
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) Stage 3 of 5 (IP Configure Start) complete.
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> (em4): device state change: ip-config -> activated (reason 'none') [70 100 0]
Jul 25 00:18:15 r720-fedora17 NetworkManager[30727]: <info> Activation (em4) successful, device activated.
                                                                                                                                          
It seems most likely that there is some kind of state is not being set for 'master' device on which slave configuration is dependent.

1. bond0 configuration started
2. slave em3 configuration started -> but postponed as it has dependency on bond0
3. slave em4 configuration started -> but postponed as it has dependency on bond0

Now some processing as part of point 1 needs to set a state which will be processed in 'act_dep_result_cb' which would process the 'result' being equal to 'NM_ACT_REQUEST_DEP_RESULT_READY' and launch 'nm_device_activate_schedule_stage1_device_prepare(self)' so that the activation proceeds and enslaving succeeds.

Hi Dan,

It would be great if you could share your thoughts on these observations and what could lead to it ? Also, curious to know which code sets 'NMActRequestDependencyResult result' and calls the callback 'act_dep_result_cb'.
Comment 14 Narendra K 2012-08-08 07:17:37 EDT
I applied the following patch to NM in fedora 17 and rebuilt it.

ifcfg-rh: ignore IP config on bond slaves
http://cgit.freedesktop.org/NetworkManager/NetworkManager/patch/?id=3324bd2cdfffb65bee52370bd1046814e367c212

It does not address the original issue of NM failing to set up bonding. NM does not enslave the designated interfaces to bond0. Here are my configuration files -

# cat ifcfg-bond0   <------ bonding master device
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=dhcp
BONDING_OPTS="miimon=100 mode=balance-alb"
NM_CONTROLLED="yes"
BONDING_MASTER="yes"

# cat ifcfg-p4p1    <---------- slave 1
NM_CONTROLLED="yes"
HWADDR="A0:36:9F:01:E3:68"
DEVICE="p4p1"
MASTER=bond0
SLAVE=yes


# cat ifcfg-p4p2    <----------- slave 2
NM_CONTROLLED="yes"
HWADDR="A0:36:9F:01:E3:6A"
DEVICE="p4p2"
MASTER=bond0
SLAVE=yes

# ifconfig bond0
bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
        ether 00:00:00:00:00:00  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 


# cat /sys/class/net/bond0/bonding/slaves
<no slaves>





syslog has the following messages -

NetworkManager[19648]:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-p4p1 ...
NetworkManager[19648]: <warn> failed to allocate link cache: (-10) Operation not supported
NetworkManager[19648]:    ifcfg-rh:     warning: ignoring IP6 configuration
NetworkManager[19648]:    ifcfg-rh:     warning: ignoring IP4 configuration
NetworkManager[19648]:    ifcfg-rh:     read connection 'System p4p1'

[...]

NetworkManager[19648]: <info> Activation (p4p1) connection 'System p4p1' waiting on dependency 'Bond bond0'

[...]
 DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 13 (xid=0x5b43758a)
 <warn> (bond0): DHCPv4 request timed out.
<info> (bond0): canceled DHCP transaction, DHCP client pid 19650
<info> Activation (bond0) Stage 4 of 5 (IPv4 Configure Timeout) scheduled...
<info> Activation (bond0) Stage 4 of 5 (IPv4 Configure Timeout) started...

<warn> Activation (p4p1) connection 'System p4p1' dependency failed
<warn> Activation (p4p2) connection 'System p4p2' dependency failed
<warn> Activation (bond0) failed.
<info> (p4p1): device state change: prepare -> failed (reason 'dependency-failed') [40 120 50]
 <info> (p4p2): device state change: prepare -> failed (reason 'dependency-failed') [40 120 50]
<warn> Activation (p4p2) failed.
 <info> (bond0): device state change: failed -> disconnected (reason 'none') [120 30 0]
<info> (bond0): deactivating device (reason 'none') [0]
Comment 15 Pavel Šimerda (pavlix) 2012-08-29 06:20:30 EDT
Related to:

https://bugzilla.gnome.org/show_bug.cgi?id=540995
Comment 16 Fedora End Of Life 2013-07-03 18:00:55 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 17 Fedora End Of Life 2013-07-31 19:36:10 EDT
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.