Bug 1123492

Summary: NetworkManager must be disabled in staypuft deployments
Product: Red Hat OpenStack
Reporter: Lars Kellogg-Stedman <lars>
Component: rhel-osp-installer
Assignee: Mike Burns <mburns>
Status: CLOSED ERRATA
QA Contact: Toni Freger <tfreger>
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: ajeain, amuller, breeler, lars, mburns, rhos-maint, sclewis, yeylon
Target Milestone: ga
Target Release: Installer
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rhel-osp-installer-0.1.6-3.el6ost
Doc Type: Bug Fix
Doc Text:
In some deployment scenarios, Puppet adds a NIC to an Open vSwitch bridge (for example, eth0 to br-ex). If NetworkManager is running and that NIC is the one currently in use, NetworkManager deactivates the device and the puppet agent is terminated mid-run. This has been fixed by disabling NetworkManager, so Puppet runs are no longer killed as a result of this valid configuration change.
Last Closed: 2014-08-21 18:06:39 UTC
Type: Bug

Description Lars Kellogg-Stedman 2014-07-25 20:01:36 UTC
I am deploying a non-HA configuration and having staypuft configure external network connectivity. During the initial puppet run when the system boots, br-ex is getting partially configured but left without an address. This leaves the system without any network connectivity, since I have configured staypuft to use eth0 (aka the provisioning interface) for external access.

This is the log from puppet:

Jul 25 19:33:12 mac52540036b16c.localdomain puppet-agent[1959]: (/Stage[main]/Neutron::Agents::Ovs/Neutron_plugin_ovs[OVS/bridge_mappings]/ensure) created
Jul 25 19:33:12 mac52540036b16c.localdomain ovs-vsctl[11004]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-br br-ex
Jul 25 19:33:12 mac52540036b16c.localdomain puppet-agent[1959]: (/Stage[main]/Neutron::Agents::Ovs/Neutron::Plugins::Ovs::Bridge[physnet-external:br-ex]/Vs_bridge[br-ex]/ensure) created
Jul 25 19:33:13 mac52540036b16c.localdomain ovs-vsctl[11021]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port br-ex eth0
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Stopping Puppet agent...
Jul 25 19:33:14 mac52540036b16c.localdomain puppet-agent[1755]: Caught TERM; calling stop
Jul 25 19:33:14 mac52540036b16c.localdomain puppet-agent[1959]: Caught TERM; calling stop
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Starting Puppet agent...
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Started Puppet agent.
Jul 25 19:33:16 mac52540036b16c.localdomain puppet-agent[11059]: Starting Puppet client version 3.6.2
Jul 25 19:33:17 mac52540036b16c.localdomain puppet-agent[11069]: Unable to fetch my node definition, but the agent run will continue:

As you can see, puppet-agent stops immediately after the call to 'add-port br-ex eth0'. The system logs implicate NetworkManager:

Jul 25 19:33:14 mac52540036b16c.localdomain NetworkManager[632]: <info> (eth0): deactivating device (reason 'connection-removed') [38]
Jul 25 19:33:14 mac52540036b16c.localdomain dbus[526]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 25 19:33:14 mac52540036b16c.localdomain dbus-daemon[526]: dbus[526]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Starting Network Manager Script Dispatcher Service...
Jul 25 19:33:14 mac52540036b16c.localdomain NetworkManager[632]: <info> (eth0): canceled DHCP transaction, DHCP client pid 1373
Jul 25 19:33:14 mac52540036b16c.localdomain dbus-daemon[526]: dbus[526]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 25 19:33:14 mac52540036b16c.localdomain dbus[526]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Started Network Manager Script Dispatcher Service.
Jul 25 19:33:14 mac52540036b16c.localdomain systemd[1]: Stopping Puppet agent...
Jul 25 19:33:14 mac52540036b16c.localdomain puppet-agent[1755]: Caught TERM; calling stop
Jul 25 19:33:14 mac52540036b16c.localdomain puppet-agent[1959]: Caught TERM; calling stop

Here you can see that NM deactivates eth0, starts the dispatcher service to handle the disconnect event, and puppet-agent is killed immediately afterwards.
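The diagnosis above rests on lining up timestamps from two different programs. A minimal Python sketch of that check follows, assuming classic syslog-format lines like the excerpts in this report. The function names, the supplied year, and the 5-second window are illustrative assumptions, not part of staypuft or any Red Hat tool.

```python
import re
from datetime import datetime

# Classic syslog prefix, as in the excerpts in this report:
# "Jul 25 19:33:14 <host> <program>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<prog>[^\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def parse_syslog(line, year=2014):
    """Return (timestamp, program, message), or None for unparseable lines.

    Syslog stamps carry no year, so one is supplied (an assumed default
    taken from the dates in this report) to keep strptime unambiguous.
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return ts, m.group("prog"), m.group("msg")


def puppet_killed_after_nm(lines, window_s=5):
    """Pair each NetworkManager device deactivation with every puppet-agent
    TERM that follows it within window_s seconds."""
    events = [e for e in map(parse_syslog, lines) if e is not None]
    deactivations = [ts for ts, prog, msg in events
                     if prog == "NetworkManager" and "deactivating device" in msg]
    terms = [ts for ts, prog, msg in events
             if prog == "puppet-agent" and "Caught TERM" in msg]
    return [(d, t) for d in deactivations for t in terms
            if 0 <= (t - d).total_seconds() <= window_s]
```

Fed the NetworkManager excerpt above, this returns two pairs (one per puppet-agent process that caught TERM), each with a zero-second gap at syslog's one-second resolution. Fed the excerpt in Comment 1, where NetworkManager is disabled, it returns an empty list. Correlation within a window only suggests causation; it does not prove it.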

Comment 1 Lars Kellogg-Stedman 2014-07-25 20:18:44 UTC
And with NetworkManager disabled, br-ex gets configured correctly and the deploy continues successfully:

Jul 25 20:16:24 mac52540036b16c.localdomain ovs-vsctl[11132]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-br br-ex
Jul 25 20:16:24 mac52540036b16c.localdomain kernel: device br-ex entered promiscuous mode
Jul 25 20:16:24 mac52540036b16c.localdomain puppet-agent[1969]: (/Stage[main]/Neutron::Agents::Ovs/Neutron::Plugins::Ovs::Bridge[physnet-external:br-ex]/Vs_bridge[br-ex]/ensure) created
Jul 25 20:16:25 mac52540036b16c.localdomain ovs-vsctl[11149]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port br-ex eth0
Jul 25 20:16:25 mac52540036b16c.localdomain kernel: device eth0 entered promiscuous mode
Jul 25 20:16:26 mac52540036b16c.localdomain ovs-vsctl[11244]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-ex eth0
Jul 25 20:16:26 mac52540036b16c.localdomain kernel: device eth0 left promiscuous mode
Jul 25 20:16:27 mac52540036b16c.localdomain ovs-vsctl[11300]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br br-ex
Jul 25 20:16:27 mac52540036b16c.localdomain kernel: device br-ex left promiscuous mode
Jul 25 20:16:27 mac52540036b16c.localdomain ovs-vsctl[11334]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-br br-ex -- set bridge br-ex other-config:hwaddr=52:54:00:36:b1:6c
Jul 25 20:16:27 mac52540036b16c.localdomain kernel: device br-ex entered promiscuous mode
Jul 25 20:16:28 mac52540036b16c.localdomain ovs-vsctl[11394]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-port br-ex eth0
Jul 25 20:16:28 mac52540036b16c.localdomain kernel: device eth0 entered promiscuous mode
Jul 25 20:16:28 mac52540036b16c.localdomain dhclient[11419]: DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 3 (xid=0x40c1bfaf)
Jul 25 20:16:28 mac52540036b16c.localdomain dhclient[11419]: DHCPREQUEST on br-ex to 255.255.255.255 port 67 (xid=0x40c1bfaf)
Jul 25 20:16:28 mac52540036b16c.localdomain dhclient[11419]: DHCPOFFER from 172.16.0.1
Jul 25 20:16:28 mac52540036b16c.localdomain dhclient[11419]: DHCPACK from 172.16.0.1 (xid=0x40c1bfaf)
Jul 25 20:16:30 mac52540036b16c.localdomain NET[11464]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Jul 25 20:16:30 mac52540036b16c.localdomain dhclient[11419]: bound to 172.16.0.6 -- renewal in 296 seconds.
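In this excerpt the DHCP lease ends up on br-ex rather than eth0, which is also what Comment 9 asks about. A small Python sketch that pulls the interface and bound address out of dhclient's syslog messages follows. It assumes the message wording shown in this excerpt (other dhclient versions may log differently), and `dhcp_lease` is a hypothetical helper name.

```python
import re

# dhclient message formats as they appear in the excerpt above.
DISCOVER_RE = re.compile(r"dhclient\[(\d+)\]: DHCPDISCOVER on (\S+)")
BOUND_RE = re.compile(r"dhclient\[(\d+)\]: bound to (\S+)")


def dhcp_lease(lines):
    """Return (interface, address) for the last lease dhclient bound,
    or None if no lease was obtained."""
    iface_by_pid = {}  # dhclient pid -> interface it sent DHCPDISCOVER on
    lease = None
    for line in lines:
        m = DISCOVER_RE.search(line)
        if m:
            iface_by_pid[m.group(1)] = m.group(2)
            continue
        m = BOUND_RE.search(line)
        if m and m.group(1) in iface_by_pid:
            # Tie the lease back to its interface via the dhclient pid.
            lease = (iface_by_pid[m.group(1)], m.group(2))
    return lease
```

Matching on the dhclient pid rather than on line order keeps the result correct if several dhclient processes interleave in the same log.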

Comment 8 Toni Freger 2014-08-07 12:17:30 UTC
NetworkManager is down as expected.

ruby193-rubygem-staypuft-0.1.22.el6ost
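The check that NetworkManager is down can be scripted. A minimal Python sketch follows, assuming a systemd host (the logs above come from one) and assuming "down" means the unit is neither running nor enabled at boot; the helper name and that definition are assumptions, not how QA actually verified this bug. The runner is injectable so the logic can be exercised without touching a real system.

```python
import subprocess


def networkmanager_is_down(unit="NetworkManager.service", run=subprocess.run):
    """Return True if the unit is neither active nor enabled.

    Relies on systemctl exit codes: `is-active` and `is-enabled` exit 0
    when the unit is active / enabled and non-zero otherwise. --quiet
    suppresses the textual state so only the exit code matters.
    """
    active = run(["systemctl", "is-active", "--quiet", unit]).returncode == 0
    enabled = run(["systemctl", "is-enabled", "--quiet", unit]).returncode == 0
    return not active and not enabled
```

Checking both states matters: a unit that is stopped but still enabled would come back on the next boot.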

Comment 9 Assaf Muller 2014-08-12 13:51:09 UTC
This bug is only relevant when you use the same NIC for both provisioning and external access, correct?

Also, with NM off, does eth0 have an IP address before the puppet run, and is that IP on br-ex when the run finishes? Is that not the case with NM on? If you ran with NM on and SSHed into the machine over a different NIC, would br-ex end up with an IP or not?

Comment 10 errata-xmlrpc 2014-08-21 18:06:39 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html