Bug 1367580 - Change to initscripts causing NetworkManager to down OpenStack bridges
Summary: Change to initscripts causing NetworkManager to down OpenStack bridges
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Duplicates: 1366348
Depends On:
Blocks: 1393867 1400961 1406478
Reported: 2016-08-16 20:52 UTC by Dan Sneddon
Modified: 2020-12-14 07:41 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-10 14:33:26 UTC
Target Upstream Version:
Embargoed:




Links
- Red Hat Bugzilla 1364583 (private: no; priority: urgent; status: CLOSED; last updated 2021-02-22 00:41:40 UTC): rhel-osp-director: upgrade from 7.3 ->8.0 , br-ex on the controllers was missing ip address and pacemaker was unable to ...
- Red Hat Bugzilla 1366348 (private: no; priority: high; status: CLOSED; last updated 2021-02-22 00:41:40 UTC): Recent Update to initscripts Causing OpenStack Bridge To Fail

Description Dan Sneddon 2016-08-16 20:52:26 UTC
Description of problem:
A recent change to initscripts causes ifcfg files to be loaded with "nmcli con load <ifcfg>" even when they contain NM_CONTROLLED=no. When this happens, NetworkManager takes the bridges used by OpenStack from up to down, causing outages.

Version-Release number of selected component (if applicable):
RHEL 7.2
initscripts 9.49.30-1.el7_2.3

How reproducible:
100%

Steps to Reproduce:
1. Run upgrade on Red Hat OpenStack Platform from version 7.3 to version 8.0

Actual results:
Part of the upgrade process runs yum update, which installs the latest initscripts RPM. This causes NetworkManager to read every ifcfg file, even those with NM_CONTROLLED=no. NetworkManager appears to bring down the main OpenStack bridge when its ifcfg file is read.

Expected results:
The existing br-ex bridge, which is up and running when the upgrade process begins, should stay up when changes are made to other network interfaces.

Additional info:
There is log information in the two linked BZs:
Red Hat OSP: https://bugzilla.redhat.com/show_bug.cgi?id=1364583
initscripts BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1366348
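To see which interfaces on a node are exposed to this change, one can list the ifcfg files that carry NM_CONTROLLED=no. The following is a minimal sketch, assuming the files live in /etc/sysconfig/network-scripts (the path that appears in the NetworkManager log lines in this report) and that the value is written as a literal no, optionally quoted; the function name `list_nm_unmanaged_ifcfgs` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the names of ifcfg files that opt out of NetworkManager
# with NM_CONTROLLED=no. Only the literal value "no" (optionally quoted) is
# matched; commented-out lines are ignored.
list_nm_unmanaged_ifcfgs() {
    # Directory defaults to the path seen in this report's logs (assumption).
    local dir="${1:-/etc/sysconfig/network-scripts}"
    local f
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue   # skip the unexpanded glob if nothing matches
        if grep -Eq '^[[:space:]]*NM_CONTROLLED=["'\'']?no["'\'']?[[:space:]]*$' "$f"; then
            printf '%s\n' "${f##*/}"   # strip the directory, keep the file name
        fi
    done
}
```

Run without arguments to check the default directory, or pass another directory as the first argument.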

Comment 1 Dan Sneddon 2016-08-16 20:54:47 UTC
Relevant logs from https://bugzilla.redhat.com/show_bug.cgi?id=1366348

Here are the /var/log/messages logs from immediately after the bridge interface was brought up via "ifup br-ex". There is more info in the linked BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1364583

Aug  9 20:44:44 overcloud-controller-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-ex em1 -- add-port br-ex em1
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  (em1): enslaved to non-master-type device ovs-system; ignoring
Aug  9 20:44:44 overcloud-controller-1 kernel: device em1 entered promiscuous mode
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-br-ex (f0123855-5f72-fb68-339e-ef4f1d038014,"System br-ex")
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <warn>  ifcfg-rh: Ignoring connection /etc/sysconfig/network-scripts/ifcfg-br-ex (f0123855-5f72-fb68-339e-ef4f1d038014,"System br-ex") / device 'br-ex' due to NM_CONTROLLED=no.
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  (br-ex): device state change: activated -> unmanaged (reason 'unmanaged') [100 10 3]
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  NetworkManager state is now CONNECTED_LOCAL
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  NetworkManager state is now DISCONNECTED
Aug  9 20:44:44 overcloud-controller-1 kernel: IPv6: ADDRCONF(NETDEV_UP): br-ex: link is not ready
Aug  9 20:44:44 overcloud-controller-1 NetworkManager[775]: <info>  (br-ex): link disconnected
Aug  9 20:44:44 overcloud-controller-1 dbus-daemon: dbus[758]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Aug  9 20:44:44 overcloud-controller-1 dbus[758]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Aug  9 20:44:44 overcloud-controller-1 systemd: Starting Network Manager Script Dispatcher Service...
Aug  9 20:44:44 overcloud-controller-1 dbus[758]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Aug  9 20:44:44 overcloud-controller-1 systemd: Started Network Manager Script Dispatcher Service.
Aug  9 20:44:44 overcloud-controller-1 dbus-daemon: dbus[758]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Aug  9 20:44:44 overcloud-controller-1 nm-dispatcher: Dispatching action 'down' for br-ex
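When reproducing this, it can help to pull just the state transitions for br-ex out of the log. A small sketch, assuming syslog lines in the "(DEVICE): device state change: OLD -> NEW" form shown above; `nm_state_changes` is a hypothetical helper, and /var/log/messages is only a default.

```shell
#!/bin/bash
# Sketch: print NetworkManager state transitions for one device from a syslog
# file, assuming lines of the form "(DEV): device state change: OLD -> NEW ...".
nm_state_changes() {
    local dev="$1"
    local log="${2:-/var/log/messages}"
    # -F: treat the pattern literally, so the parentheses need no escaping.
    grep -F "($dev): device state change:" "$log" |
        sed -E 's/.*device state change: ([a-z-]+) -> ([a-z-]+).*/\1 -> \2/'
}
```

Against the excerpt above, `nm_state_changes br-ex` would report the single transition activated -> unmanaged.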

Comment 3 Thomas Haller 2016-08-16 21:15:58 UTC
Can you please enable debug logging for NetworkManager, reproduce the problem and attach the full logfile?

To do that, edit /etc/NetworkManager/NetworkManager.conf so that it contains:

[logging]
level=TRACE

and restart NM.

Thank you
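A minimal sketch of the same logging setting written as a separate drop-in file, so NetworkManager.conf itself is left untouched. It assumes the installed NetworkManager reads *.conf drop-ins from /etc/NetworkManager/conf.d; the function name and the file name 99-trace-logging.conf are hypothetical.

```shell
#!/bin/bash
# Sketch: enable TRACE logging for NetworkManager through a drop-in file
# instead of editing NetworkManager.conf directly.
# Assumption: the installed NetworkManager reads *.conf files from conf.d.
# The file name 99-trace-logging.conf is hypothetical.
enable_nm_trace_logging() {
    local confdir="${1:-/etc/NetworkManager/conf.d}"
    mkdir -p "$confdir" || return 1
    printf '[logging]\nlevel=TRACE\n' > "$confdir/99-trace-logging.conf"
}

# Run as root, then restart the daemon as requested above:
#   enable_nm_trace_logging && systemctl restart NetworkManager
```

If restarting the daemon on an affected node is undesirable, `nmcli general logging level TRACE domains ALL` should raise the log level at runtime instead; that setting does not survive a restart of NetworkManager.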

Comment 5 Mike Burns 2016-08-17 13:48:00 UTC
(In reply to Thomas Haller from comment #3)
> can you please enable debug logging for NetworkManager, reproduce the
> problem and attach the full logfile?
> 
> You do that, by editing /etc/NetworkManager/NetworkManager.conf to contain:
> 
> [logging]
> level=TRACE
> 
> and restart NM.
> 
> Thank you

Omri, can you do this? ^^

Reproduce with a 7->8 upgrade, without the workaround that disables NM.

Comment 6 Thomas Haller 2016-08-17 18:23:17 UTC
*** Bug 1366348 has been marked as a duplicate of this bug. ***

Comment 7 Thomas Haller 2016-08-18 08:09:14 UTC
related to bug 1363995

Comment 9 Thomas Haller 2016-11-18 17:09:41 UTC
some notes:


- When `ifup` is called, initscripts first run `nmcli connection load` on the ifcfg file. This ensures that NetworkManager has the current version of the file loaded.

- A device that is currently up and managed by NetworkManager is taken down immediately when it becomes unmanaged. This happens, for example, when the ifcfg-rh file is reloaded after being changed to NM_CONTROLLED=no.
"Stopping managing the device" means bringing the interface down and cleaning it up. rh#1371433 asks for a way to release the device while leaving it up. That may make sense in special cases, but in general, telling NM to "unmanage" a device should continue to bring the device down. Leaving it up does not work in general anyway (e.g. DHCP addresses will time out).


- The initscripts change in question is http://pkgs.devel.redhat.com/cgit/rpms/initscripts/commit/?h=rhel-7.2&id=4aeb2f7ee2b31630ae5ff27e8046b5117b7f7a22 . It makes initscripts always call `nmcli connection load`, even if the ifcfg-rh file contains NM_CONTROLLED=no.
Note that this initscripts patch is correct. It only has an effect when the device is already managed by NetworkManager. When the user sets "NM_CONTROLLED=no" and then runs ifup, it is correct for NetworkManager to stop managing the device.
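The change described in these notes can be modeled roughly as follows. This is a simplified, hypothetical model of the ifup pre-load step, not the actual initscripts source: the function names are made up, and NMCLI is an assumed override hook so the logic can be exercised without NetworkManager.

```shell
#!/bin/bash
# Hypothetical, simplified model of the ifup pre-load step; NOT the initscripts source.
# NMCLI is an assumed override hook so the logic can be tested without NetworkManager.
NMCLI="${NMCLI:-nmcli}"

# Matches NM_CONTROLLED=no, optionally quoted.
has_nm_controlled_no() {
    grep -Eq '^[[:space:]]*NM_CONTROLLED=["'\'']?no["'\'']?[[:space:]]*$' "$1"
}

# Before the change: files marked NM_CONTROLLED=no were never handed to NetworkManager.
ifup_preload_old() {
    local cfg="$1"
    if has_nm_controlled_no "$cfg"; then
        return 0
    fi
    "$NMCLI" connection load "$cfg"
}

# After the change: the file is always (re)loaded, so NetworkManager sees
# NM_CONTROLLED=no and stops managing a device it had up, taking it down.
ifup_preload_new() {
    "$NMCLI" connection load "$1"
}
```

With the old behavior, an ifcfg file marked NM_CONTROLLED=no never reached NetworkManager, so a device NM already had up was left alone. With the new behavior, the reload tells NM that the device is now unmanaged, and NM takes it down.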



Why does upgrading the initscripts package result in an ifup call? That seems wrong. Note that upgrading the NetworkManager package neither restarts the daemon nor changes the networking configuration. That is deliberate, although of course it has other issues.

Perhaps updating the initscripts RPM should likewise leave the runtime configuration alone.


I think NM is behaving as intended. Reassigning to initscripts for evaluation from their side.

Comment 20 David Kaspar // Dee'Kej 2017-05-19 13:08:19 UTC
Hello folks,

I'm really sorry, but I don't see anything in the initscripts specfile that would restart the network or run ifdown/ifup during an update:

==============================================================

%pre
/usr/sbin/groupadd -g 22 -r -f utmp

%post
touch /var/log/wtmp /var/run/utmp /var/log/btmp
chown root:utmp /var/log/wtmp /var/run/utmp /var/log/btmp
chmod 664 /var/log/wtmp /var/run/utmp
chmod 600 /var/log/btmp

/usr/sbin/chkconfig --add network
/usr/sbin/chkconfig --add netconsole
if [ $1 -eq 1 ]; then
        /usr/bin/systemctl daemon-reload > /dev/null 2>&1 || :
fi

===============================================================

https://github.com/fedora-sysv/initscripts/blob/rhel7-branch/initscripts.spec#L77

You will have to debug this by yourself, guys. I do not have any knowledge regarding OpenStack. Sorry. :-/

Dee'Kej
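For reference, the $1 test in the quoted %post follows standard RPM scriptlet semantics: $1 is the number of instances of the package installed after the transaction, so it is 1 on a fresh install and 2 on an upgrade. The sketch below models only the service-related commands of that scriptlet (the function name is hypothetical) to show that an upgrade runs neither a daemon reload nor anything that touches interfaces.

```shell
#!/bin/bash
# Sketch: the service-related commands of the quoted %post, as a function of
# RPM's $1 (package instances installed after the transaction:
# 1 = fresh install, 2 = upgrade). The touch/chown/chmod lines are omitted.
post_service_actions() {
    local count="$1"
    # chkconfig --add only registers the SysV scripts; it does not start them.
    echo "chkconfig --add network"
    echo "chkconfig --add netconsole"
    if [ "$count" -eq 1 ]; then
        echo "systemctl daemon-reload"   # fresh install only
    fi
}
```

On an affected host, `rpm -q --scripts initscripts` prints the scriptlets of the installed package, which can be checked the same way.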

Comment 21 Red Hat Bugzilla Rules Engine 2017-05-19 13:08:46 UTC
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.

