Bug 1498639 - rhosp-director: restarting network service on undercloud causes the vip IP to disappear.
Summary: rhosp-director: restarting network service on undercloud causes the vip IP to...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Angus Thomas
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-04 20:13 UTC by Alexander Chuzhoy
Modified: 2018-09-19 08:19 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-17 18:18:22 UTC
Target Upstream Version:
Embargoed:



Links
Red Hat Bugzilla 1434621 (private: 1, priority: urgent, status: CLOSED, last updated 2024-02-26 13:49:28 UTC): [RHOSP10 Bug]: Connections to instances have been disconnected during overcloud upgrade.
Red Hat Bugzilla 1491628 (private: 0, priority: high, status: CLOSED, last updated 2022-08-02 18:02:03 UTC): OSP11 -> OSP12 upgrade: Unable to spawn instance post upgrade: Failed to allocate the network(s), not rescheduling.", "c...

Description Alexander Chuzhoy 2017-10-04 20:13:00 UTC
rhosp-director: restarting network service on undercloud causes the vip IP to disappear.

Environment:
openstack-tripleo-heat-templates-7.0.1-0.20170925173114.el7ost.1.noarch
puppet-keepalived-0.0.2-0.20170823225813.bbca37a.el7ost.noarch
keepalived-1.3.5-1.el7.x86_64
instack-undercloud-7.4.1-0.20170925172804.el7ost.noarch


Steps to reproduce:
(undercloud) [stack@undercloud-0 ~]$ sudo ip a show dev br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:09:41:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe09:413d/64 scope link 
       valid_lft forever preferred_lft forever


(undercloud) [stack@undercloud-0 ~]$ sudo service network restart
Restarting network (via systemctl):                        [  OK  ]

(undercloud) [stack@undercloud-0 ~]$ sudo ip a show dev br-ctlplane
10: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:09:41:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe09:413d/64 scope link 
       valid_lft forever preferred_lft forever



Note: restarting keepalived restores the IP.

(undercloud) [stack@undercloud-0 ~]$ sudo systemctl restart keepalived

(undercloud) [stack@undercloud-0 ~]$ sudo ip a show dev br-ctlplane
10: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:09:41:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe09:413d/64 scope link 
       valid_lft forever preferred_lft forever
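To spot this condition after a network restart, a small shell helper can compare the addresses on br-ctlplane with the VIPs keepalived is expected to hold. This is a minimal sketch, not part of TripleO or instack-undercloud: the function name `missing_vips` is hypothetical, and it assumes the one-line format of `ip -4 -o addr show`, in which the fourth field is the address in CIDR form.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of TripleO): report which expected
# VIPs are missing from br-ctlplane.
# stdin: output of `ip -4 -o addr show dev br-ctlplane`, one address per line,
#        where field 4 is the address in CIDR form, e.g. 192.168.24.2/32.
# args:  the VIPs keepalived is expected to hold, in the same CIDR form.
# Prints each missing VIP on its own line; returns 0 only if none are missing.
missing_vips() {
    local present vip rc=0
    present=$(awk '{print $4}')
    for vip in "$@"; do
        # -x: match the whole line; -F: fixed string, so dots are literal
        if ! printf '%s\n' "$present" | grep -qxF -- "$vip"; then
            printf '%s\n' "$vip"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, on this undercloud: `sudo ip -4 -o addr show dev br-ctlplane | missing_vips 192.168.24.2/32 192.168.24.3/32 || sudo systemctl restart keepalived` restarts keepalived only when a VIP is gone, which automates the manual workaround from the note above.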

Comment 1 Dan Sneddon 2017-10-04 21:06:32 UTC
Recent changes to os-net-config now add this to the OVS_EXTRA setting in the ifcfg-br-ctlplane file:

set bridge br-ctlplane fail_mode=standalone -- del-controller br-ctlplane

These were added to address bugs that occurred due to changes in Neutron ML2/OVS.

Unfortunately, these commands also run whenever the network service is restarted. This causes the bridge to be deleted and recreated, and keepalived loses its IP addresses.

In theory, those commands only need to run on the first reboot after an upgrade. For a long-term fix, instead of writing them to every OVS bridge ifcfg file, perhaps we should create a startup script that performs these commands before keepalived starts, or that runs only once on the first reboot after an upgrade.
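The long-term fix suggested here could look roughly like the following shell sketch. It is hypothetical, not what os-net-config or TripleO ship: the function name, the stamp directory `/var/lib/bridge-fixups`, and the `OVS_VSCTL` / `STAMP_DIR` overrides are all assumptions. It runs the same `ovs-vsctl` commands that currently sit in OVS_EXTRA, but only once per bridge, recording a stamp file so that later runs do nothing. Ordering it before keepalived (for example from a oneshot systemd unit with `Before=keepalived.service`) and clearing the stamp during an upgrade would be up to the deployment tooling.

```shell
#!/bin/sh
# Hypothetical one-shot helper: apply the bridge settings that os-net-config
# currently puts in OVS_EXTRA once per bridge, instead of on every
# `service network restart`.
# Assumptions: OVS_VSCTL (default: ovs-vsctl) and STAMP_DIR
# (default: /var/lib/bridge-fixups) are illustrative overrides, not real settings.
apply_bridge_fixups_once() {
    local bridge="$1"
    local vsctl="${OVS_VSCTL:-ovs-vsctl}"
    local stamp="${STAMP_DIR:-/var/lib/bridge-fixups}/${bridge}.done"

    # Already applied (e.g. on an earlier boot): leave the bridge alone.
    if [ -e "$stamp" ]; then
        return 0
    fi

    # The same two ovs-vsctl commands, chained with "--" as in OVS_EXTRA.
    "$vsctl" set bridge "$bridge" fail_mode=standalone -- del-controller "$bridge" || return 1

    # Record success so later runs are no-ops until the stamp is removed.
    mkdir -p "$(dirname "$stamp")" && touch "$stamp"
}
```

For example, `apply_bridge_fixups_once br-ctlplane` as root. This only takes the commands out of the network-restart path; it does not change what ifdown/ifup does to the bridge itself.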

Comment 2 Dan Sneddon 2017-10-04 21:10:33 UTC
(In reply to Dan Sneddon from comment #1)
> Recent changes to os-net-config now add this to the OVS_EXTRA setting in
> the ifcfg-br-ctlplane file:
> 
> set bridge br-ctlplane fail_mode=standalone -- del-controller br-ctlplane

I believe it is only the "del-controller br-ctlplane" that is causing this issue. The bridge should already be set up for fail_mode standalone so that command will have no effect.

Comment 3 Dan Sneddon 2017-10-05 18:07:00 UTC
(In reply to Dan Sneddon from comment #2)
> (In reply to Dan Sneddon from comment #1)
> > Recent changes to os-net-config now add this to the OVS_EXTRA setting in
> > the ifcfg-br-ctlplane file:
> > 
> > set bridge br-ctlplane fail_mode=standalone -- del-controller br-ctlplane
> 
> I believe it is only the "del-controller br-ctlplane" that is causing this
> issue. The bridge should already be set up for fail_mode standalone so that
> command will have no effect.

Jakub Libosvar responded to this:

"""
I don't understand how deleting the controller affects the VIP in the
undercloud case. AFAIK calling ifdown on ovs bridge deletes the bridge
from ovsdb and then VIP is not added back. Does the VIP disappearing
happen even without the del-controller command?
"""

Can you please remove "-- del-controller br-ctlplane" from the ifcfg-br-ctlplane file on the undercloud and restart networking? If the VIP still disappears, then we know that this is not a regression and that the issue has always been present. There usually isn't a reason to restart networking or ifdown/ifup the br-ctlplane interface once the undercloud is up and running.
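The experiment requested here is a one-line `sed` edit. The sketch below wraps it in a helper (the name `strip_del_controller` is hypothetical) that removes only the ` -- del-controller <bridge>` part of the OVS_EXTRA line and keeps the original file as `<file>.orig`.

```shell
#!/bin/sh
# Sketch (hypothetical helper): remove only "-- del-controller <bridge>" from
# the OVS_EXTRA line of an ifcfg file, keeping the original as <file>.orig.
strip_del_controller() {
    local ifcfg="$1" bridge="${2:-br-ctlplane}"
    # Restrict the substitution to the OVS_EXTRA= line; "-i.orig" (suffix
    # attached to -i) is accepted by both GNU and BSD sed.
    sed -i.orig "/^OVS_EXTRA=/s/ -- del-controller ${bridge}//" "$ifcfg"
}
```

For example, as root: `strip_del_controller /etc/sysconfig/network-scripts/ifcfg-br-ctlplane`, then `service network restart` and `ip a show dev br-ctlplane`. Because the file is autogenerated by os-net-config, a later os-net-config run may write the command back.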

Comment 4 Alexander Chuzhoy 2017-10-05 23:15:09 UTC
Removed it.

Here's what the ifcfg file looks like now:

[root@undercloud-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
DEVICE=br-ctlplane
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
MTU=1500
BOOTPROTO=static
IPADDR=192.168.24.1
NETMASK=255.255.255.0
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=52:54:00:2d:a9:2b -- br-set-external-id br-ctlplane bridge-id br-ctlplane -- set bridge br-ctlplane fail_mode=standalone"

Restarted the network with "service network restart".


The problem is still there:
[root@undercloud-0 ~]# ip a show dev br-ctlplane
12: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:2d:a9:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe2d:a92b/64 scope link 
       valid_lft forever preferred_lft forever

Comment 5 Bob Fournier 2017-10-24 14:42:26 UTC
Since this isn't a regression, as shown in comment 4, we're closing this, as it's not expected that the network service would be restarted on the undercloud.

Comment 6 Udi Shkalim 2018-09-17 16:22:47 UTC
(In reply to Bob Fournier from comment #5)
> Since this isn't a regression, as shown in comment 4, we're closing this, as
> it's not expected that the network service would be restarted on the
> undercloud.

Hi Bob,

Can you please explain why it is not expected that the network will be restarted?
The undercloud machine is still under the customer's control. Adding a different network configuration is an entirely valid use case, and expecting the network never to be restarted is not a proper solution to the problem.

I have encountered a similar issue with ironic services, and I believe this should be reopened.

Comment 8 Bob Fournier 2018-09-17 18:12:14 UTC
Sure, it's reasonable that this should be fixed. We've come across a more general case of the same problem, though: https://bugzilla.redhat.com/show_bug.cgi?id=1626357. As that has a proposed solution, I'm going to mark this as a duplicate of 1626357, unless there are objections.

Comment 9 Michele Baldessari 2018-09-17 18:18:22 UTC
(In reply to Bob Fournier from comment #8)
> Sure, it's reasonable that this should be fixed. We've come across a more
> general case of the same problem, though:
> https://bugzilla.redhat.com/show_bug.cgi?id=1626357. As that has a proposed
> solution, I'm going to mark this as a duplicate of 1626357, unless there are
> objections.

Fully agreed; it is the same issue, and Harald also confirms that a newer keepalived makes things better. Thanks for the feedback, Bob.

*** This bug has been marked as a duplicate of bug 1626357 ***

Comment 10 Harald Jensås 2018-09-19 08:19:47 UTC
I am removing the duplicate flag and setting this to CLOSED - NEXTRELEASE.

We should have keepalived v2.0.6 or later in the next release; the new version of keepalived should fix the issue described in this bug.
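Whether a given undercloud already has a keepalived that new can be checked from the shell. This sketch rests on two assumptions: that GNU `sort -V` is available (it is part of coreutils on RHEL 7), and that `keepalived --version` reports a line beginning with `Keepalived v<version>`. The helper names are hypothetical.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Sorts both with GNU `sort -V`; if B comes out first (or they are equal), A >= B.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumption: `keepalived --version` prints a line like "Keepalived v1.3.5 (...)",
# possibly on stderr; adjust the pattern if your build prints something else.
keepalived_version() {
    keepalived --version 2>&1 | sed -n 's/^Keepalived v\([0-9][0-9.]*\).*/\1/p' | head -n1
}
```

For example: `v=$(keepalived_version); version_ge "$v" 2.0.6 || echo "keepalived $v predates 2.0.6"`. The environment in the original report had keepalived-1.3.5, which this check would flag.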

