Description of problem:
In the containerized undercloud, re-running the installer removes the undercloud_admin_host and undercloud_public_host IP addresses if the os-net-config configuration has changed. os-net-config restarts the br-ctlplane interface, and the restart removes the undercloud_admin_host and undercloud_public_host IP addresses set up by keepalived. The install/update operation fails later on because services cannot connect to an IP address that is no longer there.

Version-Release number of selected component (if applicable):
Upstream current-dev was in use when I found the bug.

How reproducible:
100%

Steps to Reproduce:
1. Deploy the undercloud.
2. Change the undercloud_nameservers address in undercloud.conf:
   sed -i 's/undercloud_nameservers = <old-address>/undercloud_nameservers = <new-address>/g' /home/stack/undercloud.conf
3. Re-run the undercloud install:
   openstack undercloud install

Additional reproducer:
----------------------
1. Deploy the undercloud with routed networks enabled.
2. Add more subnets to prepare the undercloud for scale-out to additional routed network leaves.
3. Re-run the undercloud installer.

Because additional routes for the ctlplane network traffic are added, os-net-config re-runs as well, and the restart of br-ctlplane kills the VIPs.

Actual results:
Undercloud update fails.

Expected results:
Undercloud update should succeed.

Additional info:
1. The os-net-config config.json is updated with the new DNS server.
   Every 5.0s: diff -aur /etc/os-net-config/config.json /tmp/os-net-config.json.orig    Fri Sep 7 08:51:26 2018

   --- /etc/os-net-config/config.json 2018-09-07 08:45:39.054174371 +0200
   +++ /tmp/os-net-config.json.orig 2018-09-07 08:17:38.597808977 +0200
   @@ -1 +1 @@
   -{"network_config": [{"addresses": [{"ip_netmask": "172.20.0.200/26"}], "dns_servers": ["192.168.122.1"], "members": [{"mtu": 1500, "name": "eth1", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}
   +{"network_config": [{"addresses": [{"ip_netmask": "172.20.0.200/26"}], "dns_servers": ["172.20.0.254"], "members": [{"mtu": 1500, "name": "eth1", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}

2. After os-net-config applies the config, the keepalived VIPs are gone:

   Every 2.0s: ip addr show br-ctlplane    Fri Sep 7 08:51:08 2018

   47: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
       link/ether 52:54:00:7a:f6:c5 brd ff:ff:ff:ff:ff:ff
       inet 172.20.0.200/26 brd 172.20.0.255 scope global br-ctlplane
          valid_lft forever preferred_lft forever
       inet6 fe80::5054:ff:fe7a:f6c5/64 scope link
          valid_lft forever preferred_lft forever

3. The upgrade is stuck on starting the containers:

   TASK [Start containers for step 3] **********************************************

4. Logs show that services are failing to connect to the database via the keepalived VIPs:

   /var/log/containers/nova/nova-compute.log:2018-09-07 08:52:47.462 6 ERROR oslo_service.periodic_task RemoteError: Remote error: DBConnectionError (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.20.0.201' ([Errno 113] EHOSTUNREACH)") (Background on this error at: http://sqlalche.me/e/e3q8)
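To check for the missing-VIP symptom without watching the terminal, the "ip addr show br-ctlplane" output can be parsed and compared against the addresses keepalived is expected to hold. The following is a minimal Python sketch, not part of the installer. The function names are hypothetical, the sample text is the output captured above, and 172.20.0.201 is taken from the nova-compute error on the assumption that it is one of the keepalived VIPs.

```python
import ipaddress
import re


def addresses_on_interface(ip_addr_output):
    """Return the set of IPv4 addresses listed in `ip addr show <dev>` output."""
    # "inet " (with a trailing space) matches IPv4 lines only, never "inet6 ".
    return {
        ipaddress.ip_address(match.group(1))
        for match in re.finditer(r"\binet (\d+\.\d+\.\d+\.\d+)/\d+", ip_addr_output)
    }


def missing_vips(ip_addr_output, expected_vips):
    """Return the expected VIPs that are not configured on the interface."""
    present = addresses_on_interface(ip_addr_output)
    return [vip for vip in expected_vips if ipaddress.ip_address(vip) not in present]


# Output captured in this report after os-net-config restarted br-ctlplane.
SAMPLE = """\
47: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:7a:f6:c5 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.200/26 brd 172.20.0.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe7a:f6c5/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    # Assumption: 172.20.0.201 (from the nova-compute error) is a keepalived VIP.
    print(missing_vips(SAMPLE, ["172.20.0.201"]))
```

On a live system the same function could be fed the output of subprocess.run(["ip", "addr", "show", "br-ctlplane"], capture_output=True, text=True).stdout instead of the captured sample.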
Recent versions of keepalived support 'dynamic_interfaces', which looks like it would solve this problem. We would have to package keepalived 2.0.x in RDO?

The option is described as follows:

  # Allow configuration to include interfaces that don't exist at startup.
  # This allows keepalived to work with interfaces that may be deleted and restored
  # and also allows virtual and static routes and rules on VMAC interfaces.
  dynamic_interfaces

I built keepalived-2.0.6-1.el7.x86_64.rpm on CentOS 7 using the SRPM[1] from Fedora Rawhide. (The RPM builds with only a small tweak.)

Enabling dynamic_interfaces and using keepalived 2.0.6 in the keepalived container fixes this issue.

I suggest we package keepalived 2.0.x and place it in the OSP repositories.

[1] https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/development/rawhide/Everything/source/tree/Packages/k/keepalived-2.0.6-1.fc29.src.rpm
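When testing a rebuilt keepalived container, it can help to confirm that the option actually made it into the rendered configuration. Below is a rough Python sketch that looks for an uncommented dynamic_interfaces keyword inside the global_defs block of a keepalived.conf. The function name and the sample configuration (router_id, vrrp_instance name, virtual_router_id) are hypothetical illustrations, not the configuration the undercloud generates.

```python
import re


def has_dynamic_interfaces(conf_text):
    """Return True if an uncommented dynamic_interfaces sits inside global_defs."""
    depth = 0
    in_global_defs = False
    for raw_line in conf_text.splitlines():
        # keepalived.conf treats both '#' and '!' as comment starters.
        line = re.split(r"[#!]", raw_line, maxsplit=1)[0].strip()
        if not line:
            continue
        if depth == 0 and line.startswith("global_defs"):
            in_global_defs = True
        if in_global_defs and depth >= 1 and line.split()[0] == "dynamic_interfaces":
            return True
        depth += line.count("{") - line.count("}")
        if in_global_defs and depth == 0 and "}" in line:
            in_global_defs = False
    return False


# Hypothetical configuration, for illustration only.
SAMPLE_CONF = """\
global_defs {
    router_id undercloud
    dynamic_interfaces
}

vrrp_instance VI_1 {
    interface br-ctlplane
    virtual_router_id 51
    virtual_ipaddress {
        172.20.0.201/32 dev br-ctlplane
    }
}
"""

if __name__ == "__main__":
    print(has_dynamic_interfaces(SAMPLE_CONF))
```

The check is deliberately simple text matching; it does not validate the rest of the file or follow include directives.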
*** Bug 1498639 has been marked as a duplicate of this bug. ***
I proposed the following change: https://review.openstack.org/603587

It implements a workaround similar to the one used in the pre-containerized undercloud: keepalived is restarted whenever the undercloud installer is run. This fixes the problem described in this bug, where some undercloud config changes cause the re-run to fail. However, it does not fix the issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1498639; fixing that would require a new version of keepalived.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045
Hi, this happens to me with OSP14 puddle 2019-01-08.1.
Per Comment 18, this bug should not be reopened; please open a new bug. In the new bug, please describe the changes that were made, i.e. what undercloud_nameservers was set to before and after the change, and include a link to the sosreport.