Bug 1590651 - [UPGRADES][SPLIT-STACK] Cannot ssh to VM after major upgrade converge step
Summary: [UPGRADES][SPLIT-STACK] Cannot ssh to VM after major upgrade converge step
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ga
Target Release: 13.0 (Queens)
Assignee: Bob Fournier
QA Contact: Yurii Prokulevych
Depends On:
Reported: 2018-06-13 06:41 UTC by Yurii Prokulevych
Modified: 2018-06-27 13:59 UTC
CC List: 22 users

Fixed In Version: os-net-config-8.4.1-4.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2018-06-27 13:58:15 UTC
Target Upstream Version:


System | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 575220 | None | master: MERGED | os-net-config: Restore the order of params in ifcfg file that was inadvertently changed (I77162d28b1dc173e3a90cb385a3af9... | 2018-06-14 17:53:31 UTC
OpenStack gerrit | 575432 | None | stable/queens: NEW | os-net-config: Restore the order of params in ifcfg file that was inadvertently changed (I77162d28b1dc173e3a90cb385a3af9... | 2018-06-14 17:53:26 UTC
Red Hat Product Errata | RHEA-2018:2086 | None | None | None | 2018-06-27 13:59:06 UTC

Description Yurii Prokulevych 2018-06-13 06:41:48 UTC
Description of problem:
Cannot SSH to VMs during/after the major upgrade converge step, even though the instances are reported as ACTIVE.

openstack server list -f yaml
- Flavor: v1-1G-5G
  ID: d19b8c8c-54cc-40f2-9ae7-d847bc68fe6d
  Image: upgrade_workload
  Name: instance_6e00778d92
  Networks: internal_net=,
  Status: ACTIVE
- Flavor: v1-1G-5G
  ID: f781803e-81c6-472d-8fed-f8887da08922
  Image: upgrade_workload
  Name: instance_5c39032710
  Networks: internal_net=,
  Status: ACTIVE

ssh cirros@
ssh: connect to host port 22: No route to host

ssh cirros@
ssh: connect to host port 22: No route to host

Version-Release number of selected component (if applicable):

Steps to Reproduce:
1. Install RHOS-12 with pre-provisioned servers (split-stack)
2. Upgrade UC to RHOS-13
3. Launch a VM, associate a floating IP with it, and make sure it is reachable
4. Upgrade OC to RHOS-13
5. Try to reach the VM via its FIP

Actual results:
VM is not reachable

Expected results:
VM is reachable

Additional info:
Virtual split-stack environment: 3 controllers + 3 messaging + 3 database + 3 ceph + 2 networker + 2 compute

Comment 2 nlevinki 2018-06-13 08:42:23 UTC
Is this the same root cause as bug 1589684?

Comment 3 Slawek Kaplonski 2018-06-13 09:39:31 UTC
@nlevinki: I don't think it's the same issue. In the sos reports attached there, I don't see br-ex going down and up again, which is what caused this issue.

The problem here is that during the upgrade process the br-ex bridge interface was "restarted":

Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br br-ex
Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-br br-ex -- set bridge br-ex other-config:hwaddr=52:54:00:b9:fc:e0 -- set bridge br-ex fail_mode=standalone -- del-controller br-ex

This was triggered by the os-net-config script, which (probably) made changes to one of the files /etc/sysconfig/network-scripts/{ifcfg-br-ex,route-br-ex,route6-br-ex}
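A minimal Python sketch of how one might confirm from the journal that br-ex was deleted and then re-created, as in the two ovs-vsctl lines above. The helper `bridge_recreated` is hypothetical (it is not part of os-net-config, neutron, or Open vSwitch), and it assumes each relevant log line contains the "Called as ovs-vsctl ..." text shown in this comment.

```python
import re

# Assumption: ovs-vsctl audit lines look like the ones quoted above,
# i.e. "... ovs|00001|vsctl|INFO|Called as ovs-vsctl <args>".
_VSCTL_RE = re.compile(r"Called as ovs-vsctl (?P<args>.*)$")


def bridge_recreated(log_lines, bridge="br-ex"):
    """Return True if `bridge` was deleted (del-br) and later re-added (add-br)."""
    deleted = False
    for line in log_lines:
        match = _VSCTL_RE.search(line)
        if not match:
            continue
        args = match.group("args").split()
        # ovs-vsctl chains several commands with "--", so scan every word pair.
        for i, word in enumerate(args[:-1]):
            if word == "del-br" and args[i + 1] == bridge:
                deleted = True
            elif word == "add-br" and args[i + 1] == bridge and deleted:
                return True
    return False


LOG = [
    "Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as "
    "ovs-vsctl -t 10 -- --if-exists del-br br-ex",
    "Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as "
    "ovs-vsctl -t 10 -- --may-exist add-br br-ex -- set bridge br-ex "
    "fail_mode=standalone -- del-controller br-ex",
]
```

Here `bridge_recreated(LOG)` is True, while an add-br without a preceding del-br (for example `LOG[1:]`) is not reported as a re-creation.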

After the bridge was recreated, it does not have the proper OpenFlow rules, which should be created by neutron-openvswitch-agent; because of that, there is no connectivity to the qrouter-XXX namespace.
As a workaround, you can restart the neutron_ovs_agent container, which will reconfigure the flows on this bridge.

There is already a patch merged to the upstream Queens branch that adds monitoring of such external bridges, so the OVS agent should reconfigure such a bridge automatically without any restart.
BZ for that is: https://bugzilla.redhat.com/show_bug.cgi?id=1576286 and upstream patch: https://review.openstack.org/#/c/567145/

Comment 4 Assaf Muller 2018-06-13 13:21:35 UTC
I've marked https://bugzilla.redhat.com/show_bug.cgi?id=1576286 as a blocker, we'll merge the fix right now.

Comment 5 Carlos Camacho 2018-06-13 14:48:16 UTC
@dalvarez moved this to POST but there is no tracker? Can you confirm that https://review.openstack.org/#/c/567145/ should fix this? If so, can we add it as a tracker?

Comment 6 Bernard Cafarelli 2018-06-13 14:51:42 UTC
Tracker is in https://bugzilla.redhat.com/show_bug.cgi?id=1576286 (which has the blocker+ flag), so we should probably mark this one as depending on #1576286, or maybe as a duplicate?

Comment 7 Slawek Kaplonski 2018-06-13 14:53:02 UTC
@Bernard: I wouldn't mark it as a duplicate, as these are in fact different issues, where one is a result of the other. So IMO "depends on" would be better here.

Comment 8 Assaf Muller 2018-06-13 15:22:46 UTC
Agreed, depends on is better. This is *not* a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1576286 because in this RHBZ, we're seeing two issues:

1) The director upgrade, via os-net-config, is restarting the interfaces defined in the ifcfg files, which also deletes and recreates br-ex
2) If (1) happens, the OVS agent doesn't reprogram the flows on br-ex

This RHBZ should track (1), while https://bugzilla.redhat.com/show_bug.cgi?id=1576286 is tracking (2).

Comment 9 Assaf Muller 2018-06-13 15:23:53 UTC
In light of comment 8 I'm moving this to HardProv DFG.

Comment 10 Bob Fournier 2018-06-13 15:47:49 UTC
I'd like to get some info on the configuration prior to the upgrade. For example, were the old-style NIC config files being used, so that you needed to change to the new-style configs (which is required in OSP-13)? Can you provide the NIC configs and network environment files before and after the upgrade (if different; otherwise just before)? Also, what deployment command was run for the upgrade (i.e. which files were included), and has it changed since the initial deployment?

Comment 13 Bob Fournier 2018-06-13 20:31:36 UTC
We believe that all interfaces and bridges are getting restarted on upgrade because the order of parameters in the ifcfg files has changed slightly in Queens due to this change - https://review.openstack.org/#/c/485132/9/os_net_config/impl_ifcfg.py.

Here is an OSP-12 ifcfg file:
[root@networker-1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
Here is an OSP-13 ifcfg file:
# This file is autogenerated by os-net-config
ONBOOT=yes   <=== different location

os-net-config diffs the existing ifcfg file against what it intends to write, and treats any difference, including a pure reordering like this one, as a change requiring a restart of the devices.
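To illustrate why a pure reordering is enough to trigger a restart, here is a minimal Python sketch contrasting a plain text comparison with an order-insensitive comparison of KEY=value pairs. The function names and the sample file contents are hypothetical (the actual OSP-12/OSP-13 files are not reproduced in full in this report), and this is not os-net-config's implementation; the fix that was merged (gerrit 575220) restored the original parameter order rather than changing the comparison.

```python
def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict, skipping comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def text_changed(old, new):
    # Text comparison: any reordering of lines counts as a change.
    return old != new


def params_changed(old, new):
    # Order-insensitive comparison of the parsed KEY=value pairs.
    return parse_ifcfg(old) != parse_ifcfg(new)


# Hypothetical contents: same parameters, ONBOOT moved to a different line.
OLD = (
    "# This file is autogenerated by os-net-config\n"
    "DEVICE=br-ex\n"
    "ONBOOT=yes\n"
    "TYPE=OVSBridge\n"
)
NEW = (
    "# This file is autogenerated by os-net-config\n"
    "ONBOOT=yes\n"
    "DEVICE=br-ex\n"
    "TYPE=OVSBridge\n"
)
```

With these inputs, `text_changed(OLD, NEW)` is True while `params_changed(OLD, NEW)` is False, which is exactly the gap that caused the unnecessary restart of br-ex.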

There is a patch upstream:

Comment 24 Marius Cornea 2018-06-15 21:40:19 UTC
FWIW, in the upgrade tasks there is a workaround that prevents os-net-config from triggering the ifcfg restarts (running os-net-config with the --no-activate option):


But in the case of pre-deployed servers, os-net-config gets updated before the upgrade tasks run, by:


Hence the following workaround condition fails (os-net-config is already updated by the time the upgrade tasks run):



Comment 25 Yurii Prokulevych 2018-06-19 13:31:28 UTC
Verified with os-net-config-8.4.1-4.el7ost.noarch

Comment 27 errata-xmlrpc 2018-06-27 13:58:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

