Bug 1590651
| Summary: | [UPGRADES][SPLIT-STACK] Cannot ssh to VM after major upgrade converge step | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | os-net-config | Assignee: | Bob Fournier <bfournie> |
| Status: | CLOSED ERRATA | QA Contact: | Yurii Prokulevych <yprokule> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 13.0 (Queens) | CC: | amuller, augol, bcafarel, bfournie, ccamacho, chrisw, dalvarez, dsneddon, hbrock, hjensas, jschluet, jslagle, mandreou, mbultel, mburns, mcornea, nlevinki, nyechiel, sclewis, skaplons, srevivo, yprokule |
| Target Milestone: | ga | Keywords: | Triaged |
| Target Release: | 13.0 (Queens) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | os-net-config-8.4.1-4.el7ost | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-06-27 13:58:15 UTC | Type: | Bug |
Description
Yurii Prokulevych
2018-06-13 06:41:48 UTC
Hi, is it the same cause as 1589684?

@nlevinki: I don't think it's the same issue. In the sos reports attached there I don't see br-ex going down and up again, which is what caused this issue.
The problem here is that during the upgrade process the br-ex bridge interface was "restarted":
Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br br-ex
Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-br br-ex -- set bridge br-ex other-config:hwaddr=52:54:00:b9:fc:e0 -- set bridge br-ex fail_mode=standalone -- del-controller br-ex
This was triggered by the os-net-config script, which (probably) made some changes in one of the files /etc/sysconfig/network-scripts/{ifcfg-br-ex,route-br-ex,route6-br-ex}
After the bridge was created again, it doesn't have the proper OpenFlow rules, which should be created by neutron-openvswitch-agent, and because of that there is no connectivity to the qrouter-XXX namespace.
As a workaround you can restart the neutron_ovs_agent container and it will reconfigure the flows on this bridge.
There is already a patch merged to the upstream Queens branch which adds monitoring of such external bridges, so the OVS agent should reconfigure such a bridge automatically without any restart.
The BZ for that is: https://bugzilla.redhat.com/show_bug.cgi?id=1576286 and the upstream patch is: https://review.openstack.org/#/c/567145/
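
The delete-and-recreate pattern visible in the ovs-vsctl log lines above can be checked for mechanically. The following is only a minimal sketch: the function name `recreated_bridges`, the regular expression, and the `SAMPLE_LOG` constant are hypothetical helpers written for illustration, not part of os-net-config or Open vSwitch. It assumes the syslog format shown in this report, where the bridge name directly follows `del-br` or `add-br`.

```python
import re

# Captures the operation and the bridge name that directly follows it.
# Assumes the "Called as ovs-vsctl ..." syslog format quoted in this report.
BRIDGE_OP = re.compile(r"\b(del-br|add-br)\s+(\S+)")


def recreated_bridges(log_lines):
    """Return bridges that were deleted and later added again, in first-seen order."""
    deleted = set()
    recreated = []
    for line in log_lines:
        for op, bridge in BRIDGE_OP.findall(line):
            if op == "del-br":
                deleted.add(bridge)
            elif bridge in deleted and bridge not in recreated:
                # add-br for a bridge that was deleted earlier in the log
                recreated.append(bridge)
    return recreated


# The two log lines quoted in this report.
SAMPLE_LOG = [
    "Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as "
    "ovs-vsctl -t 10 -- --if-exists del-br br-ex",
    "Jun 12 17:42:47 networker-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as "
    "ovs-vsctl -t 10 -- --may-exist add-br br-ex -- set bridge br-ex "
    "other-config:hwaddr=52:54:00:b9:fc:e0 -- set bridge br-ex "
    "fail_mode=standalone -- del-controller br-ex",
]

print(recreated_bridges(SAMPLE_LOG))
```

For the two lines quoted here the sketch reports br-ex. A bridge flagged this way is a candidate for the neutron_ovs_agent restart workaround, since its flows may not have been reprogrammed.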
I've marked https://bugzilla.redhat.com/show_bug.cgi?id=1576286 as a blocker, we'll merge the fix right now.

@dalvarez: this was moved to POST but there is no tracker? Can you confirm that https://review.openstack.org/#/c/567145/ should fix this? If so, can we add it as a tracker?

The tracker is in https://bugzilla.redhat.com/show_bug.cgi?id=1576286 (which has the blocker+ flag). We should probably mark this one as depending on #1576286, or maybe as a duplicate?

@Bernard: I wouldn't mark it as a duplicate, as in fact those are different issues where one is a result of the other. So IMO "depends on" would be better here.

Agreed, depends on is better.

This is *not* a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1576286 because in this RHBZ, we're seeing two issues:
1) Director upgrade via os-net-config is restarting ifcfg files, which also deletes and recreates br-ex
2) If (1) happens, the OVS doesn't reprogram flows on br-ex
This RHBZ should track (1), while https://bugzilla.redhat.com/show_bug.cgi?id=1576286 is tracking (2).

In light of comment 8 I'm moving this to HardProv DFG.

I'd like to get some info on the configuration prior to upgrade. For example, were the old-style nic config files being used, and did you need to change to the new-style configs (which is required in OSP-13)? Can you provide the nic configs and network environment files before and after upgrade (if different, otherwise just before)? Also, what was the deployment command that was run on upgrade (i.e. what files were included), and has that changed from the initial deployment?

We believe that all interfaces and bridges are getting restarted on upgrade because the order of parameters in the ifcfg files has changed slightly in Queens due to this change - https://review.openstack.org/#/c/485132/9/os_net_config/impl_ifcfg.py.
Here is an OSP-12 ifcfg file:
[root@networker-1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
DEVICE=br-ex
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
<snip>

Here is an OSP-13 ifcfg file:
# This file is autogenerated by os-net-config
DEVICE=br-ex
HOTPLUG=no
ONBOOT=yes <=== different location
NM_CONTROLLED=no
DEVICETYPE=ovs
TYPE=OVSBridge
<snip>

os-net-config does a file diff between the existing ifcfg and what it intends to write and would treat this as a change requiring restart of devices. There is a patch upstream: https://review.openstack.org/#/c/575220/

FWIW in the upgrade tasks there is a workaround that prevents os-net-config from triggering the ifcfg restarts (running os-net-config with --no-activate option):
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L84-L90
But in case of the pre-deployed servers os-net-config gets updated before the upgrade tasks by:
https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/deployed-server/deployed-server-bootstrap-rhel.sh#L5-L12
Hence the following workaround condition fails (os-net-config is already updated at the upgrade tasks time):
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L93
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L74-L77

Verified with os-net-config-8.4.1-4.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:2086
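
The restart trigger discussed in the comments above (a file diff that treats a reordered ifcfg as changed) can be illustrated with a small sketch. This is not os-net-config's actual code, and it is not necessarily how the upstream patch addresses the problem. The helper names `parse_ifcfg`, `differs_textually` and `differs_by_settings` are hypothetical. The two sample files are abbreviated from the excerpts in this report and deliberately carry the same keys, so that only the position of ONBOOT differs.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


def differs_textually(old, new):
    """Order-sensitive comparison: any byte difference counts as a change."""
    return old != new


def differs_by_settings(old, new):
    """Order-insensitive comparison: only the key/value pairs matter."""
    return parse_ifcfg(old) != parse_ifcfg(new)


# Abbreviated samples (assumption: identical keys, ONBOOT moved).
OSP12_IFCFG = """# This file is autogenerated by os-net-config
DEVICE=br-ex
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
DEVICETYPE=ovs
TYPE=OVSBridge
"""

OSP13_IFCFG = """# This file is autogenerated by os-net-config
DEVICE=br-ex
HOTPLUG=no
ONBOOT=yes
NM_CONTROLLED=no
DEVICETYPE=ovs
TYPE=OVSBridge
"""

print("textual diff sees a change:", differs_textually(OSP12_IFCFG, OSP13_IFCFG))
print("settings diff sees a change:", differs_by_settings(OSP12_IFCFG, OSP13_IFCFG))
```

The textual comparison flags the two files as different even though every setting is identical, which matches the behavior reported here: a pure reordering is enough to make the interface look changed and therefore subject to restart.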