Bug 1884101 - [SDN] upgrade from 4.5 to 4.6 fails, systemd openvswitch.service is not enabled, ovs-vswitchd is not running on the host
Summary: [SDN] upgrade from 4.5 to 4.6 fails, systemd openvswitch.service is not enabled, ovs-vswitchd is not running on the host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-01 02:48 UTC by Ross Brattain
Modified: 2020-11-28 12:48 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:47:06 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 816 | 0 | None | closed | Bug 1884101: Fixes systemd ovs check for ovn/sdn | 2021-02-08 10:01:54 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:47:22 UTC

Description Ross Brattain 2020-10-01 02:48:03 UTC
Description of problem:

The upgrade fails: OVS is not running, so the network is not functional.


Upgraded from 4.5.0-0.nightly-2020-09-28-124031 to 4.6.0-0.nightly-2020-09-30-145011 on Azure.

The change from https://bugzilla.redhat.com/show_bug.cgi?id=1874696 changed the fallback of running OVS in a container, so we now fail if openvswitch.service is not enabled.

During the upgrade, some of the nodes that switch to host OVS get stuck.

Upgrading from 4.5.0-0.nightly-2020-09-28-124031 to 4.6.0-0.nightly-2020-09-30-091659 succeeded on AWS.
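
For context, the 4.6 ovs DaemonSet entrypoint decides between the host systemd service and the containerized fallback (compare the "openvswitch is running in container" and "openvswitch is running in systemd" lines in the logs below). A minimal sketch of that kind of check, with simplified logic assumed here for illustration rather than taken from the actual cluster-network-operator script:

# Hypothetical sketch only -- not the real entrypoint logic.
if systemctl is-enabled openvswitch.service >/dev/null 2>&1; then
    echo "openvswitch is running in systemd"
    # Defer to the host units; do not start ovsdb-server/ovs-vswitchd in the pod.
else
    echo "openvswitch is running in container"
    # Pre-4.6 style fallback: run ovsdb-server and ovs-vswitchd inside the pod.
fi

In the logs below the node ends up on the systemd path while systemctl shows openvswitch.service disabled and ovs-vswitchd inactive, which matches the reported symptom.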


Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-30-145011

How reproducible:


Steps to Reproduce:
1. Upgrade from 4.5.0-0.nightly-2020-09-28-124031 to 4.6.0-0.nightly-2020-09-30-145011 on Azure
2.
3.

Actual results:

Nodes are stuck in SchedulingDisabled, openvswitch.service is not enabled, ovs-vswitchd is not running

Expected results:

openvswitch.service is enabled, ovs-vswitchd is running


Additional info:




sh-4.4# systemctl status openvswitch
● openvswitch.service - Open vSwitch
Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
Active: inactive (dead)

sh-4.4# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ovs-vswitchd.service.d
           └─10-ovs-vswitchd-restart.conf
   Active: inactive (dead)

sh-4.4# ls -l /etc/systemd/system/multi-user.target.wants/openvswitch.service
ls: cannot access '/etc/systemd/system/multi-user.target.wants/openvswitch.service': No such file or directory
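
For comparison, the expected state on a healthy 4.6 node can be checked with generic systemd commands (nothing here is specific to the fix; the enable line is only a possible manual workaround, not something verified in this bug):

systemctl is-enabled openvswitch.service    # should report "enabled"
systemctl is-active ovs-vswitchd.service    # should report "active"
systemctl enable --now openvswitch.service  # possible manual workaround; the proper fix is in the operator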


ovs container logs

openvswitch is running in container
Starting ovsdb-server.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
Configuring Open vSwitch system IDs.
Enabling remote OVSDB managers.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
Starting ovs-vswitchd.
Enabling remote OVSDB managers.
2020-09-30 21:11:30 info: Loading previous flows ...
2020-09-30 21:11:30 info: Adding br0 if it doesn't exist ...
2020-09-30 21:11:30 info: Created br0, now adding flows ...
+ ovs-ofctl add-tlv-map br0 ''
2020-09-30T21:11:30Z|00001|vconn|WARN|unix:/var/run/openvswitch/br0.mgmt: version negotiation failed (we support version 0x01, peer supports version 0x04)
ovs-ofctl: br0: failed to connect to socket (Broken pipe)
+ ovs-ofctl -O OpenFlow13 add-groups br0 /var/run/openvswitch/ovs-save.nVSt9McrJW/br0.groups.dump
+ ovs-ofctl -O OpenFlow13 replace-flows br0 /var/run/openvswitch/ovs-save.nVSt9McrJW/br0.flows.dump
+ rm -rf /var/run/openvswitch/ovs-save.nVSt9McrJW
2020-09-30 21:11:30 info: Done restoring the existing flows ...
2020-09-30 21:11:30 info: Remove other config ...
2020-09-30 21:11:30 info: Removed other config ...
2020-09-30T21:11:29.736Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-09-30T21:11:29.741Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.11.5
2020-09-30T21:11:29.748Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2020-09-30T21:11:29.748Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2020-09-30T21:11:30.131Z|00031|bridge|INFO|bridge br0: added interface vethec1140a0 on port 4
2020-09-30T21:11:30.132Z|00032|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-09-30T21:11:30.132Z|00033|bridge|INFO|bridge br0: added interface vetha0b45f6d on port 6
2020-09-30T21:11:30.132Z|00034|bridge|INFO|bridge br0: added interface vethe77d62ce on port 10
2020-09-30T21:11:30.132Z|00035|bridge|INFO|bridge br0: using datapath ID 00001a0aabc20744
2020-09-30T21:11:30.132Z|00036|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt"
2020-09-30T21:11:30.135Z|00037|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.11.5
2020-09-30T21:11:30.197Z|00038|vconn|WARN|unix#0: version negotiation failed (we support version 0x04, peer supports version 0x01)
2020-09-30T21:11:30.197Z|00039|rconn|WARN|br0<->unix#0: connection dropped (Protocol error)
2020-09-30T21:11:30.252Z|00040|connmgr|INFO|br0<->unix#6: 111 flow_mods in the last 0 s (111 adds)
2020-09-30T21:11:39.747Z|00005|memory|INFO|7496 kB peak resident set size after 10.0 seconds
2020-09-30T21:11:39.747Z|00006|memory|INFO|cells:652 json-caches:1 monitors:2 sessions:2
2020-09-30T21:11:40.138Z|00041|memory|INFO|59596 kB peak resident set size after 10.3 seconds
2020-09-30T21:11:40.138Z|00042|memory|INFO|handlers:1 ports:10 revalidators:1 rules:115 udpif keys:132
2020-09-30T21:18:15.278Z|00043|connmgr|INFO|br0<->unix#58: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:18:15.309Z|00044|connmgr|INFO|br0<->unix#61: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:18:15.339Z|00045|bridge|INFO|bridge br0: deleted interface veth956fb903 on port 3
2020-09-30T21:18:26.104Z|00046|bridge|INFO|bridge br0: added interface vethd3e3323a on port 12
2020-09-30T21:18:26.142Z|00047|connmgr|INFO|br0<->unix#64: 5 flow_mods in the last 0 s (5 adds)
2020-09-30T21:18:26.183Z|00048|connmgr|INFO|br0<->unix#67: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:05.860Z|00049|connmgr|INFO|br0<->unix#132: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:06.011Z|00050|connmgr|INFO|br0<->unix#137: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:28:06.121Z|00051|bridge|INFO|bridge br0: deleted interface vetha0b45f6d on port 6
2020-09-30T21:28:06.256Z|00052|connmgr|INFO|br0<->unix#141: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:06.400Z|00053|connmgr|INFO|br0<->unix#144: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:28:06.725Z|00054|bridge|INFO|bridge br0: deleted interface veth7add96e2 on port 9
2020-09-30T21:28:06.878Z|00055|connmgr|INFO|br0<->unix#147: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:07.031Z|00056|connmgr|INFO|br0<->unix#150: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:28:07.334Z|00057|bridge|INFO|bridge br0: deleted interface vethec1140a0 on port 4
2020-09-30T21:28:07.471Z|00058|connmgr|INFO|br0<->unix#153: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:07.594Z|00059|connmgr|INFO|br0<->unix#156: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:28:07.675Z|00060|bridge|INFO|bridge br0: deleted interface vethe77d62ce on port 10
2020-09-30T21:28:08.166Z|00061|connmgr|INFO|br0<->unix#159: 2 flow_mods in the last 0 s (2 deletes)
2020-09-30T21:28:08.249Z|00062|connmgr|INFO|br0<->unix#162: 4 flow_mods in the last 0 s (4 deletes)
2020-09-30T21:28:08.376Z|00063|bridge|INFO|bridge br0: deleted interface vethac02a791 on port 8
2020-09-30 21:28:16 info: Saving flows ...
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
rm: cannot remove '/var/run/openvswitch/ovs-vswitchd.pid': No such file or directory
openvswitch is running in systemd
(objectpath '/org/freedesktop/systemd1/job/796',)
tail: cannot open '/host/var/log/openvswitch/ovs-vswitchd.log' for reading: No such file or directory
tail: cannot open '/host/var/log/openvswitch/ovsdb-server.log' for reading: No such file or directory
tail: '/host/var/log/openvswitch/ovsdb-server.log' has appeared;  following new file
2020-09-30T21:28:56.511Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-09-30T21:28:56.518Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-09-30T21:28:58.661Z|00003|jsonrpc|WARN|unix#4: receive error: Connection reset by peer
2020-09-30T21:28:58.661Z|00004|reconnect|WARN|unix#4: connection dropped (Connection reset by peer)
2020-09-30T21:29:00.177Z|00005|jsonrpc|WARN|unix#7: receive error: Connection reset by peer
2020-09-30T21:29:00.177Z|00006|reconnect|WARN|unix#7: connection dropped (Connection reset by peer)
2020-09-30T21:29:06.526Z|00007|memory|INFO|7640 kB peak resident set size after 10.0 seconds
2020-09-30T21:29:06.526Z|00008|memory|INFO|cells:122 monitors:2 sessions:1
2020-09-30T21:29:44.579Z|00009|jsonrpc|WARN|unix#19: receive error: Connection reset by peer
2020-09-30T21:29:44.579Z|00010|reconnect|WARN|unix#19: connection dropped (Connection reset by peer)
2020-09-30T21:29:47.487Z|00011|jsonrpc|WARN|unix#21: receive error: Connection reset by peer
2020-09-30T21:29:47.487Z|00012|reconnect|WARN|unix#21: connection dropped (Connection reset by peer)
2020-09-30T21:29:52.488Z|00013|jsonrpc|WARN|unix#22: receive error: Connection reset by peer
2020-09-30T21:29:52.488Z|00014|reconnect|WARN|unix#22: connection dropped (Connection reset by peer)
2020-09-30T21:29:57.488Z|00015|jsonrpc|WARN|unix#23: receive error: Connection reset by peer
2020-09-30T21:29:57.488Z|00016|reconnect|WARN|unix#23: connection dropped (Connection reset by peer)
2020-09-30T21:30:02.484Z|00017|jsonrpc|WARN|unix#24: receive error: Connection reset by peer
2020-09-30T21:30:02.484Z|00018|reconnect|WARN|unix#24: connection dropped (Connection reset by peer)
2020-09-30T21:30:07.487Z|00019|jsonrpc|WARN|unix#25: receive error: Connection reset by peer
2020-09-30T21:30:07.487Z|00020|reconnect|WARN|unix#25: connection dropped (Connection reset by peer)
2020-09-30T21:30:12.494Z|00021|jsonrpc|WARN|unix#26: receive erro
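
As an aside, the "version negotiation failed (we support version 0x01, peer supports version 0x04)" warning around the add-tlv-map call above appears to be ovs-ofctl defaulting to OpenFlow 1.0 while br0 only accepts OpenFlow 1.3; passing the protocol explicitly, as the later add-groups/replace-flows calls already do, would avoid it:

ovs-ofctl -O OpenFlow13 add-tlv-map br0 ''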

Comment 1 Antonio Murdaca 2020-10-01 08:39:00 UTC
Not sure why this is filed against MCO since it has been assessed as a network/OVN issue; moving it there.

Comment 3 Dan Williams 2020-10-02 04:48:00 UTC
*** Bug 1883521 has been marked as a duplicate of this bug. ***

Comment 5 Ross Brattain 2020-10-09 00:18:38 UTC
Verified the upgrade from 4.5.14 to 4.6.0-0.nightly-2020-10-08-043318 on:

upi-on-vsphere/versioned-installer-vsphere_slave
ipi-on-osp/versioned-installer-https_proxy-etcd_encryption-ci

Comment 7 errata-xmlrpc 2020-10-27 16:47:06 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

