This bug was initially created as a copy of Bug #1885848

I am copying this bug because:

Description of problem:
Downgrade (4.6.0-0.nightly-2020-10-05-234751 -> 4.5.0-0.nightly-2020-10-05-204452) is stuck on the network operator.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-10-05-234751
4.5.0-0.nightly-2020-10-05-204452

How reproducible:
Always

Steps to Reproduce:
1. Install a 4.6 cluster
2. Downgrade the cluster version from 4.6 to 4.5
3.

Actual results:
Downgrade is stuck on the network operator.

Expected results:
Downgrade should succeed.

Additional info:
$ oc logs ovs-6xdx4
ovsdb-server: /var/run/openvswitch/ovsdb-server.pid: pidfile check failed (No such process), aborting
Starting ovsdb-server ... failed!
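For reference, the downgrade in step 2 is typically forced by pointing the cluster at the older release image explicitly; a rough sketch, assuming the 4.5 nightly's release image pullspec is at hand (the pullspec placeholder below is not taken from this report):

$ oc adm upgrade --to-image=<4.5.0-0.nightly release image pullspec> \
      --allow-explicit-upgrade --force
$ oc get clusterversion   # watch progress; here the downgrade stalls on the network operator
$ oc get co network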
ovs pod crashed when downgrading from 4.6.0-0.nightly-2020-10-12-223649 to 4.5.0-0.nightly-2020-10-10-013307

$ oc get pod -n openshift-sdn
NAME                   READY   STATUS             RESTARTS   AGE
ovs-4l7d8              1/1     Running            0          111m
ovs-9k6qp              1/1     Running            0          105m
ovs-9l9kc              1/1     Running            0          105m
ovs-dcv72              0/1     CrashLoopBackOff   11         29m
ovs-ms9d9              1/1     Running            0          111m
ovs-qg4k7              1/1     Running            0          111m
sdn-2fk22              1/1     Running            0          29m
sdn-4qq2v              1/1     Running            0          29m
sdn-5gvbh              1/1     Running            0          28m
sdn-5wd6b              1/1     Running            0          29m
sdn-controller-6c2j9   1/1     Running            0          29m
sdn-controller-g74ll   1/1     Running            0          29m
sdn-controller-z9dmk   1/1     Running            0          29m
sdn-l2kmw              1/1     Running            0          29m
sdn-n57x8              1/1     Running            0          28m

$ oc logs ovs-dcv72 -n openshift-sdn
openvswitch is running in systemd
rm: cannot remove '/var/run/openvswitch/flows.sh': No such file or directory

==> /host/var/log/openvswitch/ovs-vswitchd.log <==
2020-10-13T07:48:36.295Z|00300|connmgr|INFO|br0<->unix#1033: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:48:38.176Z|00301|bridge|INFO|bridge br0: added interface veth00c406f2 on port 36
2020-10-13T07:48:38.221Z|00302|connmgr|INFO|br0<->unix#1036: 5 flow_mods in the last 0 s (5 adds)
2020-10-13T07:48:38.256Z|00303|connmgr|INFO|br0<->unix#1039: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.697Z|00304|connmgr|INFO|br0<->unix#1048: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.723Z|00305|connmgr|INFO|br0<->unix#1051: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:37.742Z|00306|bridge|INFO|bridge br0: deleted interface veth32348243 on port 33
2020-10-13T07:49:47.016Z|00307|connmgr|INFO|br0<->unix#1054: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:47.043Z|00308|connmgr|INFO|br0<->unix#1057: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:47.063Z|00309|bridge|INFO|bridge br0: deleted interface vethb1890334 on port 34

==> /host/var/log/openvswitch/ovsdb-server.log <==
2020-10-13T06:11:45.485Z|00021|jsonrpc|WARN|unix#49: send error: Broken pipe
2020-10-13T06:11:45.485Z|00022|reconnect|WARN|unix#49: connection dropped (Broken pipe)
2020-10-13T06:11:45.568Z|00023|jsonrpc|WARN|unix#51: send error: Broken pipe
2020-10-13T06:11:45.568Z|00024|reconnect|WARN|unix#51: connection dropped (Broken pipe)
2020-10-13T06:27:10.031Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-10-13T06:27:10.047Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-10-13T06:27:20.059Z|00003|memory|INFO|6292 kB peak resident set size after 10.0 seconds
2020-10-13T06:27:20.059Z|00004|memory|INFO|cells:358 monitors:3 sessions:2
2020-10-13T06:27:22.865Z|00005|jsonrpc|WARN|unix#22: send error: Broken pipe
2020-10-13T06:27:22.865Z|00006|reconnect|WARN|unix#22: connection dropped (Broken pipe)

$ oc describe pod ovs-dcv72 -n openshift-sdn
Name:                 ovs-dcv72
Namespace:            openshift-sdn
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-136-200.us-east-2.compute.internal/10.0.136.200
Start Time:           Tue, 13 Oct 2020 15:27:28 +0800
Labels:               app=ovs
                      component=network
                      controller-revision-hash=774dd84995
                      openshift.io/component=network
                      pod-template-generation=2
                      type=infra
Annotations:          <none>
Status:               Running
IP:                   10.0.136.200
IPs:
  IP:           10.0.136.200
Controlled By:  DaemonSet/ovs
Containers:
  openvswitch:
    Container ID:  cri-o://e6d53f5a2b387e3d0e9d0c5924477970c787e9d80f870b163c1df4f095db29f3
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:94012d3b73f7c59f93c8fb04eb85d25b85437b3eea72765166253d6ba79b8a34
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:94012d3b73f7c59f93c8fb04eb85d25b85437b3eea72765166253d6ba79b8a34
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -euo pipefail
      chown -R openvswitch:openvswitch /var/run/openvswitch
      chown -R openvswitch:openvswitch /etc/openvswitch
      if [ -f /host/var/run/ovs-config-executed ]; then
        echo "openvswitch is running in systemd"
        # Don't need to worry about restoring flows; this can only change if we've rebooted
        rm /var/run/openvswitch/flows.sh || true
        exec tail -F /host/var/log/openvswitch/ovs-vswitchd.log /host/var/log/openvswitch/ovsdb-server.log # executes forever
      fi

      # if another process is listening on the cni-server socket, wait until it exits
      retries=0
      while true; do
        if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
          echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
          sleep 15 & wait
          (( retries += 1 ))
        else
          break
        fi
        if [[ "${retries}" -gt 40 ]]; then
          echo "error: Another process is currently managing OVS, exiting" 2>&1
          exit 1
        fi
      done

      function quit {
          # Save the flows
          echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Saving flows ..." 2>&1
          bridges=$(ovs-vsctl -- --real list-br)
          TMPDIR=/var/run/openvswitch /usr/share/openvswitch/scripts/ovs-save save-flows $bridges > /var/run/openvswitch/flows.sh
          echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Saved flows" 2>&1
          # Don't allow ovs-vswitchd to clear datapath flows on exit
          kill -9 $(cat /var/run/openvswitch/ovs-vswitchd.pid 2>/dev/null) 2>/dev/null || true
          kill $(cat /var/run/openvswitch/ovsdb-server.pid 2>/dev/null) 2>/dev/null || true
          exit 0
      }
      trap quit SIGTERM

      # launch OVS
      # Start the ovsdb so that we can prep it before we start the ovs-vswitchd
      /usr/share/openvswitch/scripts/ovs-ctl start --ovs-user=openvswitch:openvswitch --no-ovs-vswitchd --system-id=random --no-monitor

      # Set the flow-restore-wait to true so ovs-vswitchd will wait till flows are restored
      ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait=true

      # Restrict the number of pthreads ovs-vswitchd creates to reduce the
      # amount of RSS it uses on hosts with many cores
      # https://bugzilla.redhat.com/show_bug.cgi?id=1571379
      # https://bugzilla.redhat.com/show_bug.cgi?id=1572797
      if [[ `nproc` -gt 12 ]]; then
          ovs-vsctl --no-wait set Open_vSwitch . other_config:n-revalidator-threads=4
          ovs-vsctl --no-wait set Open_vSwitch . other_config:n-handler-threads=10
      fi

      # And finally start the ovs-vswitchd now the DB is prepped
      /usr/share/openvswitch/scripts/ovs-ctl start --ovs-user=openvswitch:openvswitch --no-ovsdb-server --system-id=random --no-monitor

      # Load any flows that we saved
      echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Loading previous flows ..." 2>&1
      if [[ -f /var/run/openvswitch/flows.sh ]]; then
        echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Adding br0 if it doesn't exist ..." 2>&1
        /usr/bin/ovs-vsctl --may-exist add-br br0 -- set Bridge br0 fail_mode=secure protocols=OpenFlow13
        echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Created br0, now adding flows ..." 2>&1
        mv /var/run/openvswitch/flows.sh /var/run/openvswitch/flows-old.sh
        sh -x /var/run/openvswitch/flows-old.sh
        echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Done restoring the existing flows ..." 2>&1
        rm /var/run/openvswitch/flows-old.sh
      fi

      echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Remove other config ..." 2>&1
      ovs-vsctl --no-wait --if-exists remove Open_vSwitch . other_config flow-restore-wait=true
      echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Removed other config ..." 2>&1
      tail -F --pid=$(cat /var/run/openvswitch/ovs-vswitchd.pid) /var/log/openvswitch/ovs-vswitchd.log &
      tail -F --pid=$(cat /var/run/openvswitch/ovsdb-server.pid) /var/log/openvswitch/ovsdb-server.log &
      wait

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      penvswitch is running in systemd
rm: cannot remove '/var/run/openvswitch/flows.sh': No such file or directory

==> /host/var/log/openvswitch/ovs-vswitchd.log <==
2020-10-13T07:48:36.295Z|00300|connmgr|INFO|br0<->unix#1033: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:48:38.176Z|00301|bridge|INFO|bridge br0: added interface veth00c406f2 on port 36
2020-10-13T07:48:38.221Z|00302|connmgr|INFO|br0<->unix#1036: 5 flow_mods in the last 0 s (5 adds)
2020-10-13T07:48:38.256Z|00303|connmgr|INFO|br0<->unix#1039: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.697Z|00304|connmgr|INFO|br0<->unix#1048: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.723Z|00305|connmgr|INFO|br0<->unix#1051: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:37.742Z|00306|bridge|INFO|bridge br0: deleted interface veth32348243 on port 33
2020-10-13T07:49:47.016Z|00307|connmgr|INFO|br0<->unix#1054: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:47.043Z|00308|connmgr|INFO|br0<->unix#1057: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:47.063Z|00309|bridge|INFO|bridge br0: deleted interface vethb1890334 on port 34

==> /host/var/log/openvswitch/ovsdb-server.log <==
2020-10-13T06:11:45.485Z|00021|jsonrpc|WARN|unix#49: send error: Broken pipe
2020-10-13T06:11:45.485Z|00022|reconnect|WARN|unix#49: connection dropped (Broken pipe)
2020-10-13T06:11:45.568Z|00023|jsonrpc|WARN|unix#51: send error: Broken pipe
2020-10-13T06:11:45.568Z|00024|reconnect|WARN|unix#51: connection dropped (Broken pipe)
2020-10-13T06:27:10.031Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-10-13T06:27:10.047Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-10-13T06:27:20.059Z|00003|memory|INFO|6292 kB peak resident set size after 10.0 seconds
2020-10-13T06:27:20.059Z|00004|memory|INFO|cells:358 monitors:3 sessions:2
2020-10-13T06:27:22.865Z|00005|jsonrpc|WARN|unix#22: send error: Broken pipe
2020-10-13T06:27:22.865Z|00006|reconnect|WARN|unix#22: connection dropped (Broken pipe)

      Exit Code:    137
      Started:      Tue, 13 Oct 2020 15:52:13 +0800
      Finished:     Tue, 13 Oct 2020 15:53:08 +0800
    Ready:          False
    Restart Count:  11
    Requests:
      cpu:     100m
      memory:  400Mi
    Liveness:   exec [/bin/bash -c #!/bin/bash
      /usr/bin/ovs-appctl -T 5 ofproto/list > /dev/null &&
      /usr/bin/ovs-vsctl -t 5 show > /dev/null &&
      if /usr/bin/ovs-vsctl -t 5 br-exists br0; then
        /usr/bin/ovs-ofctl -t 5 -O OpenFlow13 probe br0;
      else
        true;
      fi
      ] delay=15s timeout=21s period=5s #success=1 #failure=3
    Readiness:  exec [/bin/bash -c #!/bin/bash
      /usr/share/openvswitch/scripts/ovs-ctl status > /dev/null &&
      /usr/bin/ovs-appctl -T 5 ofproto/list > /dev/null &&
      /usr/bin/ovs-vsctl -t 5 show > /dev/null
      ] delay=15s timeout=11s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/openvswitch from host-config-openvswitch (rw)
      /host from host-slash (ro)
      /lib/modules from host-modules (ro)
      /run/openvswitch from host-run-ovs (rw)
      /sys from host-sys (ro)
      /var/run/openvswitch from host-run-ovs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from sdn-token-4v95v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
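The termination message above shows the startup script taking its "openvswitch is running in systemd" branch (in 4.6, OVS runs as a systemd service on the host rather than inside this DaemonSet pod) before the container exits with code 137 (SIGKILL), consistent with the kubelet killing it after failed liveness probes. A quick way to confirm what is actually managing OVS on the affected node is sketched below; this is a hypothetical debug session, not output from this report, and the systemd unit names are assumed from the RHEL openvswitch packaging:

$ oc debug node/ip-10-0-136-200.us-east-2.compute.internal
sh-4.4# chroot /host
sh-4.4# systemctl status ovsdb-server ovs-vswitchd   # host-level OVS services
sh-4.4# ls -l /var/run/ovs-config-executed           # marker file the container script checks
sh-4.4# ls -l /var/run/openvswitch/*.pid             # pidfiles referenced by the script and probes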
I hit the same issue when downgrading from 4.6 to the latest 4.5 release to verify a storage downgrade issue (the storage part passed).
Aha, I suspect I see the issue. Do you have a cluster in this state for me to test something? Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.16 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4268