Bug 1886127
| Summary: | 4.5 clusters should handle systemd openvswitch from a 4.6 downgrade | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Casey Callendrello <cdc> |
| Component: | Networking | Assignee: | Casey Callendrello <cdc> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | unspecified | CC: | bbennett, wduan, wking |
| Version: | 4.5 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-26 15:11:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1885848, 1886148 | | |
| Bug Blocks: | 1914958 | | |
Description
Casey Callendrello
2020-10-07 17:15:47 UTC
The ovs pod crashed when downgrading from 4.6.0-0.nightly-2020-10-12-223649 to 4.5.0-0.nightly-2020-10-10-013307:
oc get pod -n openshift-sdn
NAME READY STATUS RESTARTS AGE
ovs-4l7d8 1/1 Running 0 111m
ovs-9k6qp 1/1 Running 0 105m
ovs-9l9kc 1/1 Running 0 105m
ovs-dcv72 0/1 CrashLoopBackOff 11 29m
ovs-ms9d9 1/1 Running 0 111m
ovs-qg4k7 1/1 Running 0 111m
sdn-2fk22 1/1 Running 0 29m
sdn-4qq2v 1/1 Running 0 29m
sdn-5gvbh 1/1 Running 0 28m
sdn-5wd6b 1/1 Running 0 29m
sdn-controller-6c2j9 1/1 Running 0 29m
sdn-controller-g74ll 1/1 Running 0 29m
sdn-controller-z9dmk 1/1 Running 0 29m
sdn-l2kmw 1/1 Running 0 29m
sdn-n57x8 1/1 Running 0 28m
$ oc logs ovs-dcv72 -n openshift-sdn
openvswitch is running in systemd
rm: cannot remove '/var/run/openvswitch/flows.sh': No such file or directory
==> /host/var/log/openvswitch/ovs-vswitchd.log <==
2020-10-13T07:48:36.295Z|00300|connmgr|INFO|br0<->unix#1033: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:48:38.176Z|00301|bridge|INFO|bridge br0: added interface veth00c406f2 on port 36
2020-10-13T07:48:38.221Z|00302|connmgr|INFO|br0<->unix#1036: 5 flow_mods in the last 0 s (5 adds)
2020-10-13T07:48:38.256Z|00303|connmgr|INFO|br0<->unix#1039: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.697Z|00304|connmgr|INFO|br0<->unix#1048: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.723Z|00305|connmgr|INFO|br0<->unix#1051: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:37.742Z|00306|bridge|INFO|bridge br0: deleted interface veth32348243 on port 33
2020-10-13T07:49:47.016Z|00307|connmgr|INFO|br0<->unix#1054: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:47.043Z|00308|connmgr|INFO|br0<->unix#1057: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:47.063Z|00309|bridge|INFO|bridge br0: deleted interface vethb1890334 on port 34
==> /host/var/log/openvswitch/ovsdb-server.log <==
2020-10-13T06:11:45.485Z|00021|jsonrpc|WARN|unix#49: send error: Broken pipe
2020-10-13T06:11:45.485Z|00022|reconnect|WARN|unix#49: connection dropped (Broken pipe)
2020-10-13T06:11:45.568Z|00023|jsonrpc|WARN|unix#51: send error: Broken pipe
2020-10-13T06:11:45.568Z|00024|reconnect|WARN|unix#51: connection dropped (Broken pipe)
2020-10-13T06:27:10.031Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-10-13T06:27:10.047Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-10-13T06:27:20.059Z|00003|memory|INFO|6292 kB peak resident set size after 10.0 seconds
2020-10-13T06:27:20.059Z|00004|memory|INFO|cells:358 monitors:3 sessions:2
2020-10-13T06:27:22.865Z|00005|jsonrpc|WARN|unix#22: send error: Broken pipe
2020-10-13T06:27:22.865Z|00006|reconnect|WARN|unix#22: connection dropped (Broken pipe)
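The "openvswitch is running in systemd" line means the container found the /host/var/run/ovs-config-executed marker left over from 4.6 and took the log-tailing branch of its entrypoint (visible in the pod spec below). To confirm that OVS really is managed by systemd on the affected node, a check along these lines works; the unit names are the standard Open vSwitch services on RHCOS, so treat the exact commands as illustrative rather than a documented procedure:

$ oc debug node/ip-10-0-136-200.us-east-2.compute.internal -- chroot /host ls -l /var/run/ovs-config-executed
$ oc debug node/ip-10-0-136-200.us-east-2.compute.internal -- chroot /host systemctl is-active ovsdb-server ovs-vswitchd

If the marker file exists and the units report "active", the host (still configured by 4.6) is running OVS itself and the 4.5 ovs pod is only supposed to tail its logs.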
oc describe pod ovs-dcv72 -n openshift-sdn
Name: ovs-dcv72
Namespace: openshift-sdn
Priority: 2000001000
Priority Class Name: system-node-critical
Node: ip-10-0-136-200.us-east-2.compute.internal/10.0.136.200
Start Time: Tue, 13 Oct 2020 15:27:28 +0800
Labels: app=ovs
component=network
controller-revision-hash=774dd84995
openshift.io/component=network
pod-template-generation=2
type=infra
Annotations: <none>
Status: Running
IP: 10.0.136.200
IPs:
IP: 10.0.136.200
Controlled By: DaemonSet/ovs
Containers:
openvswitch:
Container ID: cri-o://e6d53f5a2b387e3d0e9d0c5924477970c787e9d80f870b163c1df4f095db29f3
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:94012d3b73f7c59f93c8fb04eb85d25b85437b3eea72765166253d6ba79b8a34
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:94012d3b73f7c59f93c8fb04eb85d25b85437b3eea72765166253d6ba79b8a34
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
#!/bin/bash
set -euo pipefail
chown -R openvswitch:openvswitch /var/run/openvswitch
chown -R openvswitch:openvswitch /etc/openvswitch
if [ -f /host/var/run/ovs-config-executed ]; then
echo "openvswitch is running in systemd"
# Don't need to worry about restoring flows; this can only change if we've rebooted
rm /var/run/openvswitch/flows.sh || true
exec tail -F /host/var/log/openvswitch/ovs-vswitchd.log /host/var/log/openvswitch/ovsdb-server.log
# executes forever
fi
# if another process is listening on the cni-server socket, wait until it exits
retries=0
while true; do
if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
sleep 15 & wait
(( retries += 1 ))
else
break
fi
if [[ "${retries}" -gt 40 ]]; then
echo "error: Another process is currently managing OVS, exiting" 2>&1
exit 1
fi
done
function quit {
# Save the flows
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Saving flows ..." 2>&1
bridges=$(ovs-vsctl -- --real list-br)
TMPDIR=/var/run/openvswitch /usr/share/openvswitch/scripts/ovs-save save-flows $bridges > /var/run/openvswitch/flows.sh
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Saved flows" 2>&1
# Don't allow ovs-vswitchd to clear datapath flows on exit
kill -9 $(cat /var/run/openvswitch/ovs-vswitchd.pid 2>/dev/null) 2>/dev/null || true
kill $(cat /var/run/openvswitch/ovsdb-server.pid 2>/dev/null) 2>/dev/null || true
exit 0
}
trap quit SIGTERM
# launch OVS
# Start the ovsdb so that we can prep it before we start the ovs-vswitchd
/usr/share/openvswitch/scripts/ovs-ctl start --ovs-user=openvswitch:openvswitch --no-ovs-vswitchd --system-id=random --no-monitor
# Set the flow-restore-wait to true so ovs-vswitchd will wait till flows are restored
ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait=true
# Restrict the number of pthreads ovs-vswitchd creates to reduce the
# amount of RSS it uses on hosts with many cores
# https://bugzilla.redhat.com/show_bug.cgi?id=1571379
# https://bugzilla.redhat.com/show_bug.cgi?id=1572797
if [[ `nproc` -gt 12 ]]; then
ovs-vsctl --no-wait set Open_vSwitch . other_config:n-revalidator-threads=4
ovs-vsctl --no-wait set Open_vSwitch . other_config:n-handler-threads=10
fi
# And finally start the ovs-vswitchd now the DB is prepped
/usr/share/openvswitch/scripts/ovs-ctl start --ovs-user=openvswitch:openvswitch --no-ovsdb-server --system-id=random --no-monitor
# Load any flows that we saved
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Loading previous flows ..." 2>&1
if [[ -f /var/run/openvswitch/flows.sh ]]; then
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Adding br0 if it doesn't exist ..." 2>&1
/usr/bin/ovs-vsctl --may-exist add-br br0 -- set Bridge br0 fail_mode=secure protocols=OpenFlow13
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Created br0, now adding flows ..." 2>&1
mv /var/run/openvswitch/flows.sh /var/run/openvswitch/flows-old.sh
sh -x /var/run/openvswitch/flows-old.sh
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Done restoring the existing flows ..." 2>&1
rm /var/run/openvswitch/flows-old.sh
fi
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Remove other config ..." 2>&1
ovs-vsctl --no-wait --if-exists remove Open_vSwitch . other_config flow-restore-wait=true
echo "$(date -u "+%Y-%m-%d %H:%M:%S") info: Removed other config ..." 2>&1
tail -F --pid=$(cat /var/run/openvswitch/ovs-vswitchd.pid) /var/log/openvswitch/ovs-vswitchd.log &
tail -F --pid=$(cat /var/run/openvswitch/ovsdb-server.pid) /var/log/openvswitch/ovsdb-server.log &
wait
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: penvswitch is running in systemd
rm: cannot remove '/var/run/openvswitch/flows.sh': No such file or directory
==> /host/var/log/openvswitch/ovs-vswitchd.log <==
2020-10-13T07:48:36.295Z|00300|connmgr|INFO|br0<->unix#1033: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:48:38.176Z|00301|bridge|INFO|bridge br0: added interface veth00c406f2 on port 36
2020-10-13T07:48:38.221Z|00302|connmgr|INFO|br0<->unix#1036: 5 flow_mods in the last 0 s (5 adds)
2020-10-13T07:48:38.256Z|00303|connmgr|INFO|br0<->unix#1039: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.697Z|00304|connmgr|INFO|br0<->unix#1048: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:37.723Z|00305|connmgr|INFO|br0<->unix#1051: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:37.742Z|00306|bridge|INFO|bridge br0: deleted interface veth32348243 on port 33
2020-10-13T07:49:47.016Z|00307|connmgr|INFO|br0<->unix#1054: 2 flow_mods in the last 0 s (2 deletes)
2020-10-13T07:49:47.043Z|00308|connmgr|INFO|br0<->unix#1057: 4 flow_mods in the last 0 s (4 deletes)
2020-10-13T07:49:47.063Z|00309|bridge|INFO|bridge br0: deleted interface vethb1890334 on port 34
==> /host/var/log/openvswitch/ovsdb-server.log <==
2020-10-13T06:11:45.485Z|00021|jsonrpc|WARN|unix#49: send error: Broken pipe
2020-10-13T06:11:45.485Z|00022|reconnect|WARN|unix#49: connection dropped (Broken pipe)
2020-10-13T06:11:45.568Z|00023|jsonrpc|WARN|unix#51: send error: Broken pipe
2020-10-13T06:11:45.568Z|00024|reconnect|WARN|unix#51: connection dropped (Broken pipe)
2020-10-13T06:27:10.031Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-10-13T06:27:10.047Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-10-13T06:27:20.059Z|00003|memory|INFO|6292 kB peak resident set size after 10.0 seconds
2020-10-13T06:27:20.059Z|00004|memory|INFO|cells:358 monitors:3 sessions:2
2020-10-13T06:27:22.865Z|00005|jsonrpc|WARN|unix#22: send error: Broken pipe
2020-10-13T06:27:22.865Z|00006|reconnect|WARN|unix#22: connection dropped (Broken pipe)
Exit Code: 137
Started: Tue, 13 Oct 2020 15:52:13 +0800
Finished: Tue, 13 Oct 2020 15:53:08 +0800
Ready: False
Restart Count: 11
Requests:
cpu: 100m
memory: 400Mi
Liveness: exec [/bin/bash -c #!/bin/bash
/usr/bin/ovs-appctl -T 5 ofproto/list > /dev/null &&
/usr/bin/ovs-vsctl -t 5 show > /dev/null &&
if /usr/bin/ovs-vsctl -t 5 br-exists br0; then /usr/bin/ovs-ofctl -t 5 -O OpenFlow13 probe br0; else true; fi
] delay=15s timeout=21s period=5s #success=1 #failure=3
Readiness: exec [/bin/bash -c #!/bin/bash
/usr/share/openvswitch/scripts/ovs-ctl status > /dev/null &&
/usr/bin/ovs-appctl -T 5 ofproto/list > /dev/null &&
/usr/bin/ovs-vsctl -t 5 show > /dev/null
] delay=15s timeout=11s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/openvswitch from host-config-openvswitch (rw)
/host from host-slash (ro)
/lib/modules from host-modules (ro)
/run/openvswitch from host-run-ovs (rw)
/sys from host-sys (ro)
/var/run/openvswitch from host-run-ovs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from sdn-token-4v95v (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
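The exit code 137 and climbing restart count show the container being killed rather than exiting on its own, which is consistent with the liveness probe above failing: the probe still queries ovs-appctl and ovs-vsctl even though the entrypoint has already detected the systemd-managed OVS and is only tailing logs. A minimal sketch of the kind of guard the 4.5 probe needs is below; it reuses the marker file and commands from the pod spec above, but it is an illustration of the idea, not necessarily the exact change that shipped in the fix.

#!/bin/bash
# Illustrative liveness check only; not necessarily the exact fix that shipped.
# When the host marker file exists, OVS is managed by systemd and this pod only
# tails logs, so report healthy without probing the daemons from inside the container.
if [[ -f /host/var/run/ovs-config-executed ]]; then
  exit 0
fi
/usr/bin/ovs-appctl -T 5 ofproto/list > /dev/null &&
/usr/bin/ovs-vsctl -t 5 show > /dev/null &&
if /usr/bin/ovs-vsctl -t 5 br-exists br0; then /usr/bin/ovs-ofctl -t 5 -O OpenFlow13 probe br0; else true; fi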
I met the same issue when downgrading from 4.6 to the latest 4.5 release to verify a storage downgrade issue (storage passed).

Aha, I suspect I see the issue. Do you have a cluster in this state for me to test something? Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.16 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4268
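For completeness, verifying the fix follows the same path as the reproduction: downgrade from a 4.6 nightly to a 4.5 build containing the fix (4.5.16 or later per the advisory) and confirm the ovs pods stay healthy. A rough outline, with the target release payload left as a placeholder:

$ oc adm upgrade --allow-explicit-upgrade --force --to-image <4.5.16-or-later-release-payload>
$ oc get pods -n openshift-sdn -w
# Expect: no ovs-* pod enters CrashLoopBackOff; on nodes still carrying the 4.6 systemd
# OVS configuration the container log should show only "openvswitch is running in systemd"
# followed by the tailed host OVS logs.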