Description of problem:
CNO can get wedged on its surge rollingUpdate during a cluster upgrade if the new pod gets scheduled onto the same master as the existing CNO pod.

$ oc -n openshift-network-operator get pods -o wide
NAME                                READY   STATUS             RESTARTS   AGE   IP               NODE                NOMINATED NODE   READINESS GATES
network-operator-6c95c58b67-9gsb7   0/1     CrashLoopBackOff   9          23m   142.34.194.136   mcs-master-03.dmz   <none>           <none>
network-operator-f88c9fdf9-mh7hq    1/1     Running            0          13d   142.34.194.136   mcs-master-03.dmz   <none>           <none>

$ oc -n openshift-network-operator logs network-operator-6c95c58b67-9gsb7
W0404 17:11:37.101246 1 cmd.go:204] Using insecure, self-signed certificates
I0404 17:11:37.333403 1 observer_polling.go:159] Starting file observer
I0404 17:11:37.374956 1 builder.go:238] network-operator version 4.8.0-202203102349.p0.g9150952.assembly.stream-9150952-9150952e02594242937e5c7a3c8bd073d9f1ada0
F0404 17:11:37.375302 1 cmd.go:129] failed to create listener: failed to listen on 0.0.0.0:9104: listen tcp 0.0.0.0:9104: bind: address already in use

Version-Release number of selected component (if applicable):
4.8.28 to 4.8.35

How reproducible:
Very Random

Steps to Reproduce:
1. Perform cluster upgrade
2.
3.

Actual results:
CNO was stuck; the CrashLoopBackOff pod had to be deleted so that it got scheduled elsewhere.

Expected results:
The new CNO pod should not be scheduled on the same master as the current CNO pod.

Additional info:
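For illustration, the "bind: address already in use" failure in the log above can be reproduced outside the cluster. The sketch below is not CNO code: it is a minimal Python stand-in showing that a second TCP listener cannot bind an address and port that another listener on the same host already holds. The function name `try_bind_twice` is hypothetical, and it uses the loopback address with an OS-chosen port rather than 0.0.0.0:9104 so that it does not collide with anything actually running.

```python
import errno
import socket
from typing import Optional


def try_bind_twice(host: str = "127.0.0.1") -> Optional[int]:
    """Bind two TCP listeners to the same address.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port; this plays the role of
        # the already-running pod that owns the listening port.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # The "new pod" on the same host tries to take the same port.
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    err = try_bind_twice()
    print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)
```

Both pods in the output above report the node's IP (142.34.194.136), which suggests they share the host's network namespace, so the second listener hits the same conflict as the second socket here.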
Verified this bug on 4.11.0-0.nightly-2022-04-26-181148

$ oc get deployment -n openshift-network-operator network-operator -o yaml | grep ports: -A4
        ports:
        - containerPort: 9104
          hostPort: 9104
          name: cno
          protocol: TCP
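As I read the verification output, the deployment now declares hostPort 9104, which should let the scheduler treat the port as a node-level resource and skip nodes where another pod already claims it. The sketch below is a simplified model of that kind of host-port filter, not the scheduler's actual code. The function `feasible_nodes` and the node names `master-a` and `master-b` are hypothetical; only `mcs-master-03.dmz` and port 9104 come from this report.

```python
from typing import Dict, List, Set


def feasible_nodes(used_host_ports: Dict[str, Set[int]],
                   wanted: Set[int]) -> List[str]:
    """Return the nodes on which none of the wanted host ports is in use."""
    return sorted(
        node
        for node, used in used_host_ports.items()
        if not (used & wanted)
    )


# With the hostPort declared, the old pod's claim on 9104 is visible,
# so its node is filtered out for the surge pod.
declared = {
    "mcs-master-03.dmz": {9104},  # node running the existing CNO pod
    "master-a": set(),            # hypothetical node name
    "master-b": set(),            # hypothetical node name
}
with_port = feasible_nodes(declared, {9104})

# Without any declared port, nothing is recorded and nothing is requested,
# so every node looks feasible, including the one that will conflict.
undeclared = {node: set() for node in declared}
without_port = feasible_nodes(undeclared, set())

if __name__ == "__main__":
    print("hostPort declared:  ", with_port)
    print("hostPort undeclared:", without_port)
```

In this model the second case is the situation from the original report: the conflict only shows up at bind time inside the container, after the pod has already been placed.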
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069