Version:
./openshift-install 4.10.0-0.nightly-2022-01-05-052228
built from commit 22d874c8d0751d5645de95121662e32d17d6eada
release image registry.ci.openshift.org/ocp/release@sha256:934dfba08338fbb64926f77950ab69d1fe23d5e1efe3f4ed66aa1740bb181c72
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)

What happened?
The 'cloud-controller-manager' operator does not report the expected VERSION.

What did you expect to happen?
It should report the expected VERSION, as all other operators do.

How to reproduce it (as minimally and precisely as possible)?
Not sure; it is intermittent. We have hit the issue 3 times so far.

Anything else we need to know?

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-05-052228   True        False         64m     Error while reconciling 4.10.0-0.nightly-2022-01-05-052228: cloud-controller-manager has an unknown error: ClusterOperatorUpdating

$ oc get co | grep -Ev '4.10.0-0.nightly-2022-01-05-052228   True        False         False'
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-controller-manager                                        True        False         False      96m
etcd                       4.10.0-0.nightly-2022-01-05-052228   True        True          False      92m     NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 5; 1 nodes are at revision 9
kube-scheduler             4.10.0-0.nightly-2022-01-05-052228   True        True          False      91m     NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 5; 1 nodes are at revision 7

$ oc get pods -n openshift-cloud-controller-manager-operator -o wide
NAME                                                         READY   STATUS             RESTARTS         AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
cluster-cloud-controller-manager-operator-7bbb479445-fk44b   1/2     CrashLoopBackOff   19 (3m11s ago)   79m   10.0.0.212   jiwei-405-j8w4h-master-0   <none>           <none>

$ oc -n openshift-cloud-controller-manager-operator logs cluster-cloud-controller-manager-operator-7bbb479445-fk44b -c cluster-cloud-controller-manager
I0106 09:49:25.929382       1 request.go:665] Waited for 1.047151096s due to client-side throttling, not priority and fairness, request: GET:https://api-int.jiwei-405.alicloud-qe.devcluster.openshift.com:6443/apis/template.openshift.io/v1?timeout=32s
I0106 09:49:27.082311       1 logr.go:249] CCMOperator/controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
E0106 09:49:27.082641       1 logr.go:265] CCMOperator/controller-runtime/metrics "msg"="metrics server failed to listen. You may want to disable the metrics server or use another port if it is due to conflicts" "error"="error listening on :8080: listen tcp :8080: bind: address already in use"
E0106 09:49:27.082702       1 logr.go:265] CCMOperator/setup "msg"="unable to start manager" "error"="error listening on :8080: listen tcp :8080: bind: address already in use"

$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
jiwei-405-j8w4h-master-0                  Ready    master   95m   v1.22.1+6859754
jiwei-405-j8w4h-master-1                  Ready    master   74m   v1.22.1+6859754
jiwei-405-j8w4h-master-2                  Ready    master   97m   v1.22.1+6859754
jiwei-405-j8w4h-worker-us-east-1a-cvvhj   Ready    worker   85m   v1.22.1+6859754
jiwei-405-j8w4h-worker-us-east-1b-qgngd   Ready    worker   85m   v1.22.1+6859754

$ oc debug node/jiwei-405-j8w4h-master-0
Starting pod/jiwei-405-j8w4h-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.212
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# netstat -lnpt | grep 8080
tcp6       0      0 :::8080       :::*      LISTEN      32690/alibaba-cloud
sh-4.4# ps -ef | grep 32690
root       32690   32668  0 08:33 ?        00:00:02 /bin/alibaba-cloud-controller-manager --allow-untagged-cloud=true --leader-elect=true --leader-elect-lease-duration=137s --leader-elect-renew-deadline=107s --leader-elect-retry-period=26s --leader-elect-resource-namespace=openshift-cloud-controller-manager --cloud-provider=alicloud --use-service-account-credentials=true --cloud-config=/etc/alibaba/config/cloud-config.conf --feature-gates=ServiceNodeExclusion=true --configure-cloud-routes=false --allocate-node-cidrs=false
root      140671  140444  0 09:53 ?        00:00:00 grep 32690
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...
$
@dmoiseev Could you please add a new port to the port registry (https://github.com/openshift/enhancements/blob/master/dev-guide/host-port-registry.md) for the config sync controller (I'd suggest 10260), and then make sure the config sync controller uses the assigned port for its metrics listener.
Validated on nightly - 4.10.0-0.nightly-2022-01-07-050246

$ oc get co | grep controller
cloud-controller-manager   4.10.0-0.nightly-2022-01-07-050246   True   False   False   77m

$ oc logs cluster-cloud-controller-manager-operator-b6686989f-cjzb8 -c config-sync-controllers | less
...
I0107 10:57:21.717684       1 internal.go:362] CCCMOConfigSyncControllers "msg"="Starting server" "addr"={"IP":"127.0.0.1","Port":9260,"Zone":""} "kind"="health probe"
I0107 10:57:21.718086       1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-controller-manager-operator/cluster-cloud-config-sync-leader...
I0107 10:57:21.727969       1 leaderelection.go:258] successfully acquired lease openshift-cloud-controller-manager-operator/cluster-cloud-config-sync-leader
I0107 10:57:21.728276       1 controller.go:178] CCCMOConfigSyncControllers/controller/configmap "msg"="Starting EventSource" "reconciler group"="" "reconciler kind"="ConfigMap" "source"="kind source: *v1.ConfigMap"
I0107 10:57:21.728338       1 controller.go:178] CCCMOConfigSyncControllers/controller/configmap "msg"="Starting EventSource" "reconciler group"="" "reconciler kind"="ConfigMap" "source"="kind source: *v1.Infrastructure"
...

Additional info:
The cluster was not fully deployed successfully, but does this port change look good?

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          81m     Working towards 4.10.0-0.nightly-2022-01-07-050246: 665 of 766 done (86% complete)

Will add must-gather in a while.
I've reviewed the attached must-gather and I think the port changes are ok. I'm confident this has resolved the issue reported in this bug. Please move to VERIFIED.
Thanks @Joel
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056