Bug 2037689
Summary: | [IPI on Alibabacloud] sometimes operator 'cloud-controller-manager' tells empty VERSION, due to conflicts on listening tcp :8080 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> |
Component: | Cloud Compute | Assignee: | jigu |
Cloud Compute sub component: | Cloud Controller Manager | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | aos-bugs, gpei, jiwei, zhsun |
Version: | 4.10 | ||
Target Milestone: | --- | ||
Target Release: | 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 2037680 | Environment: | |
Last Closed: | 2022-03-10 16:37:12 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Comment 1
Joel Speed
2022-01-06 10:46:19 UTC
I've had a quick look through to see how this could be done. Controller Runtime allows you to add extra HTTP handlers to the metrics bind. My suggestion would be to check whether the metrics and health port flags you have are the same and, if they are, set the metrics server up in the controller runtime manager and add the health check as an extra handler to that server. If you look at how AddHealthzCheck works, it shows how to construct a Healthz handler, so it should be pretty straightforward.

The metrics exposed by Alibaba CCM are limited, so we plan not to expose the metrics port by default. Thanks for your suggestion, I will investigate how to move the metrics and health endpoints to the same listener, 10258. It may be completed in the next release.

Validated on:

[miyadav@miyadav alicloud]$ oc get clusterversion --kubeconfig config
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-10-014106   True        False         14m     Cluster version is 4.10.0-0.nightly-2022-01-10-014106

Steps:
1. Port 8080 for metrics is no longer exposed, as seen from the logs (compared with earlier installations where the fix was not present).

oc logs alibaba-cloud-controller-manager-7dd8499f4b-7rrnm -n openshift-cloud-controller-manager --kubeconfig config
. . . .
I0110 12:21:29.109046 1 main.go:40] "msg"="Version of operator-sdk: v0.19.4"
I0110 12:21:30.160271 1 request.go:665] Waited for 1.040726849s due to client-side throttling, not priority and fairness, request: GET:https://api-int.miyadav-01j.alicloud-qe.devcluster.openshift.com:6443/apis/operators.coreos.com/v2?timeout=32s
I0110 12:21:30.366530 1 clientMgr.go:199] clientMgr "msg"="use ram role mode to get token"
I0110 12:21:30.374531 1 clientMgr.go:176] clientMgr "msg"="wait for Token ready"
I0110 12:21:30.374597 1 main.go:83] "msg"="Registering Components."
I0110 12:21:30.374711 1 main.go:88] "msg"="Loaded controllers: [node route service]"
I0110 12:21:30.374725 1 main.go:92] "msg"="Starting the Cmd."
I0110 12:21:30.374891 1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-controller-manager/ccm...
I0110 12:21:30.387748 1 leaderelection.go:258] successfully acquired lease openshift-cloud-controller-manager/ccm
I0110 12:21:30.387976 1 controller.go:178] controller/service-controller "msg"="Starting EventSource" "source"="kind source: *v1.Service"
I0110 12:21:30.388070 1 controller.go:178] controller/service-controller "msg"="Starting EventSource" "source"="kind source: *v1.Endpoints"
I0110 12:21:30.388088 1 controller.go:178] controller/service-controller "msg"="Starting EventSource" "source"="kind source: *v1.Node"
I0110 12:21:30.388099 1 controller.go:186] controller/service-controller "msg"="Starting Controller"
I0110 12:21:30.387977 1 controller.go:178] controller/node-controller "msg"="Starting EventSource" "source"="kind source: *v1.Node"
. . .

Additional info: logs without the fix:
. . . .
I0110 10:24:31.992827 1 main.go:36] "msg"="Cloud Controller Manager Version: v1.9.3.376-g5c84e19-aliyun-217-gd3779d52d, git commit: d3779d52d51f5b1937d4ccde7d7440437d9c690a, build date: 2022-01-01T01:16:11+0000"
I0110 10:24:31.992844 1 main.go:38] "msg"="Go Version: go1.17.2"
I0110 10:24:31.992866 1 main.go:39] "msg"="Go OS/Arch: linux/amd64"
I0110 10:24:31.992871 1 main.go:40] "msg"="Version of operator-sdk: v0.19.4"
I0110 10:24:33.043757 1 request.go:665] Waited for 1.03241868s due to client-side throttling, not priority and fairness, request: GET:https://api-int.miyadav-0110.alicloud-qe.devcluster.openshift.com:6443/apis/autoscaling/v1?timeout=32s
I0110 10:24:33.247022 1 deleg.go:130] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0110 10:24:33.250468 1 clientMgr.go:199] clientMgr "msg"="use ram role mode to get token"
I0110 10:24:33.258126 1 clientMgr.go:176] clientMgr "msg"="wait for Token ready"
I0110 10:24:33.258198 1 main.go:83] "msg"="Registering Components."
. . .
Moving to VERIFIED based on the above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056