Bug 2097954
| Summary: | 4.11 installation failed at monitoring and network clusteroperators with error "conmon: option parsing failed: Unknown option --log-global-size-max" making all jobs failing | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | akaris, pehunt, rbrattai, stbenjam, wking |
| Version: | 4.11 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | conmon-2.1.2-2 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:18:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Xingxing Xia
2022-06-17 03:57:22 UTC
This is also killing `ovnkube-node` containers on RHEL 8.6 workers.

network 4.11.0-0.ci.test-2022-06-16-162452-ci-ln-106kkgt-latest True True True 6h22m DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-06-17T00:36:44Z

LAST SEEN TYPE REASON OBJECT MESSAGE
38s Warning Unhealthy pod/ovnkube-node-8n6jb Readiness probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max...
3m30s Warning Unhealthy pod/ovnkube-node-68pzf Readiness probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max...

Continuing comment 0:

$ oc get po -n openshift-sdn
NAME READY STATUS RESTARTS AGE
sdn-bnc9n 1/2 Running 1 (101m ago) 102m
...
sdn-f7x8t 1/2 Running 3 (101m ago) 102m
sdn-gxd9t 1/2 Running 0 111m
sdn-hqk99 1/2 Running 4 (101m ago) 102m
sdn-mt9bg 1/2 Running 0 111m
sdn-whc2g 1/2 Running 0 111m

$ oc get po -n openshift-monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/6 Running 0 92m
alertmanager-main-1 5/6 Running 0 87m
...
prometheus-k8s-0 5/6 Running 0 92m
prometheus-k8s-1 5/6 Running 0 87m

As shown above, each of these pods has one container that is not ready. Running `oc describe` on them shows the kubelet error below in every case, so I am reporting this bug against kubelet. If that is the wrong component, please correct it. Thanks:

kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max

$ oc describe po sdn-gxd9t -n openshift-sdn
Name: sdn-gxd9t
...
Containers:
  sdn:
    ...
    State: Running
      Started: Fri, 17 Jun 2022 10:15:59 +0800
    Ready: False
...
Normal Started 111m kubelet Started container kube-rbac-proxy
Warning Unhealthy 91s (x1406 over 111m) kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max , stderr: , exit code -1

$ oc describe po prometheus-k8s-0 -n openshift-monitoring
Name: prometheus-k8s-0
...
Containers:
  prometheus:
    ...
    State: Running
      Started: Fri, 17 Jun 2022 10:27:31 +0800
    Ready: False
    ...
    Readiness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=15s #success=1 #failure=60
...
Normal Started 71m kubelet Started container kube-rbac-proxy-thanos
Warning Unhealthy 95s (x281 over 71m) kubelet Startup probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max , stderr: , exit code -1

$ oc describe po alertmanager-main-0 -n openshift-monitoring
Name: alertmanager-main-0
...
Containers:
  alertmanager:
    ...
    State: Running
      Started: Fri, 17 Jun 2022 10:27:21 +0800
    Ready: False
...
Normal Started 93m kubelet Started container prom-label-proxy
Warning Unhealthy 3m42s (x537 over 93m) kubelet Startup probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: conmon: option parsing failed: Unknown option --log-global-size-max , stderr: , exit code -1

*** Bug 2098151 has been marked as a duplicate of this bug. ***

Today I launched cluster 4.11.0-0.nightly-2022-06-21-040754 successfully.
However, I am leaving the default QA Contact in place to verify conmon-2.1.2-2 further, if needed. Thanks!

Checked with the latest payload, and the install was successful.

% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-06-22-015220 True False 132m Cluster version is 4.11.0-0.nightly-2022-06-22-015220

% oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-130-40.us-east-2.compute.internal Ready worker 146m v1.24.0+284d62a
ip-10-0-131-143.us-east-2.compute.internal Ready master 152m v1.24.0+284d62a
ip-10-0-162-23.us-east-2.compute.internal Ready worker 145m v1.24.0+284d62a
ip-10-0-183-140.us-east-2.compute.internal Ready master 152m v1.24.0+284d62a
ip-10-0-202-170.us-east-2.compute.internal Ready worker 145m v1.24.0+284d62a
ip-10-0-212-210.us-east-2.compute.internal Ready master 152m v1.24.0+284d62a

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069 |
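
For anyone triaging a similar install failure, the signature quoted in the description (`conmon: option parsing failed: Unknown option --log-global-size-max` inside a probe error) can be picked out of saved event text mechanically. The following is only a rough sketch: the function name `pods_hit_by_conmon_error` and both regular expressions are hypothetical helpers written for illustration, not part of `oc`, CRI-O, or conmon, and the sketch assumes the events have already been captured as plain text lines in the shape shown in this report.

```python
import re
from collections import Counter

# Hypothetical patterns, modeled only on the error text quoted in this bug.
CONMON_ERROR = re.compile(
    r"conmon: option parsing failed: Unknown option (--[\w-]+)"
)
POD_NAME = re.compile(r"\bpod/(\S+)")


def pods_hit_by_conmon_error(event_lines):
    """Return (events per pod, set of rejected conmon options).

    event_lines is any iterable of event text lines, for example the
    saved output of an events listing. Lines without the conmon
    option-parsing error are ignored.
    """
    hits = Counter()
    options = set()
    for line in event_lines:
        err = CONMON_ERROR.search(line)
        if err is None:
            continue
        options.add(err.group(1))
        pod = POD_NAME.search(line)
        # Events taken from "oc describe" carry no pod/ prefix.
        hits[pod.group(1) if pod else "<unknown pod>"] += 1
    return hits, options


if __name__ == "__main__":
    # Sample lines copied from the events quoted in the description.
    sample = [
        "38s Warning Unhealthy pod/ovnkube-node-8n6jb Readiness probe "
        "errored: rpc error: code = Unknown desc = command error: EOF, "
        "stdout: conmon: option parsing failed: Unknown option "
        "--log-global-size-max...",
        "3m30s Warning Unhealthy pod/ovnkube-node-68pzf Readiness probe "
        "errored: rpc error: code = Unknown desc = command error: EOF, "
        "stdout: conmon: option parsing failed: Unknown option "
        "--log-global-size-max...",
    ]
    hits, options = pods_hit_by_conmon_error(sample)
    for pod, count in sorted(hits.items()):
        print(pod, count)
    print(sorted(options))
```

Grouping by pod makes it easier to see whether the failure is limited to a few workloads or, as in this bug, affects every pod that relies on exec probes.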