Bug 1618873
| Summary: | cluster-quota-reconciler terminates controller process when apiservice is not available | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Master | Assignee: | David Eads <deads> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | aos-bugs, jokerman, mfojtik, mmccomas, wking, yinzhou |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:25:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1582094 | | |
Description
Clayton Coleman
2018-08-17 20:56:47 UTC
> The openshift-controller manager pod

From your description, you're reporting against version 3.11 instead of 3.9? I encountered the same error, but on my side it reported servicecatalog instead of metrics. My reproducing steps are:

Launch a 3.11 (v3.11.0-0.17.0) cluster in AWS.
Stop the master for minutes.
Start the master.
Check pods.

[root@ip-172-18-12-204 ~]# oc get po -w --all-namespaces
default docker-registry-1-fjfqf 1/1 Running 0 32m
default router-1-l6h8l 1/1 Running 0 32m
kube-service-catalog apiserver-gn9gc 1/1 Running 1 29m
kube-service-catalog controller-manager-kxlmh 0/1 CrashLoopBackOff 6 29m
kube-system master-api-ip-172-18-12-204.ec2.internal 1/1 Running 1 39m
kube-system master-controllers-ip-172-18-12-204.ec2.internal 0/1 CrashLoopBackOff 6 38m
kube-system master-etcd-ip-172-18-12-204.ec2.internal 1/1 Running 1 38m

The controllers pod's logs (see full log in attachment):

...
I0820 08:35:59.814022 1 leaderelection.go:190] failed to acquire lease kube-system/kube-scheduler
I0820 08:35:59.823344 1 client_builder.go:233] Verified credential for cluster-quota-reconciliation-controller/openshift-infra
I0820 08:35:59.830345 1 request.go:1099] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
F0820 08:35:59.849987 1 controller_manager.go:127] Error starting "openshift.io/cluster-quota-reconciliation" (unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request)

In short, the controllers pod is constantly in CrashLoopBackOff with the log "Error starting "openshift.io/cluster-quota-reconciliation" (unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1".

I found other related bugs: one with the SAME controllers log error (bug 1619116) and one with a SIMILAR error (bug 1595997), so I linked them. As in bug 1595997, when the aggregated API service is removed, the controllers pod starts back up.

Today the issue happened in QE's public test env and blocked the env from being usable for testing, so I am changing some fields of the bug.

Fixed here: https://github.com/openshift/origin/pull/20693

Verified in:
openshift v3.11.0-0.20.0
kubernetes v1.11.0+d4cacc0
The controllers pod issue is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652