Bug 1680342
Summary: | Installer failed at [openshift_service_catalog : Wait for Controller Manager rollout success] | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Takayoshi Tanaka <tatanaka>
Component: | Service Catalog | Assignee: | Dan Geoffroy <dageoffr>
Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 3.11.0 | CC: | aos-bugs, asolanas, jokerman, jrosenta, mmccomas, szustkowski
Target Milestone: | --- | |
Target Release: | 4.1.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-06-04 10:44:27 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Bug Depends On: | 1679898 | |
Bug Blocks: | | |
Description Takayoshi Tanaka 2019-02-24 05:16:07 UTC
I'm not convinced you are hitting the issue of the secret not being created yet or the volume failing to mount. I see this early in the event list: MountVolume.SetUp failed for volume "service-catalog-ssl" : secrets "controllermanager-ssl" not found, but the last-seen timestamp indicates an hour ago, and there are only 3 occurrences. Do you have additional evidence that indicates this condition persisted for more than 3 attempts? Could you collect several describe outputs for the controller-manager pods during these failures and also attempt to get some logs from them? Thanks!

Thanks Joel. I suspect you may be hitting https://github.com/kubernetes/kubernetes/issues/65848, but I don't know for certain. What verbosity level is the Kube API server configured for? If it's >5, it will cause this error and log "unable to set dialer for kube-service-catalog/apiserver as rest transport is of type *transport.debuggingRoundTripper".

Indeed, they have DEBUG_LOGLEVEL=8, but do you mean this is what is causing the issue?

Yes, that is correct. If they drop the level to 5 or less and restart the master API servers, it should address the issue.

Customer confirmed that lowering the debug_loglevel did the trick, thanks Jay!

Excellent, thanks Joel. This is a nasty issue that impacts all aggregated API servers; it is fixed in Kubernetes 1.12.

For 4.0, the cluster version is 4.0.0-0.nightly-2019-03-06-074438.

1. Change the log level of the kube-apiserver to "TraceAll" (-v=8).

[jzhang@dhcp-140-18 ocp-09]$ oc edit kubeapiserver cluster
spec:
  forceRedeploymentReason: ""
  logLevel: TraceAll

But it doesn't take effect; this depends on bug 1679898.

Set the kubeapiserver/openshiftapiserver log level to "8".

[jzhang@dhcp-140-18 ocp14]$ oc edit kubeapiserver cluster
kubeapiserver.operator.openshift.io/cluster edited
[jzhang@dhcp-140-18 ocp14]$ oc rsh kube-apiserver-ip-10-0-141-91.us-east-2.compute.internal
Defaulting container name to kube-apiserver-11.
Use 'oc describe pod/kube-apiserver-ip-10-0-141-91.us-east-2.compute.internal -n openshift-kube-apiserver' to see all of the containers in this pod.
sh-4.2# ps -elf|cat
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 54 80 0 - 413686 futex_ 09:17 ? 00:01:42 hypershift openshift-kube-apiserver --config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml -v=8

[jzhang@dhcp-140-18 ocp14]$ oc edit openshiftapiserver cluster
openshiftapiserver.operator.openshift.io/cluster edited
[jzhang@dhcp-140-18 ocp14]$ oc rsh apiserver-gq9sx
sh-4.2# ps -elf |cat
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 5 80 0 - 258941 futex_ 09:25 ? 00:00:18 hypershift openshift-apiserver --config=/var/run/configmaps/config/config.yaml -v=8

[jzhang@dhcp-140-18 ocp14]$ oc version
oc v4.0.0-0.177.0
kubernetes v1.12.4+6a9f178753
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://api.jian-14.qe.devcluster.openshift.com:6443
kubernetes v1.12.4+4836bc9

The apiserver of the Service Catalog works well.

[jzhang@dhcp-140-18 ocp14]$ oc get pods -n openshift-service-catalog-apiserver
NAME              READY   STATUS    RESTARTS   AGE
apiserver-kq828   1/1     Running   0          2m
apiserver-mrt9z   1/1     Running   0          2m
apiserver-rxvm7   1/1     Running   0          2m2s

LGTM, verify it.

*** Bug 1689263 has been marked as a duplicate of this bug. ***
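As an illustration of the workaround discussed above for 3.11 (where the Kubernetes 1.12 fix is not available): drop the master API server verbosity to 5 or less and restart the master API servers. The following is a minimal sketch only, assuming the level is set via DEBUG_LOGLEVEL in /etc/origin/master/master.env as in this report, that the master-restart utility is available (3.10+), and that the kube-service-catalog daemonsets use the default names apiserver and controller-manager:

```sh
# Run on each master host.

# Confirm the current verbosity; anything above 5 triggers the
# debuggingRoundTripper wrapping that breaks aggregated API servers.
grep DEBUG_LOGLEVEL /etc/origin/master/master.env

# Drop it back to 5 or less (2 is the usual default).
sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=2/' /etc/origin/master/master.env

# Restart the master API static pod so the new level takes effect
# (the exact master-restart invocation may differ between 3.10 and 3.11;
# on releases without it, restart the master API service instead).
/usr/local/bin/master-restart api api

# Check that the service catalog recovers: the pods should go Ready and the
# "unable to set dialer" message should stop appearing in the apiserver logs.
oc get pods -n kube-service-catalog
oc logs ds/apiserver -n kube-service-catalog | grep -i "unable to set dialer" || true
oc describe ds/controller-manager -n kube-service-catalog
```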
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
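For reference, on 4.x the API server verbosity exercised in the verification above is driven through the operator resources rather than a master config file. The following is a small sketch only, assuming the kubeapiserver/openshiftapiserver cluster resources accept the logLevel values shown in the comments and that Normal is the default to revert to:

```sh
# Raise the operand verbosity on both API servers (TraceAll corresponds to -v=8).
oc patch kubeapiserver cluster --type=merge -p '{"spec":{"logLevel":"TraceAll"}}'
oc patch openshiftapiserver cluster --type=merge -p '{"spec":{"logLevel":"TraceAll"}}'

# Confirm the spec took (should print TraceAll).
oc get kubeapiserver cluster -o jsonpath='{.spec.logLevel}{"\n"}'

# Watch the static pods roll out with the new -v flag, then check that the
# aggregated service catalog apiserver pods stay Running, as in the
# verification comment above.
oc get pods -n openshift-kube-apiserver -w
oc get pods -n openshift-service-catalog-apiserver

# Revert to the default when finished.
oc patch kubeapiserver cluster --type=merge -p '{"spec":{"logLevel":"Normal"}}'
oc patch openshiftapiserver cluster --type=merge -p '{"spec":{"logLevel":"Normal"}}'
```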