Description of problem:
The installer failed at the task [openshift_service_catalog : Wait for Controller Manager rollout success]. The symptom looks similar to the situation described in the KCS article [1], but in this case the failure is permanent rather than temporary.

Version-Release number of the following components:
# rpm -q openshift-ansible
openshift-ansible-3.11.82-3.git.0.9718d0a.el7.noarch
# rpm -q ansible
ansible-2.6.13-1.el7ae.noarch
# ansible --version
ansible 2.6.13
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
Always in the customer's environment. They have tried both a multi-master setup and a single master node, and both failed.

Steps to Reproduce:
1. Run the installer. The relevant files will be attached in private later.

Actual results:
Will add in private.

Expected results:
The installation succeeds without failure.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag.
Attached in private.

[1] https://access.redhat.com/solutions/3814821
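For reference, the rollout that this installer task waits on can also be inspected by hand with something along these lines (a rough sketch, assuming the default kube-service-catalog namespace and controller-manager DaemonSet name used by the 3.11 playbooks):

# oc get pods -n kube-service-catalog -o wide
# oc rollout status daemonset/controller-manager -n kube-service-catalog
# oc get events -n kube-service-catalog --sort-by='.lastTimestamp'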
I'm not convinced you are hitting the issue of the secret not being created yet or the volume being unable to mount. I see this early in the event list:

MountVolume.SetUp failed for volume "service-catalog-ssl" : secrets "controllermanager-ssl" not found

but the last-seen timestamp is an hour ago, and there are only 3 occurrences. Do you have additional evidence indicating that this condition persisted for more than 3 attempts? Could you collect several describes of the controller-manager pods during these failures and also attempt to get some logs from them? Thanks!
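For completeness, that data could be gathered with something like the following (the pod name is a placeholder; substitute whatever oc get pods shows in kube-service-catalog):

# oc get pods -n kube-service-catalog
# oc describe pod <controller-manager-pod> -n kube-service-catalog
# oc logs <controller-manager-pod> -n kube-service-catalog
# oc logs <controller-manager-pod> -n kube-service-catalog --previous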
Thanks Joel. I suspect you may be hitting https://github.com/kubernetes/kubernetes/issues/65848, but I don't know for certain. What verbosity level is the Kube API Server configured for? If it's >5, it will cause this error and the log output:

unable to set dialer for kube-service-catalog/apiserver as rest transport is of type *transport.debuggingRoundTripper
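On a 3.11 master the configured level can usually be checked with something along these lines (a sketch assuming the standard static pod layout; the file path and pod label are from memory and may differ in your environment):

# grep DEBUG_LOGLEVEL /etc/origin/master/master.env
# oc get pods -n kube-system -l openshift.io/component=api -o yaml | grep -i loglevel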
Indeed, they have DEBUG_LOGLEVEL=8. Do you mean that is what is causing the issue?
Yes, that is correct. If they drop the level to 5 or less and restart the master API servers it should address the issue.
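On each master that would be roughly the following (a sketch assuming the standard 3.11 static pod layout; please verify the path and the master-restart commands for the customer's environment before using):

# sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=2/' /etc/origin/master/master.env
# /usr/local/bin/master-restart api
# /usr/local/bin/master-restart controllers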
Customer confirmed that lowering the debug_loglevel did the trick, thanks Jay!
Excellent, thanks Joel. This is a nasty issue that impacts all aggregated api servers, fixed in Kubernetes 1.12.
For 4.0:

Cluster version is 4.0.0-0.nightly-2019-03-06-074438

1. Change the log level of the kube-apiserver to "TraceAll" (-v=8).

[jzhang@dhcp-140-18 ocp-09]$ oc edit kubeapiserver cluster
spec:
  forceRedeploymentReason: ""
  logLevel: TraceAll

But it does not take effect; this depends on bug 1679898.
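If editing interactively is inconvenient, the same change could presumably also be applied non-interactively, for example (an untested sketch of the equivalent oc patch calls):

$ oc patch kubeapiserver cluster --type=merge -p '{"spec":{"logLevel":"TraceAll"}}'
$ oc patch openshiftapiserver cluster --type=merge -p '{"spec":{"logLevel":"TraceAll"}}'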
Set the kubeapiserver/openshiftapiserver log level to "8".

[jzhang@dhcp-140-18 ocp14]$ oc edit kubeapiserver cluster
kubeapiserver.operator.openshift.io/cluster edited

[jzhang@dhcp-140-18 ocp14]$ oc rsh kube-apiserver-ip-10-0-141-91.us-east-2.compute.internal
Defaulting container name to kube-apiserver-11.
Use 'oc describe pod/kube-apiserver-ip-10-0-141-91.us-east-2.compute.internal -n openshift-kube-apiserver' to see all of the containers in this pod.
sh-4.2# ps -elf | cat
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root         1     0 54  80   0 - 413686 futex_ 09:17 ?        00:01:42 hypershift openshift-kube-apiserver --config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml -v=8

[jzhang@dhcp-140-18 ocp14]$ oc edit openshiftapiserver cluster
openshiftapiserver.operator.openshift.io/cluster edited

[jzhang@dhcp-140-18 ocp14]$ oc rsh apiserver-gq9sx
sh-4.2# ps -elf | cat
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root         1     0  5  80   0 - 258941 futex_ 09:25 ?        00:00:18 hypershift openshift-apiserver --config=/var/run/configmaps/config/config.yaml -v=8

[jzhang@dhcp-140-18 ocp14]$ oc version
oc v4.0.0-0.177.0
kubernetes v1.12.4+6a9f178753
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://api.jian-14.qe.devcluster.openshift.com:6443
kubernetes v1.12.4+4836bc9

The apiserver of the Service Catalog works well.

[jzhang@dhcp-140-18 ocp14]$ oc get pods -n openshift-service-catalog-apiserver
NAME             READY   STATUS    RESTARTS   AGE
apiserver-kq828  1/1     Running   0          2m
apiserver-mrt9z  1/1     Running   0          2m
apiserver-rxvm7  1/1     Running   0          2m2s

LGTM, verify it.
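As an extra check that the aggregated Service Catalog API really is being served, something like the following should work (resource names assumed from a default install; adjust if they differ):

$ oc get apiservice v1beta1.servicecatalog.k8s.io
$ oc get clusterservicebrokers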
*** Bug 1689263 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758