+++ This bug was initially created as a clone of Bug #1664753 +++

Description of problem:

There is a single health probe endpoint in both the API Server and Controller Manager pods for Service Catalog, and it covers both readiness and liveness. The readiness check should be split out into a dedicated readiness probe so that a readiness failure does not cause unnecessary pod restarts.

Example from the API Server health probes:

    readinessProbe:
      httpGet:
        port: 6443
        path: /healthz
    .....
    livenessProbe:
      httpGet:
        port: 6443
        path: /healthz

Readiness should have its own URL.
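For illustration, the split could look like the sketch below: liveness stays on /healthz, while readiness gets its own endpoint so a transient readiness failure only removes the pod from service instead of restarting it. The /healthz/ready path and the timing values mirror what the verified pod later in this bug reports; treat them as an example, not the exact change in the PR.

    # Sketch only: separate endpoints for readiness and liveness.
    readinessProbe:
      httpGet:
        scheme: HTTPS
        port: 6443
        path: /healthz/ready    # dedicated readiness endpoint (path as seen in the verified pod)
      initialDelaySeconds: 30
      timeoutSeconds: 5
      periodSeconds: 5
      failureThreshold: 1
    livenessProbe:
      httpGet:
        scheme: HTTPS
        port: 6443
        path: /healthz          # liveness keeps the existing endpoint
      initialDelaySeconds: 30
      timeoutSeconds: 5
      periodSeconds: 10
      failureThreshold: 3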
PR in progress: https://github.com/openshift/openshift-ansible/pull/10976
@Jay, thanks for your quick response. We will double-check today. In fact, I did not see this problem with your 3.11 fix either. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1664753
Verified

# oc version
oc v3.10.111
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server
openshift v3.10.111
kubernetes v1.10.0+b81c8f8

# oc describe pod/apiserver-8m8sp -n kube-service-catalog
Name:               apiserver-8m8sp
Namespace:          kube-service-catalog
Node:               ip-172-18-9-35.ec2.internal/172.18.9.35
Start Time:         Wed, 13 Feb 2019 22:01:08 -0500
Labels:             app=apiserver
                    controller-revision-hash=2303939755
                    pod-template-generation=1
Annotations:        ca_hash=06feea6dbab2ec6e03e273ecb10c034fa40be1e3
                    openshift.io/scc=hostmount-anyuid
Status:             Running
IP:                 10.128.0.5
Controlled By:      DaemonSet/apiserver
Containers:
  apiserver:
    Container ID:  cri-o://4fac1c26679511df5026b01598479196fc633a411bb95351764a085abb4ea0fa
    Image:         registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
    Image ID:      registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:d1188d4a73549484956a4d22799858d3aa07b8b7ea39b904ec76ae5ba18fa813
    Port:          6443/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/service-catalog
    Args:
      apiserver
      --storage-type etcd
      --secure-port 6443
      --etcd-servers https://ip-172-18-9-35.ec2.internal:2379
      --etcd-cafile /etc/origin/master/master.etcd-ca.crt
      --etcd-certfile /etc/origin/master/master.etcd-client.crt
      --etcd-keyfile /etc/origin/master/master.etcd-client.key
      -v 3
      --cors-allowed-origins localhost
      --enable-admission-plugins KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
      --feature-gates OriginatingIdentity=true
    State:          Running
      Started:      Wed, 13 Feb 2019 22:01:21 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get https://:6443/healthz delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get https://:6443/healthz/ready delay=30s timeout=5s period=5s #success=1 #failure=1
    Environment:    <none>
    Mounts:
      /etc/origin/master from etcd-host-cert (ro)
      /var/run/kubernetes-service-catalog from apiserver-ssl (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from service-catalog-apiserver-token-9c284 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  apiserver-ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apiserver-ssl
    Optional:    false
  etcd-host-cert:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/origin/master
    HostPathType:
  data-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  service-catalog-apiserver-token-9c284:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-catalog-apiserver-token-9c284
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type    Reason   Age  From                                  Message
  ----    ------   ---- ----                                  -------
  Normal  Pulling  17m  kubelet, ip-172-18-9-35.ec2.internal  pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10"
  Normal  Pulled   17m  kubelet, ip-172-18-9-35.ec2.internal  Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10"
  Normal  Created  17m  kubelet, ip-172-18-9-35.ec2.internal  Created container
  Normal  Started  17m  kubelet, ip-172-18-9-35.ec2.internal  Started container
Reproduced this issue.
The 'Readiness probe failed' issue appears to be a cluster DNS problem, not related to Service Catalog. Moving back to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0328