Bug 1664799 - Service Catalog readiness probe should be split from liveness probe
Summary: Service Catalog readiness probe should be split from liveness probe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Jay Boyd
QA Contact: Dongbo Yan
URL:
Whiteboard:
Depends On: 1664753
Blocks:
 
Reported: 2019-01-09 16:55 UTC by Jay Boyd
Modified: 2019-02-20 10:11 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1664753
Environment:
Last Closed: 2019-02-20 10:11:10 UTC
Target Upstream Version:
Embargoed:




Links
System ID                              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata RHBA-2019:0328  0        None      None    None     2019-02-20 10:11:17 UTC

Description Jay Boyd 2019-01-09 16:55:09 UTC
+++ This bug was initially created as a clone of Bug #1664753 +++

Description of problem:
There is a single health probe endpoint in both the API Server and Controller Manager pods for Service Catalog, and it is used for both the readiness and liveness probes. Readiness should be split out into a dedicated probe endpoint so that a readiness failure does not cause unnecessary pod restarts.

Example from the API Server health probes:
        readinessProbe:
          httpGet:
            port: 6443
            path: /healthz
   .....
        livenessProbe:
          httpGet:
            port: 6443
            path: /healthz


The readiness probe should have its own URL.
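
For illustration, a split configuration might look like the following; the /healthz/ready path and the probe timings here are taken from the verified pod output in Comment 6 rather than prescribed:

        livenessProbe:
          httpGet:
            port: 6443
            path: /healthz
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            port: 6443
            path: /healthz/ready      # dedicated readiness endpoint
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 5
          periodSeconds: 5
          failureThreshold: 1         # readiness may fail fast without restarting the pod

With the probes split this way, a failed readiness check only removes the pod from service endpoints, while only the liveness probe can trigger a container restart.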

Comment 1 Jay Boyd 2019-01-09 17:04:17 UTC
PR in progress:  https://github.com/openshift/openshift-ansible/pull/10976

Comment 5 Zhang Cheng 2019-02-13 02:39:32 UTC
@Jay,

Thanks for your quick response.
We will double-check today. In fact, I also did not see this problem in your 3.11 fix; refer to https://bugzilla.redhat.com/show_bug.cgi?id=1664753

Comment 6 Dongbo Yan 2019-02-14 05:37:35 UTC
Verified
# oc version
oc v3.10.111
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server 
openshift v3.10.111
kubernetes v1.10.0+b81c8f8

# oc describe pod/apiserver-8m8sp -n kube-service-catalog
Name:           apiserver-8m8sp
Namespace:      kube-service-catalog
Node:           ip-172-18-9-35.ec2.internal/172.18.9.35
Start Time:     Wed, 13 Feb 2019 22:01:08 -0500
Labels:         app=apiserver
                controller-revision-hash=2303939755
                pod-template-generation=1
Annotations:    ca_hash=06feea6dbab2ec6e03e273ecb10c034fa40be1e3
                openshift.io/scc=hostmount-anyuid
Status:         Running
IP:             10.128.0.5
Controlled By:  DaemonSet/apiserver
Containers:
  apiserver:
    Container ID:  cri-o://4fac1c26679511df5026b01598479196fc633a411bb95351764a085abb4ea0fa
    Image:         registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
    Image ID:      registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:d1188d4a73549484956a4d22799858d3aa07b8b7ea39b904ec76ae5ba18fa813
    Port:          6443/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/service-catalog
    Args:
      apiserver
      --storage-type
      etcd
      --secure-port
      6443
      --etcd-servers
      https://ip-172-18-9-35.ec2.internal:2379
      --etcd-cafile
      /etc/origin/master/master.etcd-ca.crt
      --etcd-certfile
      /etc/origin/master/master.etcd-client.crt
      --etcd-keyfile
      /etc/origin/master/master.etcd-client.key
      -v
      3
      --cors-allowed-origins
      localhost
      --enable-admission-plugins
      KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
      --feature-gates
      OriginatingIdentity=true
    State:          Running
      Started:      Wed, 13 Feb 2019 22:01:21 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get https://:6443/healthz delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get https://:6443/healthz/ready delay=30s timeout=5s period=5s #success=1 #failure=1
    Environment:    <none>
    Mounts:
      /etc/origin/master from etcd-host-cert (ro)
      /var/run/kubernetes-service-catalog from apiserver-ssl (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from service-catalog-apiserver-token-9c284 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  apiserver-ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apiserver-ssl
    Optional:    false
  etcd-host-cert:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/origin/master
    HostPathType:  
  data-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  service-catalog-apiserver-token-9c284:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-catalog-apiserver-token-9c284
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type    Reason   Age   From                                  Message
  ----    ------   ----  ----                                  -------
  Normal  Pulling  17m   kubelet, ip-172-18-9-35.ec2.internal  pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10"
  Normal  Pulled   17m   kubelet, ip-172-18-9-35.ec2.internal  Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10"
  Normal  Created  17m   kubelet, ip-172-18-9-35.ec2.internal  Created container
  Normal  Started  17m   kubelet, ip-172-18-9-35.ec2.internal  Started container
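
To spot-check only the probe configuration rather than reading the full description, a one-liner such as the following works (same pod name and namespace as above):

# oc describe pod/apiserver-8m8sp -n kube-service-catalog | grep -E 'Liveness|Readiness'
    Liveness:       http-get https://:6443/healthz delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get https://:6443/healthz/ready delay=30s timeout=5s period=5s #success=1 #failure=1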

Comment 7 Dongbo Yan 2019-02-18 06:05:53 UTC
I can reproduce this issue.

Comment 9 Dongbo Yan 2019-02-18 09:10:33 UTC
The 'Readiness probe failed' issue appears to be a cluster DNS issue, not related to Service Catalog.
Moving back to VERIFIED.
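
(One way to distinguish a cluster DNS problem from a Service Catalog problem, assuming the container image ships a resolver utility such as getent, is to resolve a cluster-internal name from the affected pod, e.g.:

# oc exec apiserver-8m8sp -n kube-service-catalog -- getent hosts kubernetes.default.svc.cluster.local

The pod name is the one from Comment 6 and is illustrative only; if the lookup fails while the service catalog process is otherwise healthy, the readiness failures point at cluster DNS.)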

Comment 11 errata-xmlrpc 2019-02-20 10:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0328

