Bug 1647511 - Requirement of Liveness or Readiness probe in ds/controller-manager [NEEDINFO]
Summary: Requirement of Liveness or Readiness probe in ds/controller-manager
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Jay Boyd
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On: 1630324
Blocks:
Reported: 2018-11-07 16:17 UTC by Jay Boyd
Modified: 2018-12-13 17:09 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Liveness and readiness probes have been added for the Service Catalog API Server and Controller Manager. If these pods stop responding, OpenShift restarts them. Previously, there were no probes to monitor the health of Service Catalog.
Clone Of: 1630324
Environment:
Last Closed: 2018-12-13 17:09:08 UTC
jaboyd: needinfo? (suchaudh)

Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3750 None None None 2018-12-13 17:09:16 UTC

Comment 1 Jay Boyd 2018-11-07 18:43:59 UTC
Tracks delivering fix to 3.10.z.

Comment 2 Jay Boyd 2018-11-07 18:55:50 UTC
Fixed in 3.10.z by https://github.com/openshift/openshift-ansible/pull/10629

Comment 12 Jian Zhang 2018-12-04 06:24:07 UTC
I installed and uninstalled the Service Catalog successfully using the openshift-ansible release-3.10 branch, and it works as expected. Marking this bug as verified.

The openshift-ansible info:
mac:openshift-ansible jianzhang$ git branch
  master
* release-3.10
mac:openshift-ansible jianzhang$ git log
commit 12699eb551747059c7db622cadd9237dde84205b (HEAD -> release-3.10, origin/release-3.10)
Author: AOS Automation Release Team <aos-team-art@redhat.com>
Date:   Sat Dec 1 07:38:28 2018 -0500

    Automatic commit of package [openshift-ansible] release [3.10.83-1].
...


When I configure a different port (for example, 6444) for the Service Catalog controller-manager, the probes still target port 6443 and fail, as shown below:
1) The liveness probe works: the kubelet kills and recreates the container.
[root@ip-172-18-9-32 ~]# oc describe pods controller-manager-6qr4k
...
  Normal   Created    13s (x3 over 1m)  kubelet, ip-172-18-9-32.ec2.internal  Created container
  Warning  Unhealthy  13s (x5 over 1m)  kubelet, ip-172-18-9-32.ec2.internal  Liveness probe failed: Get https://10.128.0.10:6443/healthz: dial tcp 10.128.0.10:6443: getsockopt: connection refused
  Normal   Killing    13s (x2 over 1m)  kubelet, ip-172-18-9-32.ec2.internal  Killing container with id docker://controller-manager:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Started    12s (x3 over 1m)  kubelet, ip-172-18-9-32.ec2.internal  Started container

[root@ip-172-18-9-32 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-gkfjf            1/1       Running   0          42m
controller-manager-6qr4k   0/1       Running   2          1m

2) The pod cannot serve traffic now, so the readiness probe works as well: the controller-manager endpoint list is empty.
[root@ip-172-18-9-32 ~]# oc get ep
NAME                 ENDPOINTS         AGE
apiserver            10.128.0.8:6443   42m
controller-manager                     41m

Performing the same operations on the Service Catalog apiserver gives the expected results.
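
For readers who want to reproduce the "connection refused" failure from step 1 outside a cluster, here is a minimal sketch of what an HTTP GET health check does. It is not the kubelet's implementation. The function name `http_probe` and the one-second default timeout are assumptions made for illustration; the `/healthz` path and the HTTPS scheme come from the event log above. Certificate verification is skipped, as the kubelet does for HTTPS probes.

```python
import ssl
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to `url` ends in a 2xx/3xx response.

    Hypothetical helper: it only approximates an httpGet probe.
    """
    # Do not verify the serving certificate (relevant for https:// URLs only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and 4xx/5xx responses all count
        # as a failed probe.
        return False
```

Calling `http_probe("https://10.128.0.10:6443/healthz")` against a pod that listens on 6444 instead of 6443 would return `False`, which corresponds to the `Unhealthy` events in the `oc describe` output.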

[root@ip-172-18-9-32 ~]# oc exec controller-manager-sqbcz -- service-catalog --version
v3.10.83;Upstream:v0.1.19

Comment 14 errata-xmlrpc 2018-12-13 17:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750

