Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1579813

Summary:	kuryr-controller healthcheck requires admin rights
Product:	Red Hat OpenStack	Reporter:	Luis Tomas Bolivar <ltomasbo>
Component:	openstack-kuryr-kubernetes	Assignee:	Luis Tomas Bolivar <ltomasbo>
Status:	CLOSED ERRATA	QA Contact:	Jon Uriarte <juriarte>
Severity:	high	Docs Contact:
Priority:	high
Version:	13.0 (Queens)	CC:	asegurap, joflynn, jschluet, lmarsh, tsedovic
Target Milestone:	z3	Keywords:	AutomationBlocker, Triaged, ZStream
Target Release:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-kuryr-kubernetes-0.4.5-2.el7ost	Doc Type:	Bug Fix
Doc Text:	Kuryr Controller's health check probes performed a request that required OpenStack admin rights. The kuryr-controller pod entered a crash/restart loop when deployed by a non-admin OpenStack user. One workaround is to disable health checks by setting the enable_kuryr_controller_probes parameter to False in the openshift-ansible inventory (inventory/group_vars/all.yml). However, the Kuryr Controller might not restart when something goes wrong. The health check probes now verify the keystone connection by generating a token. This request does not require OpenStack admin rights. You can deploy OpenShift with the default Kuryr health check setting (enable_kuryr_controller_probes: True) and the status of the controller is not affected.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-11-14 01:14:59 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1639671
Bug Blocks:

Description Luis Tomas Bolivar 2018-05-18 11:49:32 UTC

The Kuryr-controller healthcheck for verifying keystone connectivity performs a project list request, which by default requires admin rights. We need to move to another 'cheap' call that does not require admin rights, like region list.
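The idea of replacing an admin-only request with a cheap, unprivileged one can be sketched as follows. This is a minimal illustration, not the kuryr-kubernetes code: it uses the token-generation check that the Doc Text names as the final fix, and it assumes a session-like object exposing a `get_token()` method (keystoneauth1 sessions provide a method of that name). The function name `keystone_reachable` and the two fake classes are hypothetical.

```python
def keystone_reachable(session):
    """Return True if the session can obtain a token, False otherwise.

    Requesting a token only needs the caller's own credentials, so it does
    not depend on admin rights the way a project list request does by default.
    """
    try:
        token = session.get_token()
    except Exception:
        # Any failure (connection refused, timeout, auth error) counts as
        # "keystone connectivity not verified".
        return False
    return bool(token)


class FakeHealthySession:
    """Hypothetical stand-in for a working keystone session."""

    def get_token(self):
        return "gAAAAA-example-token"


class FakeBrokenSession:
    """Hypothetical stand-in for a session that cannot reach keystone."""

    def get_token(self):
        raise ConnectionError("keystone unreachable")
```

A readiness handler could call `keystone_reachable(session)` and answer with a non-2xx status when it returns False.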

Comment 5 Jon Uriarte 2018-06-25 15:54:13 UTC

As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 Z-stream), the kuryr-controller healthcheck probes can be disabled in the OpenStack inventory file.

inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False

Comment 12 Jon Uriarte 2018-08-27 16:33:44 UTC

This fix makes use of 'openstack region list', which is launched from the kuryr-controller (from a VM on the overcloud). That command does not work in my current environment, and we are investigating why.

It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity       | True    | internal  | http://172.17.1.18:5000                       |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity       | True    | public    | http://10.46.22.6:5000                        |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity       | True    | admin     | http://192.168.24.14:35357                    |
...
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+

In my deployment I cannot reach the keystone admin interface from the overcloud nodes.

Still under investigation.
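One way to confirm which keystone interfaces are reachable from a given node is a plain TCP connection test against each endpoint URL. The sketch below is only an illustration of that check; the helper names `host_port` and `can_connect` are hypothetical, and a successful TCP connection shows network reachability only, not that keystone answers correctly.

```python
import socket
from urllib.parse import urlsplit


def host_port(url):
    """Extract (host, port) from an endpoint URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url, timeout=3.0):
    """Return True if a TCP connection to the endpoint succeeds within timeout."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Running `can_connect` from the kuryr-controller VM against the internal, public and admin URLs listed above would show which of the three interfaces that VM can actually reach.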

Comment 15 Luis Tomas Bolivar 2018-08-30 06:47:18 UTC

Moving it back to ASSIGNED, as it needs extra modifications on the kuryr-kubernetes side.

Comment 16 Jon Uriarte 2018-10-24 12:27:39 UTC

Verified in the kuryr-controller image https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81, with kuryr-kubernetes-controller RPM version:
 openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.

Make sure probes are configured in the deployment app:

$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less 

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
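For a manual check of the same endpoints the probes use, a small script along these lines could be run from somewhere that can reach the controller on port 8082. It is a sketch under assumptions: the helper names `probe` and `is_healthy` are hypothetical, and the base URL passed in depends on how the pod is reached in a given deployment. Kubernetes HTTP probes treat any status from 200 up to, but not including, 400 as success, which is what `is_healthy` mirrors.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=1.0):
    """GET base_url + path and return the HTTP status, or None if unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 500).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout.
        return None


def is_healthy(status):
    """Apply the success rule of Kubernetes HTTP probes: 200 <= status < 400."""
    return status is not None and 200 <= status < 400
```

For example, `is_healthy(probe("http://<pod-ip>:8082", "/alive"))` corresponds to one liveness check, with `<pod-ip>` replaced by the address of the kuryr-controller pod.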

$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1       Running   0          1h
default           registry-console-1-lhpl8                            1/1       Running   0          1h
default           router-1-7lx65                                      1/1       Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2       Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1       Running   0          1h
openshift-node    sync-49d4t                                          1/1       Running   0          1h
openshift-node    sync-fvxcl                                          1/1       Running   0          1h
openshift-node    sync-j8dpp                                          1/1       Running   0          1h
openshift-node    sync-mvmmb                                          1/1       Running   0          1h


Checked in the kuryr-controller logs that the liveness and readiness probes are executed correctly:

$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv | less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
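Instead of paging through the log by eye, the probe requests can be tallied by path and status. The following is a rough sketch that assumes the werkzeug access-log format shown above; the names `PROBE_RE`, `probe_results` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Matches the request part of a werkzeug access-log line, e.g.
#   "GET /ready HTTP/1.1" 200
PROBE_RE = re.compile(r'"GET (/alive|/ready) HTTP/1\.[01]" (\d{3})')

SAMPLE = (
    '2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - '
    '[24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -\n'
    '2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - '
    '[24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -\n'
)


def probe_results(log_text):
    """Count probe requests, keyed by (path, status code as a string)."""
    return Counter(PROBE_RE.findall(log_text))


def failed_probes(log_text):
    """Return the (path, status) pairs whose status is not 200."""
    return [key for key in probe_results(log_text) if key[1] != "200"]
```

Feeding the output of `oc logs` into `failed_probes` gives an empty list when every logged probe request returned 200.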

Comment 19 errata-xmlrpc 2018-11-14 01:14:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611

Comment 20 Luis Tomas Bolivar 2018-11-14 07:31:16 UTC

I modified the Doc Text to match the final solution (generating a token instead of using the region list API).