1579813 – kuryr-controller healthcheck requires admin rights

Summary: kuryr-controller healthcheck requires admin rights

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-kuryr-kubernetes
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z3
Target Release:	13.0 (Queens)
Assignee:	Luis Tomas Bolivar
QA Contact:	Jon Uriarte
Docs Contact:
URL:
Whiteboard:
Depends On:	1639671
Blocks:

Reported:	2018-05-18 11:49 UTC by Luis Tomas Bolivar
Modified:	2018-11-14 07:31 UTC
CC List:	5 users
Fixed In Version:	openstack-kuryr-kubernetes-0.4.5-2.el7ost
Doc Type:	Bug Fix
Doc Text:	Kuryr Controller's health check probes performed a request that required OpenStack admin rights, so the kuryr-controller pod entered a crash/restart loop when deployed by a non-admin OpenStack user. One workaround was to disable health checks by setting the enable_kuryr_controller_probes parameter to False in the openshift-ansible inventory (inventory/group_vars/all.yml); however, with probes disabled the Kuryr Controller might not be restarted when something goes wrong. The health check probes now verify the keystone connection by generating a token, a request that does not require OpenStack admin rights. You can deploy OpenShift with the default Kuryr health check setting (enable_kuryr_controller_probes: True) and the status of the controller is not affected.
Clone Of:
Environment:
Last Closed:	2018-11-14 01:14:59 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1772005	None	None	None	2018-05-18 11:51:47 UTC
Launchpad	1789632	None	None	None	2018-08-29 11:29:35 UTC
OpenStack gerrit	569395	None	MERGED	Remove admin rights need from verify_keystone_connection	2020-02-26 06:48:30 UTC
OpenStack gerrit	569431	None	MERGED	Remove admin rights need from verify_keystone_connection	2020-02-26 06:48:30 UTC
OpenStack gerrit	597458	None	MERGED	Verify keystone connection using token	2020-02-26 06:48:30 UTC
OpenStack gerrit	598429	None	MERGED	Verify keystone connection using token	2020-02-26 06:48:30 UTC
OpenStack gerrit	598430	None	MERGED	Verify keystone connection using token	2020-02-26 06:48:30 UTC
Red Hat Product Errata	RHBA-2018:3611	None	None	None	2018-11-14 01:15:45 UTC

Description Luis Tomas Bolivar 2018-05-18 11:49:32 UTC

The Kuryr-controller healthcheck for verifying keystone connectivity performs a project list request, which by default requires admin rights. We need to move to another 'cheap' call that does not require admin rights, like region list.

Comment 5 Jon Uriarte 2018-06-25 15:54:13 UTC

As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 Z-stream), the kuryr controller healthcheck probes can be disabled in the OpenStack inventory file.

inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False

Comment 12 Jon Uriarte 2018-08-27 16:33:44 UTC

This fix uses 'openstack region list', which is launched from the kuryr-controller (running on a VM in the overcloud). That command does not work in my current environment; we are investigating.

It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity       | True    | internal  | http://172.17.1.18:5000                       |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity       | True    | public    | http://10.46.22.6:5000                        |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity       | True    | admin     | http://192.168.24.14:35357                    |
...
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+

In my deployment, the keystone admin interface is not reachable from the overcloud nodes.

Still under investigation.
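
The reachability problem described in this comment can be checked independently of the OpenStack client. The following is a minimal sketch, not part of the fix: the helper name endpoint_reachable is hypothetical, and it only tests whether a TCP connection to the host and port of an endpoint URL can be opened within a timeout.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it does not speak HTTP or
    authenticate, it only checks that the port accepts connections.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Run from an overcloud VM, calling endpoint_reachable("http://192.168.24.14:35357") and comparing it with the result for the public endpoint (http://10.46.22.6:5000) would show whether only the admin interface is cut off.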

Comment 15 Luis Tomas Bolivar 2018-08-30 06:47:18 UTC

Moving it back to ASSIGNED, as it needs extra modifications on the kuryr-kubernetes side.

Comment 16 Jon Uriarte 2018-10-24 12:27:39 UTC

Verified in the kuryr-controller image https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81, with kuryr-kubernetes-controller rpm version:
 openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.

Make sure probes are configured in the deployment app:

$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less 

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5

$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1       Running   0          1h
default           registry-console-1-lhpl8                            1/1       Running   0          1h
default           router-1-7lx65                                      1/1       Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2       Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1       Running   0          1h
openshift-node    sync-49d4t                                          1/1       Running   0          1h
openshift-node    sync-fvxcl                                          1/1       Running   0          1h
openshift-node    sync-j8dpp                                          1/1       Running   0          1h
openshift-node    sync-mvmmb                                          1/1       Running   0          1h


Checked in kuryr controller logs that liveness and readiness probes are executed correctly:

$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv |less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
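
The probe requests seen in the log can also be issued by hand. The sketch below is an illustration only: the function names probe and probe_succeeded are hypothetical, the paths (/alive, /ready) and port (8082) come from the deployment excerpt in this comment, and the success rule (any status from 200 up to but not including 400) is the one Kubernetes applies to httpGet probes.

```python
import http.client


def probe(host, port, path, timeout=1.0):
    """Send one HTTP GET, as an httpGet probe would, and return the status code.

    Returns None when no HTTP response is received (refused, timed out).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def probe_succeeded(status):
    """Apply the httpGet probe rule: 200 <= status < 400 counts as success."""
    return status is not None and 200 <= status < 400
```

For example, probe_succeeded(probe(pod_ip, 8082, "/ready", timeout=5.0)) mirrors the readinessProbe settings shown above, where pod_ip is a placeholder for the controller pod's address.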

Comment 19 errata-xmlrpc 2018-11-14 01:14:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611

Comment 20 Luis Tomas Bolivar 2018-11-14 07:31:16 UTC

I modified the Doc Text to match the final solution (generating a token instead of using the region list API).
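
The idea behind the final solution can be sketched as follows. This is not the kuryr-kubernetes code: the function signature and the stub token providers are hypothetical, chosen so the example runs without an OpenStack cloud. In a real deployment the callable could be, for instance, the get_token method of a keystoneauth1 Session, which requests a token with the configured (non-admin) credentials.

```python
def verify_keystone_connection(get_token):
    """Return True if a token can be obtained from keystone, else False.

    get_token is any zero-argument callable that requests a fresh token
    and raises on failure (a hypothetical injection point for this sketch).
    Obtaining a token needs only valid credentials, not admin rights.
    """
    try:
        token = get_token()
    except Exception as exc:
        print(f"Keystone connection check failed: {exc}")
        return False
    # An empty or missing token is also treated as a failed check.
    return bool(token)


if __name__ == "__main__":
    # Stub providers standing in for a real keystone session.
    def token_ok():
        return "example-token"

    def keystone_unreachable():
        raise ConnectionError("keystone unreachable")

    print(verify_keystone_connection(token_ok))
    print(verify_keystone_connection(keystone_unreachable))
```

Compared with a project list request, this check exercises the same keystone connectivity but works for any user that can authenticate.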
