Bug 1579813 - kuryr-controller healthcheck requires admin rights
Summary: kuryr-controller healthcheck requires admin rights
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-kuryr-kubernetes
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 13.0 (Queens)
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1639671
Blocks:
Reported: 2018-05-18 11:49 UTC by Luis Tomas Bolivar
Modified: 2018-11-14 07:31 UTC (History)
CC List: 5 users

Fixed In Version: openstack-kuryr-kubernetes-0.4.5-2.el7ost
Doc Type: Bug Fix
Doc Text:
Kuryr Controller's health check probes performed a request that required OpenStack admin rights, so the kuryr-controller pod entered a crash/restart loop when deployed by a non-admin OpenStack user. One workaround was to disable the health checks by setting the enable_kuryr_controller_probes parameter to False in the openshift-ansible inventory (inventory/group_vars/all.yml); with the probes disabled, however, the Kuryr Controller might not be restarted when something goes wrong. The health check probes now verify the keystone connection by generating a token, a request that does not require OpenStack admin rights. You can deploy OpenShift with the default Kuryr health check setting (enable_kuryr_controller_probes: True) and the status of the controller is not affected.
Clone Of:
Environment:
Last Closed: 2018-11-14 01:14:59 UTC
Target Upstream Version:
Embargoed:


Links
System                 | ID             | Private | Priority | Status | Summary                                                  | Last Updated
Launchpad              | 1772005        | 0       | None     | None   | None                                                     | 2018-05-18 11:51:47 UTC
Launchpad              | 1789632        | 0       | None     | None   | None                                                     | 2018-08-29 11:29:35 UTC
OpenStack gerrit       | 569395         | 0       | None     | MERGED | Remove admin rights need from verify_keystone_connection | 2020-02-26 06:48:30 UTC
OpenStack gerrit       | 569431         | 0       | None     | MERGED | Remove admin rights need from verify_keystone_connection | 2020-02-26 06:48:30 UTC
OpenStack gerrit       | 597458         | 0       | None     | MERGED | Verify keystone connection using token                   | 2020-02-26 06:48:30 UTC
OpenStack gerrit       | 598429         | 0       | None     | MERGED | Verify keystone connection using token                   | 2020-02-26 06:48:30 UTC
OpenStack gerrit       | 598430         | 0       | None     | MERGED | Verify keystone connection using token                   | 2020-02-26 06:48:30 UTC
Red Hat Product Errata | RHBA-2018:3611 | 0       | None     | None   | None                                                     | 2018-11-14 01:15:45 UTC

Description Luis Tomas Bolivar 2018-05-18 11:49:32 UTC
The Kuryr-controller healthcheck for verifying keystone connectivity performs a project list request, which by default requires admin rights. We need to move to another 'cheap' call that does not require admin rights, like region list.

Comment 5 Jon Uriarte 2018-06-25 15:54:13 UTC
As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 z-stream), the kuryr controller healthcheck probes can be disabled in the OpenStack inventory file.

inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False

Comment 12 Jon Uriarte 2018-08-27 16:33:44 UTC
This fix makes use of 'openstack region list', which is launched from the kuryr-controller (from a VM on the overcloud). That command does not work in my current environment; we are investigating it.

It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity       | True    | internal  | http://172.17.1.18:5000                       |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity       | True    | public    | http://10.46.22.6:5000                        |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity       | True    | admin     | http://192.168.24.14:35357                    |
...
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+

In my deployment I cannot reach the keystone admin interface from the overcloud nodes.

Still under investigation.
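
For anyone who wants to reproduce the reachability check from an overcloud node, here is a minimal Python sketch. It only tests whether a TCP connection to an endpoint's host and port can be opened; it does not speak the keystone API. The function name endpoint_reachable and the default timeout are illustrative assumptions, not part of any OpenStack tool.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling endpoint_reachable("http://192.168.24.14:35357") from the VM running kuryr-controller would be expected to return False in the environment described here, while the internal or public URL may return True, depending on the network layout.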

Comment 15 Luis Tomas Bolivar 2018-08-30 06:47:18 UTC
Moving it back to ASSIGNED, as it needs extra modifications on the kuryr-kubernetes side.

Comment 16 Jon Uriarte 2018-10-24 12:27:39 UTC
Verified with the kuryr-controller image https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81, which contains the kuryr-kubernetes-controller rpm version:
 openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.

Make sure the probes are configured in the kuryr-controller deployment:

$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less 

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
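
The httpGet probes above can be approximated by hand with a short Python sketch, which may help when checking the endpoints from inside the pod network. It issues a plain GET and treats any status from 200 to 399 as success, which is the range the kubelet accepts for HTTP probes. The function name http_probe is an illustrative assumption; the paths and port come from the deployment shown above.

```python
import http.client


def http_probe(host, port, path, timeout=1.0):
    """Return True if GET http://host:port/path answers with status 200-399."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts, and malformed replies count as failure.
        return False
    finally:
        conn.close()
```

For example, http_probe(pod_ip, 8082, "/alive", timeout=1.0) and http_probe(pod_ip, 8082, "/ready", timeout=5.0) mirror the liveness and readiness settings, where pod_ip stands for the kuryr-controller pod address.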

$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1       Running   0          1h
default           registry-console-1-lhpl8                            1/1       Running   0          1h
default           router-1-7lx65                                      1/1       Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2       Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1       Running   0          1h
openshift-node    sync-49d4t                                          1/1       Running   0          1h
openshift-node    sync-fvxcl                                          1/1       Running   0          1h
openshift-node    sync-j8dpp                                          1/1       Running   0          1h
openshift-node    sync-mvmmb                                          1/1       Running   0          1h


Checked in the kuryr controller logs that the liveness and readiness probes are executed correctly:

$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv |less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
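
When the log is long, the same check can be done with a small Python sketch that counts probe requests by path and HTTP status. It assumes the werkzeug access-log format shown in the lines above; the names PROBE_RE and summarize_probes are illustrative.

```python
import re
from collections import Counter

# Matches the request part of a werkzeug access-log line, for example:
#   "GET /ready HTTP/1.1" 200
PROBE_RE = re.compile(r'"GET (/alive|/ready) HTTP/[\d.]+" (\d{3})')


def summarize_probes(log_lines):
    """Count (path, status) pairs for probe requests found in the log."""
    counts = Counter()
    for line in log_lines:
        match = PROBE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the output of the oc logs command (for instance by passing sys.stdin) gives one counter entry per path and status; any entry with a status other than 200 points at a failing probe.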

Comment 19 errata-xmlrpc 2018-11-14 01:14:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611

Comment 20 Luis Tomas Bolivar 2018-11-14 07:31:16 UTC
I modified the Doc Text to match the final solution (generating a token instead of using the region list API).
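
To illustrate the idea of the final solution, here is a minimal Python sketch of a token-based connectivity check. It is not the kuryr-kubernetes implementation. It assumes a session object that exposes get_token(), as a keystoneauth1 Session does; the function name is borrowed from the gerrit change titles for readability only.

```python
def verify_keystone_connection(session):
    """Return True if the session can obtain a token from keystone.

    `session` is assumed to provide get_token(), for example a
    keystoneauth1.session.Session built with the deployment's credentials.
    """
    try:
        token = session.get_token()
    except Exception:
        # Authentication errors, timeouts, and unreachable endpoints
        # all mean the connectivity check failed.
        return False
    # An empty or missing token is also treated as a failure.
    return bool(token)
```

Requesting a token only needs the credentials of the deploying user, which is why this kind of check works for non-admin users, unlike a project list request.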

