Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1579813

Summary: kuryr-controller healthcheck requires admin rights
Product: Red Hat OpenStack
Reporter: Luis Tomas Bolivar <ltomasbo>
Component: openstack-kuryr-kubernetes
Assignee: Luis Tomas Bolivar <ltomasbo>
Status: CLOSED ERRATA
QA Contact: Jon Uriarte <juriarte>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: asegurap, joflynn, jschluet, lmarsh, tsedovic
Target Milestone: z3
Keywords: AutomationBlocker, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-kuryr-kubernetes-0.4.5-2.el7ost
Doc Type: Bug Fix
Doc Text:
Kuryr Controller's health check probes performed a request that required OpenStack admin rights. As a result, the kuryr-controller pod entered a crash/restart loop when deployed by a non-admin OpenStack user. One workaround was to disable health checks by setting the enable_kuryr_controller_probes parameter to False in the openshift-ansible inventory (inventory/group_vars/all.yml); however, with the probes disabled the Kuryr Controller might not be restarted when something goes wrong. The health check probes now verify the keystone connection by generating a token, a request that does not require OpenStack admin rights. You can deploy OpenShift with the default Kuryr health check setting (enable_kuryr_controller_probes: True) and the status of the controller is not affected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-14 01:14:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1639671
Bug Blocks:

Description Luis Tomas Bolivar 2018-05-18 11:49:32 UTC
The Kuryr-controller healthcheck for verifying keystone connectivity performs a project list request, which by default requires admin rights. We need to move to another 'cheap' call that does not require admin rights, like region list.

Comment 5 Jon Uriarte 2018-06-25 15:54:13 UTC
As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 Z-stream), the kuryr controller healthcheck probes can be disabled in the OpenStack inventory file.

inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False

Comment 12 Jon Uriarte 2018-08-27 16:33:44 UTC
This fix makes use of 'openstack region list', which is launched from the kuryr-controller (from a VM on the overcloud). That command does not work in my current environment; we are investigating it.

It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity       | True    | internal  | http://172.17.1.18:5000                       |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity       | True    | public    | http://10.46.22.6:5000                        |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity       | True    | admin     | http://192.168.24.14:35357                    |
...
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+

In my deployment I cannot reach the keystone admin interface from the overcloud nodes.

Still under investigation.
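
A reachability problem like this can be confirmed from an overcloud node with a plain TCP connect and a short timeout. The sketch below is not part of Kuryr or of the fix; the helper name can_reach is hypothetical, and the only addresses it would be used with are the ones already shown in the endpoint list above.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

For example, can_reach("192.168.24.14", 35357) run from a node without a route to the admin network would be expected to return False after the timeout, while the public endpoint on port 5000 should still answer.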

Comment 15 Luis Tomas Bolivar 2018-08-30 06:47:18 UTC
Moving it back to ASSIGNED, as it needs extra modifications on the kuryr-kubernetes side.

Comment 16 Jon Uriarte 2018-10-24 12:27:39 UTC
Verified in kuryr-controller https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81 image, with kuryr-kubernetes-controller rpm version:
 openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.

Make sure probes are configured in the deployment app:

$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less 

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
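
The httpGet probes above can be imitated by hand when debugging. This is a minimal sketch, assuming the controller's health server is reachable from where the script runs; http_probe is a hypothetical helper, not a Kuryr or Kubernetes API. It follows the Kubernetes rule that any status from 200 up to (but not including) 400 counts as success.

```python
import http.client


def http_probe(host, port, path, timeout=1.0):
    """Return True if GET http://host:port/path answers with a 2xx or 3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and malformed responses all count as failure.
        return False
    finally:
        conn.close()
```

With the values from the deployment above, the liveness check would correspond to http_probe(pod_ip, 8082, "/alive", timeout=1) and the readiness check to http_probe(pod_ip, 8082, "/ready", timeout=5), where pod_ip is a placeholder for the controller pod's address.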

$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1       Running   0          1h
default           registry-console-1-lhpl8                            1/1       Running   0          1h
default           router-1-7lx65                                      1/1       Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2       Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1       Running   0          1h
openshift-node    sync-49d4t                                          1/1       Running   0          1h
openshift-node    sync-fvxcl                                          1/1       Running   0          1h
openshift-node    sync-j8dpp                                          1/1       Running   0          1h
openshift-node    sync-mvmmb                                          1/1       Running   0          1h


Checked in kuryr controller logs that liveness and readiness probes are executed correctly:

$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv |less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
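
The same log check can be scripted instead of paging through the output. The sketch below only assumes the werkzeug access-log format visible in the lines above; summarize_probes and failed_probes are hypothetical helper names.

```python
import re
from collections import Counter

# Matches the request and status part of a werkzeug access-log line, e.g.
#   "GET /ready HTTP/1.1" 200 -
PROBE_RE = re.compile(r'"GET (/alive|/ready) HTTP/[\d.]+" (\d{3})')


def summarize_probes(log_text):
    """Count (path, status) pairs for /alive and /ready requests in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PROBE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def failed_probes(counts):
    """Return only the entries whose HTTP status is not 200."""
    return {key: n for key, n in counts.items() if key[1] != 200}
```

Feeding it the output of the oc logs command above, an empty result from failed_probes(summarize_probes(text)) means every logged probe request was answered with 200.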

Comment 19 errata-xmlrpc 2018-11-14 01:14:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611

Comment 20 Luis Tomas Bolivar 2018-11-14 07:31:16 UTC
I modified the Doc Text to make it match the final solution (generating a token instead of using the region list API)
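
For reference, generating a token is an ordinary Keystone v3 call (POST /v3/auth/tokens) that any valid user can make. The sketch below only builds such a request with the standard library and does not send it. It is an illustration of the API, not the code Kuryr uses, which this bug does not show. The function name build_token_request and the "Default" user domain are assumptions.

```python
import json
import urllib.request


def build_token_request(auth_url, username, password, user_domain="Default"):
    """Build (but do not send) an unscoped Keystone v3 password-auth token request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": user_domain},  # assumed domain name
                        "password": password,
                    }
                },
            }
        }
    }
    return urllib.request.Request(
        url=auth_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen against a reachable identity endpoint should yield a 201 response carrying the token in the X-Subject-Token header, which is enough to show that keystone is reachable and the credentials are valid.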