Bug 1579813
| Summary: | kuryr-controller healthcheck requires admin rights | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Luis Tomas Bolivar <ltomasbo> |
| Component: | openstack-kuryr-kubernetes | Assignee: | Luis Tomas Bolivar <ltomasbo> |
| Status: | CLOSED ERRATA | QA Contact: | Jon Uriarte <juriarte> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | asegurap, joflynn, jschluet, lmarsh, tsedovic |
| Target Milestone: | z3 | Keywords: | AutomationBlocker, Triaged, ZStream |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-kuryr-kubernetes-0.4.5-2.el7ost | Doc Type: | Bug Fix |
| Doc Text: | Kuryr Controller's health check probes performed a request that required OpenStack admin rights, so the kuryr-controller pod entered a crash/restart loop when deployed by a non-admin OpenStack user. One workaround is to disable the health checks by setting the enable_kuryr_controller_probes parameter to False in the openshift-ansible inventory (inventory/group_vars/all.yml); however, with the probes disabled the Kuryr Controller might not restart when something goes wrong. The health check probes now verify the keystone connection by generating a token, a request that does not require OpenStack admin rights. You can deploy OpenShift with the default Kuryr health check setting (enable_kuryr_controller_probes: True) and the status of the controller is not affected. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-14 01:14:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1639671 | | |
| Bug Blocks: | | | |
Description
Luis Tomas Bolivar
2018-05-18 11:49:32 UTC
As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 Z-stream), the kuryr-controller healthcheck probes can be disabled in the openshift-ansible inventory file, inventory/group_vars/all.yml:

```
#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False
```

The initial fix makes use of `openstack region list`, which is launched from the kuryr-controller (from a VM on the overcloud), and that command does not work in my actual environment; we are investigating it. It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

```
$ openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                         |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity     | True    | internal  | http://172.17.1.18:5000     |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity     | True    | public    | http://10.46.22.6:5000      |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity     | True    | admin     | http://192.168.24.14:35357  |
...
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------+
```

In my deployment I cannot reach the keystone admin interface from the overcloud nodes. Still under investigation.

Moving it back to assigned as it needs extra modifications on the kuryr-kubernetes side.

Verified in the kuryr-controller image https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81, with kuryr-kubernetes-controller rpm version openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch.

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

```
#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False
```

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.
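The final fix replaces the `openstack region list` call with a plain token request, which only needs the public or internal keystone endpoint. A minimal sketch of that pattern using keystoneauth1 follows; the credentials, auth URL, and function name are illustrative placeholders, not kuryr's actual configuration or code:

```python
# Sketch: verify keystone connectivity by requesting a token, which any
# regular (non-admin) OpenStack user can do against the public/internal
# keystone endpoint. Kuryr reads its real credentials from its own
# configuration; the literals below are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3

def keystone_connection_ok():
    auth = v3.Password(
        auth_url='http://10.46.22.6:5000/v3',  # public endpoint from the list above
        username='kuryr',
        password='secret',
        project_name='services',
        user_domain_name='Default',
        project_domain_name='Default',
    )
    sess = session.Session(auth=auth)
    try:
        # get_token() issues POST /v3/auth/tokens; neither the admin
        # interface (35357) nor the admin role is involved.
        sess.get_token()
        return True
    except Exception:
        return False
```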
Make sure probes are configured in the deployment app:

```
$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /alive
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8082
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
```

```
$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1       Running   0          1h
default           registry-console-1-lhpl8                            1/1       Running   0          1h
default           router-1-7lx65                                      1/1       Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2       Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2       Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1       Running   0          1h
openshift-node    sync-49d4t                                          1/1       Running   0          1h
openshift-node    sync-fvxcl                                          1/1       Running   0          1h
openshift-node    sync-j8dpp                                          1/1       Running   0          1h
openshift-node    sync-mvmmb                                          1/1       Running   0          1h
```

Checked in the kuryr controller logs that the liveness and readiness probes are executed correctly:

```
$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv | less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611

I modified the Doc Text to make it match the final solution (generating a token instead of using the region list API).
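For context on what the probes hit: the werkzeug log lines above come from a small HTTP server inside the controller pod serving /alive and /ready on port 8082. Below is a minimal Flask sketch of that shape; it illustrates only the probe endpoints, not kuryr's actual health manager, which runs real checks (such as the keystone token request) behind these routes:

```python
# Sketch: an HTTP health server matching the probe configuration above.
# /alive answers the liveness probe and /ready the readiness probe.
from flask import Flask

app = Flask(__name__)

@app.route('/alive')
def alive():
    # Liveness: the process is up and able to serve requests.
    return 'ok', 200

@app.route('/ready')
def ready():
    # Readiness: a real controller would gate this on its dependencies
    # (e.g. keystone reachability); this sketch is always ready.
    return 'ok', 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8082)
```

With such a server running, the 200 responses seen in the log can be reproduced by hand, e.g. `curl http://<pod-ip>:8082/alive`.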