The Kuryr-controller healthcheck for verifying keystone connectivity performs a project list request, which by default requires admin rights. We need to move to another 'cheap' call that does not require admin rights, like region list.
As a workaround (for OpenShift on OpenStack deployments) until this BZ is delivered (OSP 13 Z-stream), the kuryr controller healthcheck probes can be disabled in the OpenStack inventory file.

inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
enable_kuryr_controller_probes: False
This fix makes use of 'openstack region list', which is launched from the kuryr-controller (from a VM on the overcloud). That command does not work in my current environment, and we are investigating why. It times out trying to reach the keystone admin interface (http://192.168.24.14:35357).

$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type   | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+
...
| 1031aea7c323476a9232f7fbc3d6ca00 | regionOne | keystone     | identity       | True    | internal  | http://172.17.1.18:5000                       |
| 4731b04f24e24f4abf4eb7cb535bc45d | regionOne | keystone     | identity       | True    | public    | http://10.46.22.6:5000                        |
| c44762e7d5774d10adfc80a85802433e | regionOne | keystone     | identity       | True    | admin     | http://192.168.24.14:35357                    |
...
+----------------------------------+-----------+--------------+----------------+---------+-----------+-----------------------------------------------+

In my deployment I cannot reach the keystone admin interface from the overcloud nodes. Still under investigation.
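A minimal sketch of how the reachability of each keystone endpoint could be checked from an overcloud node, assuming a plain TCP connect with a short timeout is enough to tell "reachable" from "times out". The helper names (host_port, can_connect, check_endpoints) are hypothetical and not part of kuryr or the openstack client; the URLs would be the ones from the endpoint list above.

```python
import socket
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit


def host_port(url: str) -> Tuple[str, int]:
    """Split an endpoint URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("no host in URL: %r" % url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint opens within `timeout`."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def check_endpoints(urls: Iterable[str], timeout: float = 3.0) -> Dict[str, bool]:
    """Map each endpoint URL to whether it accepted a TCP connection."""
    return {url: can_connect(url, timeout) for url in urls}
```

Calling check_endpoints() with the internal, public and admin URLs from a node shows which interfaces that node can reach. It only tests TCP connectivity, not whether keystone answers the request correctly.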
Moving it back to assigned as it needs extra modifications on the kuryr-kubernetes side
Verified in kuryr-controller https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-controller/images/13.0-81 image, with kuryr-kubernetes-controller rpm version: openstack-kuryr-kubernetes-controller-0.4.5-2.el7ost.noarch

Kuryr controller probes must not be disabled in inventory/group_vars/all.yml:

#
# You can also disable the kuryr controller and cni healthcheck probes by
# uncommenting the next
#enable_kuryr_controller_probes: False

After installing OCP (openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch) the kuryr controller healthchecks are working correctly.

Make sure probes are configured in the deployment app:

$ oc -n openshift-infra get deployment.apps kuryr-controller -o yaml | less
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /alive
    port: 8082
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
name: controller
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /ready
    port: 8082
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5

$ oc get pods --all-namespaces
NAMESPACE         NAME                                                READY   STATUS    RESTARTS   AGE
default           docker-registry-1-b82dg                             1/1     Running   0          1h
default           registry-console-1-lhpl8                            1/1     Running   0          1h
default           router-1-7lx65                                      1/1     Running   0          1h
kube-system       master-api-master-0.openshift.example.com           1/1     Running   0          1h
kube-system       master-controllers-master-0.openshift.example.com   1/1     Running   1          1h
kube-system       master-etcd-master-0.openshift.example.com          1/1     Running   1          1h
openshift-infra   kuryr-cni-ds-6nplt                                  2/2     Running   0          1h
openshift-infra   kuryr-cni-ds-bmcdf                                  2/2     Running   0          1h
openshift-infra   kuryr-cni-ds-fwxn7                                  2/2     Running   0          1h
openshift-infra   kuryr-cni-ds-ktbr2                                  2/2     Running   0          1h
openshift-infra   kuryr-controller-65c98f7444-rbsvv                   1/1     Running   0          1h
openshift-node    sync-49d4t                                          1/1     Running   0          1h
openshift-node    sync-fvxcl                                          1/1     Running   0          1h
openshift-node    sync-j8dpp                                          1/1     Running   0          1h
openshift-node    sync-mvmmb                                          1/1     Running   0          1h

Checked in kuryr controller logs that liveness and readiness probes are executed correctly:

$ oc -n openshift-infra logs kuryr-controller-65c98f7444-rbsvv |less
2018-10-24 11:12:04.533 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /ready HTTP/1.1" 200 -
2018-10-24 11:12:04.534 1 INFO werkzeug [-] 192.168.99.12 - - [24/Oct/2018 11:12:04] "GET /alive HTTP/1.1" 200 -
2018-10-24 11:12:14.480 1 INFO kuryr_kubernetes.controller.managers.health [-] Kuryr Controller readiness verified.
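The log check above can be sketched as a small script, assuming the werkzeug access lines keep the format shown in this comment. The function names (probe_statuses, probes_healthy) are hypothetical helpers written for illustration only.

```python
import re
from typing import List, Tuple

# Matches the request part of a werkzeug access line for the two probe paths,
# e.g.: "GET /ready HTTP/1.1" 200
PROBE_RE = re.compile(r'"GET (/alive|/ready) HTTP/1\.[01]" (\d{3})')


def probe_statuses(log_text: str) -> List[Tuple[str, int]]:
    """Return (path, HTTP status) for every probe request found in the log."""
    return [(path, int(status)) for path, status in PROBE_RE.findall(log_text)]


def probes_healthy(log_text: str) -> bool:
    """True if both probes appear in the log and every probe request got a 200."""
    results = probe_statuses(log_text)
    seen = {path for path, _ in results}
    return seen == {"/alive", "/ready"} and all(s == 200 for _, s in results)
```

The text passed in would be the output of the oc logs command; lines that are not probe requests are ignored.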
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3611
I modified the Doc Text to make it match the final solution (generating a token instead of using the region list API)
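For illustration, a minimal sketch of what a token-based connectivity check could look like. This is not the kuryr-kubernetes code: get_token is a hypothetical callable standing in for whatever requests a token from keystone with the service's own credentials, which is an assumption on my side.

```python
from typing import Callable, Optional


def keystone_is_reachable(get_token: Callable[[], Optional[str]]) -> bool:
    """Return True if a token could be obtained, False otherwise.

    `get_token` is a hypothetical stand-in for the call that asks keystone
    for a token; requesting a token needs no admin rights.
    """
    try:
        token = get_token()
    except Exception:
        # Any failure (timeout, auth error, unreachable endpoint) means the
        # connectivity check did not pass.
        return False
    # An empty or missing token is also treated as a failed check.
    return bool(token)
```

The point of the design is that obtaining a token only exercises authentication against keystone, so it works for a non-admin user, unlike the project list request.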