Description of problem:
Installation of OpenShift with RHOS14 with the latest kuryr images fails.

...
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master-0.openshift.example.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-t52VgQ/admin.kubeconfig"], "delta": "0:00:00.190286", "end": "2019-07-31 04:31:26.602360", "msg": "non-zero return code", "rc": 1, "start": "2019-07-31 04:31:26.412074", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}

The Kuryr controller logs show:

2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet Traceback (most recent call last):
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/namespace_subnet.py", line 65, in _get_namespace_subnet_id
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet     net_crd_name = annotations[constants.K8S_ANNOTATION_NET_CRD]
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet
2019-07-31 08:16:32.730 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'

Version-Release number of the following components:
v3.11.134
openshift-ansible-3.11.134-1.git.0.18e5870.el7.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
This is unrelated to the monitoring operator, and also unrelated to the error shown in the kuryr-controller logs. The problem was in kuryr-cni: because a newer kuryr version was used, it was listening on a different port than the configured/expected one. As a result, containers do not get proper networking, for instance:

Events:
  Type     Reason                  Age               From                                         Message
  ----     ------                  ----              ----                                         -------
  Normal   Scheduled               1m                default-scheduler                            Successfully assigned default/router-3-deploy to infra-node-0.openshift.example.com
  Warning  FailedCreatePodSandBox  1m                kubelet, infra-node-0.openshift.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to set up pod "router-3-deploy_default" network: Looks like http://localhost:5036/addNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/addNetwork: dial tcp [::1]:5036: connect: connection refused, failed to clean up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to teardown pod "router-3-deploy_default" network: Looks like http://localhost:5036/delNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/delNetwork: dial tcp [::1]:5036: connect: connection refused]
  Normal   SandboxChanged          2s (x10 over 1m)  kubelet, infra-node-0.openshift.example.com  Pod sandbox changed, it will be killed and re-created.
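
For anyone hitting the same "connection refused" events, a quick way to confirm a port mismatch is to probe the port the CNI plugin is trying to reach from the affected node. The following is only a minimal sketch, not part of kuryr: the helper name port_is_open is made up for illustration, and 5036 is simply the port that appears in the events above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; it is not a kuryr API.
    """
    try:
        # create_connection tries every address the host resolves to
        # (for "localhost" that is typically both ::1 and 127.0.0.1).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and resolution failures.
        return False


if __name__ == "__main__":
    # 5036 is the port the CNI plugin tried in the events above.
    port = 5036
    state = "open" if port_is_open("localhost", port) else "closed/unreachable"
    print("localhost:%d is %s" % (port, state))
```

If the port reports as closed while the kuryr-cni pod is running on that node, the daemon is probably bound to some other port, which would match the mismatch described here.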
Checked with 3.11.136
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2352