Bug 1734786

Summary: Installation fails: customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
Product: OpenShift Container Platform
Reporter: Itzik Brown <itbrown>
Component: Installer
Assignee: Luis Tomas Bolivar <ltomasbo>
Installer sub component: openshift-ansible
QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: alegrand, anpicker, erooth, gpei, ltomasbo, mloibl, pkrupa, surbania
Version: 3.11.0
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-13 14:09:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Itzik Brown 2019-07-31 12:36:43 UTC
Description of problem:
Installation of OpenShift on RHOS 14 with the latest Kuryr images fails.

...
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master-0.openshift.example.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-t52VgQ/admin.kubeconfig"], "delta": "0:00:00.190286", "end": "2019-07-31 04:31:26.602360", "msg": "non-zero return code", "rc": 1, "start": "2019-07-31 04:31:26.412074", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}
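
For readers unfamiliar with the failing task, the following is a minimal Python sketch of the kind of polling it performs: run `oc get crd` until it succeeds or the attempts run out. This is not the openshift-ansible implementation. The function name `wait_for_crd`, the `run` parameter, and the delay value are assumptions made for illustration; only the 30 attempts and the `oc get crd` command come from the output above, and the `-n` and `--config` arguments are omitted for brevity.

```python
import subprocess
import time


def wait_for_crd(name, attempts=30, delay=10.0, run=subprocess.run):
    """Poll `oc get crd <name>` until it exits 0.

    Returns the attempt number that succeeded, or raises TimeoutError.
    `delay` (seconds between attempts) is an assumed value; the log
    above does not show the real one. `run` is injectable so the loop
    can be exercised without an `oc` binary.
    """
    cmd = ["oc", "get", "crd", name]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)
    raise TimeoutError(
        "CRD %s not found after %d attempts" % (name, attempts)
    )
```

A call such as `wait_for_crd("servicemonitors.monitoring.coreos.com")` would raise `TimeoutError` in the situation shown above, because the CRD never appears.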


The Kuryr controller logs show:
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet Traceback (most recent call last):
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/namespace_subnet.py", line 65, in _get_namespace_subnet_id
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet     net_crd_name = annotations[constants.K8S_ANNOTATION_NET_CRD]
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet
2019-07-31 08:16:32.730 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'



Version-Release number of the following components:
v3.11.134
openshift-ansible-3.11.134-1.git.0.18e5870.el7.noarch



Comment 3 Luis Tomas Bolivar 2019-08-01 10:26:21 UTC
This is unrelated to the monitoring operator and to the error shown in the kuryr-controller logs. The problem was in kuryr-cni, which was listening on a different port than the configured/expected one because a newer Kuryr version was in use. As a result, containers did not get proper networking, for instance:
Events:
  Type     Reason                  Age               From                                         Message
  ----     ------                  ----              ----                                         -------
  Normal   Scheduled               1m                default-scheduler                            Successfully assigned default/router-3-deploy to infra-node-0.openshift.example.com
  Warning  FailedCreatePodSandBox  1m                kubelet, infra-node-0.openshift.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to set up pod "router-3-deploy_default" network: Looks like http://localhost:5036/addNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/addNetwork: dial tcp [::1]:5036: connect: connection refused, failed to clean up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to teardown pod "router-3-deploy_default" network: Looks like http://localhost:5036/delNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/delNetwork: dial tcp [::1]:5036: connect: connection refused]
  Normal   SandboxChanged          2s (x10 over 1m)  kubelet, infra-node-0.openshift.example.com  Pod sandbox changed, it will be killed and re-created.
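
A quick way to confirm this kind of port mismatch on a node is to probe the port that the CNI plugin is trying to reach. The following is a minimal sketch and not part of Kuryr; the function name `port_is_open` is made up for illustration, and port 5036 is taken from the event message above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection tries every address the host resolves to
        # (for "localhost" typically both ::1 and 127.0.0.1).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and resolution errors.
        return False


if __name__ == "__main__":
    # 5036 is the port from the FailedCreatePodSandBox event above.
    state = "open" if port_is_open("localhost", 5036) else "closed"
    print("localhost:5036 is " + state)
```

If the port reports as closed while the kuryr-cni pod is running, the daemon is most likely listening somewhere else, which matches the "connection refused" errors in the events.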

Comment 6 Itzik Brown 2019-08-07 01:34:26 UTC
Checked with 3.11.136

Comment 8 errata-xmlrpc 2019-08-13 14:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352