Bug 1734786 - Installation fails: customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
Summary: Installation fails: customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Luis Tomas Bolivar
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-31 12:36 UTC by Itzik Brown
Modified: 2019-08-28 10:24 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-13 14:09:21 UTC
Target Upstream Version:


Attachments


Links
Github openshift/openshift-ansible pull 11785 (last updated 2019-12-09 08:44:29 UTC)
Red Hat Product Errata RHBA-2019:2352 (last updated 2019-08-13 14:09:24 UTC)

Description Itzik Brown 2019-07-31 12:36:43 UTC
Description of problem:
Installation of OpenShift on RHOS 14 with the latest Kuryr images fails.

...
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master-0.openshift.example.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-t52VgQ/admin.kubeconfig"], "delta": "0:00:00.190286", "end": "2019-07-31 04:31:26.602360", "msg": "non-zero return code", "rc": 1, "start": "2019-07-31 04:31:26.412074", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}
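The failing task is a bounded retry of `oc get crd` (30 attempts, as the `"attempts": 30` field shows), failing with the last error when the CRD never appears. The generic poll-with-retries pattern can be sketched as follows; `wait_for` is a hypothetical helper for illustration, not part of openshift-ansible:

```python
import time

def wait_for(predicate, attempts=30, delay=2):
    """Poll `predicate` until it returns True or attempts run out.

    Mirrors the Ansible task above: a fixed number of attempts with a
    delay between them; returns False if the resource never appears.
    """
    for i in range(attempts):
        if predicate():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False

# Stub standing in for `oc get crd servicemonitors.monitoring.coreos.com`:
# the CRD "appears" on the third poll.
calls = {"n": 0}
def crd_exists():
    calls["n"] += 1
    return calls["n"] >= 3

assert wait_for(crd_exists, attempts=5, delay=0)
```

In the reported failure, all 30 attempts were exhausted because the CRD was never created, which is why the task surfaces the NotFound error rather than a timeout of its own.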


The Kuryr controller logs show:
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet Traceback (most recent call last):
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/namespace_subnet.py", line 65, in _get_namespace_subnet_id
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet     net_crd_name = annotations[constants.K8S_ANNOTATION_NET_CRD]
2019-07-31 08:16:32.717 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet KeyError: 'openstack.org/kuryr-net-crd'
2019-07-31 08:16:32.730 1 ERROR kuryr_kubernetes.controller.drivers.namespace_subnet [-] Namespace missing CRD annotations for selecting the corresponding subnet.: KeyError: 'openstack.org/kuryr-net-crd'



Version-Release number of the following components:
v3.11.134
openshift-ansible-3.11.134-1.git.0.18e5870.el7.noarch


How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Luis Tomas Bolivar 2019-08-01 10:26:21 UTC
This has nothing to do with the monitoring operator, nor with the Kuryr error shown in the kuryr-controller logs. The problem was in kuryr-cni: it was listening on a different port than the configured/expected one (because a newer Kuryr version was in use). As a result, containers do not get proper networking, for instance:
Events:
  Type     Reason                  Age               From                                         Message
  ----     ------                  ----              ----                                         -------
  Normal   Scheduled               1m                default-scheduler                            Successfully assigned default/router-3-deploy to infra-node-0.openshift.example.com
  Warning  FailedCreatePodSandBox  1m                kubelet, infra-node-0.openshift.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to set up pod "router-3-deploy_default" network: Looks like http://localhost:5036/addNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/addNetwork: dial tcp [::1]:5036: connect: connection refused, failed to clean up sandbox container "ad145a5d727c0807e1730de0a67c956f9cdd3ac173c0004c59ba087266eb6226" network for pod "router-3-deploy": NetworkPlugin cni failed to teardown pod "router-3-deploy_default" network: Looks like http://localhost:5036/delNetwork cannot be reached. Is kuryr-daemon running?: Post http://localhost:5036/delNetwork: dial tcp [::1]:5036: connect: connection refused]
  Normal   SandboxChanged          2s (x10 over 1m)  kubelet, infra-node-0.openshift.example.com  Pod sandbox changed, it will be killed and re-created.
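The root cause in comment 3 is a port mismatch: the CNI driver posts to the port in its configuration (5036 in the event above) while the newer kuryr-daemon listens on a different one, so every `addNetwork`/`delNetwork` call is refused. A hypothetical consistency check might look like this; the `daemon_port` field name is illustrative, not the real kuryr.conf schema:

```python
def diagnose_port_mismatch(cni_config, daemon_listen_port):
    """Compare the port the CNI driver will call against the port the
    kuryr-daemon actually listens on; return a message on mismatch.

    Field name `daemon_port` and the default 5036 are assumptions for
    this sketch, taken from the event log above.
    """
    configured = cni_config.get('daemon_port', 5036)
    if configured != daemon_listen_port:
        return ('CNI posts to localhost:%d but kuryr-daemon listens on '
                'localhost:%d' % (configured, daemon_listen_port))
    return None

# Matching ports: no diagnosis needed.
assert diagnose_port_mismatch({'daemon_port': 5036}, 5036) is None
```

When the versions disagree, the check would surface the mismatch directly instead of the indirect "connection refused" seen in the kubelet events.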

Comment 6 Itzik Brown 2019-08-07 01:34:26 UTC
Checked with 3.11.136

Comment 8 errata-xmlrpc 2019-08-13 14:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352

