Bug 1640287
| Summary: | Director deployed OCP 3.11 fails during TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | openstack-tripleo-common | Assignee: | Martin André <m.andre> |
| Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 14.0 (Rocky) | CC: | cshereme, dbecker, fedoraproject, gkumar, m.andre, mburns, morazi, racedoro, rbobek, slinaber, whewawal |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 14.0 (Rocky) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-common-9.4.1-0.20181012010874.67bab16.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-11 11:54:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
There is an upstream patch that aims to add the images needed for OCP 3.11 to the prepare image workflow: https://review.openstack.org/#/c/610663/ We will also need a tht change to set the images, although in my tests I was able to deploy 3.11 just fine (it pulled the right images from the local registry on my undercloud without any additional setting).

It looks like the workaround of uploading images only works when CNS is enabled. If CNS is not enabled, the deployment still fails on:
TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created]

No doc text required.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:0045

I see this in the latest RHOSP 14 without CNS. The workaround does not seem to apply in my case, as all images mentioned in the workaround are already available in the undercloud. I can confirm that with CNS I don't see this issue.
"TASK [openshift_cluster_monitoring_operator : Apply the cluster monitoring operator ServiceAccount, Roles and Alertmanager config] ***",
"\u001b[0;33mchanged: [openshift-openshiftmaster-0] => (item=cluster-monitoring-operator.yaml)\u001b[0m",
"",
"TASK [openshift_cluster_monitoring_operator : Process cluster-monitoring-operator configmap template] ***",
"\u001b[0;32mok: [openshift-openshiftmaster-0]\u001b[0m",
"",
"TASK [openshift_cluster_monitoring_operator : Create cluster-monitoring-operator configmap] ***",
"\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
"",
"TASK [openshift_cluster_monitoring_operator : Process cluster-monitoring-operator deployment template] ***",
"\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
"",
"TASK [openshift_cluster_monitoring_operator : Create cluster-monitoring-operator deployment] ***",
"\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
"",
"TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (28 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (27 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (26 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (25 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (24 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (23 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (22 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (21 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (20 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (19 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (18 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (17 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (16 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (15 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (14 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (13 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (12 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (11 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (10 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (9 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (8 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (7 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (6 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (5 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (4 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (3 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).\u001b[0m",
"\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).\u001b[0m",
"\u001b[0;31mfatal: [openshift-openshiftmaster-0]: FAILED! => {\"attempts\": 30, \"changed\": true, \"cmd\": [\"oc\", \"get\", \"crd\", \"servicemonitors.monitoring.coreos.com\", \"-n\", \"openshift-monitoring\", \"--config=/tmp/openshift-cluster-monitoring-ansible-DJrHTs/admin.kubeconfig\"], \"delta\": \"0:00:00.231587\", \"end\": \"2019-02-07 12:25:02.783756\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2019-02-07 12:25:02.552169\", \"stderr\": \"No resources found.\\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \\\"servicemonitors.monitoring.coreos.com\\\" not found\", \"stderr_lines\": [\"No resources found.\", \"Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \\\"servicemonitors.monitoring.coreos.com\\\" not found\"], \"stdout\": \"\", \"stdout_lines\": []}\u001b[0m",
"",
"PLAY RECAP *********************************************************************",
"\u001b[0;32mlocalhost\u001b[0m : \u001b[0;32mok=22 \u001b[0m changed=0 unreachable=0 failed=0 ",
"\u001b[0;33mopenshift-openshiftinfra-0\u001b[0m : \u001b[0;32mok=178 \u001b[0m \u001b[0;33mchanged=73 \u001b[0m unreachable=0 failed=0 ",
"\u001b[0;31mopenshift-openshiftmaster-0\u001b[0m : \u001b[0;32mok=671 \u001b[0m \u001b[0;33mchanged=279 \u001b[0m unreachable=0 \u001b[0;31mfailed=1 \u001b[0m",
"\u001b[0;33mopenshift-openshiftworker-0\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73 \u001b[0m unreachable=0 failed=0 ",
"",
"",
"INSTALLER STATUS ***************************************************************",
"\u001b[0;32mInitialization : Complete (0:01:01)\u001b[0m",
"\u001b[0;32mHealth Check : Complete (0:00:46)\u001b[0m",
"\u001b[0;32mNode Bootstrap Preparation : Complete (0:06:35)\u001b[0m",
"\u001b[0;32metcd Install : Complete (0:01:07)\u001b[0m",
"\u001b[0;32mMaster Install : Complete (0:06:14)\u001b[0m",
"\u001b[0;32mMaster Additional Install : Complete (0:05:35)\u001b[0m",
"\u001b[0;32mNode Join : Complete (0:00:54)\u001b[0m",
"\u001b[0;32mHosted Install : Complete (0:01:21)\u001b[0m",
"\u001b[0;31mCluster Monitoring Operator : In Progress (0:15:28)\u001b[0m",
"\tThis phase can be restarted by running: playbooks/openshift-monitoring/config.yml",
"",
"",
"Failure summary:",
"",
"",
" 1. Hosts: openshift-openshiftmaster-0",
" Play: Configure Cluster Monitoring Operator",
" Task: Wait for the ServiceMonitor CRD to be created",
" Message: \u001b[0;31mnon-zero return code\u001b[0m"
]
|---> warnings: [
"Consider using 'become', 'become_method', and 'become_user' rather than running sudo"
]
(undercloud) [stack@undercloud ~]$
(undercloud) [stack@undercloud ~]$ curl -s http://172.16.0.1:8787/v2/openshift3/ose-ansible/tags/list
{"name":"openshift3/ose-ansible","tags":["v3.11.69-5"]}
(undercloud) [stack@undercloud ~]$ rpm -qa | tripleo
-bash: tripleo: command not found
(undercloud) [stack@undercloud ~]$ rpm -qa | grep -i tripleo
openstack-tripleo-validations-9.3.1-0.20181008110759.4064fb7.el7ost.noarch
openstack-tripleo-image-elements-9.0.1-0.20181007200835.el7ost.noarch
python2-tripleo-common-9.4.1-0.20181012010888.el7ost.noarch
python-tripleoclient-10.6.1-0.20181010222413.8c8f259.el7ost.noarch
openstack-tripleo-common-9.4.1-0.20181012010888.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20181011160036.48a56c1.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20181007201103.daf9069.el7ost.noarch
openstack-tripleo-heat-templates-9.0.1-0.20181013060908.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20181012162415.8b37e93.el7ost.noarch
python-tripleoclient-heat-installer-10.6.1-0.20181010222413.8c8f259.el7ost.noarch
puppet-tripleo-9.3.1-0.20181010034754.157eaab.el7ost.noarch
openstack-tripleo-common-containers-9.4.1-0.20181012010888.el7ost.noarch
It's very likely the same symptom but a different cause. If you still have the environment available, could you connect to one of the master nodes and list all pods with:
$ sudo oc get pods --all-namespaces
If you see a pod in a failing state, you can get some info on why it failed with:
$ sudo oc describe pod <pod_name> --namespace <namespace>

Please see below. It seems the deployer is looking for a different release of the image (v3.11.69-3) than the one available in the undercloud registry (v3.11.69-11). We use the latest tag in containers-prepare-parameter.yaml.
[root@openshift-openshiftmaster-0 ~]# oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default docker-registry-1-rrl9s 1/1 Running 0 18h
default registry-console-1-74hhb 1/1 Running 0 18h
default router-1-6mz9l 1/1 Running 0 18h
kube-system master-api-openshift-openshiftmaster-0 1/1 Running 0 18h
kube-system master-controllers-openshift-openshiftmaster-0 1/1 Running 0 18h
kube-system master-etcd-openshift-openshiftmaster-0 1/1 Running 0 18h
openshift-infra bootstrap-autoapprover-0 1/1 Running 0 18h
openshift-monitoring cluster-monitoring-operator-5bf96f5984-2zpp9 0/1 ImagePullBackOff 0 18h
openshift-node sync-7tlkh 1/1 Running 0 18h
openshift-node sync-stzhx 1/1 Running 0 18h
openshift-node sync-x8cgb 1/1 Running 0 18h
openshift-sdn ovs-6zxhw 1/1 Running 0 18h
openshift-sdn ovs-jtgzg 1/1 Running 0 18h
openshift-sdn ovs-xz27q 1/1 Running 0 18h
openshift-sdn sdn-d49mp 1/1 Running 0 18h
openshift-sdn sdn-swh6r 1/1 Running 0 18h
openshift-sdn sdn-zf6cs 1/1 Running 0 18h
[root@openshift-openshiftmaster-0 ~]# oc describe pod cluster-monitoring-operator-5bf96f5984-2zpp9 -n openshift-monitoring
Name: cluster-monitoring-operator-5bf96f5984-2zpp9
Namespace: openshift-monitoring
Priority: 0
PriorityClassName: <none>
Node: openshift-openshiftinfra-0/172.17.1.15
Start Time: Thu, 07 Feb 2019 12:09:49 -0500
Labels: app=cluster-monitoring-operator
pod-template-hash=1695291540
Annotations: openshift.io/scc=restricted
Status: Pending
IP: 10.129.0.31
Controlled By: ReplicaSet/cluster-monitoring-operator-5bf96f5984
Containers:
cluster-monitoring-operator:
Container ID:
Image: 172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Args:
-namespace=openshift-monitoring
-configmap=cluster-monitoring-config
-logtostderr=true
-v=4
-tags=prometheus-operator=v3.11.69-3
-tags=prometheus-config-reloader=v3.11.69-3
-tags=config-reloader=v3.11.69-3
-tags=prometheus=v3.11.69-3
-tags=alertmanager=v3.11.69-3
-tags=grafana=v3.11.69-3
-tags=oauth-proxy=v3.11.69-3
-tags=node-exporter=v3.11.69-3
-tags=kube-state-metrics=v3.11.69-3
-tags=kube-rbac-proxy=v3.11.69-3
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Limits:
cpu: 20m
memory: 50Mi
Requests:
cpu: 20m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from cluster-monitoring-operator-token-rl8wl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cluster-monitoring-operator-token-rl8wl:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-monitoring-operator-token-rl8wl
Optional: false
QoS Class: Guaranteed
Node-Selectors: node-role.kubernetes.io/infra=true
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 7m (x4854 over 18h) kubelet, openshift-openshiftinfra-0 Back-off pulling image "172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3"
Warning Failed 2m (x4876 over 18h) kubelet, openshift-openshiftinfra-0 Error: ImagePullBackOff
[root@openshift-openshiftmaster-0 ~]#
[root@openshift-openshiftmaster-0 ~]# oc get events -n openshift-monitoring
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
10m 18h 4854 cluster-monitoring-operator-5bf96f5984-2zpp9.1581240f3cfc0699 Pod spec.containers{cluster-monitoring-operator} Normal BackOff kubelet, openshift-openshiftinfra-0 Back-off pulling image "172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3"
52s 18h 4897 cluster-monitoring-operator-5bf96f5984-2zpp9.1581240f3cfc3563 Pod spec.containers{cluster-monitoring-operator} Warning Failed kubelet, openshift-openshiftinfra-0 Error: ImagePullBackOff
[root@openshift-openshiftmaster-0 ~]#
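The pod listing above can be narrowed to the pods worth describing. The following is a minimal sketch, not something taken from this report: failing_pods is a hypothetical helper name, and it assumes the default column layout of `oc get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE) with a header row.

```shell
# failing_pods: hypothetical helper, not part of oc.
# Reads the table printed by `oc get pods --all-namespaces` on stdin and
# prints "namespace/name status" for every pod whose STATUS column is
# neither Running nor Completed. The first line is assumed to be the header.
failing_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Assumed usage on a master node:
#   sudo oc get pods --all-namespaces | failing_pods
```

Each reported pod can then be passed to `oc describe pod <pod_name> --namespace <namespace>` as suggested in the comment above.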
###### image registry in undercloud
[stack@undercloud ~]$ curl -s http://172.16.0.1:8787/v2/openshift3/ose-cluster-monitoring-operator/tags/list | jq .tags
[
"v3.11.69-11"
]
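The mismatch is visible when the tag the pod asks for (v3.11.69-3) is compared with the tag list the registry returns (v3.11.69-11). A rough sketch of that comparison follows. The helper names image_tag, image_repo and has_tag are hypothetical, and the check is a plain string match on the JSON rather than a real JSON parse, so treat it as an illustration only.

```shell
# Hypothetical helpers for comparing a pod's image reference with the
# tags/list response of the undercloud registry.

# image_tag REF: print the tag part of registry/repo:tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# image_repo REF: print the repository path without registry host and tag.
image_repo() {
  ref=${1%:*}                 # drop ":tag"
  printf '%s\n' "${ref#*/}"   # drop "host:port/"
}

# has_tag TAG: succeed if "TAG" appears as a quoted string in the JSON on stdin.
# Simplification: a fixed-string match, not a JSON parser.
has_tag() {
  grep -qF "\"$1\""
}

# Assumed usage, reusing the registry address shown in this report:
#   img=172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3
#   curl -s "http://172.16.0.1:8787/v2/$(image_repo "$img")/tags/list" \
#     | has_tag "$(image_tag "$img")" || echo "missing: $img"
```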
[stack@undercloud ~]$ cat wasantha/templates/containers-prepare-parameter.yaml
# Generated with the following on 2019-02-04T14:30:36.649946
#
# openstack tripleo container image prepare default --local-push-destination --output-env-file containers-prepare-parameter.yaml
#
parameter_defaults:
ContainerImagePrepare:
- push_destination: true
set:
ceph_image: rhceph-3-rhel7
ceph_namespace: registry.access.redhat.com/rhceph
ceph_tag: latest
name_prefix: openstack-
name_suffix: ''
namespace: registry.access.redhat.com/rhosp14
neutron_driver: null
openshift_asb_namespace: registry.access.redhat.com/openshift3
openshift_asb_tag: v3.11
openshift_cluster_monitoring_image: ose-cluster-monitoring-operator
openshift_cluster_monitoring_namespace: registry.access.redhat.com/openshift3
openshift_cluster_monitoring_tag: v3.11
openshift_cockpit_image: registry-console
openshift_cockpit_namespace: registry.access.redhat.com/openshift3
openshift_cockpit_tag: v3.11
openshift_configmap_reload_image: ose-configmap-reloader
openshift_configmap_reload_namespace: registry.access.redhat.com/openshift3
openshift_configmap_reload_tag: v3.11
openshift_etcd_image: etcd
openshift_etcd_namespace: registry.access.redhat.com/rhel7
openshift_etcd_tag: latest
openshift_gluster_block_image: rhgs-gluster-block-prov-rhel7
openshift_gluster_image: rhgs-server-rhel7
openshift_gluster_namespace: registry.access.redhat.com/rhgs3
openshift_gluster_tag: latest
openshift_grafana_namespace: registry.access.redhat.com/openshift3
openshift_grafana_tag: v3.11
openshift_heketi_image: rhgs-volmanager-rhel7
openshift_heketi_namespace: registry.access.redhat.com/rhgs3
openshift_heketi_tag: latest
openshift_kube_rbac_proxy_image: ose-kube-rbac-proxy
openshift_kube_rbac_proxy_namespace: registry.access.redhat.com/openshift3
openshift_kube_rbac_proxy_tag: v3.11
openshift_kube_state_metrics_image: ose-kube-state-metrics
openshift_kube_state_metrics_namespace: registry.access.redhat.com/openshift3
openshift_kube_state_metrics_tag: v3.11
openshift_namespace: registry.access.redhat.com/openshift3
openshift_oauth_proxy_tag: v3.11
openshift_prefix: ose
openshift_prometheus_alertmanager_tag: v3.11
openshift_prometheus_config_reload_image: ose-prometheus-config-reloader
openshift_prometheus_config_reload_namespace: registry.access.redhat.com/openshift3
openshift_prometheus_config_reload_tag: v3.11
openshift_prometheus_node_exporter_tag: v3.11
openshift_prometheus_operator_image: ose-prometheus-operator
openshift_prometheus_operator_namespace: registry.access.redhat.com/openshift3
openshift_prometheus_operator_tag: v3.11
openshift_prometheus_tag: v3.11
openshift_tag: v3.11
tag: latest
tag_from_label: '{version}-{release}'
[stack@undercloud ~]$
[stack@undercloud ~]$
The issue you're seeing seems to have the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1659183: container images have inconsistent tags, where the release differs. Could you try retagging the images locally to see if the deployment goes through?

Thank you, that worked. I had to retag cluster-monitoring-operator and grafana.
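The thread does not record how the retagging was done. One possible approach, sketched here under assumptions, is to copy the image to a second tag inside the undercloud registry with skopeo, using the same invocation style as the workaround in the description. retag_cmd is a hypothetical helper that only prints the command. The tag pair in the usage comment is inferred from the pod description (v3.11.69-3 requested) and the registry listing (v3.11.69-11 available); the tags needed for grafana are not given in this report.

```shell
# retag_cmd REGISTRY IMAGE FROM_TAG TO_TAG
# Hypothetical helper: prints (does not run) a skopeo command that copies
# IMAGE:FROM_TAG to IMAGE:TO_TAG within the same registry.
# The global --tls-verify=false mirrors the workaround in this report;
# newer skopeo releases may expect the TLS option on the copy subcommand.
retag_cmd() {
  printf 'skopeo --tls-verify=false copy docker://%s/%s:%s docker://%s/%s:%s\n' \
    "$1" "$2" "$3" "$1" "$2" "$4"
}

# Assumed usage (review the printed command, then run it by hand):
#   retag_cmd 172.16.0.1:8787 openshift3/ose-cluster-monitoring-operator \
#     v3.11.69-11 v3.11.69-3
```

Printing instead of executing keeps the sketch safe to try on an undercloud. The monitoring playbook can then be rerun as indicated in the installer status output.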
Description of problem:
Director deployed OCP 3.11 fails during TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created]

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (28 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (27 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (26 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (25 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (24 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (23 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (22 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (21 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (20 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (19 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (18 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (17 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (16 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (15 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (14 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (13 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (12 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (11 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (10 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (9 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (8 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (7 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (6 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (5 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (4 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (3 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [openshift-master-2]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-9qz1v8/admin.kubeconfig"], "delta": "0:00:00.240740", "end": "2018-10-17 12:17:51.308115", "msg": "non-zero return code", "rc": 1, "start": "2018-10-17 12:17:51.067375", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost : ok=26 changed=0 unreachable=0 failed=0
openshift-infra-0 : ok=169 changed=70 unreachable=0 failed=0
openshift-infra-1 : ok=169 changed=70 unreachable=0 failed=0
openshift-infra-2 : ok=171 changed=70 unreachable=0 failed=0
openshift-master-0 : ok=339 changed=145 unreachable=0 failed=0
openshift-master-1 : ok=339 changed=145 unreachable=0 failed=0
openshift-master-2 : ok=781 changed=336 unreachable=0 failed=1
openshift-worker-0 : ok=169 changed=70 unreachable=0 failed=0
openshift-worker-1 : ok=169 changed=70 unreachable=0 failed=0
openshift-worker-2 : ok=169 changed=70 unreachable=0 failed=0

INSTALLER STATUS ***************************************************************
Initialization : Complete (0:02:04)
Health Check : Complete (0:00:41)
Node Bootstrap Preparation : Complete (0:05:50)
etcd Install : Complete (0:01:46)
Master Install : Complete (0:08:22)
Master Additional Install : Complete (0:01:17)
Node Join : Complete (0:00:44)
GlusterFS Install : Complete (0:09:19)
Hosted Install : Complete (0:01:46)
Cluster Monitoring Operator : In Progress (0:15:48)
	This phase can be restarted by running: playbooks/openshift-monitoring/config.yml

Failure summary:

 1. Hosts: openshift-master-2
 Play: Configure Cluster Monitoring Operator
 Task: Wait for the ServiceMonitor CRD to be created
 Message: non-zero return code

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.0-0.20181001174822.90afd18.0rc2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 3.11 with OSP14

Actual results:
Fails.

Expected results:
No failure.

Additional info:
It looks like the failure is caused by missing images. We should upload the required images automatically during deployment.

Workaround:
for image in ose-cluster-monitoring-operator ose-prometheus-operator grafana oauth-proxy ose-prometheus-config-reloader prometheus ose-configmap-reloader prometheus-alertmanager prometheus-node-exporter ose-kube-rbac-proxy ose-kube-state-metrics ose-console; do
  skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/$image:v3.11 docker://192.168.24.1:8787/openshift3/$image:v3.11
done