1640287 – Director deployed OCP 3.11 fails during TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-common
Sub Component:
Version:	14.0 (Rocky)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	beta
Target Release:	14.0 (Rocky)
Assignee:	Martin André
QA Contact:	Marius Cornea
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-10-17 19:01 UTC by Marius Cornea
Modified:	2022-03-13 15:47 UTC
CC List:	11 users
Fixed In Version:	openstack-tripleo-common-9.4.1-0.20181012010874.67bab16.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-01-11 11:54:07 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
OpenStack gerrit	610663	'None'	MERGED	Add container images for openshift 3.11	2020-12-05 15:47:11 UTC
Red Hat Issue Tracker	OSP-11709	None	None	None	2021-12-10 18:07:36 UTC
Red Hat Product Errata	RHEA-2019:0045	None	None	None	2019-01-11 11:54:18 UTC

Description Marius Cornea 2018-10-17 19:01:32 UTC

Description of problem:
Director deployed OCP 3.11 fails during TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] 

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (28 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (27 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (26 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (25 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (24 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (23 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (22 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (21 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (20 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (19 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (18 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (17 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (16 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (15 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (14 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (13 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (12 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (11 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (10 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (9 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (8 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (7 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (6 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (5 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (4 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (3 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [openshift-master-2]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-9qz1v8/admin.kubeconfig"], "delta": "0:00:00.240740", "end": "2018-10-17 12:17:51.308115", "msg": "non-zero return code", "rc": 1, "start": "2018-10-17 12:17:51.067375", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost                  : ok=26   changed=0    unreachable=0    failed=0   
openshift-infra-0          : ok=169  changed=70   unreachable=0    failed=0   
openshift-infra-1          : ok=169  changed=70   unreachable=0    failed=0   
openshift-infra-2          : ok=171  changed=70   unreachable=0    failed=0   
openshift-master-0         : ok=339  changed=145  unreachable=0    failed=0   
openshift-master-1         : ok=339  changed=145  unreachable=0    failed=0   
openshift-master-2         : ok=781  changed=336  unreachable=0    failed=1   
openshift-worker-0         : ok=169  changed=70   unreachable=0    failed=0   
openshift-worker-1         : ok=169  changed=70   unreachable=0    failed=0   
openshift-worker-2         : ok=169  changed=70   unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************
Initialization               : Complete (0:02:04)
Health Check                 : Complete (0:00:41)
Node Bootstrap Preparation   : Complete (0:05:50)
etcd Install                 : Complete (0:01:46)
Master Install               : Complete (0:08:22)
Master Additional Install    : Complete (0:01:17)
Node Join                    : Complete (0:00:44)
GlusterFS Install            : Complete (0:09:19)
Hosted Install               : Complete (0:01:46)
Cluster Monitoring Operator  : In Progress (0:15:48)
	This phase can be restarted by running: playbooks/openshift-monitoring/config.yml


Failure summary:


  1. Hosts:    openshift-master-2
     Play:     Configure Cluster Monitoring Operator
     Task:     Wait for the ServiceMonitor CRD to be created
     Message:  non-zero return code


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.0-0.20181001174822.90afd18.0rc2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 3.11 with OSP14


Actual results:
Fails.

Expected results:
No failure.

Additional info:

It looks like the failure is caused by missing images. We should upload the required images automatically during deployment.

Workaround:

for image in ose-cluster-monitoring-operator ose-prometheus-operator grafana oauth-proxy ose-prometheus-config-reloader prometheus ose-configmap-reloader prometheus-alertmanager prometheus-node-exporter ose-kube-rbac-proxy ose-kube-state-metrics ose-console; do
  skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/$image:v3.11 docker://192.168.24.1:8787/openshift3/$image:v3.11
done
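
Before or after applying the workaround, it can help to see which of those images are actually missing from the undercloud registry. The sketch below queries the registry's tag-list endpoint (the same `/v2/<repo>/tags/list` URL used with `curl` later in this report). The helper names are hypothetical, the registry address and tag are copied from the workaround, and `has_tag` is a plain string match on the JSON response rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Assumptions: registry address and tag as in the workaround above.
REGISTRY="${REGISTRY:-192.168.24.1:8787}"
TAG="${TAG:-v3.11}"

IMAGES="ose-cluster-monitoring-operator ose-prometheus-operator grafana oauth-proxy ose-prometheus-config-reloader prometheus ose-configmap-reloader prometheus-alertmanager prometheus-node-exporter ose-kube-rbac-proxy ose-kube-state-metrics ose-console"

# Build the tag-list URL for an openshift3 image name.
tags_url() {
  printf 'http://%s/v2/openshift3/%s/tags/list\n' "$REGISTRY" "$1"
}

# Succeed if the JSON on stdin contains the quoted tag "$1".
# Crude fixed-string match; jq would be more robust.
has_tag() {
  grep -qF "\"$1\""
}

# Print every image whose tag list does not contain $TAG.
report_missing() {
  local image
  for image in $IMAGES; do
    if ! curl -s "$(tags_url "$image")" | has_tag "$TAG"; then
      echo "missing: openshift3/$image:$TAG"
    fi
  done
}

# Usage (needs network access to the undercloud registry):
#   report_missing
```

Because the tag is matched together with its surrounding quotes, a registry that only holds `v3.11.69-5` is still reported as missing `v3.11`.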

Comment 1 Martin André 2018-10-18 11:24:51 UTC

There is an upstream patch that aims at adding the needed images for ocp3.11 to the prepare image workflow: https://review.openstack.org/#/c/610663/

We will also need a tht change to set the images, although in my tests I was able to deploy 3.11 just fine (it pulled the right images from the local registry on my undercloud without any additional setting).

Comment 2 Marius Cornea 2018-10-25 00:38:04 UTC

It looks like the workaround of uploading the images only works when CNS is enabled. If CNS is not enabled, the deployment still fails on:
TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created]

Comment 11 Martin André 2019-01-10 10:22:05 UTC

No doc text required.

Comment 12 errata-xmlrpc 2019-01-11 11:54:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

Comment 13 wasantha gamage 2019-02-07 20:02:03 UTC

I see this in the latest RHOSP 14 without CNS. The workaround does not seem to apply in my case, as all the images mentioned in the workaround are already available in the undercloud. I can confirm that with CNS I don't see this issue.

    "TASK [openshift_cluster_monitoring_operator : Apply the cluster monitoring operator ServiceAccount, Roles and Alertmanager config] ***",
    "\u001b[0;33mchanged: [openshift-openshiftmaster-0] => (item=cluster-monitoring-operator.yaml)\u001b[0m",
    "",
    "TASK [openshift_cluster_monitoring_operator : Process cluster-monitoring-operator configmap template] ***",
    "\u001b[0;32mok: [openshift-openshiftmaster-0]\u001b[0m",
    "",
    "TASK [openshift_cluster_monitoring_operator : Create cluster-monitoring-operator configmap] ***",
    "\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
    "",
    "TASK [openshift_cluster_monitoring_operator : Process cluster-monitoring-operator deployment template] ***",
    "\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
    "",
    "TASK [openshift_cluster_monitoring_operator : Create cluster-monitoring-operator deployment] ***",
    "\u001b[0;33mchanged: [openshift-openshiftmaster-0]\u001b[0m",
    "",
    "TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (28 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (27 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (26 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (25 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (24 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (23 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (22 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (21 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (20 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (19 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (18 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (17 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (16 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (15 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (14 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (13 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (12 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (11 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (10 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (9 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (8 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (7 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (6 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (5 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (4 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (3 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).\u001b[0m",
    "\u001b[0;31mfatal: [openshift-openshiftmaster-0]: FAILED! => {\"attempts\": 30, \"changed\": true, \"cmd\": [\"oc\", \"get\", \"crd\", \"servicemonitors.monitoring.coreos.com\", \"-n\", \"openshift-monitoring\", \"--config=/tmp/openshift-cluster-monitoring-ansible-DJrHTs/admin.kubeconfig\"], \"delta\": \"0:00:00.231587\", \"end\": \"2019-02-07 12:25:02.783756\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2019-02-07 12:25:02.552169\", \"stderr\": \"No resources found.\\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \\\"servicemonitors.monitoring.coreos.com\\\" not found\", \"stderr_lines\": [\"No resources found.\", \"Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \\\"servicemonitors.monitoring.coreos.com\\\" not found\"], \"stdout\": \"\", \"stdout_lines\": []}\u001b[0m",
    "",
    "PLAY RECAP *********************************************************************",
    "\u001b[0;32mlocalhost\u001b[0m                  : \u001b[0;32mok=22  \u001b[0m changed=0    unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftinfra-0\u001b[0m : \u001b[0;32mok=178 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;31mopenshift-openshiftmaster-0\u001b[0m : \u001b[0;32mok=671 \u001b[0m \u001b[0;33mchanged=279 \u001b[0m unreachable=0    \u001b[0;31mfailed=1   \u001b[0m",
    "\u001b[0;33mopenshift-openshiftworker-0\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "",
    "",
    "INSTALLER STATUS ***************************************************************",
    "\u001b[0;32mInitialization               : Complete (0:01:01)\u001b[0m",
    "\u001b[0;32mHealth Check                 : Complete (0:00:46)\u001b[0m",
    "\u001b[0;32mNode Bootstrap Preparation   : Complete (0:06:35)\u001b[0m",
    "\u001b[0;32metcd Install                 : Complete (0:01:07)\u001b[0m",
    "\u001b[0;32mMaster Install               : Complete (0:06:14)\u001b[0m",
    "\u001b[0;32mMaster Additional Install    : Complete (0:05:35)\u001b[0m",
    "\u001b[0;32mNode Join                    : Complete (0:00:54)\u001b[0m",
    "\u001b[0;32mHosted Install               : Complete (0:01:21)\u001b[0m",
    "\u001b[0;31mCluster Monitoring Operator  : In Progress (0:15:28)\u001b[0m",
    "\tThis phase can be restarted by running: playbooks/openshift-monitoring/config.yml",
    "",
    "",
    "Failure summary:",
    "",
    "",
    "  1. Hosts:    openshift-openshiftmaster-0",
    "     Play:     Configure Cluster Monitoring Operator",
    "     Task:     Wait for the ServiceMonitor CRD to be created",
    "     Message:  \u001b[0;31mnon-zero return code\u001b[0m"
]
|---> warnings: [
    "Consider using 'become', 'become_method', and 'become_user' rather than running sudo"
]


(undercloud) [stack@undercloud ~]$ 
(undercloud) [stack@undercloud ~]$ curl -s http://172.16.0.1:8787/v2/openshift3/ose-ansible/tags/list 
{"name":"openshift3/ose-ansible","tags":["v3.11.69-5"]}
(undercloud) [stack@undercloud ~]$ rpm -qa | tripleo
-bash: tripleo: command not found
(undercloud) [stack@undercloud ~]$ rpm -qa | grep -i tripleo
openstack-tripleo-validations-9.3.1-0.20181008110759.4064fb7.el7ost.noarch
openstack-tripleo-image-elements-9.0.1-0.20181007200835.el7ost.noarch
python2-tripleo-common-9.4.1-0.20181012010888.el7ost.noarch
python-tripleoclient-10.6.1-0.20181010222413.8c8f259.el7ost.noarch
openstack-tripleo-common-9.4.1-0.20181012010888.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20181011160036.48a56c1.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20181007201103.daf9069.el7ost.noarch
openstack-tripleo-heat-templates-9.0.1-0.20181013060908.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20181012162415.8b37e93.el7ost.noarch
python-tripleoclient-heat-installer-10.6.1-0.20181010222413.8c8f259.el7ost.noarch
puppet-tripleo-9.3.1-0.20181010034754.157eaab.el7ost.noarch
openstack-tripleo-common-containers-9.4.1-0.20181012010888.el7ost.noarch

Comment 14 Martin André 2019-02-08 07:15:33 UTC

It's very likely the same symptom but different cause. If you still have the environment available, could you connect to one of the master nodes and list all pods with:
$ sudo oc get pods --all-namespaces

If you see a pod in a failing state, you can get some info on why it failed with:
$ sudo oc describe pod <pod_name> --namespace <namespace>
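
On a cluster with many pods, the two commands above can be combined. The following is a minimal sketch, assuming the default column layout of `oc get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, ...); the `failing_pods` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Read `oc get pods --all-namespaces` output on stdin and print
# "namespace name status" for every pod whose STATUS column is
# neither Running nor Completed. The header line is skipped.
failing_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Usage on a master node:
#   sudo oc get pods --all-namespaces | failing_pods
#
# To describe each failing pod in turn:
#   sudo oc get pods --all-namespaces | failing_pods |
#     while read -r ns name status; do
#       sudo oc describe pod "$name" --namespace "$ns"
#     done
```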

Comment 15 wasantha gamage 2019-02-08 12:05:12 UTC

Please see below. It seems the deployer is looking for a different release of the image (v3.11.69-3) than the one available in the undercloud registry (v3.11.69-11). We use the latest tag in containers-prepare-parameter.yaml.

[root@openshift-openshiftmaster-0 ~]# oc get pods --all-namespaces
NAMESPACE              NAME                                             READY     STATUS             RESTARTS   AGE
default                docker-registry-1-rrl9s                          1/1       Running            0          18h
default                registry-console-1-74hhb                         1/1       Running            0          18h
default                router-1-6mz9l                                   1/1       Running            0          18h
kube-system            master-api-openshift-openshiftmaster-0           1/1       Running            0          18h
kube-system            master-controllers-openshift-openshiftmaster-0   1/1       Running            0          18h
kube-system            master-etcd-openshift-openshiftmaster-0          1/1       Running            0          18h
openshift-infra        bootstrap-autoapprover-0                         1/1       Running            0          18h
openshift-monitoring   cluster-monitoring-operator-5bf96f5984-2zpp9     0/1       ImagePullBackOff   0          18h
openshift-node         sync-7tlkh                                       1/1       Running            0          18h
openshift-node         sync-stzhx                                       1/1       Running            0          18h
openshift-node         sync-x8cgb                                       1/1       Running            0          18h
openshift-sdn          ovs-6zxhw                                        1/1       Running            0          18h
openshift-sdn          ovs-jtgzg                                        1/1       Running            0          18h
openshift-sdn          ovs-xz27q                                        1/1       Running            0          18h
openshift-sdn          sdn-d49mp                                        1/1       Running            0          18h
openshift-sdn          sdn-swh6r                                        1/1       Running            0          18h
openshift-sdn          sdn-zf6cs                                        1/1       Running            0          18h
[root@openshift-openshiftmaster-0 ~]# oc describe pod cluster-monitoring-operator-5bf96f5984-2zpp9 -n openshift-monitoring
Name:               cluster-monitoring-operator-5bf96f5984-2zpp9
Namespace:          openshift-monitoring
Priority:           0
PriorityClassName:  <none>
Node:               openshift-openshiftinfra-0/172.17.1.15
Start Time:         Thu, 07 Feb 2019 12:09:49 -0500
Labels:             app=cluster-monitoring-operator
                    pod-template-hash=1695291540
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:                 10.129.0.31
Controlled By:      ReplicaSet/cluster-monitoring-operator-5bf96f5984
Containers:
  cluster-monitoring-operator:
    Container ID:  
    Image:         172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3
    Image ID:      
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      -namespace=openshift-monitoring
      -configmap=cluster-monitoring-config
      -logtostderr=true
      -v=4
      -tags=prometheus-operator=v3.11.69-3
      -tags=prometheus-config-reloader=v3.11.69-3
      -tags=config-reloader=v3.11.69-3
      -tags=prometheus=v3.11.69-3
      -tags=alertmanager=v3.11.69-3
      -tags=grafana=v3.11.69-3
      -tags=oauth-proxy=v3.11.69-3
      -tags=node-exporter=v3.11.69-3
      -tags=kube-state-metrics=v3.11.69-3
      -tags=kube-rbac-proxy=v3.11.69-3
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     20m
      memory:  50Mi
    Requests:
      cpu:        20m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from cluster-monitoring-operator-token-rl8wl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cluster-monitoring-operator-token-rl8wl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-monitoring-operator-token-rl8wl
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  node-role.kubernetes.io/infra=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason   Age                  From                                 Message
  ----     ------   ----                 ----                                 -------
  Normal   BackOff  7m (x4854 over 18h)  kubelet, openshift-openshiftinfra-0  Back-off pulling image "172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3"
  Warning  Failed   2m (x4876 over 18h)  kubelet, openshift-openshiftinfra-0  Error: ImagePullBackOff
[root@openshift-openshiftmaster-0 ~]# 
[root@openshift-openshiftmaster-0 ~]# oc get events -n openshift-monitoring
LAST SEEN   FIRST SEEN   COUNT     NAME                                                            KIND      SUBOBJECT                                      TYPE      REASON    SOURCE                                MESSAGE
10m         18h          4854      cluster-monitoring-operator-5bf96f5984-2zpp9.1581240f3cfc0699   Pod       spec.containers{cluster-monitoring-operator}   Normal    BackOff   kubelet, openshift-openshiftinfra-0   Back-off pulling image "172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3"
52s         18h          4897      cluster-monitoring-operator-5bf96f5984-2zpp9.1581240f3cfc3563   Pod       spec.containers{cluster-monitoring-operator}   Warning   Failed    kubelet, openshift-openshiftinfra-0   Error: ImagePullBackOff
[root@openshift-openshiftmaster-0 ~]# 


###### image registry in undercloud

[stack@undercloud ~]$ curl -s http://172.16.0.1:8787/v2/openshift3/ose-cluster-monitoring-operator/tags/list | jq .tags
[
  "v3.11.69-11"
]

[stack@undercloud ~]$ cat wasantha/templates/containers-prepare-parameter.yaml 
# Generated with the following on 2019-02-04T14:30:36.649946
#
#   openstack tripleo container image prepare default --local-push-destination --output-env-file containers-prepare-parameter.yaml
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-3-rhel7
      ceph_namespace: registry.access.redhat.com/rhceph
      ceph_tag: latest
      name_prefix: openstack-
      name_suffix: ''
      namespace: registry.access.redhat.com/rhosp14
      neutron_driver: null
      openshift_asb_namespace: registry.access.redhat.com/openshift3
      openshift_asb_tag: v3.11
      openshift_cluster_monitoring_image: ose-cluster-monitoring-operator
      openshift_cluster_monitoring_namespace: registry.access.redhat.com/openshift3
      openshift_cluster_monitoring_tag: v3.11
      openshift_cockpit_image: registry-console
      openshift_cockpit_namespace: registry.access.redhat.com/openshift3
      openshift_cockpit_tag: v3.11
      openshift_configmap_reload_image: ose-configmap-reloader
      openshift_configmap_reload_namespace: registry.access.redhat.com/openshift3
      openshift_configmap_reload_tag: v3.11
      openshift_etcd_image: etcd
      openshift_etcd_namespace: registry.access.redhat.com/rhel7
      openshift_etcd_tag: latest
      openshift_gluster_block_image: rhgs-gluster-block-prov-rhel7
      openshift_gluster_image: rhgs-server-rhel7
      openshift_gluster_namespace: registry.access.redhat.com/rhgs3
      openshift_gluster_tag: latest
      openshift_grafana_namespace: registry.access.redhat.com/openshift3
      openshift_grafana_tag: v3.11
      openshift_heketi_image: rhgs-volmanager-rhel7
      openshift_heketi_namespace: registry.access.redhat.com/rhgs3
      openshift_heketi_tag: latest
      openshift_kube_rbac_proxy_image: ose-kube-rbac-proxy
      openshift_kube_rbac_proxy_namespace: registry.access.redhat.com/openshift3
      openshift_kube_rbac_proxy_tag: v3.11
      openshift_kube_state_metrics_image: ose-kube-state-metrics
      openshift_kube_state_metrics_namespace: registry.access.redhat.com/openshift3
      openshift_kube_state_metrics_tag: v3.11
      openshift_namespace: registry.access.redhat.com/openshift3
      openshift_oauth_proxy_tag: v3.11
      openshift_prefix: ose
      openshift_prometheus_alertmanager_tag: v3.11
      openshift_prometheus_config_reload_image: ose-prometheus-config-reloader
      openshift_prometheus_config_reload_namespace: registry.access.redhat.com/openshift3
      openshift_prometheus_config_reload_tag: v3.11
      openshift_prometheus_node_exporter_tag: v3.11
      openshift_prometheus_operator_image: ose-prometheus-operator
      openshift_prometheus_operator_namespace: registry.access.redhat.com/openshift3
      openshift_prometheus_operator_tag: v3.11
      openshift_prometheus_tag: v3.11
      openshift_tag: v3.11
      tag: latest
    tag_from_label: '{version}-{release}'
[stack@undercloud ~]$ 
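
The mismatch shown in this comment (the pod asks for `v3.11.69-3`, the registry only lists `v3.11.69-11`) can be checked mechanically. The sketch below splits an image reference taken from `oc describe pod` into repository and tag, so the tag can be compared with what the registry reports. All three helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the tag (the text after the last colon) from an image reference.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Extract the repository path: drop the registry host and the tag.
image_repo() {
  local rest="${1#*/}"
  printf '%s\n' "${rest%:*}"
}

# Succeed if the wanted tag ($1) is among the remaining arguments.
tag_available() {
  local wanted="$1"
  local t
  shift
  for t in "$@"; do
    if [ "$t" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Usage with the values from this comment (needs curl and jq):
#   ref=172.16.0.1:8787/openshift3/ose-cluster-monitoring-operator:v3.11.69-3
#   tags=$(curl -s "http://172.16.0.1:8787/v2/$(image_repo "$ref")/tags/list" | jq -r '.tags[]')
#   tag_available "$(image_tag "$ref")" $tags || echo "tag not in registry"
```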


Comment 16 Martin André 2019-02-08 13:59:10 UTC

The issue you're seeing seems to have the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1659183: the container images have inconsistent tags, where the release part differs.
Could you try retagging the images locally to see if the deployment then goes through?
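
This comment does not say how to retag. One possible approach, sketched below, is to copy the image onto the expected tag inside the undercloud registry with `skopeo`, using the same invocation style as the workaround in the description. The `retag_cmd` helper is hypothetical and only prints the command. The tags in the usage line are the ones reported in Comment 15 for ose-cluster-monitoring-operator. Newer skopeo releases may expect the TLS options after `copy` (for example `--src-tls-verify` and `--dest-tls-verify`).

```shell
#!/usr/bin/env bash
# retag_cmd REGISTRY REPO FROM_TAG TO_TAG
# Print the skopeo command that copies REPO:FROM_TAG to REPO:TO_TAG
# within the same registry. Nothing is executed here.
retag_cmd() {
  local registry="$1"
  local repo="$2"
  local from_tag="$3"
  local to_tag="$4"
  printf 'skopeo --tls-verify=false copy docker://%s/%s:%s docker://%s/%s:%s\n' \
    "$registry" "$repo" "$from_tag" "$registry" "$repo" "$to_tag"
}

# Usage: review the printed command, then pipe it to sh to run it.
#   retag_cmd 172.16.0.1:8787 openshift3/ose-cluster-monitoring-operator v3.11.69-11 v3.11.69-3
#   retag_cmd 172.16.0.1:8787 openshift3/ose-cluster-monitoring-operator v3.11.69-11 v3.11.69-3 | sh
```

Printing the command first keeps the step reviewable, which matters here because the right source and target tags differ per image and have to be read from the registry and from the failing pod.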

Comment 17 wasantha gamage 2019-02-08 16:25:40 UTC

Thank you, that worked. I had to retag the cluster-monitoring-operator and grafana images.
