Bug 1615504 - Installer fails on task "Wait for the ServiceMonitor CRD to be created"
Summary: Installer fails on task "Wait for the ServiceMonitor CRD to be created"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.11.0
Assignee: Frederic Branczyk
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1655693
Reported: 2018-08-13 18:08 UTC by Matt Bruzek
Modified: 2019-03-12 21:53 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned As: 1655693
Environment:
Last Closed: 2018-10-11 07:24:39 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2018:2652 (Last Updated: 2018-10-11 07:25:07 UTC)

Description Matt Bruzek 2018-08-13 18:08:30 UTC
Description of problem:

While running the openshift-ansible install command: 
source /home/cloud-user/keystonerc; ansible-playbook -vvv --user openshift -i inventory -i openshift-ansible/playbooks/openstack/inventory.py openshift-ansible/playbooks/openstack/openshift-cluster/install.yml 2>&1 >> /home/cloud-user/logs/openshift_install.log

The installer fails on:

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
task path: /home/cloud-user/openshift-ansible/roles/openshift_cluster_monitoring_operator/tasks/install.yaml:115

The cluster-monitoring-operator-cf688f46c-94cnf pod is in an ImagePullBackOff state. According to the log, the Deployment object specified "image": "registry.redhat.io/openshift3/ose-cluster-monitoring-operator:v3.11.0".

I could not pull that image:

# docker pull registry.redhat.io/openshift3/ose-cluster-monitoring-operator:v3.11.0
Trying to pull repository registry.redhat.io/openshift3/ose-cluster-monitoring-operator ... 
Get https://registry.redhat.io/v2/openshift3/ose-cluster-monitoring-operator/manifests/v3.11.0: unauthorized: invalid credentials provided when attempting to perform docker authentication

The install is configured to use oreg_url: registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}

I found the image at that URL:
# docker pull registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11.0

I believe the deployment template for the cluster-monitoring-operator should use the oreg_url for the image location.
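For context, the ${component} and ${version} tokens in oreg_url are literal placeholders that openshift-ansible substitutes when resolving image names. A quick shell sketch of the expansion the reporter expects (illustrative only, not the installer's actual code):

```shell
# Illustrative only: expand the literal ${component}/${version} placeholders
# the way the installer would. Single quotes keep the shell from touching them.
oreg_url='registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}'
image=$(printf '%s' "$oreg_url" \
  | sed -e 's/\${component}/cluster-monitoring-operator/' -e 's/\${version}/v3.11.0/')
echo "$image"
# registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11.0
```

This is the image the reporter verified was pullable, whereas the deployment template ignored oreg_url and defaulted to registry.redhat.io.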


Version-Release number of the following components:
ansible 2.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 16 2018, 19:52:45) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
git describe: openshift-ansible-3.9.0-0.10.0-3147-g7ad2385

How reproducible:

Steps to Reproduce:

Actual results:

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
task path: /home/cloud-user/openshift-ansible/roles/openshift_cluster_monitoring_operator/tasks/install.yaml:115
... 30 retries ...
<192.168.0.18> (1, '\n{"changed": true, "end": "2018-08-13 11:33:52.869678", "stdout": "", "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-RtKcWm/admin.kubeconfig"], "failed": true, "delta": "0:00:00.159734", "stderr": "No resources found.\\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \\"servicemonitors.monitoring.coreos.com\\" not found", "rc": 1, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "_raw_params": "oc get crd servicemonitors.monitoring.coreos.com -n openshift-monitoring --config=/tmp/openshift-cluster-monitoring-ansible-RtKcWm/admin.kubeconfig", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-08-13 11:33:52.709944", "msg": "non-zero return code"}\n', '')
fatal: [master-0.scale-ci.example.com]: FAILED! => {
    "attempts": 30, 
    "changed": true, 
    "cmd": [
        "oc", 
        "get", 
        "crd", 
        "servicemonitors.monitoring.coreos.com", 
        "-n", 
        "openshift-monitoring", 
        "--config=/tmp/openshift-cluster-monitoring-ansible-RtKcWm/admin.kubeconfig"
    ], 
    "delta": "0:00:00.159734", 
    "end": "2018-08-13 11:33:52.869678", 
    "invocation": {
        "module_args": {
            "_raw_params": "oc get crd servicemonitors.monitoring.coreos.com -n openshift-monitoring --config=/tmp/openshift-cluster-monitoring-ansible-RtKcWm/admin.kubeconfig", 
            "_uses_shell": false, 
            "argv": null, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-08-13 11:33:52.709944", 
    "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", 
    "stderr_lines": [
        "No resources found.", 
        "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"
    ], 
    "stdout": "", 
    "stdout_lines": []
}

...

Failure summary:


  1. Hosts:    master-0.scale-ci.example.com
     Play:     Configure Cluster Monitoring Operator
     Task:     Wait for the ServiceMonitor CRD to be created
     Message:  non-zero return code

Expected results: I expect the cluster-monitoring-operator image to use the oreg_url.

Comment 2 Scott Dodson 2018-08-13 20:44:48 UTC
Yeah, we need to account for evaluating oreg_url in order to support disconnected installs. Here's a relatively decent pattern to follow:

https://github.com/openshift/openshift-ansible/blob/master/roles/ansible_service_broker/defaults/main.yml#L26-L33

However, I've just noticed a bug, in that the default dictionary should contain ${component}:

l_asb_default_images_dict:
  origin: 'docker.io/ansibleplaybookbundle/origin-${component}:latest'
  openshift-enterprise: 'registry.redhat.io/openshift3/ose-${component}:${version}'

l_asb_default_images_default: "{{ l_asb_default_images_dict[openshift_deployment_type] }}"
l_asb_image_url: "{{ oreg_url | default(l_asb_default_images_default) | regex_replace('${version}' | regex_escape, openshift_image_tag) }}"

ansible_service_broker_image: "{{ l_asb_image_url | regex_replace('${component}' | regex_escape, 'ansible-service-broker') }}"
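Applied to this bug, the same pattern might look roughly like the following in the monitoring operator role's defaults (a sketch only; the variable names and the origin default image are illustrative, not the actual role defaults or what the eventual PR did):

```yaml
# Hypothetical adaptation of the ansible_service_broker pattern above.
# Variable names and the origin default image are illustrative only.
l_cmo_default_images_dict:
  origin: 'docker.io/openshift/origin-${component}:latest'
  openshift-enterprise: 'registry.redhat.io/openshift3/ose-${component}:${version}'

l_cmo_default_image: "{{ l_cmo_default_images_dict[openshift_deployment_type] }}"
l_cmo_image_url: "{{ oreg_url | default(l_cmo_default_image) | regex_replace('${version}' | regex_escape, openshift_image_tag) }}"

openshift_cluster_monitoring_operator_image: "{{ l_cmo_image_url | regex_replace('${component}' | regex_escape, 'cluster-monitoring-operator') }}"
```

The key point is that oreg_url, when set, takes precedence over the per-deployment-type default, so disconnected installs resolve the image from the configured registry.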

Comment 3 Scott Dodson 2018-08-14 13:38:22 UTC
https://github.com/openshift/openshift-ansible/pull/9477 should address this.

Comment 4 Scott Dodson 2018-08-14 21:24:31 UTC
Should be in openshift-ansible-3.11.0-0.15.0

Comment 7 errata-xmlrpc 2018-10-11 07:24:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

