Description of problem:
When metrics deployment (openshift_hosted_metrics_deploy) is enabled in the inventory file, re-running config.yml fails at TASK [openshift_metrics : Wait for image pull and deployer pod]. The root cause is that the metrics-deployer pod fails because its resources already exist. We should set mode=redeploy when redeploying metrics.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.2.28-1

How reproducible:
always

Steps to Reproduce:
1. Enable openshift-metrics in the inventory and install OpenShift
openshift_hosted_metrics_deploy=True
openshift_hosted_metrics_write_access=True
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
2. Re-run config.yml
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
3. Check the deployer pod status
oc logs metrics-deployer-kdv02 -n openshift-infra

Actual results:
2. ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
TASK [openshift_metrics : Wait for image pull and deployer pod] ****************
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (60 retries left).
<--snip-->
<--snip-->
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (3 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (2 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (1 retries left).
fatal: [openshift-111.lab.eng.nay.redhat.com]: FAILED! => {"changed": true, "cmd": "oc get pods -n openshift-infra | grep metrics-deployer.*Completed", "delta": "0:00:00.402017", "end": "2016-10-11 22:56:58.051938", "failed": true, "rc": 1, "start": "2016-10-11 22:56:57.649921", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost : ok=13 changed=7 unreachable=0 failed=0
openshift-111.lab.eng.nay.redhat.com : ok=436 changed=28 unreachable=0 failed=1
openshift-112.lab.eng.nay.redhat.com : ok=129 changed=6 unreachable=0 failed=0

3. [root@openshift-111 ~]# oc logs metrics-deployer-kdv02 -n openshift-infra
+ image_prefix=registry.access.redhat.com/openshift3/
+ image_version=3.2.1
+ master_url=https://kubernetes.default.svc:443
+ redeploy=false
+ mode=deploy
+ cassandra_nodes=1
+ use_persistent_storage=false
+ cassandra_pv_size=10Gi
+ metric_duration=7
+ heapster_node_id=nodename
+ metric_resolution=10s
+ project=openshift-infra
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ dir=/etc/deploy/_output
+ hawkular_metrics_hostname=hawkular-metrics.1008-lzo.qe.rhcloud.com
+ hawkular_metrics_alias=hawkular-metrics
+ hawkular_cassandra_alias=hawkular-cassandra
+ rm -rf /etc/deploy/_output
<--snip-->
<--snip-->
<--snip-->
Creating the Cassandra Certificate Secrets configuration json file
++ echo
++ echo 'Creating the Cassandra Certificate Secrets configuration json file'
++ cat
+++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
+++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
Creating Hawkular Metrics & Cassandra Secrets
++ echo 'Creating Hawkular Metrics & Cassandra Secrets'
++ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
Error from server: error when creating "/etc/deploy/_output/hawkular-metrics-secrets.json": secrets "hawkular-metrics-secrets" already exists

Expected results:
config.yml can be re-run without error.

Additional info:
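The mode=redeploy suggestion could be sketched roughly as below. This is only an illustration, not the installer's actual logic: the helper name choose_deployer_mode is hypothetical, and the idea of probing for the existing secret is an assumption based on the "already exists" error in the deployer log. How the chosen mode is then passed to the deployer is left out because the report does not show it.

```shell
# Hypothetical helper: pick the deployer mode from the exit status of a probe
# command. If the probe succeeds (resources already exist), use redeploy;
# otherwise use deploy.
choose_deployer_mode() {
    if "$@" >/dev/null 2>&1; then
        echo redeploy
    else
        echo deploy
    fi
}

# On a cluster the probe might be (an assumption, not taken from the installer):
#   choose_deployer_mode oc get secret hawkular-metrics-secrets -n openshift-infra
# Stand-in probes so the sketch runs without a cluster:
mode_existing=$(choose_deployer_mode true)
mode_fresh=$(choose_deployer_mode false)
echo "existing install: $mode_existing"
echo "fresh install: $mode_fresh"
```

With the stand-in probes, the first call models a cluster where the metrics secrets are already present and the second models a fresh install.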
This does not seem to fail now, although the deployer error is still present:

[root@ip-172-18-9-38 ec2-user]# oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-jm0xg   1/1       Running   0          2h
hawkular-metrics-e1wm9       1/1       Running   0          2h
heapster-nfkv6               1/1       Running   0          2h
metrics-deployer-sy409       0/1       Error     0          1h

Creating the Cassandra Certificate Secrets configuration json file
++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
Creating Hawkular Metrics & Cassandra Secrets
+ echo 'Creating Hawkular Metrics & Cassandra Secrets'
+ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
Error from server: error when creating "/etc/deploy/_output/hawkular-metrics-secrets.json": secrets "hawkular-metrics-secrets" already exists

However, in Ansible:

FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (2 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (1 retries left).
changed: [ec2-54-242-151-226.compute-1.amazonaws.com]

It appears to be handled gracefully. Scott, do you think anything needs to be done here?
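To make the mismatch between the pod status and the Ansible result easier to spot, the deployer's status can be pulled out of the pod listing directly. A minimal sketch, assuming the column layout shown in the listing above; the helper name deployer_status is hypothetical.

```shell
# Hypothetical helper: print the STATUS column (third field) of any
# metrics-deployer pod from `oc get pods` output supplied on stdin.
deployer_status() {
    awk '$1 ~ /^metrics-deployer-/ { print $3 }'
}

# Sample listing copied from this comment.
listing='NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-jm0xg   1/1       Running   0          2h
hawkular-metrics-e1wm9       1/1       Running   0          2h
heapster-nfkv6               1/1       Running   0          2h
metrics-deployer-sy409       0/1       Error     0          1h'

status=$(printf '%s\n' "$listing" | deployer_status)
echo "$status"
```

Against a live cluster the same helper could be fed with `oc get pods -n openshift-infra | deployer_status`.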
Anping,

The metrics work that exists in the 3.2 installer was a community contribution and doesn't include all of the fixes that we made during the 3.3 development cycle, so metrics deployment is only supported with the 3.3 installer and newer. If we can't reproduce the problem there, and Devan's testing so far shows we can't, then we should close this bug.
No such issue with 3.4, so moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066