Bug 1383901 - re-run config.yaml failed when metrics_deploy are enabled
Summary: re-run config.yaml failed when metrics_deploy are enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Devan Goodwin
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-12 06:11 UTC by Anping Li
Modified: 2017-03-08 18:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the installer would re-run the metrics deployment steps if the configuration playbook was re-run. The playbooks have been updated to run the metrics deployment tasks only once. If a previous installation of metrics has failed, the admin must manually resolve the issue or remove the metrics deployment and re-run the config playbook. See the following documentation for cleanup instructions: https://docs.openshift.com/container-platform/3.3/install_config/cluster_metrics.html#metrics-cleanup
Clone Of:
Environment:
Last Closed: 2017-01-18 12:42:33 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description Anping Li 2016-10-12 06:11:39 UTC
Description of problem:
When metrics deployment is enabled in the inventory file (openshift_hosted_metrics_deploy=True), re-running config.yml fails at TASK [openshift_metrics : Wait for image pull and deployer pod].

The root cause is that the metrics-deployer pod fails because the resources it tries to create already exist.
We should set mode=redeploy when redeploying metrics.
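
The proposal above could be sketched roughly as follows. This is only an illustration, not the installer's actual logic: the function name choose_deployer_mode is hypothetical, and using the presence of the hawkular-metrics-secrets secret as the "already deployed" marker is an assumption based on the error shown in the deployer log.

```python
# Hypothetical helper: pick the deployer mode from the secrets that
# already exist in the openshift-infra project.
# Assumption: an existing "hawkular-metrics-secrets" secret means that
# metrics were deployed before (this is the resource the deployer
# reports as "already exists").
MARKER_SECRET = "hawkular-metrics-secrets"


def choose_deployer_mode(existing_secrets):
    """Return "redeploy" if metrics appear to be deployed, else "deploy"."""
    if MARKER_SECRET in existing_secrets:
        return "redeploy"
    return "deploy"


if __name__ == "__main__":
    # First run: nothing exists yet.
    print(choose_deployer_mode(set()))
    # Re-run: the secret from the first run is still there.
    print(choose_deployer_mode({"hawkular-metrics-secrets"}))
```

How the list of existing secrets is obtained (for example from oc get secrets -n openshift-infra) is left out of the sketch on purpose.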


Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.2.28-1

How reproducible:
always

Steps to Reproduce:
1. enable openshift-metrics in inventory and install openshift

   openshift_hosted_metrics_deploy=True
   openshift_hosted_metrics_write_access=True
   
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
   
2. re-run config.yml
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
3. check the deployer pod status 
   oc logs metrics-deployer-kdv02 -n openshift-infra

Actual results:
2.  ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
TASK [openshift_metrics : Wait for image pull and deployer pod] ****************
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (60 retries left).
<--snip-->
<--snip-->
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (3 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (2 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (1 retries left).
fatal: [openshift-111.lab.eng.nay.redhat.com]: FAILED! => {"changed": true, "cmd": "oc get pods -n openshift-infra | grep metrics-deployer.*Completed", "delta": "0:00:00.402017", "end": "2016-10-11 22:56:58.051938", "failed": true, "rc": 1, "start": "2016-10-11 22:56:57.649921", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
    to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=7    unreachable=0    failed=0   
openshift-111.lab.eng.nay.redhat.com : ok=436  changed=28   unreachable=0    failed=1   
openshift-112.lab.eng.nay.redhat.com : ok=129  changed=6    unreachable=0    failed=0   

3. [root@openshift-111 ~]# oc logs metrics-deployer-kdv02 -n openshift-infra
+ image_prefix=registry.access.redhat.com/openshift3/
+ image_version=3.2.1
+ master_url=https://kubernetes.default.svc:443
+ redeploy=false
+ mode=deploy
+ cassandra_nodes=1
+ use_persistent_storage=false
+ cassandra_pv_size=10Gi
+ metric_duration=7
+ heapster_node_id=nodename
+ metric_resolution=10s
+ project=openshift-infra
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ dir=/etc/deploy/_output
+ hawkular_metrics_hostname=hawkular-metrics.1008-lzo.qe.rhcloud.com
+ hawkular_metrics_alias=hawkular-metrics
+ hawkular_cassandra_alias=hawkular-cassandra
+ rm -rf /etc/deploy/_output
<--snip-->
<--snip-->
<--snip-->
Creating the Cassandra Certificate Secrets configuration json file
++ echo
++ echo 'Creating the Cassandra Certificate Secrets configuration json file'
++ cat
+++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
+++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
Creating Hawkular Metrics & Cassandra Secrets
++ echo 'Creating Hawkular Metrics & Cassandra Secrets'
++ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
Error from server: error when creating "/etc/deploy/_output/hawkular-metrics-secrets.json": secrets "hawkular-metrics-secrets" already exists
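
To make the failure in the deployer log easier to spot, the "already exists" errors could be pulled out of the log with a small script. This is a minimal sketch under stated assumptions: the function name find_conflicts is hypothetical, and the pattern is derived only from the single error line shown above, so other error formats would not be matched.

```python
import re

# Pattern built from the one error line in the log above; other
# server error formats are not covered (assumption).
ALREADY_EXISTS = re.compile(
    r'^Error from server: error when creating "(?P<path>[^"]+)": '
    r'(?P<kind>\w+) "(?P<name>[^"]+)" already exists$'
)


def find_conflicts(log_text):
    """Return (kind, name) pairs for every "already exists" error line."""
    conflicts = []
    for line in log_text.splitlines():
        match = ALREADY_EXISTS.match(line.strip())
        if match:
            conflicts.append((match.group("kind"), match.group("name")))
    return conflicts


if __name__ == "__main__":
    sample = (
        "++ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json\n"
        'Error from server: error when creating '
        '"/etc/deploy/_output/hawkular-metrics-secrets.json": '
        'secrets "hawkular-metrics-secrets" already exists\n'
    )
    print(find_conflicts(sample))
```

The input would be the text returned by oc logs for the deployer pod; the sketch does not run oc itself.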


Expected results:
config.yml can be re-run without error.


Additional info:

Comment 1 Devan Goodwin 2016-11-01 17:59:47 UTC
This no longer seems to fail, although the deployer error is still present:

[root@ip-172-18-9-38 ec2-user]# oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-jm0xg   1/1       Running   0          2h
hawkular-metrics-e1wm9       1/1       Running   0          2h
heapster-nfkv6               1/1       Running   0          2h
metrics-deployer-sy409       0/1       Error     0          1h

Creating the Cassandra Certificate Secrets configuration json file
++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
Creating Hawkular Metrics & Cassandra Secrets
+ echo 'Creating Hawkular Metrics & Cassandra Secrets'
+ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
Error from server: error when creating "/etc/deploy/_output/hawkular-metrics-secrets.json": secrets "hawkular-metrics-secrets" already exists


However in ansible:

FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (2 retries left).
FAILED - RETRYING: TASK: openshift_metrics : Wait for image pull and deployer pod (1 retries left).
changed: [ec2-54-242-151-226.compute-1.amazonaws.com]


It appears to be handled gracefully. Scott, do you think anything needs to be done here?
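
For reference, the wait task only greps the oc get pods output for a metrics-deployer pod in the Completed state, so a pod in the Error state looks the same to it as one that has not finished yet. A rough sketch of a check that tells the three cases apart is below. The function names are hypothetical, the column layout is assumed to match the pod listing earlier in this comment, and reporting Completed ahead of Error when both are present is a design choice that mirrors the grep, not something the installer is known to do.

```python
def deployer_statuses(oc_get_pods_output):
    """Map each metrics-deployer pod name to its STATUS column."""
    statuses = {}
    for line in oc_get_pods_output.splitlines():
        fields = line.split()
        # Assumed columns: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 3 and fields[0].startswith("metrics-deployer"):
            statuses[fields[0]] = fields[2]
    return statuses


def deployer_outcome(oc_get_pods_output):
    """Return "completed", "failed", or "pending" for the deployer."""
    values = set(deployer_statuses(oc_get_pods_output).values())
    if "Completed" in values:
        return "completed"
    if "Error" in values:
        return "failed"
    return "pending"


if __name__ == "__main__":
    listing = (
        "NAME                         READY     STATUS    RESTARTS   AGE\n"
        "hawkular-cassandra-1-jm0xg   1/1       Running   0          2h\n"
        "hawkular-metrics-e1wm9       1/1       Running   0          2h\n"
        "heapster-nfkv6               1/1       Running   0          2h\n"
        "metrics-deployer-sy409       0/1       Error     0          1h\n"
    )
    print(deployer_outcome(listing))
```

With a check like this, a wait loop could stop as soon as the outcome is "failed" instead of using up all of its retries.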

Comment 2 Scott Dodson 2016-11-02 13:31:21 UTC
Anping,

The metrics work that exists in the 3.2 installer was a community contribution and doesn't include all of the fixes that we made during the 3.3 development cycle. So metrics deployment is only supported with the 3.3 installer and newer. If we can't reproduce it there, and Devan's testing so far shows we can't, then we should close this bug.

Comment 3 Anping Li 2016-11-08 10:01:06 UTC
No such issue with 3.4, so moving to verified.

Comment 5 errata-xmlrpc 2017-01-18 12:42:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

