Bug 1415447 - [IntService_public_295] Ansible metrics failed at openshift_metrics : Stop Heapster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jeff Cantrill
QA Contact: Peng Li
URL:
Whiteboard:
Depends On: 1418911
Blocks:
Reported: 2017-01-22 05:56 UTC by Peng Li
Modified: 2017-07-24 14:11 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-12 18:49:04 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2017:0903
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix and enhancement
Last Updated: 2017-04-12 22:45:42 UTC

Description Peng Li 2017-01-22 05:56:21 UTC
Description of problem:
Deploying Metrics 3.5 with Ansible fails at the following task:
TASK [openshift_metrics : Stop Heapster] ***************************************
task path: /home/penli/work/src/github.com/penli1/tmp/openshift-ansible/roles/openshift_metrics/tasks/stop_metrics.yaml:13
fatal: [MASTER]: FAILED! => {
    "failed": true, 
    "msg": "'dict object' has no attribute 'stdout_lines'"
}

This might be caused by:
1. the output of {{metrics_heapster_rc.stdout_lines}} has multiple vars when the task runs.
2. the pods are not running yet.
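
To illustrate how the message "'dict object' has no attribute 'stdout_lines'" can arise, here is a minimal Python sketch that models a registered Ansible result as a plain dict. It rests on two assumptions that this report does not confirm: that the task registering metrics_heapster_rc was either skipped (a skipped task's result carries no stdout_lines key) or run in a loop (the per-item results then sit under a "results" list). The dict contents and the helper name collect_stdout_lines are hypothetical.

```python
# Hypothetical shapes of a registered Ansible result, modelled as plain dicts.
# They are illustrative only and were not captured from the failing run.
plain_result = {"rc": 0, "stdout_lines": ["heapster"]}
skipped_result = {"changed": False, "skipped": True}
looped_result = {"results": [
    {"rc": 0, "stdout_lines": ["heapster"]},
    {"rc": 0, "stdout_lines": ["heapster-2"]},
]}


def collect_stdout_lines(result):
    """Return every stdout line found in a result, or [] when there is none."""
    if "results" in result:
        # A task run in a loop registers one entry per item under "results".
        lines = []
        for item in result["results"]:
            lines.extend(item.get("stdout_lines", []))
        return lines
    # A skipped task has no "stdout_lines" key at all, so fall back to [].
    return result.get("stdout_lines", [])


for name, result in [("plain", plain_result),
                     ("skipped", skipped_result),
                     ("looped", looped_result)]:
    print(name, collect_stdout_lines(result))
```

Reading skipped_result["stdout_lines"] directly would raise a KeyError here, which is the Python analogue of the failure above. In a playbook the same guard is usually written with a default value or an "is defined" test.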

Version-Release number of selected component (if applicable):
openshift v3.5.0.7+390ef18
Metrics 3.5.0

How reproducible:
always

Steps to Reproduce:
1. Prepare the inventory file:

[oo_first_master]
$MASTER ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/penli/.ssh/libra.pem" openshift_public_hostname=$MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0
openshift_metrics_install_metrics=true

openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra

openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

2. git clone https://github.com/openshift/openshift-ansible.git (in this test, I'm using the dev's branch bz_1414477_missing_import_jks_deuce)

3. ansible-playbook -vvv -i ~/inventory   playbooks/common/openshift-cluster/openshift_metrics.yml

4. The task fails and the playbook aborts.

5. Log in to the master machine and check the pod status:
# oc get pod -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-br736   0/1       Running   0          1m
hawkular-metrics-hp7pj       0/1       Running   0          1m
heapster-g3rqr               0/1       Running   0          2m

6. Wait several minutes; the pods become healthy and running.
# oc get pod -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-br736   1/1       Running   0          16m
hawkular-metrics-hp7pj       1/1       Running   0          17m
heapster-g3rqr               1/1       Running   0          17m
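
The readiness check in steps 5 and 6 can be sketched in Python as follows. This is only an illustration that parses the READY column of the `oc get pod` output pasted above; the function name unready_pods is hypothetical, and the sketch assumes the default column layout shown in this report.

```python
def unready_pods(oc_output):
    """Return names of pods whose READY column shows ready < total containers."""
    unready = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, ready = fields[0], fields[1]
        ready_count, total = ready.split("/")
        if int(ready_count) < int(total):
            unready.append(name)
    return unready


# Output copied from step 5 of this report.
step5_output = """
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-br736   0/1       Running   0          1m
hawkular-metrics-hp7pj       0/1       Running   0          1m
heapster-g3rqr               0/1       Running   0          2m
"""

# Output copied from step 6 of this report.
step6_output = """
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-br736   1/1       Running   0          16m
hawkular-metrics-hp7pj       1/1       Running   0          17m
heapster-g3rqr               1/1       Running   0          17m
"""

print(unready_pods(step5_output))
print(unready_pods(step6_output))
```

The point of the comparison is that STATUS reads Running in both listings, so only the READY column separates the state at failure time from the healthy state later.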

Expected results:
The installation completes successfully.

Additional info:
Ansible execution log and Events attached.

Comment 3 Jeff Cantrill 2017-01-23 18:17:56 UTC
@pengli I am unable to determine how you actually got into this state because the uninstall task should only be executed when 'openshift_metrics_install_metrics' equals 'False'. I made a slight change to conditionally include the start/stop task based on the var evaluation in PR https://github.com/openshift/openshift-ansible/pull/3150

Comment 4 Peng Li 2017-01-24 13:24:19 UTC
(In reply to Jeff Cantrill from comment #3)
> @pengli I am unable to determine how you actually got into this state because
> the uninstall task should only be executed when
> 'openshift_metrics_install_metrics' equals 'False'. Made a slight change to
> conditionally include the start/stop task based on the var evaluation in PR
> https://github.com/openshift/openshift-ansible/pull/3150

Thanks for the update. It does happen every time in my fresh install test. I'll verify and close it once it's merged to the master branch.

Comment 5 Troy Dawson 2017-01-31 20:23:14 UTC
This has been merged into ocp and is in OCP v3.5.0.12 or newer.

Comment 6 Peng Li 2017-02-06 06:27:22 UTC
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1418910#c3, this is also blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1418911

Comment 7 Peng Li 2017-02-07 06:33:40 UTC
Verified with the master branch; the issue is not reproduced.

Comment 9 errata-xmlrpc 2017-04-12 18:49:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903

