Bug 1382316

Summary: [ceph-ansible] There is no default value set for the "$cluster" variable in rolling_update.yml
Product: Red Hat Storage Console
Component: ceph-ansible
Version: 2
Target Release: 2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Tejas <tchandra>
Assignee: Sébastien Han <shan>
QA Contact: Tejas <tchandra>
CC: adeza, aschoen, ceph-eng-bugs, flucifre, gmeno, hnallurv, kdreyer, nthomas, sankarshan, seb
Fixed In Version: ceph-ansible-1.0.5-35.el7scon
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2016-11-22 23:41:12 UTC

Description Tejas 2016-10-06 10:51:58 UTC
Description of problem:
   
While running the rolling_update.yml it fails with the following error:
TASK: [set osd flags] ********************************************************* 
fatal: [magna046 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna052 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna058 -> magna006] => One or more undefined variables: 'cluster' is undefined

The "set osd flags" task references the cluster variable, but that variable was commented out in group_vars/all:

#cluster: ceph # cluster name

When I uncommented the cluster name, the playbook ran correctly.
Could we set the cluster name default to "ceph" in rolling_update.yml as well, since that is the default cluster name?
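A minimal sketch of the kind of default being requested, using Jinja2's `default` filter on the task itself (illustrative only; the actual upstream fix may instead define `cluster: ceph` in the play's `vars:` or in group_vars):

```yaml
  pre_tasks:
    - name: set osd flags
      command: ceph osd set {{ item }} --cluster {{ cluster | default('ceph') }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"
```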


Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-34.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. The cluster variable is commented out in group_vars/all.
2. Run rolling_update.yml; it fails.
3. Uncomment the "cluster: ceph # cluster name" line, and the playbook runs successfully.

The lines in rolling_update.yml I am referring to are:

- hosts: osds
  serial: 3
  become: True
  vars:
    upgrade_ceph_packages: True
    osd_group_name: osds

  pre_tasks:
    - name: set osd flags
      command: ceph osd set {{ item }} --cluster {{ cluster }}   <---------
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"

Comment 3 seb 2016-10-06 12:09:31 UTC
Do you mind testing this "quickly"?
I'd like to have this in the last sync that should happen today...

https://github.com/ceph/ceph-ansible/pull/1012

Thanks!

Comment 4 Tejas 2016-10-06 13:05:12 UTC
Hi Seb,
  Would you let me know from which branch I need to test this?
Is it merged to master?

Thanks,
Tejas

Comment 5 seb 2016-10-06 13:10:38 UTC
Hi,

Just test this branch https://github.com/ceph/ceph-ansible/tree/cluster-name-rolling (from the PR). Thanks

Comment 6 seb 2016-10-06 13:10:59 UTC
I'll merge or modify the PR based on your inputs

Comment 7 Tejas 2016-10-06 13:44:21 UTC
Hi,

   I saw a failure while waiting for clean PGs, but I don't think it is related to this change in any way. The default cluster name change looks good.

FAILED - RETRYING: TASK: waiting for clean pgs... (1 retries left).
fatal: [magna046 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.572476", "end": "2016-10-06 13:42:03.479861", "failed": true, "rc": 1, "start": "2016-10-06 13:42:02.907385", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna052 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.471127", "end": "2016-10-06 13:42:03.765836", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.294709", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna058 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.554347", "end": "2016-10-06 13:42:03.838683", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.284336", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
 [WARNING]: Could not create retry file 'rolling_update.retry'.         [Errno 2] No such file or directory: ''


PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   
magna006                   : ok=82   changed=3    unreachable=0    failed=0   
magna009                   : ok=79   changed=3    unreachable=0    failed=0   
magna031                   : ok=79   changed=3    unreachable=0    failed=0   
magna046                   : ok=94   changed=7    unreachable=0    failed=1   
magna052                   : ok=92   changed=7    unreachable=0    failed=1   
magna058                   : ok=96   changed=8    unreachable=0    failed=1 
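The failing "waiting for clean pgs" check can be sketched offline against a hypothetical sample line (the real task pipes live `ceph pg stat --cluster ceph` output through essentially the same `sed` expressions; the `s/ //g` here strips all spaces, a slightly more defensive variant of the playbook's `s/ //`):

```shell
# Sample line imitating `ceph pg stat` output (illustrative only).
pg_stat="v842: 64 pgs: 64 active+clean; 1024 MB data"

# PGs in active+clean: everything between "pgs:" and "active+clean".
clean=$(echo "$pg_stat" | sed 's/^.*pgs://;s/active+clean.*//;s/ //g')

# Total PGs: everything before "pgs", after the leading "vNNN:" prefix.
total=$(echo "$pg_stat" | sed 's/pgs.*//;s/^.*://;s/ //g')

if [ "$clean" -eq "$total" ]; then
    echo "all PGs active+clean"
else
    echo "cluster not yet clean"
fi
```

The play retries this comparison until the counts match or the retries run out, which is why a slow-to-settle cluster fails the task even though nothing is actually wrong.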

Thanks,
Tejas

Comment 8 seb 2016-10-06 13:55:23 UTC
Ok, the problem is that you should increase the timeout of this task by setting:

health_osd_check_retries and health_osd_check_delay

There is another BZ for that already, so if my patch upstream is good, let's close this. The other BZ was fixed by adding the options above.
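For reference, these would go in group_vars; a sketch with illustrative values (the variable names come from the comment above, the numbers are only an example):

```yaml
# group_vars/all -- retry tuning for the "waiting for clean pgs" task
health_osd_check_retries: 40   # how many times to re-check
health_osd_check_delay: 30     # seconds between checks
```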

Comment 10 seb 2016-10-06 21:24:56 UTC
Correct, this is fixed in 1.0.8.

Comment 11 Federico Lucifredi 2016-10-07 17:08:01 UTC
This will ship concurrently with RHCS 2.1.

Comment 12 Harish NV Rao 2016-10-07 17:13:19 UTC
This will be tested as part of the rolling_update tests.

Comment 15 Tejas 2016-10-28 06:53:45 UTC
Verified in build:
ceph-ansible-1.0.5-37.el7scon

Comment 17 errata-xmlrpc 2016-11-22 23:41:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2817