Bug 1382316 - [ceph-ansible] There is no default value set for the "$cluster" variable in rolling_update.yml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2
Assignee: Sébastien Han
QA Contact: Tejas
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-06 10:51 UTC by Tejas
Modified: 2016-11-22 23:41 UTC
CC: 10 users

Fixed In Version: ceph-ansible-1.0.5-35.el7scon
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 23:41:12 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:2817 (normal, SHIPPED_LIVE): ceph-iscsi-ansible and ceph-ansible bug fix update, last updated 2017-04-18 19:50:43 UTC

Description Tejas 2016-10-06 10:51:58 UTC
Description of problem:
   
While running the rolling_update.yml it fails with the following error:
TASK: [set osd flags] ********************************************************* 
fatal: [magna046 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna052 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna058 -> magna006] => One or more undefined variables: 'cluster' is undefined

In the "set osd flags" task, it looks for the cluster variable.
The cluster was commented in group_vars/all:

#cluster: ceph # cluster name

When I uncommented the cluster name it ran correctly.
Could we set the  cluster name default to "ceph" in the rolling_update.yml also, since this is the default cluster name.


Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-34.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Leave the cluster variable commented out in group_vars/all.
2. Run rolling_update.yml; it fails with the undefined-variable error above.
3. Uncomment the "cluster: ceph # cluster name" line and rerun; the playbook completes successfully.

The lines in rolling_update.yml I am referring to are:

- hosts: osds
  serial: 3
  become: True
  vars:
    upgrade_ceph_packages: True
    osd_group_name: osds

  pre_tasks:
    - name: set osd flags
      command: ceph osd set {{ item }} --cluster {{ cluster }}   <---------
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"

Comment 3 seb 2016-10-06 12:09:31 UTC
Do you mind testing this "quickly"?
I'd like to have this in the last sync that should happen today...

https://github.com/ceph/ceph-ansible/pull/1012

Thanks!

Comment 4 Tejas 2016-10-06 13:05:12 UTC
Hi Seb,
  Would you let me know from which branch I need to test this?
Is it merged to master?

Thanks,
Tejas

Comment 5 seb 2016-10-06 13:10:38 UTC
Hi,

Just test this branch https://github.com/ceph/ceph-ansible/tree/cluster-name-rolling (from the PR). Thanks
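
For reference, one way to try the branch locally (a sketch, assuming a fresh checkout):

git clone https://github.com/ceph/ceph-ansible.git
cd ceph-ansible
git checkout cluster-name-rolling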

Comment 6 seb 2016-10-06 13:10:59 UTC
I'll merge or modify the PR based on your input.

Comment 7 Tejas 2016-10-06 13:44:21 UTC
Hi,

   I saw a failure while waiting for clean PGs, but I don't think it is related to this change in any way. The default cluster name change looks good:

FAILED - RETRYING: TASK: waiting for clean pgs... (1 retries left).
fatal: [magna046 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.572476", "end": "2016-10-06 13:42:03.479861", "failed": true, "rc": 1, "start": "2016-10-06 13:42:02.907385", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna052 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.471127", "end": "2016-10-06 13:42:03.765836", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.294709", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna058 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph  | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.554347", "end": "2016-10-06 13:42:03.838683", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.284336", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
 [WARNING]: Could not create retry file 'rolling_update.retry'.         [Errno 2] No such file or directory: ''


PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   
magna006                   : ok=82   changed=3    unreachable=0    failed=0   
magna009                   : ok=79   changed=3    unreachable=0    failed=0   
magna031                   : ok=79   changed=3    unreachable=0    failed=0   
magna046                   : ok=94   changed=7    unreachable=0    failed=1   
magna052                   : ok=92   changed=7    unreachable=0    failed=1   
magna058                   : ok=96   changed=8    unreachable=0    failed=1 
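
For readability, the check embedded in the failing "waiting for clean pgs" task is equivalent to the following (unpacked straight from the JSON log above; no new logic added):

# number of PGs reported as active+clean
clean=$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')
# total number of PGs
total=$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')
# succeed only when every PG is clean and health is HEALTH_OK or HEALTH_WARN
test "$clean" -eq "$total" && ceph health --cluster ceph | egrep -sq "HEALTH_OK|HEALTH_WARN"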

Thanks,
Tejas

Comment 8 seb 2016-10-06 13:55:23 UTC
OK, the problem is that you need to increase the timeout of this task by setting:

health_osd_check_retries and health_osd_check_delay

There is already another BZ for that, fixed by adding the options above, so if my upstream patch is good, let's close this one.
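
For example, these could be set in group_vars/all (a sketch; the values here are illustrative only, not tuned recommendations):

# retry the "waiting for clean pgs" check more times, with a longer pause between tries
health_osd_check_retries: 40
health_osd_check_delay: 30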

Comment 10 seb 2016-10-06 21:24:56 UTC
Correct, this is fixed in 1.0.8.

Comment 11 Federico Lucifredi 2016-10-07 17:08:01 UTC
This will ship concurrently with RHCS 2.1.

Comment 12 Harish NV Rao 2016-10-07 17:13:19 UTC
This will be tested as part of the rolling_update tests.

Comment 15 Tejas 2016-10-28 06:53:45 UTC
Verified in build:
ceph-ansible-1.0.5-37.el7scon

Comment 17 errata-xmlrpc 2016-11-22 23:41:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2817

