Description of problem:
While running rolling_update.yml, it fails with the following error:

TASK: [set osd flags] *********************************************************
fatal: [magna046 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna052 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna058 -> magna006] => One or more undefined variables: 'cluster' is undefined

The "set osd flags" task looks for the cluster variable, but the variable was commented out in group_vars/all:

#cluster: ceph # cluster name

When I uncommented the cluster name, the playbook ran correctly. Could we also default the cluster name to "ceph" in rolling_update.yml, since that is the default cluster name?

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-34.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Leave the cluster variable commented out in group_vars/all.
2. Run rolling_update.yml; it fails.
3. Uncomment the "cluster: ceph # cluster name" line, and it runs successfully.

The lines in rolling_update.yml I am referring to are:

  hosts: osds
  serial: 3
  become: True
  vars:
    upgrade_ceph_packages: True
    osd_group_name: osds
  pre_tasks:
    - name: set osd flags
      command: ceph osd set {{ item }} --cluster {{ cluster }}   <---------
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"
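For what it's worth, one way to get that default would be a Jinja2 default filter on the variable, so the task falls back to "ceph" whenever group_vars does not define it. This is only a sketch of the idea, not necessarily how the actual fix will look:

  pre_tasks:
    - name: set osd flags
      # fall back to the default cluster name when 'cluster' is not defined
      command: ceph osd set {{ item }} --cluster {{ cluster | default('ceph') }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"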
Do you mind testing this "quickly"? I'd like to have this in the last sync that should happen today... https://github.com/ceph/ceph-ansible/pull/1012 Thanks!
Hi Seb,

Would you let me know which branch I need to test this from? Is it merged to master?

Thanks,
Tejas
Hi,

Just test this branch: https://github.com/ceph/ceph-ansible/tree/cluster-name-rolling (from the PR).

Thanks
I'll merge or modify the PR based on your inputs
Hi,

I saw a failure while waiting for clean PGs, but I don't think it is related to this change in any way. The default cluster name change looks good.

FAILED - RETRYING: TASK: waiting for clean pgs... (1 retries left).
fatal: [magna046 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.572476", "end": "2016-10-06 13:42:03.479861", "failed": true, "rc": 1, "start": "2016-10-06 13:42:02.907385", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna052 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.471127", "end": "2016-10-06 13:42:03.765836", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.294709", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna058 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.554347", "end": "2016-10-06 13:42:03.838683", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.284336", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
[WARNING]: Could not create retry file 'rolling_update.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
localhost : ok=1  changed=0 unreachable=0 failed=0
magna006  : ok=82 changed=3 unreachable=0 failed=0
magna009  : ok=79 changed=3 unreachable=0 failed=0
magna031  : ok=79 changed=3 unreachable=0 failed=0
magna046  : ok=94 changed=7 unreachable=0 failed=1
magna052  : ok=92 changed=7 unreachable=0 failed=1
magna058  : ok=96 changed=8 unreachable=0 failed=1

Thanks,
Tejas
OK, the problem there is that you should increase the timeout of that task by using health_osd_check_retries and health_osd_check_delay. There is another BZ for that already, so if my patch upstream is good, let's close this one. The other BZ was fixed by adding the options above.
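For example, the check can be given more headroom by overriding these two variables (e.g. passed as extra vars when running rolling_update.yml); the values below are only illustrative:

  # values are illustrative, not recommendations; tune to cluster size
  health_osd_check_retries: 40
  health_osd_check_delay: 30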
Correct, this is fixed in 1.0.8.
This will ship concurrently with RHCS 2.1.
This will be tested as part of the rolling_update tests.
Verified in build: ceph-ansible-1.0.5-37.el7scon
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:2817