Description of problem:
While running rolling_update.yml, it fails with the following error:

TASK: [set osd flags] *********************************************************
fatal: [magna046 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna052 -> magna006] => One or more undefined variables: 'cluster' is undefined
fatal: [magna058 -> magna006] => One or more undefined variables: 'cluster' is undefined

The "set osd flags" task looks for the cluster variable, but the variable was commented out in group_vars/all:

#cluster: ceph # cluster name

When I uncommented the cluster name, the playbook ran correctly. Could we also default the cluster name to "ceph" in rolling_update.yml, since that is the default cluster name?

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-34.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Leave the cluster variable commented out in group_vars/all.
2. Run rolling_update.yml; it fails.
3. Uncomment the "cluster: ceph # cluster name" line, and it runs successfully.

The lines in rolling_update.yml I am referring to are:

  hosts: osds
  serial: 3
  become: True
  vars:
    upgrade_ceph_packages: True
    osd_group_name: osds
  pre_tasks:
    - name: set osd flags
      command: ceph osd set {{ item }} --cluster {{ cluster }}   <---------
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"
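For what it's worth, one way to get that default would be a Jinja2 default filter on the variable, so the task falls back to "ceph" whenever group_vars does not define it. This is only a sketch of the idea, not necessarily how the actual fix will look:

  pre_tasks:
    - name: set osd flags
      # fall back to the default cluster name when 'cluster' is not defined
      command: ceph osd set {{ item }} --cluster {{ cluster | default('ceph') }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups.mons[0] }}"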
Do you mind testing this "quickly"? I'd like to have this in the last sync that should happen today... https://github.com/ceph/ceph-ansible/pull/1012 Thanks!
Hi Seb,

Would you let me know which branch I need to test this from? Is it merged to master?

Thanks,
Tejas
Hi,

Just test this branch: https://github.com/ceph/ceph-ansible/tree/cluster-name-rolling (from the PR).

Thanks
I'll merge or modify the PR based on your inputs
Hi,

I saw a failure while waiting for clean PGs, but I don't think it is related to this change in any way. The default cluster name change looks good.

FAILED - RETRYING: TASK: waiting for clean pgs... (1 retries left).
fatal: [magna046 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.572476", "end": "2016-10-06 13:42:03.479861", "failed": true, "rc": 1, "start": "2016-10-06 13:42:02.907385", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna052 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.471127", "end": "2016-10-06 13:42:03.765836", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.294709", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [magna058 -> magna006]: FAILED! => {"changed": true, "cmd": "test \"$(ceph pg stat --cluster ceph | sed 's/^.*pgs://;s/active+clean.*//;s/ //')\" -eq \"$(ceph pg stat --cluster ceph | sed 's/pgs.*//;s/^.*://;s/ //')\" && ceph health --cluster ceph | egrep -sq \"HEALTH_OK|HEALTH_WARN\"", "delta": "0:00:00.554347", "end": "2016-10-06 13:42:03.838683", "failed": true, "rc": 1, "start": "2016-10-06 13:42:03.284336", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
[WARNING]: Could not create retry file 'rolling_update.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
localhost : ok=1  changed=0 unreachable=0 failed=0
magna006  : ok=82 changed=3 unreachable=0 failed=0
magna009  : ok=79 changed=3 unreachable=0 failed=0
magna031  : ok=79 changed=3 unreachable=0 failed=0
magna046  : ok=94 changed=7 unreachable=0 failed=1
magna052  : ok=92 changed=7 unreachable=0 failed=1
magna058  : ok=96 changed=8 unreachable=0 failed=1

Thanks,
Tejas
OK, the problem there is that you should increase the timeout of that task by using health_osd_check_retries and health_osd_check_delay. There is another BZ for that already, so if my patch upstream is good, let's close this one. The other BZ was fixed by adding the options above.
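For example, the check can be given more headroom by overriding these two variables (e.g. passed as extra vars when running rolling_update.yml); the values below are only illustrative:

  # values are illustrative, not recommendations; tune to cluster size
  health_osd_check_retries: 40
  health_osd_check_delay: 30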
Correct, this is fixed in 1.0.8.
This will ship concurrently with RHCS 2.1.
This will be tested as part of the rolling_update tests.
Verified in build: ceph-ansible-1.0.5-37.el7scon
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:2817