Description of problem:

This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1618678. Raised this new BZ because this is not a blocker for 3.1 testing.

The playbook fails to set up the cluster when multiple RGW instances are involved:

"error: 'dict object' has no attribute 'rgw_hostname'"

Version-Release number of selected component (if applicable):
3.1.0-0.1.rc21.el7cp

How reproducible:
Always

Actual results:

INFO:teuthology.orchestra.run.clara012.stdout:TASK [ceph-config : generate ceph configuration file: c1.conf] *****************
INFO:teuthology.orchestra.run.clara012.stdout:task path: /home/ubuntu/ceph-ansible/roles/ceph-config/tasks/main.yml:12
INFO:teuthology.orchestra.run.clara012.stdout:Friday 17 August 2018 06:36:51 +0000 (0:00:00.351) 0:05:00.432 *********
INFO:teuthology.orchestra.run.clara012.stdout:fatal: [clara012.ceph.redhat.com]: FAILED! => {}

MSG:

'dict object' has no attribute 'rgw_hostname'

INFO:teuthology.orchestra.run.clara012.stdout:PLAY RECAP *********************************************************************
INFO:teuthology.orchestra.run.clara012.stdout:clara012.ceph.redhat.com : ok=44 changed=12 unreachable=0 failed=1
INFO:teuthology.orchestra.run.clara012.stdout:pluto004.ceph.redhat.com : ok=1 changed=0 unreachable=0 failed=0

Expected results:

The playbook should complete successfully with multiple RGW instances. Cluster configuration with a single RGW node works with this version.

Additional info:

Config parameters:

ceph_ansible:
  rhbuild: '3.1'
  vars:
    ceph_conf_overrides:
      global:
        mon_max_pg_per_osd: 1024
        osd default pool size: 2
        osd pool default pg num: 64
        osd pool default pgp num: 64
    ceph_origin: distro
    ceph_repository: rhcs
    ceph_stable: true
    ceph_stable_release: luminous
    ceph_stable_rh_storage: true
    ceph_test: true
    journal_size: 1024
    osd_auto_discovery: true
    osd_scenario: collocated

Logs @ http://magna002.ceph.redhat.com/smanjara-2018-08-23_05:37:58-rgw:multisite-ansible-luminous-distro-basic-multi/307495/teuthology.log
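For anyone tracing the error itself: it is raised while the ceph-config role renders ceph.conf from its Jinja2 template and looks up a per-host value named rgw_hostname in hostvars that was never set for one of the RGW hosts. The sketch below is illustrative only, not the shipped roles/ceph-config/templates/ceph.conf.j2; the group name 'rgws' and the use of ansible_hostname as the fallback are assumptions based on the error output above. It contrasts the fragile lookup with a guarded one:

--------------------
{# Illustrative excerpt only -- not the actual ceph.conf.j2 shipped by ceph-ansible #}

{# Fragile lookup: rendering aborts with "'dict object' has no attribute 'rgw_hostname'"
   as soon as one host in the group has no such variable defined #}
{% for host in groups.get('rgws', []) %}
[client.rgw.{{ hostvars[host]['ansible_hostname'] }}]
host = {{ hostvars[host]['rgw_hostname'] }}
{% endfor %}

{# Guarded lookup: fall back to the gathered short hostname when 'rgw_hostname' is unset #}
{% for host in groups.get('rgws', []) %}
[client.rgw.{{ hostvars[host]['ansible_hostname'] }}]
host = {{ hostvars[host]['rgw_hostname'] | default(hostvars[host]['ansible_hostname']) }}
{% endfor %}
--------------------

This is only the general shape of such a guard; the real template linked later in this bug is the authoritative fix.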
Thomas, no this is not fixed yet.
fixed in v3.1.4
This regression was caught by the downstream OSP automation tests. The required setup to trigger it is an install or upgrade of Ceph on a cluster using the FQDN option.

Any re-run of ceph-ansible (v3.1.3) on an existing cluster configured to "used_fqdn" is going to break the ceph.conf on the RGW node; the effect is that the ceph.conf on an RGW node will have missing fields.

It is not in stable-3.0; it was introduced in August. The fix is already released as v3.1.4.
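For reference, a complete RGW client section in a rendered ceph.conf looks roughly like the block below; the hostname 'node1' and the exact keys and values are placeholders (they depend on the ceph-ansible version and the configured frontend), but this is the kind of content that ends up missing or empty on the affected RGW nodes:

--------------------
# Illustrative only -- 'node1', the port and the paths are placeholders
[client.rgw.node1]
host = node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.node1/keyring
log file = /var/log/ceph/ceph-rgw-node1.log
rgw frontends = civetweb port=8080
--------------------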
Do I understand correctly that ceph-ansible 3.1.4 is a newer version than the one included in 3.1?

Best
-F
(In reply to Gregory Meno from comment #9)
> This regression was caught by the downstream OSP automation tests.
> The required setup to trigger it is install or upgrade ceph on a cluster
> using FQDN option.
>
> any re-run of ceph-ansible (v3.1.3) on an existing cluster configured to
> "used_fqdn" is going to break the ceph.conf on rgw node

Gregory, are you referring to the options 'mon_use_fqdn' and 'mds_use_fqdn'? If yes, then they are not supported in 3.1 as per bz https://bugzilla.redhat.com/show_bug.cgi?id=1613155.

We have hit the original issue with and without FQDNs mentioned for more than one RGW in the inventory file.
Just to clarify a bit: as Harish mentioned in c13, we stopped supporting these options; therefore, we had to keep backward compatibility with existing clusters. The commit that was supposed to provide this backward compatibility missed something and introduced the current bug, which has finally been fixed in v3.1.4.
We tried the following scenarios with ceph-ansible 3.1.3-2redhat1 on Ubuntu.

With inventory file (full names):
---------------------
[mons]
magna006.ceph.redhat.com

[mgrs]
magna006.ceph.redhat.com

[osds]
magna064.ceph.redhat.com devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore" dmcrypt="true"
magna111.ceph.redhat.com dedicated_devices="['/dev/sdb', '/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="bluestore" dmcrypt="true"
magna117.ceph.redhat.com devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore"

[rgws]
magna053.ceph.redhat.com
magna061.ceph.redhat.com
---------------------

Cluster was up and the playbook did not fail, but the RGWs were not installed.

With inventory file (short names):
--------------------
[mons]
magna006

[mgrs]
magna006

[osds]
magna064 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore" dmcrypt="true"
magna111 dedicated_devices="['/dev/sdb', '/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="bluestore" dmcrypt="true"
magna117 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore"

[rgws]
magna053
magna061
--------------------

Cluster was up and the RGWs were installed.

Ansible logs are kept in magna002:/home/sshreeka/ansible_logs
(In reply to Harish NV Rao from comment #13)
> (In reply to Gregory Meno from comment #9)
> > This regression was caught by the downstream OSP automation tests.
> > The required setup to trigger it is install or upgrade ceph on a cluster
> > using FQDN option.
> >
> > any re-run of ceph-ansible (v3.1.3) on an existing cluster configured to
> > "used_fqdn" is going to break the ceph.conf on rgw node
>
> Gregory are you referring to the options 'mon_use_fqdn' and 'mds_use_fqdn'?
> If yes, then they are not supported in 3.1 as per bz
> https://bugzilla.redhat.com/show_bug.cgi?id=1613155.
>
> We have hit the original issue with and without FQDN mentioned for more than
> one RGWs in the inventory file.

^^ above is w.r.t RHEL 7.5
OK, this latest build (3.1.5) should address c18. Here's the plan we discussed this morning:

We'd like Giulio to run it through the OSP automation that caught this as a blocker, AND Ceph QE will check that they cannot reproduce the error. THEN we'll produce an RC and move forward with 3.1.

cheers,
G
3.1.5 passed; thanks a lot for caring and for the special effort!
Ceph QE's tests for this fix have passed.
Based on comments 28 and 29, moving this BZ to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819
This happens in OSP13 as well.
(In reply to Donny Davis from comment #33)
> This happens in OSP13 as well.

Donny, are you seeing the issue with ceph-ansible-3.1.5-1.el7cp or an older version?
I have the latest available package for OSP13 installed: ceph-ansible-3.1.3-1.el7cp.noarch.

I patched the offending file with https://github.com/ceph/ceph-ansible/blob/4ce11a84938bb5377f422f01dbf3477bd0f607a9/roles/ceph-config/templates/ceph.conf.j2 and all seems to be well.

It also seems to have corrected another issue I was going to raise: RGW not working from the OSP Dashboard (Horizon), which would just throw an error before.
The bug was fixed in 3.1.5; you should enable the Ceph Tools repos to get the newer version installed instead of the version included in the OSP repos.
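For RHEL-based deployments, enabling the Tools repo and pulling in the fixed package is typically done with subscription-manager; the repo id below is the usual one for RHCS 3 on RHEL 7, but treat it as an assumption and confirm it against your own entitlements:

--------------------
# List available repos and confirm the exact Ceph Tools repo id for your subscription
subscription-manager repos --list | grep -i ceph

# Enable the (assumed) RHCS 3 Tools repo, then update ceph-ansible to 3.1.5 or later
subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms
yum update ceph-ansible
--------------------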