Bug 1622505
Summary: | Playbook for cluster setup with multiple RGW instances fails. | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shilpa <smanjara>
Component: | Ceph-Ansible | Assignee: | Guillaume Abrioux <gabrioux>
Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs>
Severity: | urgent | Docs Contact: |
Priority: | high | |
Version: | 3.1 | CC: | agunn, aschoen, ceph-eng-bugs, dondavis, flucifre, gfidente, gmeno, hnallurv, kdreyer, nobody+410372, nthomas, sankarshan, shan, tchandra, tserlin, vakulkar
Target Milestone: | rc | Keywords: | Automation
Target Release: | 3.1 | Flags: | vakulkar: automate_bug+
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | RHEL: ceph-ansible-3.1.5-1.el7cp; Ubuntu: ceph-ansible_3.1.5-2redhat1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-09-26 18:24:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1578730, 1640093 | |
Description
shilpa 2018-08-27 11:06:51 UTC
Thomas, no, this is not fixed yet.

Fixed in v3.1.4.

This regression was caught by the downstream OSP automation tests. The required setup to trigger it is to install or upgrade Ceph on a cluster using the FQDN option.

Any re-run of ceph-ansible (v3.1.3) on an existing cluster configured with "used_fqdn" is going to break the ceph.conf on the RGW node. The effect is that the ceph.conf on an RGW node will have missing fields.

It is not in stable-3.0; it was introduced in August. The fix is already released as v3.1.4.

Do I understand correctly that ceph-ansible 3.1.4 is a newer version than is included in 3.1?

Best,
-F

(In reply to Gregory Meno from comment #9)
> This regression was caught by the downstream OSP automation tests.
> The required setup to trigger it is install or upgrade ceph on a cluster
> using FQDN option.
>
> any re-run of ceph-ansible (v3.1.3) on an existing cluster configured to
> "used_fqdn" is going to break the ceph.conf on rgw node

Gregory, are you referring to the options 'mon_use_fqdn' and 'mds_use_fqdn'? If yes, then they are not supported in 3.1 as per bz https://bugzilla.redhat.com/show_bug.cgi?id=1613155.

We have hit the original issue with and without FQDNs mentioned for more than one RGW in the inventory file.

Just to clarify a bit: as Harish mentioned in c13, these options are no longer supported, so we had to keep backward compatibility for existing clusters. The commit that was supposed to provide this backward compatibility missed something and introduced the current bug, which has finally been fixed in v3.1.4.

We tried the following scenarios with ceph-ansible version ceph-ansible 3.1.3-2redhat1 on Ubuntu.

With inventory file (full names):
---------------------
[mons]
magna006.ceph.redhat.com

[mgrs]
magna006.ceph.redhat.com

[osds]
magna064.ceph.redhat.com devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore" dmcrypt="true"
magna111.ceph.redhat.com dedicated_devices="['/dev/sdb', '/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="bluestore" dmcrypt="true"
magna117.ceph.redhat.com devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore"

[rgws]
magna053.ceph.redhat.com
magna061.ceph.redhat.com
---------------------
Cluster was up and the playbook did not fail, but the RGWs were not installed.

With inventory file (short names):
--------------------
[mons]
magna006

[mgrs]
magna006

[osds]
magna064 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore" dmcrypt="true"
magna111 dedicated_devices="['/dev/sdb', '/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="bluestore" dmcrypt="true"
magna117 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated" osd_objectstore="bluestore"

[rgws]
magna053
magna061
--------------------
Cluster was up and the RGWs were installed.

Ansible logs are kept in magna002:/home/sshreeka/ansible_logs.
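A minimal way to spot the symptom on a gateway host is sketched below (illustrative only, not from the original report; the exact keys rendered depend on the ceph-ansible version and group_vars, and the path assumes the default cluster name "ceph"):
--------------------
# On an RGW node, inspect the rendered RGW client section of ceph.conf:
grep -A 5 'client.rgw' /etc/ceph/ceph.conf

# A fully rendered section normally carries entries such as host, keyring,
# log file and "rgw frontends"; with the regression described in this bug
# the section comes out with fields missing, or the RGWs are not installed.
--------------------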
(In reply to Harish NV Rao from comment #13)
> (In reply to Gregory Meno from comment #9)
> > This regression was caught by the downstream OSP automation tests.
> > The required setup to trigger it is install or upgrade ceph on a cluster
> > using FQDN option.
> >
> > any re-run of ceph-ansible (v3.1.3) on an existing cluster configured to
> > "used_fqdn" is going to break the ceph.conf on rgw node
>
> Gregory are you referring to the options 'mon_use_fqdn' and 'mds_use_fqdn'?
> If yes, then they are not supported in 3.1 as per bz
> https://bugzilla.redhat.com/show_bug.cgi?id=1613155.
>
> We have hit the original issue with and without FQDN mentioned for more than
> one RGWs in the inventory file.

^^ The above is w.r.t. RHEL 7.5.

OK, this latest build (3.1.5) should address c18. Here's the plan we discussed this morning: we'd like Giulio to run it through the OSP automation that caught this as a blocker, AND Ceph QE will check that they cannot reproduce the error. THEN we'll produce an RC and move forward with 3.1.
Cheers, G

3.1.5 passed; thanks a lot for caring and for the special effort!

Ceph QE's tests for this fix have passed.

Based on comments 28 and 29, moving this BZ to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

This happens in OSP13 as well.

(In reply to Donny Davis from comment #33)
> This happens in OSP13 as well.

Donny, are you seeing the issue with ceph-ansible-3.1.5-1.el7cp or an older version?

I have the latest available package for OSP13 installed: ceph-ansible-3.1.3-1.el7cp.noarch. I patched the offending file with https://github.com/ceph/ceph-ansible/blob/4ce11a84938bb5377f422f01dbf3477bd0f607a9/roles/ceph-config/templates/ceph.conf.j2 and all seems to be well. It also seems to have corrected another issue I was going to raise, which is RGW not working from the OSP Dashboard (Horizon); it would just throw an error before.

The bug was fixed in 3.1.5; you should enable the Ceph Tools repos to get the newer version installed instead of the version included in the OSP repos.
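A sketch of that suggested update path follows; the repository ID is an assumption for RHEL 7 with Red Hat Ceph Storage 3 and should be verified against your subscription, and on director-driven OSP deployments the repositories may be managed differently:
--------------------
# Enable the RHCS 3 Tools repository (repo id assumed; verify before use):
subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms

# Pull in the fixed build (3.1.5 or later, per "Fixed In Version" above):
yum update ceph-ansible
--------------------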