Bug 1600227
| Summary: | ceph-ansible failed to deploy with FAILED! => {"msg": "'dict object' has no attribute u'ansible_ens3f0'"} | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tiffany Nguyen <tunguyen> | ||||
| Component: | Ceph-Ansible | Assignee: | Guillaume Abrioux <gabrioux> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Tiffany Nguyen <tunguyen> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 2.5 | CC: | anharris, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, kdreyer, mmuench, nthomas, sankarshan, seb, tnielsen, tserlin, tunguyen, vakulkar | ||||
| Target Milestone: | rc | Flags: | vakulkar: automate_bug? | ||||
| Target Release: | 3.2 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | RHEL: ceph-ansible-3.2.0-0.1.rc5.el7cp Ubuntu: ceph-ansible_3.2.0~rc5-2redhat1 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-01-03 19:01:24 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Description
Tiffany Nguyen
2018-07-11 18:15:57 UTC
Andrew suggested running without the rgw section, and that worked. Further debugging showed that, out of 13 rgw nodes, one node didn't have the right interface. The error message is a bit misleading: it is thrown for every node, as if none of them had the interface. We could fix the error message here for a better experience.
c08-h22-r630.rdu.openstack.engineering.redhat.com | SUCCESS => {
"ansible_facts": {},
"changed": false
}
c07-h25-6048r.rdu.openstack.engineering.redhat.com | SUCCESS => {
"ansible_facts": {
"ansible_ens3f0": {
"active": true,
"device": "ens3f0",
"features": {
"busy_poll": "off [fixed]",
"fcoe_mtu": "off [fixed]",
"generic_receive_offload": "on",
"generic_segmentation_offload": "on",
"highdma": "on",
"hw_tc_offload": "off [fixed]",
"l2_fwd_offload": "off [fixed]",
"large_receive_offload": "off [fixed]",
"loopback": "off [fixed]",
"netns_local": "off [fixed]",
"ntuple_filters": "off",
"receive_hashing": "on",
"rx_all": "off [fixed]",
"rx_checksumming": "on",
"rx_fcs": "off [fixed]",
"rx_udp_tunnel_port_offload": "on",
"rx_vlan_filter": "on [fixed]",
"rx_vlan_offload": "on",
"rx_vlan_stag_filter": "off [fixed]",
"rx_vlan_stag_hw_parse": "off [fixed]"
What engineering work remains for this BZ? What do you think we should say instead of this error message?

The error message comes straight out of Ansible. We could add a safety check to make sure the interface exists on all the specified nodes and fail otherwise. Setting priority to low; this remains a configuration issue in the end, not ceph-ansible's fault.

The patch is merged upstream, so this will be in RHCS 3.1. Targeting it back to RHCS 3.1, unless QE can't ack it.

Right, I was thinking of something else, but yeah, let's put it in 3.2. Thanks.

Sorry, priority should be high. This is a serious issue when setting up large clusters, and the error message is misleading. I understand that it's Ansible's fault, but Sebastien, as you said, you can have one level of verification done before our installation starts.

Can we not do this in 3.1? Why was this moved to 3.2?

Sorry Vasu, but as much as I'd love to have this for 3.1, this won't be possible. This patch relies on a feature that is planned for 3.2.

*** Bug 1643403 has been marked as a duplicate of this bug. ***

Using ceph-ansible version 3.2.0-0.1.rc3.el7cp, I don't see a safety check to make sure the interface exists on all the specified nodes. The error messages below are still coming from Ansible:
Tuesday 20 November 2018 20:28:49 +0000 (0:00:00.139) 0:08:30.962 ******
fatal: [mero005]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}
fatal: [mero006]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}
fatal: [mero007]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}
Steps to reproduce the issue:
1. Configure /etc/ansible/hosts so that one host uses an interface that doesn't exist:
   [rgws]
   mero005 radosgw_interface=enp136s1
   mero006 radosgw_interface=enp136s0
   mero007 radosgw_interface=enp136s0
2. Deploy ceph using ceph-ansible:
   # ansible-playbook site.yml

Created attachment 1509578: ansible-playbook full log
Verified with the 3.2.0-1.el7cp build. As expected, a pre-check message is printed when the interface doesn't exist:
TASK [ceph-validate : fail if enp136s1 does not exist on mero005] ******************************************************************
Thursday 13 December 2018 16:40:18 +0000 (0:00:00.531) 0:00:36.579 *****
fatal: [mero005]: FAILED! => {"changed": false, "msg": "enp136s1 does not exist on mero005"}
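
The logic of such a pre-check can be illustrated outside Ansible. The following is a minimal Python sketch, not the actual ceph-validate task: the function name `missing_interfaces`, the shape of the `facts` dictionary, and the fact values are assumptions made for illustration. Only the host names, the interface names, and the message wording come from this report. It assumes each host's facts contain an `ansible_<interface>` key for every interface that exists on that host.

```python
from typing import Dict, List


def missing_interfaces(
    configured: Dict[str, str],
    facts: Dict[str, Dict[str, dict]],
) -> List[str]:
    """Return one message per host whose configured interface has no matching fact."""
    errors = []
    for host, interface in sorted(configured.items()):
        # Assumption: an existing interface shows up as an "ansible_<name>" fact.
        fact_key = "ansible_" + interface
        if fact_key not in facts.get(host, {}):
            errors.append(f"{interface} does not exist on {host}")
    return errors


# Inventory values taken from the reproduction steps in this report.
configured = {
    "mero005": "enp136s1",
    "mero006": "enp136s0",
    "mero007": "enp136s0",
}

# Hypothetical gathered facts: every host only has enp136s0.
facts = {host: {"ansible_enp136s0": {"active": True}} for host in configured}

errors = missing_interfaces(configured, facts)
for message in errors:
    print(message)
```

Checking each host against its own facts is what lets the failure name only the misconfigured host, instead of surfacing as a missing-attribute error on every host in the group.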
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0020