Bug 1600227 - ceph-ansible failed to deploy with FAILED! => {"msg": "'dict object' has no attribute u'ansible_ens3f0'"}
Summary: ceph-ansible failed to deploy with FAILED! => {"msg": "'dict object' has no attribute u'ansible_ens3f0'"}
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 2.5
Hardware: Unspecified
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 3.2
Assignee: Guillaume Abrioux
QA Contact: Tiffany Nguyen
URL:
Whiteboard:
Duplicates: 1643403
Depends On:
Blocks:
 
Reported: 2018-07-11 18:15 UTC by Tiffany Nguyen
Modified: 2019-01-03 19:01 UTC
CC List: 15 users

Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.rc5.el7cp Ubuntu: ceph-ansible_3.2.0~rc5-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-03 19:01:24 UTC
Embargoed:
vakulkar: automate_bug?


Attachments
ansible-playbook full log (450.52 KB, text/plain)
2018-11-28 17:07 UTC, Tiffany Nguyen
no flags


Links
* Github ceph ceph-ansible pull 2915: validate: add checks for interfaces (closed, last updated 2020-02-12 13:02:20 UTC)
* Github ceph ceph-ansible pull 3380: validate: change default value for `radosgw_address` (closed, last updated 2020-02-12 13:02:22 UTC)
* Red Hat Product Errata RHBA-2019:0020 (last updated 2019-01-03 19:01:39 UTC)

Description Tiffany Nguyen 2018-07-11 18:15:57 UTC
Description of problem:


Version-Release number of selected component (if applicable):
* RHCS 2.5: Ceph 10.2.10-17.el7cp (9865b1b203321435cc7128257833dca28bd779aa)

How reproducible:
1. Purge the scale setup after running into this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1599842

2. Deploy Ceph using ceph-ansible:
   # ansible-playbook -vv -i hosts site.yml

Actual results:
  Ceph failed to deploy. All the OSD nodes fail with the error:
  fatal: [c07-h21-6048r.rdu.openstack.engineering.redhat.com]: FAILED! => {"msg": "'dict object' has no attribute u'ansible_ens3f0'"}
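
The attribute in the failure message is an Ansible fact name: ceph-ansible resolves addresses such as the radosgw address by looking up the 'ansible_<interface>' fact in hostvars, and on a node where the interface does not exist that fact is never gathered, so the Jinja2 lookup fails with the generic "'dict object' has no attribute" message. A minimal sketch that triggers the same class of error (hypothetical playbook, not part of ceph-ansible; assumes an 'osds' inventory group):

    # repro_missing_iface.yml -- illustration only, not ceph-ansible code
    - hosts: osds
      gather_facts: true
      tasks:
        # Mirrors the kind of lookup ceph-ansible performs; on a host
        # without ens3f0 the fact is absent and the task fails with an
        # undefined-variable error of the same form as above.
        - name: resolve an address from an interface fact
          debug:
            msg: "{{ hostvars[inventory_hostname]['ansible_ens3f0']['ipv4']['address'] }}"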

Additional info:
* ansible full log: https://paste.fedoraproject.org/paste/mqMStUdfm7W1nNczjSyT0Q

* ifconfig from OSD node:
enp131s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.70.140  netmask 255.255.0.0  broadcast 172.18.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa50  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:50  txqueuelen 1000  (Ethernet)
        RX packets 164  bytes 9840 (9.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 42  bytes 2700 (2.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp131s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.19.70.140  netmask 255.255.0.0  broadcast 172.19.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa51  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:51  txqueuelen 1000  (Ethernet)
        RX packets 164  bytes 9840 (9.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 42  bytes 2700 (2.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp131s0f0.103: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.22.70.140  netmask 255.255.0.0  broadcast 172.22.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa50  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:50  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp131s0f0.200: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.26.70.140  netmask 255.252.0.0  broadcast 172.27.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa50  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:50  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp131s0f1.104: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.23.70.140  netmask 255.255.0.0  broadcast 172.23.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa51  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:51  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp131s0f1.200: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.27.70.140  netmask 255.252.0.0  broadcast 172.27.255.255
        inet6 fe80::ae1f:6bff:fe2d:aa51  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:2d:aa:51  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.12.70.140  netmask 255.255.254.0  broadcast 10.12.71.255
        inet6 2620:52:0:c46:ec4:7aff:fe6f:32e8  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::ec4:7aff:fe6f:32e8  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:6f:32:e8  txqueuelen 1000  (Ethernet)
        RX packets 1800  bytes 356341 (347.9 KiB)
        RX errors 0  dropped 24  overruns 0  frame 0
        TX packets 578  bytes 114166 (111.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.70.140  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::ec4:7aff:fe19:6a18  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:18  txqueuelen 1000  (Ethernet)
        RX packets 7978  bytes 559166 (546.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6083  bytes 601362 (587.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.70.140  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::ec4:7aff:fe19:6a19  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:19  txqueuelen 1000  (Ethernet)
        RX packets 164  bytes 9840 (9.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 42  bytes 2700 (2.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f0.101: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.70.140  netmask 255.255.0.0  broadcast 172.20.255.255
        inet6 fe80::ec4:7aff:fe19:6a18  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:18  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f0.200: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.24.70.140  netmask 255.252.0.0  broadcast 172.27.255.255
        inet6 fe80::ec4:7aff:fe19:6a18  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:18  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f1.102: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.21.70.140  netmask 255.255.0.0  broadcast 172.21.255.255
        inet6 fe80::ec4:7aff:fe19:6a19  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:19  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens3f1.200: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.25.70.140  netmask 255.252.0.0  broadcast 172.27.255.255
        inet6 fe80::ec4:7aff:fe19:6a19  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:19:6a:19  txqueuelen 1000  (Ethernet)
        RX packets 49  bytes 2254 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 900 (900.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6  bytes 318 (318.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 318 (318.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Comment 3 Vasu Kulkarni 2018-07-11 22:28:07 UTC
Andrew suggested running without the rgw section, and that worked. Further debugging showed that, out of the 13 rgw nodes, one node didn't have the right interface. The error message is a bit misleading because it makes it look like none of the nodes have the interface; we could fix the error message here for a better experience.


c08-h22-r630.rdu.openstack.engineering.redhat.com | SUCCESS => {
    "ansible_facts": {}, 
    "changed": false
}
c07-h25-6048r.rdu.openstack.engineering.redhat.com | SUCCESS => {
    "ansible_facts": {
        "ansible_ens3f0": {
            "active": true, 
            "device": "ens3f0", 
            "features": {
                "busy_poll": "off [fixed]", 
                "fcoe_mtu": "off [fixed]", 
                "generic_receive_offload": "on", 
                "generic_segmentation_offload": "on", 
                "highdma": "on", 
                "hw_tc_offload": "off [fixed]", 
                "l2_fwd_offload": "off [fixed]", 
                "large_receive_offload": "off [fixed]", 
                "loopback": "off [fixed]", 
                "netns_local": "off [fixed]", 
                "ntuple_filters": "off", 
                "receive_hashing": "on", 
                "rx_all": "off [fixed]", 
                "rx_checksumming": "on", 
                "rx_fcs": "off [fixed]", 
                "rx_udp_tunnel_port_offload": "on", 
                "rx_vlan_filter": "on [fixed]", 
                "rx_vlan_offload": "on", 
                "rx_vlan_stag_filter": "off [fixed]", 
                "rx_vlan_stag_hw_parse": "off [fixed]"

Comment 5 Ken Dreyer (Red Hat) 2018-07-13 20:23:13 UTC
What engineering work remains for this BZ?

Comment 6 Christina Meno 2018-07-20 22:16:32 UTC
What do you think we should say instead of this error message?

Comment 7 seb 2018-07-23 14:04:47 UTC
The error message comes straight out of Ansible.
We could add a safety check to make sure the interface exists on all the specified nodes and fail otherwise.
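
For illustration, such a check could look something like the following task (a sketch only, assuming a per-host radosgw_interface variable as used in this deployment and an 'rgws' inventory group; the actual check merged upstream may differ):

    # sketch of a pre-flight check, not the exact ceph-validate task
    - hosts: rgws
      gather_facts: true
      tasks:
        - name: fail if the configured interface does not exist on the node
          fail:
            msg: "{{ radosgw_interface }} does not exist on {{ inventory_hostname }}"
          # the 'ansible_<interface>' fact only exists when the interface
          # was actually found during fact gathering
          when: hostvars[inventory_hostname]['ansible_' + radosgw_interface] is not defined

Failing here, before any Ceph configuration is touched, would point directly at the misconfigured host instead of surfacing the misleading template error later.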

Setting priority to low; in the end this remains a configuration issue, not ceph-ansible's fault.

Comment 9 seb 2018-07-25 13:13:10 UTC
The patch is merged upstream, so this will be in RHCS 3.1; targeting it back to RHCS 3.1, unless QE can't ack it.

Comment 11 seb 2018-07-26 12:58:33 UTC
Right, I was thinking of something else, but yeah, let's put it in 3.2.
Thanks

Comment 12 Vasu Kulkarni 2018-07-26 15:32:11 UTC
Sorry, the priority should be high; this is a serious issue when setting up large clusters and the error message is misleading. I understand that it's Ansible's fault, but Sebastien, as you said, you can have one level of verification done before our installation starts.

Comment 13 Vasu Kulkarni 2018-07-26 15:33:50 UTC
Can we not do this in 3.1? Why was this moved to 3.2?

Comment 14 seb 2018-07-26 15:49:24 UTC
Sorry Vasu, but as much as I'd love to have this in 3.1, it won't be possible. This patch relies on a feature that is planned for 3.2.

Comment 21 Sébastien Han 2018-10-26 10:03:54 UTC
*** Bug 1643403 has been marked as a duplicate of this bug. ***

Comment 22 Tiffany Nguyen 2018-11-20 21:03:42 UTC
Using ceph-ansible version 3.2.0-0.1.rc3.el7cp, I don't see a safety check to make sure the interface exists on all the specified nodes. The error messages below are still coming from Ansible:

Tuesday 20 November 2018  20:28:49 +0000 (0:00:00.139)       0:08:30.962 ****** 
fatal: [mero005]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}
fatal: [mero006]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}
fatal: [mero007]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_enp136s1'"}

Comment 26 Tiffany Nguyen 2018-11-28 17:06:10 UTC
Steps to reproduce the issue:
1. Configure /etc/ansible/hosts so that one host points at an interface that doesn't exist:
   [rgws]
   mero005 radosgw_interface=enp136s1
   mero006 radosgw_interface=enp136s0
   mero007 radosgw_interface=enp136s0

2. Deploy Ceph using ceph-ansible:
   # ansible-playbook site.yml
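
With an inventory like the one above, the bad entry can also be spotted before a full deploy with a small fact-check play (hypothetical helper, not part of ceph-ansible):

    # check_rgw_interfaces.yml -- hypothetical pre-flight helper
    - hosts: rgws
      gather_facts: true
      tasks:
        # ansible_interfaces is the list of interface names collected by
        # fact gathering; a misconfigured radosgw_interface shows up as
        # MISSING for that host.
        - name: report whether radosgw_interface exists on each host
          debug:
            msg: "{{ radosgw_interface }} is {{ 'present' if radosgw_interface in ansible_interfaces else 'MISSING' }} on {{ inventory_hostname }}"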

Comment 27 Tiffany Nguyen 2018-11-28 17:07:18 UTC
Created attachment 1509578 [details]
ansible-playbook full log

Comment 32 Tiffany Nguyen 2018-12-13 16:45:52 UTC
Verified with the 3.2.0-1.el7cp build. The pre-check message is printed out as expected when the interface doesn't exist:

TASK [ceph-validate : fail if enp136s1 does not exist on mero005] ******************************************************************
Thursday 13 December 2018  16:40:18 +0000 (0:00:00.531)       0:00:36.579 ***** 
fatal: [mero005]: FAILED! => {"changed": false, "msg": "enp136s1 does not exist on mero005"}

Comment 34 errata-xmlrpc 2019-01-03 19:01:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020

