Bug 1540881

Summary: [CEE/SD] monitor_interface with "-" in the name fails with "msg": "'dict object' has no attribute u'ansible_bond-monitor-interface'"
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tomas Petr <tpetr>
Component: Ceph-AnsibleAssignee: Rishabh Dave <ridave>
Status: CLOSED ERRATA QA Contact: Vasishta <vashastr>
Severity: medium Docs Contact: John Brier <jbrier>
Priority: medium    
Version: 3.0CC: adeza, agunn, anharris, aschoen, assingh, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, jbrier, mamccoma, nthomas, ridave, sankarshan, sdudhgao, shan, tchandra, tpetr, tserlin
Target Milestone: z2   
Target Release: 3.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.9-1.el7cp Ubuntu: ceph-ansible_3.2.9-2redhat1 Doc Type: Bug Fix
Doc Text:
.Ceph Ansible no longer fails if network interface names include dashes
When `ceph-ansible` makes an inventory of network interfaces, if they have a dash (`-`) in the name, the inventory must convert the dashes to underscores (`_`) in order to use them. In some cases this conversion did not occur and Ceph installation failed. With this update to {product}, all dashes in the names of network interfaces are converted in the facts, and installation completes successfully.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-30 15:56:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656    

Description Tomas Petr 2018-02-01 09:19:54 UTC
Description of problem:
We have set the network interface
bond-monitor-interface as monitor_interface / public network.

setting in all.yml
monitor_interface: bond-monitor-interface
public_network: 192.168.1.0/24
cluster_network: 192.168.2.0/28

The Ansible deploy fails with:
fatal: [mons-0]: FAILED! => {"msg": "'dict object' has no attribute u'ansible_bond-monitor-interface'"}

looking at the output of
ansible all -i mons-0 -m setup -c local > file.txt
        "ansible_bond_monitor_interface": {   <------------
            "active": true, 
            "device": "bond-monitor-interface",    <------------
            "features": {
                     .....
            }, 
            "hw_timestamp_filters": [], 
            "ipv4": {
                "address": "192.168.1.2", 
                "broadcast": "192.168.1.255", 
                "netmask": "255.255.255.0", 
                "network": "192.168.1.0"
            }, 
            "ipv6": [
                ...
            ], 
            "lacp_rate": "fast", 
            "macaddress": "aa:bb:cc:dd:ee:ff", 
            "miimon": "0", 
            "mode": "802.3ad", 
            "mtu": 9000, 
            "promisc": false, 
            "slaves": [
                "eth0", 
                "eth1"
            ], 
            "speed": 20000, 
            "timestamping": [
                "rx_software", 
                "software"
            ], 
            "type": "bonding"
        }, 
        ....
        "ansible_interfaces": [
            "lo", 
            "bond-monitor-interface", 
            "eth0", 
            "eth1"
        ],


Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch
ansible-2.4.2.0-2.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. create a network interface with "-" in the name
2. set it as monitor_interface in all.yml
3. deploy the ceph cluster with ceph-ansible
4. watch it fail

Actual results:
If there is a "-" in the interface name, Ansible changes it to "_" in the fact name, so the interface cannot be found:
fatal: [mons-0]: FAILED! => {"msg": "'dict object' has no attribute u'ansible_bond-monitor-interface'"}
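The mismatch can be sketched in plain Python (the facts dict below is a minimal illustration, not the full setup output): Ansible stores the per-interface fact under a key with dashes converted to underscores, while `ansible_interfaces` keeps the original device name, so a lookup built from the raw interface name misses.

```python
# Illustrative subset of what Ansible's setup module gathers: the fact key
# has dashes converted to underscores, but the device name (and the entry
# in ansible_interfaces) keeps the dash.
facts = {
    "ansible_bond_monitor_interface": {"active": True,
                                       "device": "bond-monitor-interface"},
    "ansible_interfaces": ["lo", "bond-monitor-interface", "eth0", "eth1"],
}

monitor_interface = "bond-monitor-interface"

# The failing lookup: 'ansible_' + monitor_interface keeps the dash.
broken_key = "ansible_" + monitor_interface
# The fix: mirror Ansible's dash-to-underscore conversion before the lookup.
fixed_key = "ansible_" + monitor_interface.replace("-", "_")

print(broken_key in facts)  # False -> "'dict object' has no attribute ..."
print(fixed_key in facts)   # True
```

This is exactly what the `replace('-', '_')` Jinja2 filter in the eventual fix does before indexing into hostvars.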

Expected results:
The interface is properly recognized.

Additional info:
Unsure whether this is a ceph-ansible or an Ansible problem.

Comment 3 Sébastien Han 2018-02-01 09:32:30 UTC
The fix is already upstream. This will be in 3.1.

Comment 5 Servesha 2019-01-17 04:49:47 UTC
Hello Sebastian,

I tried to reproduce the issue in my lab environment with ceph-ansible version 3.2.
I hit the below error during deployment:

TASK [ceph-validate : fail if br-ex is not active on servesha-ceph-test2] *********************************************
task path: /usr/share/ceph-ansible/roles/ceph-validate/tasks/check_eth_mon.yml:8
Tuesday 15 January 2019  04:47:00 -0500 (0:00:00.077)       0:00:21.305 ******* 
META: noop
META: noop
fatal: [servesha-ceph-test2]: FAILED! => {
    "msg": "The conditional check 'not hostvars[inventory_hostname]['ansible_' + monitor_interface]['active']' failed. The error was: error while evaluating conditional (not hostvars[inventory_hostname]['ansible_' + monitor_interface]['active']): 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_br-ex'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-validate/tasks/check_eth_mon.yml': line 8, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: \"fail if {{ monitor_interface }} is not active on {{ inventory_hostname }}\"\n  ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes.  Always quote template expression brackets when they\nstart a value. For instance:\n\n    with_items:\n      - {{ foo }}\n\nShould be written as:\n\n    with_items:\n      - \"{{ foo }}\"\n"
}

# rpm -qa | grep ansible
ansible-2.6.11-1.el7ae.noarch
ceph-ansible-3.2.0-1.el7cp.noarch


The bug was expected to be fixed in version 3.1 but it's still there in version 3.2.


Regards,
Servesha

Comment 6 Sébastien Han 2019-01-21 09:13:10 UTC
Indeed, it appears that's still a problem. Rishabh please look into this when you have a moment. Thanks.

Comment 7 Tomas Petr 2019-02-18 17:31:16 UTC
(In reply to Servesha from comment #5)
> [...]

So the task that causes this failure is the ceph-validate task
TASK [ceph-validate : fail if br-ex is not active on servesha-ceph-test2]
in
./roles/ceph-validate/tasks/check_eth_mon.yml
and
./roles/ceph-validate/tasks/check_eth_rgw.yml


This was added in the ceph-ansible 3.2 beta:
https://github.com/ceph/ceph-ansible/commit/235d1b3f557dcd9164d392050382398e1cda7084#diff-3f1cf80769de29dc34cad67d08a71ee9


I think this can be fixed by applying the same change as in the fix for the original issue:
https://github.com/ceph/ceph-ansible/pull/2078/files


Like this (and the same for check_eth_rgw.yml):
-----------
# cat ./roles/ceph-validate/tasks/check_eth_mon.yml
---
- name: "fail if {{ monitor_interface }} does not exist on {{ inventory_hostname }}"
  fail:
    msg: "{{ monitor_interface }} does not exist on {{ inventory_hostname }}"
  when:
    - monitor_interface not in ansible_interfaces

- name: "fail if {{ monitor_interface }} is not active on {{ inventory_hostname }}"
  fail:
    msg: "{{ monitor_interface }} is not active on {{ inventory_hostname }}"
  when:
    - not hostvars[inventory_hostname]['ansible_' + (monitor_interface | replace('-', '_'))]['active']

- name: "fail if {{ monitor_interface }} does not have any ip v4 address on {{ inventory_hostname }}"
  fail:
    msg: "{{ monitor_interface }} does not have any IPv4 address on {{ inventory_hostname }}"
  when:
    - ip_version == "ipv4"
    - hostvars[inventory_hostname]['ansible_' + (monitor_interface | replace('-', '_'))]['ipv4'] is not defined

- name: "fail if {{ monitor_interface }} does not have any ip v6 address on {{ inventory_hostname }}"
  fail:
    msg: "{{ monitor_interface }} does not have any IPv6 address on {{ inventory_hostname }}"
  when:
    - ip_version == "ipv6"
    - hostvars[inventory_hostname]['ansible_' + (monitor_interface | replace('-', '_'))]['ipv6'] is not defined

-----------

Seb, can you confirm my thoughts?

Comment 8 Tomas Petr 2019-02-18 17:37:56 UTC
BTW, all I did was replace
['ansible_' + monitor_interface]
with
['ansible_' + (monitor_interface | replace('-', '_'))]

Comment 11 Servesha 2019-03-06 15:52:39 UTC
Hello,

I made the changes mentioned upstream (in ./roles/ceph-validate/tasks/check_eth_mon.yml). After the changes, the playbook still fails at the same task (TASK [ceph-validate : fail if br-ex is not active on servesha-ceph-test2]) as it did previously.

Best regards,
Servesha

Comment 13 Servesha 2019-03-12 10:08:05 UTC
Hello gabrioux,

I have a bridge created on node ssd1. ssd2 is my admin node and also a mon, mgr, and osd. ssd3 is purely an osd node.
I have made the changes in /usr/share/ceph-ansible/roles/ceph-validate/tasks/check_eth_mon.yml as mentioned upstream.

Expected result: the error occurs while deploying the monitor on the node that has the br-ex network interface.

Here are the details of my containerized cluster.

10.74.253.96 ssd2 - admin
10.74.254.21 ssd3
10.74.250.60 ssd1


ssd1 won't be accessible over ssh since it has br-ex.

Best regards,
Servesha

Comment 14 Dimitri Savineau 2019-03-13 18:24:42 UTC
@Servesha

The patch is working fine; however, the task is failing because your br-ex interface is down (active=false).
Could you try to set the interface up and rerun ceph-ansible?

$ ip link set br-ex up

Comment 15 Servesha 2019-03-18 14:18:30 UTC
Hello Dimitri,

Yeah sure. I will try and rerun the playbook.

Regards,
Servesha

Comment 16 Dimitri Savineau 2019-03-26 19:33:36 UTC
Any update on this?

Comment 17 Servesha 2019-03-28 10:46:01 UTC
Hello Dimitri , 

I was using that environment for testing a different task, so it is not available now. But since the patch is working, I have asked the customer to test the workaround and let us know the results. The case is now "WOC".

Thank you 

Best Regards,
Servesha

Comment 23 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

Comment 25 Rishabh Dave 2019-05-02 14:58:40 UTC
Hi, sorry for the late reply. I've made a slight change in the last sentence: I replaced "converted in the inventory" with "converted in the facts", since that is more accurate.