Bug 2089868

Summary: network: the controller device is not completely cleaned up in the bond tests.
Product: Red Hat Enterprise Linux 8
Reporter: Noriko Hosoi <nhosoi>
Component: rhel-system-roles
Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA
QA Contact: Jon Trossbach <jtrossba>
Severity: unspecified
Priority: unspecified
Version: 8.7
CC: djez, jtrossba, nhosoi, rmeggins, spetrosi, wenliang, zhguan
Target Milestone: rc
Keywords: Triaged
Target Release: 8.7
Hardware: Unspecified
OS: Unspecified
Whiteboard: role:network
Fixed In Version: rhel-system-roles-1.19.1-1.el8
Doc Type: No Doc Update
Last Closed: 2022-11-08 09:41:25 UTC
Type: Bug
Bug Blocks: 2089872

Description Noriko Hosoi 2022-05-24 15:38:53 UTC
Description of problem:
If the network test playbooks are run serially on the same VM, tests_bond_removal_initscripts.yml and tests_bond_removal_nm.yml fail with:
Error: Connection activation failed:
No suitable device found for this connection

Complete failed output from tests_bond_removal_initscripts.yml:

TASK [rhel-system-roles.network : Configure networking connection profiles] ****
task path: /usr/share/ansible/roles/rhel-system-roles.network/tasks/main.yml:75
Wednesday 11 May 2022  11:50:36 -0400 (0:00:00.520)       0:00:16.656 *********
"'
[WARNING]: [012] <error> #0, state:up persistent_state:present, 'bond0': call 'ifup bond0' failed with exit status 1
fatal: [/tmp/tmp.Qg8GGc0RZE/image.qcow2.snap]: FAILED! => {
    "_invocation": {
        "module_args": {
            "__debug_flags": "",
            "__header": "#
            # Ansible managed
            #",
            "connections": [
                {
                    "bond": {
                        "miimon": 110,
                        "mode": "active-backup"
                    },
                    "interface_name": "nm-bond",
                    "name": "bond0",
                    "state": "up",
                    "type": "bond"
                },
                {
                    "controller": "bond0",
                    "interface_name": "test1",
                    "name": "bond0.0",
                    "state": "up",
                    "type": "ethernet"
                },
                {
                    "controller": "bond0",
                    "interface_name": "test2",
                    "name": "bond0.1",
                    "state": "up",
                    "type": "ethernet"
                }
            ],
            "force_state_change": false,
            "ignore_errors": false,
            "provider": "initscripts"
        }
    },
    "changed": true
}

STDERR:
[007] <info>  #0, state:up persistent_state:present, 'bond0': add ifcfg-rh profile 'bond0'
[008] <info>  #1, state:up persistent_state:present, 'bond0.0': add ifcfg-rh profile 'bond0.0'
[009] <info>  #2, state:up persistent_state:present, 'bond0.1': add ifcfg-rh profile 'bond0.1'
[010] <info>  #0, state:up persistent_state:present, 'bond0': up connection bond0 (not-active)
[011] <info>  #0, state:up persistent_state:present, 'bond0': call 'ifup bond0': rc=1, out='b'WARN      : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-ib0 for master nm-bond.

Determining IP information for nm-bond... failed.
'', err='b"WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Error: Connection activation failed: No suitable device found for this connection (device eth0 not available because profile is not compatible with device (mismatching interface name)).
dhclient(23010) is already running - exiting. 

This version of ISC DHCP is based on the release available
on ftp.isc.org. Features have been added and other changes
have been made to the base software release in order to make
it work better with this distribution.

Please report issues with this software via: 
https://bugzilla.redhat.com/

exiting.
"'
[012] <error> #0, state:up persistent_state:present, 'bond0': call 'ifup bond0' failed with exit status 1

MSG:
error: call 'ifup bond0' failed with exit status 1
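
The role's own log lines in the STDERR section follow a regular shape (sequence number, level, connection index, states, profile name, message), so the failing step can be pulled out of a long run mechanically. The following is a minimal sketch under that assumption; the pattern is inferred only from the lines shown in this report, and the function name error_entries is made up for illustration.

```python
import re

# Pattern inferred from the role log lines quoted in this report, e.g.
# [012] <error> #0, state:up persistent_state:present, 'bond0': call ...
LOG_LINE = re.compile(
    r"^\[(?P<seq>\d+)\] <(?P<level>\w+)>\s+#(?P<idx>\d+), "
    r"state:(?P<state>\S+) persistent_state:(?P<persistent>\S+), "
    r"'(?P<profile>[^']+)': (?P<message>.*)$"
)


def error_entries(stderr_text):
    """Return (sequence number, profile, message) for every <error> line."""
    found = []
    for line in stderr_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "error":
            found.append(
                (int(match.group("seq")), match.group("profile"), match.group("message"))
            )
    return found


# Two lines copied from the STDERR section of this report.
sample = """\
[010] <info>  #0, state:up persistent_state:present, 'bond0': up connection bond0 (not-active)
[012] <error> #0, state:up persistent_state:present, 'bond0': call 'ifup bond0' failed with exit status 1
"""

for seq, profile, message in error_entries(sample):
    print(seq, profile, message)
```

Lines that do not match the pattern (the ifup warnings, the dhclient banner) are skipped rather than treated as errors.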


How reproducible: always


How to Reproduce: Run the bond playbooks tests_bond_*.yml serially on the same VM.
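
One way to confirm that a previous bond test left devices behind is to inspect the output of `ip -o link show` between playbook runs. The sketch below is only an illustration, not the fix shipped in the role: the device names nm-bond, test1 and test2 are taken from the module_args above, the helper name leftover_devices is made up, and the sample output is shortened and hypothetical.

```python
import re

# Interface names taken from the module_args in the failure output. Treating
# these as the complete set of devices a bond test may leave behind is an
# assumption made for this sketch.
TEST_DEVICES = ("nm-bond", "test1", "test2")


def leftover_devices(ip_link_output, expected=TEST_DEVICES):
    """Return the expected test devices still listed in `ip -o link show` output."""
    present = set()
    for line in ip_link_output.splitlines():
        # Each line starts with "<index>: <name>:"; veth names carry an "@peer" suffix.
        match = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if match:
            present.add(match.group(1))
    return sorted(present & set(expected))


# Hypothetical, shortened `ip -o link show` output with a stale bond device.
sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
7: nm-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 state UP
"""

print(leftover_devices(sample))
```

A non-empty result before the next playbook starts would indicate that the earlier test did not clean up completely.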

Comment 5 Jon Trossbach 2022-06-14 19:46:11 UTC
Looks good.

Using http://brew-task-repos.usersys.redhat.com/repos/scratch/rmeggins/rhel-system-roles/1.19.0/1.el8/noarch/rhel-system-roles-1.19.0-1.el8.noarch.rpm

# uname -r
4.18.0-400.el8.x86_64
# rpm -q rhel-system-roles
rhel-system-roles-1.19.0-1.el8.noarch

# export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_nm.yml
# export ANSIBLE_VER=2.9
:: [ 10:15:57 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 10:47:34 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 10:55:24 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_5_0_GA (Expected 0, got 0)
:: [ 09:45:54 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_6_TESTING (Expected 0, got 0)
:: [ 11:38:53 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 11:40:54 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_0_TESTING (Expected 0, got 0)
:: [ 11:42:43 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_1_TESTING (Expected 0, got 0)

# export ANSIBLE_VER=2.12
:: [ 13:23:31 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 13:22:02 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 13:19:56 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_5_0_GA (Expected 0, got 0)
:: [ 13:15:15 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_6_TESTING (Expected 0, got 0)
:: [ 11:51:08 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 11:48:41 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_9_0_TESTING (Expected 0, got 0)
:: [ 11:45:07 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_9_1_TESTING (Expected 0, got 0)

# export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_initscripts.yml
# export ANSIBLE_VER=2.9
:: [ 14:52:20 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 14:57:10 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 15:01:08 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_5_0_GA (Expected 0, got 0)
:: [ 15:18:36 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_6_TESTING (Expected 0, got 0)
:: [ 15:35:16 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 15:41:29 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_0_TESTING (Expected 0, got 0)
:: [ 15:42:37 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_1_TESTING (Expected 0, got 0)

# export ANSIBLE_VER=2.12
:: [ 13:34:53 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 13:37:40 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 13:40:14 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_5_0_GA (Expected 0, got 0)
:: [ 13:42:46 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_6_TESTING (Expected 0, got 0)
:: [ 13:45:13 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 14:36:15 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_9_0_TESTING (Expected 0, got 0)
:: [ 14:37:30 ] :: [   PASS   ] :: Test network with ANSIBLE-2.12 against RHEL_9_1_TESTING (Expected 0, got 0)

Comment 13 Jon Trossbach 2022-07-29 14:57:29 UTC
The images I am getting now have Ansible 2.13. Unfortunately, this is not ready for primetime.

The 2.9 reversioning is not working as intended at the moment. David Jez and I will fix that and get this out the door. Ansible 2.13 is confirmed not to support RHEL 6 as of now:
https://issues.redhat.com/browse/RHELPLAN-125014

It is entirely possible that RHEL 6 works with 2.9 with this change. However, our test suite needs an update, which will be essential for ensuring escapes don't happen going forward.

Also, the pre-verify results labeled 2.9 are not actually 2.9 results; the test suite was unintentionally being lied to about the Ansible version.

Noting that RHEL 6.10 is failing, the test results below are still pre-verified as working for 2.13.

Comment 14 Jon Trossbach 2022-08-02 02:23:17 UTC
I just realized this one still doesn't seem to be working in general. Is there an RPM you thought this was working for?

Comment 15 Rich Megginson 2022-08-02 13:17:04 UTC
(In reply to Jon Trossbach from comment #14)
> I just realized this one still doesn't seem to be working in general. Is
> there an RPM you thought this was working for?

There is an issue with bond testing on some platforms, e.g. it only happens on certain beaker machines.  We have spent about 2 years, on and off, trying to debug and fix the bond tests, with varying degrees of success.  In other words, we have spent an inordinate amount of engineering resources attempting to fix 4 tests in code that only affects testing.  My suggestion is to ignore these failures, provided that the tests still pass under certain conditions.

As for the future, I'll leave that up to @wenliang and the network team to figure out what they want to do with these tests going forward.  We cannot leave these tests in this state.  Either we need to fix them, once and for all, for all test platforms (beaker, 1mt, local), or we need to skip them if it is simply not possible to fix them.

Comment 16 Jon Trossbach 2022-08-04 02:28:34 UTC
After some good results, we are looking good to go.

dnf install http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/rhel-system-roles/1.19.3/1.el8/noarch/rhel-system-roles-1.19.3-1.el8.noarch.rpm

# uname -r
4.18.0-413.el8.x86_64
# rpm -q rhel-system-roles
rhel-system-roles-1.19.3-1.el8.noarch

# ansible --version
ansible [core 2.13.2]
export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_nm.yml
:: [ 09:20:15 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 09:26:02 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_8_6_0_GA (Expected 0, got 0)
:: [ 09:30:08 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 09:09:10 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_9_0_0_GA (Expected 0, got 0)
:: [ 09:06:57 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_9_1_TESTING (Expected 0, got 0)

export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_initscripts.yml
:: [ 08:40:54 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 08:45:27 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_8_6_0_GA (Expected 0, got 0)
:: [ 08:27:29 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 08:48:03 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_9_0_0_GA (Expected 0, got 0)
:: [ 08:56:21 ] :: [   PASS   ] :: Test network with ANSIBLE-2.13 against RHEL_9_1_TESTING (Expected 0, got 0)

# ansible --version
ansible 2.9.27
export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_nm.yml
:: [ 21:25:28 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 21:35:08 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 21:50:12 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_6_0_GA (Expected 0, got 0)
:: [ 21:55:50 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 21:58:15 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_0_0_GA (Expected 0, got 0)
:: [ 22:00:23 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_1_TESTING (Expected 0, got 0)

export SYSTEM_ROLES_ONLY_TESTS=network/tests_bond_removal_initscripts.yml
:: [ 22:04:08 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_6_10_GA (Expected 0, got 0)
:: [ 22:07:30 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_7_9_GA (Expected 0, got 0)
:: [ 22:15:53 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_6_0_GA (Expected 0, got 0)
:: [ 22:20:27 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_8_7_TESTING (Expected 0, got 0)
:: [ 22:22:19 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_0_0_GA (Expected 0, got 0)
:: [ 22:23:59 ] :: [   PASS   ] :: Test network with ANSIBLE-2.9 against RHEL_9_1_TESTING (Expected 0, got 0)


Putting the machines used here for future reference: netqe16 and netqe10 for 2.13 and 2.9, respectively. Rich, I suspect any machine in the netqe lab might work because they are all bare metal.

Comment 17 Wen Liang 2022-08-22 12:37:43 UTC
(In reply to Rich Megginson from comment #15)
> (In reply to Jon Trossbach from comment #14)
> > I just realized this one still doesn't seem to be working in general. Is
> > there an RPM you thought this was working for?
> 
> There is an issue with bond testing on some platforms, e.g. it only happens on
> certain beaker machines.  We have spent about 2 years, on and off, trying to
> debug and fix the bond tests, with varying degrees of success.  In other
> words, we have spent an inordinate amount of engineering resources
> attempting to fix 4 tests in code that only affects testing.  My suggestion
> is to ignore these failures, provided that the tests still pass under
> certain conditions.
> 
> As for the future, I'll leave that up to @wenliang and the
> network team to figure out what they want to do with these tests going
> forward.  We cannot leave these tests in this state.  Either we need to fix
> them, once and for all, for all test platforms (beaker, 1mt, local), or we
> need to skip them if it is simply not possible to fix them.

Got it. Thanks for making it clear.

Comment 19 errata-xmlrpc 2022-11-08 09:41:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rhel-system-roles bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:7568