Bug 1450219

Summary: [NMCI] race in bond_rename test
Product: Red Hat Enterprise Linux 7 Reporter: Vladimir Benes <vbenes>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.4 CC: aloughla, atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-04-10 13:22:08 UTC Type: Bug
Attachments:
Description                                                                 Flags
debug log                                                                   none
[PATCH] manager: avoid that auto-activations preempt user activations       none
[PATCH v2] manager: avoid that auto-activations preempt user activations    none

Description Vladimir Benes 2017-05-11 21:24:01 UTC
Description of problem:
    Scenario: NM - bond - device rename
     * Add connection type "bond" named "bond0" for device "bondy"
     * Add slave connection for master "nm-bond" on device "eth1" named "bond0.0"
     * Add slave connection for master "nm-bond" on device "eth2" named "bond0.1"
     * Bring "down" connection "bond0"
     * Open editor for connection "bond0"
     * Set a property named "connection.interface-name" to "nm-bond" in editor
     * Save in editor
     Then Value saved message showed in editor
     * Quit editor
^^ often failing here with:
Error: Connection activation failed: New connection activation was enqueued

     * Bring "up" connection "bond0"
     * Bring "up" connection "bond0.0"
     * Bring "up" connection "bond0.1"
     Then Check bond "nm-bond" link state is "up"


Version-Release number of selected component (if applicable):
I think it has been present since 1.4, through 1.6, and it is still present in 1.8

Comment 1 Vladimir Benes 2017-05-11 21:24:30 UTC
Created attachment 1278036 [details]
debug log

Comment 2 Vladimir Benes 2017-05-11 21:25:12 UTC
The workaround is to add a sleep of a few seconds after
 * Bring "down" connection "bond0"

Comment 3 Beniamino Galvani 2017-06-16 15:51:31 UTC
Created attachment 1288394 [details]
[PATCH] manager: avoid that auto-activations preempt user activations

Comment 4 Thomas Haller 2017-06-16 16:14:40 UTC
+    if (nm_auth_subject_is_internal (nm_active_connection_get_subject (active))) 

if (success &&

why the check for
+    && nm_streq0 (nm_active_connection_get_specific_object (candidate), 
                   nm_active_connection_get_specific_object (active))) 

? It seems that is not necessary? The specific-object is like the path to the WifiAP. Seems to me, it doesn't matter if they differ...



Should the check however consider the state of candidate? E.g. if candidate is already about to disconnect, it seems right to proceed with new activation? Dunno.


But good catch, for this issue!!

Comment 5 Beniamino Galvani 2017-06-18 12:48:30 UTC
Created attachment 1288844 [details]
[PATCH v2] manager: avoid that auto-activations preempt user activations

(In reply to Thomas Haller from comment #4)
> why the check for
> +    && nm_streq0 (nm_active_connection_get_specific_object (candidate), 
>                    nm_active_connection_get_specific_object (active))) 
> 
> ? It seems that is not necessary? The specific-object is like the path to
> the WifiAP. Seems to me, it doesn't matter if they differ...

Good point, fixed.

> Should the check however consider the state of candidate? E.g. if candidate
> is already about to disconnect, it seems right to proceed with new
> activation? Dunno.

Yes, makes sense.
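
To illustrate the logic discussed in this review, here is a minimal, self-contained sketch. It is not the attached patch: the ActiveConn struct, the AcState enum and should_skip_activation() are hypothetical stand-ins for NetworkManager's own types, and matching on connection id plus device name is an assumption made for the example. It only shows the two points agreed on above: the check applies only to internal (automatic) activations, and a candidate that is already disconnecting does not block the new activation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for NetworkManager's types. */
typedef enum {
    AC_STATE_ACTIVATING,
    AC_STATE_ACTIVATED,
    AC_STATE_DEACTIVATING,
    AC_STATE_DEACTIVATED,
} AcState;

typedef struct {
    const char *connection_id; /* e.g. "bond0" */
    const char *device;        /* e.g. "nm-bond" */
    bool        internal;      /* true: auto-activation, false: user request */
    AcState     state;
} ActiveConn;

/* Returns true when the new activation `active` should be dropped because
 * it is an automatic one and an existing activation of the same connection
 * on the same device (an assumption of this sketch) is still in progress
 * or active. */
bool
should_skip_activation (const ActiveConn *active,
                        const ActiveConn *candidates,
                        size_t            n_candidates)
{
    size_t i;

    /* Only internal (automatic) activations are subject to the check;
     * a user request always goes ahead. */
    if (!active->internal)
        return false;

    for (i = 0; i < n_candidates; i++) {
        const ActiveConn *candidate = &candidates[i];

        if (strcmp (candidate->connection_id, active->connection_id) != 0)
            continue;
        if (strcmp (candidate->device, active->device) != 0)
            continue;

        /* A candidate that is already on its way out does not block
         * the new activation. */
        if (candidate->state >= AC_STATE_DEACTIVATING)
            continue;

        return true;
    }

    return false;
}
```

In this model, an automatic activation of "bond0" on "nm-bond" is skipped while a previous activation of the same connection is still activating or activated, whereas an activation requested by the user always proceeds.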

Comment 6 Thomas Haller 2017-06-19 08:49:22 UTC

+    if (nm_auth_subject_is_internal (nm_active_connection_get_subject (active))) 

if (success && ...

Comment 7 Beniamino Galvani 2017-06-19 14:10:49 UTC
(In reply to Thomas Haller from comment #6)
> 
> +    if (nm_auth_subject_is_internal (nm_active_connection_get_subject
> (active))) 
> 
> if (success && ...

Oops, fixed.

Applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0922a177385be188b9c9c8ad39c1068533f5a4b3

and nm-1-8:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-8&id=2236c3c728c49d2ebd68e83f1096b5180b2f41dd


After this fix, the following CI test should work reliably without
the extra delay:

https://github.com/NetworkManager/NetworkManager-ci/blob/82dd537b29b5652dc269ef89ca229098877d9100/nmcli/features/bond.feature#L1267

Comment 9 Vladimir Benes 2017-12-06 08:23:11 UTC
A new version of the test for 1.8.1 was introduced without the delay after bringing the bond down:
     # VVV Workaround for rhbz1450219
     * Wait for at least "2" seconds

Comment 12 errata-xmlrpc 2018-04-10 13:22:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778