Bug 1450219

Summary:

[NMCI] race in bond_rename test

Product:

Red Hat Enterprise Linux 7

Reporter:

Vladimir Benes <vbenes>

Component:

NetworkManager

Assignee:

Beniamino Galvani <bgalvani>

Status:

CLOSED ERRATA

QA Contact:

Desktop QE <desktop-qa-list>

Severity:

medium

Priority:

medium

Version:

7.4

CC:

aloughla, atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2018-04-10 13:22:08 UTC

Type:

Bug

Attachments:

Description	Flags
debug log	none
[PATCH] manager: avoid that auto-activations preempt user activations	none
[PATCH v2] manager: avoid that auto-activations preempt user activations	none

Description Vladimir Benes 2017-05-11 21:24:01 UTC

Description of problem:
    Scenario: NM - bond - device rename
     * Add connection type "bond" named "bond0" for device "bondy"
     * Add slave connection for master "nm-bond" on device "eth1" named "bond0.0"
     * Add slave connection for master "nm-bond" on device "eth2" named "bond0.1"
     * Bring "down" connection "bond0"
     * Open editor for connection "bond0"
     * Set a property named "connection.interface-name" to "nm-bond" in editor
     * Save in editor
     Then Value saved message showed in editor
     * Quit editor
^^ often failing here with:
Error: Connection activation failed: New connection activation was enqueued

     * Bring "up" connection "bond0"
     * Bring "up" connection "bond0.0"
     * Bring "up" connection "bond0.1"
     Then Check bond "nm-bond" link state is "up"


Version-Release number of selected component (if applicable):
I think this has been present since 1.4, through 1.6, and is still present in 1.8.

Comment 1 Vladimir Benes 2017-05-11 21:24:30 UTC

Created attachment 1278036 [details]
debug log

Comment 2 Vladimir Benes 2017-05-11 21:25:12 UTC

The workaround is to add a sleep of a few seconds after
 * Bring "down" connection "bond0"
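
A fixed sleep like this is often replaced in tests by polling for a condition with a timeout. The sketch below only illustrates that general pattern: `wait_until` and the predicate passed to it are hypothetical names, not part of NetworkManager-ci, and which condition would be the right one to poll for in this scenario is left open as an assumption.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Hypothetical helper for illustration; returns True if the predicate
    succeeded in time and False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so a condition that already holds returns immediately.
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The caller supplies the predicate, for example a function that inspects the device state, so the helper itself makes no assumption about NetworkManager.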

Comment 3 Beniamino Galvani 2017-06-16 15:51:31 UTC

Created attachment 1288394 [details]
[PATCH] manager: avoid that auto-activations preempt user activations

Comment 4 Thomas Haller 2017-06-16 16:14:40 UTC

+    if (nm_auth_subject_is_internal (nm_active_connection_get_subject (active))) 

if (success &&

why the check for
+    && nm_streq0 (nm_active_connection_get_specific_object (candidate), 
                   nm_active_connection_get_specific_object (active))) 

? It seems that is not necessary? The specific-object is like the path to the WifiAP. Seems to me, it doesn't matter if they differ...



Should the check however consider the state of candidate? E.g. if candidate is already about to disconnect, it seems right to proceed with new activation? Dunno.


But good catch on this issue!

Comment 5 Beniamino Galvani 2017-06-18 12:48:30 UTC

Created attachment 1288844 [details]
[PATCH v2] manager: avoid that auto-activations preempt user activations

(In reply to Thomas Haller from comment #4)
> why the check for
> +    && nm_streq0 (nm_active_connection_get_specific_object (candidate), 
>                    nm_active_connection_get_specific_object (active))) 
> 
> ? It seems that is not necessary? The specific-object is like the path to
> the WifiAP. Seems to me, it doesn't matter if they differ...

Good point, fixed.

> Should the check however consider the state of candidate? E.g. if candidate
> is already about to disconnect, it seems right to proceed with new
> activation? Dunno.

Yes, makes sense.
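
The check discussed in this review can be modelled roughly as follows. This is a Python sketch for illustration only, not the NetworkManager C code: the `Activation` type, its fields, and `should_proceed` are hypothetical, and matching a candidate by connection and device is an assumption, since the thread does not show that part of the patch. It takes three points from the discussion: the check applies only to internal (auto) activations, the specific object is not compared, and a candidate that is already on its way down does not block the new activation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class State(Enum):
    ACTIVATING = "activating"
    ACTIVATED = "activated"
    DEACTIVATING = "deactivating"


@dataclass(frozen=True)
class Activation:
    # Hypothetical stand-in for an active connection object.
    connection: str
    device: str
    internal: bool          # True for an auto-activation
    state: State
    specific_object: Optional[str] = None  # deliberately not compared below


def should_proceed(success: bool,
                   active: Activation,
                   others: Iterable[Activation]) -> bool:
    """Decide whether `active` should go ahead (illustrative model only)."""
    if not success:
        # Nothing to decide if the earlier step already failed.
        return False
    if not active.internal:
        # User-requested activations are never dropped by this check.
        return True
    for candidate in others:
        if candidate is active:
            continue
        # Assumption: "equivalent" means same connection on the same device.
        same_target = (candidate.connection == active.connection
                       and candidate.device == active.device)
        # A candidate that is about to disconnect does not block us.
        still_wanted = candidate.state in (State.ACTIVATING, State.ACTIVATED)
        if same_target and still_wanted:
            return False
    return True
```

In this model, an auto-activation of "bond0" yields to a user activation of the same connection that is still activating, but proceeds if that candidate is already deactivating.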

Comment 6 Thomas Haller 2017-06-19 08:49:22 UTC


+    if (nm_auth_subject_is_internal (nm_active_connection_get_subject (active))) 

if (success && ...

Comment 7 Beniamino Galvani 2017-06-19 14:10:49 UTC

(In reply to Thomas Haller from comment #6)
> 
> +    if (nm_auth_subject_is_internal (nm_active_connection_get_subject
> (active))) 
> 
> if (success && ...

Oops, fixed.

Applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0922a177385be188b9c9c8ad39c1068533f5a4b3

and nm-1-8:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-8&id=2236c3c728c49d2ebd68e83f1096b5180b2f41dd


After this fix, the following CI test should work reliably without
the extra delay:

https://github.com/NetworkManager/NetworkManager-ci/blob/82dd537b29b5652dc269ef89ca229098877d9100/nmcli/features/bond.feature#L1267

Comment 9 Vladimir Benes 2017-12-06 08:23:11 UTC

A new version of the test for 1.8.1 was introduced without the delay after the bond down step:
     # VVV Workaround for rhbz1450219
     * Wait for at least "2" seconds

Comment 12 errata-xmlrpc 2018-04-10 13:22:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778