Bug 1281301

Summary: NetworkManager infiniband connected mode fails with some adapters
Product: Red Hat Enterprise Linux 7
Reporter: Dominique Martinet <dominique.martinet>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.1
CC: aloughla, bgalvani, dcbw, dominique.martinet, lrintel, rkhan, thaller, vbenes
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 19:20:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1301628, 1313485
Attachments:
[PATCH] device/infiniband: take interface down to set transport mode (flags: none)

Description Dominique Martinet 2015-11-12 09:26:37 UTC
Description of problem:
Having NetworkManager set connected mode fails on some new adapters (mlx5 driver).


Version-Release number of selected component (if applicable):
NetworkManager-1.0.0-16.git20150121.b4ea599c.el7_1.x86_64

How reproducible:
Always; just set CONNECTED_MODE=yes for such a card.

Actual results:

We get logs like the following, and the interface stays down:
Oct 26 14:21:11 test10 NetworkManager[2054]: <info>  NetworkManager state is now CONNECTING
Oct 26 14:21:11 test10 kernel: mlx5_0p1.8012: interface is up, cannot change mode
Oct 26 14:21:11 test10 NetworkManager[2054]: <error> [1445865671.383903] [platform/nm-linux-platform.c:2133] sysctl_set(): sysctl: failed to set '/sys/class/net/mlx5_0p1.8012/mode' to 'datagram': (22) Invalid argument
Oct 26 14:21:11 test10 NetworkManager[2054]: <info>  (mlx5_0p1.8012): device state change: prepare -> failed (reason 'config-failed') [40 120 4]
Oct 26 14:21:11 test10 NetworkManager[2054]: <info>  NetworkManager state is now CONNECTED_LOCAL
Oct 26 14:21:11 test10 NetworkManager[2054]: <warn>  (mlx5_0p1.8012): Activation: failed for connection 'mlx5_0p1.8012'


Expected results:
The interface comes up

Additional info:
As the kernel message clearly says, the transport mode has to be changed before the interface is brought up.
As a workaround, activation seems to work if the interface is already configured in connected mode beforehand.
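
The condition described above can be checked from sysfs before attempting a mode change. The following is only a minimal sketch, not anything NetworkManager does: the helper names read_sysfs and mode_change_blocked are hypothetical, and it assumes the IPoIB device exposes the "mode" attribute seen in the log plus the standard "flags" attribute under /sys/class/net. The sysfs_root parameter exists only so the logic can be exercised against a scratch directory.

```python
import os

IFF_UP = 0x1  # "administratively up" bit in the interface flags (linux/if.h)


def read_sysfs(iface, attr, sysfs_root="/sys/class/net"):
    """Return the stripped contents of one sysfs attribute of an interface."""
    with open(os.path.join(sysfs_root, iface, attr)) as f:
        return f.read().strip()


def mode_change_blocked(iface, wanted_mode, sysfs_root="/sys/class/net"):
    """Return True when a mode change is needed while the link is still up.

    This is the situation in which the affected adapters reject the write
    with "interface is up, cannot change mode".
    """
    current_mode = read_sysfs(iface, "mode", sysfs_root)
    # The flags attribute is a hexadecimal string such as "0x1003".
    flags = int(read_sysfs(iface, "flags", sysfs_root), 16)
    return current_mode != wanted_mode and bool(flags & IFF_UP)
```

If the interface is already in the wanted mode, the function returns False, which matches the observation that preconfiguring connected mode works around the failure.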

Comment 2 Dan Williams 2016-01-04 23:26:02 UTC
Yeah, nm-device-infiniband.c::act_stage1_prepare() should probably take the device down, set the mode, and bring it back up.
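
The down / set mode / up sequence suggested here can be sketched outside NetworkManager as follows. This is an illustration under stated assumptions, not the code from the attached patch: set_link and set_transport_mode are hypothetical names, the link state is changed with the iproute2 command "ip link set dev IFACE down|up", and the run and sysfs_root parameters are only there so the sequence can be tried without real hardware. Root privileges would be needed on a real system.

```python
import os
import subprocess


def set_link(iface, state, run=subprocess.run):
    """Bring the interface administratively 'up' or 'down' via iproute2."""
    run(["ip", "link", "set", "dev", iface, state], check=True)


def set_transport_mode(iface, mode, sysfs_root="/sys/class/net",
                       run=subprocess.run):
    """Take the link down, write the IPoIB transport mode, bring it back up."""
    if mode not in ("datagram", "connected"):
        raise ValueError("unknown IPoIB transport mode: %r" % (mode,))
    set_link(iface, "down", run)
    try:
        # A trailing newline is written because some kernels compare the
        # value against "connected\n" / "datagram\n" literally.
        with open(os.path.join(sysfs_root, iface, "mode"), "w") as f:
            f.write(mode + "\n")
    finally:
        # Bring the link back up even if the write failed, so the device
        # is not left down by this helper.
        set_link(iface, "up", run)
```

For example, set_transport_mode("mlx5_0p1.8012", "connected") would run the two ip commands around the sysfs write. Whether to bring the link back up after a failed write is a design choice of this sketch only.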

Comment 3 Beniamino Galvani 2016-01-05 15:02:19 UTC
Created attachment 1111844
[PATCH] device/infiniband: take interface down to set transport mode

Hi Dominique,

The attached patch should fix the issue; however, I was not able to
test it, since the hardware I have available accepts a mode change
even when the interface is up. Could you please grab the packages at:

 http://people.redhat.com/~bgalvani/NM/rh1281301/

upgrade them with "rpm -Fvh *.rpm" (so that only the packages you
have already installed will be upgraded), restart NM, and check whether
this solves the problem? Thanks!

Comment 4 Dominique Martinet 2016-01-05 15:45:24 UTC
Hi,

I can confirm this works for me: I brought an interface down, changed its mode to datagram, and restarted NetworkManager after updating the packages; the interface came back up in connected mode.

Thank you,
-- 
Dominique Martinet

Comment 5 Dan Williams 2016-01-05 17:11:02 UTC
(In reply to Beniamino Galvani from comment #3)

LGTM

Comment 8 Vladimir Benes 2016-09-14 20:58:01 UTC
The InfiniBand transport mode is now changed after the device is taken down. This should prevent such failures on any device.

Comment 10 errata-xmlrpc 2016-11-03 19:20:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html