Bug 1498755

Summary: Kernel changes in 7.4 prevent bringing up an InfiniBand port
Product: Red Hat Enterprise Linux 7
Reporter: Vladimir Benes <vbenes>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Docs Contact:
Priority: high
Version: 7.4
CC: atragler, bgalvani, fgiudici, igkioka, lrintel, rkhan, sukulkar, thaller, toneata
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Flags: thaller: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: NetworkManager-1.8.0-11
Doc Type: Bug Fix
Doc Text:
Previously, when an Infiniband device was disconnected, the kernel in some cases reported a problem that the interface was removed. As a consequence, the NetworkManager's internal state of this device became corrupted. This bug has been fixed, and the NetworkManager service works as expected to provide consistent information about the state of the device.
Story Points: ---
Clone Of:
: 1499282
Environment:
Last Closed: 2018-04-10 13:29:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1499282

Description Vladimir Benes 2017-10-05 07:29:57 UTC
Description of problem:
I can bring up the InfiniBand device mlx5_ib0 but not any of its ports (e.g. mlx5_ib0.8002). This worked correctly in 7.3, and it still works with the latest NM combined with the 7.3 kernel (514).

Version-Release number of selected component (if applicable):
kernel-3.10.0-693.2.1.el7.x86_64
NetworkManager-1.8.0-10.el7_4.x86_64

How reproducible:
always

Steps to Reproduce:
1. Try to bring up an InfiniBand port
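
The step above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual test: the profile name, the 45-second wait, and the helper names `partition_commands` and `run` are assumptions made for illustration; only the device name mlx5_ib0 and the partition 8002 come from this report. It defaults to a dry run that only prints the nmcli commands.

```python
import shlex
import subprocess


def partition_commands(parent, p_key, con_name=None):
    """Build nmcli argv lists that create and activate an IPoIB partition profile.

    The partition interface is named <parent>.<p_key in hex>, e.g. mlx5_ib0.8002.
    """
    ifname = f"{parent}.{p_key:04x}"
    con_name = con_name or ifname  # hypothetical default: reuse the interface name
    add = [
        "nmcli", "connection", "add", "type", "infiniband",
        "con-name", con_name, "ifname", ifname,
        "parent", parent, "p-key", f"0x{p_key:04x}",
    ]
    # --wait bounds how long nmcli waits for the activation to finish
    # (45 seconds is an arbitrary value chosen for this sketch).
    up = ["nmcli", "--wait", "45", "connection", "up", "id", con_name]
    return [add, up]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(partition_commands("mlx5_ib0", 0x8002))
```

Calling `run(..., dry_run=False)` on an affected system would be expected to end with nmcli timing out on the `connection up` step, matching the actual results below.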

Actual results:
The device appears to be connected, but NM times out and reports that the device is still down.

Expected results:
Everything should work as before: the connection comes up, the device is connected, etc.

Additional info:

Comment 1 Beniamino Galvani 2017-10-05 09:44:15 UTC
There are multiple causes of the failures:

 - with the RHEL 7.4 kernel, NM sometimes fails to delete infiniband
   partitions. It is not clear yet if this is a kernel bug. In NM logs
   there are messages like: "Failed to remove InfiniBand P_Key
   interface 'inf_ib0.8002': unspecified". This alone does not cause
   a test failure.

 - there is a bug in NM that disables all property notifications for a
   device when the situation above happens; as a consequence clients
   have a wrong view of the device state and don't detect the
   successful activation. This is fixed upstream in commit [1].
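
To illustrate how the second point can produce the symptoms in this report, here is a toy model. It is not taken from NM's source: the `Device` class, its method names, and the freeze/thaw mechanism are assumptions chosen to show one way a failed removal could leave property notifications disabled, so that clients never see a later successful activation.

```python
class Device:
    """Toy object with freeze/thaw of property notifications (hypothetical)."""

    def __init__(self):
        self._freeze_count = 0
        self._pending = []
        self.state = "disconnected"
        self.seen = []  # state changes that clients were actually told about

    def freeze_notify(self):
        self._freeze_count += 1

    def thaw_notify(self):
        self._freeze_count -= 1
        if self._freeze_count == 0:
            # Deliver everything that was queued while frozen.
            self.seen.extend(self._pending)
            self._pending.clear()

    def set_state(self, state):
        self.state = state
        if self._freeze_count:
            self._pending.append(state)  # queued, not delivered
        else:
            self.seen.append(state)

    def unrealize_buggy(self, delete_link):
        self.freeze_notify()
        if not delete_link():
            return False  # early return: notifications stay frozen forever
        self.set_state("unmanaged")
        self.thaw_notify()
        return True

    def unrealize_fixed(self, delete_link):
        self.freeze_notify()
        try:
            if not delete_link():
                return False
            self.set_state("unmanaged")
            return True
        finally:
            self.thaw_notify()  # always balanced, even on failure


buggy = Device()
buggy.unrealize_buggy(lambda: False)  # partition removal fails
buggy.set_state("activated")          # later activation succeeds internally
print(buggy.state, buggy.seen)        # clients were told nothing

fixed = Device()
fixed.unrealize_fixed(lambda: False)
fixed.set_state("activated")
print(fixed.state, fixed.seen)
```

In the buggy variant the internal state is correct but no client is ever notified, which matches a client timing out while the device is in fact connected.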

Currently I'm investigating the first point; anyway I think that the
NM commit should be backported to 7.4 z-stream.

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=24a7f88bc56b66745c1e6b9444df8a80125de059

Comment 8 errata-xmlrpc 2018-04-10 13:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778