Bug 1373137

Summary: NetworkManager brings down ethX configured using iBFT over software iSCSI
Product: Red Hat Enterprise Linux 6
Reporter: Nilesh Javali <nilesh.javali>
Component: NetworkManager
Assignee: Lubomir Rintel <lrintel>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: high
Version: 6.8
CC: aloughla, andrew.vasquez, atragler, bgalvani, cdupuis, girisha.davanageri, GR-Linux-NIC-Dev, jkachuck, joseph.szczypek, karen.skweres, linda.knippers, lrintel, mleitner, nigel.croxon, nilesh.javali, rkhan, shyam.sundar, sukulkar, thaller, tom.vaden, trinh.dao, Yuval.Mintz
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-05 15:06:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1383344, 1425546, 1438054
Attachments:
Description | Flags
Different NetworkManager logs in single user mode and boot failure serial logs | none
NetworkManager logs | none
nmlogs | none
HPE screenshot | none
HPE messages | none
HPE nm | none
New_RHEL_NM_Logs.zip | none
logs after changing the macadrres to lowercase | none

Description Nilesh Javali 2016-09-05 10:38:32 UTC
Created attachment 1197848 [details]
Different NetworkManager logs in single user mode and boot failure serial logs

Description of problem:
The RHEL 6.8 installation succeeds on an iSCSI boot device over pure software iSCSI (no hardware assist or offload), but the installed OS fails to boot: after the root switch, NetworkManager brings the ethX interface down, which causes iSCSI connection loss.

Version-Release number of selected component (if applicable):

RHEL 6.8 GA

How reproducible:

Consistently

Steps to Reproduce:
1. Install RHEL 6.8 GA on iSCSI boot device over pure software iSCSI 
2. Installation is successful
3. Boot the installed OS and observe that the iSCSI connection is lost and the OS fails to boot.

Actual results:
The OS fails to boot.

Expected results:
The OS should boot successfully

Additional info:
1. The OS boots in single user mode, where issuing command "service NetworkManager restart" causes network connectivity loss.
2. The NetworkManager seems to do some configuration change which causes an internal reload even though ifcfg-eth0 clearly states “NM_CONTROLLED=no”.

Comment 1 Yuval Mintz 2016-09-05 10:56:03 UTC
> 2. The NetworkManager seems to do some configuration change which causes an 
> internal reload even though ifcfg-eth0 clearly states “NM_CONTROLLED=no”.

bnx2x has several driver flows that would cause internal-reload [Changing MTU, disabling LRO, etc.]. Those need to be prevented in BFS scenarios, as they would cause filesystem to disconnect.
So just to focus the effort - the issue here is not why we're seeing the internal-reload of eth0, but rather why can't we prevent the network manager from claiming ownership over eth0.

Comment 4 Marvell Linux NIC Driver 2016-09-08 21:38:20 UTC
Any update on this?

Comment 5 Nilesh Javali 2016-09-09 11:03:46 UTC
Aniss, please update Priority to high. We are unable to edit it somehow.

Comment 6 Chad Dupuis (Cavium) 2016-09-09 12:41:12 UTC
*** Bug 1372411 has been marked as a duplicate of this bug. ***

Comment 7 Trinh Dao 2016-09-29 18:51:30 UTC
RH, any update on this?

Comment 8 sushil kulkarni 2016-10-03 13:33:40 UTC
The team has been tied up with the 7.3 release. Please stand by; we will provide feedback this week.

Comment 9 Beniamino Galvani 2016-10-05 12:37:17 UTC
Hi,

I installed RHEL 6.8 on an iSCSI root using iBFT, rebooted and didn't
experience any issue, so I guess the problem is hardware-dependent.

I see that the /etc/sysconfig/network-scripts/ifcfg-eth0 file
contains:

 DEVICE="eth0"
 BOOTPROTO="ibft"
 NM_CONTROLLED="no"
 ...

With such a configuration, the connection will be handled by the ibft NM
plugin, which will fetch interface parameters from the firmware
table. The ibft plugin doesn't handle the NM_CONTROLLED key in ifcfg
files and thus every time NM starts it will reconfigure the interface.

In normal conditions this doesn't cause any problems, as NM only
disconnects the device and reconnects it immediately re-applying the
same parameters set by kernel/ramdisk.

I suspect that in your case the device doesn't come up again after NM
reconfigures it. To confirm this, it would be useful to have more
logs. Could you please attach the output of the following command:

 service NetworkManager stop; /usr/sbin/NetworkManager --no-daemon --log-level=DEBUG

and then the 'dmesg' output?


A possible workaround is to tell NM that eth0 should never be managed
by adding a section in /etc/NetworkManager/NetworkManager.conf containing:

[keyfile]
unmanaged-devices=mac:xx:xx:xx:xx:xx:xx

where xx:xx:xx:xx:xx:xx is the MAC address of eth0. After a reboot of
the system, NM should ignore eth0 altogether.

Comment 10 Nilesh Javali 2016-10-07 05:32:59 UTC
We have asked our testing QA to collect the data. Will update once debug data is available.

Comment 11 Nilesh Javali 2016-10-19 12:55:00 UTC
We were able to reproduce the issue and boot in single user mode to collect the needed data. Unfortunately, the dmesg does not capture the output of "service NetworkManager stop; /usr/sbin/NetworkManager --no-daemon --log-level=DEBUG". Is there any other place where the logs get captured. I could see the logs on the screen but unable to capture them.

The workaround of adding the section to /etc/NetworkManager/NetworkManager.conf also did not work. The system still fails to boot.

Comment 12 Beniamino Galvani 2016-10-25 07:30:15 UTC
(In reply to Nilesh Javali from comment #11)
> We were able to reproduce the issue and boot in single user mode to collect
> the needed data. Unfortunately, the dmesg does not capture the output of
> "service NetworkManager stop; /usr/sbin/NetworkManager --no-daemon
> --log-level=DEBUG". Is there any other place where the logs get captured. I
> could see the logs on the screen but unable to capture them.

You can redirect logs to a file in the following way:

# /usr/sbin/NetworkManager --no-daemon --log-level=DEBUG 2> /root/nm.log

or, if you prefer, add the following snippet to
/etc/NetworkManager/NetworkManager.conf:

[logging]
level=DEBUG

and then run:

# service NetworkManager restart

Logs will be saved to /var/log/messages. Thanks.
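
Since /var/log/messages also collects output from other services, it may help to pull out only the NetworkManager lines before attaching them. The following is a minimal sketch, not part of the original instructions: the function name nm_log_lines and the output path in the usage comment are hypothetical, and it assumes syslog lines are tagged as "NetworkManager[PID]".

```shell
# Hypothetical helper: print only the lines logged by NetworkManager.
# Assumes the syslog tag looks like "NetworkManager[5189]:".
# The file to read defaults to /var/log/messages.
nm_log_lines() {
    grep 'NetworkManager\[' "${1:-/var/log/messages}"
}

# Example usage (the output path is only an illustration):
#   nm_log_lines /var/log/messages > /root/nm-messages.log
```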

Comment 13 Nilesh Javali 2016-10-25 10:54:13 UTC
Created attachment 1213864 [details]
NetworkManager logs

Comment 14 Beniamino Galvani 2016-10-25 13:55:21 UTC
Unfortunately the new logs aren't useful as NM can't start in single mode due
to missing dependent services.

Can you please try to boot in normal (non-single) mode (which should
fail), then boot in single mode and attach the fragment of
/var/log/messages related to previous boot?

Also, you can do an interactive boot [1]: press ESC and when you see
services starting, press 'i' to enter interactive mode; at that point,
press 'n' when the system asks whether NetworkManager should be
started, or 'y' for all other services.

[1] https://access.redhat.com/solutions/25950

After the system has booted, start NM with:

 /usr/sbin/NetworkManager --no-daemon --log-level=DEBUG 2>&1 | tee /root/nm.log

to send the output to both console and a file. This should catch what
NM is doing and why network connectivity is disrupted. Thanks!

Comment 15 Trinh Dao 2016-11-21 20:45:42 UTC
RH, any new update on this bug? 
RHEL6.9 Alpha is releasing next month.

Comment 16 sushil kulkarni 2016-11-21 20:55:51 UTC
Hi Trinh,

Would it be possible to give us the requested information by following the steps provided in comment#14?

Thanks!
Sushil

Comment 17 Trinh Dao 2016-11-21 21:01:22 UTC
yes, I am asking our HPE engineer for info.

Comment 18 Trinh Dao 2016-11-28 16:30:04 UTC
Girisha, can you please answer comment 14?

Comment 19 sushil kulkarni 2016-11-28 16:48:34 UTC
Hi, thanks for this. We are coming very close to the end of the lifecycle of the release, and it would be helpful if you could get us this info as soon as possible.

Thanks!
Sushil

Comment 20 Girisha Davanageri 2016-11-28 18:32:23 UTC
Created attachment 1225453 [details]
nmlogs

Please find the requested logs attached.

Comment 21 Trinh Dao 2016-11-28 20:41:53 UTC
Created attachment 1225475 [details]
HPE screenshot

Comment 22 Trinh Dao 2016-11-28 20:42:23 UTC
Created attachment 1225476 [details]
HPE messages

Comment 23 Trinh Dao 2016-11-28 20:42:49 UTC
Created attachment 1225477 [details]
HPE nm

Comment 24 Trinh Dao 2016-11-28 20:43:55 UTC
Girisha, thank you for adding the files.

Comment 25 Beniamino Galvani 2016-11-29 13:13:36 UTC
The activation of the DHCP connection fails because NM can't create a
configuration file for dhclient, apparently because the root file
system is mounted read-only:

<warn> (eth0): error creating dhclient configuration: Failed to create file '/var/run/nm-dhclient-eth0.conf.ES0ERY': Read-only file system

I can't tell the reason why the filesystem is read-only, which
probably does not depend on NM; anyway it seems that the workaround
suggested in comment 9 was not in place or not effective.

Since iBFT provides configuration for both the interfaces eth0 and ibft0:

 NetworkManager[5189]:    ibft: read connection 'iBFT ibft0'
 NetworkManager[5189]:    ibft: read connection 'iBFT eth0'

please ensure that /etc/NetworkManager/NetworkManager.conf contains

 [keyfile]
 unmanaged-devices=mac:xx:xx:xx:xx:xx:xx;mac:yy:yy:yy:yy:yy:yy

where xx:xx:xx:xx:xx:xx and yy:yy:yy:yy:yy:yy are the MAC addresses of
eth0 and ibft0. If possible, please provide the content of
NetworkManager.conf, the output of 'ip link' and the logs with the
workaround in place. Thanks.
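
One way to avoid typing the MAC addresses by hand is to read them from sysfs and print the stanza. This is only a sketch under assumptions: the function name print_keyfile_stanza and the SYSFS_NET variable are hypothetical, and it assumes each interface exposes its address in /sys/class/net/<iface>/address. The output still has to be copied into NetworkManager.conf manually.

```shell
# Hypothetical helper: print a [keyfile] stanza that marks the given
# interfaces as unmanaged, using the MAC address reported by sysfs.
# SYSFS_NET is overridable only so the function can be tried against
# a fake directory tree; it defaults to the real sysfs location.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

print_keyfile_stanza() {
    entries=""
    for iface in "$@"; do
        addr_file="$SYSFS_NET/$iface/address"
        if [ ! -r "$addr_file" ]; then
            echo "cannot read $addr_file" >&2
            return 1
        fi
        mac=$(cat "$addr_file")
        # Join entries with ';' as in the example above.
        entries="${entries:+$entries;}mac:$mac"
    done
    printf '[keyfile]\nunmanaged-devices=%s\n' "$entries"
}

# Example usage:
#   print_keyfile_stanza eth0 ibft0
```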

Comment 26 Trinh Dao 2016-11-29 17:47:45 UTC
Created attachment 1225942 [details]
New_RHEL_NM_Logs.zip

Comment 27 Beniamino Galvani 2016-11-29 22:55:19 UTC
 unmanaged-devices=mac:5C:B9:01:C5:50:C0;mac:5C:B9:01:C5:50:C4

Ok, this seems correct, with the exception that MAC addresses must be lower-case; I didn't notice in the NetworkManager.conf man page there was this constraint.

Could you please try again with:

 unmanaged-devices=mac:5c:b9:01:c5:50:c0;mac:5c:b9:01:c5:50:c4

This time it should work!
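
To rule out case mistakes when the line is written by hand, the addresses can be lower-cased mechanically. A minimal sketch, assuming a POSIX shell; the function name unmanaged_devices_line is hypothetical and not something NetworkManager provides:

```shell
# Hypothetical helper: lower-case each MAC address given as an argument
# and join them into an unmanaged-devices line for the [keyfile] section.
unmanaged_devices_line() {
    line="unmanaged-devices="
    sep=""
    for mac in "$@"; do
        # Only the hex digits A-F can be upper-case in a MAC address.
        lower=$(printf '%s' "$mac" | tr 'A-F' 'a-f')
        line="${line}${sep}mac:${lower}"
        sep=";"
    done
    printf '%s\n' "$line"
}

unmanaged_devices_line 5C:B9:01:C5:50:C0 5C:B9:01:C5:50:C4
# prints: unmanaged-devices=mac:5c:b9:01:c5:50:c0;mac:5c:b9:01:c5:50:c4
```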

Comment 28 Trinh Dao 2016-12-09 17:58:46 UTC
Girisha,
Can you run the test from comment 27?

Comment 29 Trinh Dao 2017-01-03 21:05:01 UTC
Girisha, I need an answer for comment 27.

Comment 30 Trinh Dao 2017-01-09 21:30:37 UTC
Girisha, update please?

Comment 31 Trinh Dao 2017-01-19 18:19:29 UTC
(In reply to Beniamino Galvani from comment #27)
>  unmanaged-devices=mac:5C:B9:01:C5:50:C0;mac:5C:B9:01:C5:50:C4
> 
> Ok, this seems correct, with the exception that MAC addresses must be
> lower-case; I didn't notice in the NetworkManager.conf man page there was
> this constraint.
> 
> Could you please try again with:
> 
>  unmanaged-devices=mac:5c:b9:01:c5:50:c0;mac:5c:b9:01:c5:50:c4
> 
> This time it should work!

Girisha's comment: After changing the MAC addresses from upper case to lower case, the OS boots successfully with the workaround in place.

Comment 32 Girisha Davanageri 2017-01-19 18:26:32 UTC
Created attachment 1242543 [details]
logs after changing the macadrres to lowercase

Please find the attached logs after changing the MAC addresses to lowercase.

Comment 33 Trinh Dao 2017-02-01 17:32:35 UTC
RH, any new update on this bug?

Comment 34 Joseph Kachuck 2017-02-16 19:38:38 UTC
Hello,
Due to where we are in the RHEL 6.9 release cycle, this will not make RHEL 6.9. It is now requested for RHEL 6.10.

Thank You
Joe Kachuck

Comment 36 Joseph Kachuck 2017-03-29 16:06:20 UTC
Hello,
This will not be fixed in RHEL 6.10. Please confirm whether you would like the workaround in comment 27 documented in a kbase article.

Thank You
Joe Kachuck

Comment 37 Joseph Kachuck 2017-04-05 15:06:17 UTC
Hello,
RHEL 6 has entered Phase 3. In phase 3 only Critical impact Security Advisories and selected Urgent Priority Bug Fix Advisories will be accepted.
https://access.redhat.com/support/policy/updates/errata

At present this BZ does not meet these requirements, so I am closing it as WONTFIX.

Please reopen if this fix is required for RHEL 6. If so please also provide a justification for this fix.

Thank You
Joe Kachuck