Bug 1378910

Summary: HPESD RHEL7.3-SN4- FCoE multipath BFS fails to boot after installation
Product: Red Hat Enterprise Linux 7 Reporter: RAVI <ravi.adabala>
Component: dracut Assignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED ERRATA QA Contact: Release Test Team <release-test-team-automation>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.3 CC: abeausol, andrew.vasquez, arun.patil, cdupuis, cleech, dinesh.surpur, dracut-maint-list, emilne, jkachuck, jstodola, karen.skweres, lnykryn, mknutson, nagaraj-sangappa.davanakatti, nilesh.bhoi, phinchman, ravi.adabala, revers, shyam.sundar, trinh.dao, vishnu.kumar, vivek.kumar, william.gens, xhe
Target Milestone: alpha   
Target Release: 7.5   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: dracut-033-520.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1482185 (view as bug list) Environment:
Last Closed: 2018-04-10 18:07:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1438583, 1445812, 1465137, 1482185, 1522983    
Attachments:
Description                 Flags
Boot logs                   none
Installation screen         none
Python errors on console    none

Description RAVI 2016-09-23 13:41:49 UTC
Created attachment 1204160 [details]
Boot logs

Steps to Reproduce:
-------------------------
1. Install and configure both ports of the Banjo (HP) adapter for FCoE BFS on RHEL 7.3 SN4.
2. Install RHEL 7.3 on the mapped LUN (multipath installation).
3. Once the installation is done, reboot the server.
4. The server fails to boot into the OS.

Snippet of log:
----------------

dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.137266] 
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.143146] 
dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.143294] 
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.149145] 
dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.149293]
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.155171] 
dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.155479] 
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.161106] 
dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.161294] 
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.167173] 
dracut-initqueue[837]: fipvlan: fip_recv: error 88 Socket operation on non-socket[   33.167401] 
dracut-initqueue[837]: fipvlan: fip_recv: packet socket recv error[   33.173109] 

Additional Information:
-----------------------
NA

Frequency:
----------
Always

Expected Results:
-----------------
After installation, the server should boot into the OS.

Setup Details:
-----------------
OS                 : RHEL 7.3 SN4
FCoE Driver Version: bnx2fc – 2.10.3
Adapter       : Banjo - HP (57980)
MFW: 7.13.75

Attachments:
-------------
Boot logs

Comment 1 Chad Dupuis (Cavium) 2016-09-23 13:45:47 UTC
This looks similar to another unresolved BZ for 7.1: https://bugzilla.redhat.com/show_bug.cgi?id=1129574.

Comment 3 Chad Dupuis (Cavium) 2016-09-30 15:33:13 UTC
Chris, any idea why fipvlan would be throwing this error?

Comment 4 Chad Dupuis (Cavium) 2017-06-13 13:02:40 UTC
We observed this again on RHEL 7.4 snap 2. It's possible that this is caused by a small timing window in which fipvlan is tried just before the link is fully up, which could prevent the FIP socket from opening and so produce the error spew.
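
The timing-window hypothesis above could be probed with a small polling helper like the one below. This is only a sketch, not what dracut does: wait_for_carrier is a hypothetical function name, and it assumes the interface state is visible in the sysfs carrier file (/sys/class/net/<netif>/carrier). Link carrier being up does not guarantee that DCB negotiation has finished, so this narrows the window rather than closing it.

```shell
# Hypothetical helper (not part of dracut): poll a carrier file until it
# reads "1" or the retry budget is exhausted.
#   $1 = path to the carrier file, e.g. /sys/class/net/$netif/carrier
#   $2 = maximum number of one-second polls (defaults to 13)
wait_for_carrier() {
    carrier_file=$1
    tries=${2:-13}
    while [ "$tries" -gt 0 ]; do
        # Reading the file can fail while the interface is down;
        # treat a failed read the same as "link not up yet".
        state=$(cat "$carrier_file" 2>/dev/null)
        if [ "$state" = "1" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A caller might run `wait_for_carrier "/sys/class/net/$netif/carrier" 13` before invoking fipvlan and log a warning if it returns non-zero.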

Comment 5 RAVI 2017-07-03 09:30:08 UTC
This is still observed with RHEL 7.4 Snap 5.

Comment 6 Chad Dupuis (Cavium) 2017-07-20 17:45:01 UTC
Looking into this more, this occurs because we don't wait long enough in /usr/lib/dracut/modules.d/95fcoe/fcoe-up.sh. Specifically, this line:

elif [ "$netdriver" = "bnx2x" ]; then
    # If driver is bnx2x, do not use /sys/module/fcoe/parameters/create but fipvlan
    modprobe 8021q
    udevadm settle --timeout=30
    # Sleep for 3 s to allow dcb negotiation
    sleep 3   # <-- *** This line ***
    fipvlan "$netif" -c -s
else

We need to increase this to 13 seconds, as was done upstream: https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=3966a1e1ee0e3d27197258f446f54b683c415208
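
As a rough illustration of the proposed change, the bnx2x branch with the longer wait could look like the sketch below. It is wrapped in a function so it can be read and exercised on its own; the function name bnx2x_fcoe_up and the DCB_WAIT variable are hypothetical and introduced here for illustration only. The real script hard-codes the sleep value inline, and the upstream commit linked above is the authoritative version.

```shell
# Sketch only: DCB_WAIT and bnx2x_fcoe_up are illustrative names,
# not identifiers from fcoe-up.sh.
DCB_WAIT=${DCB_WAIT:-13}

bnx2x_fcoe_up() {
    netif=$1
    # For bnx2x, do not use /sys/module/fcoe/parameters/create but fipvlan.
    modprobe 8021q
    udevadm settle --timeout=30
    # Wait long enough for DCB negotiation before FIP VLAN discovery;
    # 3 seconds was too short on the affected systems.
    sleep "$DCB_WAIT"
    fipvlan "$netif" -c -s
}
```

Keeping the delay in one variable makes it easier to experiment with other values on hardware where 13 seconds is still not enough.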

Comment 7 RAVI 2017-07-21 05:48:13 UTC
Additional information:
We are also observing this in the kdump kernel.

Comment 9 Joseph Kachuck 2017-09-12 17:50:53 UTC
Hello Chad,
Is this BZ able to be moved to POSTed state, or is this waiting on upstream?

Thank You
Joe Kachuck

Comment 10 Chad Dupuis (Cavium) 2017-09-12 19:27:01 UTC
(In reply to Joseph Kachuck from comment #9)
> Hello Chad,
> Is this BZ able to be moved to POSTed state, or is this waiting on upstream?
> 
> Thank You
> Joe Kachuck

Change was already upstreamed.

Comment 11 Trinh Dao 2017-10-24 17:11:56 UTC
Any new update on this bug?

Comment 16 Trinh Dao 2018-01-09 17:03:28 UTC
JoeK, since the bug is ON_QA, is the fix in RHEL 7.5 alpha?

Comment 17 Jan Stodola 2018-01-09 17:07:48 UTC
Trinh, this should be fixed in dracut-033-520.el7, which is present in RHEL-7.5 Alpha.
Give it a try to confirm it's fixed for you, please.

Comment 18 Trinh Dao 2018-01-09 17:16:26 UTC
Thank you!
trinh

Comment 19 Trinh Dao 2018-01-19 15:09:01 UTC
Nagaraj D is verifying with RHEL 7.5 alpha, and I will update the bug once I have the test result.

Comment 20 Nagaraj 2018-02-06 15:38:05 UTC
I tried to install RHEL 7.5 with FCoE NX2 cards. The installation hangs at the package installation stage. Attached are screenshots of the hung installation (FCOE-Installation1.PNG) and the Python errors on the console (FCOE-Installation2.PNG).

Comment 21 Nagaraj 2018-02-06 15:39:29 UTC
Created attachment 1392180 [details]
Installation screen

Comment 22 Nagaraj 2018-02-06 15:40:54 UTC
Created attachment 1392181 [details]
Python errors on console

Comment 23 Jan Stodola 2018-02-06 15:50:06 UTC
Nagaraj,
Does it happen every time? Could you please attach logs from the installation? They are stored in /tmp during the installation.
Thank you.

Comment 24 Nagaraj 2018-02-12 09:27:47 UTC
I reinstalled RHEL 7.5 (Snapshot 1) twice more and didn't see the issue.

Comment 25 Trinh Dao 2018-02-13 20:29:15 UTC
Marking as HPE verified; the bug is closed on the HPE side.

Comment 26 Jan Stodola 2018-02-14 09:48:54 UTC
Thanks for verifying the issue is fixed.

Moving to VERIFIED based on previous comments.

Comment 29 errata-xmlrpc 2018-04-10 18:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0964