Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 562871

Summary: [Emulex 5.5 bug] Update lpfc driver to 8.2.0.63 FC/FCoE [rhel-5.4.z]
Product: Red Hat Enterprise Linux 5
Reporter: RHEL Program Management <pm-rhel>
Component: distribution
Assignee: RHEL Program Management <pm-rhel>
Status: CLOSED ERRATA
QA Contact: Ondrej Hudlicky <ohudlick>
Severity: high
Priority: high
Version: 5.5
CC: ahe, andrew.patterson, andriusb, bzeranski, coughlan, dhoward, jcm, kzhang, laurie.barry, lwang, mgahagan, plyons, pm-eus, revers, rlandry, vaios.papadimitriou
Target Milestone: rc
Keywords: OtherQA, ZStream
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-18 07:36:59 UTC
Bug Depends On: 549763
Attachments (description, flags):
x86 fcoe storage results (flags: none)
x64 fcoe storage results (flags: none)
Final DUPs Logs (flags: none)
final dup x86 lpfc logs (flags: none)
final dup x64 lpfc logs (flags: none)

Description RHEL Program Management 2010-02-08 15:55:14 UTC
This bug has been copied from bug #549763 and has been proposed
to be backported to 5.4 z-stream (EUS).

Comment 4 Gregg Shick 2010-02-13 22:21:01 UTC
Created attachment 394164
x86 fcoe storage results

Comment 5 Gregg Shick 2010-02-13 22:21:24 UTC
Created attachment 394165
x64 fcoe storage results

Comment 6 Zhang Kexin 2010-02-21 08:01:28 UTC
The driver is not preserved across kernel updates.
I tested it in two ways:

First method:

A) Install RHEL 5.4 GA on cisco-ca-blade1.rhts.eng.bos.redhat.com.
B) Install the DUP driver with the following steps:

[root@cisco-ca-blade1 ~]# wget http://people.redhat.com/jolsa/dup/be2net-lpfc/dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso.gz
[root@cisco-ca-blade1 ~]# gzip -d dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso.gz 
[root@cisco-ca-blade1 ~]# mkdir mnt
[root@cisco-ca-blade1 ~]# mount -o loop ./dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso ./mnt/
[root@cisco-ca-blade1 ~]# find ./mnt/ -name '*.rpm'
./mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-xen-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm
./mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm
[root@cisco-ca-blade1 ~]# rpm -ivh mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm

C) Verify that the DUP driver is installed, then install the rhel5.4.z kernel 2.6.18-164.11.1.el5.

D) Reboot into the rhel5.4.z kernel: the lpfc driver is still the old in-box driver (i.e. 0:8.2.0.48.2p), not the DUP driver.
However, if /etc/depmod.d/depmod.conf.dist is changed
from
"search updates extra built-in weak-updates"
to
"search updates extra weak-updates built-in"
and the RPM is then removed and installed again, the DUP driver is loaded.
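A minimal shell sketch of the depmod.conf.dist workaround above, assuming GNU sed (for `-i`) and the stock search line quoted in this comment. The function name `reorder_depmod_search` is hypothetical; it only edits the file it is given, so it can be tried on a copy before being pointed at /etc/depmod.d/depmod.conf.dist as root.

```shell
# Hypothetical helper: move weak-updates ahead of built-in in the depmod
# search line. Assumes GNU sed and the default RHEL5 line shown above.
reorder_depmod_search() {
    conf=${1:-/etc/depmod.d/depmod.conf.dist}
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # Keep a backup of the original before editing in place.
    cp -p "$conf" "$conf.orig" || return 1
    sed -i 's/^search updates extra built-in weak-updates$/search updates extra weak-updates built-in/' "$conf"
    # Print the resulting search line(s) so the change can be checked.
    grep '^search' "$conf"
}

# Usage as root (not run here); the package name is inferred from the
# RPM file name and may differ:
#   reorder_depmod_search /etc/depmod.d/depmod.conf.dist
#   rpm -e kmod-lpfc-rhel5u4
#   rpm -ivh mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm
```

Reinstalling the RPM mirrors what was done in this comment; it makes the install scripts regenerate the module links and dependency data under the new search order.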

Second method:

Install RHEL 5.4 GA, then the rhel5.4.z kernel, then the DUP driver. After rebooting into the rhel5.4.z kernel, the DUP driver is not in use.

Is this a problem?

Comment 8 Vaios Papadimitriou 2010-02-23 16:35:44 UTC
Your description matches my understanding of how the KMOD driver binary RPMs work.

A driver binary RPM is built for a specific kernel rev. The DUDs you are using contain the LPFC 8.2.0.63 driver binary RPMs built for the RHEL5.4 GA kernel, 2.6.18-164.el5.

When you install one of these KMOD RPMs, the RPM install process puts the driver in “/lib/modules/<kernel_version>/extra/”, where <kernel_version> is the specific kernel version it was built for (2.6.18-164.el5 in this case).  

module-init-tools also creates symbolic links from “/lib/modules/<other_version>/weak-updates/lpfc.ko” to the real lpfc.ko (under <kernel_version>/extra). These links are created for every other installed kernel with a compatible kABI, which can therefore load the same lpfc.ko file (the rhel5.4.z kernel 2.6.18-164.11.1.el5 in this case).
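To see these weak-updates links on a system, a small sketch like the following can list, per kernel, where each lpfc.ko link points. The function name `list_lpfc_weak_links` is hypothetical, and it assumes GNU find; it matches the link anywhere under a weak-updates directory, since the exact subdirectory layout may vary between kmod packages.

```shell
# Hypothetical helper: show, per kernel, where each weak-updates lpfc.ko
# symlink points. MODROOT defaults to /lib/modules.
list_lpfc_weak_links() {
    modroot=${1:-/lib/modules}
    # -type l: only symlinks; they may sit directly in weak-updates/ or in a
    # subdirectory of it, so match on the path rather than a fixed depth.
    find "$modroot" -path '*/weak-updates/*' -name lpfc.ko -type l |
    while read -r link; do
        rel=${link#"$modroot"/}   # strip the MODROOT/ prefix
        kver=${rel%%/*}           # first path component is the kernel version
        printf '%s -> %s\n' "$kver" "$(readlink "$link")"
    done
}

# Usage (read-only, no root needed):
#   list_lpfc_weak_links
```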

The loading precedence of various drivers in a given RHEL5 kernel's /lib/modules directory is controlled by “/etc/depmod.d/depmod.conf.dist” which is part of the module-init-tools RPM.

As you mentioned the depmod.conf.dist file contains a few comments and one real line:
    search updates extra built-in weak-updates

So, with this default precedence, when you boot the rhel5.4.z kernel 2.6.18-164.11.1.el5, the updated 8.2.0.63 lpfc.ko is only reachable through "weak-updates", which is searched after "built-in". The built-in (in-box) LPFC driver therefore takes precedence (from /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/scsi/lpfc/lpfc.ko).
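The precedence rule can be modelled in a few lines. This is a simplified sketch, not depmod's actual code: it assumes that, for each keyword in the search line taken in order, the first directory holding a copy of the module wins, with "built-in" standing for the kernel/ tree. The function name `pick_module_dir` is hypothetical.

```shell
# Simplified model (assumption): walk the search line in order and return
# the first directory keyword that actually holds a copy of the module.
pick_module_dir() {
    order=$1; shift          # e.g. "updates extra built-in weak-updates"
    for want in $order; do   # word-splitting of the search line is intended
        for have in "$@"; do # directories containing lpfc.ko for this kernel
            if [ "$want" = "$have" ]; then
                echo "$want"
                return 0
            fi
        done
    done
    return 1                 # module not found in any searched directory
}

# rhel5.4.z kernel: in-box copy under built-in, DUP copy via weak-updates
#   pick_module_dir "updates extra built-in weak-updates" built-in weak-updates   # -> built-in
#   pick_module_dir "updates extra weak-updates built-in" built-in weak-updates   # -> weak-updates
# 5.4 GA kernel: DUP copy installed directly under extra
#   pick_module_dir "updates extra built-in weak-updates" extra built-in          # -> extra
```

The three calls match the behaviour reported in this bug: the DUP driver loads on the GA kernel it was built for, is shadowed by the in-box driver on the z-stream kernel, and wins again once the search line is reordered.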

The way to work around this for the updated rhel5.4.z kernel 2.6.18-164.11.1.el5, as you correctly stated, is to modify the search line in the “/etc/depmod.d/depmod.conf.dist” file.

Is this a problem? I don't know; this appears to be the way KMOD-built drivers work. The problem, as you found out, is that if you update to a newer kernel (e.g. an errata kernel) and want a previously installed driver binary RPM to load by default on reboot, you need to change the search/load precedence, or else rebuild the driver binary RPM for the new kernel rev.

This does appear to be a weakness of the KMOD process, though. The main advantage of building a KMOD driver binary RPM for a RHEL5 kernel, namely that it is automatically supported and loaded on all subsequent kernel updates, does not hold transparently across kernel updates: manual intervention is needed.

Comment 9 Andrius Benokraitis 2010-02-23 18:35:22 UTC
I thought there was an "override" option in the depmod file that would help address this...
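For reference, the depmod.conf(5) manual page of module-init-tools documents an `override modulename kernelversion modulesubdirectory` directive. The sketch below writes such a line into a separate file under /etc/depmod.d instead of editing the global search line. Whether the RHEL5 module-init-tools build honours `override`, and whether `weak-updates` is the right subdirectory value for a symlinked module, are assumptions to check against the local man page. The function name and the file name lpfc.conf are hypothetical.

```shell
# Hypothetical helper: write a depmod override so lpfc is taken from
# weak-updates/ on every kernel ("*"), leaving the global search line alone.
# Assumes the installed depmod supports the "override" directive.
write_lpfc_override() {
    conf=${1:-/etc/depmod.d/lpfc.conf}
    cat > "$conf" <<'EOF'
# Prefer the kmod-lpfc copy over the in-box lpfc.ko
override lpfc * weak-updates
EOF
}

# Usage as root (not run here), then regenerate modules.dep for the kernel:
#   write_lpfc_override
#   depmod -a 2.6.18-164.11.1.el5
```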

Comment 12 Zhang Kexin 2010-02-26 05:57:02 UTC
Hi Gregg,

When new lpfc is loaded, the console keeps printing the following messages:

Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0310 Mailbox command x5 timeout Data: x0 x700 xffff81036a159200
Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0345 Resetting board due to mailbox timeout
Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2

Is this a problem?

thanks.

Comment 13 Rob Evers 2010-02-26 15:43:40 UTC
(In reply to comment #12)

Vaios,

Can you comment on this?

Thanks, Rob

Comment 14 Alex He 2010-03-11 07:24:08 UTC
Vaios, Gregg,

Could either of you comment on this?

Alex He

Comment 15 Vaios Papadimitriou 2010-03-11 15:26:52 UTC
What this trace snippet tells us is that the init_link mailbox command timed out, which caused the mailbox command timeout handler to reset the HBA; the following unreg_did mailbox command was then rejected.
Without the entire trace (/var/log/messages file) it's hard to tell what happened and what caused it.

Could you please attach the /var/log/messages file?
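Once the full log is available, a sketch like the one below can pull the lpfc mailbox events out of it. It assumes the message format quoted in comment 12; the function names `mbox_name` and `summarize_lpfc_mbox` are hypothetical. The x5 and x23 names follow the reading given in this comment (init_link and unreg_did), and any other command code is printed raw.

```shell
# Map mailbox command codes to names, per the reading in this comment.
mbox_name() {
    case $1 in
        x5)  echo init_link ;;
        x23) echo unreg_did ;;
        *)   echo "$1" ;;     # unknown codes are printed as-is
    esac
}

# Hypothetical helper: print "<command> <outcome>" for each lpfc
# "Mailbox command xNN ..." line in a syslog file.
summarize_lpfc_mbox() {
    grep 'kernel: lpfc .*Mailbox command x[0-9a-f]*' "${1:-/var/log/messages}" |
    while read -r line; do
        code=$(printf '%s\n' "$line" | sed 's/.*Mailbox command \(x[0-9a-f]*\).*/\1/')
        case $line in
            *timeout*)        what=timeout ;;
            *"cannot issue"*) what=rejected ;;
            *)                what=other ;;
        esac
        printf '%s %s\n' "$(mbox_name "$code")" "$what"
    done
}
```

Run against the three lines from comment 12, it prints `init_link timeout` followed by `unreg_did rejected`; the board-reset line (0345) has no command code and is skipped.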

Also, when you say "When new lpfc is loaded...", what is the version of the new LPFC driver? How are you loading this "new lpfc driver"? Did you modify the search line in the “/etc/depmod.d/depmod.conf.dist” file?

By the way, what is the HBA used in this configuration?

-Vaios-

Comment 19 Zhang Kexin 2010-03-12 10:28:39 UTC
(In reply to comment #15)
> What this trace snippet tells us is that init_link mailbox command timed out,
> which triggered mailbox command timeout handler reset the HBA, and then the
> following unreg_did mailbox command got rejected.
> Without the entire trace (/var/log/messages file) it's hard to tell what
> happened and what caused this.
> 
> Could you please attach the /var/log/messages file?
> 
> Also, when you say "When new lpfc is loaded...", what is the version of the new
> LPFC driver?
8.2.0.63

> How are you loading this "new lpfc driver", did you modify the
> search line in “/etc/depmod.d/depmod.conf.dist” file ?

I modified the file so that the new driver is loaded, and I loaded it with "modprobe lpfc".
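To confirm which driver `modprobe lpfc` actually loaded, a sketch like this compares the running module's version with the expected one. It assumes lpfc exposes its version through /sys/module/lpfc/version with the "0:" prefix seen in the version string quoted in comment 6 (0:8.2.0.48.2p); the function name `check_lpfc_version` is hypothetical and the version-file path can be overridden.

```shell
# Hypothetical helper: compare the loaded lpfc version with an expected
# string. Assumes /sys/module/lpfc/version exists once lpfc is loaded.
check_lpfc_version() {
    want=$1
    vfile=${2:-/sys/module/lpfc/version}
    [ -r "$vfile" ] || { echo "lpfc not loaded or no version file" >&2; return 2; }
    have=$(cat "$vfile")
    have=${have#0:}          # drop the "0:" prefix, as in 0:8.2.0.48.2p
    if [ "$have" = "$want" ]; then
        echo "lpfc $have loaded (expected)"
    else
        echo "lpfc $have loaded, expected $want" >&2
        return 1
    fi
}

# Usage after "modprobe lpfc":
#   check_lpfc_version 8.2.0.63
```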
 
> By the way, what is the HBA used in this configuration?
Sorry, I did not check the HBA. The machine that printed these messages, cisco-ca-blade1.rhts.eng.bos.redhat.com, is not available right now because its network and console are not working. I have asked the lab admin to look at it and will send you the information as soon as the machine is back.

Comment 20 laurie barry 2010-03-16 19:18:17 UTC
Created attachment 400540
Final DUPs Logs

Output from x86 and x86_64 systems with the DUP installed, after running the be2net, lpfc, rpm, and related commands.

Comment 21 Gregg Shick 2010-03-16 20:45:49 UTC
Created attachment 400567
final dup x86 lpfc logs

Comment 22 Gregg Shick 2010-03-16 20:46:26 UTC
Created attachment 400568
final dup x64 lpfc logs

Comment 25 errata-xmlrpc 2010-03-18 07:36:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2010-0156.html