Bug 562871 - [Emulex 5.5 bug] Update lpfc driver to 8.2.0.63 FC/FCoE [rhel-5.4.z]
Summary: [Emulex 5.5 bug] Update lpfc driver to 8.2.0.63 FC/FCoE [rhel-5.4.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: distribution
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: RHEL Program Management
QA Contact: Ondrej Hudlicky
URL:
Whiteboard:
Depends On: 549763
Blocks:
Reported: 2010-02-08 15:55 UTC by RHEL Program Management
Modified: 2013-01-11 02:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-18 07:36:59 UTC
Target Upstream Version:
Embargoed:


Attachments
x86 fcoe storage results (944.41 KB, application/x-rpm), 2010-02-13 22:21 UTC, Gregg Shick
x64 fcoe storage results (985.99 KB, application/x-rpm), 2010-02-13 22:21 UTC, Gregg Shick
Final DUPs Logs (67.01 KB, application/x-zip-compressed), 2010-03-16 19:18 UTC, laurie barry
final dup x86 lpfc logs (3.55 MB, application/octet-stream), 2010-03-16 20:45 UTC, Gregg Shick
final dup x64 lpfc logs (1.74 MB, application/octet-stream), 2010-03-16 20:46 UTC, Gregg Shick


Links
System: Red Hat Product Errata
ID: RHEA-2010:0156
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: new packages: kmod-lpfc-rhel5u4-8.2.0.63-1.1
Last Updated: 2010-05-20 09:09:16 UTC

Description RHEL Program Management 2010-02-08 15:55:14 UTC
This bug has been copied from bug #549763 and has been proposed
to be backported to 5.4 z-stream (EUS).

Comment 4 Gregg Shick 2010-02-13 22:21:01 UTC
Created attachment 394164 [details]
x86 fcoe storage results

Comment 5 Gregg Shick 2010-02-13 22:21:24 UTC
Created attachment 394165 [details]
x64 fcoe storage results

Comment 6 Zhang Kexin 2010-02-21 08:01:28 UTC
The DUP driver is not preserved across kernel updates.
I tested this in two ways:

First method:

A) Install rhel5.4 GA on cisco-ca-blade1.rhts.eng.bos.redhat.com.
B) Install the DUP driver with the following steps:

[root@cisco-ca-blade1 ~]# wget http://people.redhat.com/jolsa/dup/be2net-lpfc/dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso.gz
[root@cisco-ca-blade1 ~]# gzip -d dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso.gz 
[root@cisco-ca-blade1 ~]# mkdir mnt
[root@cisco-ca-blade1 ~]# mount -o loop ./dd-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.iso ./mnt/
[root@cisco-ca-blade1 ~]# find ./mnt/ -name *.rpm
./mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-xen-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm
./mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm
[root@cisco-ca-blade1 ~]# rpm -ivh mnt/rpms/2.6.18-164.el5/x86_64/kmod-lpfc-rhel5u4-8.2.0.63-1.0el5.x86_64.rpm

C) Verify that the DUP driver is installed, then install the rhel5.4.z kernel 2.6.18-164.11.1.el5.

D) Reboot into the rhel5.4.z kernel. The lpfc driver is still the old driver (i.e. 0:8.2.0.48.2p), not the DUP driver.
 However, if I change the search line in /etc/depmod.d/depmod.conf.dist
from
"search updates extra built-in weak-updates"
to
"search updates extra weak-updates built-in"

and then remove the rpm and install it again, the DUP driver is loaded.

Second method:

Install rhel5.4 GA, then install the rhel5.4.z kernel, then install the DUP driver. After rebooting into the rhel5.4.z kernel, the DUP driver is not installed.

Is this a problem?
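
A quick way to confirm which lpfc.ko a given kernel will actually pick up is to look at the modules.dep file that depmod generated for that kernel. The following is a minimal sketch, not something that was run for this report: the function name resolved_module_path is made up for illustration, and it assumes each modules.dep entry starts with the module path followed by a colon.

```shell
#!/bin/sh
# Print the path that depmod recorded for a module in a modules.dep file.
# Usage: resolved_module_path <modules.dep> <module-name>
# (hypothetical helper, not part of module-init-tools)
resolved_module_path() {
    dep_file=$1
    module=$2
    # Each entry has the form "<path-to-module>.ko: <dependencies...>".
    # Keep only entries whose key (text before the first colon)
    # ends in "/<module>.ko" and print that key.
    awk -F: -v m="$module" '$1 ~ ("/" m "\\.ko$") { print $1 }' "$dep_file"
}

# Possible usage on the test machine:
#   resolved_module_path /lib/modules/2.6.18-164.11.1.el5/modules.dep lpfc
#   modinfo -F version lpfc    # version of the module the running kernel resolves
```

If the printed path is under kernel/drivers/scsi/lpfc/, the in-box driver wins; a path under weak-updates/ or extra/ would point at the DUP driver.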

Comment 8 Vaios Papadimitriou 2010-02-23 16:35:44 UTC
Your description matches my understanding of how the KMOD driver binary RPMs work.

A driver binary RPM is built for a specific kernel rev. The DUDs you are using contain the LPFC 8.2.0.63 driver binary RPMs built for the RHEL5.4 GA kernel, 2.6.18-164.el5.

When you install one of these KMOD RPMs, the RPM install process puts the driver in “/lib/modules/<kernel_version>/extra/”, where <kernel_version> is the specific kernel version it was built for (2.6.18-164.el5 in this case).  

The module-init-tools package also creates soft links from “/lib/modules/<other_version>/weak-updates/lpfc.ko” to the real lpfc.ko (under “/lib/modules/<kernel_version>/extra/”). These soft links are created for any kernel that has a compatible kABI and can therefore load the lpfc.ko file (the rhel5.4.z kernel 2.6.18-164.11.1.el5 in this case).
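
To see which kernels received such a link, one could walk the weak-updates directories and print each link with its target. This is only an illustrative sketch: list_weak_links is a made-up helper name, and it assumes the link sits directly at weak-updates/lpfc.ko as described above.

```shell
#!/bin/sh
# List weak-updates symlinks for one module across all installed kernels.
# Usage: list_weak_links <modules-root> <module-name>
# (hypothetical helper; <modules-root> would normally be /lib/modules)
list_weak_links() {
    root=$1
    module=$2
    for link in "$root"/*/weak-updates/"$module".ko; do
        # Skip the unexpanded glob and anything that is not a symlink.
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

# Possible usage:
#   list_weak_links /lib/modules lpfc
```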

The loading precedence of various drivers in a given RHEL5 kernel's /lib/modules directory is controlled by “/etc/depmod.d/depmod.conf.dist” which is part of the module-init-tools RPM.

As you mentioned, the depmod.conf.dist file contains a few comments and one real line:
    search updates extra built-in weak-updates

So, according to this default load precedence, when you boot the rhel5.4.z kernel 2.6.18-164.11.1.el5, the updated 8.2.0.63 lpfc.ko is visible to that kernel only through "weak-updates", which is searched after "built-in". The built-in (in-box) LPFC driver therefore takes precedence (from /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/scsi/lpfc/lpfc.ko).
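
The effect of the search order can be illustrated with a small model. The sketch below is NOT the real depmod logic; it simply walks the listed locations in order and prints the first lpfc.ko it finds. The function name first_match and the mapping of "built-in" to the kernel/ subtree are assumptions made for the illustration.

```shell
#!/bin/sh
# Simplified model of the depmod "search" directive.
# Usage: first_match <kernel-module-dir> <module-name> <search-location>...
# (hypothetical helper; "built-in" is treated as the kernel/ subtree)
first_match() {
    kdir=$1      # e.g. /lib/modules/2.6.18-164.11.1.el5
    module=$2    # e.g. lpfc
    shift 2      # remaining arguments are the search order
    for loc in "$@"; do
        case $loc in
            built-in) sub=kernel ;;
            *)        sub=$loc ;;
        esac
        [ -d "$kdir/$sub" ] || continue
        hit=$(find "$kdir/$sub" -name "$module.ko" | head -n 1)
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    done
    return 1
}

# Possible usage, default order versus the modified order:
#   first_match /lib/modules/2.6.18-164.11.1.el5 lpfc updates extra built-in weak-updates
#   first_match /lib/modules/2.6.18-164.11.1.el5 lpfc updates extra weak-updates built-in
```

With the default order the model stops at the in-box copy under kernel/; with weak-updates listed first it stops at the link to the DUP driver, which matches the behavior described in this report.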

The way to work around this process in the updated rhel5.4.z kernel 2.6.18-164.11.1.el5, as you correctly stated, is to modify the search line in “/etc/depmod.d/depmod.conf.dist” file.
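
That edit can be scripted. The following is a sketch under stated assumptions: prefer_weak_updates is a made-up helper name, it rewrites only the exact default search line quoted above, and running depmod afterwards is my assumption about how to regenerate the module dependencies (in this report the rpm was removed and reinstalled instead).

```shell
#!/bin/sh
# Print a copy of a depmod config with weak-updates moved ahead of built-in.
# Only the exact default search line is rewritten; every other line
# passes through unchanged. (hypothetical helper)
prefer_weak_updates() {
    sed 's/^search updates extra built-in weak-updates[[:space:]]*$/search updates extra weak-updates built-in/' "$1"
}

# Possible usage (as root, after backing up the original file):
#   conf=/etc/depmod.d/depmod.conf.dist
#   prefer_weak_updates "$conf" > "$conf.new" && mv "$conf.new" "$conf"
#   depmod -a 2.6.18-164.11.1.el5   # rebuild modules.dep for the errata kernel
```

Note that depmod.conf.dist belongs to the module-init-tools RPM, so a later update of that package may restore the default line.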

Is this a problem? I don't know; this appears to be how KMOD-built drivers work. As you found out, if you update to a newer kernel (e.g. an errata kernel) and want a previously installed driver binary RPM to load by default on reboot, you need to change the search/load precedence, or else rebuild the driver binary RPM for the updated kernel rev.

This does appear to be a weakness of the KMOD process, though. The general advantage of building a KMOD driver binary RPM for a RHEL5 kernel is supposed to be that the driver is automatically supported and loaded on all subsequent kernel updates. In practice this is not transparent: after a kernel update, manual intervention is needed.

Comment 9 Andrius Benokraitis 2010-02-23 18:35:22 UTC
I thought there was an "override" option in the depmod file that would help address this...
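
For reference, depmod.conf(5) for module-init-tools describes an "override" directive of the form "override <module> <kernel-version> <subdirectory>". Whether the RHEL 5.4 build honors it for this case is not confirmed anywhere in this report, so treat the following as an untested sketch. The helper name depmod_override_line and the config file name in the usage comment are made up.

```shell
#!/bin/sh
# Emit a depmod "override" line:
#   override <module> <kernel-version> <subdirectory>
# (hypothetical helper; it only formats the line, it changes nothing)
depmod_override_line() {
    printf 'override %s %s %s\n' "$1" "$2" "$3"
}

# Possible usage (as root; the .conf file name is an assumption):
#   depmod_override_line lpfc 2.6.18-164.11.1.el5 weak-updates \
#       > /etc/depmod.d/lpfc-override.conf
#   depmod -a 2.6.18-164.11.1.el5
```

Unlike reordering the global search line, an override of this kind would affect only the named module for the named kernel.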

Comment 12 Zhang Kexin 2010-02-26 05:57:02 UTC
Hi Gregg,

When new lpfc is loaded, console keeps printing following messages:

Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0310 Mailbox command x5 timeout Data: x0 x700 xffff81036a159200
Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0345 Resetting board due to mailbox timeout
Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2

Is this a problem?

thanks.

Comment 13 Rob Evers 2010-02-26 15:43:40 UTC
(In reply to comment #12)
> Hi Gregg,
> 
> When new lpfc is loaded, console keeps printing following messages:
> 
> Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0310 Mailbox
> command x5 timeout Data: x0 x700 xffff81036a159200
> Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:0345 Resetting
> board due to mailbox timeout
> Feb 26 00:56:35 cisco-ca-blade1 kernel: lpfc 0000:04:00.1: 1:(0):2530 Mailbox
> command x23 cannot issue Data: xd00 x2
> 
> Is this a problem?
> 
> thanks.    

Vaios,

Can you comment on this?

Thanks, Rob

Comment 14 Alex He 2010-03-11 07:24:08 UTC
Vaios, Gregg,

Could either of you comment on this?

Alex He

Comment 15 Vaios Papadimitriou 2010-03-11 15:26:52 UTC
What this trace snippet tells us is that the init_link mailbox command timed out, which caused the mailbox command timeout handler to reset the HBA, and that the following unreg_did mailbox command was then rejected.
Without the entire trace (the /var/log/messages file) it's hard to tell what happened and what caused this.
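
When going through a full /var/log/messages file, it can help to reduce the lpfc lines to message number and mailbox command. The sketch below is illustrative only: lpfc_mbox_summary is a made-up helper name, and the two command names are taken from the reading of the snippet given here (x5 as init_link, x23 as unreg_did); any other code is printed as "unknown".

```shell
#!/bin/sh
# Summarize lpfc "Mailbox command" log lines read from standard input.
# Output per matching line: <message-number> <command-code> <command-name>
# (hypothetical helper; command names cover only the codes named in this report)
lpfc_mbox_summary() {
    awk '
    / lpfc [0-9a-f:.]+: / && /Mailbox command x[0-9a-fA-F]+/ {
        id = ""; cmd = ""
        for (i = 1; i < NF; i++) {
            # The message tag follows "lpfc <pci-address>:" and looks like
            # 1:0310 or 1:(0):2530; keep only the part after the last colon.
            if ($i == "lpfc" && i + 2 <= NF) { id = $(i + 2); sub(/^.*:/, "", id) }
            if ($i == "command") { cmd = $(i + 1) }
        }
        name = "unknown"
        if (cmd == "x5")  name = "init_link"
        if (cmd == "x23") name = "unreg_did"
        print id, cmd, name
    }'
}

# Possible usage:
#   lpfc_mbox_summary < /var/log/messages | sort | uniq -c
```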

Could you please attach the /var/log/messages file?

Also, when you say "When new lpfc is loaded...", what is the version of the new LPFC driver? How are you loading this "new lpfc driver"? Did you modify the search line in the “/etc/depmod.d/depmod.conf.dist” file?

By the way, what is the HBA used in this configuration?

-Vaios-

Comment 19 Zhang Kexin 2010-03-12 10:28:39 UTC
(In reply to comment #15)
> What this trace snippet tells us is that the init_link mailbox command timed
> out, which caused the mailbox command timeout handler to reset the HBA, and
> that the following unreg_did mailbox command was then rejected.
> Without the entire trace (/var/log/messages file) it's hard to tell what
> happened and what caused this.
> 
> Could you please attach the /var/log/messages file?
> 
> Also, when you say "When new lpfc is loaded...", what is the version of the new
> LPFC driver?
8.2.0.63

> How are you loading this "new lpfc driver", did you modify the
> search line in “/etc/depmod.d/depmod.conf.dist” file ?

I modified the file so that the new driver is picked up, then loaded it with "modprobe lpfc".
 
> By the way, what is the HBA used in this configuration?
Sorry, I did not check the HBA. The machine that printed these messages, cisco-ca-blade1.rhts.eng.bos.redhat.com, is not available right now because its network and console are not working. I have asked the lab admin to take a look and will update you as soon as the machine is OK.

Comment 20 laurie barry 2010-03-16 19:18:17 UTC
Created attachment 400540 [details]
Final DUPs Logs

Output for x86 and x86_64 systems with DUP installed and be2net/lpfc/rpm/etc. cmds executed.

Comment 21 Gregg Shick 2010-03-16 20:45:49 UTC
Created attachment 400567 [details]
final dup x86 lpfc logs

Comment 22 Gregg Shick 2010-03-16 20:46:26 UTC
Created attachment 400568 [details]
final dup x64 lpfc logs

Comment 25 errata-xmlrpc 2010-03-18 07:36:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2010-0156.html

