Bug 249856

Summary: Qlogic Driver causes kernel errors
Product: [Fedora] Fedora
Reporter: dhageman
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: low
Version: 8
CC: andrew.vasquez, chris.brown, triage
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-09 07:11:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 175429    
Attachments:
Messages log with various attempts with different kernels (flags: none)

Description dhageman 2007-07-27 14:19:21 UTC
Description of problem:

The Qlogic driver no longer has the firmware included.  Adding the firmware from
Qlogic's site causes the driver to panic to the point of not working.  

I have attached my messages file from various boots with different kernels
(including the one in the fc8test tree).

If the qla2xxx driver is included in the initrd image, then it will never find
the firmware.

After boot, you can remove and re-insert the driver ... it will find the
firmware and then panic.

If you don't include the driver in the initrd, it will find the firmware and
panic during boot.

Comment 1 dhageman 2007-07-27 14:19:21 UTC
Created attachment 160118 [details]
Messages log with various attempts with different kernels

Comment 2 Christopher Brown 2007-09-21 11:30:51 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 3 dhageman 2007-09-21 19:16:55 UTC
I just tried kernel 2.6.22.5-76.fc7 and the problem still exists.

Comment 4 Christopher Brown 2007-09-21 23:55:26 UTC
This from 2.6.23-rc7 changelog:

commit 8fef696b00b863c8c898293bd09be581b934849b
Author: Andrew Vasquez <andrew.vasquez>
Date:   Sun Aug 12 18:22:53 2007 -0700

    [SCSI] qla2xxx: Don't modify parity bits during ISP25XX restart.

Please could you test with the latest kernel from rawhide and see if this fixes
the issue for you.

Cheers
Chris

Comment 5 dhageman 2007-09-25 13:52:20 UTC
The results are still the same.


(In reply to comment #4)
> This from 2.6.23-rc7 changelog:
> 
> commit 8fef696b00b863c8c898293bd09be581b934849b
> Author: Andrew Vasquez <andrew.vasquez>
> Date:   Sun Aug 12 18:22:53 2007 -0700
> 
>     [SCSI] qla2xxx: Don't modify parity bits during ISP25XX restart.
> 
> Please could you test with the latest kernel from rawhide and see if this fixes
> the issue for you.
> 
> Cheers
> Chris



Comment 6 Andrew Vasquez 2007-09-25 17:11:03 UTC
The reboot at:

> Jul 27 08:55:16 test syslogd 1.4.2: restart.
> Jul 27 08:55:16 test kernel: klogd 1.4.2, log source = /proc/kmsg started.
> Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6
(brewbuilder.redhat.com) (gcc version 4.1.1 2007010

Had a successful load of the driver:

> Jul 27 08:55:16 test kernel: QLogic Fibre Channel HBA Driver
> Jul 27 08:55:16 test kernel: ACPI: PCI Interrupt 0000:03:02.0[A] -> GSI 35
(level, low) -> IRQ 21
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21,
iobase 0xf881a000
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space... 
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Allocated (412 KB) for
firmware dump...
> Jul 27 08:55:16 test kernel: scsi2 : qla2xxx
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: LOOP UP detected (2 Gbps).
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0:  
> Jul 27 08:55:16 test kernel:  QLogic Fibre Channel HBA Driver: 8.01.07-k4
> Jul 27 08:55:16 test kernel:   QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single
Channel
> Jul 27 08:55:16 test kernel:   ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-,
host#=2, fw=3.03.20 IPX
> Jul 27 08:55:16 test kernel: scsi 2:0:0:0: Direct-Access     NEXSAN  
ATAboy(C0A84207) 5035 PQ: 1 ANSI: 4
> Jul 27 08:55:16 test kernel: scsi 2:0:0:9: Direct-Access     NEXSAN  
ATAboy(C0A84207) 5035 PQ: 0 ANSI: 4

Which one of your test cases was this with?

> If the qla2xxx driver is included in the initrd image, then it will never find
> the firmware.
> 
> After boot, you can remove and re-insert the driver ... it will find the
> firmware and then panic.
> 
> If you don't include the driver in the initrd, it will find the firmware and
> panic during boot.

My guess is the last, with the firmware properly located in /lib/firmware,
loaded via 'hotplug' and *not* panicking?

Working backward in the messages file, there's this snippet, which 
shows the driver loading but the firmware unavailable:

> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21,
iobase 0xf882c000
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware image unavailable.
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware images can be
retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
> Jul 27 08:51:05 test kernel: scsi4 : qla2xxx
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: 
> Jul 27 08:51:05 test kernel:  QLogic Fibre Channel HBA Driver: 8.01.07-k5
> Jul 27 08:51:05 test kernel:   QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single
Channel
> Jul 27 08:51:05 test kernel:   ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-,
host#=4, fw=0.00.00 (0)
> Jul 27 08:51:05 test kernel: scsi: waiting for bus probes to complete ...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Parity error -- HCCR=74a0,
Dumping firmware!
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: No buffer available for dump!!!
> Jul 27 08:51:05 test kernel: kjournald starting.  Commit interval 5 seconds

Is this an initrd boot?  If so, then the initrd image isn't being properly
built with the firmware .bin files.  Has support been added for this in FC7?
We were working with some RH engineers on getting this ready for RHEL5.1,
but it was deferred to 5.2 (possibly).  Given the HBA never completed init-time,
I'd discard/defer the 'parity-error' during this load, as it's just noise...
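
To tell the two kinds of load apart without reading every excerpt by eye, a
minimal Python sketch along these lines may help. It is not part of the driver
or of any Fedora tooling: the function name summarize_load and both regular
expressions are hypothetical, modelled only on the log lines quoted above, and
it assumes each log entry sits on a single line (the wrapping in this report
comes from the bug tracker). Treating fw=0.00.00 as "no firmware loaded" is an
inference from the excerpt above, not a documented driver contract.

```python
import re

# Patterns modelled on the quoted log excerpts (assumptions, not a driver API).
DRIVER_RE = re.compile(r"QLogic Fibre Channel HBA Driver: (\S+)")
FW_RE = re.compile(r"fw=(\d+\.\d+\.\d+)")


def summarize_load(lines):
    """Return (driver_version, firmware_version, firmware_loaded) for one load."""
    driver = None
    firmware = None
    for line in lines:
        match = DRIVER_RE.search(line)
        if match:
            driver = match.group(1)
        match = FW_RE.search(line)
        if match:
            firmware = match.group(1)
    # Assumption: the driver reports fw=0.00.00 when no image was loaded.
    loaded = firmware is not None and firmware != "0.00.00"
    return driver, firmware, loaded


sample = [
    "Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware image unavailable.",
    "Jul 27 08:51:05 test kernel:  QLogic Fibre Channel HBA Driver: 8.01.07-k5",
    "Jul 27 08:51:05 test kernel:   ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=4, fw=0.00.00 (0)",
]
print(summarize_load(sample))
```

Run against the 08:55:16 excerpt instead, the same function would report
fw=3.03.20 as loaded, which matches the reading given above.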

This entry:

> Jul 27 08:35:21 test kernel: QLogic Fibre Channel HBA Driver
> Jul 27 08:35:21 test kernel: ACPI: PCI Interrupt 0000:03:02.0[A] -> GSI 35
(level, low) -> IRQ 21
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21,
iobase 0xf88be000
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Allocated (412 KB) for
firmware dump...
> Jul 27 08:35:21 test kernel: scsi5 : qla2xxx
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: 
> Jul 27 08:35:21 test kernel:  QLogic Fibre Channel HBA Driver: 8.02.00-k2
> Jul 27 08:35:21 test kernel:   QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single
Channel
> Jul 27 08:35:21 test kernel:   ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-,
host#=5, fw=3.03.20 IPX
> Jul 27 08:35:22 test kernel: qla2xxx 0000:03:02.0: LOOP UP detected (2 Gbps).
> Jul 27 08:35:22 test kernel: scsi 5:0:0:0: Attached scsi generic sg2 type -1
> Jul 27 08:35:22 test kernel: scsi 5:0:0:0: Direct-Access     NEXSAN  
ATAboy(C0A84207) 5035 PQ: 1 ANSI: 4

Shows the driver loading (my guess is post boot-time), but with a WARN_ON()
hitting during midlayer creation of the relevant SCSI devices:

> Jul 27 08:35:22 test kernel: kobject_add failed for 5:0:0:0 with -EEXIST,
don't try to register things with the same name in the same directory.
> Jul 27 08:35:22 test kernel:  [<c04063c8>] show_trace_log_lvl+0x1a/0x2f
> Jul 27 08:35:22 test kernel:  [<c0406e61>] show_trace+0x12/0x14
> Jul 27 08:35:22 test kernel:  [<c0406e79>] dump_stack+0x16/0x18
> Jul 27 08:35:22 test kernel:  [<c04fcbac>] kobject_shadow_add+0x164/0x192
> Jul 27 08:35:22 test kernel:  [<c04fcbe4>] kobject_add+0xa/0xc
> Jul 27 08:35:22 test kernel:  [<c0570e26>] device_add+0xa2/0x52a
> Jul 27 08:35:22 test kernel:  [<f8872e53>] scsi_sysfs_add_sdev+0x2d/0x1d3
[scsi_mod]
> Jul 27 08:35:22 test kernel:  [<f887107f>] scsi_probe_and_add_lun+0x97b/0xaac
[scsi_mod]
> Jul 27 08:35:22 test kernel:  [<f88716c3>] __scsi_scan_target+0xb5/0x5bb
[scsi_mod]
> Jul 27 08:35:22 test kernel:  [<f8872112>] scsi_scan_target+0x83/0x92 [scsi_mod]
> Jul 27 08:35:22 test kernel:  [<f8922a97>] fc_scsi_scan_rport+0x5a/0x76
[scsi_transport_fc]
> Jul 27 08:35:22 test kernel:  [<c043b1ff>] run_workqueue+0x7d/0x129
> Jul 27 08:35:22 test kernel:  [<c043bbbf>] worker_thread+0xbb/0xc8
> Jul 27 08:35:22 test kernel:  [<c043e3a8>] kthread+0x3b/0x63
> Jul 27 08:35:22 test kernel:  [<c0405e4b>] kernel_thread_helper+0x7/0x10
> Jul 27 08:35:22 test kernel:  =======================

This is/was a common problem that was hit some time back (check Google). I had
thought most of the midlayer fixes that addressed this were present...
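
To check whether a given messages file still contains these failures (for
example before posting to linux-scsi), a small Python sketch such as the
following could count them per device name. The function name
duplicate_kobjects and the regular expression are hypothetical and based only
on the message text quoted above; other kernel versions may word the message
differently.

```python
import re
from collections import Counter

# Pattern taken from the quoted message; the wording is an assumption for
# kernels other than the ones in this report.
EEXIST_RE = re.compile(r"kobject_add failed for (\S+) with -EEXIST")


def duplicate_kobjects(lines):
    """Count kobject_add -EEXIST failures per object name."""
    counts = Counter()
    for line in lines:
        match = EEXIST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


sample = [
    "Jul 27 08:35:22 test kernel: kobject_add failed for 5:0:0:0 with -EEXIST, "
    "don't try to register things with the same name in the same directory.",
    "Jul 27 08:35:22 test kernel:  [<c04fcbe4>] kobject_add+0xa/0xc",
]
print(duplicate_kobjects(sample))
```

Stack-trace lines that merely mention kobject_add are ignored, because the
pattern requires the "failed for ... with -EEXIST" wording.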

During your shutdown (some 7 minutes later), there's some badness occurring,
not during SCSI handling, but during some 'other' device creation...  Not
sure about this:

> Jul 27 08:42:27 test shutdown[6881]: shutting down for system reboot
> Jul 27 08:42:28 test kernel: list_add corruption. prev->next should be next
(c0736364), but was f4ef41b4. (prev=f4ef41b4).
> Jul 27 08:42:28 test kernel: ------------[ cut here ]------------
> Jul 27 08:42:28 test kernel: kernel BUG at lib/list_debug.c:33!
> Jul 27 08:42:28 test kernel: invalid opcode: 0000 [#1]
> Jul 27 08:42:28 test kernel: SMP 
> Jul 27 08:42:28 test kernel: Modules linked in: qla2xxx scsi_transport_fc
autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 nf_conntrack_netbios_ns
nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp ipt_REJECT
iptable_filter ip_tables x_tables dm_mirror dm_multipath dm_mod video output sbs
dock battery ac parport_pc lp parport loop i2c_i801 i2c_core rtc_cmos serio_raw
tg3 button iTCO_wdt i3000_edac iTCO_vendor_support edac_core sr_mod cdrom sg
ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd
uhci_hcd
> Jul 27 08:42:28 test kernel: CPU:    0
> Jul 27 08:42:28 test kernel: EIP:    0060:[<c0502dc8>]    Not tainted VLI
> Jul 27 08:42:28 test kernel: EFLAGS: 00210296   (2.6.23-0.49.rc1.git3.fc8 #1)
> Jul 27 08:42:28 test kernel: EIP is at __list_add+0x4b/0x60
> Jul 27 08:42:28 test kernel: eax: 00000061   ebx: f56d95f4   ecx: c04260ec  
edx: f382cd50
> Jul 27 08:42:28 test kernel: esi: f4ef41b4   edi: f7f7e034   ebp: d1363dcc  
esp: d1363db4
> Jul 27 08:42:28 test kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> Jul 27 08:42:28 test kernel: Process init (pid: 6914, ti=d1363000
task=f382cd50 task.ti=d1363000)
> Jul 27 08:42:28 test kernel: Stack: c06db42b c0736364 f4ef41b4 f4ef41b4
f56d95d8 00000000 d1363e00 c04fcafe 
> Jul 27 08:42:28 test kernel:        ffffffff 00000000 c06cd721 f56d94f0
f56d95dc fffffffe f7f7e034 c04fcee2 
> Jul 27 08:42:28 test kernel:        f56d94f0 f56d94f0 f7f7e034 d1363e08
c04fcbe4 d1363e44 c0570e26 f56d95d8 
> Jul 27 08:42:28 test kernel: Call Trace:
> Jul 27 08:42:28 test kernel:  [<c04063c8>] show_trace_log_lvl+0x1a/0x2f
> Jul 27 08:42:28 test kernel:  [<c0406478>] show_stack_log_lvl+0x9b/0xa3
> Jul 27 08:42:28 test init: no more processes left in this runlevel
> Jul 27 08:42:28 test kernel:  [<c0406638>] show_registers+0x1b8/0x289
> Jul 27 08:42:28 test kernel:  [<c0406820>] die+0x117/0x24a
> Jul 27 08:42:28 test kernel:  [<c062cdf0>] do_trap+0x8a/0xa3
> Jul 27 08:42:28 test kernel:  [<c0406cb5>] do_invalid_op+0x88/0x92
> Jul 27 08:42:28 test kernel:  [<c062cbba>] error_code+0x72/0x78
> Jul 27 08:42:28 test kernel:  [<c04fcafe>] kobject_shadow_add+0xb6/0x192
> Jul 27 08:42:28 test kernel:  [<c04fcbe4>] kobject_add+0xa/0xc
> Jul 27 08:42:28 test kernel:  [<c0570e26>] device_add+0xa2/0x52a
> Jul 27 08:42:28 test kernel:  [<c05712c0>] device_register+0x12/0x15
> Jul 27 08:42:28 test kernel:  [<c057183c>] device_create+0x77/0x9b
> Jul 27 08:42:28 test kernel:  [<c0554887>] vcs_make_sysfs+0x37/0x72
> Jul 27 08:42:28 test kernel:  [<c0559a37>] con_open+0x72/0x80
> Jul 27 08:42:28 test kernel:  [<c054fbb5>] tty_open+0x174/0x2b9
> Jul 27 08:42:28 test kernel:  [<c048b8eb>] chrdev_open+0x103/0x15a
> Jul 27 08:42:28 test kernel:  [<c0487cd8>] __dentry_open+0xc2/0x178
> Jul 27 08:42:28 test kernel:  [<c0487e0f>] nameidata_to_filp+0x27/0x37
> Jul 27 08:42:28 test kernel:  [<c0487e57>] do_filp_open+0x38/0x40
> Jul 27 08:42:28 test kernel:  [<c0487ea6>] do_sys_open+0x47/0xcc
> Jul 27 08:42:28 test kernel:  [<c0487f63>] sys_open+0x1c/0x1e
> Jul 27 08:42:28 test kernel:  [<c0405196>] syscall_call+0x7/0xb
> Jul 27 08:42:28 test kernel:  =======================
> Jul 27 08:42:28 test kernel: Code: db b3 6d c0 e8 dd b5 f2 ff 0f 0b eb fe 8b
32 39 ce 74 1c 89 54 24 0c 89 74 24 08 89 4c 24 04 c7 04 24 2b b4 6d c0 e8 bb b5
f2 ff <0f> 0b eb fe 89 59 04 89 0b 89 43 04 89 18 83 c4 10 5b 5e 5d c3 
> Jul 27 08:42:28 test kernel: EIP: [<c0502dc8>] __list_add+0x4b/0x60 SS:ESP
0068:d1363db4

All the other previous instances show the driver unable to load
because the firmware was unavailable: it was not present in the initrd, and so
not ready at the time of the request_firmware() request...
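
One way to confirm this for a particular initrd is to list the archive's
contents and look for the firmware file under lib/firmware. The Python sketch
below only checks such a listing; it does not unpack anything. The function
name firmware_in_listing is hypothetical, and the file name ql2300_fw.bin used
in the example is an assumption about which image an ISP2312 requests, so
substitute whatever name your driver actually asks for. The listing itself is
assumed to come from something like listing the gzip-compressed cpio archive
with cpio -t.

```python
def firmware_in_listing(paths, firmware_name):
    """Return True if firmware_name is listed under lib/firmware/."""
    wanted = "lib/firmware/" + firmware_name
    for path in paths:
        entry = path.strip()
        # Accept both "lib/firmware/x" and "./lib/firmware/x" style entries.
        if entry == wanted or entry.endswith("/" + wanted):
            return True
    return False


# Hypothetical listing of an initrd that carries the module but no firmware.
listing = [
    "bin/nash",
    "lib/qla2xxx.ko",
    "lib/scsi_transport_fc.ko",
]
print(firmware_in_listing(listing, "ql2300_fw.bin"))
```

If this reports False for an initrd that includes qla2xxx, the symptoms
described above (firmware unavailable at load time) would be expected.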

I am curious, though. I'll look around, but I was sure fixes for these
EEXIST manifestations went in during 2.6.20ish:

http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/a9fdee2337511956/8372b8185a31eb50?lnk=st&q=EEXIST+kobject_add+scsi&rnum=5#8372b8185a31eb50

If you are still seeing them with .22/.23, you'll want to post a note to
linux-scsi.


Comment 7 dhageman 2007-09-25 18:13:13 UTC
(In reply to comment #6)
> The reboot at:
> 
> > Jul 27 08:55:16 test syslogd 1.4.2: restart.
> > Jul 27 08:55:16 test kernel: klogd 1.4.2, log source = /proc/kmsg started.
> > Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6
> (brewbuilder.redhat.com) (gcc version 4.1.1 2007010
> 
> Had a successful load of the driver:

Please take note of the kernel version (2.6.20-1.2962.fc6) of the successful
boot.  I am using a Fedora Core 6 kernel because they actually work.  
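
For anyone re-reading the attached messages file, a rough Python sketch like
the one below can pair each boot's kernel version with the firmware versions
the driver reported afterwards. The function name firmware_by_kernel and both
patterns are hypothetical, derived only from the log lines quoted in this
report, and each log entry is assumed to be on one line. Driver reloads done
without a reboot are attributed to the most recent boot banner.

```python
import re

# Patterns based on the quoted excerpts (assumptions, not a stable log format).
BANNER_RE = re.compile(r"kernel: Linux version (\S+)")
FW_RE = re.compile(r"fw=(\d+\.\d+\.\d+)")


def firmware_by_kernel(lines):
    """Map each booted kernel version to the fw= versions logged after it."""
    results = {}
    current = None
    for line in lines:
        banner = BANNER_RE.search(line)
        if banner:
            current = banner.group(1)
            results.setdefault(current, [])
            continue
        fw = FW_RE.search(line)
        if fw and current is not None:
            results[current].append(fw.group(1))
    return results


sample = [
    "Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6 (brewbuilder.redhat.com) (gcc version 4.1.1 2007010",
    "Jul 27 08:55:16 test kernel:   ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=2, fw=3.03.20 IPX",
]
print(firmware_by_kernel(sample))
```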


Comment 8 dhageman 2007-09-25 18:25:44 UTC
(In reply to comment #6)

Andrew,

I missed some of your additional comments inline.  It has been a while since I
first generated this ticket, but I think your analysis of each situation was
correct with regard to booting, loading the module after the fact, etc.



Comment 9 Christopher Brown 2008-01-10 17:16:35 UTC
No update in a while - what is the latest on this - any success with later kernels?

Comment 10 dhageman 2008-01-10 22:31:22 UTC
(In reply to comment #9)
> No update in a while - what is the latest on this - any success with later
> kernels?

No change.

Comment 11 Christopher Brown 2008-01-11 00:37:43 UTC
Added SCSI blocker bug. As Andrew mentioned, it may be worth posting to
linux-scsi. There were a few commits to qla2xxx for 2.6.24 but none that look
like they might resolve the issue for you. You might want to consider grabbing a
kernel from rawhide to test if you are able or waiting to see what 2.6.24 final
brings...

Comment 12 dhageman 2008-02-18 16:33:08 UTC
(In reply to comment #11)
> Added SCSI blocker bug. As Andrew mentioned, it may be worth posting to
> linux-scsi. There were a few commits to qla2xxx for 2.6.24 but none that look
> like they might resolve the issue for you. You might want to consider grabbing a
> kernel from rawhide to test if you are able or waiting to see what 2.6.24 final
> brings...

I just tried the latest kernel and still no change.

Comment 13 Bug Zapper 2008-05-14 13:42:35 UTC
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 14 Bug Zapper 2008-11-26 07:36:26 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Bug Zapper 2009-01-09 07:11:05 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.