Bug 249856
Summary: Qlogic Driver causes kernel errors
Product: Fedora
Component: kernel
Version: 8
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: low
Reporter: dhageman
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: andrew.vasquez, chris.brown, triage
Doc Type: Bug Fix
Last Closed: 2009-01-09 07:11:05 UTC
Bug Blocks: 175429
Description: dhageman, 2007-07-27 14:19:21 UTC

Created attachment 160118: Messages log with various attempts with different kernels
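The attached messages log mixes several boot attempts and module reloads across different kernels, and the analysis later in this thread works through it by hand. As a triage aid (not something posted in the original report), here is a minimal Python sketch that splits a syslog messages file at each `Linux version` line and records, per boot, the firmware version the qla2xxx driver reports for each host plus a few error signatures. The `Boot` class, the `summarize` helper and all regular expressions are hypothetical, derived only from the log excerpts quoted in this bug; other kernel or driver versions may word these messages differently.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical patterns, based only on the kernel/qla2xxx messages quoted in
# this bug report; other kernels may print these lines differently.
BOOT_RE = re.compile(r"kernel: Linux version (\S+)")
HBA_RE = re.compile(r"kernel: (ISP\w+): .*host#=(\d+), fw=(\S+)")
SIGNATURES = {
    "firmware unavailable": re.compile(r"Firmware image unavailable"),
    "parity error": re.compile(r"Parity error"),
    "kobject_add -EEXIST": re.compile(r"kobject_add failed .* -EEXIST"),
    "kernel BUG": re.compile(r"kernel BUG at"),
}


@dataclass
class Boot:
    kernel: str
    hbas: List[str] = field(default_factory=list)      # e.g. "ISP2312 host#2 fw=3.03.20"
    problems: List[str] = field(default_factory=list)  # matched signature names


def summarize(lines: Iterable[str]) -> List[Boot]:
    """Split syslog lines into boots and collect qla2xxx-related events."""
    # Lines logged before the first "Linux version" marker go into a placeholder.
    boots = [Boot(kernel="(before first boot marker)")]
    for line in lines:
        m = BOOT_RE.search(line)
        if m:
            boots.append(Boot(kernel=m.group(1)))
            continue
        m = HBA_RE.search(line)
        if m:
            isp, host, fw = m.groups()
            boots[-1].hbas.append(f"{isp} host#{host} fw={fw}")
            # In this bug's log, fw=0.00.00 accompanies "Firmware image unavailable".
            if fw == "0.00.00":
                boots[-1].problems.append("no firmware running")
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                boots[-1].problems.append(name)
    # Drop the placeholder if nothing relevant was logged before the first boot.
    if not boots[0].hbas and not boots[0].problems:
        boots.pop(0)
    return boots


if __name__ == "__main__":
    # Sample lines adapted from the excerpts quoted in this bug.
    sample = [
        "Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware image unavailable.",
        "Jul 27 08:51:05 test kernel: ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=4, fw=0.00.00 (0)",
        "Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6 (brewbuilder.redhat.com)",
        "Jul 27 08:55:16 test kernel: ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=2, fw=3.03.20 IPX",
    ]
    for boot in summarize(sample):
        print(boot.kernel, boot.hbas, boot.problems)
```

On a real system the same helper could be fed an open file, e.g. `with open("/var/log/messages") as f: boots = summarize(f)`. A boot whose hosts only ever report `fw=0.00.00` points at firmware availability rather than at the `-EEXIST` warning.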
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel: http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists, please close this bug, or I'll do so in a few days if no additional information is lodged.

Cheers
Chris

I just tried kernel 2.6.22.5-76.fc7 and the problem still exists.

This from the 2.6.23-rc7 changelog:

commit 8fef696b00b863c8c898293bd09be581b934849b
Author: Andrew Vasquez <andrew.vasquez>
Date: Sun Aug 12 18:22:53 2007 -0700

[SCSI] qla2xxx: Don't modify parity bits during ISP25XX restart.

Please could you test with the latest kernel from rawhide and see if this fixes the issue for you.

Cheers
Chris

The results are still the same.

(In reply to comment #4)
> This from 2.6.23-rc7 changelog:
>
> commit 8fef696b00b863c8c898293bd09be581b934849b
> Author: Andrew Vasquez <andrew.vasquez>
> Date: Sun Aug 12 18:22:53 2007 -0700
>
> [SCSI] qla2xxx: Don't modify parity bits during ISP25XX restart.
>
> Please could you test with the latest kernel from rawhide and see if this fixes
> the issue for you.
>
> Cheers
> Chris

The reboot at:

> Jul 27 08:55:16 test syslogd 1.4.2: restart.
> Jul 27 08:55:16 test kernel: klogd 1.4.2, log source = /proc/kmsg started.
> Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6 (brewbuilder.redhat.com) (gcc version 4.1.1 2007010

Had a successful load of the driver:

> Jul 27 08:55:16 test kernel: QLogic Fibre Channel HBA Driver
> Jul 27 08:55:16 test kernel: ACPI: PCI Interrupt 0000:03:02.0[A] -> GSI 35 (level, low) -> IRQ 21
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21, iobase 0xf881a000
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space...
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: Allocated (412 KB) for firmware dump...
> Jul 27 08:55:16 test kernel: scsi2 : qla2xxx
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0: LOOP UP detected (2 Gbps).
> Jul 27 08:55:16 test kernel: qla2xxx 0000:03:02.0:
> Jul 27 08:55:16 test kernel: QLogic Fibre Channel HBA Driver: 8.01.07-k4
> Jul 27 08:55:16 test kernel: QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
> Jul 27 08:55:16 test kernel: ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=2, fw=3.03.20 IPX
> Jul 27 08:55:16 test kernel: scsi 2:0:0:0: Direct-Access NEXSAN ATAboy(C0A84207) 5035 PQ: 1 ANSI: 4
> Jul 27 08:55:16 test kernel: scsi 2:0:0:9: Direct-Access NEXSAN ATAboy(C0A84207) 5035 PQ: 0 ANSI: 4

Which one of your test cases was this with?

> If the qla2xxx driver is included in the initrd image, then it will never find
> the firmware.
>
> After boot, you can remove and re-insert the driver ... it will find the
> firmware and then panic.
>
> If you don't include the driver in the initrd, it will find the firmware and
> panic during boot.

My guess is the last one, with the firmware properly located in /lib/firmware, loaded via 'hotplug' and *not* panicking?

Working backward in the messages file, there's this snippet, which shows the driver loading but the firmware unavailable:

> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21, iobase 0xf882c000
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware image unavailable.
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
> Jul 27 08:51:05 test kernel: scsi4 : qla2xxx
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0:
> Jul 27 08:51:05 test kernel: QLogic Fibre Channel HBA Driver: 8.01.07-k5
> Jul 27 08:51:05 test kernel: QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
> Jul 27 08:51:05 test kernel: ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=4, fw=0.00.00 (0)
> Jul 27 08:51:05 test kernel: scsi: waiting for bus probes to complete ...
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: Parity error -- HCCR=74a0, Dumping firmware!
> Jul 27 08:51:05 test kernel: qla2xxx 0000:03:02.0: No buffer available for dump!!!
> Jul 27 08:51:05 test kernel: kjournald starting. Commit interval 5 seconds

Is this an initrd boot? If so, then the initrd image isn't being properly built with the firmware .bin files. Has support for this been added in FC7? We were working with some RH engineers on getting this ready for RHEL 5.1, but it was deferred to 5.2 (possibly).

Given that the HBA never completed initialization, I'd discard/defer the 'parity error' during this load, as it's just noise...

This entry:

> Jul 27 08:35:21 test kernel: QLogic Fibre Channel HBA Driver
> Jul 27 08:35:21 test kernel: ACPI: PCI Interrupt 0000:03:02.0[A] -> GSI 35 (level, low) -> IRQ 21
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Found an ISP2312, irq 21, iobase 0xf88be000
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Configuring PCI space...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Configure NVRAM parameters...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Verifying loaded RISC code...
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0: Allocated (412 KB) for firmware dump...
> Jul 27 08:35:21 test kernel: scsi5 : qla2xxx
> Jul 27 08:35:21 test kernel: qla2xxx 0000:03:02.0:
> Jul 27 08:35:21 test kernel: QLogic Fibre Channel HBA Driver: 8.02.00-k2
> Jul 27 08:35:21 test kernel: QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
> Jul 27 08:35:21 test kernel: ISP2312: PCI-X (133 MHz) @ 0000:03:02.0 hdma-, host#=5, fw=3.03.20 IPX
> Jul 27 08:35:22 test kernel: qla2xxx 0000:03:02.0: LOOP UP detected (2 Gbps).
> Jul 27 08:35:22 test kernel: scsi 5:0:0:0: Attached scsi generic sg2 type -1
> Jul 27 08:35:22 test kernel: scsi 5:0:0:0: Direct-Access NEXSAN ATAboy(C0A84207) 5035 PQ: 1 ANSI: 4

Shows the driver loading (my guess is after boot), but a WARN_ON() hitting during midlayer creation of the relevant SCSI devices:

> Jul 27 08:35:22 test kernel: kobject_add failed for 5:0:0:0 with -EEXIST, don't try to register things with the same name in the same directory.
> Jul 27 08:35:22 test kernel: [<c04063c8>] show_trace_log_lvl+0x1a/0x2f
> Jul 27 08:35:22 test kernel: [<c0406e61>] show_trace+0x12/0x14
> Jul 27 08:35:22 test kernel: [<c0406e79>] dump_stack+0x16/0x18
> Jul 27 08:35:22 test kernel: [<c04fcbac>] kobject_shadow_add+0x164/0x192
> Jul 27 08:35:22 test kernel: [<c04fcbe4>] kobject_add+0xa/0xc
> Jul 27 08:35:22 test kernel: [<c0570e26>] device_add+0xa2/0x52a
> Jul 27 08:35:22 test kernel: [<f8872e53>] scsi_sysfs_add_sdev+0x2d/0x1d3 [scsi_mod]
> Jul 27 08:35:22 test kernel: [<f887107f>] scsi_probe_and_add_lun+0x97b/0xaac [scsi_mod]
> Jul 27 08:35:22 test kernel: [<f88716c3>] __scsi_scan_target+0xb5/0x5bb [scsi_mod]
> Jul 27 08:35:22 test kernel: [<f8872112>] scsi_scan_target+0x83/0x92 [scsi_mod]
> Jul 27 08:35:22 test kernel: [<f8922a97>] fc_scsi_scan_rport+0x5a/0x76 [scsi_transport_fc]
> Jul 27 08:35:22 test kernel: [<c043b1ff>] run_workqueue+0x7d/0x129
> Jul 27 08:35:22 test kernel: [<c043bbbf>] worker_thread+0xbb/0xc8
> Jul 27 08:35:22 test kernel: [<c043e3a8>] kthread+0x3b/0x63
> Jul 27 08:35:22 test kernel: [<c0405e4b>] kernel_thread_helper+0x7/0x10
> Jul 27 08:35:22 test kernel: =======================

This is/was a common problem that was hit some time back (check Google); I had thought most of the midlayer fixes that addressed this were already present...

During your shutdown (some 7 minutes later), there's some badness occurring not during SCSI handling, but during some 'other' device creation... Not sure about this:

> Jul 27 08:42:27 test shutdown[6881]: shutting down for system reboot
> Jul 27 08:42:28 test kernel: list_add corruption. prev->next should be next (c0736364), but was f4ef41b4. (prev=f4ef41b4).
> Jul 27 08:42:28 test kernel: ------------[ cut here ]------------
> Jul 27 08:42:28 test kernel: kernel BUG at lib/list_debug.c:33!
> Jul 27 08:42:28 test kernel: invalid opcode: 0000 [#1]
> Jul 27 08:42:28 test kernel: SMP
> Jul 27 08:42:28 test kernel: Modules linked in: qla2xxx scsi_transport_fc autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables dm_mirror dm_multipath dm_mod video output sbs dock battery ac parport_pc lp parport loop i2c_i801 i2c_core rtc_cmos serio_raw tg3 button iTCO_wdt i3000_edac iTCO_vendor_support edac_core sr_mod cdrom sg ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
> Jul 27 08:42:28 test kernel: CPU: 0
> Jul 27 08:42:28 test kernel: EIP: 0060:[<c0502dc8>] Not tainted VLI
> Jul 27 08:42:28 test kernel: EFLAGS: 00210296 (2.6.23-0.49.rc1.git3.fc8 #1)
> Jul 27 08:42:28 test kernel: EIP is at __list_add+0x4b/0x60
> Jul 27 08:42:28 test kernel: eax: 00000061 ebx: f56d95f4 ecx: c04260ec edx: f382cd50
> Jul 27 08:42:28 test kernel: esi: f4ef41b4 edi: f7f7e034 ebp: d1363dcc esp: d1363db4
> Jul 27 08:42:28 test kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> Jul 27 08:42:28 test kernel: Process init (pid: 6914, ti=d1363000 task=f382cd50 task.ti=d1363000)
> Jul 27 08:42:28 test kernel: Stack: c06db42b c0736364 f4ef41b4 f4ef41b4 f56d95d8 00000000 d1363e00 c04fcafe
> Jul 27 08:42:28 test kernel: ffffffff 00000000 c06cd721 f56d94f0 f56d95dc fffffffe f7f7e034 c04fcee2
> Jul 27 08:42:28 test kernel: f56d94f0 f56d94f0 f7f7e034 d1363e08 c04fcbe4 d1363e44 c0570e26 f56d95d8
> Jul 27 08:42:28 test kernel: Call Trace:
> Jul 27 08:42:28 test kernel: [<c04063c8>] show_trace_log_lvl+0x1a/0x2f
> Jul 27 08:42:28 test kernel: [<c0406478>] show_stack_log_lvl+0x9b/0xa3
> Jul 27 08:42:28 test init: no more processes left in this runlevel
> Jul 27 08:42:28 test kernel: [<c0406638>] show_registers+0x1b8/0x289
> Jul 27 08:42:28 test kernel: [<c0406820>] die+0x117/0x24a
> Jul 27 08:42:28 test kernel: [<c062cdf0>] do_trap+0x8a/0xa3
> Jul 27 08:42:28 test kernel: [<c0406cb5>] do_invalid_op+0x88/0x92
> Jul 27 08:42:28 test kernel: [<c062cbba>] error_code+0x72/0x78
> Jul 27 08:42:28 test kernel: [<c04fcafe>] kobject_shadow_add+0xb6/0x192
> Jul 27 08:42:28 test kernel: [<c04fcbe4>] kobject_add+0xa/0xc
> Jul 27 08:42:28 test kernel: [<c0570e26>] device_add+0xa2/0x52a
> Jul 27 08:42:28 test kernel: [<c05712c0>] device_register+0x12/0x15
> Jul 27 08:42:28 test kernel: [<c057183c>] device_create+0x77/0x9b
> Jul 27 08:42:28 test kernel: [<c0554887>] vcs_make_sysfs+0x37/0x72
> Jul 27 08:42:28 test kernel: [<c0559a37>] con_open+0x72/0x80
> Jul 27 08:42:28 test kernel: [<c054fbb5>] tty_open+0x174/0x2b9
> Jul 27 08:42:28 test kernel: [<c048b8eb>] chrdev_open+0x103/0x15a
> Jul 27 08:42:28 test kernel: [<c0487cd8>] __dentry_open+0xc2/0x178
> Jul 27 08:42:28 test kernel: [<c0487e0f>] nameidata_to_filp+0x27/0x37
> Jul 27 08:42:28 test kernel: [<c0487e57>] do_filp_open+0x38/0x40
> Jul 27 08:42:28 test kernel: [<c0487ea6>] do_sys_open+0x47/0xcc
> Jul 27 08:42:28 test kernel: [<c0487f63>] sys_open+0x1c/0x1e
> Jul 27 08:42:28 test kernel: [<c0405196>] syscall_call+0x7/0xb
> Jul 27 08:42:28 test kernel: =======================
> Jul 27 08:42:28 test kernel: Code: db b3 6d c0 e8 dd b5 f2 ff 0f 0b eb fe 8b 32 39 ce 74 1c 89 54 24 0c 89 74 24 08 89 4c 24 04 c7 04 24 2b b4 6d c0 e8 bb b5 f2 ff <0f> 0b eb fe 89 59 04 89 0b 89 43 04 89 18 83 c4 10 5b 5e 5d c3
> Jul 27 08:42:28 test kernel: EIP: [<c0502dc8>] __list_add+0x4b/0x60 SS:ESP 0068:d1363db4

All the other earlier instances show the driver unable to load because the firmware was unavailable: it was not present in the initrd and ready when request_firmware() was called...

I am curious, though, and I'll look around, but I was sure fixes for these EEXIST manifestations went in around 2.6.20:

http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/a9fdee2337511956/8372b8185a31eb50?lnk=st&q=EEXIST+kobject_add+scsi&rnum=5#8372b8185a31eb50

If you are still seeing them with .22/.23, you'll want to post a note to linux-scsi.

(In reply to comment #6)
Any successful boots was

(In reply to comment #6)
> The reboot at:
>
> > Jul 27 08:55:16 test syslogd 1.4.2: restart.
> > Jul 27 08:55:16 test kernel: klogd 1.4.2, log source = /proc/kmsg started.
> > Jul 27 08:55:16 test kernel: Linux version 2.6.20-1.2962.fc6
> (brewbuilder.redhat.com) (gcc version 4.1.1 2007010
>
> Had a successful load of the driver:

Please take note of the kernel version (2.6.20-1.2962.fc6) of the successful boot. I am using a Fedora Core 6 kernel because they actually work.

(In reply to comment #6)
Andrew, I missed some of your additional comments inline. It has been a while since I first generated this ticket, but I think your individual analysis of the situation each time was correct with regard to booting vs. loading the module after the fact, etc.

No update in a while - what is the latest on this - any success with later kernels?

(In reply to comment #9)
> No update in a while - what is the latest on this - any success with later kernels?

No change.

Added SCSI blocker bug. As Andrew mentioned, it may be worth posting to linux-scsi. There were a few commits to qla2xxx for 2.6.24 but none that look like they might resolve the issue for you.
You might want to consider grabbing a kernel from rawhide to test if you are able, or waiting to see what 2.6.24 final brings...

(In reply to comment #11)
> Added SCSI blocker bug. As Andrew mentioned, it may be worth posting to
> linux-scsi. There were a few commits to qla2xxx for 2.6.24 but none that look
> like they might resolve the issue for you. You might want to consider grabbing a
> kernel from rawhide to test if you are able or waiting to see what 2.6.24 final
> brings...

I just tried the latest kernel and still no change.

This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.
Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs: http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.