Login
[x]
Log in using an account from:
Fedora Account System
Red Hat Associate
Red Hat Customer
Or login using a Red Hat Bugzilla account
Forgot Password
Login:
Hide Forgot
Create an Account
Red Hat Bugzilla – Attachment 299058 Details for
Bug 437384
[QLogic/IBM 5.3 bug] EEH error on JS22 results in panic w/qla2xx 8.02.00-k5-rhel5.2-02 driver
[?]
New
Simple Search
Advanced Search
My Links
Browse
Requests
Reports
Current State
Search
Tabular reports
Graphical reports
Duplicates
Other Reports
User Changes
Plotly Reports
Bug Status
Bug Severity
Non-Defaults
|
Product Dashboard
Help
Page Help!
Bug Writing Guidelines
What's new
Browser Support Policy
5.0.4.rh83 Release notes
FAQ
Guides index
User guide
Web Services
Contact
Legal
This site requires JavaScript to be enabled to function correctly, please enable it.
Test/Debug Notes_0325
bz43135_0325.txt (text/plain), 13.61 KB, created by
IBM Bug Proxy
on 2008-03-25 19:01:03 UTC
(
hide
)
Description:
Test/Debug Notes_0325
Filename:
MIME Type:
Creator:
IBM Bug Proxy
Created:
2008-03-25 19:01:03 UTC
Size:
13.61 KB
patch
obsolete
>RHEL5.2-snap1 (20080313) w/ 8.02.00-k5-rhel5.2-03 + qla2xxx patches: >fix_eeh -- from marcus > >fixeeh_1.patch: >Index: b/drivers/scsi/qla2xxx/qla_version.h >=================================================================== >--- a/drivers/scsi/qla2xxx/qla_version.h >+++ b/drivers/scsi/qla2xxx/qla_version.h >@@ -7,7 +7,7 @@ > /* > * Driver version > */ >-#define QLA2XXX_VERSION "8.02.00-k5-rhel5.2-03" >+#define QLA2XXX_VERSION "8.02.00-k5-rhel5.2-05" > > #define QLA_DRIVER_MAJOR_VER 8 > #define QLA_DRIVER_MINOR_VER 1 > >fixeeh_2.patch: >Index: b/drivers/scsi/qla2xxx/qla_os.c >=================================================================== >--- a/drivers/scsi/qla2xxx/qla_os.c >+++ b/drivers/scsi/qla2xxx/qla_os.c >@@ -3004,6 +3004,12 @@ qla2xxx_pci_slot_reset(struct pci_dev *p > if (ha->isp_ops->pci_config(ha)) > return ret; > >+ /* >+ * on SLES10, following is required as a workaround. >+ * Below should be removed when the kernel starts to handles it. >+ */ >+ pdev->error_state = pci_channel_io_normal; >+ > set_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags); > if (qla2x00_abort_isp(ha)== QLA_SUCCESS) > ret = PCI_ERS_RESULT_RECOVERED; > >Note: even though the above patch states for SLES10, it is also required >for RHEL5 at the moment. As RHEL5 kernel kernel does not eliminate need >for workaround. > >On RHEL5.2 fix_eeh from bugzilla and fixeeh_1 patch above I injected an >EEH error and the driver did not recover as expected, instead a panic >occured producing the following console messages: > ># errinjct ioa-bus-error -v -f 16 -s scsi_host/host1 >errinjct: Could not find errinjct function for rtas token "ioa-bus-error-64" >Injecting an ioa-bus-error with the following data: > >BUS ADDR: 00000000 >ADDR MASK: 00000000 >CONFIG ADDR: 800 >PHB UNIT_ID: 800000020000203 >FUNCTION: 16 >DMA write to PCI Memory Address Space - inject an Address Parity Error >Call to RTAS errinjct succeeded! > >If the correct information was provided and there is >activity on the bus, the hardware should hit the error >However, if incorrect information was provided or there >is no bus activity, you may not get a hit. > >[root@turkey-4 qla2xxx]# EEH: This PCI device has failed 1 times since last reboot: location=U78A5.001.WIH03DA-P1-C6-T1 driver=qla2xxx pci addr=0003:00:01.0 >qla2xxx 0003:00:01.0: Failed mailbox send register test >qla2xxx 0003:00:01.1: Failed mailbox send register test >EEH: Unable to recover from failure of PCI device at location=U78A5.001.WIH03DA-P1-C6-T1 driver=qla2xxx pci addr=0003:00:01.0 >Please try reseating this device or replacing it. >EEH: Device driver ignored 100000 bad reads, panicing >Call Trace: >[C00000001FE3B830] [C000000000010378] .show_stack+0x68/0x1b0 (unreliable) >[C00000001FE3B8D0] [C00000000004FD60] .eeh_dn_check_failure+0x108/0x29c >[C00000001FE3B980] [C000000000050104] .eeh_check_failure+0xe0/0x108 >[C00000001FE3BA00] [D000000000468BA8] .qla2x00_chip_diag+0x22c/0xa00 [qla2xxx] >[C00000001FE3BAD0] [D00000000046A718] .qla2x00_try_to_stop_firmware+0x54/0xe0 [qla2xxx] >[C00000001FE3BB60] [D000000000461C80] .qla2x00_free_device+0x94/0x120 [qla2xxx] >[C00000001FE3BBE0] [D000000000461EAC] .qla2x00_remove_one+0x1a0/0x1dc [qla2xxx] >[C00000001FE3BC80] [D000000000462628] .qla2xxx_pci_error_detected+0xd4/0xfc [qla2xxx] >[C00000001FE3BD20] [C000000000050800] .eeh_report_failure+0xd0/0xec >[C00000001FE3BDA0] [C0000000001C1AD4] .pci_walk_bus+0xf8/0x168 >[C00000001FE3BE50] [C000000000050B20] .handle_eeh_events+0x304/0x340 >[C00000001FE3BF00] [C000000000050E9C] .eeh_event_handler+0xc0/0x160 >[C00000001FE3BF90] [C000000000026FA8] .kernel_thread+0x4c/0x68 > >The line "EEH: Device driver ignored 100000 bad reads, panicing" indicates that the driver >was in a spin loop accessing frozen PCI space for the adapter when adapter has been frozen. > >I then diffed the RHEL5.2 driver vs. the SLES10-SP2 driver (which recovers successfully). > >I found that patch fixeeh_2.patch above was missing. I rebuilt the driver and retested. >Driver behaviour upon detecting EEH error did not hit the above panic, and appeared to recover >but then it seemed stuck in a mode where it was reporting repeating Mailbox timeouts. >So, while behaviour was better, the driver did not fully recover from the EEH errors. > >Console messages and contents of /var/log/messages: > ># errinjct ioa-bus-error -v -f 17 -s scsi_host/host1 >errinjct: Could not find errinjct function for rtas token "ioa-bus-error-64" >Injecting an ioa-bus-error with the following data: > >BUS ADDR: 00000000 >ADDR MASK: 00000000 >CONFIG ADDR: 800 >PHB UNIT_ID: 800000020000203 >FUNCTION: 17 >DMA write to PCI Memory Address Space - inject a Data Parity Error >Call to RTAS errinjct succeeded! > >If the correct information was provided and there is >activity on the bus, the hardware should hit the error >However, if incorrect information was provided or there >is no bus activity, you may not get a hit. > >[root@turkey-4 ~]# dmesg >[root@turkey-4 ~]# EEH: This PCI device has failed 1 times since last reboot: location=U78A5.001.WIH03DA-P1-C6-T1 driver=qla2xxx pci addr=0003:00:01.0 > >[root@turkey-4 ~]# dmesg >EEH: Detected PCI bus error on device 0003:00:01.0 >RTAS: event: 225, Type: Platform Error, Severity: 2 >EEH: This PCI device has failed 1 times since last reboot: location=U78A5.001.WIH03DA-P1-C6-T1 driver=qla2xxx pci addr=0003:00:01.0 >PCI: Enabling device: (0003:00:01.0), cmd 3 >qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >PCI: Enabling device: (0003:00:01.1), cmd 3 >qla2xxx 0003:00:01.1: Performing ISP error recovery - ha= c0000000136a84f8. >qla2xxx 0003:00:01.1: LOOP UP detected (4 Gbps). > ># cat /tmp/messages0325 >Mar 25 11:11:05 turkey-4 kernel: EEH: Detected PCI bus error on device 0003:00:01.0 >Mar 25 11:11:05 turkey-4 kernel: EEH: This PCI device has failed 1 times since last reboot: location=U78A5.001.WIH03DA-P1-C6-T1 driver=qla2xxx pci addr=0003:00:01.0 >Mar 25 11:11:07 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:11:09 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:11:09 turkey-4 kernel: qla2xxx 0003:00:01.1: Performing ISP error recovery - ha= c0000000136a84f8. >Mar 25 11:11:10 turkey-4 kernel: qla2xxx 0003:00:01.1: LOOP UP detected (4 Gbps). >Mar 25 11:12:39 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:12:39 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:12:40 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:12:40 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 28942 2002. >Mar 25 11:13:20 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:13:20 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:13:22 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:13:22 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 28942 2002. >Mar 25 11:14:02 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:14:02 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:14:03 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:14:04 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 2893c 2002. >Mar 25 11:14:44 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:14:44 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:14:45 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:14:45 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 2893d 2002. >Mar 25 11:15:25 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:15:25 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:15:26 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:15:27 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:1): Abort command issued -- 0 2893e 2002. >Mar 25 11:16:07 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:16:07 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:16:08 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:16:08 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:6): Abort command issued -- 0 2893f 2002. >Mar 25 11:16:48 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:16:48 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:16:49 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:16:50 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:6): Abort command issued -- 0 28940 2002. >Mar 25 11:17:30 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:17:30 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:17:31 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:17:31 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:7): Abort command issued -- 0 28941 2002. >Mar 25 11:17:31 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): DEVICE RESET ISSUED. >Mar 25 11:18:01 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:18:01 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:18:03 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:18:03 turkey-4 kernel: qla2xxx 0003:00:01.0: qla2xxx_eh_device_reset: device reset failed >Mar 25 11:18:03 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:1): DEVICE RESET ISSUED. >Mar 25 11:18:33 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:18:33 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:18:34 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:18:34 turkey-4 kernel: qla2xxx 0003:00:01.0: qla2xxx_eh_device_reset: device reset failed >Mar 25 11:18:34 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:6): DEVICE RESET ISSUED. >Mar 25 11:19:04 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:19:04 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:19:06 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:19:06 turkey-4 kernel: qla2xxx 0003:00:01.0: qla2xxx_eh_device_reset: device reset failed >Mar 25 11:19:06 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:7): DEVICE RESET ISSUED. >Mar 25 11:19:36 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:19:36 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:19:37 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:19:37 turkey-4 kernel: qla2xxx 0003:00:01.0: qla2xxx_eh_device_reset: device reset failed >Mar 25 11:19:37 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): LOOP RESET ISSUED. >Mar 25 11:20:07 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:20:07 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:20:09 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:20:44 turkey-4 kernel: rport-1:0-0: blocked FC remote port time out: saving binding >Mar 25 11:21:10 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:21:10 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:21:11 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:21:12 turkey-4 kernel: qla2xxx 0003:00:01.0: qla2xxx_eh_bus_reset: reset succeded >Mar 25 11:22:02 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:22:02 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:22:03 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:22:03 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 28942 2002. >Mar 25 11:22:43 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:22:43 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:22:44 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:22:45 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 2893c 2002. >Mar 25 11:23:25 turkey-4 kernel: qla2xxx 0003:00:01.0: Mailbox command timeout occured. Issuing ISP abort. >Mar 25 11:23:25 turkey-4 kernel: qla2xxx 0003:00:01.0: Performing ISP error recovery - ha= c0000000136ac4f8. >Mar 25 11:23:26 turkey-4 kernel: qla2xxx 0003:00:01.0: LOOP UP detected (2 Gbps). >Mar 25 11:23:26 turkey-4 kernel: qla2xxx 0003:00:01.0: scsi(1:0:0): Abort command issued -- 0 2893d 2002.
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Raw
Actions:
View
Attachments on
bug 437384
:
298907
| 299058 |
299284