Bug 589405
Summary: | (CR178148) Running a storage controller fail and drive fail test on an LSI storage array connected to a RHEL5.5 x64 host with QLE4062 host adapter gives a kernel panic | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Sujithkumary <sujith.yadhav> | ||||
Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 5.5 | CC: | abdel.sadek, abdel.sadek, chekov, ctatman, sujith.yadhav, xdl-apbu-iop-bz | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-10-10 13:36:10 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
I am seeing a similar kernel panic with my QLA4010c, using the qisioctl driver instead of the ibm/rdac one. Since my qla4xxx module is included in my initrd, this makes my machine unbootable with any kernel newer than 2.6.18-164.15.1.el5. I just verified with kernel 2.6.18-194.11.3.el5 and the error persists. This should probably not be "low" priority, as it effectively precludes anyone from running RHEL 5.5 and booting from iSCSI with a qla4xxx card.
-alan

This can be closed from the NetApp perspective, as we have dropped use of and support for the QLogic QLE406x and QLE405x.

This can be closed per comment #2. Thanks!
--Chris
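The first comment above hinges on ordering RHEL kernel release strings ("any kernel newer than 2.6.18-164.15.1.el5"). A minimal sketch of that ordering, assuming a deliberately simplified comparison that drops the `.el5` dist tag and compares only the remaining numeric fields; real RPM ordering uses `rpmvercmp`, which handles many more cases. The names `release_key`, `is_newer`, and `LAST_BOOTABLE` are illustrative only.

```python
import re
from typing import Tuple


def release_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key for a RHEL 5 kernel string such as '2.6.18-194.11.3.el5'.

    Simplified on purpose: the dist tag ('.el5' and anything after it) is
    dropped and only the remaining digit runs are compared. This is not a
    full rpmvercmp implementation."""
    base = version.split(".el", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", base))


def is_newer(candidate: str, reference: str) -> bool:
    """True if candidate sorts after reference under the simplified key."""
    return release_key(candidate) > release_key(reference)


# Newest kernel that still boots, according to the comment above.
LAST_BOOTABLE = "2.6.18-164.15.1.el5"

for kernel in ("2.6.18-164.15.1.el5", "2.6.18-194.el5", "2.6.18-194.11.3.el5"):
    print(kernel, "newer than last bootable:", is_newer(kernel, LAST_BOOTABLE))
```

Under this key both the original 5.5 GA kernel (2.6.18-194.el5) and the errata kernel the commenter tested (2.6.18-194.11.3.el5) sort after 2.6.18-164.15.1.el5, because the first differing field is 194 versus 164.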
Created attachment 411810
Host Serial Logs

Description of problem:
Running a storage controller fail and drive fail test on an LSI storage array connected to a RHEL 5.5 x64 host with a QLE4062 host adapter gives a kernel panic after 12 hours of running the test. The issue is specific to SAN boot; it is not observed when the host boots from an internal HDD. We raised this bug as soon as we hit the issue, so its reproducibility is unknown.

The call trace of the panic is below:

NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc loop dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev pcspkr shpchp qla3xxx bnx2 serio_raw i5000_edac edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache qisioctl(U) mppVhba(U) ata_piix libata megaraid_sas qla4xxx scsi_transport_iscsi2 scsi_transport_iscsi mppUpper(U) sg sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 495, comm: scsi_eh_0 Tainted: G 2.6.18-194.el5 #1
RIP: 0010:[<ffffffff80065c0b>] [<ffffffff80065c0b>] .text.lock.spinlock+0x11/0x30
RSP: 0018:ffffffff80448f00 EFLAGS: 00000086
RAX: 0000000000000046 RBX: ffff81007eb404f8 RCX: 000000000000005a
RDX: ffff81007eb9dd38 RSI: ffff81007eb404f8 RDI: ffff81007eb405f8
RBP: 0000000000000000 R08: ffff81007eb9dd80 R09: 000000000000003c
R10: 0000000000000002 R11: ffffffff8812226e R12: 000000000000005a
R13: 0000000000000000 R14: ffff81007eb9dd38 R15: ffff81007eb9dd38
FS: 0000000000000000(0000) GS:ffffffff803cb000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000911d000 CR3: 00000000355aa000 CR4: 00000000000006e0
Process scsi_eh_0 (pid: 495, threadinfo ffff81007eb9c000, task ffff81007f897100)
Stack: ffffffff88120687 0000000000000000 ffff81007f5b4440 000000000000005a
 ffffffff80010c00 ffffffff803e8d80 0000000000005a00 000000000000005a
 ffff81007f5b4440 ffffffff803e8dbc ffffffff800bbfaf ffffffff80012409
Call Trace:
 <IRQ> [<ffffffff88120687>] :qla4xxx:qla4xxx_intr_handler+0x3c/0x201
 [<ffffffff80010c00>] handle_IRQ_event+0x51/0xa6
 [<ffffffff800bbfaf>] __do_IRQ+0xa4/0x103
 [<ffffffff80012409>] __do_softirq+0x89/0x133
 [<ffffffff8006da2b>] do_IRQ+0xe7/0xf5
 [<ffffffff8005e615>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff8811a74d>] :qla4xxx:qla4xxx_eh_device_reset+0x151/0x267
 [<ffffffff80151a1a>] kobject_release+0x0/0x9
 [<ffffffff880779a5>] :scsi_mod:scsi_try_bus_device_reset+0x21/0x42
 [<ffffffff88078850>] :scsi_mod:scsi_eh_ready_devs+0x1ad/0x493
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88079237>] :scsi_mod:scsi_error_handler+0x323/0x4ac
 [<ffffffff88078f14>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032bdc>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff801f85a7>] hcd_submit_urb+0x0/0x752
 [<ffffffff80032ade>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

Code: 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe ff ff f3

Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Red Hat nash version 5.1.19.6 starting
session1: Cannot notify userspace of session event 106. Check iscsi daemon
session2: Cannot notify userspace of session event 106. Check iscsi daemon
Welcome to Red Hat Enterprise Linux Server
Press 'I' to enter interactive startup.
Setting clock (utc): Fri Apr 30 10:34:52 IST 2010 [ OK ]
Starting udev: qla3xxx QLogic ISP3XXX Network Driver
qla3xxx Driver name: qla3xxx, Version: v2.03.00.00.05.03-k4.
Kernel panic - not syncing: Out of memory and no killable processes...
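When reading the call trace above, the `<IRQ>` and `<EOI>` markers delimit the frames that ran on the interrupt stack: `qla4xxx_intr_handler` was entered in interrupt context on top of `qla4xxx_eh_device_reset`, which the SCSI error handler thread `scsi_eh_0` was running, and the RIP in `.text.lock.spinlock` suggests CPU 0 was spinning on a lock when the NMI watchdog fired. A minimal Python sketch for splitting such frames into module, symbol, offset, and function size, assuming the 2.6.18-era x86_64 frame format `[<address>] :module:symbol+0xoffset/0xsize` (module prefix present only for loadable modules). `parse_call_trace` and `Frame` are illustrative names, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple, Optional

# Matches frames such as
#   [<ffffffff88120687>] :qla4xxx:qla4xxx_intr_handler+0x3c/0x201
#   [<ffffffff80010c00>] handle_IRQ_event+0x51/0xa6
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int
    module: Optional[str]  # None for symbols in the core kernel image
    symbol: str
    offset: int            # offset of the return address into the function
    size: int              # total length of the function
    in_irq: bool           # True for frames between <IRQ> and <EOI>


def parse_call_trace(text: str) -> List[Frame]:
    frames = []
    in_irq = False
    for line in text.splitlines():
        # Update the context before matching: the marker shares a line
        # with the first frame of the new context.
        if "<IRQ>" in line:
            in_irq = True
        if "<EOI>" in line:
            in_irq = False
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(int(m["addr"], 16), m["module"], m["symbol"],
                                int(m["offset"], 16), int(m["size"], 16), in_irq))
    return frames


# Excerpt of the trace from this report.
TRACE = """\
 <IRQ> [<ffffffff88120687>] :qla4xxx:qla4xxx_intr_handler+0x3c/0x201
 [<ffffffff80010c00>] handle_IRQ_event+0x51/0xa6
 <EOI> [<ffffffff8811a74d>] :qla4xxx:qla4xxx_eh_device_reset+0x151/0x267
 [<ffffffff880779a5>] :scsi_mod:scsi_try_bus_device_reset+0x21/0x42
"""

for f in parse_call_trace(TRACE):
    context = "irq " if f.in_irq else "proc"
    print(context, f.module or "kernel", f.symbol, hex(f.offset))
```

Tracking the IRQ context per line, rather than per frame, keeps the parser independent of how many frames the interrupt path contains.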
Version-Release number of selected component (if applicable):
RHEL 5.5 kernel 2.6.18-194.el5.ELsmp

How reproducible:
Unknown

Steps to Reproduce:
1) The setup is a 1x2 (server x array) iSCSI configuration: a single server with a QLE4062c host adapter connected to an LSI 7900 and an LSI 4988 array through a Dell 6224 switch.
2) Create a 50 GB LUN on one array and map it to the host. Shut down the host and remove its hard disk, then power it up and configure the host adapter BIOS to boot from the presented LUN.
3) Install RHEL 5.5 x86_64 (2.6.18-194.el5) on the mapped LUN. At installation time, only one path to the LUN is presented to the host.
4) Install the IBM RDAC (LSI MPP) multipathing driver on the host. Once the multipathing driver is installed, the second path to the LUN is presented to the host.
5) Create 64 LUNs of 1 GB on each array and map them to the host. The host sees the volumes successfully. Create ext3 filesystems on 6 of the volumes and mount them.
6) Run a single thread of I/O to all volumes (excluding the 50 GB LUN on which the OS is installed) with block sizes from 32 to 4096. The I/O is run on the raw devices as well as on the 6 mounted filesystems.
7) Run the CFDF script, which sleeps five minutes between each step:
   a. Offlines both A storage controllers
   b. Fails a drive
   c. Onlines both A storage controllers
   d. Reconstructs the drive
   e. Repeats the above steps, alternating controllers and using a number of different drives
8) After I/O has run successfully for around 12 hours, the host hits the panic mentioned above.

Actual results:
The host hits a kernel panic.

Expected results:
I/O should continue to run without any failures.

Additional info:
The host had the IBM RDAC (LSI MPP) multipathing driver installed.
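Step 7 describes the CFDF cycle (presumably "controller fail / drive fail", matching the bug summary) only as a list of actions. The following Python sketch models that schedule under stated assumptions: the action names (`offline_controller`, `fail_drive`, `online_controller`, `reconstruct_drive`) and the `execute` hook are hypothetical placeholders, not commands of the LSI array management software, and drives are assumed to be picked round-robin from a caller-supplied list. It only sequences the steps and the five-minute pauses; it does not touch any array.

```python
from itertools import cycle
from typing import Callable, Iterator, List, Tuple

STEP_DELAY_S = 5 * 60  # the script "sleeps five minutes between each step"

Step = Tuple[str, str]  # (hypothetical action name, target)


def cfdf_plan(drives: List[str], iterations: int) -> Iterator[Step]:
    """Yield the controller-fail / drive-fail steps in order.

    Each iteration offlines one controller slot (A, then B, alternating) on
    both arrays, fails a drive, onlines the controller again, and
    reconstructs the drive. Cycling through `drives` is an assumption about
    how "a number of different drives" were chosen."""
    controllers = cycle(["A", "B"])
    drive_pool = cycle(drives)
    for _ in range(iterations):
        ctrl = next(controllers)
        drive = next(drive_pool)
        yield ("offline_controller", ctrl)
        yield ("fail_drive", drive)
        yield ("online_controller", ctrl)
        yield ("reconstruct_drive", drive)


def run_plan(plan: Iterator[Step],
             execute: Callable[[str, str], None],
             sleep: Callable[[float], None]) -> None:
    """Run each step through a caller-supplied (hypothetical) hook, pausing
    STEP_DELAY_S between consecutive steps. Pass time.sleep for a real run."""
    for index, (action, target) in enumerate(plan):
        if index:
            sleep(STEP_DELAY_S)
        execute(action, target)


# Dry run: record the steps instead of acting on an array, and skip the pauses.
log: List[Step] = []
run_plan(cfdf_plan(["drive-1", "drive-2"], iterations=2),
         execute=lambda action, target: log.append((action, target)),
         sleep=lambda seconds: None)
print(log)
```

With four steps per iteration and a five-minute pause before every step after the first, one iteration takes at least 20 minutes, so the roughly 12-hour run before the panic covers at most about 36 iterations (18 A/B alternations), and fewer if the controller operations or drive reconstruction take noticeable time themselves.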