Bug 140454
Summary: | megaraid and megaraid2 do not see their logical disks | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 2.1 | Reporter: | Bill Peck <bpeck> |
Component: | kernel | Assignee: | Doug Ledford <dledford> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Priority: | medium |
Version: | 2.1 | CC: | coughlan, jparadis, riel |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2005-02-28 14:49:36 UTC |
Description
Bill Peck
2004-11-22 22:15:54 UTC
Rut row! It's a regression.

    .qa.[root@wrdell root]# uname -r
    2.4.9-e.49smp
    .qa.[root@wrdell root]# cat /proc/scsi/scsi
    Attached devices:
    Host: scsi0 Channel: 02 Id: 00 Lun: 00
      Vendor: MegaRAID Model: LD0 RAID5 34728R Rev: 1L19
      Type:   Direct-Access                    ANSI SCSI revision: 02
    Host: scsi0 Channel: 02 Id: 01 Lun: 00
      Vendor: MegaRAID Model: LD1 RAID0 8677R Rev: 1L19
      Type:   Direct-Access                    ANSI SCSI revision: 02
    Host: scsi0 Channel: 02 Id: 02 Lun: 00
      Vendor: MegaRAID Model: LD2 RAID0 17364R Rev: 1L19
      Type:   Direct-Access                    ANSI SCSI revision: 02
    .qa.[root@wrdell root]# lsmod
    Module                  Size  Used by    Not tainted
    iscsi                  40704   0  (unused)
    nfs                    98880  10  (autoclean)
    lockd                  61184   1  (autoclean) [nfs]
    sunrpc                 84432   1  (autoclean) [nfs lockd]
    3c59x                  32264   1
    appletalk              29676   0  (autoclean)
    ipx                    25492   0  (autoclean)
    usb-uhci               26948   0  (unused)
    usbcore                68864   1  [usb-uhci]
    ext3                   71264   2
    jbd                    55636   2  [ext3]
    megaraid               28256   0  (unused)
    sd_mod                 13856   0  (unused)
    scsi_mod              127260   3  [iscsi megaraid sd_mod]

coughlan is currently investigating.

I booted U6 (2.4.9-e.56smp) and tried the megaraid drivers from U5 that are preserved in the addon directory (megaraid_118 and megaraid_2106). Both failed to configure the disks in the same way as described above. This seems to indicate that the problem is not caused by the megaraid updates.

I backed out this patch, and now all the megaraid drivers in U6 configure all the storage.

    --- linux/drivers/scsi/scsi.c.bz138941 2004-11-11 19:55:52.000000000 -0500
    +++ linux/drivers/scsi/scsi.c 2004-11-11 19:56:25.000000000 -0500
    @@ -1451,6 +1451,9 @@ void scsi_build_commandblocks(Scsi_Devic
      Scsi_Cmnd *SCpnt;
      request_queue_t *q = &SDpnt->request_queue;
     
    + if (SDpnt->has_cmdblocks)
    +  return;
    +
      spin_lock_irqsave(q->queue_lock, flags);
     
      if (SDpnt->queue_depth == 0)

Reassigning to Doug.

Created attachment 107351: the patch
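To make the effect of the three added lines concrete, here is a minimal user-space sketch of the guard. It is a toy model, not kernel code: `struct toy_device`, `nblocks`, `toy_build_guarded`, and the default depth of 1 are assumptions made for illustration, and only `queue_depth` and `has_cmdblocks` come from the patch itself. It shows only that a repeat call becomes a no-op; the report does not spell out how that leads to the unconfigured disks.

```c
#include <assert.h>

/* Toy stand-in for the kernel's Scsi_Device. Only queue_depth and
 * has_cmdblocks appear in the patch; nblocks is a hypothetical counter
 * that replaces the real command-block list. */
struct toy_device {
    int queue_depth;    /* depth requested for this device */
    int has_cmdblocks;  /* non-zero once blocks have been built */
    int nblocks;        /* blocks currently "allocated" (hypothetical) */
};

/* Build routine with the guard from the patch: if blocks already exist,
 * return without touching anything. */
void toy_build_guarded(struct toy_device *d)
{
    if (d->has_cmdblocks)
        return;                  /* the early return added by the patch */

    if (d->queue_depth == 0)
        d->queue_depth = 1;      /* assumed default depth for this sketch */

    d->nblocks = d->queue_depth;
    d->has_cmdblocks = 1;
    assert(d->nblocks > 0);
}

/* Scenario: build at depth `first`, change the requested depth to
 * `second`, then build again. Returns the number of blocks at the end. */
int toy_rebuild_with_guard(int first, int second)
{
    struct toy_device d = { first, 0, 0 };

    toy_build_guarded(&d);       /* first build honours `first` */
    d.queue_depth = second;      /* depth is changed afterwards */
    toy_build_guarded(&d);       /* guard fires: nothing is rebuilt */
    return d.nblocks;
}
```

In this model `toy_rebuild_with_guard(1, 8)` still reports a single block, because the second call never reaches the allocation step.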
That's the patch that was supposed to fix command-block memory leaking as seen in Bug 138941. See also Bug 131521 for the related RHEL3 issue.

Ugh...this is why I hate having 3 different forked SCSI stacks to take care of. OK, this looks like an interaction between scsi_scan.c and this patch. Changes I made to the RHEL3 version of scsi_scan.c are likely keeping this problem from showing up there. So, as a quick fix, Tom, can you try replacing the return in this patch with a call to scsi_release_commandblocks(SDpnt) and then fall through to the remainder of the function? It will waste CPU cycles, but it won't leak memory and it won't fail to allocate proper command blocks.

Replacing the return in this patch with a call to scsi_release_commandblocks fixes the problem. In IRC Doug indicates that more is needed though:

    <dledford> There is one other thing that needs to be done to implement the fix properly.
    <dledford> There are two places in the scsi stack that call the host driver's select_queue_depths routine. Immediately afterwards, it calls build_commandblocks to create device->queue_depth command blocks. But, by calling release_command_blocks if any exist, we will wipe out the queue depth setting. So, those two spots (one in scsi_scan.c and one in scsi.c) should be changed to read like this:
    <dledford> if (host->hostt->select_queue_depths) {
    <dledford>     scsi_release_commandblocks(sdev);
    <dledford>     host->hostt->select_queue_depths(host);
    <dledford>     scsi_build_commandblocks(sdev);
    <dledford> }
    <dledford> RHEL3 *shouldn't* have this problem because the scsi_scan.c functions were modified to always release the command blocks on sdev structs we used for scanning before either A) keeping the sdev because it exists or B) reusing the sdev to scan the next device. AS2.1 doesn't have that scsi_scan cleanup.
    <dledford> But, it would definitely be a good thing to test under RHEL3
    <dledford> OK, that's all I had ;-)
    <coughlan> yea, I was going to finish with AS 2.1 on this machine and then install RHEL 3 on it.
    <dledford> Should be able to just throw in the RHEL3 kernel and boot it, not have to do a full install.

I'll make the new patch, but then I'll be in an all-day meeting.
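The ordering concern from the IRC discussion can be sketched in user space. This is a toy model under stated assumptions, not the kernel implementation: the `toy_` names, the `nblocks` counter, and the default depth of 1 are hypothetical, and the release routine resetting `queue_depth` is modelled on the remark that releasing "will wipe out the queue depth setting". Assigning `queue_depth` directly stands in for the host driver's `select_queue_depths` callback.

```c
#include <assert.h>

#define TOY_DEFAULT_DEPTH 1   /* assumed fallback depth for this sketch */

/* Toy stand-in for the kernel's Scsi_Device (hypothetical layout). */
struct toy_device {
    int queue_depth;    /* depth chosen by the host driver */
    int has_cmdblocks;  /* non-zero once blocks have been built */
    int nblocks;        /* blocks currently "allocated" (hypothetical) */
};

/* Release all blocks. As described in the discussion, this also wipes
 * the queue depth setting. */
void toy_release_commandblocks(struct toy_device *d)
{
    d->nblocks = 0;
    d->has_cmdblocks = 0;
    d->queue_depth = 0;
}

/* Build routine with the quick fix: release existing blocks instead of
 * returning early, then fall through and rebuild. */
void toy_build_commandblocks(struct toy_device *d)
{
    if (d->has_cmdblocks)
        toy_release_commandblocks(d);

    if (d->queue_depth == 0)
        d->queue_depth = TOY_DEFAULT_DEPTH;

    d->nblocks = d->queue_depth;
    d->has_cmdblocks = 1;
}

/* Old caller order: select the depth, then build. The release inside
 * the build routine discards the depth that was just selected. */
int toy_depth_select_then_build(int driver_depth)
{
    struct toy_device d = { 0, 0, 0 };

    toy_build_commandblocks(&d);    /* blocks created during scanning */
    d.queue_depth = driver_depth;   /* stands in for select_queue_depths */
    toy_build_commandblocks(&d);    /* internal release resets the depth */
    return d.nblocks;
}

/* Suggested caller order: release, select the depth, then build. */
int toy_depth_release_select_build(int driver_depth)
{
    struct toy_device d = { 0, 0, 0 };

    toy_build_commandblocks(&d);    /* blocks created during scanning */
    toy_release_commandblocks(&d);  /* release first ...              */
    d.queue_depth = driver_depth;   /* ... then let the driver choose */
    toy_build_commandblocks(&d);    /* nothing left to release        */
    return d.nblocks;
}
```

With a driver depth of 8, the first ordering ends up with the default single block in this model, while the release, select, build ordering ends up with all 8.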