Description of problem:
megaraid doesn't see any of the three logical disks I have created in the card's BIOS.

RAID 5 LD0 (3 18GB drives)
RAID 0 LD1 (1 9GB drive)
RAID 0 LD2 (1 18GB drive)

megaraid2 sees the first disk, RAID 5, but nothing else.

Version-Release number of selected component (if applicable):
U6 release candidate 2.4.9-e.56

How reproducible:
Every time

Steps to Reproduce:
1. Set up LSI Megaraid controller vendor 0x1000, 0x1960
Rut row! It's a regression.

.qa.[root@wrdell root]# uname -r
2.4.9-e.49smp
.qa.[root@wrdell root]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 02 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD0 RAID5 34728R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 02 Id: 01 Lun: 00
  Vendor: MegaRAID Model: LD1 RAID0  8677R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 02 Id: 02 Lun: 00
  Vendor: MegaRAID Model: LD2 RAID0 17364R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
.qa.[root@wrdell root]# lsmod
Module                  Size  Used by    Not tainted
iscsi                  40704   0  (unused)
nfs                    98880  10  (autoclean)
lockd                  61184   1  (autoclean) [nfs]
sunrpc                 84432   1  (autoclean) [nfs lockd]
3c59x                  32264   1
appletalk              29676   0  (autoclean)
ipx                    25492   0  (autoclean)
usb-uhci               26948   0  (unused)
usbcore                68864   1  [usb-uhci]
ext3                   71264   2
jbd                    55636   2  [ext3]
megaraid               28256   0  (unused)
sd_mod                 13856   0  (unused)
scsi_mod              127260   3  [iscsi megaraid sd_mod]
coughlan is currently investigating
I booted U6 (2.4.9-e.56smp) and tried the megaraid drivers from U5 that are preserved in the addon directory (megaraid_118 and megaraid_2106). They both failed to configure the disks, in the same way as described above. Seems to indicate that the problem is not caused by the megaraid updates.

I backed out this patch, and now all the megaraid drivers in U6 configure all the storage.

--- linux/drivers/scsi/scsi.c.bz138941	2004-11-11 19:55:52.000000000 -0500
+++ linux/drivers/scsi/scsi.c	2004-11-11 19:56:25.000000000 -0500
@@ -1451,6 +1451,9 @@ void scsi_build_commandblocks(Scsi_Devic
 	Scsi_Cmnd *SCpnt;
 	request_queue_t *q = &SDpnt->request_queue;
 
+	if (SDpnt->has_cmdblocks)
+		return;
+
 	spin_lock_irqsave(q->queue_lock, flags);
 
 	if (SDpnt->queue_depth == 0)

Reassigning to Doug.
Created attachment 107351: the patch
That's the patch that was supposed to fix command-block memory leaking as seen in Bug 138941. See also Bug 131521 for the related RHEL3 issue.
Ugh...this is why I hate having 3 different forked SCSI stacks to take care of.

OK, this looks like an interaction between scsi_scan.c and this patch. Changes I made to the RHEL3 version of scsi_scan.c are likely keeping this problem from showing up there.

So, as a quick fix, Tom, can you try replacing the return in this patch with a call to scsi_release_commandblocks(SDpnt) and then fall through to the remainder of the function? It will waste CPU cycles, but it won't leak memory and it won't fail to allocate proper command blocks.
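For readers following along, here is a rough userspace sketch of the difference between the backed-out early return and the suggested quick fix. It is a toy model, not kernel code: struct toy_sdev, the toy_* functions, TOY_DEFAULT_DEPTH and the 64-byte block size are all made up for illustration, and the assumption that releasing also clears queue_depth is taken from the follow-up discussion in this bug rather than from the kernel source.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_DEFAULT_DEPTH 1  /* hypothetical fallback used when queue_depth == 0 */

/* Toy stand-in for Scsi_Device; only the fields this thread mentions. */
struct toy_sdev {
    int queue_depth;    /* command blocks requested */
    int has_cmdblocks;  /* nonzero once blocks are built */
    int nblocks;        /* blocks currently held */
    void **blocks;
};

static int toy_live_blocks;  /* allocated minus freed, to spot leaks */

static void toy_release_commandblocks(struct toy_sdev *d)
{
    assert(d != NULL);
    for (int i = 0; i < d->nblocks; i++) {
        free(d->blocks[i]);
        toy_live_blocks--;
    }
    free(d->blocks);
    d->blocks = NULL;
    d->nblocks = 0;
    d->has_cmdblocks = 0;
    d->queue_depth = 0;  /* assumption: release also clears the depth */
}

/* release_first == 0 models the backed-out patch (early return);
 * release_first == 1 models the suggested quick fix. */
static void toy_build_commandblocks(struct toy_sdev *d, int release_first)
{
    assert(d != NULL);
    if (d->has_cmdblocks) {
        if (!release_first)
            return;                    /* keeps whatever blocks already exist */
        toy_release_commandblocks(d);  /* free old blocks, then fall through */
    }
    if (d->queue_depth == 0)
        d->queue_depth = TOY_DEFAULT_DEPTH;
    d->blocks = malloc(sizeof(*d->blocks) * (size_t)d->queue_depth);
    assert(d->blocks != NULL);
    for (int i = 0; i < d->queue_depth; i++) {
        d->blocks[i] = malloc(64);     /* 64 bytes is an arbitrary block size */
        assert(d->blocks[i] != NULL);
        toy_live_blocks++;
    }
    d->nblocks = d->queue_depth;
    d->has_cmdblocks = 1;
}
```

In this model the early return leaves a reused device holding its old blocks no matter what depth is requested afterwards, while the release-then-rebuild variant frees the old set before allocating a new one, so nothing leaks. The cost is the extra free/allocate cycle, which matches the "waste CPU cycles" remark above.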
Replacing the return in this patch with a call to scsi_release_commandblocks fixes the problem.

In IRC Doug indicates that more is needed though:

<dledford> There is one other thing that needs to be done to implement the fix properly.
<dledford> There are two places in the scsi stack that call the host driver's select_queue_depths routine. Immediately afterwards, it calls build_commandblocks to create device->queue_depth command blocks. But, by calling release_command_blocks if any exist, we will wipe out the queue depth setting. So, those two spots (one in scsi_scan.c and one in scsi.c) should be changed to read like this:
<dledford> if (host->hostt->select_queue_depths) {
<dledford>         scsi_release_commandblocks(sdev);
<dledford>         host->hostt->select_queue_depths(host);
<dledford>         scsi_build_commandblocks(sdev);
<dledford> }
<dledford> RHEL3 *shouldn't* have this problem because the scsi_scan.c functions were modified to always release the command blocks on sdev structs we used for scanning before either A) keeping the sdev because it exists or B) reusing the sdev to scan the next device. AS2.1 doesn't have that scsi_scan cleanup.
<dledford> But, it would definitely be a good thing to test under RHEL3
<dledford> OK, that's all I had ;-)
<coughlan> yea, I was going to finish with AS 2.1 on this machine and then install RHEL 3 on it.
<dledford> Should be able to just throw in the RHEL3 kernel and boot it, not have to do a full install. I'll make the new patch, but then I'll be in an all-day meeting.
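The ordering point from the IRC log can be illustrated with a small userspace sketch. Again this is only a toy model under stated assumptions: struct toy_host, struct toy_sdev, the toy_* functions and both depth constants are hypothetical, the one-argument select_queue_depths callback simply mirrors the call as written in the log, and blocks are counted rather than allocated.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEFAULT_DEPTH 1   /* hypothetical fallback depth */
#define TOY_DRIVER_DEPTH  16  /* hypothetical depth the host driver picks */

struct toy_sdev {
    int queue_depth;
    int has_cmdblocks;
    int nblocks;   /* counts blocks instead of allocating them */
};

struct toy_host {
    struct toy_sdev *sdev;
    void (*select_queue_depths)(struct toy_host *);  /* may be NULL */
};

static void toy_release_commandblocks(struct toy_sdev *d)
{
    d->nblocks = 0;
    d->has_cmdblocks = 0;
    d->queue_depth = 0;   /* the wipe described in the IRC log */
}

static void toy_build_commandblocks(struct toy_sdev *d)
{
    if (d->has_cmdblocks)
        toy_release_commandblocks(d);   /* quick fix: release, then rebuild */
    if (d->queue_depth == 0)
        d->queue_depth = TOY_DEFAULT_DEPTH;
    d->nblocks = d->queue_depth;
    d->has_cmdblocks = 1;
}

/* Stand-in for a host driver's depth selection routine. */
static void toy_driver_select(struct toy_host *h)
{
    h->sdev->queue_depth = TOY_DRIVER_DEPTH;
}

/* Select, then build: the release inside build wipes the driver's depth. */
static void toy_rescan_naive(struct toy_host *h)
{
    if (h->select_queue_depths) {
        h->select_queue_depths(h);
        toy_build_commandblocks(h->sdev);
    }
}

/* Release, select, build: the order given in the IRC log. */
static void toy_rescan_ordered(struct toy_host *h)
{
    if (h->select_queue_depths) {
        toy_release_commandblocks(h->sdev);
        h->select_queue_depths(h);
        toy_build_commandblocks(h->sdev);
    }
}
```

With a device that already holds blocks, toy_rescan_naive ends up back at the fallback depth because the driver's choice is cleared before the rebuild, whereas toy_rescan_ordered releases first and so the rebuild sees the depth the driver selected.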
This regression was solved in U6 by removing the patch. The problem in bz 138517 is planned to be fixed with a slightly larger patch in U7. Closing this BZ.