Bug 140454

Summary: megaraid and megaraid2 do not see their logical disks
Product: Red Hat Enterprise Linux 2.1
Reporter: Bill Peck <bpeck>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 2.1
CC: coughlan, jparadis, riel
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-02-28 14:49:36 UTC
Attachments:
the patch (flags: none)

Description Bill Peck 2004-11-22 22:15:54 UTC
Description of problem:
megaraid doesn't see any of the three logical disks I have created in
the card's BIOS.
RAID 5 LD0 (3 18GB drives)
RAID 0 LD1 (1 9GB drive)
RAID 0 LD2 (1 18GB drive)

megaraid2 sees the first disk, RAID 5, but nothing else.

Version-Release number of selected component (if applicable):
U6 release candidate
2.4.9-e.56

How reproducible:
Every time

Steps to Reproduce:
1. Set up an LSI MegaRAID controller (PCI vendor 0x1000, device 0x1960)

Comment 1 Bill Peck 2004-11-23 17:28:08 UTC
Rut row!  It's a regression.

.qa.[root@wrdell root]# uname -r
2.4.9-e.49smp
.qa.[root@wrdell root]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 02 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD0 RAID5 34728R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 02 Id: 01 Lun: 00
  Vendor: MegaRAID Model: LD1 RAID0  8677R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 02 Id: 02 Lun: 00
  Vendor: MegaRAID Model: LD2 RAID0 17364R Rev: 1L19
  Type:   Direct-Access                    ANSI SCSI revision: 02
.qa.[root@wrdell root]# lsmod
Module                  Size  Used by    Not tainted
iscsi                  40704   0  (unused)
nfs                    98880  10  (autoclean)
lockd                  61184   1  (autoclean) [nfs]
sunrpc                 84432   1  (autoclean) [nfs lockd]
3c59x                  32264   1
appletalk              29676   0  (autoclean)
ipx                    25492   0  (autoclean)
usb-uhci               26948   0  (unused)
usbcore                68864   1  [usb-uhci]
ext3                   71264   2
jbd                    55636   2  [ext3]
megaraid               28256   0  (unused)
sd_mod                 13856   0  (unused)
scsi_mod              127260   3  [iscsi megaraid sd_mod]


Comment 2 Jim Paradis 2004-11-23 21:48:57 UTC
coughlan is currently investigating

Comment 3 Tom Coughlan 2004-11-23 22:27:19 UTC
I booted U6 (2.4.9-e.56smp) and tried the megaraid drivers from U5
that are preserved in the addon directory (megaraid_118 and
megaraid_2106). They both failed to configure the disks, in the same
way as described above. This seems to indicate that the problem is not
caused by the megaraid updates.

I backed out this patch, and now all the megaraid drivers in U6
configure all the storage.

--- linux/drivers/scsi/scsi.c.bz138941  2004-11-11 19:55:52.000000000 -0500
+++ linux/drivers/scsi/scsi.c   2004-11-11 19:56:25.000000000 -0500
@@ -1451,6 +1451,9 @@ void scsi_build_commandblocks(Scsi_Devic
        Scsi_Cmnd *SCpnt;
        request_queue_t *q = &SDpnt->request_queue;

+       if (SDpnt->has_cmdblocks)
+               return;
+
        spin_lock_irqsave(q->queue_lock, flags);

        if (SDpnt->queue_depth == 0)


Reassigning to Doug.

Comment 4 Tom Coughlan 2004-11-23 22:28:36 UTC
Created attachment 107351 [details]
the patch

Comment 5 Jim Paradis 2004-11-23 23:43:09 UTC
That's the patch that was supposed to fix command-block memory leaking
as seen in Bug 138941.  See also Bug 131521 for the related RHEL3 issue.


Comment 7 Doug Ledford 2004-11-30 01:24:43 UTC
Ugh...this is why I hate having 3 different forked SCSI stacks to take
care of.  OK, this looks like an interaction between scsi_scan.c and
this patch.  Changes I made to the RHEL3 version of scsi_scan.c are
likely keeping this problem from showing up there.  So, as a quick
fix, Tom, can you try replacing the return in this patch with a call to
scsi_release_commandblocks(SDpnt) and then falling through to the
remainder of the function?  It will waste CPU cycles, but it won't
leak memory and it won't fail to allocate proper command blocks.
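
For readers following along, the quick fix suggested here can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the model_* names, the live_blocks counter and the simplified structs are all hypothetical stand-ins, and the model ignores locking and queue-depth defaults. It shows the idea of releasing any existing command blocks and then rebuilding, instead of returning early.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for Scsi_Cmnd and Scsi_Device. */
struct model_cmd {
    struct model_cmd *next;
};

struct model_dev {
    struct model_cmd *device_queue; /* linked list of command blocks */
    int queue_depth;                /* number of blocks to allocate */
    int has_cmdblocks;              /* nonzero once blocks exist */
};

/* Allocations not yet freed; lets a caller check for leaks. */
static int live_blocks;

/* Free every block on the device and mark it as having none. */
static void model_release_commandblocks(struct model_dev *dev)
{
    struct model_cmd *cmd = dev->device_queue;

    while (cmd != NULL) {
        struct model_cmd *next = cmd->next;
        free(cmd);
        live_blocks--;
        cmd = next;
    }
    dev->device_queue = NULL;
    dev->has_cmdblocks = 0;
}

/*
 * Build queue_depth blocks. If an old set exists, release it and fall
 * through to the rebuild rather than returning early.
 */
static void model_build_commandblocks(struct model_dev *dev)
{
    if (dev->has_cmdblocks)
        model_release_commandblocks(dev); /* the patch had: return; */

    for (int i = 0; i < dev->queue_depth; i++) {
        struct model_cmd *cmd = malloc(sizeof(*cmd));
        if (cmd == NULL)
            break;
        cmd->next = dev->device_queue;
        dev->device_queue = cmd;
        live_blocks++;
    }
    dev->has_cmdblocks = 1;
}
```

In this model, calling model_build_commandblocks() twice on the same device leaves live_blocks equal to the current queue_depth: the old set is freed (no leak) and the new set matches the depth in effect at the second call, at the cost of the extra free and allocate work.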

Comment 8 Tom Coughlan 2004-11-30 14:44:40 UTC
Replacing the return in this patch with a call to
scsi_release_commandblocks fixes the problem. 

In IRC Doug indicates that more is needed though: 

<dledford> There is one other thing that needs to be done to implement
the fix properly.
<dledford> There are two places in the scsi stack that call the host
driver's select_queue_depths routine.  Immediately afterwards, it
calls build_commandblocks to create device->queue_depth command
blocks.  But, by calling release_commandblocks if any exist, we will
wipe out the queue depth setting.  So, those two spots (one in
scsi_scan.c and one in scsi.c) should be changed to read like this:
<dledford> if (host->hostt->select_queue_depths) {
<dledford>   scsi_release_commandblocks(sdev);
<dledford>   host->hostt->select_queue_depths(host);
<dledford>   scsi_build_commandblocks(sdev);
<dledford> }
<dledford> RHEL3 *shouldn't* have this problem because the scsi_scan.c
functions were modified to always release the command blocks on sdev
structs we used for scanning before either A) keeping the sdev because
it exists or B) reusing the sdev to scan the next device.  AS2.1
doesn't have that scsi_scan cleanup.
<dledford> But, it would definitely be a good thing to test under RHEL3
<dledford> OK, that's all I had ;-)
<coughlan> yea, I was going to finish with AS 2.1 on this machine and
then install RHEL 3 on it.
<dledford> Should be able to just throw in the RHEL3 kernel and boot
it, not have to do a full install.


I'll make the new patch, but then I'll be in an all-day meeting.
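
To make the ordering issue from the IRC log concrete, here is a minimal userspace sketch. It is not the kernel code: the model_* names, MODEL_DEFAULT_DEPTH and MODEL_DRIVER_DEPTH are hypothetical, and the assumption that a build with no depth set falls back to a small default is a simplification made for this model. It assumes, as described above, that releasing command blocks also wipes the queue depth.

```c
#define MODEL_DEFAULT_DEPTH 1  /* hypothetical fallback when no depth is set */
#define MODEL_DRIVER_DEPTH 16  /* hypothetical depth chosen by the host driver */

/* Hypothetical stand-in for the per-device state discussed in the log. */
struct model_dev {
    int queue_depth;   /* depth requested for this device, 0 = unset */
    int has_cmdblocks; /* nonzero once blocks have been built */
    int built_depth;   /* how many blocks the last build created */
};

/* Releasing the blocks also clears the queue depth setting. */
static void model_release_commandblocks(struct model_dev *dev)
{
    dev->has_cmdblocks = 0;
    dev->built_depth = 0;
    dev->queue_depth = 0;
}

/* Build with the quick fix applied: release an existing set first. */
static void model_build_commandblocks(struct model_dev *dev)
{
    if (dev->has_cmdblocks)
        model_release_commandblocks(dev);
    if (dev->queue_depth == 0)
        dev->queue_depth = MODEL_DEFAULT_DEPTH;
    dev->built_depth = dev->queue_depth;
    dev->has_cmdblocks = 1;
}

/* Stand-in for a host driver's select_queue_depths routine. */
static void model_select_queue_depths(struct model_dev *dev)
{
    dev->queue_depth = MODEL_DRIVER_DEPTH;
}

/* Existing call order: select, then build. The driver's depth is lost. */
static int model_old_order(struct model_dev *dev)
{
    model_select_queue_depths(dev);
    model_build_commandblocks(dev);
    return dev->built_depth;
}

/* Order proposed in IRC: release, select, build. The depth survives. */
static int model_new_order(struct model_dev *dev)
{
    model_release_commandblocks(dev);
    model_select_queue_depths(dev);
    model_build_commandblocks(dev);
    return dev->built_depth;
}
```

For a device that already has command blocks from scanning, model_old_order() ends up with the fallback depth because the release inside the build discards the driver's choice, while model_new_order() ends up with the depth the driver selected.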

Comment 9 Tom Coughlan 2005-02-28 14:49:36 UTC
This regression was solved in U6 by removing the patch. 

The problem in bz 138517 is planned to be fixed with a slightly larger patch in U7.

Closing this BZ.