Bug 146954 - megaraid2 driver fails to recognize all LSI RAID adapters when there are more than 4 with >=4GB
Summary: megaraid2 driver fails to recognize all LSI RAID adapters when there are more than 4 with >=4GB
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia32e
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix 186960
 
Reported: 2005-02-02 22:56 UTC by Need Real Name
Modified: 2007-11-30 22:07 UTC
CC: 11 users

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:19:39 UTC
Target Upstream Version:
Embargoed:


Attachments
RHEL 3 U4 em64t with 4GB (4.94 KB, text/plain)
2005-02-03 00:23 UTC, Need Real Name
no flags Details
the panic info (15.35 KB, text/plain)
2005-02-24 19:28 UTC, Need Real Name
no flags Details
Patch fixing the issue (2.84 KB, patch)
2005-03-10 03:34 UTC, Suresh Siddha
no flags Details | Diff
Rewrite the patch so we dont fail if there is a swiotlb. (2.82 KB, patch)
2005-04-11 20:34 UTC, Larry Woodman
no flags Details | Diff
Can Intel please grab and test this patch ASAP? (2.83 KB, patch)
2005-09-09 20:40 UTC, Larry Woodman
no flags Details | Diff
Map single panic (17.63 KB, text/plain)
2005-09-09 22:05 UTC, dely.l.sy
no flags Details
SWIOTLB patch fixing the issue (3.15 KB, patch)
2005-09-13 17:18 UTC, Suresh Siddha
no flags Details | Diff
Patch fixing the issue (3.15 KB, patch)
2005-09-15 22:42 UTC, Suresh Siddha
no flags Details | Diff
Fix to larrys proposal in comment #73 (2.99 KB, patch)
2006-01-30 19:58 UTC, Suresh Siddha
no flags Details | Diff
Patch as per conversation with Larry. (4.02 KB, patch)
2006-04-10 17:44 UTC, Suresh Siddha
no flags Details | Diff
My cut to the same patch... (3.80 KB, application/octet-stream)
2006-04-10 17:57 UTC, Larry Woodman
no flags Details
maxdma patch used in the above kernel and to be reviewed by Intel (3.72 KB, patch)
2006-04-11 17:27 UTC, Larry Woodman
no flags Details | Diff
dmesg from run with maxdma=32M (43.27 KB, application/octet-stream)
2006-04-11 22:59 UTC, dely.l.sy
no flags Details
dmesg from run w/o using maxdma boot option (27.90 KB, text/plain)
2006-04-11 23:03 UTC, dely.l.sy
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2006:0437 0 normal SHIPPED_LIVE Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8 2006-07-20 13:11:00 UTC

Description Need Real Name 2005-02-02 22:56:37 UTC
Description of problem:
If we have more than 4 LSI RAID adapters in a system running RHEL 3 U3 or U4
em64t, the driver fails to allocate either sglist or passthru after the 4th
adapter (pci_alloc_consistent fails).

The em64t failure seems to be related to the memory configuration: if we have
more than 4GB of memory, the driver fails to allocate memory after the 4th
adapter. However, if we lower the memory to < 4GB it works.

Additional info:
Decreasing MAX_COMMANDS in megaraid2.h from 126 to e.g. 64, regardless of the
memory size, seems to work around the issue.

There are other problems we observed as well. With more than 4 LSI RAID
adapters, we have seen kernel panics caused by other drivers loaded after the
megaraid2 driver. This happens only with the em64t kernel and when we have
>=4GB of memory.

I am not sure yet whether this is a megaraid2 issue or a kernel issue that
happens to surface in these setups.

Comment 1 Need Real Name 2005-02-03 00:23:22 UTC
Created attachment 110581 [details]
RHEL 3 U4 em64t with 4GB

Tested latest driver from LSI and it fails the same way as the inbox driver

Comment 2 Need Real Name 2005-02-18 20:48:30 UTC
In RHEL3 U3, pci_alloc_consistent() on Intel em64t systems allocates from
ZONE_DMA (which is just 16MB) irrespective of the dma mask. The workaround
is to use the 'swiotlb=' boot option.

If I set, for example, 'swiotlb=32768/65536', then the inbox driver loads and
finds all the adapters (7 adapters in our case).

When the swiotlb window size is 2MB, the swiotlb buffers (which need to be
contiguous and are allocated at boot time) are also allocated from the first
16MB (falling into ZONE_DMA), leaving less memory for consistent mappings.
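The workaround described above goes on the kernel command line; a hypothetical grub.conf entry (kernel version and root device are purely illustrative):

```
# /boot/grub/grub.conf -- illustrative entry; swiotlb= counts 2 KB slabs
kernel /vmlinuz-2.4.21-27.EL ro root=/dev/sda2 swiotlb=65536
```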

Comment 3 Need Real Name 2005-02-24 19:25:33 UTC
The same problem causes a panic during the install. The EL 3 U4 em64t installer
panics if there are >= 4GB of memory and > 4 RAID adapters.

Attaching the boot and the panic message.

Comment 4 Need Real Name 2005-02-24 19:28:08 UTC
Created attachment 111393 [details]
the panic info

RHEL 3 U4 em64t installer boot message and the panic info with 4GB and >4 RAID
adapters

Comment 5 Tom Coughlan 2005-02-24 19:34:13 UTC
Can you work around the installer problem by passing the 'swiotlb=' boot option?

Comment 6 Need Real Name 2005-02-24 21:40:51 UTC
Unfortunately, no.

Comment 7 Need Real Name 2005-02-24 21:49:40 UTC
If we use noprobe and load the megaraid2 drivers manually, the kernel does not
panic but we don't see all the adapters. Using 'swiotlb=' in addition to
noprobe does not change the behavior.

Comment 8 Suresh Siddha 2005-03-10 03:34:20 UTC
Created attachment 111839 [details]
Patch fixing the issue

The attached patch fixes this issue. It is a backport of the 2.6 patch posted
here:
http://www.gelato.unsw.edu.au/linux-ia64/0410/11406.html

Basically, for pci_alloc_consistent we remove the dependency on GFP_DMA by
using GFP_NORMAL, and fall back to the swiotlb if we get a memory address >
4GB.

Comment 9 Tom Coughlan 2005-03-10 20:01:06 UTC
Thanks for the patch. It looks good to me.

Unfortunately, the deadline for RHEL 3 U5 has passed. This fix will probably
have to wait for U6. 

Comment 10 Matt Domsch 2005-03-30 17:11:30 UTC
Suresh, isn't the swiotlb still located in ZONE_DMA though?  So its size could
consume all of ZONE_DMA.

A secondary problem is that the scsi_malloc() pool can consume all of ZONE_DMA
too, which is alleviated by decreasing MAX_COMMANDS in each driver. IT67111
describes my approach to solving this.

Comment 11 Suresh Siddha 2005-03-30 18:04:08 UTC
Matt, the default swiotlb size in U5 is increased to 64MB and because of this,
it is now allocated from above ZONE_DMA.

About the scsi_malloc() pool issue, I agree that it needs to be changed. I
don't have access to IT67111. Can you send me the gist of it? Is it addressing
the problem tracked by this bug or just the scsi_malloc() ZONE_DMA pool issue?



Comment 12 Matt Domsch 2005-03-30 22:10:53 UTC
http://marc.theaimsgroup.com/?l=linux-scsi&m=111117365306386&w=2
is the scsi_malloc() pool issue patch I mentioned.

It adds two parameters to scsi_mod:

max_dma_memory=N   (where N is in megabytes)
which limits how much memory the scsi_malloc() pool can consume (by default
32MB, though that can only be reached on IA64, where ZONE_DMA is defined to be
all of the first 4GB of address space; otherwise it runs out well before 16MB).

use_zone_normal=1  to force the pool to come from ZONE_NORMAL rather than
ZONE_DMA. This isn't always safe, but when it is safe it's a really good
thing; otherwise the pool can consume all of ZONE_DMA pretty easily.
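Module parameters like these would normally be set via module options; a hypothetical /etc/modules.conf line (note these parameter names exist only in Matt's proposed patch, not in the stock RHEL3 scsi_mod):

```
# /etc/modules.conf -- hypothetical; options come from the proposed patch above
options scsi_mod max_dma_memory=32 use_zone_normal=1
```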



Comment 13 Susan Denham 2005-04-01 23:19:02 UTC


-----Forwarded Message-----
From: Susan S. Denham <sdenham>
To: Matt Domsch <Matt_Domsch>, John Hull <John_Hull>, Dale
Kaisner <dale_kaisner>, Amit Bhutani <amit_bhutani>
Cc: Rob Landry <rlandry>, Larry Woodman <lwoodman>,
peterm, jburke, Jay Turner <jkt>
Subject: Please test:  U5 test kernel (post U5 beta) that addresses IT 67111 and
50598
Date: 31 Mar 2005 16:04:16 -0500

Guys,

My hero Larry Woodman has done up a post-beta U5 test kernel (these
fixes are being considered for U5 GA) that he thinks kills two birds
with one stone:

- IT 67111 (Dell); IT 68265 (Intel); BZ 146954 - megaraid2 driver fails
to recognize all LSI RAID adapters when there are more than 4 with >=4GB
- IT 50598 - ata_piix doesn't find disks with > = 4GB RAM (Dell MUSTFIX)

 Please grab the kernel from the location below and let us know your
test results ASAP.

http://people.redhat.com/coughlan/RHEL3-swiotlb/kernel-2.4.21-31.swiotlb.EL.ia32e.rpm
http://people.redhat.com/coughlan/RHEL3-swiotlb/kernel-2.4.21-31.swiotlb.EL.x86_64.rpm
http://people.redhat.com/coughlan/RHEL3-swiotlb/kernel-smp-2.4.21-31.swiotlb.EL.x86_64.rpm

Thanks,
Sue



Also sent Larry's test kernel to Levent Akyil of Intel on 3/31, who reports that
it "looks good. On my setup, all modules loaded and worked as expected (I was
able to see all the RAID adapters and no USB or ata-piix oops).  I didn't test
all the failing configurations though since some of them have to be reproduced
in the validation labs but I am pretty sure this would work
for those setups as well.

We can test more thoroughly with the next U5 beta/RC drop."

Comment 14 Larry Woodman 2005-04-07 21:23:51 UTC
Can someone from Intel please explain whether or not 64-bit capable cards need
to use buffers that are below the 4GB boundary. If not, why are we checking for
the 4GB boundary rather than the device dma_mask?


Thanks, Larry Woodman


Comment 15 Suresh Siddha 2005-04-07 21:32:55 UTC
That's because of the pci_alloc_consistent behavior.

Documentation/DMA-mapping.txt says the consistent DMA mapping interface will
always return a SAC-addressable DMA address, though I don't know the reason
why this behavior is expected!

Comment 16 Larry Woodman 2005-04-08 18:33:27 UTC
The description in DMA-mapping.txt seems to say that we should be trusting the
dma_mask since it was established via a call to pci_set_dma_mask(). If this
does the right thing, then shouldn't we be checking the buffer against the
dma_mask rather than 0xffffffff (4GB)?

Larry


Comment 17 Suresh Siddha 2005-04-08 18:37:42 UTC
<snip>
Consistent DMA mappings are always SAC addressable.  That is
  to say, consistent DMA addresses given to the driver will always
  be in the low 32-bits of the PCI bus space.
</snip>

Comment 18 Larry Woodman 2005-04-11 20:34:57 UTC
Created attachment 112990 [details]
Rewrite the patch so we dont fail if there is a swiotlb.


I rewrote the patch so that it calls swiotlb_map_single() if the memory
allocation fails. There is no sense in failing before we try to allocate from
the swiotlb if one exists, is there?

Larry

Comment 19 Suresh Siddha 2005-04-13 03:00:30 UTC
Larry, Changes look good to me. Thanks.

Comment 24 Marty Wesley 2005-05-26 06:43:09 UTC
PM ACK for U6

Comment 35 Ernie Petrides 2005-07-27 23:25:14 UTC
Larry Troan, regarding comment #34, this is a RHEL3 bug.  Why is building
the patch into a RHEL4 kernel relevant?


Comment 36 Suresh Siddha 2005-07-27 23:33:50 UTC
I don't see comment #34. AFAIK, this is the issue only with RHEL3.

Comment 37 Issue Tracker 2005-07-28 18:05:32 UTC
From User-Agent: XML-RPC

Per Matt....

Finger check.... kenel-2.4.21-32.EL.smp   RHEL3....


This event sent from IssueTracker by ltroan
 issue 73055

Comment 40 Magdalena Glinkowski 2005-08-03 20:50:17 UTC
This bug is fixed in RHEL 3 U6 early release.

Comment 41 Samuel Benjamin 2005-08-03 21:14:31 UTC
Please provide the kernel version where this fix will be available for Dell
regression. Thanks

Comment 42 Ernie Petrides 2005-08-05 20:58:15 UTC
RHEL3 U6 does *not* contain a fix for this problem.

Comment 44 Ernie Petrides 2005-08-09 18:14:33 UTC
U6 is closed (and in beta already).


Comment 48 Ernie Petrides 2005-08-25 19:56:24 UTC
To bug reporter levent.akyil, does this bugzilla need to
remain confidential to Intel?  If not, could you please uncheck the
"Intel Confidential Group" box below?  Thanks in advance.

Comment 50 Issue Tracker 2005-08-31 14:24:02 UTC
From User-Agent: XML-RPC

Matt, can you elaborate on Dell's (your) results in testing the
Red_Hat/Intel patch in Bug 146954 which Intel has made public.

mdomsch assigned to issue for Dell-Engineering.

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by ltroan
 issue 75976

Comment 51 Larry Troan 2005-08-31 14:31:56 UTC
Opening up comment #50, which requests that Dell elaborate on the problems they
had in testing the Intel patch.

Comment 52 Larry Troan 2005-08-31 14:42:14 UTC
This bug apparently is not a DUP of bug 146789 (Engineering's call) but is tied
to it.

It is believed that there is a common patch which will resolve both the problems
described here and those described in bug 146789. Both bugs are now public.

Comment 53 Larry Woodman 2005-09-07 00:35:51 UTC
I am still waiting to hear back from Dell and Intel as to whether or not the
latest patch I posted works for everyone.  Since there was some disagreement
between Dell and Intel(Dell said the patch did not work but Intel said it did
work) I had to pull it fron the RHEL3-U6 kernel.  

I would like to get this isue resolved but I need to know for sure where it does
work and does not work so I can fix it if necessary.  The sooner I get this
feedback the soner I can fix it if necessary and get the patch into the RHEL3-U7
kernel.

Larry Woodman


Comment 54 dely.l.sy 2005-09-07 16:40:21 UTC
We are in the process of testing this patch.  We'll provide test results ASAP.

Comment 55 Larry Woodman 2005-09-09 20:40:48 UTC
Created attachment 118665 [details]
Can Intel please grab and test this patch ASAP?


I need Intel to grab this patch for inspection and testing ASAP.

Thanks, Larry Woodman

Comment 56 Suresh Siddha 2005-09-09 21:05:23 UTC
Larry, an issue with the patch in comment #55:

We shouldn't use PCI_DMA_BIDIRECTIONAL. We should use PCI_DMA_FROMDEVICE while
calling swiotlb_map_single() and PCI_DMA_TODEVICE while calling
swiotlb_unmap_single(). This is to avoid memcpys inside the
swiotlb_map/unmap_single routines.

Comment 57 dely.l.sy 2005-09-09 22:05:25 UTC
Created attachment 118668 [details]
Map single panic

Comment 58 dely.l.sy 2005-09-09 22:11:35 UTC
We tested the patch posted on 4/11 and the one posted today. We encountered a
kernel panic during boot when the system has >= 4GB of memory. When we lowered
the memory to 2GB, the system booted fine. See the attached kernel panic
message above.

Comment 59 Larry Woodman 2005-09-10 11:13:41 UTC
OK, can you experiment with "swiotlb=<size>" on the boot line, where size is
the actual SWIOTLB size / 2KB? By default it is set to 32768, which gives us
a 64MB SWIOTLB. Please try doubling that (65536); I suspect some driver asks
for more as the system memory increases.

Thanks, Larry


Comment 60 dely.l.sy 2005-09-12 05:03:15 UTC
Yes, we experimented around with âswiotlb=<size>â and did other testings also:
1. Increased swiotlb to 65536 and 131072 in 2.4.21-37 with 4/11 patch ----- 
still encountered kernel panic with MPT driver as described above in comment 
#57;
2. Used the MPT 2.05.16.02 driver found in 2.4.21-32 kernel in 2.4.21-37 with 
4/11 patch ----- kernel booted up fine;
3. Increased IO_TLB_SEGSIZE to 256 in 2.4.21-37 with 4/11 patch ----- kernel 
booted up fine, but rmmoding and re-insmoding megaraid2 driver resulted in a 
kernel panic with null pointer dereference in __list_del() in 
scsi_softirq_handler().  Donât know if this is related to swiotlb changes 
though.


Comment 61 Suresh Siddha 2005-09-12 05:20:53 UTC
Thanks Dely for the update.

Larry, the newer version of the Fusion MPT base driver (mptbase:PrimeIocFifos())
is requesting a bigger chunk (376832 bytes) via pci_alloc_consistent.
Currently the limit on the maximum allowable contiguous chunk with swiotlb is
128 (IO_TLB_SEGSIZE) * 2KB. So increasing IO_TLB_SEGSIZE to 256 makes the panic
in comment #57 go away.

As Dely mentioned, we seem to have another issue with the megaraid2 driver.
We will check whether it has anything to do with swiotlb and get back to you
Monday morning.

Comment 62 dely.l.sy 2005-09-13 16:20:32 UTC
For sighting #3 in comment #60, we identified that it was a megaraid2 driver
issue. We put a patch (submitted for bugzilla 154028) into the driver and the
rmmoding and insmoding of the megaraid2 driver worked fine without a panic.

Therefore, Larry's 4/11 patch + increasing IO_TLB_SEGSIZE to 256 (as suggested
in comment #57) fixes the issue.

Comment 63 dely.l.sy 2005-09-13 16:23:30 UTC
Correction:  increasing IO_TLB_SEGSIZE to 256 is suggested in comment #61. 

Comment 64 Suresh Siddha 2005-09-13 17:18:20 UTC
Created attachment 118765 [details]
SWIOTLB patch fixing the issue

Larry, this is the patch which we have tested and which works. It is essentially
the same as your patch in comment #18, with the additional change of
IO_TLB_SEGSIZE increased to 256.

We are doing extensive validation of this patch. Dely will post those results
as soon as they are available.

Comment 65 Suresh Siddha 2005-09-15 22:42:26 UTC
Created attachment 118873 [details]
Patch fixing the issue

Dale Busacker from Intel did more validation of the patch in comment #64 and
found that the Adaptec RAID ASR2230 requests 634880 bytes of contiguous
memory and needs IO_TLB_SEGSIZE increased to 512 to be functional.

We request Red Hat to pick up the patch attached to this comment, which
increases IO_TLB_SEGSIZE to 512.

We tested this patch successfully with the SCSI controllers listed below (and
their driver versions).

qla2300      7.05.00-RH1
lpfc		7.3.2
mptscsih    2.06.16.01
aacraid     1.1-5[2361] 
megaraid2   2.10.10.1(along with the driver fix posted in bugzilla #154028)

Comment 73 Larry Woodman 2005-12-12 22:20:52 UTC
For the past few RHEL3 updates we have deferred fixing the DMA allocation
failures on EM64T. The problem is that ia32e systems do not have hardware
IOMMUs, so we must allocate DMA zone memory for DMA buffers whenever
there is more than 4GB of RAM. The reason is that there are 2 zones:
the DMA zone for physical addresses between 0 and 16MB, and the Normal
zone for physical addresses between 16MB and the end of RAM. If there is
more than 4GB of RAM on the system we must allocate DMA buffers from
the 16MB DMA zone because we can't be sure that a Normal zone page is
below the 4GB boundary. Because the DMA zone is so small (only 16MB,
or 4096 pages), pci_alloc_consistent() frequently fails, which results in
driver loading failures, etc.

In order to solve this problem without backporting the RHEL4 changes, I
have added a boot-time option to increase the size of the DMA zone. This
allows one to increase the size of the DMA zone and therefore significantly
reduce the risk of failing to allocate DMA buffers.
This solution does, however, eliminate the possibility of using 24-bit ISA
devices when the DMA zone has been increased above 16MB. This is supposedly
not a problem because no one uses those devices in EM64T systems anyway.

What does Intel think about this?


--- linux-2.4.21/arch/x86_64/kernel/e820.c.orig
+++ linux-2.4.21/arch/x86_64/kernel/e820.c
@@ -139,6 +139,11 @@ unsigned long end_pfn_map; 
 unsigned long end_user_pfn = MAXMEM>>PAGE_SHIFT;  
 
 /*
+ * last DMA zone pfn
+ */
+unsigned long end_dma_pfn = 0;
+
+/*
  * Find the highest page frame number we have available
  */
 
@@ -570,6 +575,11 @@ void __init parse_cmdline_early (char **
 			from+=8;
 			setup_io_tlb_npages(from);
 		}
+
+		else if (!memcmp(from, "maxdma=", 7)) {
+			end_dma_pfn = memparse(from+7, &from);
+			end_dma_pfn >>= PAGE_SHIFT;
+		}
 #endif
 #ifdef CONFIG_ACPI_PMTMR
 		else if (!memcmp(from, "pmtmr", 5)) {
--- linux-2.4.21/arch/x86_64/mm/numa.c.orig
+++ linux-2.4.21/arch/x86_64/mm/numa.c
@@ -79,6 +79,8 @@ void __init setup_node_bootmem(int nodei
 
 EXPORT_SYMBOL(maxnode);
 
+extern unsigned long end_dma_pfn;
+
 /* Initialize final allocator for a zone */
 void __init setup_node_zones(int nodeid)
 { 
@@ -93,9 +95,14 @@ void __init setup_node_zones(int nodeid)
 	end_pfn = PLAT_NODE_DATA(nodeid)->end_pfn; 
 
 	printk("setting up node %d %lx-%lx\n", nodeid, start_pfn, end_pfn); 
-	
-	/* All nodes > 0 have a zero length zone DMA */ 
-	dma_end_pfn = __pa(MAX_DMA_ADDRESS) >> PAGE_SHIFT; 
+
+	/* bootline maxdma= option overrides MAX_DMA_ADDRESS */
+	if (end_dma_pfn)
+		dma_end_pfn = end_dma_pfn;
+	else
+		dma_end_pfn = __pa(MAX_DMA_ADDRESS) >> PAGE_SHIFT; 
+
+	/* All nodes > 0 have a zero length zone DMA */
 	if (start_pfn < dma_end_pfn) { 
 		zones[ZONE_DMA] = dma_end_pfn - start_pfn;
 		zones[ZONE_NORMAL] = end_pfn - dma_end_pfn; 
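With a kernel carrying this patch, the new option would be passed on the boot line like any other kernel parameter; an illustrative grub.conf entry (kernel version and root device are placeholders):

```
# /boot/grub/grub.conf -- illustrative; maxdma= goes through memparse,
# so it accepts K/M/G suffixes
kernel /vmlinuz-2.4.21-40.7.EL ro root=/dev/sda2 maxdma=32M
```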


Comment 76 Suresh Siddha 2006-01-30 19:58:10 UTC
Created attachment 123878 [details]
Fix to larrys proposal in comment #73

Larry, your proposed patch in comment #73 doesn't work as-is. The attached
patch has fixes to your proposal, and this patch worked on a couple of our
test setups. We are currently doing more tests with this patch.

Larry, what do you think about your proposal now? Modules built with this
proposed patch may not work on earlier update kernels. Is that OK?

Comment 77 dely.l.sy 2006-01-31 00:39:05 UTC
I tested the patch that Suresh submitted above and it worked fine on a
system with 6GB memory, 6 MegaRAID cards, and 1 Adaptec ASR2230S card. I tried
using maxdma=24M, 32M, 48M, and 64M, and I didn't see any error messages from
the megaraid2 and aacraid modules. The NIC devices came up fine and the USB
keyfob worked fine.

Comment 83 Larry Woodman 2006-04-03 19:02:34 UTC
Suresh, MAX_DMA_ADDRESS is the only change. Why is this necessary? It breaks
the KABI.

Larry
 

Comment 84 Suresh Siddha 2006-04-03 20:27:36 UTC
Larry, please look at the usage of MAX_DMA_ADDRESS in specifying the goal for
the alloc_bootmem routines (include/linux/bootmem.h). These routines exhaust
the memory above 16MB (which falls into our extended DMA zone) when
allocating bootmem.

Comment 85 Larry Woodman 2006-04-04 00:58:29 UTC
I realize this, however changing MAX_DMA_ADDRESS breaks the kernel ABI.

Larry

Comment 86 Suresh Siddha 2006-04-04 01:04:01 UTC
How about modifying the bootmem routines (include/linux/bootmem.h) to use
end_dma_pfn?

Comment 87 Larry Woodman 2006-04-06 19:45:54 UTC
Changing #defines in bootmem.h probably breaks the kernel ABI as well because
drivers that use it probably #include that file.

Larry


Comment 88 Suresh Siddha 2006-04-10 17:44:09 UTC
Created attachment 127562 [details]
Patch as per conversation with Larry.

Larry, as per our discussion here is the patch. (modifying
include/linux/bootmem.h)
Dely, please include your test results of this patch as soon as possible.

thanks.

Comment 89 Larry Woodman 2006-04-10 17:57:24 UTC
Created attachment 127564 [details]
My cut to the same patch...


Hi Suresh, here is my cut of the patch. It looks almost identical :)

Larry


BTW, have you had a chance to do any testing?

Comment 90 Larry Woodman 2006-04-10 18:16:19 UTC
BTW, the binary rpm with my patch is located here:

>>>http://people.redhat.com/~lwoodman/.for_intel/


Larry


Comment 91 Bob Johnson 2006-04-11 15:53:35 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 3.8 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 3.8 release.

Comment 92 dely.l.sy 2006-04-11 15:57:03 UTC
I'll test the patch today.

Comment 93 Larry Woodman 2006-04-11 17:20:54 UTC
I've updated the patch based on internal Red Hat feedback and placed a new
kernel binary rpm for testing in:

>>>http://people.redhat.com/~lwoodman/.for_intel/



Comment 94 Larry Woodman 2006-04-11 17:27:57 UTC
Created attachment 127624 [details]
maxdma patch used in the above kernel and to be reviewed by Intel


Latest maxdma= boot option patch based on internal Red Hat review.

Comment 95 dely.l.sy 2006-04-11 22:55:14 UTC
I tested the rpm posted by Larry in comment #93 on a system with 6GB
memory, 6 MegaRAID cards, and 1 Adaptec ASR2230S card. I tried using
maxdma=24M and 32M. I didn't see any error messages from the megaraid2 and
aacraid modules. The NIC devices came up fine and the USB keyfob worked fine.

Without using maxdma on 2.4.21-40.6.ia32e.EL, or using the RHEL 4 U3 kernel, I
saw error messages from megaraid2 like "RAID: Can't allocate passthru" and the
system hung on initializing the USB controller.




Comment 96 dely.l.sy 2006-04-11 22:59:17 UTC
Created attachment 127635 [details]
dmesg from run with maxdma=32M

This is the dmesg from test run using Larry's rpm with maxdma=32M.

Comment 97 dely.l.sy 2006-04-11 23:03:48 UTC
Created attachment 127636 [details]
dmesg from run w/o using maxdma boot option

This is the dmesg from test run using Larry's rpm but w/o using maxdma boot
parameter.

Comment 98 Larry Woodman 2006-04-11 23:28:23 UTC
Please grab the latest patch/latest rpm and re-run the test.

Thanks, Larry


Comment 99 dely.l.sy 2006-04-17 17:18:16 UTC
The test run was done with the rpm from comment #93. Where is the latest patch
or rpm mentioned in comment #98?

Comment 100 Larry Woodman 2006-04-17 17:27:56 UTC
Sorry, the latest patch and rpm were the ones you tested from comments #93 and
#94. I was just afraid that you didn't have the latest ones.

Larry Woodman


Comment 101 dely.l.sy 2006-04-17 17:46:38 UTC
I didn't receive a notification when you sent comment #97, so I didn't respond
sooner. Yes, the test was done with the rpm from comment #93.

Does the dmesg from using maxdma=32M look OK to you? I did see the difference
when testing the latest rpm with and without the maxdma boot option.

Comment 102 Ernie Petrides 2006-04-20 01:21:29 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.7.EL).


Comment 104 Joshua Giles 2006-05-30 16:03:32 UTC
A kernel has been released that contains a patch for this problem.  Please
verify if your problem is fixed with the latest available kernel from the RHEL3
public beta channel at rhn.redhat.com.

Comment 105 Ernie Petrides 2006-05-30 20:22:49 UTC
Reverting to ON_QA.

Comment 107 dely.l.sy 2006-06-05 20:58:53 UTC
I tested the RHEL 3 U8 private beta on a Harwich system with 6GB memory, 6
MegaRAID cards, and 1 Adaptec ASR2230S card. Using the maxdma=32M boot
parameter both during installation and on subsequent boots, I was able to
install and boot RHEL 3 U8 private beta.
Without maxdma=32M, a kernel panic was encountered during installation and
subsequent boots.

Comment 110 dely.l.sy 2006-07-11 07:23:36 UTC
This bugzilla can be closed. On a Harwich system with 6GB memory, 6
MegaRAID cards, and 1 Adaptec ASR2230S card, I was able to install and boot
RHEL 3 U8 Beta 2 using the maxdma=32M boot parameter for both installation and
subsequent boots.


Comment 111 Red Hat Bugzilla 2006-07-20 13:19:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


