Bug 238855 - disk stress on ata_piix controller causes out of memory with > 4G RAM
Summary: disk stress on ata_piix controller causes out of memory with > 4G RAM
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: i386
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: John Feeney
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 246028
 
Reported: 2007-05-03 13:29 UTC by Sandeep K. Shandilya
Modified: 2018-10-19 19:25 UTC (History)
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-05 13:47:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
dmesg output at boot. (16.13 KB, text/plain)
2007-05-03 13:29 UTC, Sandeep K. Shandilya
lsmod showing very high reference count (1.07 KB, text/plain)
2007-05-03 13:31 UTC, Sandeep K. Shandilya
log of out of Memory. (19.26 KB, text/plain)
2007-05-03 13:33 UTC, Sandeep K. Shandilya

Description Sandeep K. Shandilya 2007-05-03 13:29:38 UTC
Description of problem:
Install RHEL 4.5 RC1 on a system with an ata_piix SATA controller and run any
disk stress against that controller. The system runs out of memory very
quickly. This happens only on systems with more than 4 GB of RAM.

Version-Release number of selected component (if applicable):
kernel version 2.6.9-54.ELsmp. ata_piix version 2.00ac7


How reproducible:
always.

Steps to Reproduce:
1. Install RHEL 4.5 RC1 on a system with an onboard SATA controller (ata_piix driver) and more than 4 GB of RAM.
2. Run any disk stress consisting of 100 writes and reads of 1 GB each, for example along the lines of the sketch below.
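
For reference, a crude stand-in for the stress tool (a sketch only; the mount
point /mnt/sata and file name are assumptions, and it uses plain dd rather than
the actual tool used by the reporter):

    # Hypothetical stress loop: 100 sequential 1 GB writes and read-backs
    # against the filesystem on the ata_piix-attached disk.
    for i in $(seq 1 100); do
        dd if=/dev/zero of=/mnt/sata/stressfile bs=1M count=1024
        dd if=/mnt/sata/stressfile of=/dev/null bs=1M
        sync
    done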

  
Actual results:
The system runs out of memory in about a minute.

Expected results:
The system should continue to run, albeit somewhat slowly.

Additional info:

Attached are the dmesg and lsmod outputs.
The reference count on the ata_piix module is extremely high even though the
disk is not in use.

Line 331 in the dmesg output may be indicating something?

Comment 1 Sandeep K. Shandilya 2007-05-03 13:29:38 UTC
Created attachment 154031 [details]
dmesg output at boot.

Comment 2 Sandeep K. Shandilya 2007-05-03 13:31:25 UTC
Created attachment 154032 [details]
lsmod showing very high reference count

Comment 3 Sandeep K. Shandilya 2007-05-03 13:33:27 UTC
Created attachment 154033 [details]
log of out of Memory.

Comment 4 Charles Rose 2007-05-03 13:44:41 UTC
We are trying to reproduce this issue with the hugemem kernel, and also with
the RHEL 4.5 GA kernel (-55).

Comment 5 Larry Troan 2007-05-03 14:08:38 UTC
Per comment #4 above, I suggested to Dell that they try the RHEL4.5 GA smp
kernel as well as the hugemem kernel. 

There are some instances in the 32 bit environment where we run out of Zone1
memory under specific stress conditions once we get above 4GB and have to employ
bounce buffers. This condition usually occurs when you have an adapter that is
only capable of 32 bit DMA addressing (rather than 64 bit). The hugemem kernel
alleviates the stress by freeing up additional memory. Note that BIOS mapping
can also affect this condition.

Another interesting data point here would be the RHEL5 PAE kernel (use the day0
errata kernel) since Red Hat doesn't ship the hugemem kernel in RHEL5 and there
are some changes to zone allocation and usage that may also alleviate the
situation. 

Comment 6 Larry Troan 2007-05-03 14:34:59 UTC
See http://www.linux.com/howtos/IO-Perf-HOWTO/overview.shtml for a discussion on
the use of bounce buffers and their effect on machine operation.

Comment 7 Gary Case 2007-05-03 14:41:12 UTC
Can you attach the reproducer script that you're using? I want to simulate your
system as closely as possible.

Comment 8 Gary Case 2007-05-03 14:50:28 UTC
Dell folks,

What model of system is this?

Comment 9 Gary Case 2007-05-03 14:50:52 UTC
Dell folks,

What model of system is this?

Comment 10 Sandeep K. Shandilya 2007-05-03 15:02:11 UTC
(In reply to comment #8)
> Dell folks,
> 
> What model of system is this?
This is on a PowerEdge 830 and we have seen this on a PowerEdge 1900 too.
It will reproduce on any PowerEdge with an onboard Intel SATA controller,
more than 4 GB of RAM, and the 686 kernel.



Comment 11 Sandeep K. Shandilya 2007-05-03 15:04:22 UTC
(In reply to comment #5)

We have reproduced the issue on RHEL 4.5 GA smp and hugemem kernels (2.6.9-55).

Comment 12 Sandeep K. Shandilya 2007-05-03 16:20:40 UTC
(In reply to comment #7)
> Can you attach the reproducer script that you're using? I want to simulate your
> system as closely as possible.

The reference count issue that I pointed out in the lsmod attachment can be
reproduced straight away: just load the module and look at the lsmod output;
the reference count will be around 4294967295. In the meantime I will see
whether I can somehow get you the tool; it is called iobasher and is a
Dell-internal-only tool. We have seen this issue with other tools too. Running
the test with just one thread and large file sizes of, say, 1 GB reproduces
the problem. I have done it with "iozone -s 1g" (http://www.iozone.org).
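
The refcount symptom can be checked with something like this (a sketch; the
module name is taken from this report, and the exact lsmod formatting will vary):

    # Load the driver and look at the "Used by" count reported by lsmod;
    # on affected kernels it shows roughly 4294967295 (0xFFFFFFFF, i.e. -1
    # interpreted as an unsigned 32-bit value) even with no disk activity.
    modprobe ata_piix
    lsmod | grep ata_piix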


Comment 13 Sandeep K. Shandilya 2007-05-04 14:52:19 UTC
More to add:

The same issue is also seen on an AMD system with the sata_svw driver, with
the exact same behaviour: a high refcount when the module is loaded but not
used, and disk stress ("iozone -s 1g") causes out-of-memory. The system was a
PowerEdge 1435 SC with 8 GB of RAM.

Comment 14 Sandeep K. Shandilya 2007-05-04 16:16:20 UTC
This issue is NOT observed on

RHEL4.0 with libata 1.03 and ata_piix 1.02c
RHEL4.3 with libata 1.2 and ata_piix 1.05

The issue is observed on 
RHEL4.4 with libata 1.2 and ata_piix 1.05
RHEL4.5 with libata 2.0 and ata_piix 2.00ac7

It looks like some other change broke sata/libata?

Comment 15 jordan hargrave 2007-05-04 20:37:33 UTC
A similar issue exists when using the iSCSI initiator; running iogen/iobasher
on systems with more than 4 GB will generate out-of-memory problems. So this
may actually be something in the SCSI midlayer rather than in libata/ata_piix.

With the iSCSI initiator, as with ata_piix, the dma_mask is set to 32-bit, but
bounce buffers aren't needed because iSCSI isn't doing DMA to a real device
(the network layer handles its own DMA).

Comment 16 Sandeep K. Shandilya 2007-05-10 15:24:47 UTC
More updates:

I disabled the oom_killer (echo 0 > /proc/sys/vm/oom-kill) and the same stress
has been running fine for more than 12 hours now.
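
For completeness, a sketch of the toggle (based on the command above; the
sysctl.conf line is an assumption about how to make it persistent):

    # Turn the oom-killer off for the running kernel and verify:
    echo 0 > /proc/sys/vm/oom-kill
    cat /proc/sys/vm/oom-kill          # should now print 0
    # To keep the setting across reboots:
    echo "vm.oom-kill = 0" >> /etc/sysctl.conf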

Can we say that the oom_killer is not behaving properly?
The system is low on normal memory, but still functions okay when oom_kill is off.


Comment 17 Sandeep K. Shandilya 2007-05-14 12:23:07 UTC
On RHEL 4.5 hugemem the issue is not reproducible. On the RHEL 4.5 smp kernel
the issue is reproducible.


Comment 18 Sandeep K. Shandilya 2007-05-14 12:57:36 UTC
(In reply to comment #11)
> We have reproduced the issue on RHEL 4.5 GA smp and hugemem kernels(2.6.9-55).

This should only be RHEL 4.5 GA smp and not hugemem.

Comment 19 John Feeney 2007-05-14 21:44:43 UTC
Since this bug dealt with running out of memory and, specifically, the
oom-killer seemed to play a role, I asked Larry Woodman to take a look at this
bugzilla. Larry looked at the above output and says that this bugzilla is a
duplicate of bz158636 "Out-of-memory with 8Gb RAM and Intel ICH7 SATA
controller". 

He explained that the (large) amount of system RAM, combined with the
configured maximum nr_requests (8192), which governs how much memory can be
tied up in bounce buffers, leads to a trap situation when a large I/O
operation takes place on a device. In essence, the bounce buffers consume the
available low memory, starving the applications.

Looking at the output from comment #3, the reader will notice that the number
of bounce buffer pages in use was very high (153270). Such a configuration
leads to a large memory allocation; as bz158636 puts it, this is "larger than
Lowmem we exhaust it and potentially start OOM killing". The reader should
review the contents of bz158636 for more internal details, if possible.
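
One way to watch the lowmem exhaustion while reproducing (a sketch, not from
the original report; LowTotal/LowFree appear in /proc/meminfo on 32-bit
kernels with highmem enabled):

    # Low memory should drop sharply once the disk stress starts.
    watch -n 5 "grep -E 'LowTotal|LowFree' /proc/meminfo"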

For those who are thinking that this does not explain why turning off the 
oom-killer allowed the apps to run for hours, Larry said that the system
was running out of memory, regardless of the state of the oom-killer.  
 
The reader should also take note of the documented use of bounce buffers as
found in comment #6 above where it states that "Systems with a large amount of 
high memory and intense I/O activity can create a large number of bounce 
buffers that can cause memory shortage problems. In addition, the excessive
number of bounce buffer data copies can lead to performance degradation."

The outcome from bz158636 will be repeated here in case it is not possible to
see bz158636. First, the kernel configuration can be modified to restrict the
bounce buffer size so it does not consume so much memory. To do this, run
    "echo 128 > /sys/block/<device>/queue/nr_requests"
to reduce the value of nr_requests from 8192 to 128. Replace <device> with the
name(s) of all 32-bit devices. This value is in line with the value found in
upstream kernels.
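
A sketch for applying this to several devices at once (the sd* glob is an
assumption; adjust it to cover the 32-bit DMA devices on the system):

    # Reduce nr_requests to the upstream default of 128 on every SATA/SCSI
    # disk queue, then read the values back to verify.
    for q in /sys/block/sd*/queue/nr_requests; do
        echo 128 > "$q"
    done
    cat /sys/block/sd*/queue/nr_requests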

Second, there is a patch for bz158636 and that bugzilla is scheduled for
the RHEL4.6 release. (current status: "rhel-4.6 +")

Third, there is a kbase article in the works, whose proposed contents are
found in bz158636 also.

Last, as was empirically determined, a hugemem kernel does not exhibit this 
bug so that should be treated as a possible solution too. 

Given the above, this bugzilla should be closed as a dup of bz158636 after 
it is ensured that bz158636 is visible to all and the above configuration
change is proven to work.

Comment 20 Larry Troan 2007-05-31 19:50:21 UTC
KBase article submitted on 5/31/07. The recommendation is to use the hugemem
kernel in this situation.

Comment 21 Sandeep K. Shandilya 2007-06-01 08:14:48 UTC
(In reply to comment #19)

Tested the patch from bz158636; disk stress does not reproduce the issue.

Comment 22 Larry Troan 2007-07-05 13:43:19 UTC
See KBase article: http://kbase.redhat.com/faq/FAQ_85_10725.shtm
"Why does my 32-bit Red Hat Enterprise Linux 4 system with a SATA controller run
out of memeory very quickly?"

Comment 23 Larry Troan 2007-07-05 13:47:10 UTC
This bug illustrates how a delicate balancing act with a 32-bit DMA adapter
under stress does not always work with the RHEL4 SMP kernel. The hugemem
kernel should be used in this case because it allocates memory differently
than the SMP kernel and will not fail under such load. See the above KBase
article for additional details.

Closing as WONTFIX. 

