Description of problem:
Install RHEL 4.5 RC1 on a system with an ata_piix controller and run any disk stress against that controller. The system runs out of memory very quickly. This happens only on systems with more than 4 GB of RAM.

Version-Release number of selected component (if applicable):
kernel 2.6.9-54.ELsmp, ata_piix 2.00ac7

How reproducible:
Always.

Steps to Reproduce:
1. Install 4.5 RC1 on a system with an onboard SATA controller (ata_piix driver) and more than 4 GB of RAM.
2. Run any disk stress of 100 writes and reads of 1 GB.

Actual results:
The system runs out of memory in about a minute.

Expected results:
The system should continue to run, if somewhat slowly.

Additional info:
dmesg and lsmod outputs attached. The reference count on the ata_piix module is extremely high even though the disk is not in use. Line 331 of the dmesg output may be indicating something.
Created attachment 154031 [details] dmesg output at boot.
Created attachment 154032 [details] lsmod showing very high reference count
Created attachment 154033 [details] log of the out-of-memory condition.
We are trying to reproduce this issue with the hugemem kernel, and also with the RHEL 4.5 GA kernel (-55).
Per comment #4 above, I suggested to Dell that they try the RHEL4.5 GA smp kernel as well as the hugemem kernel. There are some instances in the 32 bit environment where we run out of Zone1 memory under specific stress conditions once we get above 4GB and have to employ bounce buffers. This condition usually occurs when you have an adapter that is only capable of 32 bit DMA addressing (rather than 64 bit). The hugemem kernel alleviates the stress by freeing up additional memory. Note that BIOS mapping can also affect this condition. Another interesting data point here would be the RHEL5 PAE kernel (use the day0 errata kernel) since Red Hat doesn't ship the hugemem kernel in RHEL5 and there are some changes to zone allocation and usage that may also alleviate the situation.
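A quick way to watch the condition described above while the stress runs (a minimal diagnostic sketch; which counters are present depends on the kernel build, and LowTotal/LowFree only appear on 32-bit highmem kernels):

```shell
# Watch lowmem (ZONE_NORMAL) headroom during the disk stress.
# MemFree can look healthy while LowFree collapses, because bounce
# buffers for a 32-bit-DMA adapter must be allocated from lowmem.
grep -E 'MemFree|HighFree|LowTotal|LowFree' /proc/meminfo

# Bounce-buffer pages currently in use (counter present on 2.6 kernels):
grep nr_bounce /proc/vmstat || true
```

On an affected system, LowFree dropping toward zero while HighFree stays large is the signature of this bug.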
See http://www.linux.com/howtos/IO-Perf-HOWTO/overview.shtml for a discussion on the use of bounce buffers and their effect on machine operation.
Can you attach the reproducer script that you're using? I want to simulate your system as closely as possible.
Dell folks, What model of system is this?
(In reply to comment #8) > Dell folks, > > What model of system is this? This is on a PowerEdge 830, and we have seen it on a PowerEdge 1900 too. It will reproduce on any PowerEdge with an onboard Intel SATA controller, more than 4 GB of RAM, and the 686 kernel.
(In reply to comment #5) > Per comment #4 above, I suggested to Dell that they try the RHEL4.5 GA smp > kernel as well as the hugemem kernel. > > There are some instances in the 32 bit environment where we run out of Zone1 > memory under specific stress conditions once we get above 4GB and have to employ > bounce buffers. This condition usually occurs when you have an adapter that is > only capable of 32 bit DMA addressing (rather than 64 bit). The hugemem kernel > alleviates the stress by freeing up additional memory. Note that BIOS mapping > can also affect this condition. > > Another interesting data point here would be the RHEL5 PAE kernel (use the day0 > errata kernel) since Red Hat doesn't ship the hugemem kernel in RHEL5 and there > are some changes to zone allocation and usage that may also alleviate the > situation. We have reproduced the issue on the RHEL 4.5 GA smp and hugemem kernels (2.6.9-55).
(In reply to comment #7) > Can you attach the reproducer script that you're using? I want to simulate your > system as closely as possible. The reference count issue noted in the lsmod attachment can be reproduced straight away: just load the module and look at the lsmod output. The reference count will be 4294967295, which is (unsigned)-1, suggesting the refcount has been decremented below zero. In the meanwhile I will check whether I can get you the tool; it is called iobasher and is a Dell-internal-only tool. We have seen this issue with other tools too: running a test with just one thread and large file sizes of about 1 GB reproduces the problem. I have done it with "iozone -s 1g" (http://www.iozone.org).
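Since iobasher is Dell-internal, a rough stand-in for the stress described in the original report can be sketched with dd. This is a hypothetical script, not the actual tool; the file path, size, and pass count are illustrative parameters:

```shell
#!/bin/sh
# Sketch of the "100 writes and reads of 1G" stress from the report:
# sequential write/read passes of a large file, enough to flood
# ZONE_NORMAL with bounce buffers on a >4 GB 32-bit system.
disk_stress() {
    target=${1:-/tmp/stressfile}   # file on the SATA-backed filesystem
    size_mb=${2:-1024}             # 1 GiB per pass, as in the report
    passes=${3:-100}               # number of write/read passes
    i=0
    while [ "$i" -lt "$passes" ]; do
        dd if=/dev/zero of="$target" bs=1M count="$size_mb" conv=fsync 2>/dev/null
        dd if="$target" of=/dev/null bs=1M 2>/dev/null
        i=$((i + 1))
    done
    rm -f "$target"
}
```

Running something like `disk_stress /path/on/sata 1024 100` on an affected box should, per the report, drive the system out of memory within about a minute.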
More to add: the same issue is also seen on an AMD system with the sata_svw driver, with the exact same behaviour: a high refcount when the module is loaded but not used, and disk stress with "iozone -s 1g" causing out of memory. The system was a PowerEdge 1435 SC with 8 GB of RAM.
This issue is NOT observed on: RHEL 4.0 with libata 1.03 and ata_piix 1.02c; RHEL 4.3 with libata 1.2 and ata_piix 1.05. The issue IS observed on: RHEL 4.4 with libata 1.2 and ata_piix 1.05; RHEL 4.5 with libata 2.0 and ata_piix 2.00ac7. Since RHEL 4.3 and 4.4 carry the same driver versions, it looks like some other change broke sata/libata?
A similar issue exists with the iSCSI initiator: running iogen/iobasher on systems with more than 4 GB of RAM generates out-of-memory problems. So this may actually be something in the SCSI midlayer rather than in libata/ata_piix. With the iSCSI initiator, as with ata_piix, the dma_mask is set to 32-bit, but bounce buffers aren't needed since iSCSI isn't doing DMA to a real device (the network layer handles its own DMA).
More updates: I disabled the oom-killer (echo 0 > /proc/sys/vm/oom-kill) and the same stress has been running fine for more than 12 hours now. Can we say that the oom-killer is not behaving properly? The system is low on normal memory but still functions okay when oom-kill is off.
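For reference, the toggle used above (a RHEL4-specific sysctl; mainline kernels have no /proc/sys/vm/oom-kill, so this fragment is not portable, and writing it requires root):

```shell
cat /proc/sys/vm/oom-kill        # 1 = oom-killer enabled (default)
echo 0 > /proc/sys/vm/oom-kill   # disable the oom-killer
```

Note that this only changes how allocation failure is handled; it does not free any lowmem.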
On the RHEL 4.5 hugemem kernel the issue is not reproducible. On the RHEL 4.5 smp kernel the issue is reproducible.
(In reply to comment #11) > (In reply to comment #5) > > Per comment #4 above, I suggested to Dell that they try the RHEL4.5 GA smp > > kernel as well as the hugemem kernel. > > > > There are some instances in the 32 bit environment where we run out of Zone1 > > memory under specific stress conditions once we get above 4GB and have to employ > > bounce buffers. This condition usually occurs when you have an adapter that is > > only capable of 32 bit DMA addressing (rather than 64 bit). The hugemem kernel > > alleviates the stress by freeing up additional memory. Note that BIOS mapping > > can also affect this condition. > > > > Another interesting data point here would be the RHEL5 PAE kernel (use the day0 > > errata kernel) since Red Hat doesn't ship the hugemem kernel in RHEL5 and there > > are some changes to zone allocation and usage that may also alleviate the > > situation. > > We have reproduced the issue on RHEL 4.5 GA smp and hugemem kernels(2.6.9-55). This should only be RHEL 4.5 GA smp and not hugemem.
Since this bug deals with running out of memory and, specifically, the oom-killer seemed to play a role, I asked Larry Woodman to take a look at this bugzilla. Larry looked at the above output and says that this bugzilla is a duplicate of bz158636, "Out-of-memory with 8Gb RAM and Intel ICH7 SATA controller".

He explained that the (large) amount of system RAM, combined with the configured maximum nr_requests (8192), which determines how much memory the bounce buffers can consume, leads to a trap situation when a large I/O operation takes place on a device. In essence, the bounce buffers consume the available memory, suffocating the applications.

Looking at the output in comment #3, note that the number of bounce buffer pages in use was very high (153270). Such a configuration leads to a memory allocation that is "larger than Lowmem we exhaust it and potentially start OOM killing" (bz158636). Review the contents of bz158636 for more internal details, if possible.

For those wondering why turning off the oom-killer allowed the apps to run for hours: Larry said the system was running out of memory regardless of the state of the oom-killer.

Also note the documented use of bounce buffers in comment #6 above, which states that "Systems with a large amount of high memory and intense I/O activity can create a large number of bounce buffers that can cause memory shortage problems. In addition, the excessive number of bounce buffer data copies can lead to performance degradation."

The outcome from bz158636 is repeated here in case bz158636 itself is not visible. First, the kernel configuration can be modified to restrict bounce buffer usage so it does not consume so much memory. To do this, "echo 128 > /sys/block/<device>/queue/nr_requests" reduces the value of nr_requests from 8192 to 128. Replace <device> with the name(s) of all 32-bit devices. This value is in line with the value found in upstream kernels. Second, there is a patch for bz158636, and that bugzilla is scheduled for the RHEL 4.6 release (current status: "rhel-4.6 +"). Third, a kbase article is in the works; its proposed contents are also found in bz158636. Last, as was empirically determined, a hugemem kernel does not exhibit this bug, so that should be treated as a possible solution too.

Given the above, this bugzilla should be closed as a dup of bz158636 once it is ensured that bz158636 is visible to all and the above configuration change is proven to work.
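The nr_requests workaround can be scripted; a minimal sketch, assuming the affected 32-bit-DMA devices are known (the device names below are examples, and the writes require root):

```shell
# bz158636 workaround: cap the request queue depth so it cannot pin
# thousands of bounce-buffer pages in lowmem. Substitute the actual
# 32-bit-DMA block devices on the affected system.
for dev in sda sdb; do
    echo 128 > /sys/block/$dev/queue/nr_requests
    cat /sys/block/$dev/queue/nr_requests   # verify the new value
done
```

The setting does not survive a reboot; on RHEL4 the same echo lines can be added to /etc/rc.local to make it persistent.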
KBase article submitted on 5/31/07. The recommendation is to use the hugemem kernel in this situation.
(In reply to comment #19) > Since this bug dealt with running out of memory and, specifically, the > oom-killer seemed to play a role, I asked Larry Woodman to take a look at this > bugzilla. Larry looked at the above output and says that this bugzilla is a > duplicate of bz158636 "Out-of-memory with 8Gb RAM and Intel ICH7 SATA > controller". > > He explained that given the (large) amount of system RAM and the configured > maximum number of nr_requests (8196), which equates to the number of bytes in > the bounce buffers, leads to a trap situation when a large I/O operation > takes place on a device. In essence, the bounce buffers suck up available memory > leading to a memory suffocation for the applications. > > Upon viewing the output from comment #3, the reader will notice that the > number of bounce buffer pages currently in use was very high (153270). Such a > configuration leads to a large memory allocation and since this is "larger than > Lowmem we exhaust it and potentially start OOM killing" (bz158636). The reader > should review the contents of bz158636 for more internal details, if possible. > > For those who are thinking that this does not explain why turning off the > oom-killer allowed the apps to run for hours, Larry said that the system > was running out of memory, regardless of the state of the oom-killer. > > The reader should also take note of the documented use of bounce buffers as > found in comment #6 above where it states that "Systems with a large amount of > high memory and intense I/O activity can create a large number of bounce > buffers that can cause memory shortage problems. In addition, the excessive > number of bounce buffer data copies can lead to performance degradation." > > The outcome from bz158636 will be repeated here just in case it is not > possible to see bz158636. First, the kernel configuration can be modified to > restrict the bounce buffer size so it won't suck up so much memory. 
To do this, > "echo 128 > /sys/block/<device>/queue/nr_requests" > to reduce the value of nr_requests from 8192 to 128. Replace <device> with the > name(s) of all 32-bit devices. This value is in line with the value found in > upstream kernels. > > Second, there is a patch for bz158636 and that bugzilla is scheduled for > the RHEL4.6 release. (current status: "rhel-4.6 +") > > Third, there is a kbase article in the works, whose proposed contents is found > in bz158636 also. > > Last, as was empirically determined, a hugemem kernel does not exhibit this > bug so that should be treated as a possible solution too. > > Given the above, this bugzilla should be closed as a dup of bz158636 after > it is ensured that bz158636 is visible to all and the above configuration > change is proven to work. Tested the patch from bz158636; the disk stress does not reproduce the issue.
See KBase article: http://kbase.redhat.com/faq/FAQ_85_10725.shtm "Why does my 32-bit Red Hat Enterprise Linux 4 system with a SATA controller run out of memory very quickly?"
This bug illustrates how a delicate balancing act with a 32-bit DMA adapter under stress doesn't always work with the RHEL4 smp kernel. The hugemem kernel should be used in this case because it allocates memory differently than the smp kernel and will not fail under such load. See the above KBase article for additional details. Closing as WONTFIX.