Description of problem:
We have seen this problem on a PE1655MC and then on a PE2800 while trying to duplicate it. The loss of video happens when a USB external hard drive is attached and is being formatted with mkfs.ext2. This is with RHEL4 Beta2 x86, smp kernel, run level 5 with X Windows running:

F1 virtual screen: hdparm -t /dev/sda (boot drive)
F2 virtual screen: mkfs.ext2 /dev/sdb1 (USB external hard drive)
F6 virtual screen: top

When we see the usage reported by top going past 10 or so, we can expect the problem on the 1655MC. On the PE2800 it has to go up to about 13-25 before the problem is seen.

On the 1655MC blade server we lose all video, even the virtual screens, every time during a single format of the USB external hard drive. On the PE2800 we could not see it at first, during the first few formats of the USB external hard drive. However, if we run the mkfs.ext2 format in a loop, pausing 5 seconds after each format, it completes many times and then we get the loss of video. On the PE2800, the video loss is seen within about 15 minutes or so.

Using the Avocent KVM, we lose ALL video (the console and all virtual screens are neither visible nor accessible). Using a directly connected monitor, as before on the 1655MC, xfs is seen as the last process killed; on the monitor we see X Windows collapsing, but because it is launched from /etc/inittab it gets respawned... but by that time the user would have lost any remote connections, any files being edited, and their desktop.

Version-Release number of selected component (if applicable):

How reproducible:
Easy

Steps to Reproduce:
1. See description
2.
3.

Actual results:
System crashes as described above.

Expected results:
Successful format, system still running.

Additional info:
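For reference, a minimal sketch of the reproduction loop described above (device name is the one used in this report; adjust it to the actual USB disk):

while true; do
    mkfs.ext2 /dev/sdb1    # format the USB external hard drive partition
    sleep 5                # pause 5 seconds between iterations
done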
"xfs is seen as the last process killed" What log messages do you see when this is happening? This feels like a VM problem. I've fixed similar problems several times in the past, especially on 2.2 / 2.4 systems, where mke2fs was consuming all of memory with dirty buffer heads. If processes are being actively killed by the kernel, that could well be the out-of-memory killer, which would imply that we've got a similar buffer management problem in the current kernel. However, we really need the kernel logs to know for sure --- can somebody try to recreate this with netconsole or serial console enabled? Trying to capture "vmstat 1" during the problem could also help to verify the problem. If we confirm it's an out-of-memory problem, then the next step would be to look for USB errors in the log. If those are present, then that's the next problem to attack. If not, though, it implies that it's a VM balancing problem that we need to deal with.
please verify this on the latest rhel4 beta kernel: http://people.redhat.com/davej/kernels/RHEL4/RPMS.kernel/ thanks.
This problem still occurs on 2.6.9-1.849_ELsmp. It definitely seems to me to be the OOM killer going wild. Right before it goes serial, the memory usage seems fine. I am attempting to gather the logs.
Created attachment 108064 [details] OOM on 2.6.9-1.860_ELsmp during mkfs on usb 2.0 HD
Issue does not appear on stock 2.6.9 kernel.
**Initial** testing reveals that this issue is fixed by Rik's patch in BZ#133858 (on 12/04/2004) https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=133858
Created attachment 108177 [details] contains meminfo and slabinfo when oom killer is active. It seems that the dm layer is allocating a bunch of biovec-1 and bio objects from the slab cache. Then the memory is exhausted, so the OOM killer comes out. I can reproduce this problem easily by formatting an 80GB USB 2.0 HD attached to a PE1655MC. I don't need any other stress. I am not sure if Rik's patch will prevent it in all cases... or just delay the action.
What is really the throttling mechanism when one process does a lot of writes to a device? What should stall the writer until those writes complete, so that it doesn't suck up all of the cache?
My bad. BZ133858 does not fix this issue. I was booting an older kernel and not the correctly patched kernel with Rik's patch. The patch does not resolve the issue. Also, if you look at the logs posted in #10, you can see that the biovec-1 and bio slabs have around 400,000 active objects. With the kernel.org 2.6.9 kernel, which doesn't have the problem, they are around 20,000.
This bug is still present with 2.6.9-1.906_ELsmp
We agree that this isn't pretty. We've included the following text in the RHEL 4 release notes, which you haven't yet seen: "The initial release of Red Hat Enterprise Linux 4 does not support USB hard disk drives. However, other USB storage devices, such as flash media, CD-ROM, and DVD-ROM devices are currently supported." We're pursuing a fix for this for possible inclusion in an early RHEL 4 update.
I would agree with your release notes if the root cause were found to be tied only to large USB storage devices. I am not confident that it is at this time.
Created attachment 108649 [details] Reproduced without USB. This issue has now been reproduced without USB being present. The system is a PE700 with a CERC SATA 6CH card. Two mkfs's were run simultaneously with an increased number of inodes (-N with more than 40,000,000).
Looks like we need the lower zone protection hack we've got in RHEL3, ensuring that allocations with __GFP_HIGHMEM do not come from the DMA zone, on a system with highmem present.

Active:45841 inactive:211029 dirty:642 writeback:194444 unstable:0 free:690536 slab:22839 mapped:45294 pagetables:928
DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:11500kB present:16384kB
protections[]: 0 0 0
Normal free:2256kB min:936kB low:1872kB high:2808kB active:0kB inactive:768776kB present:901120kB
protections[]: 0 0 0
HighMem free:2759872kB min:512kB low:1024kB high:1536kB active:183364kB inactive:63968kB present:3014400kB
protections[]: 0 0 0
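As an experiment only, and assuming this kernel still exposes the /proc/sys/vm/lower_zone_protection knob (it may not; this is a sketch, not a confirmed workaround), the protection feeding those protections[] values could be raised for testing:

# Raise lower-zone protection so highmem-capable allocations are steered
# away from the DMA/Normal zones (default is 0, which disables it):
echo 100 > /proc/sys/vm/lower_zone_protection
cat /proc/sys/vm/lower_zone_protection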
Thanks for looking at this Rik. Also, I just tested 2.6.9-1.906_EL_exitfixsmp and it didn't resolve the issue.
Robert, Regarding comment #19. Is the only "load" you put on the system the 2 simultaneous `mkfs` commands? Or is thrasher or other stress running at the same time?
No, that was the only load, other than running it under X Windows. The tarball should include a ps listing.
Tim, comment #23 was submitted by me. I was just reading the issue using Roger's workstation. -- Robert Hentosh.
It looks like one of those ludicrously easy to introduce threshold errors that tend to plague the VM on a regular basis. Basically, the block device cache only uses lowmem (AFAIK), so most of highmem is free. It looks like 80% of the lowmem zone is in writeback state, leaving few pages around that the VM could reclaim. Unfortunately, this is only around 20% of total system memory, or half of the 40% required to make the systemwide writeback throttling work right. This means that the writers (mkfs) can continue dirtying pages without any throttling, pushing the VM right off a cliff. I guess this means that the VM should deal with this situation, instead of relying on writeback throttling...
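A rough way to see the imbalance described above on an affected box (a sketch; it assumes a highmem i386 kernel, where /proc/meminfo reports HighTotal):

# Compare the systemwide dirty/writeback threshold against lowmem size.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
high_kb=$(awk '/^HighTotal:/ {print $2}' /proc/meminfo)
low_kb=$((total_kb - high_kb))
dirty_ratio=$(cat /proc/sys/vm/dirty_ratio)
echo "lowmem (DMA+Normal): ${low_kb} kB"
echo "dirty threshold (${dirty_ratio}% of total): $((total_kb * dirty_ratio / 100)) kB"
# If the threshold approaches or exceeds lowmem, block-device writers are
# effectively never throttled before lowmem is exhausted.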
I am not sure why I am seeing the following behavior... but if one does

dd if=/dev/zero of=/dev/sdb1 bs=512 count=78116000

everything works fine... but

dd if=/dev/zero of=/dev/sdb1 bs=4096 count=9764500

fails and causes the OOM killer to go postal. Using 512 as the block size keeps the bio and biovec-1 entries in /proc/slabinfo below 200, while using 4096 causes them to blow up to over 400,000 active objects.

BTW, if you look at /proc/slabinfo it seems that the <batchcount> and <limit> tunable field names are switched (refer to mm/slab.c, functions s_start() for the header and s_show() for the data seq_printf).
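To watch that growth while the dd runs, a simple sketch that samples the relevant slab caches every few seconds:

while true; do
    # print the bio and biovec-1 lines from slabinfo (active object counts)
    grep -E '^(bio|biovec-1) ' /proc/slabinfo
    sleep 5
done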
Robert, I've combed through the VM code but can't see an easy workaround for this bug that wouldn't risk introducing big regressions elsewhere in the VM. Because of this, I would like to ask if you could please try lowering /proc/sys/vm/dirty_ratio to something like 10% and see if the bug is still reproducible.
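For reference, lowering the ratio for this test can be done at runtime (either form works; it does not persist across a reboot):

echo 10 > /proc/sys/vm/dirty_ratio
# or equivalently:
sysctl -w vm.dirty_ratio=10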
Robert, I have another question for you, trying to narrow down the bug further. Can you reproduce the bug when you boot the system with "mem=1G" ?
With mem=1G I do not see the problem. The bio* slab entries are still around 400000. I will try lowering the dirty_ratio.
So far I am unable to reproduce with /proc/sys/vm/dirty_ratio lowered to 10%.
Robert, I am not 100% positive yet whether my test patch fixes the bug, but I would invite you to try and reproduce the bug with my test kernels anyway. You can get test kernels, and the patch that produces them, from my people page: http://people.redhat.com/riel/.test-kernel-bug-141173/ If this kernel resolves your issue, I will also make sure the fix gets merged into the upstream kernel.
Created attachment 108749 [details] The last patch failed Nope. It didn't pass. Attached is the output of dmesg.
Sent to Dale earlier today in email: Dale: As you saw by Robert's most recent reply last evening, Rik's patch didn't resolve the problem and he's working on another one. However, to keep as many things moving forward as possible -- and although we do indeed have the exact configuration you're seeing this failure on -- can you box up one of those systems TODAY and overnight it to us in Westford, attention Joe Connolly? If seeing for ourselves lets us crack the problem, then it was worth the effort. Could this be a BIOS problem? We're running out of possibilities to explain this.... Hence getting the actual hardware in our hands. Thanks, Sue
Robert, I have created a test kernel with an additional patch (rhel4-sc-congested.patch) as well as yesterday's patch. Could you please try out http://people.redhat.com/riel/.test-kernel-bug-141173/kernel-smp-2.6.9-1.906_ELcongest.i686.rpm ? I'm currently compiling the up and hugemem kernels of this set, too, and will upload them later. Initial test results in our lab show positive results, but I'm sure this new kernel could also use a good smoke test...
Reassigning to myself, since the bug doesn't look very USB related, at least not these layers of it ...
The ELcongest kernel has passed the first test. So far so good.
A PE420SC running the 906_ELcongest kernel and Thrasher lost its video over the weekend. You can see the "X" printed on the screen and some garbage characters, but the system is hung for all practical purposes. I had to do a physical reboot. This one did not have an attached serial console. Restarting the test with a serial console.
Created attachment 108947 [details] Failure with Congest kernel patch (dmesg output) Well, tests were running fine up to Saturday. Further testing on Monday resulted in failures. I am not sure what variable changed between the two. I don't see any, except that the machines were sitting idle over Sunday. Here is the dmesg output from the latest kernel; this system was just running "mkfs.ext2 /dev/sdb1". It was a PE1655MC with a USB drive attached.
OK, I guess that means I "forgot to clean a spot" and there are a few more changes needed to the VM. Andrew Morton already gave us another one-line change yesterday (on lkml) and I will audit the code further to see what else needs changing. I will build you a new test kernel later today.
OK, the new test kernel is ready. Could you please try kernel-smp-2.6.9-1.906_EL.congest2.i686.rpm from http://people.redhat.com/riel/.test-kernel-bug-141173/ ? Note that due to the urgency of this issue, we have not done internal testing on this kernel yet, but there is only one line difference from the previous kernel, so...
No kernel-smp-2.6.9-1.906_EL.congest2.i686.rpm in http://people.redhat.com/riel/.test-kernel-bug-141173/ ;-( Last file posted to that URL was Dec 17.
Oops, I put them in the wrong directory. They should be there now...
So far so good with 906_EL.congest2. mkfs.ext3 on the external USB HD in an infinite loop with a sleep of 5 secs between each iteration seems to have *not* invoked OOM yet. The test has only run for 15 mins, though. 'free' is showing that not a lot of memory is being consumed by the single mkfs command. Based on that, I have gone ahead and created two additional partitions on the same USB HD and am formatting them (mkfs.ext3) in parallel in an infinite loop as well. Will update with results later tonight.
No OOM invocations yet! The test is running fine with the three mkfs.ext3 processes chugging along. I even added an up2date in an infinite loop before leaving last night. I do notice that swap is turned off on this system, though. That should not matter, right?
Having swap turned off should make OOM kills _more_ likely, not less. Your test results are great news; let's see if the system can go another day without OOM kills...
Yuck! Kernel panic on the screen when I inspected the system this morning. No serial console was attached, but I will upload a screenshot in a moment, as that is all I have to offer right now. The (partial) trace of the panic appears to be strikingly similar to the panic induced by Thrasher in bugzilla 141284.
Created attachment 109093 [details] OOPS as described in comment #59
SysRq was enabled but was not responsive. Rebooted the machine; on examining the syslog, it appears that the OOM killer was invoked twice, killing httpd in both instances about an hour apart. Will attach the syslog in a minute.
Created attachment 109095 [details] syslog of system that had OOPS
OK, so we have 660 MB of pages in writeback stage, and still get an OOM kill. In fact, it looks like all the inactive pages are under writeback... I will audit the VM code to try and find out what might cause this bug to trigger.
I have added another patch (rhel4-vm-extraround.patch) and uploaded new test kernels (congest3) to http://people.redhat.com/riel/.test-kernel-bug-141173/
India test folks have been able to reproduce the OOM invocation part of the issue with the latest "congest3" kernel. I have requested the syslog messages and also details on how long it took to reproduce the issue, etc. Here is the information I have at the moment:

System: PE1655 (Shredder) with external USB HD
Kernel: 2.6.9-1.906_EL.congest3
Run Level: 5
mkfs.ext2 -N 40000000 /dev/sdb1
mkfs.ext2 -N 40000000 /dev/sdb2

Continuing to let the test run to see if it eventually panics. Meanwhile, regressing the same issue with the congest3 kernel in our setup here in Austin as well.
Created attachment 109268 [details] message log of congest3 kernel failure. The congest3 kernel was tested and resulted in failures. Here is the /var/log/messages file from that machine. Please note that there is more than one test run in the log file; when you see the oom message in the log, double-check that it was actually from the congest3 kernel.
Robert, this morning I found another thing in vmscan.c that might be wrong, and just finished compiling a kernel rpm (congest4) with the patch rhel4-nr_scanned-count-writeback.patch, which you'll be able to find on: http://people.redhat.com/riel/.test-kernel-bug-141173/
Created attachment 109277 [details] Failure in Austin on congest3 after a few days Here is the log file for a congest3 failure after a few days of running the mkfs test.
Just an observation noted with congest4. We started the test with `mkfs.ext2 /dev/sdb1` on console F1, `mkfs.ext2 /dev/sdb2` on console F2, and `mkfs.ext2 /dev/sdb3` on console F3. Each mkfs was in a loop that restarted it after it completed. The test ran without the OOM killer appearing for over 20 hours. However, it was noted that the first one, started on /dev/sdb1, didn't complete its first iteration. The other mkfs's completed and continued to format. I killed the other two and the first one still didn't complete after 10 mins. Checking iostat after the other formats were killed, with only the one on F1 running, the I/O throughput to /dev/sdb had dropped to zero. I re-ran this test under the 1-910 hugemem kernel and it behaved similarly, i.e. the OOM killer didn't appear and the first 2 mkfs's were hung.

We also have another issue, reported elsewhere, that occurs when doing a `cat /dev/zero > /mnt/largedisk/tempfile` where the partition size is > 1TB. The system eventually hangs, and examining the /var/log/messages file discloses that the OOM killer was invoked.
Robert: re: #73: Dale says that the "other issue reported elsewhere" to which you refer is coming from the Dell Storage team. Are we going to see more details on what they're seeing (log file, etc.)?
Yes, Robert's comments are referring to the "other issue." Based on information gathered overnight, that issue does not appear to be directly related to this bugzilla.
Created attachment 109434 [details] Other issue failure Contains the messages file of the failure on the "other issue"
I have been able to reproduce the "other issue" reported overseas. I installed a PE2850 with 6GB of RAM and attached a CX700 SAN with a ~1TB LUN. I mounted the SAN at /bigdisk after formatting it with `mkfs.ext2 /dev/sdd1`. I then ran the following command:

while true
do
    dd if=/dev/zero of=/bigdisk/bigfile bs=1M count=1024000
done

That command ran overnight without issue. Curious to see the throughput on the system, I ftp'd over the sysstat rpm and installed it. I then ran `iostat 5` in another console window and noted that I was averaging about 300MB/s writes to the SAN. Noting that the log file from the original failure I was able to reproduce on the pre-rc2 kernel might be useful to Red Hat, I then took a floppy and mounted it. At that instant I received an oom-killer message. The log file is attached above with both the failure from the pre-rc2 kernel and the latest 2.6.9-4 kernel.
Additional experiments we discussed on the phone to gather more datapoints:
- instead of doing if=/dev/zero, try a "real" input device, i.e. another disk.
- What if you vary the bs? Half and/or 3/4 the size.
Ran several tests last night, all on the -4 kernel.

- PE2850 with 6GB attached to a ~1TB SAN. I took this configuration and repartitioned the SAN LUN into 3 approximately equal partitions. I had two simultaneous cp's of large files on partitions 1 and 3 being copied to partition 2. That ran overnight without issue.

- I had 4 PE1655MC blades, each attached to a USB HD (120GB). Each one ran one single mkfs (not in a loop), and one of the four blades had an oom-kill. I will attach the log after this comment.

- In remote testing we had a PE1655MC with one USB HD with 2 80GB partitions doing 2 simultaneous mkfs's. It resulted in the following oom-kill:

Kernel-2.6.9-4.ELsmp
Jan 7 08:27:15 shredder-1-2 kernel: oom-killer: gfp_mask=0xd0
Jan 7 08:27:15 shredder-1-2 kernel: DMA per-cpu:
Jan 7 08:27:15 shredder-1-2 kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 7 08:27:15 shredder-1-2 kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 7 08:27:15 shredder-1-2 kernel: cpu 1 hot: low 2, high 6, batch 1
Jan 7 08:27:15 shredder-1-2 kernel: cpu 1 cold: low 0, high 2, batch 1
Jan 7 08:27:15 shredder-1-2 kernel: Normal per-cpu:
Jan 7 08:27:15 shredder-1-2 kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 7 08:27:15 shredder-1-2 kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 7 08:27:15 shredder-1-2 kernel: cpu 1 hot: low 32, high 96, batch 16
Jan 7 08:27:15 shredder-1-2 kernel: cpu 1 cold: low 0, high 32, batch 16
Jan 7 08:27:15 shredder-1-2 kernel: HighMem per-cpu:
Jan 7 08:27:15 shredder-1-2 kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 7 08:27:21 shredder-1-2 kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 7 08:27:21 shredder-1-2 kernel: cpu 1 hot: low 32, high 96, batch 16
Jan 7 08:28:02 shredder-1-2 kernel: cpu 1 cold: low 0, high 32, batch 16
Jan 7 08:28:02 shredder-1-2 kernel:
Jan 7 08:28:02 shredder-1-2 kernel: Free pages: 1022784kB (1022784kB HighMem)
Jan 7 08:28:02 shredder-1-2 kernel: Active:31104 inactive:201824 dirty:185930 writeback:9511 unstable:0 free:255696 slab:26826 mapped:32478 pagetables:747
Jan 7 08:28:02 shredder-1-2 kernel: DMA free:0kB min:16kB low:32kB high:48kB active:68kB inactive:8452kB present:16384kB
Jan 7 08:28:02 shredder-1-2 kernel: protections[]: 0 0 0
Jan 7 08:28:02 shredder-1-2 kernel: Normal free:0kB min:936kB low:1872kB high:2808kB active:0kB inactive:772856kB present:901120kB
Jan 7 08:28:02 shredder-1-2 kernel: protections[]: 0 0 0
Jan 7 08:28:02 shredder-1-2 kernel: HighMem free:1022784kB min:512kB low:1024kB high:1536kB active:124348kB inactive:25988kB present:1179520kB
Jan 7 08:28:02 shredder-1-2 kernel: protections[]: 0 0 0
Jan 7 08:28:02 shredder-1-2 kernel: DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Jan 7 08:28:02 shredder-1-2 kernel: Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Jan 7 08:28:02 shredder-1-2 kernel: HighMem: 34*4kB 21*8kB 109*16kB 168*32kB 51*64kB 3*128kB 22*256kB 11*512kB 5*1024kB 4*2048kB 241*4096kB = 1022784kB
Jan 7 08:28:02 shredder-1-2 kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Jan 7 08:28:02 shredder-1-2 kernel: Out of Memory: Killed process 2033 (cannaserver).

- Also, on a PE700 with several simultaneous mkfs's on SATA drives, the oom-killer was not seen in the log. But while in X Windows some applications couldn't be launched (like gnome-terminal, Firefox, GIMP, the calculator, etc.). The pointer would change to an hourglass and then back to a pointer without the application being launched, while other apps could be launched (like Disk Management and Hardware Browser).
Robert . . . what mkfs command did you run specifically in the second case above (Shredder with single mkfs)? Thanks.
The command used in the four shredders to format the USB HD was 'mkfs.ext2 /dev/sdb1'; only one instance was executed on each blade.
Created attachment 109474 [details] Shredder single mkfs failure.
Robert - on the mkfs failures to the USB disks... can you help describe the frequency of failure in more detail? For example:
- does it repeatably happen?
- if it fails, is it from running that mkfs command once? Or is it in a loop which fails? If so, how long does it have to run till an oom kill is encountered? How many successful mkfs runs complete prior to the failure?
- Are you seeing random failures on each of the blades? Or is only 1 of them problematic?
- Is it safe to assume that this mkfs failure is only exhibited on large USB devices?

Since we didn't have any USB disks, we ran out to CompUSA and got some, which we are now hooking up.
Created a test script that would run "mkfs.ext2 /dev/sdb1" in a loop. /dev/sdb is the Iomega 120GB USB external storage device. This test ran successfully 500 times. It was run with the following hardware: Dell PowerEdge 1850 (Service Tag: GG0Y761, Express Service Code: 35797571785), (2) 3.6GHz CPUs w/HT, 4 GB RAM, Iomega 120GB USB storage. RHEL 4, 2.6.9-5.ELsmp.
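A minimal sketch of such a counted test loop (the log file name is illustrative only; the count and device match this test):

# Run mkfs.ext2 on the USB disk 500 times, logging each iteration:
for i in $(seq 1 500); do
    echo "iteration $i: $(date)" >> /tmp/mkfs-test.log
    mkfs.ext2 /dev/sdb1 >> /tmp/mkfs-test.log 2>&1 || break
done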
To answer your questions:

- At first it was very repeatable. We couldn't get a disk formatted without it invoking the OOM killer. With the latest kernel, 2.6.9-4smp, we now have to put the mkfs in a loop so that another mkfs will start after the first one completes. This will run overnight on 4 boxes and then 2 of the boxes will fail within 12 hours. The distribution of when the failure occurs appears to be even, so we have seen failures within 5 mins and others occur after several days.

- We have seen it fail within a single mkfs on 2.6.9-4smp that was not in a loop. As described above, generally our tests are in a loop. Again, the length of time seems to follow an even distribution. There are somewhere between 0 and >10 successes before a failure occurs.

- Yes. The failures occur on all of the blades running the test and the distribution of the failures appears to be even. No one blade appears to fault more than the others.

- It seems that the drive (actually the partition being formatted) has to be greater than 40GB before the problem is triggered. It was not a test focus, though, to see what the threshold is. But the smaller drives seem to complete quickly, and larger drives hit a performance drop that causes the format to take much longer.

We had to buy USB drives to expand our testing. We purchased the cases with the IDE to USB/firewire converter and the IDE drives separately. This allowed us to get 120GB drives, which were larger and cheaper than the pre-packaged USB HDs. We wanted to see if the much larger drives would cause the problem to occur more frequently. It didn't seem to increase the failures per unit time, but it did increase the failures per mkfs iteration.
Question for Dell to help get more datapoints... We have been unable to reproduce the problem on hardware from other IHVs. This makes me wonder if certain storage adapters could be more vulnerable than others. For example, would you hit the same issue encountered in the 2TB storage failure if a Qlogic adapter was used? We are pursuing various combinations here to see if a pattern can be established.
Created attachment 109615 [details] Hardware configuration for SAN testcase Here is the lspci output of the PE2850 box attached to a SAN where we have seen the problem. The Fibre channel card is an EMULEX.
Created attachment 109625 [details] 2.6.9-4smp hang with single mkfs.ext in loop This is the messages log file from a PE1655MC attached to a USB drive, doing a mkfs of the entire USB drive in a loop. The last mkfs eventually hung; it was given about 30 mins to see if progress was made, then a control-Z was typed on the screen with the mkfs running. That was given 15 mins to suspend. At that time I turned on sysrq and then took the SysRq-T/M dumps that appear in the attached messages file. After more waiting (more than 10 mins), the mkfs finally suspended.
I need to mention that the issue seen first with a 1TB filesystem was not on a fibre channel card, but on a RAID card (PERC4e). It had more than 1TB of SCSI disks attached to it. Since I was unable to get that SCSI configuration, I reproduced the problem on the SAN locally. So the issue has been witnessed on SATA (comment #19), USB, SCSI RAID (comment #77), and Fibre Channel SAN (comment #78).
Created attachment 109638 [details] Failure with 2.6.9-5smp Failure on a PE1655MC (shredder) with USB HD after running 3 hours and 50 mins. Failure induced with the following:

while true
do
    mkfs.ext2 /dev/sdb1
done

Two other blades at this time are still running.
Last week on a con-call it was discussed that Dell would try to use a "real" input device in the `dd` test, rather than /dev/zero (ref comment #79). Any luck taking if= from another disk?
Dell had asked whether running a single mkfs.ext or dd if=/dev/zero in a loop would create cumulative stress on the kernel that would explain the OOM kill scenario. At this time, we haven't established a firm pattern that would indicate a cumulative effect and are expanding our test scenarios and broadening our examination to other kernel subsystems such as the I/O path.
(In reply to comment #94)
> Last week on a con-call it was discussed that Dell would try to use a "real"
> input device in the `dd` test, rather than /dev/zero (ref comment #79). Any
> luck taking if= from another disk?

We ran a test using cp instead of dd and were unable to produce a failure in that configuration.
Update:

1.) Testing dd with LVM using the command from Comment #26
Started: Jan 11, 13:05
Test: `dd if=/dev/zero of=/dev/VolGroup00/LogVol02 bs=4096 count=9764500` in a loop.
Status: This test is still running
Hardware: Dell PowerEdge 1800, Service Tag: CNQD961
OS: RHEL4 RC2, 2.6.9-5.ELsmp

2.) Testing dd with partitions > 1TB
Started: Jan 11, 13:20
Test: `dd if=/dev/zero of=/bigdisk/bigfile bs=1M count=1024000` in a loop.
Status: This test is still running
Hardware: Dell PowerEdge 2850, Service Tag: 3054961, EMULEX HBA attached to a Winchester Systems SAN
OS: RHEL4 RC2, 2.6.9-5.ELsmp

3.) Testing with USB storage device
Started: Jan 11, 13:47
Test: `mkfs.ext2 -N 40000000 /dev/sdb1` in a loop.
Status: This test is still running
Hardware: Dell PowerEdge SC1425, Service Tag: 9CGQC61, USB 1.1 hub attached to an Iomega USB 120GB external storage device
OS: RHEL4 RC2, 2.6.9-5.ELsmp

4.) Testing Bonnie (http://www.textuality.com/bonnie/), recommended by Tom Coughlan
Started: Jan 11, 15:10
Test: `./Bonnie -d /spare/scratch -s 2047 -m pe1850.$num` in a loop.
Status: This test is still running
Hardware: Dell PowerEdge 1850, Service Tag: GG0Y761
OS: RHEL4 RC2, 2.6.10 upstream kernel

Other testing activities:
FWIW: I had `dd if=/dev/zero of=/dev/VolGroup00/LogVol02 bs=4096 count=9764500` in a loop running on non-Dell hardware for approximately 2 weeks; the kernel was Rik van Riel's congest3 kernel. It was still running when I stopped it yesterday evening.
Larry Woodman spotted a few more subtle bugs in vmscan.c, so I built and uploaded a -congest5 kernel today. This is based on 2.6.9-5.EL + Larry's patch. As usual, you can get it from http://people.redhat.com/riel/.test-kernel-bug-141173/
I have just started 2 tests with the congest5 kernel; I didn't have access to the systems until now. I have also done the following test. I added the following line to the end of /etc/rc.local:

sh -c "sleep 1m; mkfs.ext2 /dev/sdb1; sleep 1m; reboot" &

There was some speculation that this issue wouldn't appear if the mkfs was not in a loop. This was to verify that assertion. Failures were encountered with the 2.6.9-4 kernel. I will attach the log in my next post. 2 machines were tested in this manner. The one with the larger drive (160GB) produced more failures, but the 80GB USB HD did as well.
Created attachment 109715 [details] failure on single mkfs without "loop" Out of 26 mkfs's separated by system reboots, 7 triggered the oom killer.
Created attachment 109716 [details] congest5 failure Congest5 failure.
The congest5 test case is running on a PE1655MC with a 160GB USB HD. I put it in a mkfs loop but it didn't complete the first iteration. I will put it into a reboot loop as described in comment #99 and see if it has better odds than the unpatched -5 kernel.
Update: We (Red Hat) have received the Chassis, dual Shredder blades, 2 USB External storage devices. I have started the `mkfs.ext2 -N 40000000 /dev/sdb1` in a loop. Both Shredder blades are running the exact same test. The kernel version is 2.6.9-5.ELsmp. Note: Tests 1-3 from Comment #97 are still running without error. I stopped test 4 to load the -5 Congest5 kernel. I have started that test back up.
Update: Tests 1-4 from Comment #97 are still running without error. On 1 of the 2 Shredder blades I have seen the OOM kill happen very intermittently. The only time I have seen the issue happen is when I changed the test to reboot after doing a `mkfs.ext2 /dev/sdb1` as per Comment #99. The test running mkfs in a loop never exhibited the issue. The kernel version is 2.6.9-5.ELsmp. I have spoken to Larry Woodman; he is going to continue to debug this issue.
Using the reboot loop on 2 shredder blades yielded the following results. The reboot loop was accomplished by adding the following line to /etc/rc.local:

sh -c "sleep 1m; mkfs.ext2 /dev/sdb1; sleep 1m" &

Blade1 was running the -5smp kernel: out of 41 cycles, only 2 triggered the oom killer. Blade2 was running the -5congest kernel: out of 14 reboots, 14 failed. Looking at the SATA and SAN system configurations, I have been unable to get them to fail on the -5smp kernel with either single instances of mkfs or multiple mkfs's or dd's.
I'm pretty sure I found the problem. In function get_dirty_limits() we do exclude highmem from the calculation when the request is for lowmem:

---------------------------------------------------------------------------
#ifdef CONFIG_HIGHMEM
	/*
	 * If this mapping can only allocate from low memory,
	 * we exclude high memory from our count.
	 */
	if (mapping && !(mapping_gfp_mask(mapping) & __GFP_HIGHMEM))
		available_memory -= totalhigh_pages;
#endif
---------------------------------------------------------------------------

But we use total_pages instead of available_memory for the calculation:

---------------------------------------------------------------------------
	background = (background_ratio * total_pages) / 100;
	dirty = (dirty_ratio * total_pages) / 100;
---------------------------------------------------------------------------

The following patch fixes this problem and we no longer see the OOM kills:

----------------------------------------------------------------------------
--- linux-2.6.9/mm/page-writeback.c.orig
+++ linux-2.6.9/mm/page-writeback.c
@@ -167,8 +167,8 @@ get_dirty_limits(struct writeback_state
 	if (background_ratio >= dirty_ratio)
 		background_ratio = dirty_ratio / 2;
 
-	background = (background_ratio * total_pages) / 100;
-	dirty = (dirty_ratio * total_pages) / 100;
+	background = (background_ratio * available_memory) / 100;
+	dirty = (dirty_ratio * available_memory) / 100;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;
----------------------------------------------------------------------------

I'm building an official kernel for Dell to test with now.

Larry Woodman
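To illustrate why this only bites on highmem boxes, here is a rough back-of-the-envelope calculation using numbers from the PE1655MC zone dump above (values rounded; a dirty_ratio of 40 is assumed as the default):

# lowmem (DMA + Normal) is roughly 16384kB + 901120kB ~= 917000 kB
# total memory including HighMem is roughly 2096000 kB
total_kb=2096000
lowmem_kb=917000
dirty_ratio=40
echo "old dirty limit: $((total_kb * dirty_ratio / 100)) kB"     # ~838000 kB, nearly all of lowmem
echo "new dirty limit: $((lowmem_kb * dirty_ratio / 100)) kB"    # ~367000 kB, comfortably inside lowmem

With the old calculation, a lowmem-only writer such as mkfs going through the block device cache can fill essentially all of lowmem with dirty/writeback pages before the throttling kicks in, which matches the zone dumps above where DMA and Normal are exhausted while HighMem sits mostly free.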
The kernel to test with that includes the above patch is in: >>>http://people.redhat.com/~lwoodman/RHEL4/ Larry Woodman
Initial tests with the kernel from comment #108 are promising.
Reviewed record to see progress from Red Hat and Dell.
I have been running the Shredder test from Comment #106 with the kernel from Comment #108 since yesterday at 11:45. I have *not* seen the OOM condition since loading the kernel from Comment #108. It has been more than 24 hours.
It has been over 20 hours and we have not witnessed an oom kill. We will continue testing over the weekend.
The patch does fix this issue. We are unable to reproduce the problem with the patched kernel. We will regress this issue with the "update kernel" when it is available.
Thanks very much for all your exhaustive testing, Robert and team. It's been a long haul and we greatly appreciate your efforts! As we've discussed, you'll see the fix in U1 (and a preliminary U1 kernel will be available some time over the next few weeks so you can test; more details about the availability of this kernel soon).
Changing the title to reflect the Update in which a fix for this issue has been committed or is being tracked.
Robert, I have run into an issue with one of the shredder blades loaned to Red Hat. I have been running the reboot mkfs test on those two systems. One of the two systems hangs on reboots. The message is that the memory size has changed. I can either strike a key to continue or go into setup. When I go into setup it shows the correct amount of memory, 2048. I save the configuration and reboot, and it is fine until the next reboot, when it comes up with that memory size adjustment error again. Have you run across this before? Any recommendations on how to get around it, or should I send it back to you? Thanks in advance, Jeff
> Have you run across this before? Any recommendations on how to get around it
> or should I send it back to you.
> Thanks in advance
> Jeff

I will probably need the system back to find out why BIOS is not doing that correctly ... is the BIOS version recent? (Does it have an X?? type version number?)
Created attachment 110978 [details] /var/log/messages from RH cert test This is a new failure... that seems to be strongly related. The oom-killer is popping up on only one platform we have been running the Red Hat certification test on. Also, it fails on both the RC1 kernel and Larry's patched kernel. Both failures are in the log; Larry's kernel is the last run.
If you need me to, I will open a different bugzilla for the issue in #118, but it seems to be the same... except that this happens while running our certifications on the Nagano platform PE1425. This failure occurred with 12GB of RAM, 2 hours after starting the cert. It didn't fail on a 5GB configuration. We believe it also didn't fail on a SCSI configuration... this failure is on SATA drives. This oom-killer issue now interferes with our ability to do certifications.
Please close this problem and open up a different bug for the problem you described in comment 118. In that case someone has allocated all of the memory and leaked it!

Normal free:696kB min:936kB low:1872kB high:2808kB active:8300kB inactive:7268kB present:901120kB

As you can see, only about 16MB out of 900MB of the Normal zone is on any list, and the slabcache only accounts for ~150MB. Someone allocated ~750MB of lowmem and never freed it.

Larry Woodman
(In reply to comment #120) > Please close this problem and open up a different bug for the problem you > described in comment 118. Okay, I will open up a new issue then for this new problem. But, I will leave this issue open until we receive the kernel drop to regress the older issue.
Opened BZ#147832 for comment 118.
Dell, any update here from testing with the latest U1-candidate code?
This problem is not seen in RHEL4 Update1 Beta.
/me celebrates and moves the bug to PROD_READY.
Based on comment #124, closing this issue.