Bug 59370 - >68 drives causes kswapd to occupy 99% of the CPU resources
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: Pete Zaitcev
QA Contact: Brian Brock
Reported: 2002-02-06 11:19 EST by Issue Tracker
Modified: 2007-04-18 12:39 EDT

Doc Type: Bug Fix
Last Closed: 2002-04-17 13:13:38 EDT

Attachments
oops from EMC, 2.4.9-21 w/ their change to sd.c (11.79 KB, text/plain)
2002-02-20 12:26 EST, Mike Gahagan
print #1 (787 bytes, patch)
2002-03-11 21:03 EST, Pete Zaitcev
change #1 - GFP_ATOMIC replacement (428 bytes, patch)
2002-03-21 00:12 EST, Pete Zaitcev
Change #2 - GFP_ATOMIX redux (1.49 KB, patch)
2002-03-22 03:19 EST, Pete Zaitcev

Description Issue Tracker 2002-02-06 11:19:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.7-10 i686)

Description of problem:
When more than 68 but fewer than 128 drives are attached (75 were used in
testing), kswapd occupies 99% of the CPU upon bootup.  The original problem was
a kernel panic with >128 drives (see Bug #55420 and #58442).  That is still an
issue; however, with < 128 drives kswapd makes the system unusable
and unstable (it will oops eventually).

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Attach 75 drives to a system.
2. Boot the machine with a 7.2 kernel (2.4.9-21 was already tested).

Actual Results:  Kswapd uses 99% of CPU resources

Expected Results:  Normal operation

Additional info:
Comment 1 Arjan van de Ven 2002-02-06 11:22:02 EST
Guess what... I'd love to see that oops....
Comment 2 Issue Tracker 2002-02-06 12:41:03 EST
I've got one from 2.4.9-13 and I'm waiting for one from 2.4.9-21.
Comment 3 Bob Matthews 2002-02-06 14:34:07 EST
2.4.9-13 is known to oops under this hardware configuration.  Pete's patch is
only in kernel > 2.4.9-18.2.  See
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=58442.
Comment 4 Issue Tracker 2002-02-06 14:45:26 EST
I thought that was only if >128 drives.  This is with less than 128 drives.  The
kernel shouldn't oops, it's coded to panic with >128 drives (in 2.4.9-13). 
Based on that, they reduced the number of drives to 115 and started to have this
new kswapd issue.
Comment 5 Mike Gahagan 2002-02-20 12:25:31 EST
Oops from EMC, run through ksymoops, using 2.4.9-21 with their modifications to
sd.c (required for the kernel to boot).
Comment 6 Mike Gahagan 2002-02-20 12:26:29 EST
Created attachment 46122
oops from EMC, 2.4.9-21 w/ their change to sd.c
Comment 7 Arjan van de Ven 2002-02-20 13:46:29 EST
Great. They modify the scsi layer.
Then the scsi layer oopses.
Doh!
Comment 8 Pete Zaitcev 2002-02-20 15:21:02 EST
I think it may not be as simple as Arjan assumes.

There is a certain change (limiting of dev_max), coming
from Bug 55420, authored by Mark Tsombakos, which I blessed for use.
I agreed because I have an RFE (Bug 58442) to address that problem,
which is a little complicated and late, but EMC must have their
box running ASAP. I do not think that the limiting of dev_max
invalidates our testing or support to them.

Of course, we need to verify that the modification they made
is that one, and that no additional modifications
were made (e.g. Mark also killed panic(), which I did not approve;
the whole of 55420 was about fixing the condition that led to that panic()).

Oops that Mike G. collected is a great help. I think it shows
that _before_ 55420 and limit to dev_max they simply could not
get far enough to hit this next oops :)

Let's do it this way. I take over and resolve the oops, but
defer the kswapd problem until that is past us. The kswapd seems
a murky business, which will take a while even with luck.
A liaison needs to ask EMC if that "fix oops first" approach
is ok with them.
Comment 9 Pete Zaitcev 2002-03-11 21:03:40 EST
Created attachment 48225
print #1
Comment 10 Pete Zaitcev 2002-03-20 19:38:53 EST
Roger captured a console trace for us with print #1.

Starting SYMCLI install...               <==== EMC install script prints this
done
scsi_build_commandblocks: want=253, space for=11 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_register_device_module: nocore dev <NULL> major 21
scsi_unregister_device: module sg usage 1
Unable to handle kernel paging request at virtual address e08b0fcc
[.... same oops follows ....]

Char major 21 is sg, block major 21 is Acorn MFM
(not used by moremounts, BTW.) So, I think it's sg.
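
A trace like the one above can be summarized with a short script. This is only a sketch: the line format is assumed from the "print #1" output quoted in this comment, and the function name summarize is made up for illustration.

```python
import re

# Line format assumed from the debug output quoted above (patch "print #1").
LINE_RE = re.compile(
    r"scsi_build_commandblocks: want=(\d+), space for=(\d+) blocks"
)

def summarize(trace):
    """Return (calls, fully_satisfied, missing_blocks) for a console trace.

    missing_blocks is the total number of command blocks that were wanted
    but could not be allocated, summed over all matching lines.
    """
    calls = satisfied = missing = 0
    for match in LINE_RE.finditer(trace):
        want, got = int(match.group(1)), int(match.group(2))
        calls += 1
        if got >= want:
            satisfied += 1
        missing += max(want - got, 0)
    return calls, satisfied, missing

# First three lines of the trace quoted above.
sample = """\
scsi_build_commandblocks: want=253, space for=11 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
scsi_build_commandblocks: want=253, space for=0 blocks
"""

print(summarize(sample))  # (3, 0, 748)
```

On the full trace every call falls short, and after the first one the allocator finds no room at all.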

Comment 11 Pete Zaitcev 2002-03-21 00:12:42 EST
Created attachment 49326
change #1 - GFP_ATOMIC replacement
Comment 12 Pete Zaitcev 2002-03-22 03:19:34 EST
Created attachment 49604
Change #2 - GFP_ATOMIX redux
Comment 13 Pete Zaitcev 2002-03-22 03:20:57 EST
My first quick fix was incorrect: it attempted to sleep
with a spinlock held. By pure luck Roger's box worked, but
I had to do a more decent fix.
Comment 14 Arjan van de Ven 2002-03-22 04:20:03 EST
Question: which SCSI controller is this? (Just to rule out it using the GFP_DMA
zone.)
Comment 15 Arjan van de Ven 2002-03-22 04:47:59 EST
Also, is there a specific reason to use version 6.2.1 of the aic7xxx driver
instead of the default 5.2.0 (which in our testing works much better so far)?
Comment 16 Roger Gaudet 2002-03-22 15:23:41 EST
The HBA is the Adaptec 3944A Ultra SCSI adapter.  As for the version of the 
driver, that's what Linux "picks" when we install it (actually, that's what 
anaconda discovers).  We don't specify anything special.  Should I modify 
modules.conf to specify aic7xxx_old and see if that makes any difference?
Comment 17 Arjan van de Ven 2002-03-22 16:37:58 EST
In our kernels, "aic7xxx_old" is actually called aic7xxx.o
and "new aic7xxx" is called aic7xxx_mod.o.
But yes, the aic7xxx_old driver is very much worth a shot, since that is the
default one.
Comment 18 Roger Gaudet 2002-03-25 11:08:42 EST
At Arjan's suggestion I inserted the aic7xxx_old.o module in place of the 
aic7xxx_mod.o module (and yes, I removed the GFP_ATOMIC patch from scsi.c).  
Our installation succeeded and afterwards I did not see kswapd consuming 99% of 
the CPU.  So we're on to something here.
Comment 19 Pete Zaitcev 2002-03-25 12:16:48 EST
Arjan drew my attention to the fact that the aic7xxx(2.4.9) or
aic7xxx_old(2.4.18) use 8 times smaller queues (32 vs. 256),
so they use 8 times less space for scsi_build_commandblocks.
There are other things too, I do not know if they play a role.
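
The 8x figure can be made concrete with a little arithmetic. This is a rough sketch under loud assumptions: it supposes each attached device gets a full queue of command blocks, and it uses the 75 drives from the original description as the device count. The per-block size in bytes is not stated in this report, so only block counts are computed. The name command_blocks is invented for illustration.

```python
def command_blocks(devices, queue_depth):
    """Total command blocks wanted if every device gets a full queue.

    Assumption for illustration only: one queue of queue_depth blocks
    per attached device.
    """
    return devices * queue_depth

DEVICES = 75     # drive count used in the original description
OLD_DEPTH = 32   # queue depth of the old aic7xxx driver, per this comment
NEW_DEPTH = 256  # queue depth of the new driver, per this comment

old_total = command_blocks(DEVICES, OLD_DEPTH)
new_total = command_blocks(DEVICES, NEW_DEPTH)

print(old_total, new_total, new_total // old_total)  # 2400 19200 8
```

Whatever the size of one block, the new driver asks for eight times as many of them.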
Comment 20 Roger Gaudet 2002-03-26 17:30:11 EST
OK, here's the deal.  I'm sure this is news to no one except me, but I need to do
this.  After rummaging through the spec file for kernel-2.4.9-31.src.rpm I
discovered that the aic7xxx drivers get "moved around" during the kernel RPM
build such that the following naming convention is used:

aic7xxx.o is the former drivers/scsi/aic7xxx_old.o and identifies itself as 
follows:
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0

aic7xxx_mod.o is the former drivers/scsi/aic7xxx/aic7xxx.o and identifies 
itself as follows:
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1

My kernel build did not do this "moving around" of the driver files, so I was
using the 6.2.1 driver.  Now it does, so I'm using the "default" (5.2.0)
driver, as Arjan described it.  For my setup, now with 118 drives "seen" by the
system being installed, the 5.2.0 driver works like a charm.  The 6.2.1 driver
exhibits the problems originally identified in this thread.

For what it's worth, I have serial console logs (dmesg) of the whole boot 
sequence for both drivers.  The one difference that sticks out when the 6.2.1 
driver is loaded is that the "scsi::resize_dma_pool" messages appear.  They do 
not appear when the 5.2.0 driver is loaded.

....
Attached scsi disk sdp at scsi0, channel 0, id 2, lun 1
scsi::resize_dma_pool: WARNING, dma_sectors=0, wanted=65056, scaling
scsi::resize_dma_pool: WARNING, dma_sectors=0, wanted=48800, scaling
scsi::resize_dma_pool: WARNING, dma_sectors=0, wanted=36608, scaling
scsi::resize_dma_pool: WARNING, dma_sectors=0, wanted=27456, scaling
SCSI device sda: 8505600 512-byte hdwr sectors (4355 MB)
Partition check:
 sda: sda1
....
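
The four "wanted" values in that log follow a regular pattern: each is about three quarters of the previous one, rounded up to a multiple of 16. The sketch below checks that reading against the logged numbers. The rule is inferred from these four values only, not taken from the kernel source, and the name next_request is invented for illustration.

```python
def next_request(sectors):
    """Three quarters of the previous request, rounded up to a multiple of 16.

    Inferred from the logged values; not a quote of the kernel code.
    """
    return ((sectors * 3) // 4 + 15) & ~15

# "wanted=" values from the resize_dma_pool warnings quoted above.
logged = [65056, 48800, 36608, 27456]

predicted = [logged[0]]
for _ in logged[1:]:
    predicted.append(next_request(predicted[-1]))

print(predicted == logged)  # True
```

If that reading is right, the pool resize keeps retrying with a smaller request each time, and dma_sectors=0 shows it has nothing allocated at that point.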
Comment 21 Pete Zaitcev 2002-04-13 01:17:40 EDT
Roger, we seem to have worked around the panic by using
a less aggressive driver, so what about that 99% CPU thing?
Is it still present?
Comment 22 Roger Gaudet 2002-04-17 13:13:28 EDT
Pete, using the 5.2.4/5.2.0 driver (I assume that's the "less aggressive 
driver" you're referring to?) we do not see kswapd consuming 99% of the CPU.

However, we really need the "new" driver to support >68 drives without the 
kswapd problem or the oopses we have been seeing.  We just discovered that on 
some of our machines an insmod of the de4x5.o network module will cause a hard 
hang when the "old" aic7xxx driver is already loaded.  The hang does not occur 
if the "new" driver is loaded.  I guess that's a new bug, huh?
Comment 23 Pete Zaitcev 2002-07-31 15:39:48 EDT
Resolved panic -> closing as NOTABUG. Need to use a driver with
smaller queues.

Bugs cannot stay open that long, esp. for good people.

There is a discussion about whether this is a bug in the whole rotten SCSI stack.
It runs wild allocating everything with GFP_ATOMIC, which just
cannot work with the number of devices that EMC and others have to use.
Not something we can easily fix, unfortunately. It's not just a bug,
it's a misdesign of the whole thing.
