Bug 40103

Summary: Slow SCSI on an NCR53C876 causes high load
Product: [Retired] Red Hat Linux Reporter: Need Real Name <ferulisses>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED RAWHIDE QA Contact: Brock Organ <borgan>
Severity: high Docs Contact:
Priority: high    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2001-05-10 20:57:49 UTC Type: ---

Description Need Real Name 2001-05-10 16:35:35 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; DigExt)

Description of problem:
I had Red Hat Linux 7.0 running kernel 2.2.16. After upgrading to
Red Hat 7.1 (kernel 2.4.2), the load average goes much higher than
normal and the system consumes all available processing power.

Running top, I can see:
 12:41pm  up  2:10,  2 users,  load average: 17.62, 20.45, 17.06
51 processes: 45 sleeping, 6 running, 0 zombie, 0 stopped
CPU0 states:  1.7% user, 98.10% system,  0.0% nice,  0.0% idle
CPU1 states:  0.16% user, 99.0% system,  0.0% nice,  0.0% idle
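top derives the per-CPU "states" percentages above from how much the per-CPU time counters in /proc/stat grow between two refreshes. The following is a minimal sketch of that calculation. It assumes the four-field `cpuN user nice system idle` layout of 2.4-era kernels; later kernels append more fields, which this sketch ignores. The function names and sample counter values are hypothetical and are not taken from this machine.

```python
def parse_cpu_lines(stat_text):
    """Return {cpu_name: (user, nice, system, idle)} for the per-CPU lines."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 ..."; the aggregate "cpu" line is skipped.
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            counters[fields[0]] = tuple(int(v) for v in fields[1:5])
    return counters


def cpu_states(before, after):
    """Percent of user/nice/system/idle time per CPU between two snapshots."""
    old, new = parse_cpu_lines(before), parse_cpu_lines(after)
    states = {}
    for cpu, new_counts in new.items():
        deltas = [n - o for n, o in zip(new_counts, old[cpu])]
        total = sum(deltas) or 1  # avoid dividing by zero if no time elapsed
        states[cpu] = {
            name: round(100.0 * d / total, 1)
            for name, d in zip(("user", "nice", "system", "idle"), deltas)
        }
    return states


if __name__ == "__main__":
    # Hypothetical snapshots taken 100 jiffies apart on each CPU.
    before = "cpu0 1000 0 5000 200\ncpu1 900 0 5100 150\n"
    after = "cpu0 1002 0 5098 200\ncpu1 901 0 5199 150\n"
    for cpu, pct in sorted(cpu_states(before, after).items()):
        print(cpu, pct)
```

On a live system, `before` and `after` would be two reads of /proc/stat taken a few seconds apart. A CPU that spends nearly all of each interval in `system` time, as in the top output above, is busy inside the kernel rather than in any user process.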


How reproducible:
Always

Steps to Reproduce:
1. Install Red Hat 7.1 on an equivalent machine


Actual Results:  All the system resources are consumed.

Expected Results:  Normal system activity.

Additional info:

Dual Pentium III 600 MHz
Intel N440BX motherboard
512 MB RAM, onboard SCSI NCR53C876
3 SCSI HDs: 1 Quantum QM39100TD-SW 9 GB, 2 Fujitsu 18 GB (MAE3182LP &
MAE3182MP)
Ethernet: EtherExpress 10/100 Mbit/s, 3Com 905B 10/100 Mbit/s

If I boot into single-user mode and copy a lot of data from one disk to
another, or within the same disk, I see some pauses. I suspect this is what
drives the load average up; maybe the driver is broken. I'm currently using
the ncr53c8xx driver.

The computer is also a "router": its job is to filter and forward
packets from one Ethernet device to the other. I run ipchains for packet
filtering, CBQ to shape bandwidth for some clients (I tried stopping CBQ
and the load average stayed high), some configured shaper interfaces,
squid, sendmail, and other minor processes, but none of them uses a lot
of resources.

This whole setup runs fine on Red Hat 7.0, but after the upgrade the
system becomes slow. I don't think network traffic makes a difference,
since I also get pauses when moving data in single-user mode.

Comment 1 Arjan van de Ven 2001-05-10 17:14:54 UTC
Can you paste a "top" screen to this bug? (e.g., so we can see which processes
eat the CPU)
Or even the output of "ps waux" so we can see which processes are causing the
high load?

Comment 2 Arjan van de Ven 2001-05-10 17:25:27 UTC
Alternatively, we are working on an upgraded kernel with a newer NCR SCSI
driver; a recent snapshot of the RPM for that kernel can be found in the
rawhide part of our FTP site.

Comment 3 Need Real Name 2001-05-10 20:54:02 UTC
A few hours later, the system was consuming 100% of resources and the computer was inaccessible; I had to reset it.

After this, I ran more tests:
- tried the sym53c876 modules: after a reboot, the system was consuming 80% of resources within 30 minutes
- tried removing the cbq and shaper modules: after a reboot, the system was consuming 80% of resources within 30 minutes
- downgraded to the 2.2.16smp kernel from the Red Hat 7.0 disk: 45 minutes of uptime now, with the system averaging 10%

My system now looks stable, but the problem still exists.

I can't capture a screen now because it's a production server and I can't reboot into kernel 2.4 and wait for it to crash.
However, I saw nothing abnormal in top: just normal processes using very little CPU, and the sum of all processes
was below 50%. If it's a hint, the top process is squid, running at 10%-20% CPU; maybe it generates the data
traffic that makes the system crash.

Are there any foreseeable problems with downgrading the kernel?


Comment 4 Arjan van de Ven 2001-05-10 20:57:45 UTC
If you don't have hostile local users, 2.2.16 should be fine for most stuff
(except for USB, but it doesn't look like you are using that ;)


Comment 5 Arjan van de Ven 2001-05-21 09:42:53 UTC
We have an upgraded driver in rawhide which should fix this -> closing as fixed
in rawhide.

Using 2.2.16 is an alternative.

Comment 6 ferulisses 2001-07-26 11:36:32 UTC
Hi again.

I upgraded to kernel 2.4.3-12enterprise for i686 from the Red Hat Errata.

The problem persists. Here is the header of top.


  8:39am  up 48 min,  1 user,  load average: 0.85, 0.83, 0.64
82 processes: 81 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:  6.3% user, 67.2% system,  0.0% nice, 25.3% idle
CPU1 states:  5.3% user, 65.3% system,  0.0% nice, 28.3% idle
Mem:   512868K av,  499144K used,   13724K free,      56K shrd,  242120K buff
Swap:  556012K av,     292K used,  555720K free                  153496K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1041 squid     16   0 19852  19M  1404 S    45.0  3.8   9:01 squid
  826 netsaint   9   0  1088 1088   792 S     4.9  0.2   0:26 netsaint
14092 root       9   0  2444 2444  1980 S     3.1  0.4   0:00 sendmail
14047 root      10   0  2444 2444  1980 S     1.9  0.4   0:00 sendmail
  549 root       9   0  1956 1952  1760 S     1.7  0.3   0:02 sendmail
13860 root      16   0  1028 1028   768 R     1.5  0.2   0:00 top
14011 root      10   0  2440 2440  1980 S     1.1  0.4   0:00 sendmail
13188 root       9   0  2428 2428  1964 S     0.5  0.4   0:01 sendmail
13293 root       9   0  2428 2428  1968 S     0.5  0.4   0:01 sendmail
14109 root       9   0  2420 2420  1988 S     0.5  0.4   0:00 sendmail
13114 apache     8   0  4016 4004  3592 S     0.3  0.7   0:00 httpd
14065 root       9   0  2476 2476  1932 S     0.3  0.4   0:00 sendmail
14106 root       9   0  2300 2300  1960 S     0.3  0.4   0:00 sendmail
14107 root       9   0  2312 2312  2052 S     0.3  0.4   0:00 sendmail
14112 netsaint   9   0   540  540   476 S     0.3  0.1   0:00 check_ping
    6 root       9   0     0    0     0 SW    0.1  0.0   0:01 kupdated
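The process list can be checked against the CPU summary lines on the same screen. Adding up the %CPU column shows how much of the busy time belongs to the listed processes. The following is a minimal sketch with the rows copied from the top output above. It assumes %CPU is expressed relative to a single CPU, so two CPUs give 200% of total capacity. That is an assumption about this top version, not something stated in the report.

```python
# %CPU column of the processes shown in the top output above (PID -> %CPU).
PROCESS_CPU = {
    1041: 45.0, 826: 4.9, 14092: 3.1, 14047: 1.9, 549: 1.7, 13860: 1.5,
    14011: 1.1, 13188: 0.5, 13293: 0.5, 14109: 0.5, 13114: 0.3, 14065: 0.3,
    14106: 0.3, 14107: 0.3, 14112: 0.3, 6: 0.1,
}

# Idle percentage from the "CPUn states" lines of the same top screen.
IDLE = {"CPU0": 25.3, "CPU1": 28.3}


def busy_total(idle_by_cpu):
    """Non-idle time summed over all CPUs, in percent of one CPU."""
    return round(sum(100.0 - idle for idle in idle_by_cpu.values()), 1)


def listed_total(cpu_by_pid):
    """Sum of the %CPU column for the processes top displayed."""
    return round(sum(cpu_by_pid.values()), 1)


if __name__ == "__main__":
    busy = busy_total(IDLE)
    listed = listed_total(PROCESS_CPU)
    print(f"busy across CPUs: {busy}%  listed processes: {listed}%")
    print(f"not attributed to listed processes: {round(busy - listed, 1)}%")
```

Under that assumption, roughly 146% of one CPU is busy, but the listed processes account for only about 62%. The remainder could be kernel work that top does not charge to any process (for example, interrupt handling), or processes that fell below the bottom of the screen. This sketch cannot tell which, but the gap is consistent with the reporter's observation that the per-process numbers look normal while system time is high.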