Bug 239445 - High CPU load with file move
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Martin Jenner
 
Reported: 2007-05-08 10:42 EDT by Brian Wheeler
Modified: 2008-04-09 14:33 EDT
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2008-04-09 14:33:42 EDT


Attachments
Requested statistics (110.00 KB, application/x-tar)
2007-10-18 09:31 EDT, Brian Wheeler

Description Brian Wheeler 2007-05-08 10:42:26 EDT
Description of problem:


Version-Release number of selected component (if applicable):

2.6.18-8.1.3.el5

How reproducible:

Every time

Steps to Reproduce:
1. Move a bunch of big files from one filesystem to another.
  
Actual results:

loadavg goes to 5+

Expected results:

Low load average

Additional info:

The machine is configured like this:

4 dual-core AMD Opteron 2880s and 36G of RAM
a pair of striped 146G SAS drives (mounted as /scratch)
an IBM Fibre Channel disk array (multipath-failover) volume mounted as /digitize
all filesystems are ext3

I'm moving a bunch of files from /digitize to /scratch.  The files average about
200M each.
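
For reference, a rough sketch of the kind of move that triggers this (the
file names below are placeholders, not the actual ones):
--------------
# Move a batch of ~200M files from the FC array volume to the local stripe:
mv /digitize/capture-*.img /scratch/

# In another terminal, watch the load average climb while the move runs:
watch -n 5 'cat /proc/loadavg'
--------------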



Snapshot of top:
--------------
top - 10:40:20 up 20:33,  2 users,  load average: 5.64, 5.71, 6.16
Tasks: 195 total,   2 running, 191 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us,  8.0%sy,  0.0%ni, 43.4%id, 46.7%wa,  0.0%hi,  1.8%si,  0.0%st
Mem:  37120680k total, 37048676k used,    72004k free,    50584k buffers
Swap:  2031608k total,      156k used,  2031452k free, 36071420k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
30718 root      18   0 68748  808  656 D   42  0.0  18:45.64 mv                 
 2710 root      10  -5     0    0    0 R   25  0.0   5:12.45 kjournald          
  303 root      15   0     0    0    0 D    0  0.0   0:00.08 pdflush            
  359 root      15   0     0    0    0 D    0  0.0   0:00.05 pdflush            
32739 root      15   0     0    0    0 D    0  0.0   0:00.36 pdflush            
  304 root      15   0     0    0    0 D    0  0.0   0:00.17 pdflush            
  354 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush            
  356 root      15   0     0    0    0 D    0  0.0   0:00.03 pdflush            
  357 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush            
  362 root      15   0     0    0    0 D    0  0.0   0:00.07 pdflush            
32718 root      15   0     0    0    0 D    0  0.0   0:00.24 pdflush            
32731 root      15   0     0    0    0 D    0  0.0   0:00.14 pdflush            
32736 bdwheele  15   0 12708 1136  800 R    0  0.0   0:00.12 top     
--------------

The file move triggered our load monitors and I'm concerned about performance
issues when we move this machine into production.
Comment 1 Brian Wheeler 2007-10-09 14:05:49 EDT
It appears that this happens on file copies as well.  Running cp on a huge
file (268G -- a VM disk file) ramps up the load average.  The only processes on
the machine that are not sleeping are cp, kjournald, pdflush, and top.

This is with kernel-2.6.18-8.1.14.el5
Comment 2 Ivan Vecera 2007-10-18 08:39:48 EDT
Hello, I built the latest kernel package; could you please test it on your system?

Could you please repeat your move/copy test? When the load gets too high, please
run the following commands in parallel:
1) vmstat -n 1 60 > vmstat.txt
2) iostat /dev/sd? 1 60 > iostat.txt
3) mpstat -P ALL 1 60 > mpstat.txt
4) sysctl vm > sysctl.txt
5) ps -ea > ps.txt
6) ...and also another snapshot of the top output

Then send me the result files (vmstat.txt, iostat.txt, ...).
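
For example, the collectors could be run in parallel with a small wrapper like
this (just a sketch; the device glob and output file names are assumptions,
adjust them to your disks):
--------------
# Run the 60-second collectors in the background, the one-shot ones inline:
vmstat -n 1 60 > vmstat.txt &
iostat /dev/sd? 1 60 > iostat.txt &
mpstat -P ALL 1 60 > mpstat.txt &
sysctl vm > sysctl.txt
ps -ea > ps.txt
top -b -n 1 > top.txt        # batch-mode snapshot instead of a screenshot
wait                         # wait for the background collectors to finish
tar cf bz239445-stats.tar vmstat.txt iostat.txt mpstat.txt sysctl.txt ps.txt top.txt
--------------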
The kernel packages for i686 and x86_64 can be downloaded at
http://people.redhat.com/ivecera/bz239445/

Thank you

-- Ivan
Comment 3 Brian Wheeler 2007-10-18 09:31:11 EDT
Created attachment 231051 [details]
Requested statistics
Comment 4 Brian Wheeler 2007-10-18 09:36:11 EDT
I've attached the statistics and I've got a few notes.

This doesn't happen with an rm: I maxed out at a 1.78 load average with top,
an idle JVM, and some random CGIs running, but the machine otherwise idle.
One CPU was always at 96% wait, though it would move from CPU to CPU.

The idle load prior to starting the test was < 0.1

During the test (cp disk0.orig.img disk0.img, a 270G file) the load average rose
to around 7.  Later in the copy the average went down to ~5 once the kswapd
threads kicked in as the cache was being thrashed.  Only 156k went to swap, so
I assume it was mostly just paging text segments out to nowhere...and then
probably paging them back in as needed.
Comment 6 Larry Woodman 2007-12-13 17:24:54 EST
Is the high CPU load you are seeing a problem, or is it just an observation? The
pdflush daemons write until the IO queues fill, then block uninterruptibly until
the devices catch up.  Linux includes processes blocking in an uninterruptible
state (like pdflush) in the load average as seen via top, uptime and
/proc/loadavg, just as though they were actually runnable.

If this is not a problem, just an observation, then this is simply the way Linux
works.  If you are seeing other performance issues, then we have to look closer
at this issue.
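
A quick way to see this effect for yourself (just a sketch, assuming a standard
procps ps) is to list the D-state processes next to the current load average:
--------------
# Processes in uninterruptible sleep (state D) count toward the load
# average even though they consume no CPU time:
cat /proc/loadavg
ps -eo state,pid,comm | awk '$1 == "D"'
--------------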

Larry Woodman
