Bug 239445 - High CPU load with file move
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Martin Jenner
 
Reported: 2007-05-08 10:42 EDT by Brian Wheeler
Modified: 2008-04-09 14:33 EDT
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2008-04-09 14:33:42 EDT


Attachments
Requested statistics (110.00 KB, application/x-tar)
2007-10-18 09:31 EDT, Brian Wheeler

Description Brian Wheeler 2007-05-08 10:42:26 EDT
Description of problem:


Version-Release number of selected component (if applicable):

2.6.18-8.1.3.el5

How reproducible:

Every time

Steps to Reproduce:
1. Move a bunch of big files from one filesystem to another.
  
Actual results:

loadavg goes to 5+

Expected results:

Low load average

Additional info:

The machine is configured like this:

4 dual-core AMD Opteron 2880s and 36G of RAM
a pair of striped 146G SAS drives (mounted as /scratch)
an IBM Fibre Channel disk array (multipath-failover) volume mounted as /digitize
all filesystems are ext3

I'm moving a bunch of files from /digitize to /scratch.  The files average about
200M each.
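
For reference, a rough sketch of the kind of move that triggers this (the
file names below are placeholders, not the actual ones):
--------------
# Move a batch of ~200M files from the FC array volume to the local stripe:
mv /digitize/capture-*.img /scratch/

# In another terminal, watch the load average climb while the move runs:
watch -n 5 'cat /proc/loadavg'
--------------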



Snapshot of top:
--------------
top - 10:40:20 up 20:33,  2 users,  load average: 5.64, 5.71, 6.16
Tasks: 195 total,   2 running, 191 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us,  8.0%sy,  0.0%ni, 43.4%id, 46.7%wa,  0.0%hi,  1.8%si,  0.0%st
Mem:  37120680k total, 37048676k used,    72004k free,    50584k buffers
Swap:  2031608k total,      156k used,  2031452k free, 36071420k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
30718 root      18   0 68748  808  656 D   42  0.0  18:45.64 mv                 
 2710 root      10  -5     0    0    0 R   25  0.0   5:12.45 kjournald          
  303 root      15   0     0    0    0 D    0  0.0   0:00.08 pdflush            
  359 root      15   0     0    0    0 D    0  0.0   0:00.05 pdflush            
32739 root      15   0     0    0    0 D    0  0.0   0:00.36 pdflush            
  304 root      15   0     0    0    0 D    0  0.0   0:00.17 pdflush            
  354 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush            
  356 root      15   0     0    0    0 D    0  0.0   0:00.03 pdflush            
  357 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush            
  362 root      15   0     0    0    0 D    0  0.0   0:00.07 pdflush            
32718 root      15   0     0    0    0 D    0  0.0   0:00.24 pdflush            
32731 root      15   0     0    0    0 D    0  0.0   0:00.14 pdflush            
32736 bdwheele  15   0 12708 1136  800 R    0  0.0   0:00.12 top     
--------------

The file move triggered our load monitors and I'm concerned about performance
issues when we move this machine into production.
Comment 1 Brian Wheeler 2007-10-09 14:05:49 EDT
It appears that this happens on file copies as well.  Running cp on a huge
file (268G -- a VM disk file) ramps up the load average.  The only processes on
the machine that are not sleeping are cp, kjournald, pdflush, and top.

This is with kernel-2.6.18-8.1.14.el5
Comment 2 Ivan Vecera 2007-10-18 08:39:48 EDT
Hello, I built the latest kernel package; could you please test it on your system?

Could you please repeat your move/copy test? When the load gets too high, please
run the following commands in parallel:
1) vmstat -n 1 60 > vmstat.txt
2) iostat /dev/sd? 1 60 > iostat.txt
3) mpstat -P ALL 1 60 > mpstat.txt
4) sysctl vm > sysctl.txt
5) ps -ea > ps.txt
6) ...and also another snapshot of the top output

Then send me the result files (vmstat.txt, iostat.txt, ...).
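
For example, the collectors could be run in parallel with a small wrapper like
this (just a sketch; the device glob and output file names are assumptions,
adjust them to your disks):
--------------
# Run the 60-second collectors in the background, the one-shot ones inline:
vmstat -n 1 60 > vmstat.txt &
iostat /dev/sd? 1 60 > iostat.txt &
mpstat -P ALL 1 60 > mpstat.txt &
sysctl vm > sysctl.txt
ps -ea > ps.txt
top -b -n 1 > top.txt        # batch-mode snapshot instead of a screenshot
wait                         # wait for the background collectors to finish
tar cf bz239445-stats.tar vmstat.txt iostat.txt mpstat.txt sysctl.txt ps.txt top.txt
--------------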
The kernel packages for i686 and x86_64 can be downloaded at
http://people.redhat.com/ivecera/bz239445/

Thank you

-- Ivan
Comment 3 Brian Wheeler 2007-10-18 09:31:11 EDT
Created attachment 231051 [details]
Requested statistics
Comment 4 Brian Wheeler 2007-10-18 09:36:11 EDT
I've attached the statistics and I've got a few notes.

This doesn't happen with an rm: I maxed out at a 1.78 load average with top,
an idle JVM, and some random CGIs running, but the machine otherwise idle.
One CPU was always at 96% wait, though it would move from CPU to CPU.

The idle load prior to starting the test was < 0.1

During the test (cp disk0.orig.img disk0.img, a 270G file) the load average rose
to around 7.  Later in the copy the average went down to ~5 once the kswapd
threads kicked in as the cache was being thrashed.  Only 156k went to swap, so
I assume it was mostly just paging text segments out to nowhere...and then
probably paging them back in as needed.
Comment 6 Larry Woodman 2007-12-13 17:24:54 EST
Is the high CPU load you are seeing a problem, or is it just an observation? The
pdflush daemons write until the IO queues fill, then block uninterruptibly until
the devices catch up.  Linux includes processes blocking in an uninterruptible
state (like pdflush) in the load average as seen via top, uptime and
/proc/loadavg, just as though they were actually runnable.

If this is not a problem, just an observation, then this is simply the way Linux
works.  If you are seeing other performance issues, then we have to look closer
at this issue.
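
A quick way to see this effect for yourself (just a sketch, assuming a standard
procps ps) is to list the D-state processes next to the current load average:
--------------
# Processes in uninterruptible sleep (state D) count toward the load
# average even though they consume no CPU time:
cat /proc/loadavg
ps -eo state,pid,comm | awk '$1 == "D"'
--------------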

Larry Woodman
