Bug 26609 - Uptime's load average stuck at 1.0 after loopback device mounted
Summary: Uptime's load average stuck at 1.0 after loopback device mounted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-02-08 00:52 UTC by jbednar
Modified: 2007-04-18 16:31 UTC
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-02-08 14:28:25 UTC
Embargoed:



Description jbednar 2001-02-08 00:52:48 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.12-25 i686)


In Red Hat 7.0, mounting a loopback device for an ISO CD image using "mount
-o loop,ro -t iso9660" causes the system load reported by uptime(1) to climb
to 1.0 and stay there, even after unmounting the device and removing unused
kernel modules with "rmmod -a".  The only way I've found to get the load
back down to zero is to reboot.  Under Red Hat 6.0, mounting loopback
devices has no discernible effect on the system load.  Although the "load
average" is 1.0, the CPU idle percentage reported by top(1) remains above
95%, so I don't know whether actual CPU cycles are being burned
continuously, or whether the loopback device just corrupts something (e.g.
some system file), causing uptime to erroneously report a high load.

Reproducible: Always
Steps to Reproduce/Actual Results:
The transcript below shows the steps taken and the results. I did the tests
in single-user mode with nothing else of interest running to show that
mounting the loopback device is indeed what affects the system load.  This
particular test is on a two-processor Celeron machine, but I reproduced it
exactly using a single-processor Athlon machine as well, so any Red Hat 7.0
machine will probably behave similarly.

$ mkdir foo
$ cp /etc/fstab /etc/issue foo/
$ mkisofs -o foo.iso foo
Total translation table size: 0
Total rockridge attributes bytes: 0
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 5064
27 extents written (0 Mb)
$ mkdir mnt
$ uptime
  2:35am  up 2 min,  0 users,  load average: 0.08, 0.10, 0.04
$ mount -o loop,ro -t iso9660 foo.iso mnt/
$ ls mnt/
fstab  issue
$ uptime
  2:35am  up 3 min,  0 users,  load average: 0.20, 0.12, 0.05
$ umount mnt/
$ uptime
  2:35am  up 3 min,  0 users,  load average: 0.33, 0.15, 0.06
$ uptime
  2:36am  up 3 min,  0 users,  load average: 0.38, 0.16, 0.06
$ uptime
  2:36am  up 3 min,  0 users,  load average: 0.59, 0.23, 0.09
$ uptime
  2:36am  up 3 min,  0 users,  load average: 0.65, 0.25, 0.10
$ rmmod loop
$ uptime
  2:36am  up 3 min,  0 users,  load average: 0.68, 0.27, 0.10
$ lsmod | grep loop
$ uptime
  2:36am  up 4 min,  0 users,  load average: 0.73, 0.29, 0.11
$ top | head 
(null)  2:37am  up 4 min,  0 users,  load average: 0.88, 0.40, 0.15
14 processes: 13 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.1% user,  0.0% system,  0.0% nice, 99.0% idle
Mem:   646804K av,   28044K used,  618760K free,    7832K shrd,    3480K buff
Swap:  899556K av,       0K used,  899556K free                   11460K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    1 root       0   0   548  548   504 S     0.0  0.0   0:05 init
    2 root       0   0     0    0     0 SW    0.0  0.0   0:05 kflushd
q
$ uptime
  2:38am  up 5 min,  0 users,  load average: 0.92, 0.44, 0.17
$ ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:05 init [
    2 ?        SW     0:05 [kflushd]
    3 ?        SW     0:00 [kupdate]
    4 ?        SW     0:00 [kpiod]
    5 ?        SW     0:00 [kswapd]
    6 ?        SW<    0:00 [mdrecoveryd]
  186 tty1     S      0:00 init [
  187 tty1     S      0:00 /bin/sh
  188 tty1     R      0:00 emacs
  189 pts/0    S      0:00 /bin/bash -i
  190 pts/0    S      0:00 -csh
  213 pts/0    D      0:00 /sbin/modprobe -s -k block-major-7
  235 pts/0    R      0:00 ps ax
$ ps
  PID TTY          TIME CMD
  189 pts/0    00:00:00 bash
  190 pts/0    00:00:00 tcsh
  213 pts/0    00:00:00 loopd
  236 pts/0    00:00:00 ps
$ uptime
  2:38am  up 6 min,  0 users,  load average: 0.97, 0.54, 0.22
$ kill 213
$ uptime
  2:39am  up 6 min,  0 users,  load average: 0.97, 0.54, 0.22
$ uptime
  2:40am  up 8 min,  0 users,  load average: 0.99, 0.68, 0.31
$ 
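
(For anyone reproducing this: a quick way to watch the load average alongside
any process stuck waiting on IO is something like the loop below.  The exact
ps options are only a sketch and may need adjusting for the procps version in
use.)

$ while true; do
>   cat /proc/loadavg                       # 1-, 5- and 15-minute load averages
>   ps -eo pid,stat,args | awk '$2 ~ /D/'   # processes in uninterruptible sleep
>   sleep 10
> done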


Expected Results:  Under Red Hat 6.0, uptime shows a load of about 0.04
immediately after the mount, and if I don't do anything for a while it goes
down to 0.00 as it should.  Here, instead, it climbs to about 1.00 and never
goes back down, no matter how idle the machine is or whether anything is
actually going on.

This problem is significant because I am using the machine as a CD server
to serve the files from various mounted ISO CD images, which worked fine
under Red Hat 6.0.

Comment 1 Bill Nottingham 2001-02-08 01:44:29 UTC
This is fixed in a later kernel package; either the current 2.4 kernels in
Raw Hide, or previous 2.2.17-x kernels from Red Hat should not have this problem.

Comment 2 jbednar 2001-02-08 02:33:22 UTC
Thanks for the info.  I'm using kernel-2.2.16-22, which is the 
latest one available from:
ftp://updates.redhat.com/redhat/redhat-7.0/i386/en/RedHat/RPMS/
If there's a release of 2.2.17 available, I don't know where it would be.

As for switching to Rawhide's 2.4 kernel: do you know if this is a serious bug,
i.e. whether CPU time is actually being wasted permanently, or whether it just
affects the display?  I don't really want to make the system any more
unstable...

Comment 3 jbednar 2001-02-08 06:03:04 UTC
I just tried out Rawhide's kernel-enterprise-2.4.0-0.99.23.i686.rpm; although
it improves matters, it does not fix them.

Under 2.4 I tried mounting 15 loopback filesystems.  That worked under the SMP
kernel distributed with RH7.0, which allows 16 such devices, but the enterprise
2.4 kernel appears to support only 8.  This is odd, because an enterprise system
is precisely the sort of system that might be set up as a CD server as described
in the CD-Server-HOWTO, so presumably a larger limit should be used, e.g. 32
(for MAX_LOOP in loop.c).
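
(Side note: if the loop driver is built as a module, it might be possible to
raise the limit without recompiling.  I believe 2.4's loop.c exposes the limit
as a max_loop module parameter, so something like the following may work --
I have not verified it on this particular kernel; otherwise MAX_LOOP has to be
changed in drivers/block/loop.c and the kernel rebuilt.)

$ rmmod loop
$ modprobe loop max_loop=32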

Anyway, when the 8 loopback mounts succeeded, the load reported by uptime shot
up to 7.0, and if I'd waited long enough it probably would have reached 8.0.
(I've never seen a real definition of "load", but those numbers are much higher
than the loads seen during typical low-intensity use.)  The improvement over the
stock 7.0 kernel is that the load went back down when I unmounted the devices,
whereas the 7.0 kernel kept the high load forever.  But I can't imagine why
there should be such a high load in the first place!  Under RH6.0 mounting
loopback devices has no apparent effect on the system load, which makes sense,
because surely the system doesn't need to do much work keeping track of a
read-only filesystem that isn't being used.  And since I want to use the machine
both as an occasionally-active CD server and as a regular machine, having a
high load while loopbacks are mounted but idle is not OK.

Anyway, the 2.4 kernel didn't work for me for other reasons (PPP failed
entirely, and something was going on with NFS lockd), and I'd try to figure
those out if the loopbacks were really working under 2.4.  But they don't seem
to be.

Comment 4 Arjan van de Ven 2001-02-08 14:28:21 UTC
"load" is calculated as "nr of processes wanting the CPU plus the nr of
processes waiting for IO". It seems 2.4.0 loop (which has some known issues)
increases that load. 2.2.16 loop had a bug that caused the kernel to think the
kernel-thread for loop was always doing IO. In both cases, your CPU was not
loaded.
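
(A quick way to see what is being counted here is to compare the load average
against the number of processes that are runnable or blocked on IO.  A rough
sketch, assuming a procps ps that accepts "-eo state=":)

$ cat /proc/loadavg                  # first three fields: 1/5/15-minute averages
$ ps -eo state= | grep -c '^[RD]'    # runnable (R) plus uninterruptible IO wait (D)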

Comment 5 Michael K. Johnson 2001-02-09 01:43:17 UTC
The 2.2.17-14 errata kernel (just released) should fix this.

Comment 6 jbednar 2001-02-09 09:17:09 UTC
Ok, I just tried the new kernel-smp-2.2.17-14 and that does fix the problem,
with no apparent bad side-effects.  Thanks!
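
(For completeness: after rebooting, the running kernel can be confirmed with
uname -r; the smp flavor of the errata kernel should report something like
2.2.17-14smp.)

$ uname -r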

