Bug 138769

Summary: Major slowdown of disk access in X on FC3
Product: [Fedora] Fedora Reporter: Luke Hutchison <luke.hutch>
Component: halAssignee: David Zeuthen <davidz>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 3CC: m.a.young, mclasen, romildo, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-02-21 19:06:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Luke Hutchison 2004-11-11 01:29:57 UTC
In a fresh install of FC3 on a Dell Latitude C840 laptop, as soon as
Xwindows starts, the HDD starts to crawl.  It takes probably a minute
to get any of gdm, kdm or xdm to accept logins.  Full login to GNOME
takes about 10 minutes.  KDE is equally slow, so this is not caused by
the problem GNOME sometimes has with slow logins when the domain name
(or /etc/hosts) is set wrongly.  Strangely, even pure X apps seem to
take a long time to start.  After waiting for the GNOME login to
complete, any further programs (e.g. Firefox) also take a very long
time to start.

The HDD light seems to indicate that when this is happening, disk
access occurs in fairly regular bursts every 0.25s or so, but it seems
like not much work is being done in each burst, because it takes so
long for anything to start up.

I've also tried changing the scheduler to "elevator=deadline",
thinking it might be an IO scheduler problem.  No difference.

SELinux is disabled.  No seemingly relevant error messages are
reported by dmesg, in /var/log/messages, or on the console, even if X
is started from a text terminal.

I tried blowing away /tmp and recreating it in case there were any
weird X lockfiles there causing problems.  No difference.

I tried creating a new user in case the user profile I was logging in
with (copied from an FC2 install) had any dotfiles that were causing
problems.  No difference.

The OS boot and shutdown processes seem to proceed at normal speed, it
is just when X starts that any child processes seem to run like
molasses.  X was a lot faster before upgrading from FC2->FC3.

Another interesting thing is that after logging in completely once,
then logging back out and in, the login is very fast -- i.e. it
appears that if files are cached, there is no speed problem.  Logging
halfway in the first time, then hitting Ctrl-Alt-Backspace, then
logging back in again, causes the first part of the login to proceed
at high speed, until the point where Ctrl-Alt-Backspace was pressed,
then the remainder of the login proceeds slowly.  This makes me think
that there is a problem between the kernel file cache and the drive --
perhaps a driver issue is causing a huge number of interrupts on the bus?

I'm really stuck here, and I have now rendered my boss' laptop useless
-- please advise me what I can do to further diagnose the problem and
get this working again!  Thanks...

Comment 1 Dave Jones 2004-11-11 04:42:36 UTC
can you try turning off the readahead and readahead_early services using
chkconfig and see how much that helps ?

How much RAM does this laptop have?

Comment 2 Luke Hutchison 2004-11-11 22:52:29 UTC
Laptop RAM is 512MB, or 504 after video RAM is subtracted.  There is
212MB free after full GNOME startup, while running Firefox.

I disabled the readahead and readahead_early services while in
runlevel 3, then switched to runlevel 5.  There was no difference in
observed symptoms.

There is actually one console message that surfaces after switching to
runlevel 5:
    KDSKBENT: invalid argument
    failed to bind key 255 to value 2656
..however I don't see how this would be affecting drive access patterns.

The weirdest thing about this is the way the drive works in "stoccato"
mode after entering runlevel 5: activity occurs in bursts at
regularly-spaced intervals.

I don't know what would be different about runlevel 5 to cause this. 
Is this helpful?:

# chkconfig --list
lm_sensors      0:off   1:off   2:on    3:on    4:on    5:on    6:off
haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
vncserver       0:off   1:off   2:off   3:off   4:off   5:off   6:off
bluetooth       0:off   1:off   2:off   3:off   4:off   5:off   6:off
yum             0:off   1:off   2:off   3:off   4:off   5:off   6:off
ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
ypbind          0:off   1:off   2:off   3:off   4:off   5:off   6:off
psacct          0:off   1:off   2:off   3:off   4:off   5:off   6:off
cups            0:off   1:off   2:on    3:on    4:on    5:on    6:off
kudzu           0:off   1:off   2:off   3:on    4:on    5:on    6:off
snmpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off
rhnsd           0:off   1:off   2:off   3:on    4:on    5:on    6:off
diskdump        0:off   1:off   2:off   3:off   4:off   5:off   6:off
smb             0:off   1:off   2:off   3:off   4:off   5:off   6:off
nfs             0:off   1:off   2:off   3:off   4:off   5:off   6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
NetworkManager  0:off   1:off   2:off   3:off   4:off   5:off   6:off
apmd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rpcgssd         0:off   1:off   2:off   3:on    4:on    5:on    6:off
pcmcia          0:off   1:off   2:on    3:on    4:on    5:on    6:off
irqbalance      0:off   1:off   2:off   3:on    4:on    5:on    6:off
irda            0:off   1:off   2:off   3:off   4:off   5:off   6:off
autofs          0:off   1:off   2:off   3:on    4:on    5:on    6:off
mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
smartd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
lisa            0:off   1:off   2:off   3:off   4:off   5:off   6:off
netplugd        0:off   1:off   2:off   3:off   4:off   5:off   6:off
rpcsvcgssd      0:off   1:off   2:off   3:on    4:on    5:on    6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
snmptrapd       0:off   1:off   2:off   3:off   4:off   5:off   6:off
xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off
mdmpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
spamassassin    0:off   1:off   2:off   3:off   4:off   5:off   6:off
saslauthd       0:off   1:off   2:off   3:off   4:off   5:off   6:off
netdump         0:off   1:off   2:off   3:off   4:off   5:off   6:off
xfs             0:off   1:off   2:on    3:on    4:on    5:on    6:off
sendmail        0:off   1:off   2:on    3:on    4:on    5:on    6:off
winbind         0:off   1:off   2:off   3:off   4:off   5:off   6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
nifd            0:off   1:off   2:off   3:on    4:on    5:on    6:off
anacron         0:off   1:off   2:on    3:on    4:on    5:on    6:off
messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
nscd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
microcode_ctl   0:off   1:off   2:off   3:off   4:off   5:off   6:off
portmap         0:off   1:off   2:off   3:on    4:on    5:on    6:off
syslog          0:off   1:off   2:on    3:on    4:on    5:on    6:off
gpm             0:off   1:off   2:on    3:on    4:on    5:on    6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
rpcidmapd       0:off   1:off   2:off   3:on    4:on    5:on    6:off
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
mDNSResponder   0:off   1:off   2:off   3:on    4:on    5:on    6:off
nfslock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
isdn            0:off   1:off   2:on    3:on    4:on    5:on    6:off
readahead       0:off   1:off   2:off   3:off   4:off   5:on    6:off
readahead_early 0:off   1:off   2:off   3:off   4:off   5:on    6:off
acpid           0:off   1:off   2:off   3:on    4:on    5:on    6:off
cups-config-daemon      0:off   1:off   2:off   3:on    4:on    5:on 
  6:off
xinetd based services:
        chargen-udp:    off
        ktalk:  off
        gssftp: off
        chargen:        off
        echo-udp:       off
        rsync:  off
        daytime-udp:    off
        time-udp:       off
        klogin: off
        daytime:        off
        time:   off
        cups-lpd:       off
        kshell: off
        echo:   off
        krb5-telnet:    off
        eklogin:        off



I'm not the first to see peformance like this, apparently:

http://www.fedoraforum.org/forum/archive/index.php/t-24727.html



Here is the output of lspci:

00:00.0 Host bridge: Intel Corp. 82845 845 (Brookdale) Chipset Host
Bridge (rev 04)
00:01.0 PCI bridge: Intel Corp. 82845 845 (Brookdale) Chipset AGP
Bridge (rev 04)
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801 Mobile PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801CAM IDE U100 (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801CA/CAM AC'97
Audio Controller (rev 02)
00:1f.6 Modem: Intel Corp. 82801CA/CAM AC'97 Modem Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4
440 Go] (rev a3)
02:00.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado]
(rev 78)
02:01.0 CardBus bridge: Texas Instruments PCI4451 PC card Cardbus
Controller
02:01.1 CardBus bridge: Texas Instruments PCI4451 PC card Cardbus
Controller
02:01.2 FireWire (IEEE 1394): Texas Instruments PCI4451 IEEE-1394
Controller
02:03.0 CardBus bridge: Texas Instruments PCI1410 PC card Cardbus
Controller (rev 01)



Do you need any other hardware info?


Comment 3 Luke Hutchison 2004-11-11 22:53:49 UTC
Also, do you need the content of any log files?


Comment 4 Luke Hutchison 2004-11-11 23:06:32 UTC
FYI, while in runlevel 5:

# hdparm /dev/hda

/dev/hda:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 65535/16/63, sectors = 60011642880, start = 0

# hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   1268 MB in  2.00 seconds = 634.41 MB/sec
 Timing buffered disk reads:    8 MB in  3.55 seconds =   2.25 MB/sec
[root@laptop ~]#

# hdparm -c0 -d0 -u0 /dev/hda

/dev/hda:
 setting 32-bit IO_support flag to 0
 setting unmaskirq to 0 (off)
 setting using_dma to 0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)

# hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   1304 MB in  2.00 seconds = 650.80 MB/sec
 Timing buffered disk reads:    4 MB in  3.12 seconds =   1.28 MB/sec

# hdparm -c1 -d1 -u1 /dev/hda

/dev/hda:
 setting 32-bit IO_support flag to 1
 setting unmaskirq to 1 (on)
 setting using_dma to 1 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)

# hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   1312 MB in  2.00 seconds = 655.44 MB/sec
 Timing buffered disk reads:   14 MB in  5.68 seconds =   2.47 MB/sec

The "hdparm -c1 -d1 -u1 /dev/hda" command did not speed up the
desktop, although it subtly changed the drive light flashing pattern
(I think the pulses were shorter and closer together, perhaps at 0.2s
intervals or less).

Anyway, the "hdparm -tT" test is really a bandwidth test, and this
appears to be a latency-related problem.


Comment 5 Luke Hutchison 2004-11-12 00:10:18 UTC
I take back what I said about this appearing to be a latency problem.

With readahead and readahead_early enabled, booting into single user
mode, hdparm -tT gives  670.43MB/s / 29.32MB/s.  Subsequently typing
"init 3" and rerunning hdparm -tT gives 650.47MB/s / 14.11MB/s. 
Disabling readahead and readahead_early and rebooting into runlevel 3
and rerunning hdparm gives results around 650.47MB/s again for the
first number, but the second result varies wildly: 612kB/s, 7.94MB/s,
3.32MB/s, 2.64MB/s, 7.86MB/s, 10.46MB/s.  (It is possible the first
result was due to some other concurrent disk activity I guess.)  Init
5 now gives 628.10MB/s / 12.88MB/s, although the second result varies
again: it was 3.61MB/s on the next run.

The most interesting thing is that looking at top, when things are
going really slow, almost 100% of the CPU time is spent in what looks
like a wait state (I'm guessing that's what "99% wa" means), i.e. not
in idle, user, system, nice etc.  Also, the CPU fan comes on and stays
on for the whole several-minute duration of a GNOME login, indicating
it is grinding away at something.  Perhaps it is sitting in a spinlock?

Anything else I can try?


Comment 6 Luke Hutchison 2004-11-12 00:21:36 UTC
Also, should I try updating the BIOS?  It may fix the problem, but it
is possible that if there is a BIOS problem, it may be exposing a
possible strange race condition in the kernel, so it may be better to
debug what's really going on first.

My current BIOS is A02, the BIOS releases are at least up to A12.  I
haven't had a chance to see what has been fixed in the interim releases.


Comment 7 Josi Romildo Malaquias 2004-11-12 10:26:08 UTC
I am having this problem too, in a fresh install of FC 3.

Comment 8 Luke Hutchison 2004-11-12 14:55:28 UTC
Josi -- what is your hardware platform?

Comment 9 Michael Young 2004-11-13 09:26:26 UTC
I suspect this is a well known bug with laptops which have hard discs
and CD drives on the same IDE channel, where disc accesses are slow
when the CD drive is being talked to. Unfortunately, there are
processes which continually ask the CD drive whether it contains a new
CD in the drive, and these have the same effect. In earlier FCs
magicdev used to do this, and it seems that the haldaemon hald does
this in FC3. Try disabling this and see if performance improves
(service haldaemon stop - though check there is no hald process
running afterwards).

Comment 10 Luke Hutchison 2004-11-15 22:12:51 UTC
Disabling haldaemon fixed the problem.  Thanks!

Do you know what causes the slowdown?  It seems strange to me that
polling the CD drive should result in such a large decrease in IDE bus
throughput.  Is this a problem with the kernel, with the hardware, or
with the BIOS?

Is it possible to have HAL poll less frequently?  It appears to be
polling 4 times per second right now.

Comment 11 Luke Hutchison 2004-11-18 16:09:58 UTC
Alternatively, is there a faster way to check the CDROM drive status?
 It shouldn't knock out the IDE bus for ~0.2 sec each time the drive
state is polled.

Comment 12 David Zeuthen 2004-11-30 22:55:47 UTC
This is being tracked in bug 138148 - it appears that there is a way
to poll without powering up the optical drive, but the kernel appears
to send some commands on open(2) or close(2) that powers up the drive
and thus locks the IDE channel making performance bad on the hard disk.

*** This bug has been marked as a duplicate of 138148 ***

Comment 13 Red Hat Bugzilla 2006-02-21 19:06:54 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.