Bug 395441 - When kde is running, access to disk file mounted with loopback is terribly slow [was: ppp]
Summary: When kde is running, access to disk file mounted with loopback is terribly sl...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: i686
OS: Linux
low
high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 427887
TreeView+ depends on / blocked
 
Reported: 2007-11-22 10:13 UTC by Wojciech Pilorz
Modified: 2008-10-04 05:07 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-04 05:07:46 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
iso 9660 image for testing and command to recreate it (3.15 KB, application/octet-stream)
2008-01-30 21:56 UTC, Wojciech Pilorz
no flags Details
bash command to do tests and results on Fedora8 (2.24 KB, application/octet-stream)
2008-01-30 22:06 UTC, Wojciech Pilorz
no flags Details
a few snapshots of /proc/interrupts (4.25 KB, text/plain)
2008-02-21 18:50 UTC, Wojciech Pilorz
no flags Details
dmesg output (23.26 KB, text/plain)
2008-02-21 18:51 UTC, Wojciech Pilorz
no flags Details
Contents of /proc/timer_list (2.78 KB, text/plain)
2008-02-21 19:00 UTC, Wojciech Pilorz
no flags Details
lspci output of two machines (11.44 KB, application/octet-stream)
2008-03-21 10:37 UTC, Wojciech Pilorz
no flags Details

Description Wojciech Pilorz 2007-11-22 10:13:39 UTC
Description of problem:

Thie is really weird.
If I have, say, a CD ISO image on the disk, I can read it with disk speed,
about 30MB/s on my hardware, the disk light in on during the read process.

Unfortunately, if both of the following is true:
1. the file is mounted with lookback option
2. ppp interface had been configured since boot time

then reading the files becaomes terribly slow, less than 2MB/s
the disk light is barely blinking;
vmstat and ioststa shows a lot of time spent in iowait


Version-Release number of selected component (if applicable):
Linux version 2.6.22.9-61.fc6 ( kernel-2.6.22.9-61.fc6.i686 )
ppp-2.4.4-1.fc6.i386

How reproducible:
Always in my setup; this is P4 Compaq D51S machine with 256MB of RAM,
running Fedora Core 6 with KDE.

Steps to Reproduce:
1. create test ISO image of 100MiB size as follows:

mkdir tmp_test_cd
cd tmp_test_cd
mkdir CD
dd if=/dev/zero bs=1M count=10 of=CD/NULLs_1Mi.f00
cd CD/
for x in $(perl -e 'for ("01" .. "10") { print "$_\n"}'); do ln NULLs_1Mi.f00
NULLs_1Mi.f$x; done
cd ..
mkisofs -r -q -no-cache-inodes CD > CD_100Mi.iso


2. mount it as loopback (as root) :
mkdir /mnt/loop /mnt/loop/CD
mount -o ro,loop tmp_test_cd/CD_100MI.iso /mnt/loop/CD

3. check transfer rate :
time md5sum -b tmp_test_cd/CD_100MI.iso

4. have another Linux PC connected via null modem cable (connecting two serial
ports)

5. run on this machine
pppd /dev/ttyS1 115200 crtscts nodefaultroute lock 10.0.1.11:10.0.1.21 deflate 12
and on the other machine
pppd /dev/ttyS0 115200 crtscts nodefaultroute lock 10.0.0.21:10.0.0.11 deflate
12 passive

(NOTE: ttyS0 or ttyS1 should reflect to which serial port the cable is connected)

6. check again reading the file
time md5sum -b tmp_test_cd/CD_100MI.iso
as well as reading files on mountes filesystem
md5sum -b /mnt/loop/CD/*
which is even slower, below 1MB/s
During read process vmstat shows like this

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3 126136   3236   2920  63604    1    1   204    39   96  188  2  1 96  2  0
 0  2 126136   4508   2920  62164    0    0  1154     0  108  935  8  2  0 90  0
 1  3 126136   3984   2924  62716    0    0  2054     5  114  838  3  3  0 95  0
 0  3 126132   3576   2928  62988    0    0  2259     0  121  825 12  2  0 86  0
 0  2 126132   4164   2836  62392    0    0   896     0  110 1115 14  3  0 84  0
 0  3 126132   2960   2828  63704    0    0  3331     4  110 1175  4  3  0 94  0
 0  2 126132   4140   2796  62492    0    0   896     0   98  599  3  1  0 96  0
 0  3 126132   3060   2956  62904  122    0  1986    10  173  994  4  3  0 94  0
 0  3 126132   3832   2924  62204    0    0  2178     0  105  892  3  1  0 95  0
 0  2 126132   4196   2808  61932    0    0  2560    15  105  918  3  2  0 95  0
 0  3 126132   4072   2808  62092    0    0  2178     0  103  842  4  2  0 95  0
 0  3 126132   3060   2804  62984    0    0  1664     0  101  797  4  1  0 96  0

please note that we have 94% - 96% in iowait state.


  
Actual results:
Intolerably slow access to files in mounted loopback system
and to the mounted image file.

If I kill pppd and remove via rmmod all modules I could find 
relating to pppd, 
reading is still very slow.
The only way I could find get work it with normal speed is to reboot
the system (and do not configure ppp)

Expected results:
Should be able to read files from fikleysstem mounted via loopback 
at normal speed even if ppp is configured.
On by setup loopback mount is almost unusable when I use pppd with null modem
connection to another Linux machine.

Additional info:
nothing relevant in /var/log/messages or dmesg output

Comment 1 Jon Stanley 2008-01-08 01:50:46 UTC
(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 2 Wojciech Pilorz 2008-01-28 15:11:54 UTC
It seems that the real factor deciding whether the problem occurs
is whether KDE environment is started or not.
I have missed this initially since I have KDE running most of the time.

I have reproduced the problem on Fedora 8 with kernel 2.6.23.9-85.fc8.

The interesting thing is that while the problem is also evident
 on FedoraCore 6 with kernel kernel-2.6.22.14-72.fc6.i686, the original FC6
kernel kernel-2.6.18-1.2798.fc6.i686 is perfectly OK.
I will follow with more details later today.

Comment 3 Wojciech Pilorz 2008-01-30 21:56:52 UTC
Created attachment 293492 [details]
iso 9660 image for testing and command to recreate it

File CD_700M.iso_gen.tgz contains:
mk_tmp_test_largecd.bash_hist  - bash commnand to recreate file CD_700Mi.iso
CD_700Mi_files_LRGCD.tar.gz.gz.gz  - tar of LRGCD directory, compressed 3 times

CD_700Mi.iso.gz.gz.gz  - 700MiB CD ISO image, compressed 3 times
CD_700Mi.iso.md5  - MD5 of decompressed ISO image

Comment 4 Wojciech Pilorz 2008-01-30 22:06:43 UTC
Created attachment 293497 [details]
bash command to do tests and results on Fedora8

File CD_700M_Fedora8_tests.tgz contain the following files:
080128_CD700loop_F8.bash_hist  - command to run tests; path needs to be   
corrected to one the image is actually installed
080128_CD700loop_F8_mounted1.ddlog  - screen log  of running commnands
080128_CD700loop_F8_NOTmounted2_KDE.iostat5 - results of iostat -x 5 5
080128_CD700loop_F8_NOTmounted2_KDE.ddlog - screen log of running commands
080128_CD700loop_F8_mounted3_KDE.iostat5 - results of iostat -x 5 5
080128_CD700loop_F8_mounted3_KDE.ddlog - screen log of running commands

*mounted1* is with only GDM running (no KDE)
*NOTmouned2* is with KDE running but without mounting image
*mounted3* is with KDE running and image loop mounted

Comment 5 Wojciech Pilorz 2008-01-30 22:51:39 UTC
I tend to believe that the problem I experience
is related to new 'tickless' (nohz) kernel feature.

This is because
- 2.6.18 Fedora6 kernel is OK, and it does not have nohz
- 2.6.22, 2.6.23 Fedora kernels on P4 machines I tried, where nohz actually is
  turned on (verified by grep timer /proc/interrups returning small almost
  constant number) - there is problem here On my Fedora8 P4 machine (256MB
  RAM, i8xx based mobo) it is enough to start KDE with 2 konsole-s, 4 bash
  session each and access to image mounted via loopback is most of the time
  about 1.3 - 1.5 MB/s down from 25-30MB/s expected.

  On my Fedora6 P4 machine (256MB RAM, same mobo) and 2.6.22 Fedora kernel
  sometimes it is enough to start KDE (I have about 10konsoles with about 60
  bash sessions total, plus konqueror, tsclient, some CIFS mounts)
  sometimes I also need to start pppd over serial cable
- 2.6.22 on my home machine with Fedora Core 6 does have nohz (tickless)
  feature, but somehow it does not get enabled, which is verified
  by grep timer /proc/interrupts
  returning number increasing at 1000/second rate
  (that machine is old PIII system, Intel BX440 based motherboard)

Notes:
- I can see that even without KDE running Fedora8 is about 2 times slower
than Fedora6, when accessing the image file.

- If the tests are being performed on machine with more than 700 MB of RAM,
I would suggest that larger image be created or that 
pagecache be freed between tests via
echo 1 > /proc/sys/vm/drop_caches

- Also, If CPU is slower than P4, you might try replacing md5sum with wc -c.

- I attempted to disable nohz on P4 machine (Fedora8) by passing
  nohz=off
  in kernel parameters from grub, but that did not disable nohz feature.




Comment 6 Christopher Brown 2008-02-14 00:00:03 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

(In reply to comment #5)

> - I attempted to disable nohz on P4 machine (Fedora8) by passing
>   nohz=off
>   in kernel parameters from grub, but that did not disable nohz feature.

Have you tried editing the kernel parameters directly in grub.conf?

Comment 7 Wojciech Pilorz 2008-02-20 19:15:32 UTC
Ok, to be usre I tried today with most recent Fedora kernel;
The grub.conf entry I was using was
title Fedora nohz=off (2.6.23.15-137.fc8)
        root (hd0,2)
        kernel /boot/vmlinuz-2.6.23.15-137.fc8 ro root=LABEL=/ rhgb quiet nohz=off
        initrd /boot/initrd-2.6.23.15-137.fc8.img

Output of dmesg command contains the line
Kernel command line: ro root=LABEL=/ rhgb quiet nohz=off

grep timer /proc/interrupts
show constant number during the test,
while I expected that nohz=off would result
in a number of timer interrupts increasing at a 1000/second rate
It seems like nohz=off gets ignored.

Anyway, the problem is still present in this kernel.
starting KDE results in read data transfer rate
for a mounted loopback file to  drop from 32MiB/s to 1.4MiB/s (more than 20
times slower).

Should I attach more details?

Wojtek


Comment 8 Chuck Ebbert 2008-02-20 22:23:19 UTC
(In reply to comment #7)
> grep timer /proc/interrupts
> show constant number during the test,
> while I expected that nohz=off would result
> in a number of timer interrupts increasing at a 1000/second rate
> It seems like nohz=off gets ignored.
> 

Timers run from the local apic sometimes. Is the "LOC" number increasing?


Comment 9 Wojciech Pilorz 2008-02-21 18:36:36 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > grep timer /proc/interrupts
> > show constant number during the test,
> > while I expected that nohz=off would result
> > in a number of timer interrupts increasing at a 1000/second rate
> > It seems like nohz=off gets ignored.
> > 
> 
> Timers run from the local apic sometimes. Is the "LOC" number increasing?
> 
Yes, LOC number is increasing.
I did not pay attention to it (and I cannot access that machine today), so I
cannot tell at what rate (but the order of magnitude is same as or close to that
of 1000/s)
I will attach dmesg output and result of cat /proc/timer_list
that I saved.
Perhaps the problems with data transfer rate when reading from loop-back mounted
images are unrelated to dynticks/nohz?


Comment 10 Wojciech Pilorz 2008-02-21 18:50:42 UTC
Created attachment 295551 [details]
a few snapshots of /proc/interrupts

Actually I have saved enough information to compute
LOC rate - it is exactly 1000/s (+-2%).
But it is the same on some other machine (2CPU PIII server running
FC6 and 2.6.22.14-72.fc6 kernel) which do not has nohz=off kernel parameter
specified. So I am not sure is LOC number rate can be used to determine whether

nohz=off was effective?

Comment 11 Wojciech Pilorz 2008-02-21 18:51:58 UTC
Created attachment 295552 [details]
dmesg output

Output from running dmesg commnad on the machine the tests were running on.

Comment 12 Wojciech Pilorz 2008-02-21 19:00:33 UTC
Created attachment 295553 [details]
Contents of /proc/timer_list

In case it could be useful, I am enclosing results
of cat /proc/timer_list.
The contents was taken before doing transfer tests (same for other files
attached)

Comment 13 Chuck Ebbert 2008-03-18 23:47:44 UTC
please post output of 'lspci -vnn' from  the machine that is having the problem.

Comment 14 Wojciech Pilorz 2008-03-21 10:37:43 UTC
Created attachment 298786 [details]
lspci output of two machines

I am attaching output of lspci with different options.
lspci has been run on two almost identical machines
(Compaq Deskpro EN SFF, with P4 and 256 MB RAMl;
 One had motherboard exchanged after failure
 and has possibly slighttly different 
 SCSI adapter (not in active use, anyway)

I am not sure if you can trust lspci output completely - 
lspci -vvn shows different IRQ values from
lspci -vvn -F xxx.lspci-x
 run on result of 
lcpci -x > xxx.lspci-x

Anyway I assumed that you meant lspci -vvn rather than lspci -vnn,
since lspci -vnn and lspci -vn have produced identical output.
(On ther other hand lspci -vvvn and lspci -vvn also have produced
 identical output)

Wojtek

Comment 15 Wojciech Pilorz 2008-10-02 13:00:23 UTC
I have moved to a newer and faster machine and installed Fedora 8 on that;
The system behaves normally. no such problem as described.
I have KDE running, serial link active.

Move that to X files?

Comment 16 Chuck Ebbert 2008-10-04 05:07:46 UTC
Closing: it got fixed but we don't know why or how.


Note You need to log in before you can comment on or make changes to this bug.