Bug 179573 - magicdev eats CPU
Summary: magicdev eats CPU
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Don Howard
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2006-02-01 15:01 UTC by Bastien Nocera
Modified: 2008-08-02 23:40 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-18 21:21:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
magicdev-ignore-aopen-drives.patch (442 bytes, patch)
2006-02-01 15:01 UTC, Bastien Nocera
no flags Details | Diff
Grapg: seconds since epoch vs. time needed for open() (15.20 KB, image/png)
2006-02-20 14:22 UTC, Gunther Mayer
no flags Details
Graph: seconds since epoch vs. Amount of Cached memory (14.27 KB, image/png)
2006-02-20 14:23 UTC, Gunther Mayer
no flags Details
"opencd" raw data from system call trace (edited slighty so gnuplot can read it) (113.20 KB, text/plain)
2006-02-20 14:31 UTC, Gunther Mayer
no flags Details
"memo" raw data from /proc/meminfo (with date and sec since epoch) (1.80 MB, text/plain)
2006-02-20 14:33 UTC, Gunther Mayer
no flags Details
test.c (1.16 KB, text/plain)
2006-08-01 15:20 UTC, Bastien Nocera
no flags Details

Description Bastien Nocera 2006-02-01 15:01:54 UTC
magicdev-1.1.4-8.0-x86_64

magicdev eats loads of CPU on this machine, as it seems to be taking .5 seconds
to open each one of the CD drives.

10:59:25.400380 open("/dev/cdrom", O_RDONLY|O_NONBLOCK|O_EXCL) = 14 <0.519990>
10:59:25.920370 ioctl(14, CDROM_MEDIA_CHANGED, 0x56a9f0) = 1 <0.010000>
10:59:25.930370 ioctl(14, CDROM_DRIVE_STATUS, 0x7fffffff) = 2 <0.000000>
10:59:25.930370 close(14)               = 0 <0.279995>
10:59:26.210365 open("/dev/cdrom1", O_RDONLY|O_NONBLOCK|O_EXCL) = 14 <0.539989>
10:59:26.750354 ioctl(14, CDROM_MEDIA_CHANGED, 0x7fffffff) = 1 <0.000000>
10:59:26.750354 ioctl(14, CDROM_DRIVE_STATUS, 0x7fffffff) = 2 <0.000000>
10:59:26.750354 close(14)               = 0 <0.279995>

Comment 1 Bastien Nocera 2006-02-01 15:01:55 UTC
Created attachment 123965 [details]
magicdev-ignore-aopen-drives.patch

Comment 2 Gunther Mayer 2006-02-20 14:13:30 UTC
@Bastien: your fix does not address the problem but blacklists innocent devices

This "magicdev eats cpu"-bug seems to have been reported for several years now
(google finds reports since year 2000), but was never root-caused. Instead
several myths have been propagated.





Comment 3 Gunther Mayer 2006-02-20 14:20:53 UTC
Hi,

I came to this conclusion:

5) per 2GB memory in "Cached state" the open("/dev/cdrom") system call takes 0.1
seconds longer
   E.g. when you have nearly 16GB "Cached" it will take approx. 0.8 seconds

Appendix
========

Experimental setup (here for a 16GB machine):

1) script script-strace-magicdev 
   strace -T -ttt -p 23055

2) script script-meminfo
   csh
   while 1
    echo `date` `date +%s` `cat /proc/meminfo`
    sleep 1
   end

3) Fill the filesystem cache and empty it for several times,
e.g.
3a)
22#: date ; time dd if=/dev/zero of=/usr3/test.mayerg bs=1024k count=7000 ; date
Mon Feb 20 12:57:36 MET 2006
7000+0 records in
7000+0 records out
0.010u 30.200s 0:30.34 99.5%    0+0k 0+0io 122pf+0w
Mon Feb 20 12:58:06 MET 2006
                                                                               
            ...

3b) Empty "Cached memory" with "consmem -m 14000" (allocate 14GB) [use your
favourite memory consuming application here]; increase and decrease the number
    a bit until "cached" as shown by top goes way below <1000MB

3c) repeat at 3a until enough data samples

4) Visualize the interdependecy with gnuplot
4a) prepare the data: script offending lines etc. from the raw data
      cp script-meminfo memo
      vi memo

      grep open script-strace-magicdev> opencd
      vi opencd
      replace the first decimal point with blank, because milliseconds are not
needed here:
      %s/\./ /
      remove the opens of "mtab" which are interspered, remoce < > from last column

4b) gnuplot for memory usage (e.g."cached" memory from /proc/meminfo)
    plot "memo" using ($7):($20) w lin
4c) gnuplot for the time spent in the open(/dev/cdrom) system call
    plot "opencd" using ($1):($7) w lin

5) Conclusion:   per 2GB memory in "Cached state" the open() takes 0.1 seconds
6) Further investigation: Enable "profile" for the kernel and inspect these numbers:
     2718 .text.lock.ioctl                          35.7632
     11959 invalidate_bdev                           28.7476

7) Result: open("/dev/cdrom") triggers invalidate_bdev (and other?) system calls
which consume
   lots of time under certain circumstances
8) Expected result: open("/dev/cdrom") should not take so long (e.g. <0.005 sec;
this is possible
   as shown be the experiment). There is no reason this should take longer when
"Cached" memory is large.



Comment 4 Gunther Mayer 2006-02-20 14:22:46 UTC
Created attachment 124883 [details]
Grapg: seconds since epoch  vs.   time needed for open()

Comment 5 Gunther Mayer 2006-02-20 14:23:45 UTC
Created attachment 124884 [details]
Graph: seconds since epoch  vs.  Amount of Cached memory

Comment 6 Gunther Mayer 2006-02-20 14:31:38 UTC
Created attachment 124885 [details]
"opencd" raw data from system call trace (edited slighty so gnuplot can read it)

Comment 7 Gunther Mayer 2006-02-20 14:33:04 UTC
Created attachment 124886 [details]
"memo" raw data from /proc/meminfo (with date and sec since epoch)

Comment 8 Owen Taylor 2006-02-20 15:03:06 UTC
This is clearly reporting a kernel issue, not a magicdev issue, and it would
be good to reassign the component. (magicdev isn't even used on
RHEL4, but polling /dev/cdrom is still done I think.)

But that being said, I'd really advise running magicdev or the GNOME desktop
at all on a production server. (Maybe my imagination just doesn't stretch
to workstations with >= 16GB of memory.) For RHEL3 magicdev can be turned off
by just turning off automount and autoplay in the GNOME preferences.

Comment 9 Gunther Mayer 2006-02-20 15:23:41 UTC
I agree this is a kernel issue, but don't know how to reassign the component,
can you do this here?

And yes, we are using 16 GB workstations with RHEL3 and would like to preserve
magicdev for the sake of user-friendliness (because this mounts inserted CDs
nicely).

Please see my IT#65386 and IT#85873.

Comment 10 John (J5) Palmieri 2006-02-20 15:57:37 UTC
Reassigning to kernel as per comment #8

Comment 15 David Lawrence 2006-04-18 20:26:03 UTC
NEEDINFO_ENG has been deprecated in favor of NEEDINFO or ASSIGNED. Changing
status to ASSIGNED for ENG review.

Comment 16 Don Howard 2006-04-18 20:29:06 UTC
Hello Gunther - 

When cdrom_open() detects that the media has changed it must invalidate all
buffers associated with the device, otherwise you run the risk of data
corruption due to cached pages being out-of-sync with the disk.  A machine with
16GB of memory can have a very large buffercache ("cached state" memory).  The
large lag on opening the cdrom device is not a bug, but a side-effect of the
unusually large (for a workstation) buffercache.

The real soure of the problem appears to be that ioctl(CDROM_MEDIA_CHANGED)
always returns true and ioctl(CDROM_DISK_STATUS) always returns CDS_TRAY_OPEN in
the strace log provided here.  

14:50:20.319004 open("/dev/cdrom", O_RDONLY|O_NONBLOCK|O_EXCL) = 14 <0.008773>
14:50:20.327874 ioctl(14, CDROM_MEDIA_CHANGED, 0x56ac10) = 1 <0.002832>
14:50:20.330793 ioctl(14, CDROM_DRIVE_STATUS, 0x7fffffff) = 2 <0.000864>
14:50:20.331741 close(14)               = 0 <0.000231>
14:50:20.332055 open("/dev/cdrom1", O_RDONLY|O_NONBLOCK|O_EXCL) = 14 <0.011797>
14:50:20.343942 ioctl(14, CDROM_MEDIA_CHANGED, 0x7fffffff) = 1 <0.004024>
14:50:20.348051 ioctl(14, CDROM_DRIVE_STATUS, 0x7fffffff) = 2 <0.001275>


/* drive status possibilities returned by CDROM_DRIVE_STATUS ioctl */
#define CDS_NO_INFO             0       /* if not implemented */
#define CDS_NO_DISC             1
#define CDS_TRAY_OPEN           2
#define CDS_DRIVE_NOT_READY     3
#define CDS_DISC_OK             4




From the sysreport info, it looks like this system is a SUN W1100z/2100z. The
hardware cert catalog shows this system to be certified as compatible, but the
listed cdrom device is a Samsung model.  See:

    http://bugzilla.redhat.com/hwcert/show.cgi?id=148524


The cdrom drives in the sysreport appear to be Acer/Aopen:

    hdc: AOPEN COM5232/AAH PRO, ATAPI CD/DVD-ROM drive
    hdd: AOPEN DUW1608/ARR, ATAPI CD/DVD-ROM drive

Is there any way that you can test with another cdrom device(s)?  The one that
is certified with this system in the HCL is a SAMSUNG CDRW/DVD SM-352F.

Comment 20 Bastien Nocera 2006-08-01 15:18:11 UTC
The RHEL3 kernel seems to think that there is constantly a new CD when a media
is in the drive. Ie. the attached testcase will show 10 occurences of "Detected
CD change with CDROM_MEDIA_CHANGED", instead of just the one.

Comment 21 Bastien Nocera 2006-08-01 15:20:28 UTC
Created attachment 133406 [details]
test.c

This same code seems to work fine on RHEL4.
Am I doing (and magicdev by the same token) something stupid?

Comment 28 RHEL Program Management 2007-01-18 21:21:55 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 


Note You need to log in before you can comment on or make changes to this bug.