Bug 98070 - (IDE) Cenetek Rocket Drive causes kjournald high cpu
Summary: (IDE) Cenetek Rocket Drive causes kjournald high cpu
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-06-25 23:51 UTC by Brian Feeny
Modified: 2007-04-18 16:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-01-05 19:35:54 UTC
Embargoed:


Description Brian Feeny 2003-06-25 23:51:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)

Description of problem:
Installed a Cenetek Rocket Drive and enabled ext3 on it.  kjournald is now 
using between 50% and 80% CPU as reported by top.

System Info:
------------
Intel SE7501BR2 motherboard
Dual Xeon 2.6GHz
2GB Memory
2 x 3ware 7500 Escalade IDE RAID Controllers
Cenetek Rocket Drive 4GB

Before the installation of the Cenetek, kjournald looked like this:

root        21  0.0  0.0     0    0 ?        SW   Jun24   0:08 [kjournald]
root       148  0.0  0.0     0    0 ?        SW   Jun24   0:00 [kjournald]
root       149  0.0  0.0     0    0 ?        SW   Jun24   0:04 [kjournald]
root       150  0.0  0.0     0    0 ?        SW   Jun24   0:08 [kjournald]
root       151  0.0  0.0     0    0 ?        SW   Jun24   0:23 [kjournald]
root       152  0.5  0.0     0    0 ?        DW   Jun24   2:36 [kjournald]


After the installation of the Cenetek Rocket Drive, kjournald now looks like 
this:

root        21  0.1  0.0     0    0 ?        SW   17:51   0:02 [kjournald]
root       148  0.0  0.0     0    0 ?        SW   17:51   0:00 [kjournald]
root       149  0.0  0.0     0    0 ?        SW   17:51   0:01 [kjournald]
root       150  0.4  0.0     0    0 ?        SW   17:51   0:09 [kjournald]
root       151  1.1  0.0     0    0 ?        SW   17:51   0:22 [kjournald]
root       152 60.1  0.0     0    0 ?        DW   17:51  20:04 [kjournald]
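
To put a number on the change, here is a minimal sketch that pulls the %CPU column for each [kjournald] thread out of ps output like the listings above. The function name kjournald_cpu is made up for illustration, and it assumes the usual 11-column `ps aux` layout.

```python
def kjournald_cpu(ps_output):
    """Return {pid: %CPU} for every [kjournald] line in `ps aux`-style output."""
    usage = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        if len(fields) >= 11 and fields[10] == "[kjournald]":
            usage[int(fields[1])] = float(fields[2])
    return usage


# The PID 152 lines copied from the two listings in this report.
before = "root       152  0.5  0.0     0    0 ?        DW   Jun24   2:36 [kjournald]"
after = "root       152 60.1  0.0     0    0 ?        DW   17:51  20:04 [kjournald]"

ratio = kjournald_cpu(after)[152] / kjournald_cpu(before)[152]
print(f"PID 152 uses about {ratio:.0f}x more CPU than before")
```

On the two sample lines this works out to roughly a 120x increase for PID 152.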

This machine is a mailserver.  Here is some df and iostat info:

[root@mercury tmp]# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda2              5036316   3909992    870492  82% /
/dev/sda1                46636     14371     29857  33% /boot
none                   1032412         0   1032412   0% /dev/shm
/dev/sda5              5044156   1447792   3340132  31% /var/log/qmail
/dev/sdb1             76928448   4283476  68737164   6% /var/spool/spam
/dev/sdc1            114247496  42658632  65785388  40% /home/cust
/dev/hde1              4062768    124352   3728708   4% /var/qmail/queue
sol:/home/ftp/pub/mirrors/updates.redhat.com/7.3
                       5044156   4186104    601820  88% /usr/src/redhat/updates.redhat.com
[root@mercury tmp]# iostat -x
Linux 2.4.20-18.7smp (mercury.shreve.net)       06/25/2003

avg-cpu:  %user   %nice    %sys   %idle
          13.46    0.00   64.63   21.91

Device:    rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
/dev/sda    23.03  21.89  3.17   3.91  209.16  206.51   104.58   103.26    58.66     1.38  194.25  43.36   3.07
/dev/sda1    0.02   0.00  0.01   0.00    0.06    0.01     0.03     0.01     5.03     0.01  497.30 494.59   0.07
/dev/sda2   22.66  18.91  3.12   3.29  206.25  177.72   103.13    88.86    59.87     1.24  192.79  47.07   3.02
/dev/sda3    0.00   0.00  0.00   0.00    0.01    0.00     0.00     0.00     8.00     0.00  100.00 100.00   0.00
/dev/sda5    0.35   2.98  0.04   0.62    2.83   28.78     1.41    14.39    48.20     0.13  202.02 152.54   1.00
/dev/sdb     0.17  13.05  4.98   6.32   40.83  155.02    20.41    77.51    17.33    15.64 1384.31  55.21   6.24
/dev/sdb1    0.17  13.05  4.98   6.32   40.83  155.02    20.41    77.51    17.34    15.64 1384.36  55.21   6.24
/dev/sdc    10.60  65.25 28.84  58.14  315.07  987.38   157.54   493.69    14.97    20.95  240.87  35.40  30.79
/dev/sdc1   10.59  65.25 28.84  58.14  315.04  987.38   157.52   493.69    14.97    20.95  240.88  35.39  30.79
/dev/sdc2    0.01   0.00  0.00   0.00    0.03    0.00     0.02     0.00     8.00     0.00   80.00  80.00   0.00
/dev/hde     1.34 551.53  6.20 235.76   60.18 6299.56    30.09  3149.78    26.28   167.91   24.34  23.19  56.12
/dev/hde1    1.34 551.53  6.20 235.76   60.18 6299.56    30.09  3149.78    26.28     5.88   24.34  20.39  49.34

I am willing to do whatever you all need from me to help diagnose this.  If you 
need me to run with profile=2, I can do that.

System load is impacted, load is:

  6:34pm  up 42 min,  3 users,  load average: 15.28, 14.46, 13.11

It will go as high as 20-25.  There is plenty of memory (2GB) available, and 
really we are overkill in the memory and cpu department.  The mail queue (qmail 
system) is very thrashy, and so we moved it off the IDE RAID onto a Rocket 
Drive (RAM Drive), to lower the Transactions Per Second on that drive array.

Perhaps this is a bug with the Cenetek, or I need to use a different ext3 
journaling mode?  Or perhaps ext3 is a bad idea on a thrashy mail queue?

My last attempt at putting this RAM drive in resulted in an NMI error that 
left the box hosed.  e2fsck could not recover the filesystem; what it did 
recover was a bunch of trashed data and unlinked inodes in lost+found.  I 
attributed it to bad karma, since a lot of other things "changed" that day 
during the scheduled downtime.  I am now wondering if maybe kjournald caused 
the NMI, and was just buried so hard it corrupted the journal.  Maybe that's 
far-fetched; I appreciate any help.




Version-Release number of selected component (if applicable):
kernel-2.4.20-18.7

How reproducible:
Always

Steps to Reproduce:
1. Install Cenetek Rocket Drive
2. Enable ext3 on Rocket Drive
3. Thrash the drive with lots of reads and writes (bonnie maybe), and watch 
kjournald go through the roof.
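
If bonnie is not handy, step 3 could be approximated with something like the sketch below, which creates, fsyncs, and deletes many small files, loosely imitating a mail queue. It is only an illustration: the thrash function and its parameters are made up here, and the demo run at the bottom uses a scratch directory under the system temp dir, so point it at a directory on the Rocket Drive mount to exercise that device.

```python
import os
import tempfile


def thrash(directory, files=1000, size=4096):
    """Create, fsync, and delete `files` small files inside `directory`."""
    payload = b"x" * size
    for i in range(files):
        path = os.path.join(directory, f"msg-{i}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data (and journal commit) to the device
        os.unlink(path)           # the delete is journaled metadata work as well
    return files


# Small demo run in a throwaway directory; raise `files` for a real test.
with tempfile.TemporaryDirectory() as scratch:
    done = thrash(scratch, files=10, size=512)
    print(f"wrote and removed {done} files in {scratch}")
```

Run top in another terminal while this loops to watch kjournald.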
    

Actual Results:  kjournald is in a state of high cpu utilization

Expected Results:  kjournald should get a little busier with an additional 
journal, but not to the extent of being 100x more CPU intensive.

Additional info:

The machine is running Qmail, and is a mailserver.  It uses courier-imap for 
imapd and pop3d.  It also runs maildrop as an LDA, apache for webmail via 
squirrelmail, and a few other mail-related things such as mailman.  The box has 
14 drives laid out as follows:

/dev/sda     RAID1       1 set of 2 drives
/dev/sdb     RAID10      2 sets of 2 drives
/dev/sdc     RAID10      3 sets of 2 drives

plus 2 hot spares.  This is all on the 3ware Escalade 7500 series controllers.  
All drives are 7200 RPM matched drives.

Comment 1 Alan Cox 2003-06-27 20:23:36 UTC
NMI comes from hardware, so that may well indicate the card wasn't inserted
properly, or that you inserted it while in soft off rather than with power
removed, etc. Probably irrelevant in itself.

As to the drive itself, Cenetek afaik doesn't have Linux DMA driver support, so
the transfers would be PIO and thus very slow. Does hdparm think the device is
in DMA mode?


Comment 2 Arjan van de Ven 2003-06-28 09:21:31 UTC
Cenatek supports DMA, but it might not be enabled by default; hdparm -d1 will
fix that.
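
A rough way to script the check, assuming hdparm prints its usual "using_dma = 0 (off)" style line for the -d query. The helper names using_dma and check_device are made up for this sketch, and the sample text is an assumed illustration of the output, not captured from this machine. check_device needs root and a real device path (the df output above suggests /dev/hde for the Rocket Drive).

```python
import re
import subprocess


def using_dma(hdparm_output):
    """Return True/False from `hdparm -d` output, or None if the flag is absent."""
    match = re.search(r"using_dma\s*=\s*(\d)", hdparm_output)
    return None if match is None else match.group(1) == "1"


def check_device(device):
    """Query a real device; requires root and the hdparm utility."""
    result = subprocess.run(
        ["hdparm", "-d", device], capture_output=True, text=True, check=True
    )
    return using_dma(result.stdout)


# Assumed sample of what the query prints when DMA is off.
sample = "/dev/hde:\n using_dma    =  0 (off)\n"
print(using_dma(sample))
```

If this reports False for the device, hdparm -d1 on that device is the thing to try next.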

Comment 3 Alan Cox 2003-06-28 19:08:36 UTC
Aha, OK. In that case, can you attach a dmesg of the boot-up? I'll tweak the
generic IDE driver for Cenatek to turn on DMA (if the hdparm -d1 works).


Comment 4 Dave Jones 2004-01-05 19:35:54 UTC
EOL'd, and needinfo > 6 months.


