Bug 122514 - kscand eating lots of cpu
Summary: kscand eating lots of cpu
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-05-05 13:11 UTC by Eugeny Balakhonov
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 19:26:36 UTC
Target Upstream Version:
Embargoed:


Attachments
vmstat output (5.21 KB, text/plain)
2004-05-05 15:09 UTC, Eugeny Balakhonov
Oracle9i Database Server Patch Set Notes (112.31 KB, text/html)
2004-05-05 15:21 UTC, Eugeny Balakhonov
Output of vmstat after setting vm parameters (715.17 KB, text/plain)
2004-06-18 20:48 UTC, Antonio Cruz
Top q b output (903.57 KB, text/plain)
2004-06-18 20:56 UTC, Antonio Cruz

Description Eugeny Balakhonov 2004-05-05 13:11:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 
MyIE2; .NET CLR 1.1.4322)

Description of problem:
I have the same problem as in bug 100680.
It occurred after installing a new Oracle 9.2 patch set (9.2.0.4 to
9.2.0.5). I did not have this problem before installing the patch,
and I made no other changes.

The machine has two P3 Xeon CPUs, 2 GB of RAM, and three SCSI hard
disks.

I set the severity of this problem to "high" because this is my
production system.


Version-Release number of selected component (if applicable):
kernel-2.4.21-4.ELsmp

How reproducible:
Didn't try

Steps to Reproduce:
Install patch 9.2.0.4 -> 9.2.0.5 on installed Oracle
    

Additional info:

Comment 1 Rik van Riel 2004-05-05 14:31:13 UTC
Exactly how much CPU is kscand using?

Does it use that much CPU all the time, or does the CPU use come in
load spikes?

Could you send us 30 seconds of output from 'vmstat 1' at a period
when the problem occurs?

Comment 2 Larry Woodman 2004-05-05 14:57:04 UTC
There was a bug found and fixed recently in the page_referenced()
function that inadvertently caused pagecache pages to be aged upward
when they shouldn't have been.  This bug combined with the relatively
large pagecache of an Oracle database can certainly cause kscand to
run more than it should.  We could get you a test kernel with this bug
fix to determine if this is the cause of your problem.

However, can you tell us what the Oracle 9.2.0.5 patch contained, or
should we contact Oracle about that?

Larry Woodman


Comment 3 Eugeny Balakhonov 2004-05-05 15:09:54 UTC
Created attachment 99992 [details]
vmstat output

Comment 4 Eugeny Balakhonov 2004-05-05 15:10:20 UTC
kscand uses between 3% and 90% on both CPUs. At times it uses more
than 90% of the CPU! This happens only while Oracle is running.

Comment 5 Eugeny Balakhonov 2004-05-05 15:18:50 UTC
Where can I get this test kernel? I have downloaded the file
ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/3AS/en/os/SRPMS/kernel-2.4.21-9.0.3.EL.src.rpm
and am going to try this kernel.

Comment 6 Eugeny Balakhonov 2004-05-05 15:21:18 UTC
Created attachment 99993 [details]
Oracle9i Database Server Patch Set Notes 

Oracle9i Database Server Patch Set Notes 
Release 2 Patch Set 4 Version 9.2.0.5.0 for Linux x86

Comment 7 Larry Woodman 2004-05-05 15:49:19 UTC
Can you also include a quick "top" output at the same time so we can
see what else is running?

Larry
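
Comments 1 and 7 ask for 'vmstat 1' and 'top' output covering the same
period. A minimal sketch of one way to collect both, assuming
procps-style tools that accept "vmstat DELAY COUNT" and "top -b -n N".
The function name and the output file names are hypothetical:

```shell
# Capture vmstat and top side by side while kscand is busy.
# Assumes procps-style tools: "vmstat DELAY COUNT" and "top -b -n N".
capture_kscand_samples() {
    seconds=${1:-30}
    outdir=${2:-.}
    # One vmstat sample per second, running in the background.
    vmstat 1 "$seconds" > "$outdir/vmstat.txt" &
    vmstat_pid=$!
    # A single batch-mode snapshot of top taken during the same window.
    top -b -n 1 > "$outdir/top.txt"
    # Wait for vmstat to finish writing its samples.
    wait "$vmstat_pid"
}
```

For example, "capture_kscand_samples 30 /tmp" would sample for 30
seconds and leave vmstat.txt and top.txt in /tmp, ready to attach.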


Comment 8 Antonio Cruz 2004-06-18 20:48:44 UTC
Created attachment 101247 [details]
Output of vmstat after setting vm parameters

This is the output of vmstat, showing several occurrences of the jump in load
caused by the kscand daemon. It was captured after applying the values given
by the Red Hat Support team. I am afraid those changes didn't help, as the
system continues to misbehave.

I am willing to provide data to help solve this issue.

Thank you. Regards,

Antonio

Comment 9 Antonio Cruz 2004-06-18 20:50:48 UTC
Our system is a Dell 6650 with four 2.5 GHz Xeons, 8 GB of memory, a
PERC4/DC with two 36 GB 15k RPM drives, QLogic Fibre Channel, and
Hitachi storage.

Regards,

AC

Comment 10 Antonio Cruz 2004-06-18 20:56:12 UTC
Created attachment 101248 [details]
Top q b output 

This top output shows the problem area, where system CPU reaches 100%, I/O
almost stops, and kscand is the top CPU consumer. The load jumps from its
normal 4-5 to 30-50 in a single step, and about a minute later it is back
at 4-5.

Regards,

AC

Comment 11 Antonio Cruz 2004-06-18 20:56:59 UTC
Test, to add my address to the Cc list. Sorry. AC

Comment 12 Antonio Cruz 2004-06-21 14:37:30 UTC
Hello all,

Is this bug still active? We are experiencing this jumpy load
behaviour with 2.4.21-15EL SMP here.

I will supply any data needed to help debugging.

Thanks,

Antonio

Comment 13 Larry Woodman 2004-11-29 20:06:40 UTC
I need to know if this is still a problem with the latest RHEL3-U4
kernel.  We fixed a couple of problems in kscand and need to know if
they fixed this problem and this bug can be closed.

Larry Woodman


Comment 14 Eduardo Dias 2004-12-03 20:37:22 UTC
Hello,

I would like to know whether the fix is available only with kernel
2.6.xxx (RHEL-U4), or whether there is also a fix for the 2.4.xxx
kernels (RHEL-U2 or RHEL-U3).

Thanks,

Eduardo Dias

Comment 15 Antonio Cruz 2004-12-06 09:52:41 UTC
Hi there,

Is anyone using Dell computers with the new kernel? I have two
PowerEdge 6650s here and had to give up on using clumanager because of
the outages triggered by the softdog when the kscand daemon froze the
machines, which caused unwanted cluster node switching.

I have still had no chance to test the new kernel...

Thank you,

Antonio

Comment 16 clive darr 2005-11-10 16:44:54 UTC
We are also experiencing this problem when using the Progress database.

kscand and kswapd use ALL the CPU, freezing the machine.

Comment 17 Rik van Riel 2005-11-10 16:49:21 UTC
This bug should be fixed from RHEL3 U4 onwards.

If kscand is taking too much CPU, you can reduce the kscand scan percentage (in
/proc/sys/vm) to something like 10%.

Comment 18 Antonio Cruz 2005-11-10 17:04:04 UTC
Hi there. We are using the following values. With these, kscand's behaviour
is tamed: it only wakes up from time to time and takes a few seconds (1 to 3 s)
of processor time at once. We have an Oracle database with two instances, a
900 MB SGA, and 1000-1200 processes/users.

Last login: Thu Nov 10 08:27:39 2005 from gsisssr_alfcruz.hcpa
[root@vega root]# cat /proc/sys/vm/
bdflush                 max_map_count           pagecache
dcache_priority         max-readahead           page-cluster
hugetlb_pool            min-readahead           pagetable_cache
inactive_clean_percent  overcommit_memory       stack_defer_threshold
kswapd                  overcommit_ratio        
[root@vega root]# cat /proc/sys/vm/*
30      500     0       0       500     3000    70      50      0
0
900
30
256     32      8
65536
16
3
0
50
10      75      30
3
25      50
2048
[root@vega root]# 

We are going to try the U4 release when possible. Thank you for the feedback.

Regards,

Antonio
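
The "cat /proc/sys/vm/*" output in comment 18 prints the values without
their names. Since the glob expands in sorted order, the values
presumably line up with the file names listed just before it, but the
transcript does not say so. A small sketch that prints each tunable
next to its name (the function name is hypothetical, and the directory
defaults to /proc/sys/vm):

```shell
# Print every readable tunable in a directory as "name: value".
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for f in "$dir"/*; do
        # Skip subdirectories and unreadable entries.
        [ -f "$f" ] && [ -r "$f" ] || continue
        # Turn tabs into spaces so multi-value tunables read cleanly.
        printf '%s: %s\n' "${f##*/}" "$(tr '\t' ' ' < "$f")"
    done
}
```

Calling "show_vm_tunables" with no argument reads /proc/sys/vm and
emits one labelled line per file.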

Comment 19 clive darr 2005-11-10 18:19:44 UTC
(In reply to comment #17)

Unfortunately we're already using 3.4-2.

Could you please expand on your /proc/sys/vm suggestion?

# cat /proc/sys/vm/kswapd
512     32      8



Comment 20 clive darr 2005-11-10 18:29:32 UTC
I've just discovered that RHEL3 U6 addresses a similar issue

http://rhn.redhat.com/errata/RHSA-2005-663.html
support for new "oom-kill" and "kscand_work_percent" sysctls

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=145950
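
Comment 17 suggests lowering the kscand scan percentage to about 10%,
and the U6 erratum names a "kscand_work_percent" sysctl. A hedged
sketch of applying that value, assuming the tunable appears as
/proc/sys/vm/kscand_work_percent on kernels that carry it. The path
and the helper name are assumptions, not confirmed in this report:

```shell
# Lower the kscand work percentage, if this kernel exposes the tunable.
# The default path is an assumption based on the sysctl name in the
# U6 erratum.
set_kscand_work_percent() {
    percent=$1
    tunable=${2:-/proc/sys/vm/kscand_work_percent}
    if [ ! -w "$tunable" ]; then
        echo "not available or not writable: $tunable" >&2
        return 1
    fi
    echo "$percent" > "$tunable"
    # Read the value back to confirm what is now in effect.
    cat "$tunable"
}
```

Run as root, for example "set_kscand_work_percent 10". A value written
under /proc does not persist across a reboot.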


Comment 21 RHEL Program Management 2007-10-19 19:26:36 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

