Sorry about bugspamming this module but I've been searching a lot for any
pointer about a little problem we are having on a production system and I didn't
want to write directly to any of the RH kernel maintainers.
We run an Oracle 18.104.22.168 database on a Fedora Core 3 system with kernel
2.6.10-1.770_FC3smp, and from time to time (sometimes every day, sometimes every
2 weeks) an Oracle process sits there using 100% of the CPU while doing nothing.
It also seems to be locking a shared resource, because the rest of the
Oracle processes are unable to get the CPU (the load average drops to a constant 1).
The database then becomes inaccessible because it no longer accepts
connections from the clients (SQLPlus or TOAD), so our DBAs are unable to check
what is causing this.
At the OS level everything seems OK. I tried to 'strace -p' the faulting Oracle
process but I don't get any syscalls (in a normal state, there are always a lot of
syscalls). Then I 'kill -9' the process and it shuts down normally, but then
another random Oracle process takes over where the previous process left off and
its CPU usage goes to 100%. This happens all over again any time I kill the new
process.
The only workaround we have right now is killing the PMON process for that
instance and then restarting the database. Obviously this causes downtime on
our heavily used web app.
I am asking all of you for any advice you can give me about how to track this
problem down. I checked the syslog and ran strace on the process, but other than
that I'm out of ideas. It surely smells like an Oracle bug and not a kernel bug,
but I wonder if you have had similar experiences in the RHEL 3/4 arena.
What other kind of tracking can I use at the kernel level?
Another idea is to upgrade to the latest kernel and apply any other updates
available for FC3, but I don't know if that is wise, since the ever-changing (and
ever-improving) kernel could create new problems with the Oracle system.
Our DBAs are looking in Oracle MetaLink, but since Fedora is not a supported
system, Oracle support is reluctant to give further assistance.
Thanx in advance,
PS. Feel free to close this bug or put it in NEEDINFO or any other state.
There are some profiling tools (oprofile, for example) which may help point the
finger, but as you point out, this is likely an Oracle problem, so I'd try to
get help from those folks.
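Before setting up a profiler, one cheap check is to see whether the spinning process is burning its CPU time in user space or inside the kernel. The sketch below is only an illustration, not something taken from this report: it assumes a Python 3 interpreter is available on the box, and the helper names parse_stat and sample are made up for this example. It reads the state, utime and stime fields from /proc/<pid>/stat twice and prints how much user and system time was consumed in between.

```python
import os
import sys
import time


def parse_stat(line):
    """Extract state, utime and stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return {"state": rest[0], "utime": int(rest[11]), "stime": int(rest[12])}


def sample(pid, interval=5.0):
    """Return (state, user_seconds, system_seconds) consumed over `interval`."""
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = parse_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_stat(f.read())
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return (after["state"],
            (after["utime"] - before["utime"]) / ticks,
            (after["stime"] - before["stime"]) / ticks)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 cpu_split.py <pid>
    state, user, system = sample(int(sys.argv[1]))
    print("state=%s user=%.2fs system=%.2fs" % (state, user, system))
```

If nearly all of the time shows up as user time while system time stays flat, the process is looping in its own code, which would match the empty strace output and point at Oracle rather than the kernel. A large system-time share would instead suggest looking harder at the kernel side.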
Sorry, no other ideas.