Sorry for bug-spamming this module, but I have been searching for any pointer on a problem we are having on a production system, and I didn't want to write directly to any of the RH kernel maintainers.

We run an Oracle 9.2.0.6 database on a Fedora Core 3 system with kernel 2.6.10-1.770_FC3smp. From time to time (sometimes every day, sometimes every 2 weeks) an Oracle process sits at 100% CPU without doing any apparent work. It also seems to be holding a shared resource, because the rest of the Oracle processes are unable to get the CPU (the load average drops to a constant 1). The database then becomes inaccessible because it accepts no further connections from clients (SQLPlus or TOAD), so our DBAs are unable to check what is causing this.

At the OS level everything seems OK. When I 'strace -p' the faulting Oracle process I don't see a single syscall (in the normal state there are always plenty). If I 'kill -9' the process it shuts down normally, but then another random Oracle process takes over where the previous one left off and its CPU usage goes to 100%. This repeats every time I kill the new offending process. The only workaround we have right now is to kill the PMON process for that instance and then restart the database, which obviously causes downtime on our heavily used web app.

I am asking for any advice on how to track this problem down. I have checked the syslog and straced the process, but beyond that I am out of ideas. It smells like an Oracle bug rather than a kernel bug, but I wonder whether anyone has had a similar experience in the RHEL 3/4 arena. What other kinds of tracing can I use at the kernel level? Another idea is to upgrade to the latest kernel and apply every other update available for FC3, but I don't know whether that is wise, since the ever-changing (and ever-improving) kernel could create new problems with the Oracle system.

Our DBAs are looking in Oracle MetaLink, but since Fedora is not a supported system, they are reluctant to give further assistance.

Thanks in advance,
-William

PS. Feel free to close this bug or put it in NEEDINFO or any other state.
There are some profiling tools (oprofile, for example) that may help point the finger, but as you point out, this is likely an Oracle problem, so I'd try to get help from those folks. Sorry, no other ideas.
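Before reaching for a full profiler, one cheap check is whether the spinning process is burning its CPU time in user mode or in the kernel. The sketch below is only an illustration, not something taken from this report: it assumes a Linux /proc filesystem and a Python 3 interpreter on the box, and the function names (parse_stat_line, cpu_split, sample) are made up for the example. It reads the utime and stime counters (fields 14 and 15 of /proc/<pid>/stat) twice and reports how the consumed time splits between the two.

```python
import os
import time


def parse_stat_line(line):
    """Return (state, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return rest[0], int(rest[11]), int(rest[12])


def read_cpu_ticks(pid):
    """Read the current counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_line(f.read())


def cpu_split(before, after):
    """Fractions of the CPU time used between two samples: (user, system)."""
    user = after[1] - before[1]
    system = after[2] - before[2]
    total = user + system
    if total == 0:
        return 0.0, 0.0  # the process used no CPU in the interval
    return user / total, system / total


def sample(pid, interval=5.0):
    """Sample a process twice and summarise where its CPU time went."""
    before = read_cpu_ticks(pid)
    time.sleep(interval)
    after = read_cpu_ticks(pid)
    user_frac, sys_frac = cpu_split(before, after)
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    used = (after[1] - before[1]) + (after[2] - before[2])
    return {
        "state": after[0],                         # e.g. 'R' for running
        "user": user_frac,                         # share spent in user mode
        "system": sys_frac,                        # share spent in kernel mode
        "busy": used / (ticks_per_sec * interval)  # ~1.0 means one full CPU
    }
```

Calling sample() with the PID of the runaway process and seeing "busy" near 1.0 with almost everything under "user" would be consistent with a user-space busy loop, which fits strace showing no syscalls. A large "system" share would instead point toward the kernel, where a system-wide profiler such as oprofile is the better next step.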