Description of problem:
Receiving this in the log file:

Feb 15 08:08:47 npws01 kernel: INFO: task oracle:2517 blocked for more than 120 seconds.
Feb 15 08:08:47 npws01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 15 08:08:47 npws01 kernel: oracle D ffff81000900caa0 0 2517 2508 2516 (NOTLB)
Feb 15 08:08:47 npws01 kernel: ffff810135b5da58 0000000000000082 0000000000000000 ffff810438833d78
Feb 15 08:08:47 npws01 kernel: ffff810438833c00 0000000000000009 ffff81043fff5860 ffff81010ef0e100
Feb 15 08:08:47 npws01 kernel: 00003b404f3288a6 0000000000044583 ffff81043fff5a48 0000000138833c00
Feb 15 08:08:47 npws01 kernel: Call Trace:
Feb 15 08:08:47 npws01 kernel: Call Trace:
Feb 15 08:08:47 npws01 kernel: [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
Feb 15 08:08:47 npws01 kernel: [<ffffffff80028b0b>] sync_page+0x0/0x43
Feb 15 08:08:47 npws01 kernel: [<ffffffff800637ca>] io_schedule+0x3f/0x67
Feb 15 08:08:47 npws01 kernel: [<ffffffff80028b49>] sync_page+0x3e/0x43
Feb 15 08:08:47 npws01 kernel: [<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
Feb 15 08:08:47 npws01 kernel: [<ffffffff8003fdc1>] __lock_page+0x5e/0x64
Feb 15 08:08:47 npws01 kernel: [<ffffffff800a28e2>] wake_bit_function+0x0/0x23
Feb 15 08:08:47 npws01 kernel: [<ffffffff80013b22>] find_lock_page+0x69/0xa2
Feb 15 08:08:47 npws01 kernel: [<ffffffff800c805f>] grab_cache_page_write_begin+0x2c/0x89
Feb 15 08:08:47 npws01 kernel: [<ffffffff88665125>] :nfs:nfs_write_begin+0x41/0xf8
Feb 15 08:08:47 npws01 kernel: [<ffffffff8000fda9>] generic_file_buffered_write+0x14b/0x675
Feb 15 08:08:47 npws01 kernel: [<ffffffff8003fdc1>] __lock_page+0x5e/0x64
Feb 15 08:08:47 npws01 kernel: [<ffffffff8001669b>] __generic_file_aio_write_nolock+0x369/0x3b6
Feb 15 08:08:47 npws01 kernel: [<ffffffff80021872>] generic_file_aio_write+0x65/0xc1
Feb 15 08:08:47 npws01 kernel: [<ffffffff8866584d>] :nfs:nfs_file_write+0xd8/0x14f
Feb 15 08:08:47 npws01 kernel: [<ffffffff80018301>] do_sync_write+0xc7/0x104
Feb 15 08:08:47 npws01 kernel: [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
Feb 15 08:08:47 npws01 kernel: [<ffffffff80016aa3>] vfs_write+0xce/0x174
Feb 15 08:08:47 npws01 kernel: [<ffffffff8004400e>] sys_pwrite64+0x50/0x70
Feb 15 08:08:47 npws01 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Feb 15 08:08:47 npws01 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Feb 15 08:08:47 npws01 kernel:

Version-Release number of selected component (if applicable):
[root@npws01 log]# uname -a
Linux npws01.deos.udel.edu 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Occasional during heavy load.

Steps to Reproduce:
1.
2.
3.

Actual results:
System eventually becomes responsive again.

Expected results:

Additional info:
Would like some idea of how to trace what is causing the problem and then how to avoid it.
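To see how often the warning fires and which tasks it names, the log can be scanned for the "blocked for more than N seconds" line. The following is only a rough sketch: the regular expression is derived from the single message quoted above, and the function name `find_hung_tasks` is made up for illustration. The wording of the message may differ on other kernel versions.

```python
import re

# Assumption: the warning has the same wording as the line quoted in this report.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Yield (command, pid, seconds) for each hung-task warning found in lines."""
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            yield (
                match.group("comm"),
                int(match.group("pid")),
                int(match.group("secs")),
            )
```

It could be fed an open log file, for example `list(find_hung_tasks(open(path)))`, where `path` is wherever syslog writes kernel messages on the machine in question.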
Hi -- you're running Oracle? Does the problem happen if Oracle is not running? P.
Yes, this is an Oracle server. Unfortunately this is our production server so we can't do any tests. I doubt this problem would recur; we have already disabled the Oracle backups and have not had the problem since. But, we can't go for long like that and would like to know how we can identify what processes are consuming resources such that we received the message above. The message indicates the victim, how do we find the root cause? Thanks.
(In reply to comment #2)
> Yes, this is an Oracle server. Unfortunately this is our production server so
> we can't do any tests. I doubt this problem would recur; we have already
> disabled the Oracle backups and have not had the problem since.

Thanks for the info Geoff.

>
> But, we can't go for long like that and would like to know how we can identify
> what processes are consuming resources such that we received the message above.
>
> The message indicates the victim, how do we find the root cause?

There are a few ways to determine the cause by triggering a stack trace or panic when this issue occurs. Both of those, unfortunately, require a modification of the kernel.

So this only happens when Oracle is loaded and is doing a backup?

P.

>
> Thanks.
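As a possible userspace starting point that needs no kernel changes, the processes currently in uninterruptible sleep (the "D" state shown next to "oracle" in the trace) can be listed by reading /proc/<pid>/stat. This is a minimal sketch, not a root-cause tool: the helper names `parse_stat` and `blocked_tasks` are made up for illustration, and it only shows which tasks are stuck at the moment it runs, not what they are waiting on.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid), comm, state


def blocked_tasks(proc="/proc"):
    """Return a list of (pid, comm) for processes currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `blocked_tasks()` periodically while a backup is running, and logging the result with a timestamp, would show whether the stuck tasks are limited to Oracle or include other writers to the same NFS mount.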