Description of problem:
Starting a database that uses RAC, NFS, and hugepages on ia64 results in a machine deadlock.

Version-Release number of selected component (if applicable):

How reproducible:
Every time. Removing any one of the three (RAC, NFS, hugepages) makes the problem go away.

Steps to Reproduce:
Create a RAC DB on NFS storage and allocate hugepages for the SGA.

Actual results:
The sysrq-p data shows the following process on-CPU:

CPU2: Call Trace:
[<a000000100016da0>] show_stack+0x80/0xa0
[<a0000001003273b0>] showacpu+0x50/0x80
[<a00000010005b340>] handle_IPI+0x200/0x340
[<a0000001000130f0>] handle_IRQ_event+0x90/0x120
[<a000000100013ae0>] do_IRQ+0x180/0x560
[<a000000100015d10>] ia64_handle_irq+0xf0/0x1e0
[<a00000010000f600>] ia64_leave_kernel+0x0/0x260
[<a000000100008cb0>] ia64_spinlock_contention+0x30/0x60
[<a000000100593de0>] __lock_text_start+0x40/0x60
[<a0000001001294b0>] __set_page_dirty_buffers+0x30/0x300
[<a0000001000e1eb0>] set_page_dirty+0xf0/0x180
[<a0000001000e1fd0>] set_page_dirty_lock+0x90/0xc0
[<a000000200ab5010>] nfs_free_user_pages+0xd0/0x120 [nfs]
[<a000000200ab55a0>] nfs_direct_complete+0x40/0x140 [nfs]
[<a000000200ab5900>] nfs_direct_write_result+0x140/0x160 [nfs]
[<a000000200a77b20>] nfs_writeback_done+0x2e0/0x420 [nfs]
[<a000000200a7fa60>] nfs3_write_done+0xc0/0x100 [nfs]
[<a0000002007d91b0>] __rpc_execute+0x770/0xb40 [sunrpc]
[<a0000002007d9630>] __rpc_schedule+0xb0/0x240 [sunrpc]
[<a0000002007da5b0>] rpciod+0x2d0/0x9a0 [sunrpc]

The sysrq-t data shows an oracle process waiting in D state as follows:

oracle D a000000100590c60 0 19029 1 19031 19027
Call Trace:
[<a00000010006ab70>] context_switch+0x470/0x9a0
[<a000000100590c60>] schedule+0x700/0x17c0
[<a000000200ab54b0>] nfs_direct_wait+0x3f0/0x4a0 [nfs]
[<a000000200ab70a0>] nfs_file_direct_write+0xa00/0xee0 [nfs]
[<a000000200a5b870>] nfs_file_write+0x1d0/0x2a0 [nfs]
[<a000000100123d80>] do_sync_write+0x140/0x1a0
[<a000000100124070>] vfs_write+0x290/0x360
[<a000000100124540>] sys_pwrite64+0x100/0x140
[<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20

Expected results:
Expect the server not to deadlock.

Additional info:
Customer Name: Mercado Libre
Oracle TAR: 5700008.993
Oracle Bug: 5520420
Redhat Issue Tracker: 123181
Mainline fix info: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=4736ba03c29ab2e7764e1aed9858de823f69d2ad
Created attachment 161399 [details] linux-2.6.9-nfs-dio-aware-compound.patch
I reviewed the patch; it should be included in the RHEL 4.6 kernel.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
The patch has been posted; changing status to POST. If a re-post is necessary for a future release, please just let me know.
committed in stream U7 build 68.4. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Hi guys, I'm the customer (Mercadolibre). We are not migrating to RH4 because of this bug. What can we do to get this patch released? Thanks, Rodrigo.
Hi Rodrigo, We just incorporated this patch into the RHEL 4.7 beta. Can you please download the beta from RHN, test the patch and report your results here? Thank you!
Thanks. When will 4.7 be official? We have a migration from RH3 to RH4 on hold waiting for this patch. Do you think this patch could be ported to 4.6? Thanks, Rodrigo.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html
Rodrigo, RHEL 4.7 GA is now on RHN, which includes this patch.