Bug 252400 - RHEL4 U5: ia64 machine hang when DB starts using rac/nfs/hugepages
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Luming Yu
QA Contact: Martin Jenner
Depends On:
Blocks:
Reported: 2007-08-15 16:07 EDT by Keshav Sharma
Modified: 2013-08-05 21:43 EDT

See Also:
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 15:15:23 EDT


Attachments
linux-2.6.9-nfs-dio-aware-compound.patch (810 bytes, patch)
2007-08-15 16:07 EDT, Keshav Sharma

Description Keshav Sharma 2007-08-15 16:07:18 EDT
Description of problem:

Starting a database that uses RAC, NFS, and hugepages on ia64
deadlocks the machine.


Version-Release number of selected component (if applicable):


How reproducible:

Every time. Removing any one of the three (RAC, NFS, hugepages)
makes the problem go away.

Steps to Reproduce:

Create a RAC DB on NFS storage and allocate hugepages
for the SGA.
  
Actual results:

The sysrq-p data shows the following process on-CPU:

CPU2:
Call Trace:
[<a000000100016da0>] show_stack+0x80/0xa0
[<a0000001003273b0>] showacpu+0x50/0x80
[<a00000010005b340>] handle_IPI+0x200/0x340
[<a0000001000130f0>] handle_IRQ_event+0x90/0x120
[<a000000100013ae0>] do_IRQ+0x180/0x560
[<a000000100015d10>] ia64_handle_irq+0xf0/0x1e0
[<a00000010000f600>] ia64_leave_kernel+0x0/0x260
[<a000000100008cb0>] ia64_spinlock_contention+0x30/0x60
[<a000000100593de0>] __lock_text_start+0x40/0x60
[<a0000001001294b0>] __set_page_dirty_buffers+0x30/0x300
[<a0000001000e1eb0>] set_page_dirty+0xf0/0x180
[<a0000001000e1fd0>] set_page_dirty_lock+0x90/0xc0
[<a000000200ab5010>] nfs_free_user_pages+0xd0/0x120 [nfs]
[<a000000200ab55a0>] nfs_direct_complete+0x40/0x140 [nfs]
[<a000000200ab5900>] nfs_direct_write_result+0x140/0x160 [nfs]
[<a000000200a77b20>] nfs_writeback_done+0x2e0/0x420 [nfs]
[<a000000200a7fa60>] nfs3_write_done+0xc0/0x100 [nfs]
[<a0000002007d91b0>] __rpc_execute+0x770/0xb40 [sunrpc]
[<a0000002007d9630>] __rpc_schedule+0xb0/0x240 [sunrpc]
[<a0000002007da5b0>] rpciod+0x2d0/0x9a0 [sunrpc]


The sysrq-t data shows a waiting D state oracle process
as follows:

oracle        D a000000100590c60     0 19029      1         19031 19027
Call Trace:
[<a00000010006ab70>] context_switch+0x470/0x9a0
[<a000000100590c60>] schedule+0x700/0x17c0
[<a000000200ab54b0>] nfs_direct_wait+0x3f0/0x4a0 [nfs]
[<a000000200ab70a0>] nfs_file_direct_write+0xa00/0xee0 [nfs]
[<a000000200a5b870>] nfs_file_write+0x1d0/0x2a0 [nfs]
[<a000000100123d80>] do_sync_write+0x140/0x1a0
[<a000000100124070>] vfs_write+0x290/0x360
[<a000000100124540>] sys_pwrite64+0x100/0x140
[<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20


Expected results:

Expect the server not to deadlock.

Additional info:

Customer Name: Mercado Libre
Oracle TAR: 5700008.993
Oracle Bug: 5520420
Red Hat Issue Tracker: 123181

Mainline fix:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=4736ba03c29ab2e7764e1aed9858de823f69d2ad
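
For readers unfamiliar with the code path in the rpciod trace above (nfs_direct_complete -> nfs_free_user_pages -> set_page_dirty_lock), here is a minimal userspace sketch of the idea the attachment name points at: making the NFS direct I/O completion path aware of compound pages (hugepages). This is an assumption about the shape of the fix, not the content of the attached patch or the mainline commit. The names model_page, model_set_page_dirty and model_free_user_pages are hypothetical stand-ins, and the model only shows the skip logic, not the kernel locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct model_page {
    bool compound;  /* true if the page is part of a hugepage (compound page) */
    bool dirty;     /* set once the page has been marked dirty */
    int  refcount;  /* references held by the direct I/O request */
};

/* Stand-in for set_page_dirty_lock(); in this model it only sets a flag. */
static void model_set_page_dirty(struct model_page *page)
{
    page->dirty = true;
}

/*
 * Model of releasing the user pages once a direct I/O request completes.
 * Assumption: compound pages are skipped when dirtying, so the completion
 * path never enters the page-dirtying code for a hugepage.
 * Returns the number of pages that were marked dirty.
 */
static size_t model_free_user_pages(struct model_page *pages, size_t npages,
                                    bool do_dirty)
{
    size_t dirtied = 0;

    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        struct model_page *page = &pages[i];

        if (do_dirty && !page->compound) {
            model_set_page_dirty(page);
            dirtied++;
        }
        page->refcount--;  /* drop the reference in every case */
    }
    return dirtied;
}
```

In this model every page still has its reference dropped; only the dirtying step is skipped for pages flagged as compound.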
Comment 1 Keshav Sharma 2007-08-15 16:07:19 EDT
Created attachment 161399
linux-2.6.9-nfs-dio-aware-compound.patch
Comment 2 Luming Yu 2007-08-16 02:54:37 EDT
I reviewed the patch. It should be included in the RHEL 4.6 kernel.
Comment 4 RHEL Product and Program Management 2007-08-22 16:25:52 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 RHEL Product and Program Management 2007-09-07 15:34:16 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 7 RHEL Product and Program Management 2007-09-08 14:40:33 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 9 RHEL Product and Program Management 2007-09-08 15:00:31 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 10 Luming Yu 2007-09-09 07:04:36 EDT
The patch has been posted, so I am changing the status to POST. If a re-post is
necessary for a future release, please let me know.

Comment 12 Jason Baron 2007-12-20 13:39:28 EST
Committed in stream U7, build 68.4. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 14 rodrigo 2008-05-30 16:21:16 EDT
Hi guys, I'm the customer (Mercadolibre). We are not migrating to RH4 because of
this bug. How can we get this patch released? Thanks,
Rodrigo.
rodrigo@mercadolibre.com
Comment 15 Ronald Pacheco 2008-05-30 16:37:50 EDT
Hi Rodrigo,

We just incorporated this patch into the RHEL 4.7 beta.  Can you please download
the beta from RHN, test the patch and report your results here?  Thank you!
Comment 16 rodrigo 2008-06-02 11:36:23 EDT
Thanks. When will 4.7 be official? We have a migration from RH3 to RH4
on hold waiting for this patch. Do you think this patch could be ported to
4.6?

Thanks
Rodrigo.
Comment 19 errata-xmlrpc 2008-07-24 15:15:23 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
Comment 20 Ronald Pacheco 2008-07-24 16:36:37 EDT
Rodrigo,

RHEL 4.7 GA is now on RHN, which includes this patch.
