Bug 252400 - RHEL4 U5: ia64 machine hang when DB starts using rac/nfs/hugepages
Summary: RHEL4 U5: ia64 machine hang when DB starts using rac/nfs/hugepages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Luming Yu
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-08-15 20:07 UTC by Keshav Sharma
Modified: 2018-10-19 19:59 UTC

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:15:23 UTC
Target Upstream Version:
Embargoed:


Attachments
linux-2.6.9-nfs-dio-aware-compound.patch (810 bytes, patch)
2007-08-15 20:07 UTC, Keshav Sharma


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2008:0665 | 0 | normal | SHIPPED_LIVE | Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7 | 2008-07-24 16:41:06 UTC

Description Keshav Sharma 2007-08-15 20:07:18 UTC
Description of problem:

Starting a database that uses RAC, NFS, and hugepages on ia64
deadlocks the machine.


Version-Release number of selected component (if applicable):


How reproducible:

Every time. Removing any one of the three (RAC, NFS, or hugepages)
makes the problem go away.

Steps to Reproduce:

Create a RAC DB on NFS storage and allocate hugepages
for the SGA.
  
Actual results:

The sysrq-p data shows the following process on-CPU:

CPU2:
Call Trace:
[<a000000100016da0>] show_stack+0x80/0xa0
[<a0000001003273b0>] showacpu+0x50/0x80
[<a00000010005b340>] handle_IPI+0x200/0x340
[<a0000001000130f0>] handle_IRQ_event+0x90/0x120
[<a000000100013ae0>] do_IRQ+0x180/0x560
[<a000000100015d10>] ia64_handle_irq+0xf0/0x1e0
[<a00000010000f600>] ia64_leave_kernel+0x0/0x260
[<a000000100008cb0>] ia64_spinlock_contention+0x30/0x60
[<a000000100593de0>] __lock_text_start+0x40/0x60
[<a0000001001294b0>] __set_page_dirty_buffers+0x30/0x300
[<a0000001000e1eb0>] set_page_dirty+0xf0/0x180
[<a0000001000e1fd0>] set_page_dirty_lock+0x90/0xc0
[<a000000200ab5010>] nfs_free_user_pages+0xd0/0x120 [nfs]
[<a000000200ab55a0>] nfs_direct_complete+0x40/0x140 [nfs]
[<a000000200ab5900>] nfs_direct_write_result+0x140/0x160 [nfs]
[<a000000200a77b20>] nfs_writeback_done+0x2e0/0x420 [nfs]
[<a000000200a7fa60>] nfs3_write_done+0xc0/0x100 [nfs]
[<a0000002007d91b0>] __rpc_execute+0x770/0xb40 [sunrpc]
[<a0000002007d9630>] __rpc_schedule+0xb0/0x240 [sunrpc]
[<a0000002007da5b0>] rpciod+0x2d0/0x9a0 [sunrpc]


The sysrq-t data shows an oracle process waiting in D state
as follows:

oracle        D a000000100590c60     0 19029      1         19031 19027
Call Trace:
[<a00000010006ab70>] context_switch+0x470/0x9a0
[<a000000100590c60>] schedule+0x700/0x17c0
[<a000000200ab54b0>] nfs_direct_wait+0x3f0/0x4a0 [nfs]
[<a000000200ab70a0>] nfs_file_direct_write+0xa00/0xee0 [nfs]
[<a000000200a5b870>] nfs_file_write+0x1d0/0x2a0 [nfs]
[<a000000100123d80>] do_sync_write+0x140/0x1a0
[<a000000100124070>] vfs_write+0x290/0x360
[<a000000100124540>] sys_pwrite64+0x100/0x140
[<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20


Expected results:

The server should not deadlock.

Additional info:

Customer Name: Mercado Libre
Oracle TAR: 5700008.993
Oracle Bug: 5520420
Red Hat Issue Tracker: 123181

Mainline fix info:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=4736ba03c29ab2e7764e1aed9858de823f69d2ad
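
The contents of the attached patch are not reproduced in this report. Going only by the attachment name (nfs-dio-aware-compound) and the CPU2 trace, where rpciod spins under set_page_dirty_lock() called from nfs_free_user_pages(), one plausible reading is that the direct-I/O release path should not try to dirty pages that belong to a compound (huge) page. The following is a minimal user-space model of that idea, not the actual kernel change. struct mock_page, mock_set_page_dirty_lock(), mock_page_release() and free_user_pages() are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; NOT the real layout. */
struct mock_page {
    bool compound;  /* true if the page is part of a compound (huge) page */
    bool dirty;     /* set by mock_set_page_dirty_lock() */
    int refcount;   /* dropped by mock_page_release() */
};

/* Stand-in for set_page_dirty_lock(); in this model it only sets a flag. */
static void mock_set_page_dirty_lock(struct mock_page *page)
{
    page->dirty = true;
}

/* Stand-in for dropping the reference taken when the user pages were pinned. */
static void mock_page_release(struct mock_page *page)
{
    page->refcount--;
}

/*
 * Assumed shape of a compound-aware release loop: every page is released,
 * but only non-compound pages are marked dirty. Returns the number of
 * pages that were dirtied.
 */
static size_t free_user_pages(struct mock_page *pages, size_t npages,
                              bool do_dirty)
{
    size_t dirtied = 0;

    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        if (do_dirty && !pages[i].compound) {
            mock_set_page_dirty_lock(&pages[i]);
            dirtied++;
        }
        mock_page_release(&pages[i]);
    }
    return dirtied;
}
```

In this model, calling free_user_pages() on three pages where only the middle one is compound, with do_dirty set, dirties the two ordinary pages and drops the reference on all three. Whether the shipped patch takes exactly this form should be checked against the attachment and the mainline commit above.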

Comment 1 Keshav Sharma 2007-08-15 20:07:19 UTC
Created attachment 161399 [details]
linux-2.6.9-nfs-dio-aware-compound.patch

Comment 2 Luming Yu 2007-08-16 06:54:37 UTC
I reviewed the patch; it should be included in the RHEL 4.6 kernel.

Comment 4 RHEL Program Management 2007-08-22 20:25:52 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 RHEL Program Management 2007-09-07 19:34:16 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 7 RHEL Program Management 2007-09-08 18:40:33 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 9 RHEL Program Management 2007-09-08 19:00:31 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 10 Luming Yu 2007-09-09 11:04:36 UTC
The patch has been posted; changing status to POST. If a re-post is necessary
for a future release, please let me know.



Comment 12 Jason Baron 2007-12-20 18:39:28 UTC
Committed in stream U7, build 68.4. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 14 rodrigo 2008-05-30 20:21:16 UTC
Hi guys, I'm the customer (Mercadolibre). We are not migrating to RH4 because
of this bug. What can we do to get this patch released? Thanks,
Rodrigo.

Comment 15 Ronald Pacheco 2008-05-30 20:37:50 UTC
Hi Rodrigo,

We just incorporated this patch into the RHEL 4.7 beta.  Can you please download
the beta from RHN, test the patch and report your results here?  Thank you!

Comment 16 rodrigo 2008-06-02 15:36:23 UTC
Thanks. When will 4.7 be official? We have a migration from RH3 to RH4
on hold waiting for this patch. Do you think this patch could be ported to
4.6?

Thanks
Rodrigo.


Comment 19 errata-xmlrpc 2008-07-24 19:15:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

Comment 20 Ronald Pacheco 2008-07-24 20:36:37 UTC
Rodrigo,

RHEL 4.7 GA is now on RHN, which includes this patch.

