Bug 248954

Summary: Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64
Product: Red Hat Enterprise Linux 4
Reporter: John Sobecki <john.sobecki>
Component: kernel
Assignee: Luming Yu <luyu>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: low
Version: 4.5
CC: cward, esandeen, jbaron, jwest, luyu, peterm, rpacheco
Target Milestone: ---
Keywords: OtherQA, Tracking
Target Release: ---
Hardware: ia64
OS: Linux
Whiteboard:
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:14:24 UTC
Attachments:
  Description: tested patch that resolves the hugepages spin
  Flags: none

Description John Sobecki 2007-07-19 20:25:06 UTC
Description of problem:

Using the stack ia64/oracle ASMLib/hugepages/Database 10.2, the ASM instance
will hang on startup and the ASM DBWR process will go into a 100% CPU
spin.  Sysrq-P samples show the following stack:

 CPU1:
 Call Trace:
  show_stack+0x80/0xa0
  showacpu+0x50/0x80
  handle_IPI+0x1f0/0x340                          
  handle_IRQ_event+0x90/0x120
  do_IRQ+0x180/0x560
  ia64_handle_irq+0xf0/0x1e0  
  ia64_leave_kernel+0x0/0x260
  ia64_spinlock_contention+0x20/0x60
  __lock_text_start+0x40/0x60
  __set_page_dirty_buffers+0x30/0x300
  set_page_dirty+0xf0/0x180
  set_page_dirty_lock+0x90/0xc0
  bio_unmap_user+0x70/0xe0
  asm_cleanup_bios+0x130/0x300 [oracleasm]
  asmfs_file_read+0x30/0x220 [oracleasm]
  vfs_read+0x290/0x360
  sys_read+0x70/0xe0
  ia64_ret_from_syscall+0x0/0x20


Version-Release number of selected component (if applicable):
2.6.9-55.0.2.EL on ia64 only.

How reproducible:

100% in house and 100% at customer.

Steps to Reproduce:
1. Install oracleasm RPMs
2. Install RDBMS 10.2
3. Allocate hugepages for ASM and database SGAs
4. Attempt to start up the ASM instance using an ASMLib discovery string of
ORCL:*
5. ASM instance hangs on startup, DBWR process spinning using 100% CPU
  
Actual results:
Same as above.

Expected results:
No spin. 

Additional info:

Removing hugepages, or setting the max locked memory ulimit to zero so that no
hugepages can be allocated for the ASM instance, is an effective workaround.
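
To confirm that the workaround is actually in effect, something like the
following C sketch could be used. It is only an illustration: the function
names are hypothetical, and it assumes the usual Linux interfaces
(getrlimit(2) with RLIMIT_MEMLOCK, and the HugePages_Total field of
/proc/meminfo). It is not part of the attached patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Return the HugePages_Total value found in /proc/meminfo-style text,
 * or -1 if the field is absent. (Hypothetical helper name.) */
long parse_hugepages_total(const char *meminfo_text)
{
    const char *key = "HugePages_Total:";
    const char *p = strstr(meminfo_text, key);

    if (p == NULL)
        return -1;
    /* strtol skips the blanks between the colon and the number. */
    return strtol(p + strlen(key), NULL, 10);
}

/* Read /proc/meminfo and return HugePages_Total, or -1 on failure. */
long read_hugepages_total(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_hugepages_total(buf);
}

/* Return 1 if the soft max-locked-memory limit of the calling process
 * is zero, 0 if it is non-zero, and -1 if getrlimit() fails. */
int memlock_limit_is_zero(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    return rl.rlim_cur == 0 ? 1 : 0;
}
```

Called from the shell environment that starts the ASM instance, either
read_hugepages_total() returning 0 or memlock_limit_is_zero() returning 1
would suggest that one of the two workarounds described above is active.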

Comment 1 John Sobecki 2007-07-19 20:25:07 UTC
Created attachment 159608 [details]
tested patch that resolves the hugepages spin

Comment 2 Luming Yu 2007-07-25 01:35:06 UTC
Is it reproducible with upstream ?
Is the patch in comment#1 upstream?

Comment 3 John Sobecki 2007-07-25 20:16:03 UTC
Hi,

1) Haven't tested on EL5 yet

2) The patch follows a discussion with Jens Axboe. It is not how mainline was
patched (the mainline change is more generic and applies to
set_page_dirty_lock), but this seemed less intrusive for the 2.6.9 kernel.

Red Hat customer on this issue is: IOWA COURT INFORMATION SYSTEMS

Thanks, John

Comment 4 Luming Yu 2007-08-06 02:24:05 UTC
John,
Since mainline uses a different method to resolve the same problem, could you
also point out the link to that upstream patch?
Thanks,
Luming



Comment 5 Luming Yu 2007-08-07 07:11:38 UTC
I rechecked bio.c in 2.6.9; it is clear that several places already check the
PageCompound flag.  According to the comments for bio_set_pages_dirty, the VM
doesn't handle the dirtiness of compound pages, so the patch should be right.
I will post it.

Thanks,
Luming
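
For readers without access to the attachment, the idea discussed in this
comment can be modelled in plain userspace C. This is a sketch under stated
assumptions, not the attached patch: struct mock_page and every mock_*
function are hypothetical stand-ins for the kernel's struct page,
PageCompound() and set_page_dirty_lock(), and the loop only assumes that
the cleanup path seen in the stack trace (bio_unmap_user) dirties each
page after a read.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
    int compound;   /* models the PageCompound flag (part of a hugepage) */
    int dirty;      /* models the page's dirty state */
};

/* Models PageCompound(): non-zero for pages that belong to a hugepage. */
static int mock_page_compound(const struct mock_page *page)
{
    return page->compound;
}

/* Models set_page_dirty_lock(); here it only records the dirty state. */
static void mock_set_page_dirty_lock(struct mock_page *page)
{
    page->dirty = 1;
}

/* Models the cleanup of user pages after an I/O completes. Pages are
 * dirtied only if data was read into them AND they are not compound,
 * since the VM does not track dirtiness of compound pages.
 * Returns the number of pages that were marked dirty. */
size_t mock_unmap_user_pages(struct mock_page *pages, size_t count,
                             int data_was_read)
{
    size_t i;
    size_t dirtied = 0;

    for (i = 0; i < count; i++) {
        if (data_was_read && !mock_page_compound(&pages[i])) {
            mock_set_page_dirty_lock(&pages[i]);
            dirtied++;
        }
    }
    return dirtied;
}
```

With the compound check in place, hugepage-backed buffers never reach the
dirtying call, which is the step where the DBWR process was seen spinning
in the Sysrq-P trace.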

 

Comment 6 RHEL Program Management 2007-08-10 13:14:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Program Management 2007-09-07 19:34:28 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 9 RHEL Program Management 2007-09-08 19:00:37 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 10 Luming Yu 2007-09-09 11:01:32 UTC
The patch has been posted, so I am changing the status to POST.  If a re-post
is necessary for a future release, please let me know.

Comment 12 Jason Baron 2007-12-20 18:36:52 UTC
Committed in stream U7 build 68.4. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 13 Ronald Pacheco 2007-12-20 19:14:41 UTC
John,

Can you please test and post your results here?  Thanks!

Comment 17 Ronald Pacheco 2008-06-03 13:09:45 UTC
John/Keshav,

Can you please report your test results here?

Comment 19 Chris Ward 2008-06-10 11:57:33 UTC
Oracle, this **high** severity bug is now a possible candidate for exclusion in
RHEL4.7. If you wish for this bug to be fixed in 4.7, please report your test
results here as soon as possible. Thank you.

Comment 20 Ronald Pacheco 2008-06-10 13:04:25 UTC
Keshav,

Can you please have your test results posted to this BZ?  

Thanks and Regards,

Ron Pacheco

Comment 21 John Sobecki 2008-06-10 16:59:04 UTC
Status:

An Oracle Enterprise Linux kernel with this patch successfully passed
OLT regression.  

And the customer was given a copy of this kernel, and has not reported any
problems for months.  Thanks, John

Comment 23 Chris Ward 2008-06-10 17:18:35 UTC
Oracle, thanks for the test results. I would highly recommend testing the latest
RHEL Snapshot available on partners.redhat.com to confirm that the kernel
currently being shipped addresses your issues. It is of minor concern, however,
sometimes patches that were included in custom RPM builds (comment #12) get left
out or modified during the course of continued development. Please report any
problems you might encounter while testing RHEL4U7 Snapshot releases. Thanks!

Comment 24 RHEL Program Management 2008-06-18 15:42:55 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 28 errata-xmlrpc 2008-07-24 19:14:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

Comment 29 Chris Ward 2008-07-29 07:25:37 UTC
Partners, I would like to thank you all for your participation in assuring the
quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.