Bug 118098

Summary: LTC6800-kernel oops with huge_page_release
Product: Red Hat Enterprise Linux 3
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: petrides, tao
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Enhancement
Last Closed: 2004-09-02 04:31:09 UTC
Bug Blocks: 113479

Description IBM Bug Proxy 2004-03-11 22:09:35 UTC
The following has been reported by IBM LTC:
kernel oops with huge_page_release
Hardware Environment:
x86 4-cpu box hyperthreaded (don't think this makes a diff)

Software Environment:
Red Hat EL AS 3 .. both -9.0.1-ELsmp and -11 (U2 beta kernel from 
ftp://partners.redhat.com/19443147e6de885277c0208e6fec70fe/2.4.21-11.EL/)

Steps to Reproduce:
1. Run our DB2 bucket. When we finish creating a database and detach
from the shared memory segment, a kernel oops occurs. SHM_HUGETLB has
been used in the shmat call.
2. The machine oopses while releasing the huge pages.
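
The attach/detach sequence in the steps above can be sketched as a small
standalone C function. This is not the DB2 code and has not been shown to
trigger the oops; it only walks the same path (create a huge-page System V
segment, attach, touch it, detach). The function name
hugetlb_attach_detach and the return-code convention are made up for
illustration. Note that in the standard Linux API SHM_HUGETLB is passed to
shmget; the sketch assumes that is what the report means by "the shmat
call".

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000   /* Linux value; fallback if the libc headers hide it */
#endif

/*
 * Hypothetical helper: create a private huge-page segment of `size` bytes,
 * attach it, touch it, and detach it again.
 * Returns 0 on success, or the number of the step that failed:
 * 1 = shmget, 2 = shmat, 3 = shmdt.
 */
int hugetlb_attach_detach(size_t size)
{
    /* size should be a multiple of the huge page size */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0) {
        perror("shmget(SHM_HUGETLB)");   /* e.g. no huge pages configured */
        return 1;
    }

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        perror("shmat");
        shmctl(id, IPC_RMID, NULL);
        return 2;
    }

    memset(addr, 0x5a, size);            /* fault in every huge page */

    int rc = 0;
    if (shmdt(addr) != 0) {              /* the detach step named in the report */
        perror("shmdt");
        rc = 3;
    }
    shmctl(id, IPC_RMID, NULL);          /* always remove the segment */
    return rc;
}
```

Calling hugetlb_attach_detach(4UL << 20) from main requests 4 MiB, which
is a multiple of both the 2 MiB and 4 MiB huge page sizes used on i386.
Huge pages must be configured on the system first, otherwise shmget fails
and the function returns 1.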

Actual Results:
EIP at huge_page_release 0x1e (2.4.21-0.9.1.ELsmp)

in the call stack:
unmap_hugepage_range
zap_hugepage_range
do_munmap
free_msg
sys_shmdt
sys_ipc
sys_gettimeofday

eax 36380500
ebx c27b0d84
ecx 00000005
edx c27b0d84
esi 1b200000
edi 1b400000
ebp eb9ca500
esp dbd79ee8

I can add more to this once I get the info.  This is stuff I collected 
yesterday, and is from the -9 kernel.

Expected Results:
It works as expected, or returns an error message.

I will see if I can come up with a standalone program to reproduce
this. The version of DB2 that this happens on is not available for
public download yet.

Please send instructions on what data to gather in order to debug this.
Thanks. Glen / Mark
 
U2 beta kernel problem.

Comment 1 Larry Woodman 2004-03-12 16:13:30 UTC
Please attach the actual console screen when the OOPS occurs.

Thanks, Larry Woodman


Comment 3 Larry Woodman 2004-03-19 03:10:04 UTC
Is this an x440-only problem, or have you been able to reproduce it on
another x86 SMP system?

Larry Woodman

Comment 4 Larry Woodman 2004-03-19 03:48:51 UTC
The problem is that one of the small pages in the compound bigpage is
somehow getting placed on the active list rather than being treated as
a special subset of the hugepage.  When the System V shared memory
region that maps the hugepage is unmapped via shmdt, the low-level VM
system recognizes this as corruption and the system BUGs.  Still
looking for the offending code; I suspect it is somewhere in the IO
layer.

Larry
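
The failure mode described in Comment 4 can be illustrated with a
user-space toy model. This is not kernel code: every name below
(toy_page, toy_hugepage, toy_find_stray_subpage, toy_huge_page_release)
is hypothetical, and the count of 512 sub-pages assumes a 2 MiB huge page
built from 4 KiB pages. It only shows the idea that a release-time sanity
check fails as soon as one sub-page has been put on the active list.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUBPAGES 512   /* assumption: 2 MiB huge page / 4 KiB small pages */

/* Toy stand-in for a small page; only tracks list membership. */
struct toy_page {
    bool on_active_list;
};

/* Toy stand-in for a compound huge page made of small pages. */
struct toy_hugepage {
    struct toy_page sub[TOY_SUBPAGES];
};

/* Returns the index of the first sub-page on the active list, or -1 if none. */
int toy_find_stray_subpage(const struct toy_hugepage *hp)
{
    for (size_t i = 0; i < TOY_SUBPAGES; i++) {
        if (hp->sub[i].on_active_list)
            return (int)i;
    }
    return -1;
}

/*
 * Models the release-time check: returns true if the huge page is
 * consistent, false where the real system would report corruption.
 */
bool toy_huge_page_release(const struct toy_hugepage *hp)
{
    return toy_find_stray_subpage(hp) < 0;
}
```

In this model a zero-initialized toy_hugepage releases cleanly, while
setting on_active_list on any single sub-page makes
toy_huge_page_release return false.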


Comment 6 Larry Woodman 2004-06-18 15:45:39 UTC
Please try out this kernel ASAP.

http://people.redhat.com/~lwoodman/.for_yvonne/


Larry


Comment 9 Ernie Petrides 2004-06-20 13:39:47 UTC
A fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.14.EL).


Comment 10 John Flanagan 2004-09-02 04:31:09 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html