The following has been reported by IBM LTC:
kernel oops with huge_page_release

Hardware Environment:
x86 4-CPU box, hyperthreaded (don't think this makes a difference)

Software Environment:
Red Hat EL AS 3, both -9.0.1-ELsmp and -11 (U2 beta kernel from ftp://partners.redhat.com/19443147e6de885277c0208e6fec70fe/2.4.21-11.EL/)

Steps to Reproduce:
1. Run our DB2 bucket. When we finish creating a database and are detaching from the shared memory segment, a kernel oops occurs. SHM_HUGETLB has been used in the shmat call.
2. The machine oopses while releasing the huge pages.

Actual Results:
EIP at huge_page_release 0x1e (2.4.21-0.9.1.ELsmp)
Call stack: unmap_hugepage_range, zap_hugepage_range, do_mumap, free_msg, sys_shmdt, sys_ipc, sys_gettimeofday
eax 36380500 ebx c27b0d84 ecx 00000005 edx c27b0d84
esi 1b200000 edi 1b400000 ebp eb9ca500 esp dbd79ee8

I can add more to this once I get the info. This is what I collected yesterday, and it is from the -9 kernel.

Expected Results:
It works as expected, or returns an error message.

I will see if I can come up with a standalone program to reproduce this. The version of DB2 that this happens on is not yet available for public download. Please send instructions on what data to gather in order to debug this.
Thanks.
Glen / Mark

U2 beta kernel problem.
Please attach the actual console screen when the OOPS occurs. Thanks, Larry Woodman
Is this an x440-only problem, or have you been able to reproduce it on another x86 SMP system? Larry Woodman
The problem is that one of the small pages in the compound huge page is somehow being placed on the active list rather than being treated as part of the huge page. When the System V shared memory region that maps the huge page is unmapped via shmdt, the low-level VM system recognizes this as corruption and the system BUGs. I'm still looking for the offending code; I suspect it is somewhere in the I/O layer. Larry
Please try out this kernel ASAP. http://people.redhat.com/~lwoodman/.for_yvonne/ Larry
A fix for this problem has just been committed to the RHEL3 U3 patch pool this evening (in kernel version 2.4.21-15.14.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-433.html