| Summary: | Kernel panics during Veritas SF testing. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Daniel Yeisley <dyeisley> |
| Component: | kernel | Assignee: | David Howells <dhowells> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Yeisley <dyeisley> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.7 | CC: | arozansk, benl, eguan, jburke, jcm, jstancek, moshiro, myamazak, pbunyan, qcai, rwheeler, syeghiay, tmuneda, vincent |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.18-273.el5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-21 10:05:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 675781 | ||
| Bug Blocks: | |||
Description
Daniel Yeisley
2011-06-27 20:49:28 UTC
I did some more manual testing and found the following.

[root@veritas3 ~]# dmesg | grep -i vx
vxdmp: module license 'Proprietary. Send bug reports to support' taints kernel.
VxVM vxdmp V-5-0-141 dmplinux:vxdmp: Cannot find device number for rootvx
vxio: no version for "vxvm_imc_cleanup" found: kernel tainted.
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
[root@veritas3 ~]# lsmod | grep -i vx
vxportal 41488 0
vxfs 1625752 2 fdd,vxportal
vxio 1641832 0
vxdmp 249784 1 vxio
[root@veritas3 ~]# uname -a
Linux veritas3.rhts.eng.bos.redhat.com 2.6.18-269.el5 #1 SMP Tue Jun 21 16:22:46 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

There are complaints in dmesg about vxio not being loaded, but it shows up in lsmod. Maybe this is something for Veritas people to look into?

(In reply to comment #3)
> Maybe this is something for Veritas people to look into?

I agree. I've contacted Veritas and am waiting for a response.

I just manually ran the test and verified that it works on kernel 2.6.18-268.el5. I grabbed the same information as above and don't see the "vxio not loaded" messages.

[root@veritas3 ~]# dmesg | grep -i vx
vxdmp: module license 'Proprietary. Send bug reports to support' taints kernel.
VxVM vxdmp V-5-0-141 dmplinux:vxdmp: Cannot find device number for rootvx
vxio: no version for "vxvm_imc_cleanup" found: kernel tainted.
[root@veritas3 ~]# lsmod | grep -i vx
vxportal 41488 0
vxfs 1625752 2 fdd,vxportal
vxspec 42096 0
vxio 1641832 1
vxdmp 249784 5 vxspec,vxio
[root@veritas3 ~]# uname -a
Linux veritas3.rhts.eng.bos.redhat.com 2.6.18-268.el5 #1 SMP Tue Jun 14 18:24:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

When it panics, we see this just before the panic:

vxdmp: module license 'Proprietary. Send bug reports to support' taints kernel.
VxVM vxdmp V-5-0-141 dmplinux:vxdmp: Cannot find device number for rootvx
vxio: no version for "vxvm_imc_cleanup" found: kernel tainted.
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
vxfs: disagrees about version of symbol struct_module
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
VxVM vxio V-5-0-472 vxspec: vxio not loaded. Aborting vxspec load
LLT INFO V-14-1-10009 LLT Protocol available
LLT INFO V-14-1-10483 16-bit cluster ID (999) set. Updating protocol version from 3.7 to 4.0
GAB INFO V-15-1-20021 GAB available

When it passes, we see this at the point where it would otherwise have panicked:

vxdmp: module license 'Proprietary. Send bug reports to support' taints kernel.
VxVM vxdmp V-5-0-141 dmplinux:vxdmp: Cannot find device number for rootvx
vxio: no version for "vxvm_imc_cleanup" found: kernel tainted.
vxfs: disagrees about version of symbol struct_module
VxVM vxdmp V-5-0-34 added disk array OTHER_DISKS, datype = OTHER_DISKS
VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk
LLT INFO V-14-1-10009 LLT Protocol available
LLT INFO V-14-1-10483 16-bit cluster ID (999) set. Updating protocol version from 3.7 to 4.0
GAB INFO V-15-1-20021 GAB available
GAB INFO V-15-1-20026 Port a registration waiting for seed port membership

I spoke with David Howells about this issue. We are still not sure whether his patch is part of the problem. It was thought that, if it were, the crash would be in something proc-related. The GAB module jumped to a NULL pointer, and since it is a proprietary module and we have no idea what it does, we can't be sure at this point. One theory is that his patch might show up as a corrupter if a module is allocating its own PDE objects.
He was careful to bury the wrapper inside fs/proc/, where other code can't get at it, and would really expect bugs to crop up in fs/proc/ when it tries to access the wrapper and it's not there; nothing else should be aware that the wrapper exists.

Dan Y is going to send the nm -u output from the GAB module, so we can at least see whether it accesses the proc routines. David H is building a 2.6.18-269.el5 kernel minus his patch. Once that is finished, we will need Dan to rerun his tests, but he will have to install the test kernel first.

Regards,
Jeff

There is no vxio module installed, or it has been removed, or an upgrade of the module failed. I suggest looking there for clues. Also, the other modules are obviously implementing a compatible syscall wrapper that can't handle a missing piece, leading to a NULL dereference and a panic on oops. I doubt this is a Red Hat issue.

Jon, while I don't disagree with your assessment, the issue remains that the test worked with the 2.6.18-268.el5 kernel and fails with 2.6.18-269.el5. It stands to reason that something has changed on our end that is possibly instigating this behaviour, since the test has not changed.

Patch(es) available in kernel-2.6.18-273.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

...

Note: this kernel contains patches that are under embargo until 2011.07.07, so it will not actually be available until the 7th or 8th.

Moving to VERIFIED; the issue is fixed in RHEL5.7-Server-20110707.3.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
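The nm -u check proposed in the thread (seeing whether the GAB module references any proc routines) could be scripted roughly as follows. This is a minimal sketch in Python, not something from the thread: it assumes GNU nm's `nm -u` output format (one undefined symbol per line, shown as a type letter followed by the name), and it assumes, hypothetically, that a `proc_` substring is a good enough marker for proc-related kernel routines such as create_proc_entry or proc_mkdir. The script name and module path are placeholders.

```python
import subprocess
import sys

# Hypothetical marker: treat any symbol containing "proc_" as proc-related
# (e.g. create_proc_entry, remove_proc_entry, proc_mkdir). Adjust as needed.
PROC_MARKER = "proc_"

# nm type letters for undefined symbols: "U" (undefined), "w" (weak undefined).
UNDEFINED_TYPES = {"U", "w"}


def undefined_symbols(nm_output):
    """Extract symbol names from `nm -u` output lines such as '   U printk'."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address, so the line is just "<type> <name>".
        if len(parts) == 2 and parts[0] in UNDEFINED_TYPES:
            symbols.append(parts[1])
    return symbols


def proc_symbols(symbols):
    """Return the sorted subset of symbols that look proc-related."""
    return sorted(s for s in symbols if PROC_MARKER in s)


def main(module_path):
    # Run `nm -u` on the module object and report proc-related imports.
    result = subprocess.run(
        ["nm", "-u", module_path], capture_output=True, text=True, check=True
    )
    matches = proc_symbols(undefined_symbols(result.stdout))
    if matches:
        print("\n".join(matches))
    else:
        print("no proc-related undefined symbols found")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: proc_syms.py <module.ko>", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Usage would be along the lines of `python3 proc_syms.py <path to the GAB module object>`. An empty result would only suggest the module does not call the proc API directly; it would not rule out a module that builds its own PDE objects by hand, which is the scenario the theory above worries about.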