Bug 155278
Summary: | Debugger killed by kernel when looking at the lowest addressed vmalloc page | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Chris Gottbrath <chrisg>
Component: | kernel | Assignee: | Jason Baron <jbaron>
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 4.0 | CC: | davej, josh.carlson, knoel
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | ia64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHSA-2005-514 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2005-10-05 13:01:22 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 156322 | |
Attachments: | | |
Description
Chris Gottbrath
2005-04-18 19:01:50 UTC
Created attachment 113337 [details]
1 of 2 trace files
We have two customer-generated strace files for this issue. This is 1 of 2.
Created attachment 113338 [details]
strace file 2 of 2
We have two customer-generated strace files that document this problem. This is
2 of 2.
What is the 'gm' module? Does it still happen without that having been loaded?

The gm module is the mpich-gm module. It is a special-purpose communication module for the Myrinet high-speed interconnect. However, the test case is basically a hello world (with a malloc), so I doubt that the GM module is needed to reproduce this.

Dave, the gm module is not required to reproduce this. We built a vanilla (newly installed, not updated at all) RHEL 4 system and were able to reproduce this right away.
Thanks,
Chris

Furthermore, we appear to be able to reproduce this in GDB with the following command:
print *(long *)0xa000000000004000

OK. I have a good idea what the problem is. I'll post a fix as soon as I can. Thanks.

Jason,
Excellent! Thanks for keeping us apprised of your progress. Looking forward to testing the fix.
Cheers,
Chris

Created attachment 113571 [details]
restrict in_gate_area()
This patch resolved this issue for me in limited testing. I've posted it to the
linux-ia64 list for further feedback. I won't have a chance to build a test
kernel for distribution with this patch until next week, but feel free to test
it. Thanks.
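
The GDB reproduction reported earlier in this thread can be scripted for repeated testing against patched and unpatched kernels. The following is a minimal sketch, not the attached test program: it builds a `gdb` batch-mode command line that attaches to a running process and evaluates the same `print` expression. The helper names and the choice to attach to an existing PID are assumptions made for illustration; only the address and the `print` expression come from the report.

```python
import subprocess

# Address from the report: 0x4000 above the 0xa000000000000000 base.
PROBE_ADDRESS = 0xA000000000004000


def gdb_peek_command(pid, address=PROBE_ADDRESS):
    """Build a gdb batch command that attaches to `pid` and reads one long."""
    expression = "print *(long *)0x%x" % address
    # -batch: exit after running the commands; -p: attach to a process;
    # -ex: run one gdb command after attaching.
    return ["gdb", "-batch", "-p", str(pid), "-ex", expression]


def run_probe(pid):
    """Run the probe and return gdb's exit status (hypothetical helper)."""
    return subprocess.run(gdb_peek_command(pid)).returncode


if __name__ == "__main__":
    # Print the command instead of running it; 1234 is a placeholder PID.
    print(" ".join(gdb_peek_command(1234)))
```

On an affected ia64 kernel the report indicates the debugger itself gets killed, so `run_probe` should only be pointed at a throwaway target process.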
Jason, your patch seems to indicate that it was generated against a 2.6.9 kernel. We downloaded a vanilla 2.6.9 kernel and applied your patch with only offsets. We built both a vanilla (unpatched 2.6.9) and a patched kernel. However, I can't reproduce the problem with the vanilla kernel.

Were you working from a vanilla kernel? If not, were you working from some sort of known baseline that we can replicate (such as an SRPM kernel)? Did you verify the existence of the bug before applying your patch and verify that it was gone in your patched kernel?

It might be useful to know that we have seen it in the 2.6.9 EL kernel listed above and also in the vanilla 2.6.11 kernel.
Thanks,
Chris

Hi Chris,
The patch was indeed against 2.6.9, but specifically the Red Hat RHEL4 kernel, which is 2.6.9 based. I did verify that the issue existed before the patch and was fixed by the patch. I will post a link to RHEL4 kernel sources and binaries with this patch later today.
Thanks,
-Jason

Jason,
OK, thanks!
Cheers,
Chris

Hi Chris,
I've placed test kernels and an SRPM with the patch at:
http://people.redhat.com/~jbaron/2.6.9-6.39.EL.gate.1.jbaron/
Please let me know if this resolves the issue.
Thanks,
-Jason

Created attachment 113699 [details]
test program
Here is a simple test program that I used in validating the bug fix.
Jason,
Thanks for the RPMs. We will try these out and get back to you ASAP.
Cheers,
Chris

Jason,
Thanks for your patience. We've had the updated kernel installed for a while here, and as far as I can tell the fix looks good. I'm coordinating with another engineer here and I wanted to wait to hear what he said before getting back to you. I don't see anything wrong with this fix. What are the next steps for getting this scheduled into something that our mutual customers can use?
Cheers,
Chris

OK. I heard back from the other engineer here at Etnus who is watching this issue. He is happy with the fix, but he wanted me to ask one side question:
"One thing I noted: /proc/pid/maps still shows the address range 0xa000000000000000-0xa000000000020000. Can you ask them if this is correct? I would think that it would show 0xa000000000000000-0xa000000000004000."
Is the range listed in /proc/pid/maps correct?
Cheers,
Chris

Jason,
What are the next steps for getting this into a kernel revision that our mutual customers can use? Have you looked at all into the address discrepancy in /proc/<pid>/maps?
Cheers,
Chris

Hi Chris,
I was out sick last week :( This problem should be addressed in U2. If you need a fix sooner, you can contact Red Hat support. I'll look into what to do about the discrepancy. Thanks.

Created attachment 115224 [details]
map holes to 0
New patch for this issue.
Also, the address range in /proc/pid/maps is correct. The GATE PAGE is mapped twice and covers 8 pages.

Devel ACK

Jason,
Thanks. Should we test out the attached patch? Will there be any kind of pre-release of Update 2 for us to look at?
Cheers,
Chris

Hi Chris,
Feel free to test the above patch. Alternatively, I hope to have this patch integrated into a U2 pre-release shortly, and I'll point you at that. Either way works.
Thanks,
-Jason

Jason,
What is the status of this fix? We are seeing a report of a similar problem (though the output on the system console is a little different: it says kernel panic) on an ia64 Linux system running the RHEL 4 Update 1 kernel, version 2.6.9-11.EL. Did this get into the RHEL 4 Update 2 release stream? Is it available via RHN?
Cheers,
Chris

Hi Chris,
Yes, this is in the RHEL4 U2 beta. It is available via the RHN beta channel, or you can just grab the kernel from:
http://people.redhat.com/~jbaron/rhel4/
The release should be official in a couple of weeks. Thanks.

Thanks Chris,
Please let us know how you make out. Thanks.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2005-514.html
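
The "8 pages" answer to the /proc/pid/maps question in this thread can be checked with simple arithmetic. The sketch below assumes a 16 KiB (0x4000) page size, inferred from the 0x4000 end address the Etnus engineer expected; the sample maps line is hypothetical apart from the address range quoted in the thread, and the function names are made up for illustration.

```python
PAGE_SIZE = 0x4000  # assumption: 16 KiB pages, inferred from the thread


def parse_maps_range(line):
    """Return (start, end) from the first field of a /proc/<pid>/maps line."""
    start_text, end_text = line.split()[0].split("-")
    return int(start_text, 16), int(end_text, 16)


def pages_in_range(start, end, page_size=PAGE_SIZE):
    """Number of whole pages covered by the half-open range [start, end)."""
    return (end - start) // page_size


if __name__ == "__main__":
    # Hypothetical line: only the address range comes from the report.
    sample = "a000000000000000-a000000000020000 r-xp 00000000 00:00 0"
    start, end = parse_maps_range(sample)
    print(pages_in_range(start, end))  # 0x20000 / 0x4000 = 8
```

Under that page-size assumption the reported range spans 8 pages, which agrees with the answer given, while the range the engineer expected (ending at 0xa000000000004000) would cover a single page.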