Bug 452759
Summary: | kernel lockup when a kernel page fault occurs. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Frank Ch. Eigler <fche> |
Component: | kernel | Assignee: | Prarit Bhargava <prarit> |
Status: | CLOSED NOTABUG | QA Contact: | Martin Jenner <mjenner> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.7 | CC: | fche, luyu, lwoodman, mhiramat, prarit, tyamamot, vgoyal |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | ia64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-06-25 17:19:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 451707 |
Description
Frank Ch. Eigler
2008-06-24 20:54:49 UTC
How about the RHEL 5 kernel and the upstream kernel with this test case?

(In reply to comment #1)
> How about the RHEL 5 kernel and the upstream kernel with this test case?

On 2.6.18-94.el5, nothing happened. However, as I reported in bug #435530, the RHEL 5.2 Xen kernel might have the same problem.

So ... is this a kernel-xen issue?

(In reply to comment #3)
> So ... is this a kernel-xen issue?

No, this entry is for the RHEL 4.7 kernel issue. kernel-2.6.9-74.EL has this issue.

I guess there are no pages mapped near the address 0xafffffffffffxxxx on the RHEL 4 kernel, and the page fault handler never returns. The address is in region 5, and according to the document below (p. 26), there seem to be no pages above 0xa00003ffffffffff.
http://www.ia64-linux.org/doc/IA64linuxkernel.PDF

(In reply to comment #5)
> I guess there are no pages mapped near the address 0xafffffffffffxxxx on the RHEL 4
> kernel, and the page fault handler never returns.
> The address is in region 5, and according to the document below (p. 26),
> http://www.ia64-linux.org/doc/IA64linuxkernel.PDF
> there seem to be no pages above 0xa00003ffffffffff.

I wonder what happens if you access a nonexistent page on x86. I'm pretty sure you would take an MCE ... the question is what we should do on ia64. At a minimum, the process shouldn't hang.

P.

So I was wondering, "How can I determine whether or not it is valid to read from the address supplied by a user in a module?" I read through the kernel and noted the following comment:

    /*
     * The "__xxx" versions do not do address space checking, useful when
     * doing multiple accesses to the same area (the programmer has to do the
     * checks by hand with "access_ok()")
     */
    #define __put_user(x, ptr) __put_user_nocheck((__typeof__(*(ptr))) (x), (ptr), sizeof(*(ptr)))
    #define __get_user(x, ptr) __get_user_nocheck((x), (ptr), sizeof(*(ptr)))

Frank and Masami, if I'm reading the above correctly, your code is incomplete.
The module as currently written is basically doing what amounts to a NULL dereference (it is interesting that the code hangs, BTW). I think the following code is better and correctly calls access_ok() before attempting the __get_user(), as the kernel comment requires:

    static int initmod(void)
    {
        int val = 0;
        int *addr = (int *)0xafffffffffffffffLL; // kernel nonexistent page

        /* kernel says user must do access_ok if __get_user is called */
        if (access_ok(VERIFY_WRITE, addr, KERNEL_DS)) {
            __get_user(val, addr);
        } else
            printk("access not ok\n");
        return 0;
    }

The above module code will always fail the access_ok() check. Closing as NOTABUG.

P.

Prarit, why access_ok(... KERNEL_DS)? This is user-space data we're pretending to access.

Yes, but you're still kernel-side. __get_user() calls __get_user_nocheck(), which calls __do_get_user(..., KERNEL_DS). That is, KERNEL_DS is always the segment used when __get_user() is called.

P.

I'm trying to test it myself, but maybe you have an ia64 machine you can try it on:

    -    if (access_ok(VERIFY_WRITE, addr, KERNEL_DS)) {
    +    if (access_ok(VERIFY_READ, addr, 4)) {

Does that work for you?

I'm sure that will work, but it doesn't matter what you set the segment value to -- it will always get set to KERNEL_DS.

P.

Thanks, confirmed: access_ok(...., {1, 0, 4}) all work as advertised. So systemtap must be missing an access_ok() check where it is needed.

Yes, that's what I would think. The code is pretty explicit about stating the requirement of access_ok() when using __get_user().

P.

Thank you, Prarit. Frank, I tested that: access_ok() works if the address is guaranteed to be a user address. But I think this can't apply to systemtap, because it uses code similar to __get_user() for accessing kernel addresses (kread). Anyway, that is a systemtap bug, not a kernel bug.

There is still a kernel issue, in that systemtap would like to have some mechanism to dereference arbitrary kernel addresses, with exception-style page fault catching.
Something like the probe_kernel_* routines in recent kernels could do the trick. Prarit, do you happen to know of something already in RHEL 4.7 to satisfy that need?

Nothing that I know of -- you might want to ping vgoyal, as he might have a better idea.

P.

Prarit, I don't understand the IA64 code, but here are my general thoughts/queries.

- access_ok() just verifies that you are accessing a user-space address (at least on x86). So if I try to access a kernel address and pass it to access_ok(), it should say that you must not access this address. I think that is what averts the problem here: we are trying to access a nonexistent kernel address, but access_ok() says no.
- But that does not remove the problem that, if a module tries to access a nonexistent kernel address using __get_user(), then either the kernel should crash or __get_user() should return -EFAULT. In this case the page fault handler hangs, so it does sound like a bug. Hanging is not the solution: either crash, or let the fixup code handle it.
- Frank mentioned that on x86_64, __get_user_xx() also allows poking at kernel addresses and returns -EFAULT. Maybe we can try to emulate the same behavior on ia64. I am not aware whether any of the functions allow that on ia64.

So I think this sounds like a bug and should not be closed as NOTABUG. The page fault handler is definitely misbehaving.
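The address-space argument in comment #5 rests on the IA-64 address layout: bits 63..61 of a virtual address select one of eight regions, and the remaining 61 bits are the offset within that region. The user-space C sketch below spells out that arithmetic for the address used in the test module. The helper names are hypothetical, and the 0xa00003ffffffffff upper bound is taken from comment #5 as an assumption, not read from the RHEL 4 source.

```c
#include <stdint.h>

/* IA-64: bits 63..61 of a virtual address are the virtual region number. */
static unsigned region_number(uint64_t addr)
{
    return (unsigned)(addr >> 61);
}

/* The low 61 bits are the offset within the selected region. */
static uint64_t region_offset(uint64_t addr)
{
    return addr & ((UINT64_C(1) << 61) - 1);
}

/* Assumption from comment #5: nothing is mapped above this address in
 * region 5. The real limit depends on the kernel's page size and
 * page-table depth. */
#define REGION5_MAPPED_END UINT64_C(0xa00003ffffffffff)

/* Hypothetical check: is addr in region 5 but past the mapped area? */
static int beyond_mapped_region5(uint64_t addr)
{
    return region_number(addr) == 5 && addr > REGION5_MAPPED_END;
}
```

For 0xafffffffffffffff the top three bits are 101 (region 5), and the address lies far beyond the quoted bound, which is consistent with the guess that no page can ever be found for it.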
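vgoyal's first point describes access_ok() as a range check that rejects anything outside user space. A minimal model of that idea in user-space C is sketched below. It assumes a hypothetical flat user limit (HYPOTHETICAL_USER_LIMIT, a made-up constant) and is not the RHEL 4 ia64 implementation, whose check and limits are architecture-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user/kernel split: the whole user range must end at or
 * below this address. Illustrative value only. */
#define HYPOTHETICAL_USER_LIMIT UINT64_C(0x00007ffffffff000)

/* Model of an access_ok()-style check: accept [addr, addr + size) only
 * if it ends at or below the user limit. Comparing against
 * limit - size avoids computing addr + size, which could wrap around. */
static bool model_access_ok(uint64_t addr, uint64_t size)
{
    if (size > HYPOTHETICAL_USER_LIMIT)
        return false;
    return addr <= HYPOTHETICAL_USER_LIMIT - size;
}
```

Under this model the module's 0xafffffffffffffff pointer is rejected before any dereference happens, which matches the observation that the corrected module always fails the access_ok() check.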