From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
When using kernel 2.4.21-6.ELhugemem or earlier for that matter, simply running "cat /dev/kmem > /dev/null" will panic the machine. This is not the case for enterprise kernels (in AS 2.1) or smp kernels in RHEL3. Other applications which appear to trigger this bug are openafs' kdump and some veritas utilities.

Version-Release number of selected component (if applicable):
2.4.21-6.ELhugemem

How reproducible:
Always

Steps to Reproduce:
1. cat /dev/kmem > /dev/null

Actual Results:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
0215e9ea
*pde = 00003001
*pte = 00000000
hang.

Expected Results:
cat: /dev/kmem: read error [Bad address]
(this is what the RHAS 2.1 enterprise kernels do, for example)

Additional info:
Reading kmem randomly can have ALL sorts of bad side effects and is severely discouraged!
Sure, but simply reading from /dev/kmem shouldn't be able to panic the machine. No other OS I'm aware of has such a problem.
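For reference, what cat does in the reproduction step is roughly a plain open/read loop. The sketch below is only an illustration: the helper name drain_file is made up for this comment and is not taken from cat or from any kernel source. On a kernel that behaves as described under Expected Results, one would expect it to return EFAULT ("Bad address") for /dev/kmem rather than take the machine down.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` sequentially until EOF or the first
 * error, discarding the data (like redirecting cat to /dev/null).
 * Returns 0 on EOF, or the errno value of the failing open()/read().
 */
static int drain_file(const char *path)
{
    char buf[4096];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        ; /* nothing to do with the bytes, just keep reading */

    /* Capture errno before close() has a chance to overwrite it. */
    int err = (n < 0) ? errno : 0;
    close(fd);
    return err;
}
```

Calling drain_file("/dev/kmem") as root is the same operation as the reproduction step, so it should only be tried on a machine that is allowed to crash.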
On IA64 it will certainly do so (you should expect a bunch of machine checks if you do this on several chipsets), and some x86 chipsets will too. The easiest fix sounds like just removing /dev/kmem; nothing uses it anyway (unlike /dev/mem).
klogd (at least on inspection of strings) and openafs' kdump are at least a couple things that do use /dev/kmem.
klogd only does so for pre-2.0 kernels (before current modutils). I can't imagine what kdump expects to find in /dev/kmem really... there's not a lot of useful stuff in /dev/kmem at all.
It looks like Veritas (vxvm) may also have some probing of kmem, as we were experiencing similar panics with the hugemem kernel, but none under the smp one.
Kernel modules don't use /dev/kmem! The hugemem kernel is different in another respect though: it REQUIRES that kernel modules follow the proper copy_from_user() API, while normal kernels sort of kinda usually work even when the API isn't followed. The crashes will look similar, yes...
I know that kernel modules don't use /dev/kmem... machines would panic when the veritas user-level processes needed to create and manage volumes were run.
That sounds more like a missing copy_from_user in the veritas code than use of /dev/kmem. I can't imagine communicating with kernel space via /dev/kmem; that's what ioctls and such are for, and those use copy_from_user(). It's actually a reasonably common bug in vendor code to forget to use this API; some other OSes don't need it, and it mostly happens to work in non-hugemem kernels (unless you happen to be under vm pressure).
saias83 /ms/user/h/hagberg 4# cat /dev/kmem > /dev/null
cat: /dev/kmem: Bad address
saias83 /ms/user/h/hagberg 5# uname -r
2.4.21-4.EL
saias83 /ms/user/h/hagberg 6# arch
ia64
/dev/mem also suffers a similar fate - "cat /dev/mem > /dev/null" causes a hugemem-running machine to hang hard (no ping, no nothing) w/o any panic or oops logged. I don't see why you wouldn't want to put bounds checking on regions or parts of /dev/kmem that shouldn't be accessed or that will cause machine panic. w/o such a fence around the kernel problem, the offending program(s) can't really be debugged properly.
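To make the request concrete, here is a userspace model of the kind of range check being asked for. It is a sketch under assumptions, not the kernel's code and not the eventual fix: struct kmem_region, kmem_clamp_read and the idea of a fixed table of readable windows are all hypothetical names and simplifications. The point is only that a read starting outside every known-good window fails with EFAULT instead of being attempted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one readable window of kernel virtual memory. */
struct kmem_region {
    uintptr_t start; /* first valid address */
    uintptr_t end;   /* one past the last valid address */
};

/*
 * Hypothetical check: decide how much of a read of `count` bytes starting
 * at `pos` may be served. Returns the number of bytes that fit inside the
 * region containing `pos`, or -EFAULT if `pos` lies outside every region.
 */
static long kmem_clamp_read(const struct kmem_region *regions, size_t nregions,
                            uintptr_t pos, size_t count)
{
    for (size_t i = 0; i < nregions; i++) {
        if (pos >= regions[i].start && pos < regions[i].end) {
            /* Never read past the end of the region that holds `pos`. */
            uintptr_t avail = regions[i].end - pos;
            return (long)(count < avail ? count : avail);
        }
    }
    return -EFAULT; /* would show up to cat as "Bad address" */
}
```

With a check like this in the read path, a program that walks off the end of a valid window would get a short read followed by an error, which is something that can actually be debugged.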
And crash opens /dev/kmem, so it isn't quite true to say that nothing uses it. In fact there's an interesting note in there:

/*
 * On 32-bit architectures w/memory above ~936MB,
 * that memory can only be accessed via vmalloc'd
 * addresses. However, /dev/mem returns 0 bytes,
 * and non-reserved memory pages can't be mmap'd, so
 * the only alternative is to read it from /dev/kmem.
 */
Any update on this?
I'll investigate this for RHEL 3 U3.
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.3.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html