Bug 217628

Summary: Memory corruption when reading /proc/kcore
Product: Red Hat Enterprise Linux 2.1
Reporter: Don Howard <dhoward>
Component: kernel
Assignee: Don Howard <dhoward>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 2.1
Target Milestone: ---
Target Release: ---
Hardware: ia64
OS: Linux
URL: http://marc.theaimsgroup.com/?t=110739734900006&r=1&w=2
Whiteboard:
Fixed In Version: RHSA-2007-0012
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-01-17 10:51:59 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Don Howard 2006-11-29 00:30:15 UTC
+++ This bug was initially created as a clone of Bug #147666 +++

Description of problem:
Possible memory corruption when /proc/kcore is read


Version-Release number of selected component (if applicable):
2.4.9-e.57


How reproducible:
dd if=/proc/kcore of=/tmp/kcore bs=4k count=10
(if necessary, repeat a few times)

Steps to Reproduce:
see above
  
Actual results:
Various; usually the machine freezes after some /proc/kcore reads.

Expected results:
No problems; /proc/kcore is read correctly.

Additional info:
The problem is that the size of the kcore header is calculated incorrectly if
there are lots of VMAs. The reason is that the size of the data fields in the
ELF notes is not accounted for in get_kcore_size() (fs/proc/kcore.c).
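
To illustrate the kind of undercount described above, here is a minimal
sketch in C. It is not the kernel code and not the posted patch. It only
shows the general ELF note layout, in which each note is a fixed header
followed by a name and a data payload, each padded to a 4-byte boundary.
The names note_hdr, note_desc, round_up4, note_size,
notes_size_headers_only and notes_size_full are hypothetical and chosen
for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size ELF note header: three 32-bit words. */
struct note_hdr {
    uint32_t n_namesz;  /* length of the name field in bytes */
    uint32_t n_descsz;  /* length of the data (descriptor) field in bytes */
    uint32_t n_type;    /* note type */
};

/* Hypothetical description of one note that will be written out. */
struct note_desc {
    size_t namesz;  /* bytes of name */
    size_t descsz;  /* bytes of data payload */
};

/* Round n up to the next multiple of 4, the padding used inside notes. */
static size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Full on-disk size of one note: header + padded name + padded data. */
static size_t note_size(const struct note_desc *n)
{
    return sizeof(struct note_hdr) + round_up4(n->namesz) + round_up4(n->descsz);
}

/* Undercounting estimate: only the fixed headers, payloads ignored. */
static size_t notes_size_headers_only(size_t count)
{
    return count * sizeof(struct note_hdr);
}

/* Correct estimate: sum of the full size of every note. */
static size_t notes_size_full(const struct note_desc *notes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += note_size(&notes[i]);
    return total;
}
```

A buffer sized with the headers-only estimate is too small for the notes
that are later written into it, which is one way a header size
miscalculation can end in memory corruption.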


RH's Ernie Petrides has posted a patch for this to LKML.
http://marc.theaimsgroup.com/?t=110739734900006&r=1&w=2

It was accepted by Marcelo into 2.4 mainline.
http://linux.bkbits.net:8080/linux-2.4/cset@42024081gb19vludDwvjkxZjV0NvPg?nav=index.html|src/|src/fs|src/fs/proc|related/fs/proc/kcore.c

In 2.6 the problem has been fixed for 1.5 years.
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.69/2.5.69-mm9/broken-out/proc-kcore-rework.patch

BUG 141394 contains references to this problem for RHEL3.

-- Additional comment from Martin.Wilck on 2005-02-10 04:05 EST --
According to Ernie, this was accepted into the RHEL-U3 patch set.
The patch is pretty small and can hardly break stuff, so it'd be nice to see it
in AS2.1 ASAP, too.

Comment 3 Mike Gahagan 2006-12-20 15:25:50 UTC
The system survives the reproducer using dd; however, on two occasions I have
killed the system with:

cat /proc/kcore > /dev/null



Comment 4 Mike Gahagan 2006-12-20 15:38:12 UTC
The failure seems to match the description of bz 213567. I can verify that the
changes that went into 213567 are in the e.64 kernel, so I suspect that something
else is going on here.


Comment 5 Don Howard 2006-12-20 17:13:35 UTC
The telltale for 213567 is that the cat process dies in read_kcore() when
trying to read unmapped vmalloc()ed memory.

Derry does not use vmalloc() in proc_file_read(), so there must be a different
reason for the crash you see. (It could be some other use of vmalloc())

Can you collect a vmcore?

Comment 6 Don Howard 2006-12-20 17:27:10 UTC
Also, cat /proc/kcore > /dev/null has the possibility of touching read-volatile
memory. In that case, a crash or hang *would* be expected.

Comment 7 Don Howard 2006-12-20 19:49:48 UTC
cat of /proc/kcore results in immediate hang, hardware alarm sounds, and machine
reboot on my local zx2000.  This is true of kernels e.58, e.60, and e.64.  This
is not the same issue that I found in 213567, nor the issue addressed in this BZ.  

I strongly suspect that this is due to the scenario mentioned above: reading of
random device registers.

Comment 8 Marcel Holtmann 2006-12-21 16:18:51 UTC
Can you please verify that the initial kernel we shipped would also hang on a
cat of /proc/kcore? If yes, then we need a separate bug report for it, and it has
nothing to do with the current errata.


Comment 9 Mike Gahagan 2007-01-02 21:42:53 UTC
Hi,

I just tried a 'cat /proc/kcore > /dev/null' using 2.4.18-e.12 (RHEL 2.1 for
ia64 GA kernel) and was able to hang the system. Unfortunately I have yet to get
a vmcore for any of these as it doesn't look like netdump works on itanium :(

I'll see about getting some serial console output.
 

Comment 10 Don Howard 2007-01-03 23:02:11 UTC
Hi Mike -

You are correct, 2.1 does not have netdump support on ia64.  

I've tested this some more today, and I see hangs under rhel3 on ia64 with this
too. I'm pretty certain that the hang you have encountered is not related to the
issue addressed here.



Comment 13 Red Hat Bugzilla 2007-01-17 10:52:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0012.html