Bug 65218 - Oops in pte_chain_alloc
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kdpms
Version: 7.2
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2002-05-20 10:10 EDT by Michael Chapman
Modified: 2015-01-04 17:01 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-11-27 18:03:17 EST

Attachments
Oops in pte_chain_alloc (processed by ksymoops) (4.11 KB, text/plain)
2002-05-20 10:12 EDT, Michael Chapman
dmesg output (from boot immediately after the oops) (10.50 KB, text/plain)
2002-05-20 10:13 EDT, Michael Chapman
lspci output (830 bytes, text/plain)
2002-05-20 10:14 EDT, Michael Chapman
lspci output (830 bytes, text/plain)
2002-05-20 10:14 EDT, Michael Chapman
cat /proc/cpuinfo (372 bytes, text/plain)
2002-05-20 10:15 EDT, Michael Chapman
Fragment memory (571 bytes, text/plain)
2002-05-29 12:06 EDT, Michael Chapman

Description Michael Chapman 2002-05-20 10:10:39 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
I am using kernel-2.4.18-4 (compiled for i686-redhat-linux, as reported by "rpm
-q kernel --qf '%{PLATFORM}\n'" ).

After a variable uptime (anything from a few minutes to many hours), the kernel
oopses in the pte_chain_alloc function (in the mm/rmap.c file). There does not
seem to be anything specific causing this -- it tends to simply occur without
warning (usually while the computer is actively being used locally, rather than
remotely).

The oopses seem to occur at line 356 in rmap.c. It appears that the
pte_chain_freelist pointer is being set to some bogus value (often, but not
always, a nice round number such as 0x54000000 or 0x14000000).
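
For readers unfamiliar with freelist allocators, here is a minimal user-space sketch of why a bogus freelist head would fault at the point of allocation. It is not the code from mm/rmap.c: the names chain_node, chain_alloc, chain_free and chain_ptr_plausible are hypothetical, and the only thing borrowed from the report is the idea of a global freelist pointer that gets dereferenced when a node is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pte_chain-like node; not the kernel's struct. */
struct chain_node {
    struct chain_node *next;
    void *payload;
};

/* Head of the free list (plays the role of pte_chain_freelist in the report). */
static struct chain_node *freelist;

/* Return a node to the free list by pushing it onto the head. */
static void chain_free(struct chain_node *node)
{
    node->next = freelist;
    freelist = node;
}

/*
 * Take the first node off the free list. The read of node->next is the
 * step that would fault if the head held a garbage address rather than
 * NULL or a real node: a NULL check alone cannot catch that case.
 */
static struct chain_node *chain_alloc(void)
{
    struct chain_node *node = freelist;

    if (node == NULL)
        return NULL;
    freelist = node->next;
    node->next = NULL;
    return node;
}

/*
 * Illustrative debugging aid (an assumption, not something the kernel does):
 * report whether ptr lies inside the pool and on a node boundary.
 */
static int chain_ptr_plausible(const struct chain_node *pool, size_t count,
                               const void *ptr)
{
    uintptr_t base = (uintptr_t)pool;
    uintptr_t p = (uintptr_t)ptr;

    if (p < base || p >= base + count * sizeof(*pool))
        return 0;
    return (p - base) % sizeof(*pool) == 0;
}
```

In this sketch, a corrupted head would pass the NULL check and crash on the next line, which matches the general shape of the symptom described above. A check like chain_ptr_plausible() could in principle be called before the dereference to catch the corruption earlier, but that is only a debugging idea, not a fix.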

I'll attach a sample oops (passed through ksymoops), as well as output from
dmesg, lspci, and "cat /proc/cpuinfo".


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Boot with 2.4.18-4 kernel
2. Do stuff -- oops usually occurs within a few hours of operation

Actual Results:  Kernel oopses in pte_chain_alloc

Expected Results:  Kernel shouldn't oops :-)

Additional info:
Comment 1 Michael Chapman 2002-05-20 10:12:14 EDT
Created attachment 58009 [details]
Oops in pte_chain_alloc (processed by ksymoops)
Comment 2 Michael Chapman 2002-05-20 10:13:30 EDT
Created attachment 58010 [details]
dmesg output (from boot immediately after the oops)
Comment 3 Michael Chapman 2002-05-20 10:14:09 EDT
Created attachment 58011 [details]
lspci output
Comment 4 Michael Chapman 2002-05-20 10:15:20 EDT
Created attachment 58013 [details]
cat /proc/cpuinfo
Comment 5 Michael Chapman 2002-05-29 12:04:12 EDT
I've tried narrowing down this problem a bit further.

Firstly, I've run memtest86 for over 12 hours and it hasn't reported a single
problem. Secondly, I've run cpuburn for about an hour and a half, and it didn't
report any problems either.

I also wrote a small program, crash.c (I'll attach this), which deliberately
tries to fragment memory as much as possible, in the hope of making the kernel
oops. It seems to work pretty well -- using this program I am able to oops the
2.4.18-4 kernel even in runlevel 1 with no other programs running and no
modules loaded except ext3 and jbd, both with swap turned on and with swap
turned off. The oopses are always on the same line in rmap.c.
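
The attached crash.c is not reproduced in this report. As a rough illustration only, one way a program might fragment its address space is to map a run of pages, touch each one, and then unmap every other page. The sketch below assumes that approach; map_and_punch and unmap_remaining are hypothetical names, and nothing here is taken from the attachment.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from crash.c: map npages anonymous pages,
 * write to each one so it is actually faulted in, then unmap every
 * odd-numbered page. Returns the base address, or NULL on failure.
 */
static unsigned char *map_and_punch(size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || npages == 0)
        return NULL;
    size_t pagesz = (size_t)ps;

    void *p = mmap(NULL, npages * pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    unsigned char *base = p;

    /* Touch every page so each one gets a real page-table entry. */
    for (size_t i = 0; i < npages; i++)
        base[i * pagesz] = (unsigned char)(i & 0xff);

    /* Punch holes: drop every odd page, leaving isolated single-page mappings. */
    for (size_t i = 1; i < npages; i += 2) {
        int rc = munmap(base + i * pagesz, pagesz);
        assert(rc == 0);
        (void)rc;
    }
    return base;
}

/* Unmap the even-numbered pages that map_and_punch() left behind. */
static void unmap_remaining(unsigned char *base, size_t npages)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t i = 0; i < npages; i += 2)
        munmap(base + i * pagesz, pagesz);
}
```

A stress loop would call map_and_punch() repeatedly with a large page count, so that page mappings are constantly created and torn down. Whether this particular pattern is enough to trigger the oops described here is untested.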

I have also compiled kernel 2.4.16. I used Red Hat's .config file for my
architecture so that the configuration was as close to the 2.4.18-4
configuration as possible. With this kernel, I can't get crash.c to oops the
kernel. However, in the two days I've been running it, X has crashed once. This
could be due to something completely different -- I'll try to narrow it down
further.

I really hope to get the 2.4.18 kernel working soon -- in particular, it's got a
later version of the drm modules which X 4.2.0 needs to give me hardware
acceleration.
Comment 6 Michael Chapman 2002-05-29 12:06:29 EDT
Created attachment 58845 [details]
Fragment memory
Comment 7 Michael Chapman 2002-06-04 04:57:59 EDT
I can confirm the problem is specifically with Rik van Riel's reverse-mapping
patch. I have tested both the standard 2.4.18 kernel, and the same kernel with
the latest applicable rmap patch (rmap12h). With the standard kernel there were
no problems; with the patched kernel I am able to crash it very quickly (on
exactly the same line in rmap.c).

Since this is not really a Red Hat bug, I guess I ought to bring this to the
kernel mailing list.
Comment 8 Michael K. Johnson 2002-06-14 16:49:55 EDT
I ran this program for about two weeks on a Dell server with two 1GHz Xeon
CPUs and 2GB of RAM without seeing this problem, so I think there is some
sort of hardware dependency here.