Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 65218

Summary:

Oops in pte_chain_alloc

Product:

[Retired] Red Hat Linux

Reporter:

Michael Chapman <redhat-bugzilla>

Component:

kdpms

Assignee:

Dave Jones <davej>

Status:

CLOSED WONTFIX

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

medium

Version:

7.2

CC:

pfrields, sysadmin

Target Milestone:

---

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2004-11-27 23:03:17 UTC

Attachments:

Description	Flags
Oops in pte_chain_alloc (processed by ksymoops)	none
dmesg output (from boot immediately after the oops)	none
lspci output	none
lspci output	none
cat /proc/cpuinfo	none
Fragment memory	none

Description Michael Chapman 2002-05-20 14:10:39 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
I am using kernel-2.4.18-4 (compiled for i686-redhat-linux, as reported by
"rpm -q kernel --qf '%{PLATFORM}\n'").

After a variable uptime (anything from a few minutes to many hours), the kernel
oopses in the pte_chain_alloc function (in the mm/rmap.c file). There does not
seem to be anything specific causing this -- it tends to simply occur without
warning (usually while the computer is actively being used locally, rather than
remotely).

The oopses seem to occur at line 356 in rmap.c. It appears that the
pte_chain_freelist pointer is being set to some bogus value (often, but not
always, a nice round number such as 0x54000000 or 0x14000000).
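
To illustrate why a bogus freelist head leads to a fault on allocation, here is a minimal user-space sketch of a singly linked free list. It is not the code from mm/rmap.c: the names `struct chain`, `pool`, `chain_alloc`, `chain_free` and `in_pool` are hypothetical, and the range check is added only so the example can report a bad head instead of crashing. The point is that popping an entry has to read `head->next`, so a head such as 0x54000000 is dereferenced immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain entry; not the kernel's definition. */
struct chain {
    struct chain *next;
    void *payload;
};

#define POOL_SIZE 8
static struct chain pool[POOL_SIZE];
static struct chain *freelist;

/* Thread every pool entry onto the free list. */
void freelist_init(void)
{
    freelist = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].payload = NULL;
        pool[i].next = freelist;
        freelist = &pool[i];
    }
}

/* Return 1 if p points exactly at an element of pool, else 0. */
int in_pool(const struct chain *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + POOL_SIZE);
    return a >= lo && a < hi && (a - lo) % sizeof(struct chain) == 0;
}

/* Pop the head entry. Reading c->next is the step that would fault
 * if the head had been overwritten with a garbage address, so this
 * sketch validates the head first and returns NULL instead. */
struct chain *chain_alloc(void)
{
    struct chain *c = freelist;
    if (c == NULL || !in_pool(c))
        return NULL;
    freelist = c->next;
    c->next = NULL;
    return c;
}

/* Push an entry back onto the free list. */
void chain_free(struct chain *c)
{
    assert(in_pool(c));
    c->next = freelist;
    freelist = c;
}
```

Without the `in_pool` check, `chain_alloc` would follow the corrupted head directly, which is consistent with an oops that always lands on the same line.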

I'll attach a sample oops (passed through ksymoops), as well as output from
dmesg, lspci, and "cat /proc/cpuinfo".


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Boot with 2.4.18-4 kernel
2. Do stuff -- oops usually occurs within a few hours of operation

Actual Results:  Kernel oopses in pte_chain_alloc

Expected Results:  Kernel shouldn't oops :-)

Additional info:

Comment 1 Michael Chapman 2002-05-20 14:12:14 UTC

Created attachment 58009
Oops in pte_chain_alloc (processed by ksymoops)

Comment 2 Michael Chapman 2002-05-20 14:13:30 UTC

Created attachment 58010
dmesg output (from boot immediately after the oops)

Comment 3 Michael Chapman 2002-05-20 14:14:09 UTC

Created attachment 58011
lspci output

Comment 4 Michael Chapman 2002-05-20 14:15:20 UTC

Created attachment 58013
cat /proc/cpuinfo

Comment 5 Michael Chapman 2002-05-29 16:04:12 UTC

I've tried narrowing down this problem a bit further.

Firstly, I've run memtest86 for over 12 hours and it hasn't reported a single
problem. Secondly, I've run cpuburn for about an hour and a half, and it didn't
report any problems either.

I also wrote a small program, crash.c (I'll attach this), which deliberately
tries to fragment memory as much as possible, hopefully causing the kernel to
oops. It seems to work pretty well -- using this program I am able to oops the
2.4.18-4 kernel even while I'm in runlevel 1 with no other programs running and
no modules loaded except ext3 and jbd, with swap either turned on or turned
off. The oopses are always on the same line in rmap.c.

I have also compiled kernel 2.4.16. I used Red Hat's .config file for my
architecture so that the configuration was as close to the 2.4.18-4
configuration as possible. With this kernel, I can't get crash.c to oops the
kernel. However, in the two days I've been running it, X has crashed once. This
could be due to something completely different -- I'll try to narrow it down
further.

I really hope to get the 2.4.18 kernel working soon -- in particular, it's got a
later version of the drm modules which X 4.2.0 needs to give me hardware
acceleration.
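
For readers without access to the attachment, the following is a rough sketch of one way a program might fragment its address space. It is an assumption about the general technique, not the contents of crash.c: the function names `fragment_pages` and `release_pages` are hypothetical, and the approach (map many single-page anonymous regions, touch each one, then unmap every other region) is just one plausible way to leave many small, scattered mappings behind.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map n single-page anonymous regions, touch each so it is actually
 * backed by memory, then unmap every odd-numbered region to leave
 * holes. Surviving mappings stay in slots[]; unmapped or failed
 * entries are set to NULL. Returns the number of pages still mapped,
 * or -1 if the page size cannot be determined. */
long fragment_pages(void **slots, size_t n)
{
    long page = sysconf(_SC_PAGESIZE);
    long kept = 0;

    if (page <= 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            slots[i] = NULL;
            continue;
        }
        *(volatile char *)p = 1;   /* fault the page in */
        slots[i] = p;
    }

    for (size_t i = 0; i < n; i++) {
        if (slots[i] == NULL)
            continue;
        if (i % 2 == 1) {
            munmap(slots[i], (size_t)page);
            slots[i] = NULL;
        } else {
            kept++;
        }
    }
    return kept;
}

/* Unmap whatever fragment_pages left behind. */
void release_pages(void **slots, size_t n)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return;
    for (size_t i = 0; i < n; i++) {
        if (slots[i] != NULL) {
            munmap(slots[i], (size_t)page);
            slots[i] = NULL;
        }
    }
}
```

A stress test along these lines would call `fragment_pages` repeatedly with a large `n`, possibly from several processes; the sketch keeps the count caller-controlled so it can be run safely with small values.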

Comment 6 Michael Chapman 2002-05-29 16:06:29 UTC

Created attachment 58845
Fragment memory

Comment 7 Michael Chapman 2002-06-04 08:57:59 UTC

I can confirm the problem is specifically with Rik van Riel's reverse-mapping
patch. I have tested both the standard 2.4.18 kernel, and the same kernel with
the latest applicable rmap patch (rmap12h). With the standard kernel there were
no problems; with the patched kernel I am able to crash it very quickly (on
exactly the same line in rmap.c).

Since this is not really a Red Hat bug, I guess I ought to bring this to the
kernel mailing list.

Comment 8 Michael K. Johnson 2002-06-14 20:49:55 UTC

I ran this program for about two weeks on a 2GB 2x1GHz Xeon Dell server
without seeing this problem, so I think there is some sort of hardware
dependency here.