67540 – Memory management (paging) panic on big-memory SMP system using pthreads

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-06-26 23:39 UTC by Need Real Name
Modified:	2008-08-01 16:22 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:39:42 UTC
Embargoed:

Attachments
Top output at the time of the crash. This may be of some limited help. (3.81 KB, text/plain) 2002-06-26 23:42 UTC, Need Real Name

Description Need Real Name 2002-06-26 23:39:15 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.9-31 i686)

Description of problem:
This problem occurs when running a proprietary application. The application is
multithreaded and employs the Berkeley DB (www.sleepycat.com). The problem
happens when the application allocates 1.3 gigabytes of memory (128 megabytes of
which is Berkeley DB cache), then proceeds to do heavy database inserts with
some database reads. Without fail, at a particular point in the test, the kernel
panics. At that time, heavy memory copies and a lot of disk I/O are being
performed.

The program is multithreaded, but only one thread is writing to the database.
The database is approximately 600 megabytes in size.
Records in the database tend to be very large, on the order of 100K to 1
megabyte or more, but some are as small as 16 bytes.

System configuration is:

2 Pentium III 1GHz CPUs
4 Gigabytes of RAM
2 160 Gigabyte SCSI disks mirrored together (Linux software RAID)
Red Hat 7.1 or 7.3

The application alternately does massive memcopies for a long while, then writes
out changed memory buffers to the Berkeley DB database.

The panic screen looks like this:

CPU: 1
EIP: 0010:[<C0139B96>]  Not tainted
EFLAGS: 00010282

EIP is at rmqueue [kernel] 0x246 (2.4.18-4smp)
Call Trace: __alloc_pages 0x72
do_no_page 0xa1
handle_mm_fault 0xd4
page_table 0xc1
do_page_fault 0x12d
sys_sysctl 0x96
do_page_fault 0x0
error_code 0x34

Code: 0f 0b 5f 5d 8d b6 00 00 00 00 8b 43 18 99 80 00 00 00 74 13

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Write a pthread-enabled C program using Berkeley DB. It should allocate 1 GB
    of memory for reading/writing database entries and 128 MB for the Berkeley
    DB cache.
2. Get a dual-CPU Pentium III box with 4 GB RAM and run the test:
3. Do massive memcopy operations between buffers for 10 minutes or so.
4. Stop the copies and write all changed pages to the database.
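
The steps above can be sketched roughly as follows. This is a minimal,
hypothetical stand-in and not the proprietary application: Berkeley DB is
replaced by a plain file written with fwrite(), and the names run_workload,
copy_worker, struct copy_job, NUM_THREADS and REPRO_STANDALONE are assumptions
made for illustration. Buffer sizes are scaled far below the 1 GB in step 1 so
the sketch is safe to run; it is not known whether this reduced workload would
trigger the panic.

```c
/* Hypothetical stand-in for the workload in the steps above; not the
 * reporter's code. Assumption: Berkeley DB is replaced by a plain file. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_THREADS 2   /* assumption: one copier thread per CPU */

struct copy_job {
    unsigned char *src;
    unsigned char *dst;
    size_t len;
    int rounds;
};

/* Step 3: copy src to dst repeatedly, dirtying src before each copy. */
static void *copy_worker(void *arg)
{
    struct copy_job *job = arg;
    int r;

    assert(job->len > 0);
    for (r = 0; r < job->rounds; r++) {
        job->src[0] = (unsigned char)(r + 1);
        memcpy(job->dst, job->src, job->len);
    }
    return NULL;
}

/* Runs the copy phase on NUM_THREADS threads, then writes every destination
 * buffer to out_path from the calling thread only. Returns 0 on success. */
int run_workload(size_t bytes_per_thread, int rounds, const char *out_path)
{
    pthread_t tids[NUM_THREADS];
    struct copy_job jobs[NUM_THREADS];
    FILE *out;
    int i, started = 0, rc = -1;

    if (bytes_per_thread == 0)
        return -1;
    memset(jobs, 0, sizeof jobs);

    for (i = 0; i < NUM_THREADS; i++) {
        jobs[i].src = malloc(bytes_per_thread);
        jobs[i].dst = malloc(bytes_per_thread);
        if (jobs[i].src == NULL || jobs[i].dst == NULL)
            goto cleanup;
        /* Touch every page so the memory is really backed, not just reserved. */
        memset(jobs[i].src, i, bytes_per_thread);
        memset(jobs[i].dst, 0, bytes_per_thread);
        jobs[i].len = bytes_per_thread;
        jobs[i].rounds = rounds;
    }

    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, copy_worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* Step 4: a single writer flushes the changed buffers. */
    if (started == NUM_THREADS && (out = fopen(out_path, "wb")) != NULL) {
        rc = 0;
        for (i = 0; i < NUM_THREADS; i++)
            if (fwrite(jobs[i].dst, 1, jobs[i].len, out) != jobs[i].len)
                rc = -1;
        if (fclose(out) != 0)
            rc = -1;
    }

cleanup:
    for (i = 0; i < NUM_THREADS; i++) {
        free(jobs[i].src);
        free(jobs[i].dst);
    }
    return rc;
}

#ifdef REPRO_STANDALONE
int main(void)
{
    /* 64 MB per buffer keeps the sketch safe to run; step 1 calls for 1 GB. */
    return run_workload((size_t)64 << 20, 100, "repro.out") == 0 ? 0 : 1;
}
#endif
```

Assuming the file is saved as repro.c (a hypothetical name), it can be built
with something like: cc -pthread -DREPRO_STANDALONE repro.c -o repro. To get
closer to the reported conditions, the buffer size and round count passed to
run_workload would need to be raised toward the figures in steps 1 and 3.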
	

Actual Results:  System halted with panic message previously described.

Expected Results:  System should not have crashed.

Additional info:

We are anxious to solve this problem ASAP. It makes Red Hat Linux unusable. It
may be hard to reproduce on other systems, so we will gladly help with that
should you need our assistance. We can make it happen every time here.

Comment 1 Need Real Name 2002-06-26 23:42:36 UTC

Created attachment 62785
Top output at the time of the crash. This may be of some limited help.

Comment 2 Bugzilla owner 2004-09-30 15:39:42 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
