Bug 79573

Summary: system lockup with message "ENOMEM in do_get_write_access retrying"
Product: [Retired] Red Hat Linux
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Need Real Name <dwtrusty>
Assignee: Stephen Tweedie <sct>
QA Contact: Brian Brock <bbrock>
CC: alan, ngaywood
Doc Type: Bug Fix
Last Closed: 2004-01-05 20:22:01 UTC

Description Need Real Name 2002-12-13 16:31:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; HP-UX B.11.11 9000/785)

Description of problem:
System locked up when using 'rdist' to copy many files onto an ext3 file system.

Neither the console login nor ssh login responded.  We had to power the system
off to restart it.

Additional messages seen were:

ENOMEM in new_handle, retrying.
ENOMEM in journal_get_undo_access_Rsmp_df5dec49, retrying.

The system is running kernel 2.4.18-18.7.xbigmem.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Use the rdist utility to send a large number of files to the machine with the
new kernel.

Actual Results:  Machine locks up

Expected Results:  Normal operation

Additional info:

I ran a 'cat /proc/meminfo' shortly before the lockup and these were the
results:

        total:    used:    free:  shared: buffers:  cached:
Mem:  8132640768 8126660608  5980160        0   892928 7294271488
Swap: 2097434624   737280 2096697344
MemTotal:      7942032 kB
MemFree:          5840 kB
MemShared:           0 kB
Buffers:           872 kB
Cached:        7123240 kB
SwapCached:         72 kB
Active:        3710784 kB
Inact_dirty:   3157560 kB
Inact_clean:    267020 kB
Inact_target:  1427072 kB
HighTotal:     7143360 kB
HighFree:         1244 kB
LowTotal:       798672 kB
LowFree:          4596 kB
SwapTotal:     2048276 kB
SwapFree:      2047556 kB
Committed_AS:    22452 kB
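
For readers who want to compare the free and total figures in the output above, the following is a minimal sketch that parses a few of the "Name: value kB" lines and reports the free fraction per zone. The helper names (parse_meminfo, free_fraction) and the SAMPLE string are illustrative assumptions, not part of the original report; the numbers are copied from the output above. It makes no claim about the cause of the lockup.

```python
import re

# Subset of the /proc/meminfo lines quoted in the report (values in kB).
SAMPLE = """\
MemTotal:      7942032 kB
MemFree:          5840 kB
HighTotal:     7143360 kB
HighFree:         1244 kB
LowTotal:       798672 kB
LowFree:          4596 kB
"""


def parse_meminfo(text):
    """Return a dict mapping field name to its value in kB.

    Only lines of the form 'Name: <number> kB' are parsed; the older
    'total: used: free:' summary rows are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+) kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def free_fraction(fields, zone):
    """Fraction of a zone that is free; zone is 'Mem', 'High' or 'Low'."""
    return fields[zone + "Free"] / fields[zone + "Total"]


info = parse_meminfo(SAMPLE)
for zone in ("Mem", "High", "Low"):
    print(f"{zone}: {free_fraction(info, zone):.2%} free")
```

On a live system the same function could be fed the contents of /proc/meminfo instead of SAMPLE, assuming the kernel still reports the HighTotal/LowTotal fields.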

Comment 1 Stephen Tweedie 2002-12-13 17:06:36 UTC
The "ENOMEM ..., retrying" messages are an indication that ext3 is experiencing
temporary memory allocation pressure, but they do happen under very high load
and are not a fault in themselves.  ext3 should continue quite happily under
those conditions (and indeed it does so under testing.)

So we need far more information to work out where the real lockup is --- the
presence of these messages does not in any way tell us that the lockup is due to
ext3.
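
To illustrate the behaviour described in this comment (an allocation fails under memory pressure, a message is logged, and the caller tries again rather than failing), here is a user-space sketch of a retry-on-ENOMEM loop. It is an analogy only, not the kernel's ext3 code; alloc_with_retry, its parameters, and the flaky test allocator are hypothetical names introduced for the example. The message format mirrors the console messages quoted in the description.

```python
import time


def alloc_with_retry(allocate, where, max_attempts=None, log=print):
    """Call allocate() until it succeeds, logging each out-of-memory failure.

    allocate:     zero-argument callable that raises MemoryError on failure.
    where:        label used in the log message (for example a function name).
    max_attempts: optional cap on failures; None means retry indefinitely.
    """
    failures = 0
    while True:
        try:
            return allocate()
        except MemoryError:
            failures += 1
            log(f"ENOMEM in {where}, retrying.")
            if max_attempts is not None and failures >= max_attempts:
                raise
            time.sleep(0)  # give other threads a chance to run before retrying


# Example: an allocator that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise MemoryError
    return "handle"


messages = []
result = alloc_with_retry(flaky, "new_handle", log=messages.append)
```

In this pattern the log line alone only shows that an allocation had to be retried; whether the system makes progress depends on memory eventually becoming available.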

Comment 2 Norman Gaywood 2002-12-14 02:05:49 UTC
Is this the same as bug# 79257?

The message "ENOMEM in do_get_write_access retrying" occurs in the 79257 case if
the copy is left long enough.


Comment 3 Need Real Name 2002-12-16 18:42:29 UTC
It looks very similar to 79257.  It is likely to be caused by the same
problem.



Comment 4 Alan Cox 2003-06-08 15:30:48 UTC
Is this still occurring with 2.4.20-based errata?