From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; HP-UX B.11.11 9000/785)

Description of problem:
System locked up when using 'rdist' to copy many files onto an ext3 file system. Console login did not respond, as well as ssh login. We had to power the system off to restart it. Additional messages seen were:

    ENOMEM in new_handle, retrying.
    ENOMEM in journal_get_undo_access_Rsmp_df5dec49, retrying.

The system is running kernel 2.4.18-18.7.xbigmem.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Use the rdist utility to send a large number of files to the machine with the new kernel.
2.
3.

Actual Results:
Machine locks up

Expected Results:
Normal operation

Additional info:
I ran a 'cat /proc/meminfo' shortly before the lockup and these were the results:

            total:       used:       free:  shared:  buffers:     cached:
    Mem:   8132640768  8126660608     5980160        0    892928  7294271488
    Swap:  2097434624      737280  2096697344
    MemTotal:      7942032 kB
    MemFree:          5840 kB
    MemShared:           0 kB
    Buffers:           872 kB
    Cached:        7123240 kB
    SwapCached:         72 kB
    Active:        3710784 kB
    Inact_dirty:   3157560 kB
    Inact_clean:    267020 kB
    Inact_target:  1427072 kB
    HighTotal:     7143360 kB
    HighFree:         1244 kB
    LowTotal:       798672 kB
    LowFree:          4596 kB
    SwapTotal:     2048276 kB
    SwapFree:      2047556 kB
    Committed_AS:    22452 kB
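For anyone comparing dumps like the one above, here is a minimal sketch that pulls the "Name: value kB" fields out of /proc/meminfo text and reports how much of the low and high memory zones is free. The helper names `parse_meminfo` and `free_percent` are made up for illustration, and the sample only repeats figures from this report. It makes no claim about whether low free memory is related to the lockup.

```python
import re


def parse_meminfo(text):
    """Return {field_name: kilobytes} for every 'Name: <number> kB' entry."""
    fields = {}
    # The byte-valued 'Mem:' and 'Swap:' summary rows carry no 'kB' suffix,
    # so this pattern skips them.
    for match in re.finditer(r"(\w+):\s+(\d+) kB", text):
        fields[match.group(1)] = int(match.group(2))
    return fields


def free_percent(fields, zone):
    """Percentage of a zone ('Low', 'High', 'Mem' or 'Swap') that is free."""
    return 100.0 * fields[zone + "Free"] / fields[zone + "Total"]


# Figures copied from the report; a live system would read /proc/meminfo.
sample = """
MemTotal: 7942032 kB
MemFree: 5840 kB
HighTotal: 7143360 kB
HighFree: 1244 kB
LowTotal: 798672 kB
LowFree: 4596 kB
"""

fields = parse_meminfo(sample)
for zone in ("Low", "High"):
    print("%s zone: %d kB free, %.3f%% of total"
          % (zone, fields[zone + "Free"], free_percent(fields, zone)))
```

On a running 2.4 kernel the same functions could be fed `open("/proc/meminfo").read()` at intervals while the rdist copy runs, to see how the zones behave leading up to a hang.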
The "ENOMEM ..., retrying" messages are an indication that ext3 is experiencing temporary memory allocation pressure, but they do happen under very high load and are not a fault in themselves. ext3 should continue quite happily under those conditions (and indeed it does so under testing.) So we need far more information to work out where the real lockup is --- the presence of these messages does not in any way tell us that the lockup is due to ext3.
Is this the same as bug #79257? The message "ENOMEM in do_get_write_access retrying" occurs in the 79257 case if the copy is left running long enough.
It looks very similar to 79257. It is likely to be caused by the same problem.
Is this still occurring with the 2.4.20-based errata?