Red Hat Bugzilla – Bug 171021
Entire OS becomes unusable due to all mounted drives becoming read-only
Last modified: 2007-11-30 17:07:21 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Description of problem:
Many other people and I can "crash" the kernel by running the VMware application, as described in this internet posting:
In my case, I power up a virtual PC with 2 GB of RAM (my host has 6 GB), and then try to install Oracle 10g on the host PC (also ES4).
Version-Release number of selected component (if applicable):
Linux localhost 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux
Steps to Reproduce:
1. Power up a virtual PC (ES4) with 2 GB of RAM.
2. Install Oracle 10g.
Actual Results: kernel: journal_get_undo_access: No memory for committed data
(sda becomes read-only, OS becomes useless)
Expected Results: No error
This error could probably be exploited as a nasty DoS attack.
FYI: upon reboot, my entire 300 GB drive was totally hosed. fsck went nuts
for 10 minutes, forced a reboot, and then I lost the lot.
IMHO #1: besides the fact that I *should* have had 4 GB of RAM spare, nothing should
be allowed to exhaust kernel memory so badly that it makes the kernel useless
and ultimately destroys all data on all mounted disks.
IMHO #2: after something does go wrong and memory gets exhausted, the kernel
should *automatically* monitor the state of the system and resume normal
operation when memory becomes free again, including remounting read-write whatever
it switched to read-only, in order to prevent catastrophic disk destruction after
rebooting. (I did kill vmware, which should have freed things up before I
Chris, I really don't know how to reproduce this internally. Can I ask you to
reproduce this problem and get me "vmstat 1" output, as well as several
Alt-SysRq M, W and P outputs followed by one Alt-SysRq T output?
Thanks, Larry Woodman
OK Larry, I'll have a go; it'll take me some time to reinstall the OS, etc.
I understand "vmstat 1", but what's all that "Alt-SysRq" stuff? I presume
it's something to do with holding "Alt" and pressing the "SysRq" key, probably on
the console, and probably only in a GUI (X). Is this correct? (It did
nothing in my VNC session, but I've got a DL360, so I can bring up a console on
the iLO card without going into the datacenter if that's the only way,
assuming I can send an Alt-SysRq through to the iLO from my browser...)
Do I have to do anything to enable the Alt-SysRq stuff?
1.) As root, run "echo 1 > /proc/sys/kernel/sysrq".
2.) At the console keyboard, hold down the Alt and SysRq keys and press M, W, P and T.
3.) The results are written to /var/log/messages.
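The following is not part of the thread. When the console keyboard can't be reached, as in the VNC session mentioned above, the same dumps can usually be requested without a keyboard. As root, you write the command key to /proc/sysrq-trigger. The sketch below is a minimal Python version of that approach. It assumes that file exists on this kernel and, as far as I know for kernels of this era, that only the first character of each write is acted on. So each key gets its own write. The function names and the `rounds`/`interval` parameters are hypothetical, and the paths are parameters only so the sketch can be exercised against ordinary files.

```python
import time
from pathlib import Path

# Real kernel interfaces; passed as parameters so the sketch can be tested.
SYSRQ_ENABLE = "/proc/sys/kernel/sysrq"
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def enable_sysrq(enable_path: str = SYSRQ_ENABLE) -> None:
    """Step 1: equivalent of `echo 1 > /proc/sys/kernel/sysrq` (needs root)."""
    Path(enable_path).write_text("1\n")


def send_sysrq(key: str, trigger_path: str = SYSRQ_TRIGGER) -> None:
    """Send one SysRq command key, using a separate open/write/close per key."""
    if len(key) != 1:
        raise ValueError("SysRq commands are single characters")
    with open(trigger_path, "w") as f:
        f.write(key)


def collect_dumps(rounds: int = 3, interval: float = 5.0,
                  trigger_path: str = SYSRQ_TRIGGER) -> list[str]:
    """Hypothetical helper: several M, W, P rounds, then one T.

    The dumps go to the kernel log (and usually on to /var/log/messages).
    Returns the keys sent, in order.
    """
    sent = []
    for i in range(rounds):
        for key in "mwp":
            send_sysrq(key, trigger_path)
            sent.append(key)
        if i < rounds - 1:
            time.sleep(interval)  # spread the snapshots out over time
    send_sysrq("t", trigger_path)
    sent.append("t")
    return sent
```

To use it, run it as root (for example `enable_sysrq(); collect_dumps()`) while `vmstat 1` logs in another terminal. The output lands in the same place as in step 3.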
The message "kernel: journal_get_undo_access: No memory for committed data"
indicates that the kernel is under serious memory pressure. If the internal
journaling state machine can't make progress as a result, then taking the journal
offline and going read-only is the only action ext3 can take, but it's a
defensive measure and not something that should cause any corruption. Indeed,
I've got plenty of reports of kernel memory starvation causing ext3 to complain
like this without any corruption.
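The following is not part of the thread. To check whether ext3 has taken this defensive read-only step, a minimal Python sketch can scan /proc/mounts for ext3 filesystems whose mount options include `ro`. This assumes the forced read-only state shows up in the options field there. The function name is hypothetical, and the sketch only detects the condition; it does not repair or remount anything.

```python
def read_only_mounts(mounts_text: str, fstypes: tuple = ("ext3",)) -> list[str]:
    """Return mount points of the given filesystem types mounted read-only.

    mounts_text is the content of /proc/mounts: one mount per line with the
    fields device, mount point, fs type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Mount points containing spaces appear octal-escaped (\040) in
        # /proc/mounts and are returned as-is here.
        if fstype in fstypes and "ro" in options.split(","):
            result.append(mountpoint)
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mp in read_only_mounts(f.read()):
            print(f"{mp} is mounted read-only; check the kernel log for ext3 errors")
```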
So there may well be something else going on: some other component of the
kernel that is not reacting as gracefully to the memory starvation. (And it's
low-memory starvation that's happening in this case, so there's less than 1 GB of
that to go around no matter how much physical RAM you have, unless you run the
Full kernel logs (not just the single ext3 error line) may help to pinpoint
the problem; a serial or network console can be invaluable for capturing them.
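The following is not part of the thread. To watch the low-memory starvation described above directly, rather than overall free RAM, a minimal Python sketch can read the LowTotal and LowFree fields of /proc/meminfo. It assumes a 32-bit highmem kernel such as this 2.6.9 i686 SMP one. Kernels that don't report those fields make the sketch return None. Both function names are hypothetical. Sampled once per second alongside `vmstat 1`, LowFree is the figure that should fall toward zero before the ext3 error appears.

```python
from typing import Optional


def parse_meminfo(meminfo_text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'LowFree:   12345 kB' into {name: value}."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def lowmem_summary(meminfo_text: str) -> Optional[dict[str, int]]:
    """Return LowTotal/LowFree (kB) and integer percent free, or None.

    On 32-bit highmem kernels the kernel's own data structures must come
    from low memory, so LowFree matters more here than MemFree.
    """
    info = parse_meminfo(meminfo_text)
    if "LowTotal" not in info or "LowFree" not in info:
        return None  # fields not reported by this kernel
    total, free = info["LowTotal"], info["LowFree"]
    pct = free * 100 // total if total else 0
    return {"LowTotal": total, "LowFree": free, "LowFreePercent": pct}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(lowmem_summary(f.read()))
```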
Thanks, Stephen, for that explanation (and Larry for those SysRq instructions).
Unfortunately, I've tried three times now and haven't been able to reproduce this
problem; perhaps the /proc/sys/kernel/sysrq setting itself has an effect, or
perhaps my running vmstat and periodically doing the Alt-SysRq stuff changed
Double-unfortunately, after the install worked, I stopped logging and
created a new database inside my virtual machine, which ultimately locked up
the host kernel completely. The problem seems related not so much to memory
usage as to extreme disk usage (at least, that's my guess: during the
Oracle install, it's just some Java apps copying files around).
I don't have time left to experiment (sorry, I've got to get this machine live
ASAP), so please accept my apologies for not managing to get more info for you.
No longer reproducible.