Bug 157969

Summary: filesystem turns readonly for no obvious reason
Product: Red Hat Enterprise Linux 4 Reporter: Bo Peng <bpeng>
Component: kernelAssignee: Stephen Tweedie <sct>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0CC: jbaron, sct
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-05 14:28:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Supress ext3 console spam after journal abort none

Description Bo Peng 2005-05-17 15:04:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Red Hat/1.0.3-1.4.1 Firefox/1.0.3

Description of problem:
This problem has bothered me from kernel 2.4 to 2.6 and I never did figure out the reason. A filesystem would suddenly go 'read-only' without an obvious reason!

This first happened when I was using a mobile FAT32 harddrive partition. It copied files well but after a while, when I try to write to it again, it will say 'read-only file system', can not.... I thought linux does not support usb hd and fat32 that well so I just ignored the problem.

Now, I have upgraded to Redhat EL4 with a 2.6 kernel. I am installing a VM machine on a parition. Suddenly, I get errors saying "read-only file system"... As a result, all my vm machines crash since I could not shut them down properly.

What might be causing the problem? Will some kind of I/O error trigger this kind of protective behavior?

Currently, the file system (/vmware) is like:

 % df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg-vmware
                      50412228   9986548  37864864  21% /vmware

NOTE: /vmware is NOT full.

% ls -l
total 4502756
-rw-r--r--  1 bpeng bpeng      52675 May 17 23:00 vmware-0.log
-rw-r--r--  1 bpeng bpeng      29004 May 17 17:20 vmware-1.log
-rw-r--r--  1 bpeng bpeng      33309 May 17 17:03 vmware-2.log

NOTE: user bpeng has write permission

%  cp vmware.log vmware.log.tmp
cp: cannot create regular file `vmware.log.tmp': Read-only file system

NOTE: can not write to disk

% ls -l ..
total 40
drwx------  2 root  root  16384 May 13 20:34 lost+found/
drwxr-xr-x  2 bpeng bpeng  4096 May 17 23:01 winXPPro/

NOTE: permission of parent directory is also fine.

 % cat /etc/fstab
/dev/vg/vmware          /vmware                 ext3    defaults        1 2
(other filesystems are ignored)

NOTE: /vmware is mounted as rw

% mount
/dev/mapper/vg-vmware on /vmware type ext3 (rw)

NOTE: mount showing /vmware is still rw.

I can report other things (like a log file) if needed.

This may be caused by vmware as well but the fact that this problem happened before and no application should be able to make the partition readonly let me doubt it is a kernel/filesystem related problem.

Thanks.
Bo 

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.EL

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Can not reproduce the problem. 

1.re-mount /vmware 
2.install Mandrake as a vmware Virtual machine

Everything is fine. The only difference is that several other VM was running when the error happened. 

  

Additional info:

Comment 1 Dave Jones 2005-05-18 01:51:07 UTC
can you paste the output of dmesg (from the host machine, and the guest)
at the time things go read-only please ?


Comment 2 Bo Peng 2005-05-18 02:33:18 UTC
I was installing a mandrake VM when the 'read-only filesystem' happens. The
installation was unsuccessful so no dmeg could be obtained from the guest
machine.  To restart my virtual machines, I umount /vmware and mount it again.
Here is the dmeg output I have right now (a day later). I guess only the first
few lines are useful since they indicate some journal problem.

I will try to get dmeg output immediately if this happens again.

t_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
/dev/vmmon[18992]: host clock rate change request 2002 -> 0
vmmon: Had to deallocate locked 64761 pages from vm driver cd7ac000
vmmon: Had to deallocate AWE 3305 pages from vm driver cd7ac000
vmmon: Had to deallocate AWE 16 pages from vm driver cce2c000
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning (device dm-3): ext3_clear_journal_err: Filesystem error recorded
from previous mount: IO failure
EXT3-fs warning (device dm-3): ext3_clear_journal_err: Marking fs in need of
filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on dm-3, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-3, type ext3), uses xattr
/dev/vmnet: open called by PID 20253 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[20259]: host clock rate change request 0 -> 19
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
/dev/vmmon[20259]: host clock rate change request 19 -> 83
/dev/vmnet: open called by PID 20259 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[20259]: host clock rate change request 83 -> 1043
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev hda2, type ext3), uses xattr
/dev/vmnet: open called by PID 20259 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[20259]: host clock rate change request 1043 -> 19
/dev/vmmon[20253]: host clock rate change request 19 -> 0
vmmon: Had to deallocate locked 26982 pages from vm driver d4252000
vmmon: Had to deallocate AWE 4005 pages from vm driver d4252000
/dev/vmnet: open called by PID 27431 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[27439]: host clock rate change request 0 -> 19
/dev/vmmon[27431]: host clock rate change request 19 -> 0
vmmon: Had to deallocate locked 840 pages from vm driver da5f6000
vmmon: Had to deallocate AWE 1784 pages from vm driver da5f6000
/dev/vmnet: open called by PID 27502 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[27510]: host clock rate change request 0 -> 19
/dev/vmmon[27502]: host clock rate change request 19 -> 0
vmmon: Had to deallocate locked 1261 pages from vm driver e9693000
vmmon: Had to deallocate AWE 1785 pages from vm driver e9693000
/dev/vmnet: open called by PID 27559 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmmon[27567]: host clock rate change request 0 -> 19
/dev/vmmon[27567]: host clock rate change request 19 -> 1043
/dev/vmmon[27567]: host clock rate change request 1043 -> 19
/dev/vmmon[27567]: host clock rate change request 19 -> 0
/dev/vmmon[27567]: host clock rate change request 0 -> 1001
/dev/vmmon[27567]: host clock rate change request 1001 -> 2001
/dev/vmmon[27567]: host clock rate change request 2001 -> 1001
/dev/vmmon[27567]: host clock rate change request 1001 -> 0
/dev/vmmon[27567]: host clock rate change request 0 -> 19
/dev/vmmon[27567]: host clock rate change request 19 -> 0
/dev/vmmon[27567]: host clock rate change request 0 -> 1001
/dev/vmmon[27567]: host clock rate change request 1001 -> 2001
/dev/vmnet: open called by PID 27567 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmnet: open called by PID 27567 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmnet: open called by PID 27567 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmnet: open called by PID 27567 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
/dev/vmnet: open called by PID 27774 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
/dev/vmnet: open called by PID 27780 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev hda2, type ext3), uses xattr
/dev/vmnet: open called by PID 27780 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
vmmon: Had to deallocate locked 36840 pages from vm driver c86e7000
vmmon: Had to deallocate AWE 3387 pages from vm driver c86e7000
/dev/vmnet: open called by PID 2707 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened
vmnet: recvClusterPtr was already deactivated
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
/dev/vmnet: open called by PID 2861 (vmware-vmx)
/dev/vmnet: port on hub 8 successfully opened


Comment 3 Dave Jones 2005-05-18 02:48:33 UTC
if you pass -s 128000 to dmesg, it'll give you a bigger scrollback. My guess is
there was some I/O error further above.


Comment 4 Bo Peng 2005-05-18 03:18:37 UTC
The longest dmesg I can get is:

start_transaction: Journal has aborted

1471 lines of 
EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
vmmon: Had to deallocate locked 12051 pages from vm driver da4cf000
vmmon: Had to deallocate AWE 2741 pages from vm driver da4cf000
another several hundreds of EXT3-fs error .... plus what I gave previously.

/dev/vmmon[18992]: host clock rate change request 2002 -> 0
....


I guess it is not easy to trace exactly what had happened at this stage. Could
you tell me what *might* be causing the problem? People from comp.os.linux.misc
suggest that maybe a bad transmitted bit has triggered this. If this is the
case, how can I control the damage when this happens? I lost some work due to
hard crash of all my virtual machines.

Bo



Comment 5 Stephen Tweedie 2005-05-18 09:52:00 UTC
The initial problem could either be filesystem inconsistency being detected
on-disk, or an IO failure propagating upwards from the block device underneath.
 The initial log message telling you that the journal is going offline will tell
you why it is occurring.

However, the spamming of the console with thousands of "Journal has aborted"
errors is not helping to preserve this information --- there's really not much
point in continuing to emit that line after we've aborted, so I'll see about
fixing that.


Comment 6 Stephen Tweedie 2005-05-18 12:04:48 UTC
You really need to capture the initial error to see why the journal went
offline.  In the presence of the above "journal has aborted" ext3 console spam,
it's going to be hard to capture that without some external logging; I always
use serial console for this, but netconsole works too.

Comment 7 Stephen Tweedie 2005-05-18 12:06:53 UTC
Created attachment 114502 [details]
Supress ext3 console spam after journal abort

This patch should prevent the ext3 code from continuing to spam the console
indefinitely if user applications continue trying to write to the filesystem
after the journal has gone offline.

Comment 8 Stephen Tweedie 2005-05-18 12:11:41 UTC
Please try to capture the "first cause" console messages when the filesystem
initially goes offline, either with the preceding patch or via serial/network
console, so that we can investigate this further.

Thanks!


Comment 9 Stephen Tweedie 2005-09-05 14:28:14 UTC
There's still no evidence here to allow us to distinguish between a hardware or
software problem as the root cause; closing as we can't proceed with this
problem until there's a proper kernel log.  Please reopen if this recurs and you
are able to capture the necessary kernel messages.