Red Hat Bugzilla – Bug 130300
mounting gfs readonly can fail due to error recovering journal
Last modified: 2010-01-11 21:56:51 EST
Description of problem:
I've seen this while running mount_stress on both RHEL3 and RHEL4.
The simplest case is to:
1) mount a gfs filesystem on the whole cluster
2) umount on two of the nodes
3) remount on one with -o locktable=clustername:newname
4) remount on the other with -o ro
There must be some kind of race condition, because this does not
always cause the problem. I have seen the error without the preceding
mount -o locktable=clustername:newname attempt, so it's not required,
but it seems to make the failure more likely.
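The steps above, flattened into one sketch. The device, mountpoint, and cluster names are just placeholders from my setup; the per-node split is marked in comments.

```shell
# Sketch of the reproducer. Device/mountpoint/cluster names are
# placeholders -- substitute your own.
reproduce_ro_mount_failure() {
    dev=/dev/pool/corey1
    mnt=/mnt/gfs

    # 1) On every node in the cluster:
    mount -t gfs "$dev" "$mnt"

    # 2) On two of the nodes (call them A and B):
    umount "$mnt"            # node A
    umount "$mnt"            # node B

    # 3) On node A (not required for the bug, but seems to help trigger it):
    mount -t gfs -o locktable=morph-cluster:newname "$dev" "$mnt"

    # 4) On node B -- this is the mount that intermittently fails:
    mount -t gfs -o ro "$dev" "$mnt"
}
```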
When this bug occurs, I see this on standard error:
mount: wrong fs type, bad option, bad superblock on /dev/pool/corey1,
or too many mounted file systems
Along with this on the console:
GFS: fsid=morph-cluster:corey1, jid=0: Trying to acquire journal lock...
GFS: fsid=morph-cluster:corey1, jid=0: Looking at journal...
GFS: fsid=morph-cluster:corey1, jid=0: can't replay: read-only FS
GFS: fsid=morph-cluster:corey1, jid=0: Failed
GFS: error recovering my journal (-30)
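For what it's worth, the -30 in that last line is -EROFS ("Read-only file system"), which matches the "can't replay: read-only FS" line above it. One way to tell this failure apart from a genuinely bad superblock is to check the kernel log for that message; a small sketch (the log text would normally come from dmesg):

```shell
# Returns success if the log text on stdin shows GFS refusing to
# replay a dirty journal on a read-only mount (errno 30 == EROFS).
gfs_ro_replay_failed() {
    grep -q 'error recovering my journal (-30)'
}
```

For example, `dmesg | gfs_ro_replay_failed` would succeed only when this particular failure has been logged.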
Once in this state, all further mount -o ro attempts on this fs from
that node result in the same error. However, this can be fixed by
mounting the filesystem rw, umounting, and then the ro attempt will work.
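That workaround as a sketch: a rw mount replays the dirty journal, after which a plain ro mount succeeds. Device and mountpoint arguments are placeholders.

```shell
# Work around the ro-mount failure: mount rw once so the dirty
# journal gets replayed, then take the ro mount.
replay_journal_then_mount_ro() {
    dev=$1
    mnt=$2
    mount -t gfs "$dev" "$mnt" || return 1    # rw mount replays the journal
    umount "$mnt" || return 1
    mount -t gfs -o ro "$dev" "$mnt"          # journal is clean now
}
```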
1) Are you trying to mount the same filesystem into two different
lockspaces? If so... what are you expecting to happen? Are you
deliberately trying to cause corruption and/or panics?
For example, on node1:
mount -t gfs /dev/pool/pool0 /gfs -o locktable=cluster:foo
and on node 2:
mount -t gfs /dev/pool/pool0 /gfs -o locktable=cluster:bar
2) What do you expect GFS to do when it encounters a journal in need
of repair when told to mount read-only? If it modifies the
filesystem, it really isn't read-only at that point (and arguably a bug).
Perhaps remounting the filesystem readonly after the journal has been
replayed is what you are after? e.g.:
mount -o remount,ro /gfs
In which case, I don't know what would happen if a read-write node
crashes and the read-only node tries to recover. Hopefully it would
fail to replay the journal and allow another node to retry.
I was trying to mount the same filesystem into two different
lockspaces just to see that the locktable flag worked knowing that I
might corrupt my data. But that's a different "I have a loaded gun
pointed at my foot" issue. :)
I guess this isn't really a bug then, as that is the expected
behavior when the journal needs to be replayed. It's just that the
error given was a little scary. :( But if one looks in the log,
it's clear what happened.