Red Hat Bugzilla – Bug 130300
mounting gfs readonly can fail due to error recovering journal
Last modified: 2010-01-11 21:56:51 EST
Description of problem:
I've seen this while running mount_stress on both RHEL3 and RHEL4.
The simplest case is to:
1) mount a gfs filesystem on the whole cluster
2) umount on two of the nodes
3) remount on one with -o locktable=clustername:newname
4) remount on the other with -o ro
There must be some kind of race condition, because this does not
always cause the problem. I have seen the error without the preceding
mount -o locktable=clustername:newname attempt, so it's not required,
but it seems to make the failure more likely.
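The steps above, flattened into one sketch. The device, mountpoint, and cluster names are just placeholders from my setup; the per-node split is marked in comments.

```shell
# Sketch of the reproducer. Device/mountpoint/cluster names are
# placeholders -- substitute your own.
reproduce_ro_mount_failure() {
    dev=/dev/pool/corey1
    mnt=/mnt/gfs

    # 1) On every node in the cluster:
    mount -t gfs "$dev" "$mnt"

    # 2) On two of the nodes (call them A and B):
    umount "$mnt"            # node A
    umount "$mnt"            # node B

    # 3) On node A (not required for the bug, but seems to help trigger it):
    mount -t gfs -o locktable=morph-cluster:newname "$dev" "$mnt"

    # 4) On node B -- this is the mount that intermittently fails:
    mount -t gfs -o ro "$dev" "$mnt"
}
```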
When this bug occurs, I see this on standard error:
mount: wrong fs type, bad option, bad superblock on /dev/pool/corey1,
or too many mounted file systems
Along with this on the console:
GFS: fsid=morph-cluster:corey1, jid=0: Trying to acquire journal lock...
GFS: fsid=morph-cluster:corey1, jid=0: Looking at journal...
GFS: fsid=morph-cluster:corey1, jid=0: can't replay: read-only FS
GFS: fsid=morph-cluster:corey1, jid=0: Failed
GFS: error recovering my journal (-30)
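For what it's worth, the -30 in that last line is -EROFS ("Read-only file system"), which matches the "can't replay: read-only FS" line above it. One way to tell this failure apart from a genuinely bad superblock is to check the kernel log for that message; a small sketch (the log text would normally come from dmesg):

```shell
# Returns success if the log text on stdin shows GFS refusing to
# replay a dirty journal on a read-only mount (errno 30 == EROFS).
gfs_ro_replay_failed() {
    grep -q 'error recovering my journal (-30)'
}
```

For example, `dmesg | gfs_ro_replay_failed` would succeed only when this particular failure has been logged.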
Once in this state, all further mount -o ro attempts on this fs from
that node result in the same error. However, this can be fixed by
mounting the filesystem rw, umounting, and then the ro attempt will work.
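That workaround as a sketch: a rw mount replays the dirty journal, after which a plain ro mount succeeds. Device and mountpoint arguments are placeholders.

```shell
# Work around the ro-mount failure: mount rw once so the dirty
# journal gets replayed, then take the ro mount.
replay_journal_then_mount_ro() {
    dev=$1
    mnt=$2
    mount -t gfs "$dev" "$mnt" || return 1    # rw mount replays the journal
    umount "$mnt" || return 1
    mount -t gfs -o ro "$dev" "$mnt"          # journal is clean now
}
```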
1) Are you trying to mount the same filesystem into two different
lockspaces? If so... what are you expecting to happen? Are you
deliberately trying to cause corruption and/or panics?
For example, on node1:
mount -t gfs /dev/pool/pool0 /gfs -o locktable=cluster:foo
and on node 2:
mount -t gfs /dev/pool/pool0 /gfs -o locktable=cluster:bar
2) What do you expect GFS to do when it encounters a journal in need
of repair when told to mount read-only? If it modifies the
filesystem, it really isn't read-only at that point (and arguably a bug).
Perhaps remounting the filesystem readonly after the journal has been
replayed is what you are after? e.g.:
mount -o remount,ro /gfs
In which case, I don't know what would happen if a read-write node
crashes and the read-only node tries to recover. Hopefully it would
fail to replay the journal and allow another node to retry.
I was trying to mount the same filesystem into two different
lockspaces just to see that the locktable flag worked knowing that I
might corrupt my data. But that's a different "I have a loaded gun
pointed at my foot" issue. :)
I guess this isn't really a bug then, as that is the expected
behavior when the journal needs to be replayed. It's just that the
error given was a little scary. :( But if one looks in the log,
it's clear what happened.