[root@marathon-01 ~]# mount /dev/sdb1 /uss_a/
[root@marathon-01 ~]# cp -aR /usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.x86_64 /uss_a/
[root@marathon-01 ~]# df -hi /uss_a
Filesystem            Inodes  IUsed  IFree IUse% Mounted on
/dev/sdb1               245M    25K   245M    1% /uss_a
[root@marathon-01 ~]# df -h /uss_a
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             977G  374M  977G   1% /uss_a
[root@marathon-01 ~]# mkfs.gfs2 -p lock_nolock -j 1 -O -r 2048 /dev/sdb1
Device:                    /dev/sdb1
Blocksize:                 4096
Device Size                976.57 GB (256001791 blocks)
Filesystem Size:           976.57 GB (256001791 blocks)
Journals:                  1
Resource Groups:           489
Locking Protocol:          "lock_nolock"
Lock Table:                ""
[root@marathon-01 ~]# df -hi /uss_a
Filesystem            Inodes  IUsed  IFree IUse% Mounted on
/dev/sdb1               245M    25K   245M    1% /uss_a
[root@marathon-01 ~]# df -h /uss_a
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             977G  374M  977G   1% /uss_a
[root@marathon-01 ~]# umount /uss_a
[root@marathon-01 ~]# mount /dev/sdb1 /uss_a/
[root@marathon-01 ~]# df -h /uss_a
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             977G   34M  977G   1% /uss_a
[root@marathon-01 ~]# df -hi /uss_a
Filesystem            Inodes  IUsed  IFree IUse% Mounted on
/dev/sdb1               245M     12   245M    1% /uss_a
[root@marathon-01 ~]# ls /uss_a
[root@marathon-01 ~]#

Most mkfs's that I know of will refuse to mkfs a mounted filesystem... mkfs.gfs2 should behave the same way.
Re-assigning to rpeterso.
This is essentially the same as the age-old bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156012, except that one is for fsck whereas this is for mkfs. But we need the same mechanism for all the userland-only tools.

The problem is tricky and we've had many discussions about it. It would be simple to figure out whether the node that's doing mkfs has the fs mounted, but that isn't good enough, since other nodes may have it mounted as well. Figuring out if another node has a fs already mounted is difficult because when mkfs is run, you may not even have the cluster infrastructure running. In other words, you may not have any of the cluster communication stuff running at that point, and the userland tools can't depend on it running.

In RHEL4 it was nearly impossible because much of the infrastructure was in the kernel code. In RHEL5 it might be somewhat easier, because much of that code was brought down into userland. Today, a node has knowledge of a mount group only after it joins the group in question. We talked, for example, about changing the group code (gfs2_controld, groupd, etc.) so that all nodes know about the mount groups of other nodes, but that would be a potentially major design change.

In theory, I suppose we could also have mkfs / fsck / etc. try to join the mount group first. If joining is successful, it could see if anyone else is a member of the group and, if so, throw up an error. If it can't join, it could throw up a warning saying something like (but less wordy than):

"WARNING: The cluster infrastructure isn't running, so %s can't tell if the file system is mounted by other nodes. Are you sure you still want to do this operation, and have you made sure no other node has it mounted?", argv[0]

We could also throw in a check for the lock protocol and be more forgiving if lock_nolock was specified in the superblock.
We could have it check only the local /proc/mounts, etc., although that's not very good either, because the mount protocol may have been overridden on the mount command from another node.

There was some promising discussion about using the exclusive volume locking stuff in LVM2. However, that only solves the problem for LVM2 volumes, and there are customers out there using GFS and GFS2 with no clustered LVM in place, just raw devices. We could call that a permanent restriction. I suppose some checking is better than none at all, which is what we have right now.

I'm going to reroute this to Ryan O'Hara since he has the original bug and since I'm going on vacation and can't work on it. I'd understand if Ryan closes it as a duplicate of that bug. I'm also adding Dave Teigland to the cc list because he's been a part of the discussion since day one.
It's a no-brainer to check if the fs is mounted on the local node, which solves the problem for lock_nolock which is what was reported here. Just check what mkfs.ext3 does (could it be as simple as using O_EXCL?) and copy it. That would probably catch a lot of lock_dlm cases, too. For checking if the fs is mounted on another node, we've been through all those discussions over and over, and my position hasn't changed -- the only thing that makes sense is to activate the LV exclusively. Yes, you are required to use clvm to benefit from some ancillary cluster-related features of GFS (see withdraw); this is simply one of them.
Fixed. Added a check_mount function to gfs_mkfs/main.c, which does a very simple scan of /proc/mounts using the getmntent() interface. If we see that the device is already mounted, simply print an error message and exit. Please note that this check can only determine whether a device is locally mounted. It will not solve the other issue, where another node may have the device mounted.
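For reference, the approach described above can be sketched roughly as follows. This is only an illustration of the getmntent() scan, not the actual code from gfs_mkfs/main.c; the function name is hypothetical.

```c
/* Sketch: scan /proc/mounts for a device name using getmntent(3).
 * Returns 1 if 'device' appears as a mounted filesystem source,
 * 0 otherwise.  Note this is a purely name-based, local-node check. */
#include <stdio.h>
#include <string.h>
#include <mntent.h>

int is_locally_mounted(const char *device)
{
	FILE *fp = setmntent("/proc/mounts", "r");
	struct mntent *mnt;
	int found = 0;

	if (!fp)
		return 0;	/* can't tell; assume not mounted */

	while ((mnt = getmntent(fp)) != NULL) {
		if (strcmp(mnt->mnt_fsname, device) == 0) {
			found = 1;
			break;
		}
	}
	endmntent(fp);
	return found;
}
```

As noted, this only sees the local node's mount table; a remote node's mount is invisible to it.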
Thanks, sounds good. Yes, I understand that cross-SAN mount checks are tricky. ext3 could use it too. :) -Eric
This is fixed for gfs and gfs2. My previous comment is in reference to the code changes for gfs(1). Changes for gfs2 are in gfs2/mkfs/main_mkfs.c. Code is identical.
There are still ways around this check. For example, there are multiple names which point to the same device.

[root@marathon-02 ~]# mount
...
/dev/mapper/marathon-marathon0 on /mnt/gfs1 type gfs (rw,hostdata=jid=0:id=65538:first=1)
[root@marathon-02 ~]# mkfs -t gfs2 -j 5 -p lock_dlm -t marathon:marathon0 /dev/marathon/marathon0
This will destroy any data on /dev/marathon/marathon0.
  It appears to contain a gfs filesystem.
Are you sure you want to proceed? [y/n] y
Device:                    /dev/marathon/marathon0
Blocksize:                 4096
Device Size                4888.68 GB (1281537024 blocks)
Filesystem Size:           4888.68 GB (1281537022 blocks)
Journals:                  5
Resource Groups:           9778
Locking Protocol:          "lock_dlm"
Lock Table:                "marathon:marathon0"
[root@marathon-02 ~]# mkfs -t gfs2 -j 5 -p lock_dlm -t marathon:marathon0 /dev/mapper/marathon-marathon0
cannot create filesystem: /dev/mapper/marathon-marathon0 appears to be mounted
[root@marathon-02 ~]# ls -l /dev/marathon/marathon0
lrwxrwxrwx 1 root root 30 Nov 14 15:12 /dev/marathon/marathon0 -> /dev/mapper/marathon-marathon0

Is this something the code check should catch?
Looks like we need to use the same solution I provided for mkfs.gfs (BZ 426298). Agree?
Fixed in the RHEL5 tree. check_mount will now attempt to open the device with the O_EXCL flag. If the resulting fd < 0 and errno == EBUSY, the device is busy/mounted. Note that this will catch any attempt to run mkfs on a device that is part of an LVM volume, which is something you don't want to do anyway.
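The O_EXCL approach looks roughly like the following sketch (an illustration of the idea, not the committed code; the function name is hypothetical). On Linux, opening a block device with O_EXCL fails with EBUSY if the kernel already holds the device, whether it is mounted or claimed by device-mapper/LVM, so the check is immune to the symlink aliasing problem above.

```c
/* Sketch: use open(2) with O_EXCL to detect a busy block device.
 * Returns 1 if the device is busy (mounted or otherwise claimed),
 * 0 if it is free or the status can't be determined. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int device_is_busy(const char *device)
{
	int fd = open(device, O_RDONLY | O_EXCL);

	if (fd < 0)
		return errno == EBUSY;
	close(fd);
	return 0;
}
```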
Created attachment 324507 [details] Fix check_mount for gfs2.
Need to get new changes incorporated prior to 5.3 release.
Committed changes to RHEL53 branch. Marking MODIFIED.
Verified with gfs2-utils-0.1.53-1.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0087.html