Bug 421761 - 'gfs_tool lockdump' wrongly says 'unknown mountpoint' re HP cciss RAID array
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Hardware: All, OS: Linux
Priority: low, Severity: medium
Assigned To: Robert Peterson
QA Contact: GFS Bugs
Depends On:
Blocks: 431945
Reported: 2007-12-12 10:18 EST by Charlie Brady
Modified: 2010-01-11 22:16 EST (History)
Fixed In Version: RHBA-2008-0804
Doc Type: Bug Fix
Last Closed: 2008-07-25 15:28:20 EDT

Attachments
Proposed patch to fix the problem (2.43 KB, patch)
2008-02-07 19:09 EST, Robert Peterson
Description Charlie Brady 2007-12-12 10:18:53 EST
I have a two node system with a GFS filesystem on external RAID array.

[root@aa-node-1 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0
none /selinux selinuxfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/md1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/cciss/c0d0p1 /diskarray gfs rw,noatime,nodiratime 0 0
[root@aa-node-1 ~]#

[root@aa-node-1 ~]# gfs_tool lockdump /diskarray
gfs_tool: unknown mountpoint
[root@aa-node-1 ~]# 

strace tells me that it does get the gfs file list, and that it is finding the
mountpoint in /proc/mounts:

open("/proc/fs/gfs", O_RDWR|O_LARGEFILE) = 3
write(3, "list", 4)                     = 4
read(3, "4172492800 cciss/c0d0p1 6E0845C6"..., 1048575) = 45
close(3)                                = 0
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
= 0xb7e14000
read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 448
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
= 0xb7e13000
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 414
close(4)                                = 0
munmap(0xb7e13000, 4096)                = 0
stat64("/dev/cciss/c0d0p1", {st_mode=S_IFBLK|0600, st_rdev=makedev(104, 1), ...}) = 0
close(3)                                = 0
munmap(0xb7e14000, 4096)                = 0
write(2, "gfs_tool: ", 10gfs_tool: )              = 10
write(2, "unknown mountpoint /diskarray\n", 30unknown mountpoint
) = 30
exit_group(1)                           = ?
Process 24095 detached
[root@aa-node-1 ~]#

The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
cookie for the filesystem, because "cciss/c0d0p1" does not match
"/dev/cciss/c0d0p1".

The error message is misleading. The mountpoint does exist, is known to the
system, but gfs_tool just can't find the cookie.

As a workaround, I can read the lockdump via:

[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo list >&5
[root@aa-node-1 ~]# cat <&5
4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0
cat: -: No such file or directory
[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo lockdump 4172492800 >&5
[root@aa-node-1 ~]# dd bs=4096k <&5 > /tmp/gfs.lockdump
dd: reading `standard input': No such file or directory
0+1 records in
0+1 records out
[root@aa-node-1 ~]#
Comment 1 Charlie Brady 2007-12-12 10:33:31 EST
> The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
> cookie for the filesystem, because "cciss/c0d0p1" does not match
> "/dev/cciss/c0d0p1".

No, I think the failing comparison is "cciss/c0d0p1" vs "c0d0p1", due to this
code in do_basename():

        if (stat(device, &st))
                goto punt;
        if (major(st.st_rdev) == major_number) {
                static char realname[16];
                snprintf(realname, 16, "dm-%u", minor(st.st_rdev));
                return realname;
        }

 punt:
        return basename(device);

Using basename() to strip a "/dev/" prefix appears naive.
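
To illustrate the mismatch, here is a minimal sketch (not taken from gfs_tool) that contrasts basename() with a prefix strip. Both strip_dev_prefix() and basename_matches() are hypothetical helpers invented for this example. It assumes the kernel reports the name relative to /dev, as the "list" output above suggests.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: drop a leading "/dev/" but keep any subdirectory,
 * so "/dev/cciss/c0d0p1" becomes "cciss/c0d0p1". */
static const char *strip_dev_prefix(const char *device)
{
        static const char prefix[] = "/dev/";
        size_t len = sizeof(prefix) - 1;

        if (strncmp(device, prefix, len) == 0)
                return device + len;
        return device;
}

/* Hypothetical helper: returns 1 if basename() of the device path equals
 * the name reported by the kernel, 0 otherwise. */
static int basename_matches(const char *device, const char *kernel_name)
{
        char copy[256];

        /* POSIX basename() may modify its argument, so work on a copy. */
        snprintf(copy, sizeof(copy), "%s", device);
        return strcmp(basename(copy), kernel_name) == 0;
}
```

With these helpers, basename_matches("/dev/cciss/c0d0p1", "cciss/c0d0p1") is 0 because basename() yields only "c0d0p1", while strip_dev_prefix("/dev/cciss/c0d0p1") yields "cciss/c0d0p1". A prefix strip would still fail for symlinks or device nodes outside /dev, so it is only an illustration of the failing comparison, not a proposed fix.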
Comment 2 Robert Peterson 2007-12-12 11:04:56 EST
I'll assume ownership of this one.  I fixed this for RHEL5.2.
This section of code is not straightforward.  It does lookups only
to turn around and do reverse lookups.
Comment 3 Charlie Brady 2007-12-12 11:26:28 EST
This code looks odd to me - is a mountpoint ever listed as the first item in the
output of "gfs_tool list", or is this just a backdoor way to allow "gfs_tool
lockdump cookie"?

        for (x = 0; *lines[x]; x++) {
                char s_id[256];
                sscanf(lines[x], "%s %s", cookie, s_id);
                if (dev) {
                        if (strcmp(s_id, dev) == 0)
                                return cookie;
                } else {
                        if (strcmp(cookie, mp) == 0)
                                return cookie;
                }
        }

Using the cookie as the index to 'gfs_tool lockdump' does work, so a simpler
workaround becomes:

gfs_tool lockdump $(gfs_tool list | awk '{ print $1 }')
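
The lookup quoted in this comment can be sketched as a standalone function. This is an illustration under assumptions, not the gfs_tool source: find_cookie() is a hypothetical name, the "list" output is assumed to be newline-separated "cookie s_id locktable" lines as shown in the description, and the key is accepted as either a device name or a cookie. Unlike the quoted loop, the sscanf() conversions carry field widths so that an over-long token cannot overrun the buffers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: search "list" output for a line whose device name
 * (second field) or cookie (first field) equals `key`.
 * Copies the cookie into `cookie` and returns 0 on a match, -1 otherwise. */
static int find_cookie(const char *list, const char *key,
                       char *cookie, size_t size)
{
        const char *line = list;

        while (*line) {
                char buf[512], c[64], s_id[256];
                size_t len = strcspn(line, "\n");

                /* Copy one line so sscanf cannot read into the next one. */
                snprintf(buf, sizeof(buf), "%.*s", (int)len, line);

                if (sscanf(buf, "%63s %255s", c, s_id) == 2 &&
                    (strcmp(s_id, key) == 0 || strcmp(c, key) == 0)) {
                        snprintf(cookie, size, "%s", c);
                        return 0;
                }

                line += len;
                if (*line == '\n')
                        line++;
        }
        return -1;
}
```

Given the line "4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0", both "cciss/c0d0p1" and "4172492800" find the cookie, whereas "c0d0p1" (what basename() produces) does not.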
Comment 5 RHEL Product and Program Management 2007-12-12 11:35:41 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Robert Peterson 2008-02-07 10:11:16 EST
The "proper" way to fix this is to do the same thing I did with GFS2.
That is, to change the gfs kernel module so that it kicks out the
device major and minor number as "major:minor" rather than s_id, and
change gfs_tool to expect it that way accordingly.  That way it will
find the proper device no matter where it is or what it's called.
That would require a gfs-kernel crosswrite bugzilla.
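
A rough sketch of the user-space half of that idea follows. It is not the attached patch: format_devnum() and device_matches() are hypothetical names, and the sketch assumes the kernel module reports the device as a decimal "major:minor" string, as the comment describes. It relies on the glibc major() and minor() macros from <sys/sysmacros.h>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format a device number as "major:minor".
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_devnum(dev_t rdev, char *buf, size_t size)
{
        int n = snprintf(buf, size, "%u:%u", major(rdev), minor(rdev));

        return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: compare the block device node at `path` with a
 * kernel-reported "major:minor" string.
 * Returns 1 on match, 0 on mismatch, -1 if `path` is not a usable
 * block device. */
static int device_matches(const char *path, const char *kernel_devnum)
{
        struct stat st;
        char buf[32];

        if (stat(path, &st) != 0 || !S_ISBLK(st.st_mode))
                return -1;
        if (format_devnum(st.st_rdev, buf, sizeof(buf)) != 0)
                return -1;
        return strcmp(buf, kernel_devnum) == 0;
}
```

For the device in this report, whose strace output shows st_rdev=makedev(104, 1), format_devnum() would produce "104:1". Comparing device numbers sidesteps the naming question entirely, since the result does not depend on where the device node lives or what it is called.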
Comment 7 Robert Peterson 2008-02-07 19:09:03 EST
Created attachment 294292 [details]
Proposed patch to fix the problem

Here is the patch I propose to fix the problem.  I've tested the
RHEL5 equivalent patch, but not the RHEL4 version yet.  Note that
this fix relies upon the gfs-kernel patch attached to bug #431945.
Comment 8 Robert Peterson 2008-02-08 13:33:01 EST
The RHEL4 patch isn't quite right.  (This is why we do testing).
In RHEL4, GFS uses a special "diapered device" to insulate the users
from the device level.  That device isn't translating well with this
patch.  So I'll need to adjust it accordingly.
Comment 9 Robert Peterson 2008-02-08 14:38:28 EST
Just to clarify the situation: The problem noted in comment #8 was
caused because the change I made to the gfs-kernel module was returning
the device information for the "diapered device" not the real device.
I corrected that in a revised gfs-kernel patch for bug #421761.
Then I tested the two on a real RHEL4.6 prototype system and
everything worked properly.  So the patch attached to this bugzilla
record is correct.
Comment 10 Robert Peterson 2008-03-14 12:34:09 EDT
Tested on system "trin-09" and committed to our internal git tree
in branch RHEL4 for inclusion in 4.7.  Setting status to modified.
Comment 13 errata-xmlrpc 2008-07-25 15:28:20 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

