Bug 421761 - 'gfs_tool lockdump' wrongly says 'unknown mountpoint' re HP cciss RAID array
'gfs_tool lockdump' wrongly says 'unknown mountpoint' re HP cciss RAID array
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs (Show other bugs)
4
All Linux
low Severity medium
: ---
: ---
Assigned To: Robert Peterson
GFS Bugs
:
Depends On:
Blocks: 431945
  Show dependency treegraph
 
Reported: 2007-12-12 10:18 EST by Charlie Brady
Modified: 2010-01-11 22:16 EST (History)
0 users

See Also:
Fixed In Version: RHBA-2008-0804
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-07-25 15:28:20 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Proposed patch to fix the problem (2.43 KB, patch)
2008-02-07 19:09 EST, Robert Peterson
no flags Details | Diff

  None (edit)
Description Charlie Brady 2007-12-12 10:18:53 EST
I have a two node system with a GFS filesystem on external RAID array.

[root@aa-node-1 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0 none /selinux selinuxfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/md1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/cciss/c0d0p1 /diskarray gfs rw,noatime,nodiratime 0 0
[root@aa-node-1 ~]#

[root@aa-node-1 ~]# gfs_tool lockdump /diskarray gfs_tool: unknown mountpoint
/diskarray
[root@aa-node-1 ~]# 

strace tells me that it does get the gfs file list, and that it is finding the
mountpoint in /proc/mounts:

open("/proc/fs/gfs", O_RDWR|O_LARGEFILE) = 3
write(3, "list", 4)                     = 4
read(3, "4172492800 cciss/c0d0p1 6E0845C6"..., 1048575) = 45
close(3)                                = 0
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0xb7e14000
read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 448
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0xb7e13000
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 414
close(4)                                = 0
munmap(0xb7e13000, 4096)                = 0
stat64("/dev/cciss/c0d0p1", {st_mode=S_IFBLK|0600, st_rdev=makedev(104,
1), ...}) = 0
close(3)                                = 0
munmap(0xb7e14000, 4096)                = 0
write(2, "gfs_tool: ", 10gfs_tool: )              = 10 write(2, "unknown
mountpoint /diskarray\n", 30unknown mountpoint
/diskarray
) = 30
exit_group(1)                           = ?
Process 24095 detached
[root@aa-node-1 ~]#

The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
cookie for the filesystem, because "cciss/c0d0p1" does not match
"/dev/cciss/c0d0p1".

The error message is misleading. The mountpoint does exist, is known to the
system, but gfs_tool just can't find the cookie.

As a workaround, I can read the lockdump via:

[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo list >&5
[root@aa-node-1 ~]# cat <&5
4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0
cat: -: No such file or directory
[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo lockdump 4172492800 >&5
[root@aa-node-1 ~]# dd bs=4096k <&5 > /tmp/gfs.lockdump
dd: reading `standard input': No such file or directory
0+1 records in
0+1 records out
[root@aa-node-1 ~]#
Comment 1 Charlie Brady 2007-12-12 10:33:31 EST
> The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
> cookie for the filesystem, because "cciss/c0d0p1" does not match
> "/dev/cciss/c0d0p1".

No, I think the failing comparison is "cciss/c0d0p1" vs "c0d0p1", due to this
code in do_basename():

...
        if (stat(device, &st))
                goto punt;
        if (major(st.st_rdev) == major_number) {
                static char realname[16];
                snprintf(realname, 16, "dm-%u", minor(st.st_rdev));
                return realname;
        }

 punt:
        return basename(device);
}
...

Using basename() to strip a "/dev/" prefix appears naive.
Comment 2 Robert Peterson 2007-12-12 11:04:56 EST
I'll assume ownership of this one.  I fixed this for RHEL5.2.
This section of code is not straightforward.  It does lookups only
to turn around and do reverse lookups.
Comment 3 Charlie Brady 2007-12-12 11:26:28 EST
This code looks odd to me - is a mountpoint ever listed as the first item in the
output of "gfs_tool list", or is this just a backdoor way to allow "gfs_tool
lockdump cookie"?

...
        for (x = 0; *lines[x]; x++) {
                char s_id[256];
                sscanf(lines[x], "%s %s", cookie, s_id);
                if (dev) {
                        if (strcmp(s_id, dev) == 0)
                                return cookie;
                } else {
                        if (strcmp(cookie, mp) == 0)
                                return cookie;
                }
        }
...

Using the cookie as the index to 'gfs_tool lockdump' does work, so a simpler
workaround becomes:

gfs_tool lockdump $(gfs_tool list | awk '{ print $1 }')
Comment 5 RHEL Product and Program Management 2007-12-12 11:35:41 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Robert Peterson 2008-02-07 10:11:16 EST
The "proper" way to fix this is to do the same thing I did with GFS2.
That is, to change the gfs kernel module so that it kicks out the
device major and minor number as "major:minor" rather than s_id, and
change gfs_tool to expect it that way accordingly.  That way it will
find the proper device no matter where it is or what it's called.
That would require a gfs-kernel crosswrite bugzilla.
Comment 7 Robert Peterson 2008-02-07 19:09:03 EST
Created attachment 294292 [details]
Proposed patch to fix the problem

Here is the patch I propose to fix the problem.  I've tested the
RHEL5 equivalent patch, but not the RHEL4 version yet.	Note that
this fix relies upon the gfs-kernel patch attached to bug #431945.
Comment 8 Robert Peterson 2008-02-08 13:33:01 EST
The RHEL4 patch isn't quite right.  (This is why we do testing).
In RHEL4, GFS uses a special "diapered device" to insulate the users
from the device level.  That device isn't translating well with this
patch.  So I'll need to adjust it accordingly.
Comment 9 Robert Peterson 2008-02-08 14:38:28 EST
Just to clarify the situation: The problem noted in comment #8 was
caused because the change I made to the gfs-kernel module was returning
the device information for the "diapered device" not the real device.
I corrected that in a revised gfs-kernel patch for bug #421761.
Then I tested the two on a real RHEL4.6 prototype system and
everything worked properly.  So the patch attached to this bugzilla
record is correct.
Comment 10 Robert Peterson 2008-03-14 12:34:09 EDT
Tested on system "trin-09" and committed to our internal git tree
in branch RHEL4 for inclusion in 4.7.  Setting status to modified.
Comment 13 errata-xmlrpc 2008-07-25 15:28:20 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0804.html

Note You need to log in before you can comment on or make changes to this bug.