Bug 421761 - 'gfs_tool lockdump' wrongly says 'unknown mountpoint' re HP cciss RAID array
Summary: 'gfs_tool lockdump' wrongly says 'unknown mountpoint' re HP cciss RAID array
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs   
(Show other bugs)
Version: 4
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Robert Peterson
QA Contact: GFS Bugs
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 431945
TreeView+ depends on / blocked
 
Reported: 2007-12-12 15:18 UTC by Charlie Brady
Modified: 2010-01-12 03:16 UTC (History)
0 users

Fixed In Version: RHBA-2008-0804
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-07-25 19:28:20 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Proposed patch to fix the problem (2.43 KB, patch)
2008-02-08 00:09 UTC, Robert Peterson
no flags Details | Diff


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0804 normal SHIPPED_LIVE GFS bug fix update 2008-07-25 19:28:17 UTC

Description Charlie Brady 2007-12-12 15:18:53 UTC
I have a two node system with a GFS filesystem on external RAID array.

[root@aa-node-1 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0 none /selinux selinuxfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/md1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/cciss/c0d0p1 /diskarray gfs rw,noatime,nodiratime 0 0
[root@aa-node-1 ~]#

[root@aa-node-1 ~]# gfs_tool lockdump /diskarray gfs_tool: unknown mountpoint
/diskarray
[root@aa-node-1 ~]# 

strace tells me that it does get the gfs file list, and that it is finding the
mountpoint in /proc/mounts:

open("/proc/fs/gfs", O_RDWR|O_LARGEFILE) = 3
write(3, "list", 4)                     = 4
read(3, "4172492800 cciss/c0d0p1 6E0845C6"..., 1048575) = 45
close(3)                                = 0
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0xb7e14000
read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 448
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0xb7e13000
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 414
close(4)                                = 0
munmap(0xb7e13000, 4096)                = 0
stat64("/dev/cciss/c0d0p1", {st_mode=S_IFBLK|0600, st_rdev=makedev(104,
1), ...}) = 0
close(3)                                = 0
munmap(0xb7e14000, 4096)                = 0
write(2, "gfs_tool: ", 10gfs_tool: )              = 10 write(2, "unknown
mountpoint /diskarray\n", 30unknown mountpoint
/diskarray
) = 30
exit_group(1)                           = ?
Process 24095 detached
[root@aa-node-1 ~]#

The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
cookie for the filesystem, because "cciss/c0d0p1" does not match
"/dev/cciss/c0d0p1".

The error message is misleading. The mountpoint does exist, is known to the
system, but gfs_tool just can't find the cookie.

As a workaround, I can read the lockdump via:

[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo list >&5
[root@aa-node-1 ~]# cat <&5
4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0
cat: -: No such file or directory
[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo lockdump 4172492800 >&5
[root@aa-node-1 ~]# dd bs=4096k <&5 > /tmp/gfs.lockdump
dd: reading `standard input': No such file or directory
0+1 records in
0+1 records out
[root@aa-node-1 ~]#

Comment 1 Charlie Brady 2007-12-12 15:33:31 UTC
> The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
> cookie for the filesystem, because "cciss/c0d0p1" does not match
> "/dev/cciss/c0d0p1".

No, I think the failing comparison is "cciss/c0d0p1" vs "c0d0p1", due to this
code in do_basename():

...
        if (stat(device, &st))
                goto punt;
        if (major(st.st_rdev) == major_number) {
                static char realname[16];
                snprintf(realname, 16, "dm-%u", minor(st.st_rdev));
                return realname;
        }

 punt:
        return basename(device);
}
...

Using basename() to strip a "/dev/" prefix appears naive.


Comment 2 Robert Peterson 2007-12-12 16:04:56 UTC
I'll assume ownership of this one.  I fixed this for RHEL5.2.
This section of code is not straightforward.  It does lookups only
to turn around and do reverse lookups.


Comment 3 Charlie Brady 2007-12-12 16:26:28 UTC
This code looks odd to me - is a mountpoint ever listed as the first item in the
output of "gfs_tool list", or is this just a backdoor way to allow "gfs_tool
lockdump cookie"?

...
        for (x = 0; *lines[x]; x++) {
                char s_id[256];
                sscanf(lines[x], "%s %s", cookie, s_id);
                if (dev) {
                        if (strcmp(s_id, dev) == 0)
                                return cookie;
                } else {
                        if (strcmp(cookie, mp) == 0)
                                return cookie;
                }
        }
...

Using the cookie as the index to 'gfs_tool lockdump' does work, so a simpler
workaround becomes:

gfs_tool lockdump $(gfs_tool list | awk '{ print $1 }')


Comment 5 RHEL Product and Program Management 2007-12-12 16:35:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Robert Peterson 2008-02-07 15:11:16 UTC
The "proper" way to fix this is to do the same thing I did with GFS2.
That is, to change the gfs kernel module so that it kicks out the
device major and minor number as "major:minor" rather than s_id, and
change gfs_tool to expect it that way accordingly.  That way it will
find the proper device no matter where it is or what it's called.
That would require a gfs-kernel crosswrite bugzilla.


Comment 7 Robert Peterson 2008-02-08 00:09:03 UTC
Created attachment 294292 [details]
Proposed patch to fix the problem

Here is the patch I propose to fix the problem.  I've tested the
RHEL5 equivalent patch, but not the RHEL4 version yet.	Note that
this fix relies upon the gfs-kernel patch attached to bug #431945.

Comment 8 Robert Peterson 2008-02-08 18:33:01 UTC
The RHEL4 patch isn't quite right.  (This is why we do testing).
In RHEL4, GFS uses a special "diapered device" to insulate the users
from the device level.  That device isn't translating well with this
patch.  So I'll need to adjust it accordingly.


Comment 9 Robert Peterson 2008-02-08 19:38:28 UTC
Just to clarify the situation: The problem noted in comment #8 was
caused because the change I made to the gfs-kernel module was returning
the device information for the "diapered device" not the real device.
I corrected that in a revised gfs-kernel patch for bug #421761.
Then I tested the two on a real RHEL4.6 prototype system and
everything worked properly.  So the patch attached to this bugzilla
record is correct.


Comment 10 Robert Peterson 2008-03-14 16:34:09 UTC
Tested on system "trin-09" and committed to our internal git tree
in branch RHEL4 for inclusion in 4.7.  Setting status to modified.


Comment 13 errata-xmlrpc 2008-07-25 19:28:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0804.html



Note You need to log in before you can comment on or make changes to this bug.