Bug 431945 - GFS: gfs-kernel should use device major:minor
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: GFS-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Robert Peterson
QA Contact: Cluster QE
Duplicates: 475881
Depends On: 421761
Reported: 2008-02-07 17:49 EST by Robert Peterson
Modified: 2010-01-11 22:19 EST
CC List: 3 users

Fixed In Version: RHBA-2008-0802
Doc Type: Bug Fix
Last Closed: 2008-07-25 15:27:29 EDT


Attachments
Proposed patch to fix the problem (1.32 KB, patch)
2008-02-07 18:52 EST, Robert Peterson
Proposed gfs2 kernel patch--try #2 (1.57 KB, patch)
2008-02-08 14:26 EST, Robert Peterson

Description Robert Peterson 2008-02-07 17:49:31 EST
+++ This bug was initially created as a clone of Bug #421761 +++
Bug #421761 was cloned so I can do the gfs-kernel work.
This is similar to the GFS2 bug pair 354201 (userland) and 363901 (kernel)
for RHEL5. We need to fix this bug for GFS in both RHEL4 and RHEL5.

I have a two node system with a GFS filesystem on external RAID array.

[root@aa-node-1 ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
none /dev tmpfs rw 0 0
/dev/root / ext3 rw 0 0
none /dev tmpfs rw 0 0
none /selinux selinuxfs rw 0 0
/proc /proc proc rw,nodiratime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/md1 /boot ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/cciss/c0d0p1 /diskarray gfs rw,noatime,nodiratime 0 0
[root@aa-node-1 ~]#

[root@aa-node-1 ~]# gfs_tool lockdump /diskarray
gfs_tool: unknown mountpoint /diskarray
[root@aa-node-1 ~]# 

strace shows that gfs_tool does read the GFS filesystem list from /proc/fs/gfs,
and that it finds the mountpoint in /proc/mounts:

open("/proc/fs/gfs", O_RDWR|O_LARGEFILE) = 3
write(3, "list", 4)                     = 4
read(3, "4172492800 cciss/c0d0p1 6E0845C6"..., 1048575) = 45
close(3)                                = 0
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e14000
read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 448
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e13000
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 414
close(4)                                = 0
munmap(0xb7e13000, 4096)                = 0
stat64("/dev/cciss/c0d0p1", {st_mode=S_IFBLK|0600, st_rdev=makedev(104, 1), ...}) = 0
close(3)                                = 0
munmap(0xb7e14000, 4096)                = 0
write(2, "gfs_tool: ", 10)              = 10
write(2, "unknown mountpoint /diskarray\n", 30) = 30
exit_group(1)                           = ?
Process 24095 detached
[root@aa-node-1 ~]#

The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
cookie for the filesystem, because "cciss/c0d0p1" does not match
"/dev/cciss/c0d0p1".
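
A minimal standalone C sketch of that mismatch (a hypothetical helper, not the real mp2cookie() code): it splits a list line from /proc/fs/gfs into cookie and s_id with sscanf(), the same field split gfs_tool uses, and compares the s_id with a device name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the s_id field of a /proc/fs/gfs
 * "list" line (cookie, s_id, table name) equals dev, else 0. */
static int list_line_matches(const char *line, const char *dev)
{
    char cookie[256], s_id[256];

    assert(line != NULL && dev != NULL);
    /* Cookie first, then the kernel's s_id. */
    if (sscanf(line, "%255s %255s", cookie, s_id) != 2)
        return 0;
    return strcmp(s_id, dev) == 0;
}
```

Given the line "4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0", comparing against "/dev/cciss/c0d0p1" (the /proc/mounts form) returns 0, while "cciss/c0d0p1" returns 1, because the kernel's s_id has no "/dev/" prefix.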

The error message is misleading: the mountpoint does exist and is known to the
system; gfs_tool just can't find its cookie.

As a workaround, I can read the lockdump via:

[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo list >&5
[root@aa-node-1 ~]# cat <&5
4172492800 cciss/c0d0p1 6E0845C6A41911:FS1.0
cat: -: No such file or directory
[root@aa-node-1 ~]# exec 5<>/proc/fs/gfs
[root@aa-node-1 ~]# echo lockdump 4172492800 >&5
[root@aa-node-1 ~]# dd bs=4096k <&5 > /tmp/gfs.lockdump
dd: reading `standard input': No such file or directory
0+1 records in
0+1 records out
[root@aa-node-1 ~]#
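
The exchange that the exec 5<> workaround performs can be sketched in C: open the command file read-write, write a command such as "list", then read the reply from the same descriptor, matching the open/write/read sequence in the strace output. The function name is hypothetical, and the sketch assumes the RHEL4 GFS proc interface replies on the descriptor that received the command, as the strace suggests.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send cmd to a command file such as /proc/fs/gfs
 * and read the reply from the same descriptor into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error (errno is set). */
static ssize_t gfs_proc_command(const char *path, const char *cmd,
                                char *buf, size_t bufsize)
{
    assert(path != NULL && cmd != NULL && buf != NULL && bufsize > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    size_t len = strlen(cmd);
    ssize_t w = write(fd, cmd, len);
    if (w != (ssize_t)len) {
        /* Report a short write as EIO. */
        int saved = (w < 0) ? errno : EIO;
        close(fd);
        errno = saved;
        return -1;
    }

    /* A single read, like gfs_tool's read(3, ..., 1048575) above. */
    ssize_t n = read(fd, buf, bufsize - 1);
    int saved = errno;
    close(fd);
    if (n < 0) {
        errno = saved;
        return -1;
    }
    buf[n] = '\0';
    return n;
}
```

On a node with GFS mounted, gfs_proc_command("/proc/fs/gfs", "list", buf, sizeof buf) would be expected to return the same cookie/s_id/table line the shell workaround printed, assuming the interface behaves as the strace shows.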

-- Additional comment from charlieb-redhat-bugzilla@e-smith.com on 2007-12-12 10:33 EST --
> The problem lies in mp2cookie() in gfs_tool/util.c - it's failing to find a
> cookie for the filesystem, because "cciss/c0d0p1" does not match
> "/dev/cciss/c0d0p1".

No, I think the failing comparison is "cciss/c0d0p1" vs "c0d0p1", due to this
code in do_basename():

...
        if (stat(device, &st))
                goto punt;
        if (major(st.st_rdev) == major_number) {
                static char realname[16];
                snprintf(realname, 16, "dm-%u", minor(st.st_rdev));
                return realname;
        }

 punt:
        return basename(device);
}
...

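Using basename() to strip a "/dev/" prefix appears naive: basename("/dev/cciss/c0d0p1")
returns "c0d0p1", dropping the "cciss/" directory that is part of the kernel's s_id.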
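
A small C sketch of why basename() cannot work here: it keeps only the last path component, so for devices in a /dev subdirectory the result can never equal the kernel's s_id. Both helpers are hypothetical; they use the POSIX basename() from <libgen.h> on a writable copy, since that version may modify its argument.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy basename(device) into out.
 * POSIX basename() may modify its argument, so work on a copy. */
static void device_basename(const char *device, char *out, size_t outsize)
{
    char tmp[256];

    assert(device != NULL && out != NULL && outsize > 0);
    snprintf(tmp, sizeof tmp, "%s", device);
    snprintf(out, outsize, "%s", basename(tmp));
}

/* Hypothetical helper: 1 if basename(device) equals the kernel's s_id. */
static int basename_matches(const char *device, const char *s_id)
{
    char name[256];

    device_basename(device, name, sizeof name);
    return strcmp(name, s_id) == 0;
}
```

This only happens to work for devices directly under /dev (for example "/dev/sdb1" against "sdb1"); "/dev/cciss/c0d0p1" reduces to "c0d0p1" and never matches "cciss/c0d0p1".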


-- Additional comment from rpeterso@redhat.com on 2007-12-12 11:04 EST --
I'll assume ownership of this one.  I fixed this for RHEL5.2.
This section of code is not straightforward.  It does lookups only
to turn around and do reverse lookups.


-- Additional comment from charlieb-redhat-bugzilla@e-smith.com on 2007-12-12 11:26 EST --
This code looks odd to me - is a mountpoint ever listed as the first item in the
output of "gfs_tool list", or is this just a backdoor way to allow "gfs_tool
lockdump cookie"?

...
        for (x = 0; *lines[x]; x++) {
                char s_id[256];
                sscanf(lines[x], "%s %s", cookie, s_id);
                if (dev) {
                        if (strcmp(s_id, dev) == 0)
                                return cookie;
                } else {
                        if (strcmp(cookie, mp) == 0)
                                return cookie;
                }
        }
...

Using the cookie as the index to 'gfs_tool lockdump' does work, so a simpler
workaround becomes:

gfs_tool lockdump $(gfs_tool list | awk '{ print $1 }')
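
The quoted loop appears to explain why that works: when no device name is resolved from the argument, the argument is compared with the first field of each list line, which is the cookie. Below is a self-contained C sketch of that two-way lookup; the function name and signature are hypothetical and simplified (a NULL-terminated array instead of an empty-string terminator), not the actual gfs_tool code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified version of the quoted loop. lines is a
 * NULL-terminated array of "list" lines. If dev is non-NULL, match it
 * against the s_id field; otherwise match arg against the cookie field.
 * On success the cookie is copied into cookie_out and 1 is returned. */
static int find_cookie(const char *const *lines, const char *dev,
                       const char *arg, char *cookie_out, size_t outsize)
{
    assert(lines != NULL && (dev != NULL || arg != NULL));
    assert(cookie_out != NULL && outsize > 0);

    for (size_t x = 0; lines[x] != NULL; x++) {
        char cookie[256], s_id[256];

        if (sscanf(lines[x], "%255s %255s", cookie, s_id) != 2)
            continue;
        const char *have = dev ? s_id : cookie;
        const char *want = dev ? dev : arg;
        if (strcmp(have, want) == 0) {
            snprintf(cookie_out, outsize, "%s", cookie);
            return 1;
        }
    }
    return 0;
}
```

With the list line from this report, passing "4172492800" as arg (and no dev) finds the cookie, while passing "/diskarray" does not, which matches the behaviour described above.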


-- Additional comment from rkenna@redhat.com on 2007-12-12 11:32 EST --
Marking for 4.7 consideration

-- Additional comment from pm-rhel@redhat.com on 2007-12-12 11:35 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

-- Additional comment from rpeterso@redhat.com on 2008-02-07 10:11 EST --
The "proper" way to fix this is to do what I did for GFS2: change the
gfs kernel module so that it reports the device major and minor numbers
as "major:minor" rather than s_id, and change gfs_tool to expect that
format. That way gfs_tool will find the proper device no matter where
it is or what it's called.
That would require a gfs-kernel crosswrite bugzilla.
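
A minimal userland sketch of the identifier this proposal relies on: stat() the device path taken from /proc/mounts and format its st_rdev as "major:minor". Because the numbers come from the device node itself, /dev/cciss/c0d0p1 yields 104:1 (as in the stat64() call in the strace) regardless of the name used for it. Both helpers are hypothetical, and the sketch assumes glibc's major()/minor() macros from <sys/sysmacros.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: write "major:minor" for the device node at path
 * into out. Returns 0 on success, -1 on error or if not a device node. */
static int device_id_string(const char *path, char *out, size_t outsize)
{
    struct stat st;

    assert(path != NULL && out != NULL && outsize > 0);
    if (stat(path, &st) != 0)
        return -1;
    /* Only block and character device nodes carry a meaningful st_rdev. */
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return -1;
    snprintf(out, outsize, "%u:%u", major(st.st_rdev), minor(st.st_rdev));
    return 0;
}

/* Hypothetical helper: 1 if the device at path has the id the kernel
 * reported (e.g. "104:1"), 0 if not, -1 if path cannot be resolved. */
static int device_matches(const char *path, const char *reported)
{
    char id[32];

    if (device_id_string(path, id, sizeof id) != 0)
        return -1;
    return strcmp(id, reported) == 0;
}
```

The comparison no longer depends on how the device is spelled in /proc/mounts, only on the node it resolves to.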
Comment 1 Robert Peterson 2008-02-07 18:52:16 EST
Created attachment 294291 [details]
Proposed patch to fix the problem

This patch allows gfs-kernel to return the device ID to userland in the
form "major:minor", so that gfs_tool can identify the device regardless
of its name or path.
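
On the gfs_tool side, an alternative to comparing formatted strings is to parse the reported "major:minor" back into a dev_t with makedev() and compare it with st_rdev. This is a hypothetical sketch under the same glibc <sys/sysmacros.h> assumption, not the code from the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: parse a "major:minor" string into a dev_t.
 * Returns 0 on success, -1 if the string is not in that form. */
static int parse_device_id(const char *s, dev_t *out)
{
    unsigned int maj, min;
    char extra;

    assert(s != NULL && out != NULL);
    /* Exactly two numbers separated by ':' with nothing after them. */
    if (sscanf(s, "%u:%u%c", &maj, &min, &extra) != 2)
        return -1;
    *out = makedev(maj, min);
    return 0;
}

/* Returns 1 if the device node at path has the given id, 0 if it does
 * not (or is not a device node), and -1 on a parse or stat error. */
static int path_has_device_id(const char *path, const char *id)
{
    struct stat st;
    dev_t want;

    if (parse_device_id(id, &want) != 0 || stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return 0;
    return st.st_rdev == want;
}
```

Comparing parsed numbers means that, for instance, a reported "104:01" would still match the node 104:1, and an old-style s_id such as "cciss/c0d0p1" is rejected as a parse error rather than silently mismatching.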
Comment 2 Robert Peterson 2008-02-07 18:55:17 EST
I should mention that this patch is untested in its RHEL4 incarnation
because I don't currently have a RHEL4 system for testing.  I will test
it on RHEL4.x before releasing the fix.
Comment 3 RHEL Product and Program Management 2008-02-07 18:57:37 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Robert Peterson 2008-02-08 14:26:02 EST
Created attachment 294401 [details]
Proposed gfs2 kernel patch--try #2

This version returns the major/minor numbers for the correct device,
not the "diaper" device.  This was tested on a real RHEL4.x machine.
Comment 5 Robert Peterson 2008-03-14 12:35:23 EDT
Fix was tested on system "trin-09" and committed to our internal
Git tree in branch RHEL4 for inclusion in 4.7. Setting status to
MODIFIED.
Comment 8 errata-xmlrpc 2008-07-25 15:27:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0802.html
Comment 9 Abhijith Das 2009-01-21 16:53:00 EST
*** Bug 475881 has been marked as a duplicate of this bug. ***
