Bug 363901 - GFS2: sysfs "id" file should contain device id
Summary: GFS2: sysfs "id" file should contain device id
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
high
high
Target Milestone: ---
: ---
Assignee: Don Zickus
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks: 354201
TreeView+ depends on / blocked
 
Reported: 2007-11-02 14:44 UTC by Robert Peterson
Modified: 2008-05-21 15:00 UTC (History)
2 users (show)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:00:32 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Patch to fix the problem (498 bytes, patch)
2007-11-06 15:20 UTC, Robert Peterson
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Robert Peterson 2007-11-02 14:44:17 UTC
+++ This bug was initially created as a clone of Bug #354201 +++
This is for the gfs2 kernel portion of the fix.  In order for
gfs2_tool to correctly find the tuning files in sysfs, it needs to
find them by the device ID.  This bug record is for patching
the RHEL5.2 gfs2 kernel code so that it puts "major:minor" in the
/sys/fs/gfs2/<s_id>/id file rather than the s_id value.

As seen in the trace, the gfs2_tool improperly parses dev names if they have an
underscore as in seen with cciss (HP) disks

----------------------
[root@ustst-d0024 ~]# gfs2_tool settune /uss incore_log_blocks 4096
gfs2_tool: unknown mountpoint /uss, no fsname for c0d1p1
[root@ustst-d0024 ~]# gfs2_tool settune /uss/ incore_log_blocks 4096
gfs2_tool: unknown mountpoint /uss/, no device found
[root@ustst-d0024 ~]# strace gfs2_tool settune /uss incore_log_blocks 4096
execve("/sbin/gfs2_tool", ["gfs2_tool", "settune", "/uss", "incore_log_blocks",
"4096"], [/* 21 vars */]) = 0
brk(0)                                  = 0x9fc0000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=45044, ...}) = 0
mmap2(NULL, 45044, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f9b000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\357\226\0004\0\0\0"..., 512)
= 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1585788, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7f9a000
mmap2(0x959000, 1308068, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x959000
mmap2(0xa93000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x139) = 0xa93000
mmap2(0xa96000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
-1, 0) = 0xa96000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7f99000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f996c0, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0,
useable:1}) = 0
mprotect(0xa93000, 8192, PROT_READ)     = 0
mprotect(0x955000, 4096, PROT_READ)     = 0
munmap(0xb7f9b000, 45044)               = 0
open("/uss", O_RDONLY|O_LARGEFILE)      = 3
close(3)                                = 0
brk(0)                                  = 0x9fc0000
brk(0x9fe1000)                          = 0x9fe1000
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7fa5000
read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 872
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7fa4000
read(4, "Character devices:\n  1 mem\n  4 /"..., 4096) = 410
close(4)                                = 0
munmap(0xb7fa4000, 4096)                = 0
stat64("/dev/cciss/c0d1p1", {st_mode=S_IFBLK|0640, st_rdev=makedev(104, 17),
...}) = 0
close(3)                                = 0
munmap(0xb7fa5000, 4096)                = 0
open("/sys/fs/gfs2", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
getdents64(3, /* 3 entries */, 4096)    = 80
open("/sys/fs/gfs2/cciss_c0d1p1/id", O_RDONLY|O_LARGEFILE) = 4
read(4, "cciss/c0d1p1\n", 4096)         = 13
close(4)                                = 0
getdents64(3, /* 0 entries */, 4096)    = 0
close(3)                                = 0
write(2, "gfs2_tool: ", 11gfs2_tool: )             = 11
write(2, "unknown mountpoint /uss, no fsna"..., 46unknown mountpoint /uss, no
fsname for c0d1p1
) = 46
exit_group(1)                           = ?

-- Additional comment from rkenna on 2007-10-26 10:53 EST --
Reproducer:

Mount a gfs2 file system directly on a dev (not LVM2) with an underscore (mknod
an alias if needed)

Run the command 
[root@et-virt05 dev]#  ~/gfs2_tool settune /uss incore_log_blocks 4096
/root/gfs2_tool: unknown mountpoint /uss, no fsname for xxx_yyy



-- Additional comment from rkenna on 2007-10-26 10:54 EST --
Trace with proposed change for the library

[root@et-virt05 dev]# strace ~/gfs2_tool settune /uss incore_log_blocks 4096
execve("/root/gfs2_tool", ["/root/gfs2_tool", "settune", "/uss",
"incore_log_blocks", "4096"], [/* 24 vars */]) = 0
brk(0)                                  = 0x1036000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaaab000
uname({sys="Linux", node="et-virt05.lab.boston.redhat.com", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104403, ...}) = 0
mmap(NULL, 104403, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaaac000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\331A\250>\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1687464, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaac6000
mmap(0x3ea8400000, 3469464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x3ea8400000
mprotect(0x3ea8546000, 2097152, PROT_NONE) = 0
mmap(0x3ea8746000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x146000) = 0x3ea8746000
mmap(0x3ea874b000, 16536, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ea874b000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaac7000
arch_prctl(ARCH_SET_FS, 0x2aaaaaac7240) = 0
mprotect(0x3ea8746000, 16384, PROT_READ) = 0
mprotect(0x3ea8219000, 4096, PROT_READ) = 0
munmap(0x2aaaaaaac000, 104403)          = 0
brk(0)                                  = 0x1036000
brk(0x1057000)                          = 0x1057000
open("/proc/mounts", O_RDONLY)          = 3
lstat("/uss", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaaac000
read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 871
close(3)                                = 0
munmap(0x2aaaaaaac000, 4096)            = 0
lstat("/uss", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
open("/proc/mounts", O_RDONLY)          = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaaac000
read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 871
open("/proc/devices", O_RDONLY)         = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaaad000
read(4, "Character devices:\n  1 mem\n  4 /"..., 4096) = 454
close(4)                                = 0
munmap(0x2aaaaaaad000, 4096)            = 0
stat("/dev/xxx_yyy", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
close(3)                                = 0
munmap(0x2aaaaaaac000, 4096)            = 0
open("/sys/fs/gfs2", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 4 entries */, 4096)      = 120
open("/sys/fs/gfs2/sdb1/id", O_RDONLY)  = 4
read(4, "sdb1\n", 4096)                 = 5
close(4)                                = 0
open("/sys/fs/gfs2/RHMag-Phys:guest_roots/id", O_RDONLY) = 4
read(4, "dm-3\n", 4096)                 = 5
close(4)                                = 0
getdents(3, /* 0 entries */, 4096)      = 0
close(3)                                = 0
write(2, "/root/gfs2_tool: ", 17/root/gfs2_tool: )       = 17
write(2, "unknown mountpoint /uss, no fsna"..., 47unknown mountpoint /uss, no
fsname for xxx_yyy
) = 47
exit_group(1)                           = ?


-- Additional comment from rpeterso on 2007-10-26 11:34 EST --
It's my belief that it has nothing to do with the underscore.
I believe that the problem lies when the device name is in a subdirectory,
at least in the case of cciss.  (I haven't tried the reproducer yet).

When the gfs2 kernel code registers the file system in the sysfs, it
has to substitute slash (/) chars for underscore (_).  Thus, 
/dev/cciss/c0d0p0 is registered as /dev/cciss_c0d0p0.  The problem is
that the gfs2 utils doesn't do that substitution.  We'd likely have 
this problem with AOE as well, where the device is something like: 
/dev/etherd/e1.1p1 which gets registered as /dev/etherd_e1.1p1.
We should probably change the summary accordingly.

The fix should be simple and straightforward.

How soon do they want or need a fix (so I can prioritize)?


-- Additional comment from rkenna on 2007-10-26 12:05 EST --
re: Comment #3

-- Additional comment from rkenna on 2007-10-26 12:06 EST --
re: Comment #3 

I have a failure case that works with

0 brw-r----- 1 root disk 8, 17 Oct 26 10:38 /dev/sdb1

and fails with 

0 brw-r----- 1 root root 8, 17 Oct 26 10:42 /dev/xxx_yyy


-- Additional comment from rpeterso on 2007-10-29 16:05 EST --
I verified that we are dealing with two problems: (1) The AOL problem
where a sub-subdirectory like /dev/cciss/c0d0p1 doesn't work, and
(2) The problem reported from comment #5 where alternate devices
created with mknod don't work.  Again, the underscore isn't the cause.
I've got a fix that seems to work for both cases.  Unfortunately, it's
backed up behind bug #349601, which I can ship easily; I just wanted to
give people more of a chance to respond to my proposed output format.


-- Additional comment from rpeterso on 2007-10-30 10:45 EST --
Created an attachment (id=243451)
Preliminary patch

Now that I've committed bug #349601, I can move on with this one.
This is a patch that mostly resolves the problem.  There is still
one case where the problem is not solved, and I'll explain why:

The problem lies when gfs2_tool is trying to figure out where the
gfs2 kernel code will keep its sysfs interface for the mounted fs.
The gfs2 kernel code uses /sys/fs/gfs2/<SOME NAME>/* for these files.
In the case where the file system has a lock table, gfs2 uses the
lock table name.  Something like: /sys/fs/gfs2/bobs_roth\:test_gfs/*.
That's easy for gfs2_tool to figure out, by reading the gfs2 superblock
of the device.	If the lock table is NULL--meaning it's a stand-alone
file system, gfs2 uses VFS's "s_id" value.  In cases where the device
mounted is something like "/dev/sdb1", the s_id will be "sdb1" and
thus, gfs2 will use /sys/fs/gfs2/sdb1/*.  That's also not a problem for
gfs2_tool to figure out.  However, here's the catch:

If the user creates their own device equivalent, s_id won't match the
device name.  So if I try to simulate customer hardware I don't
actually have, it's not going to work.	For example, if I do:
   mkfs /dev/cciss
   mknod /dev/cciss/c0d0p1 b 8 17 
   mount -tgfs2 /dev/cciss/c0d0p0 /mnt/gfs2

This will actually mount the second partition on the second SCSI drive
under /mnt/gfs2.  However, the gfs2 kernel code will still create its
sysfs files under: /sys/fs/gfs2/sdb1/*, NOT something like the device
name.  So gfs2_tool won't be able to find the sysfs files.

The previous version checked the last basename of the device against
all the gfs2 sysfs files it could find.  However, in this case, the
"id" file will still be "sdb1", again, named after the VFS s_id.

AFAIK, there's no good way for gfs2_tool to determine the VFS s_id
of a mounted file system.  The only way I can think of to do this
properly is to change the gfs2 kernel code so that a statfs from
userland can return a unique identifier in the f_fsid field of the
struct statfs.	This seems to be what some other file systems do.
Right now, gfs2 always returns "0, 0" as the f_fsid.

Now normally customers aren't going to be creating their own phantom
devices that don't match what vfs thinks.  That's the kind of thing
we do in order to test customer scenarios.  So our options are:
(a) document around this restriction, (b) fix gfs2 kernel to pass
something in the statfs f_fsid and use it in gfs2_tool, or (c)
maybe someone can think of a magic trick to get vfs's s_id from user
space.	I didn't find any offhand.


-- Additional comment from rpeterso on 2007-10-31 10:02 EST --
Adding Steve Whitehouse to the cc list to get his input regarding
comment #7.


-- Additional comment from swhiteho on 2007-10-31 10:33 EST --
Please don't do this by matching device names. Ideally you ought to be matching
device numbers (not major/minor, just the dev_t directly). Use stat to get the
dev_t for the fs in question and then its a question of how to find the dev_t
for the /sys directory. Then you can iterate over the /sys directories until you
find the matching one. Ideally we'd have a device node in there which would be
set with the same device number as the device that the fs is mounted on, then we
can just compare the two numbers. That would be my preferred approach.

If that turns out to be impossible, then it looks like we'll need (at least to
attempt) to reconstruct the device path from the info thats in that directory
and then use stat to turn the path into a device number. I really don't like
that as the mounter of the fs might be in a different fs name space to the
person who is running the user tool so the paths might not be the same, but we
might have to do that as a last resort if the other method can't be made to work.



-- Additional comment from rpeterso on 2007-10-31 18:22 EST --
Created an attachment (id=244921)
Proposed gfs2 kernel patch

This patch puts the dev_t value into /sys/fs/gfs2/<fs>/id so that
userland can pick it up.  This enables userland to determine which
file system is being referenced.  Note that there's a fundamental
translation that needs to be done.  The kernel references dev_t as
((major << 20) | minor), whereas userland uses ((major << 8) | minor.
This translation can be done in the kernel or in userland.
Since gfs2's sysfs is the accepted interface into userland, I thought
it was best done in the kernel code that manages that.	Let me know if
you disagree.

This makes the userland patch much simpler (also tested and it works).

If you like this patch, I'll post it to cluster-devel so it can go
upstream, and I'll open a bugzilla to get it into rhel5.2.
I'll wait for feedback from Steve W.


-- Additional comment from rpeterso on 2007-10-31 18:29 EST --
Created an attachment (id=244931)
Replacement patch that relies on the kernel patch

This patch is similar to how the code worked before.  It gets the dev_t
from the stat command, then it tries all the /sys/fs/gfs2/*/id files
looking for a matching dev_t value.  It's tested and works, but it
relies on the kernel patch, so pending its acceptance.


-- Additional comment from rpeterso on 2007-11-02 09:49 EST --
Created an attachment (id=246731)
Proposed gfs2 kernel patch--try #2

This gfs2 kernel patch presents the device "major:minor" in the ID
file, in that order.


-- Additional comment from rpeterso on 2007-11-02 09:51 EST --
Created an attachment (id=246741)
Replacement gfs2_tool patch that relies on kernel patch

Comment 1 Robert Peterson 2007-11-02 14:47:07 UTC
Requesting ack flags to get this in the 5.2 kernel.


Comment 2 RHEL Program Management 2007-11-02 14:54:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Robert Peterson 2007-11-05 03:42:27 UTC
Dave Teigland had a good point about being careful when changing the abi.
Following his lead, I found out that if we switch the value of the "id"
file as suggested in the patch, older versions of the gfs2_tool that
expect it the other way will break.  In other words, the users would then
be required to upgrade their gfs2_tool.  That might be acceptable for
RHEL because we can control it, but it probably won't fly for upstream.
Therefore, I propose we instead export the device id to a new sysfs file
called device_id.  That way we won't break anything.  My big question is:
Are there any kabi requirements here that I don't know about?
Is it okay to create a new sysfs file for gfs2?  How will that be viewed by
the upstream community?


Comment 4 Robert Peterson 2007-11-06 15:17:40 UTC
We discussed the patch in Monday's gfs2 tech meeting, and we decided
that there shouldn't be an issue exporting the device "major:minor" in
the ID file.  Other file systems do this, so it should be okay for us.
Right now, gfs2 is still in tech preview and no one in RHEL or the
upstream community should be under the impression that it's ready for
enterprise use yet, so making this change to the kernel abi seemed
acceptable to everyone in the meeting.  Therefore, we decided to go
ahead with the patch as is and not try to create a new device_id file.
I've tested with that already.  I'll attach the patch to this bz and
post it to rhkernel-list for inclusion in RHEL5.2.

The kernel should really have a better solution for this that works for
all file systems.  Perhaps we should create some kind of interface that
works for all file systems and propose it on lkml.  Time permitting.


Comment 5 Robert Peterson 2007-11-06 15:20:07 UTC
Created attachment 249291 [details]
Patch to fix the problem

Here is the patch I'm going to post to rhkernel-list.

Comment 6 Robert Peterson 2007-11-06 18:45:55 UTC
Posted to rhkernel-list.  Turning it over to Don Zickus.

Comment 7 Don Zickus 2007-11-29 17:05:11 UTC
in 2.6.18-58.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 10 errata-xmlrpc 2008-05-21 15:00:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html



Note You need to log in before you can comment on or make changes to this bug.