Bug 669832 - using GFS2 reference LABEL=blah:blah only half works for resource definition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: resource-agents
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Fabio Massimo Di Nitto
QA Contact: Toure Dunnon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-14 22:32 UTC by joshua
Modified: 2016-04-26 15:26 UTC (History)

Fixed In Version: resource-agents-3.0.12-20.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 14:21:07 UTC
Target Upstream Version:


Attachments
screenshot of specifying GFS2 fs via LABEL= (31.07 KB, image/png)
2011-01-14 22:32 UTC, joshua
sanitized cluster.conf (2.23 KB, text/plain)
2011-02-03 16:44 UTC, joshua


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0744 0 normal SHIPPED_LIVE resource-agents bug fix and enhancement update 2011-05-18 18:09:07 UTC

Description joshua 2011-01-14 22:32:17 UTC
Description of problem:

When referencing a GFS2 filesystem with LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV as a global cluster resource (see screenshot), the service starts and runs successfully... however, when the first service check happens, it says:

rgmanager[625]: clusterfs:CDTG GFS2 Filesystem: LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV is not mounted on /data/cluster-storage/
Jan 14 17:17:44 cdtg-rtp-sun-1 rgmanager[6559]: status on clusterfs "CDTG GFS2 Filesystem" returned 7 (unspecified)


So... it can mount the unmounted GFS2 fs with a LABEL= reference, but then can't tell that that filesystem is still mounted a few seconds later when it checks the resource :-(
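
The symptom described above is consistent with a status check that looks for the literal LABEL= string in the mount table, where only a real device path appears. The following is a minimal sketch of a check that resolves the reference first. It is not the actual fs-lib.sh code: the helper names canon, resolve_device, strip_slash and is_mounted_on are hypothetical, and it assumes that findfs from util-linux can resolve the GFS2 label on the node.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from fs-lib.sh.

# canon PATH: follow symlinks for absolute paths, else echo the input back.
canon() {
    case "$1" in
        /*) readlink -f "$1" 2>/dev/null || printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

# resolve_device REF: turn LABEL=/UUID= references into a device path.
# Assumption: findfs is installed and knows the label or UUID.
resolve_device() {
    case "$1" in
        LABEL=*|UUID=*) found=$(findfs "$1") || return 1 ;;
        *) found=$1 ;;
    esac
    canon "$found"
}

# strip_slash PATH: drop one trailing slash, but keep "/" itself.
strip_slash() {
    case "$1" in
        /) printf '/\n' ;;
        *) printf '%s\n' "${1%/}" ;;
    esac
}

# is_mounted_on DEVICE MOUNTPOINT [MOUNTS_FILE]
# Succeeds if DEVICE is listed on MOUNTPOINT in the mount table.
is_mounted_on() {
    want_dev=$(canon "$1")
    want_mp=$(strip_slash "$2")
    mounts=${3:-/proc/mounts}
    while read -r dev mp _rest; do
        [ "$(canon "$dev")" = "$want_dev" ] &&
            [ "$(strip_slash "$mp")" = "$want_mp" ] && return 0
    done < "$mounts"
    return 1
}
```

A status check along these lines might then run: dev=$(resolve_device "LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV") && is_mounted_on "$dev" /data/cluster-storage/. Passing the unresolved LABEL= string straight to is_mounted_on fails, which matches the "is not mounted" message in the log.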

Version-Release number of selected component (if applicable):

rgmanager-3.0.12-10.el6.x86_64

Comment 1 joshua 2011-01-14 22:32:58 UTC
Created attachment 473606
screenshot of specifying GFS2 fs via LABEL=

Comment 3 joshua 2011-01-14 22:41:50 UTC
On a side note, using the UUID# or UUID=UUID# doesn't work at all.  The documentation doesn't reference UUID or LABEL usage... but the luci GUI does... is this just a bug in the GUI making me believe that either UUIDs or LABELs should work?

Comment 4 Fabio Massimo Di Nitto 2011-02-03 15:38:09 UTC
Please provide your cluster.conf and a tar of /var/log/cluster/, /dev/disk, and /var/log/messages (or, alternatively, an sosreport).

Comment 5 Fabio Massimo Di Nitto 2011-02-03 16:35:17 UTC
http://git.fedorahosted.org/git/?p=resource-agents.git;a=commitdiff;h=a23793faa3c7718177cbbda77363d5fd3765415d

This commit fixes the LABEL= monitoring check.

Comment 6 Fabio Massimo Di Nitto 2011-02-03 16:36:39 UTC
We will need to cross check the cluster.conf generated from Luci when using UUID= because manual testing shows that it works as expected.

<clusterfs device="UUID=d92d449a-6c04-c820-9f2a-5ad81ca774c5" fsid="30538" fstype="gfs2" mountpoint="/mnt" name="gfs2-test"/>

and verify that the reporter has the /dev/disk/by-uuid entry on all nodes.

Comment 7 joshua 2011-02-03 16:43:02 UTC
We are working on things that are too sensitive to give you logs for... however, here is a cleaned version of cluster.conf

Comment 8 joshua 2011-02-03 16:44:56 UTC
Created attachment 476819
sanitized cluster.conf

Comment 9 Fabio Massimo Di Nitto 2011-02-03 16:51:01 UTC
(In reply to comment #7)
> We are working on things that are too sensitive to give you logs for...

Then please consider signing up for an official support contract, which includes
an NDA and confidentiality of data exchanged between Red Hat and customers.

In future, keep in mind that without logs there might be no way to fix an
issue you report.

In the end, you are running software I write, as root... ;)

> however, here is a cleaned version of cluster.conf

I need to see the cluster.conf generated by luci for the UUID case and I need
to know what you have in /dev/disk/by-uuid from all nodes to make sure the
requested UUID from cluster.conf is visible everywhere.

The cluster.conf you attached points straight to the device.

Comment 10 Jeremy West 2011-02-03 17:06:10 UTC
Hi Joshua,

For support assistance regarding bugs or potential bugs, please make sure you're working through Red Hat support.  For information on how to access support, please review the policies and instructions here: www.redhat.com/support

If you're not currently a customer, we would encourage you to become one so that we can ensure that issues such as reported in this bug are resolved and prioritized appropriately.  

For now, I'm closing this bug ... since bugzilla is not a valid support tool.  We can certainly reopen this and are happy to help resolve the problem via the appropriate support channels.

Thanks
Jeremy West
Red Hat Support

Comment 11 joshua 2011-02-03 18:46:27 UTC
Jeremy, I'm not looking for support assistance.  Me not working with a TAM doesn't mean this isn't a bug.  I'm not looking for support, I'm looking to make Red Hat aware that there is a bug that needs to be addressed.

This *is* a bug, as Fabio clearly indicates with his committing a patch.

Comment 12 joshua 2011-02-03 18:47:21 UTC
Fabio, I don't have a cluster.conf file with UUID=# in it, because that didn't work, as my bugzilla entry stated.  LABEL=blah:blah did work, but wasn't detected as mounted when the status check happens, again, as this bug states. I do have the UUID /dev/ device in /dev/disk/by-uuid/ on all nodes.

This isn't hard to reproduce... it seems like you did in fact find the LABEL= problem and commit a fix... so it may be that that will be the resolution when it makes it to RHEL.

Thank you!

Comment 13 Fabio Massimo Di Nitto 2011-02-03 18:57:12 UTC
(In reply to comment #12)
> Fabio, I don't have a cluster.conf file with UUID=# in it, because that didn't
> work, as my bugzilla entry stated.

Yes I understand that, but I cannot reproduce the UUID issue here. So I need to see if it's your version of luci generating the wrong config or at the time of testing the by-uuid entries were missing.

There is a window in which not all nodes have them, because of the way udev works.

>  LABEL=blah:blah did work, but wasn't
> detected as mounted when the status check happens, again, as this bug states.

Yes, and I was able to reproduce that after some fiddling around, which is why there is a fix now.

> I
> do have the UUID /dev/ device in /dev/disk/by-uuid/ on all nodes.

See above.. one simple question based on how udev works:

mkfs.gfs2 on nodeX -> you get the by-uuid entry on nodeX but not on the other nodes (udev is not cluster-aware and does not understand shared storage)

1) did you test UUID= at this stage?
2) did you reboot the nodes in between mkfs and testing? (this would have caused the device to be re-scanned and populate by-uuid on all nodes)
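
As a rough illustration of the check behind these two questions, the sketch below tests whether udev has published a by-uuid entry on the local node and, if not, asks udev to re-process block devices instead of rebooting. The function names has_by_uuid and ensure_by_uuid are hypothetical, and it is an assumption that a udevadm trigger on the block subsystem is enough to pick up a filesystem created from another node.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# has_by_uuid UUID [DIR]
# Succeeds if an entry for UUID exists (DIR defaults to /dev/disk/by-uuid).
has_by_uuid() {
    [ -e "${2:-/dev/disk/by-uuid}/$1" ]
}

# ensure_by_uuid UUID
# If the entry is missing, ask udev to re-run its rules for block devices
# and wait for the event queue to drain, then check again. Needs root.
# Assumption: the re-scan makes udev notice the new superblock.
ensure_by_uuid() {
    has_by_uuid "$1" && return 0
    udevadm trigger --subsystem-match=block
    udevadm settle
    has_by_uuid "$1"
}
```

Running has_by_uuid with the UUID from cluster.conf on every node before starting the service would show whether the test happened inside that window.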


> 
> This isn't hard to reproduce... it seems like you did in fact find the LABEL=
> problem and commit a fix... so it may be that that will be the resolution when
> it makes it to RHEL.

It took me some time, as it doesn't appear in all LABEL= conditions. Having that info at the beginning could have saved me time.

Comment 14 Perry Myers 2011-02-03 19:04:13 UTC
(In reply to comment #11)
> Jeremy, I'm not looking for support assistance.  Me not working with a TAM
> doesn't mean this isn't a bug.  I'm not looking for support, I'm looking to
> make Red Hat aware that there is a bug that needs to be addressed.

We have no problems fixing bugs, but for every bug that is filed from an external user, we do need to associate it with a support ticket and a valid subscription.  Please work with Jeremy and the Red Hat support team to provide your subscription information to them.

Future bugs filed without properly using the support process will be closed.

Comment 15 joshua 2011-02-03 19:13:04 UTC
Fabio:

Ok... now I understand what you are needing.  I've made the gfs2 filesystem many days and reboots (of all nodes) ago.

Here is more on the UUID bug:

I put UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C into the luci GUI, and cluster.conf shows this:

<clusterfs device="UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C" fsid="40050" fstype="gfs2" mountpoint="/data/cluster-storage/" name="CDTG GFS2 Filesystem"/>

and the logs show this:

rgmanager[10218]: start_filesystem: Could not match UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C with a real device

... even while *all* nodes have this:

ls -l /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c 
lrwxrwxrwx. 1 root root 10 Jan 31 14:41 /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c -> ../../dm-3

Does that show you what you wanted to know?

Comment 16 joshua 2011-02-03 19:20:51 UTC
Thanks Perry, though I think you are confusing Bugzilla with "support process".  I'm not asking or even wanting support, or someone to drive this fix into RHEL X.Y release, or an SLA, or a work-around, or a TAM, or a product manager... I'm looking to get the upstream code fixed.  Think "Bug", not "Support"... or "Bugzilla", not "Support Process".

I'm not looking for support... really, I'm not.  I'm simply trying to improve Linux and RHEL, as is the open source way.  You're welcome!

Comment 17 Fabio Massimo Di Nitto 2011-02-03 19:23:22 UTC
(In reply to comment #15)
> Fabio:
> 
> Ok... now I understand what you are needing.  I've made the gfs2 filesystem
> many days and reboots (of all nodes) ago.
> 
> Here is more on the UUID bug:
> 
> I put UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C into the luci GUI, and
> cluster.conf shows this:
> 
> <clusterfs device="UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C" fsid="40050"
> fstype="gfs2" mountpoint="/data/cluster-storage/" name="CDTG GFS2 Filesystem"/>
> 
> and the logs show this:
> 
> rgmanager[10218]: start_filesystem: Could not match
> UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C with a real device
> 
> ... even while *all* nodes have this:
> 
> ls -l /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c 
> lrwxrwxrwx. 1 root root 10 Jan 31 14:41
> /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c -> ../../dm-3
> 
> Does that show you what you wanted to know?

Yes thanks.

The problem is very simple. A bug in mkfs.gfs2 used to display the UUID in all capital letters (sorry, I don't have the bz reference handy for that), and I assume you copy-pasted that into UUID= in luci. That issue has already been addressed, and mkfs.gfs2 now displays the UUID correctly.

In reality that entry is case-sensitive. All UUIDs have to be lower case.

Use the UUID as you see it in by-uuid and it will work (but you will need to patch fs-lib.sh in order for it to keep working after the initial mount, same deal as LABEL=).
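
For anyone who only has the upper-case form to hand, a small sketch of the conversion follows. The helper name normalize_uuid is hypothetical; only the UUID value is lower-cased, and the UUID= prefix stays as it is.

```shell
#!/bin/sh
# normalize_uuid UUID: hypothetical helper that lower-cases a UUID string
# so that it matches the entry name under /dev/disk/by-uuid.
normalize_uuid() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

uuid=$(normalize_uuid "28F14E5C-5E7C-BAE6-F863-F99A3A22130C")
# Print the attribute in the form expected by the clusterfs resource.
echo "device=\"UUID=$uuid\""
# prints: device="UUID=28f14e5c-5e7c-bae6-f863-f99a3a22130c"
```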

Comment 18 joshua 2011-02-03 19:35:14 UTC
Ok, that makes sense... though I was getting the UUID from the gfs2_tool command:

$ sudo gfs2_tool sb /dev/mapper/CDTG_GFS2_VG-CDTG_GFS2_LV all | grep uuid
  uuid = 28F14E5C-5E7C-BAE6-F863-F99A3A22130C

Should I/we open a bug against gfs2_tool as it is displaying the uuid in all caps as well?


I'll wait for the patch to make it into fs-lib.sh before using LABEL= or UUID= as well.  Thank you for looking at this!
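
Until an updated gfs2-utils is installed, the gfs2_tool output quoted in this comment could be filtered into the lower-case form directly. This is only a sketch: sb_uuid is a hypothetical filter name, and it assumes the superblock dump keeps the "uuid = VALUE" line format shown in this comment.

```shell
#!/bin/sh
# sb_uuid: hypothetical filter. Reads a superblock dump on stdin and
# prints the value of the "uuid = ..." line in lower case.
sb_uuid() {
    sed -n 's/^[[:space:]]*uuid = //p' | tr '[:upper:]' '[:lower:]'
}

# Usage (device path taken from this comment; needs root):
#   uuid=$(gfs2_tool sb /dev/mapper/CDTG_GFS2_VG-CDTG_GFS2_LV all | sb_uuid)
#   ls -l "/dev/disk/by-uuid/$uuid"
```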

Comment 19 Fabio Massimo Di Nitto 2011-02-03 19:46:49 UTC
(In reply to comment #18)
> Ok, that makes sense... though I was getting the UUID from the gfs2_tool
> command:
> 
> $ sudo gfs2_tool sb /dev/mapper/CDTG_GFS2_VG-CDTG_GFS2_LV all | grep uuid
>   uuid = 28F14E5C-5E7C-BAE6-F863-F99A3A22130C

same code path.. it's in a shared library.

> 
> Should I/we open a bug against gfs2_tool as it is displaying the uuid in all
> caps as well?

No, it's the same error, already fixed in newer versions of gfs2-utils. You basically need to upgrade your system.

> 
> 
> I'll wait for the patch to make it into fs-lib.sh before using LABEL= or UUID=
> as well.  Thank you for looking at this!

No problem.

Comment 20 Fabio Massimo Di Nitto 2011-02-04 08:05:03 UTC
Moving back to POST, patch is available upstream.

Comment 23 Nate Straz 2011-04-08 14:16:42 UTC
Verified that LABEL= and UUID= are working as expected.  gfs2_tool however is still printing the UUID in all caps, but that's another bug.

Comment 24 errata-xmlrpc 2011-05-19 14:21:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0744.html

