Description of problem:

When referencing a GFS2 filesystem with LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV as a global cluster resource (see screenshot), the service starts and runs successfully. However, when the first service status check happens, it logs:

rgmanager[625]: clusterfs:CDTG GFS2 Filesystem: LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV is not mounted on /data/cluster-storage/
Jan 14 17:17:44 cdtg-rtp-sun-1 rgmanager[6559]: status on clusterfs "CDTG GFS2 Filesystem" returned 7 (unspecified)

So it can mount the unmounted GFS2 filesystem with a LABEL= reference, but then can't tell that the same filesystem is still mounted a few seconds later when it checks the resource. :-(

Version-Release number of selected component (if applicable):
rgmanager-3.0.12-10.el6.x86_64
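To illustrate the failure mode: /proc/mounts always lists the resolved device node (here /dev/dm-3), never the LABEL= string from cluster.conf, so a status check that compares the raw spec against the mount table can never match. A minimal sketch of that mismatch (the `is_mounted` helper and the fake mount table are mine, for illustration only, not the actual fs-lib.sh code):

```shell
# Fake mount table standing in for /proc/mounts; the kernel records the
# resolved device path there, never the LABEL=/UUID= spec from cluster.conf.
mounts=$(mktemp)
printf '/dev/dm-3 /data/cluster-storage gfs2 rw 0 0\n' > "$mounts"

# Succeed if the given device spec is mounted on the given mountpoint.
is_mounted() {
    awk -v dev="$1" -v mp="$2" \
        '$1 == dev && $2 == mp { found = 1 } END { exit !found }' "$3"
}

is_mounted /dev/dm-3 /data/cluster-storage "$mounts" \
    && echo "device path: detected as mounted"
is_mounted "LABEL=CDTG-HA-Cluster:CDTG_GFS2_LV" /data/cluster-storage "$mounts" \
    || echo "LABEL= spec: reported as not mounted (the bug)"
```

A naive comparison like this succeeds for a plain device path but always fails for a LABEL= spec, which matches the "returned 7" status failure above.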
Created attachment 473606 [details] screenshot of specifying GFS2 fs via LABEL=
On a side note, using the UUID# or UUID=UUID# doesn't work at all. The documentation doesn't reference UUID or LABEL usage, but the luci GUI does. Is this just a bug in the GUI, making me believe that UUIDs or LABELs should work when they don't?
Please provide your cluster.conf and a tar of /var/log/cluster/, /dev/disk, and /var/log/messages (or, alternatively, an sosreport).
This commit fixes the LABEL= monitoring check:

http://git.fedorahosted.org/git/?p=resource-agents.git;a=commitdiff;h=a23793faa3c7718177cbbda77363d5fd3765415d
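A sketch of what such a fix has to do: translate the LABEL=/UUID= spec into a real device node before comparing against /proc/mounts. util-linux's findfs(8) does exactly this translation. The `real_device` helper name here is mine, for illustration; the actual implementation lives in the resource agents' fs-lib.sh:

```shell
# Resolve a cluster.conf device spec to a real device node. Plain device
# paths pass through unchanged; LABEL=/UUID= specs are resolved via
# findfs(8), which prints the matching device or exits non-zero.
real_device() {
    case "$1" in
        LABEL=*|UUID=*) findfs "$1" 2>/dev/null ;;
        *)              echo "$1" ;;
    esac
}

real_device /dev/dm-3   # → /dev/dm-3 (passthrough)
```

With the spec resolved to a device node first, the status check can compare like against like in the mount table.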
We will need to cross-check the cluster.conf generated by luci when using UUID=, because manual testing shows that it works as expected:

<clusterfs device="UUID=d92d449a-6c04-c820-9f2a-5ad81ca774c5" fsid="30538" fstype="gfs2" mountpoint="/mnt" name="gfs2-test"/>

We also need to verify that the reporter has the /dev/disk/by-uuid entry on all nodes.
We are working on things that are too sensitive to give you logs for... however, here is a cleaned version of cluster.conf
Created attachment 476819 [details] sanitized cluster.conf
(In reply to comment #7)
> We are working on things that are too sensitive to give you logs for...

Then please consider signing up for an official support contract, which includes an NDA and secrecy of data exchanged between Red Hat and customers. In future, consider that no logs may mean there is no way to fix an issue you report. At the very end, you are running software I write, as root... ;)

> however, here is a cleaned version of cluster.conf

I need to see the cluster.conf generated by luci for the UUID case, and I need to know what you have in /dev/disk/by-uuid on all nodes to make sure the UUID requested in cluster.conf is visible everywhere. The cluster.conf you attached points straight to the device.
Hi Joshua,

For support assistance regarding bugs or potential bugs, please make sure you're working through Red Hat support. For information on how to access support, please review the policies and instructions here: www.redhat.com/support

If you're not currently a customer, we would encourage you to become one so that we can ensure that issues such as the one reported in this bug are resolved and prioritized appropriately.

For now, I'm closing this bug, since Bugzilla is not a valid support tool. We can certainly reopen this and are happy to help resolve the problem via the appropriate support channels.

Thanks,
Jeremy West
Red Hat Support
Jeremy, I'm not looking for support assistance. My not working with a TAM doesn't mean this isn't a bug. I'm not looking for support; I'm looking to make Red Hat aware that there is a bug that needs to be addressed. This *is* a bug, as Fabio clearly indicates by committing a patch.
Fabio, I don't have a cluster.conf file with UUID=# in it, because that didn't work, as my bugzilla entry stated. LABEL=blah:blah did work, but wasn't detected as mounted when the status check happened, again, as this bug states. I do have the UUID /dev/ device in /dev/disk/by-uuid/ on all nodes.

This isn't hard to reproduce... it seems like you did in fact find the LABEL= problem and commit a fix, so that fix may be the resolution once it makes it into RHEL. Thank you!
(In reply to comment #12)
> Fabio, I don't have a cluster.conf file with UUID=# in it, because that didn't
> work, as my bugzilla entry stated.

Yes, I understand that, but I cannot reproduce the UUID issue here. So I need to see whether it's your version of luci generating a wrong config, or whether the by-uuid entries were missing at the time of testing. There is a window in which not all nodes have them, because of the way udev works.

> LABEL=blah:blah did work, but wasn't
> detected as mounted when the status check happens, again, as this bug states.

Yes, and I was able to reproduce that after some fiddling around, and that's why there is a fix now.

> I
> do have the UUID /dev/ device in /dev/disk/by-uuid/ on all nodes.

See above. One simple question, based on how udev works: mkfs.gfs2 on nodeX gives you the by-uuid entry on nodeX, but not on all the other nodes (udev is not cluster aware and does not understand shared storage).

1) Did you test UUID= at this stage?
2) Did you reboot the nodes between mkfs and testing? (This would have caused the device to be re-scanned, populating by-uuid on all nodes.)

> This isn't hard to reproduce... it seems like you did in fact find the LABEL=
> problem and commit a fix... so it may be that that will be the resolution when
> it makes it to RHEL.

It took me some time, as it doesn't appear under all LABEL= conditions. Having had this information at the beginning would have saved me time.
(In reply to comment #11)
> Jeremy, I'm not looking for support assistance. Me not working with a TAM
> doesn't mean this isn't a bug. I'm not looking for support, I'm looking to
> make Red Hat aware that there is a bug that needs to be addressed.

We have no problem fixing bugs, but every bug filed by an external user needs to be associated with a support ticket and a valid subscription. Please work with Jeremy and the Red Hat support team to provide your subscription information to them. Future bugs filed without properly using the support process will be closed.
Fabio:

Ok... now I understand what you need. I made the gfs2 filesystem many days and reboots (of all nodes) ago.

Here is more on the UUID bug. I put UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C into the luci GUI, and cluster.conf shows this:

<clusterfs device="UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C" fsid="40050" fstype="gfs2" mountpoint="/data/cluster-storage/" name="CDTG GFS2 Filesystem"/>

and the logs show this:

rgmanager[10218]: start_filesystem: Could not match UUID=28F14E5C-5E7C-BAE6-F863-F99A3A22130C with a real device

... even while *all* nodes have this:

ls -l /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c
lrwxrwxrwx. 1 root root 10 Jan 31 14:41 /dev/disk/by-uuid/28f14e5c-5e7c-bae6-f863-f99a3a22130c -> ../../dm-3

Does that show you what you wanted to know?
Thanks Perry, though I think you are confusing Bugzilla with "support process". I'm not asking or even wanting support, or someone to drive this fix into RHEL X.Y release, or an SLA, or a work-around, or a TAM, or a product manager... I'm looking to get the upstream code fixed. Think "Bug", not "Support"... or "Bugzilla", not "Support Process". I'm not looking for support... really, I'm not. I'm simply trying to improve Linux and RHEL, as is the open source way. You're welcome!
(In reply to comment #15)
> Does that show you what you wanted to know?

Yes, thanks. The problem is very simple. A bug in mkfs.gfs2 used to display the UUID in all capital letters (sorry, I don't have the bz reference handy for that), and I assume you copy-pasted that into UUID= in luci. That issue has already been addressed, and the UUID is now displayed correctly by mkfs.gfs2.

In reality, that entry is case-sensitive: all UUIDs have to be lower case. Use the UUID as you see it in by-uuid and it will work (but you will need to patch fs-lib.sh in order for it to keep working beyond the initial mount, same deal as LABEL=).
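For anyone hitting the same thing before upgrading, normalising the UUID to lower case before pasting it into luci works around the display bug. A trivial sketch, using the UUID from this report:

```shell
# mkfs.gfs2/gfs2_tool used to print the UUID in upper case, but the
# /dev/disk/by-uuid entries (and the rgmanager lookup) are lower case.
upper=28F14E5C-5E7C-BAE6-F863-F99A3A22130C
lower=$(printf '%s' "$upper" | tr '[:upper:]' '[:lower:]')
echo "$lower"   # → 28f14e5c-5e7c-bae6-f863-f99a3a22130c
```

Alternatively, `blkid -s UUID -o value <device>` reads the UUID straight from the superblock and, for most filesystems, prints it in the canonical lower-case form.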
Ok, that makes sense... though I was getting the UUID from the gfs2_tool command:

$ sudo gfs2_tool sb /dev/mapper/CDTG_GFS2_VG-CDTG_GFS2_LV all | grep uuid
  uuid = 28F14E5C-5E7C-BAE6-F863-F99A3A22130C

Should I/we open a bug against gfs2_tool, as it is displaying the uuid in all caps as well?

I'll wait for the patch to make it into fs-lib.sh before using LABEL= or UUID=. Thank you for looking at this!
(In reply to comment #18)
> $ sudo gfs2_tool sb /dev/mapper/CDTG_GFS2_VG-CDTG_GFS2_LV all | grep uuid
> uuid = 28F14E5C-5E7C-BAE6-F863-F99A3A22130C

Same code path... it's in a shared library.

> Should I/we open a bug against gfs2_tool as it is displaying the uuid in all
> caps as well?

No, it's the same error, already fixed in newer versions of gfs2-utils. You basically need to upgrade your system.

> I'll wait for the patch to make it into fs-lib.sh before using LABEL= or UUID=
> as well. Thank you for looking at this!

No problem.
Moving back to POST, patch is available upstream.
Verified that LABEL= and UUID= are working as expected. gfs2_tool, however, is still printing the UUID in all caps, but that's another bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0744.html