Description of problem: Continuous nfs_grace_monitor log messages observed in /var/log/messages Version-Release number of selected component (if applicable): 3.7.9-1 How reproducible: Always Steps to Reproduce: 1.Install the latest 3.1.3 build 2.Setup nfs-ganesha on the cluster. 3.Observe that continuous log messages related to nfs_grace_monitor are getting generated (every 5 second)in /var/log/messages. Actual results: Continuous nfs_grace_monitor log messages observed in /var/log/messages Expected results: Should not fill up /var/log/messages with consistent logs. Additional info: Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -Q,--query is not yet implemented, use -D to delete existing values ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ attrd_updater - Tool for updating cluster node attributes ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ Usage: attrd_updater command -n attribute [options] ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ Options: ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -?, --help#011#011This text ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -$, --version#011#011Version information ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -V, --verbose#011#011Increase debug output ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -q, --quiet#011#011Print only the value on stdout ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -n, --name=value#011The attribute's name ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ Commands: ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -U, --update=value#011Update the attribute's value in attrd. If this causes the value to change, it will also be updated in the cluster configuration ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -Q, --query#011#011Query the attribute's value from attrd ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -D, --delete#011#011Delete the attribute in attrd. If a value was previously set, it will also be removed from the cluster configuration ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -R, --refresh#011#011(Advanced) Force the attrd daemon to resend all current values to the CIB ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ Additional options: ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -l, --lifetime=value#011Lifetime of the node attribute. Allowed values: forever, reboot ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -d, --delay=value#011The time to wait (dampening) in seconds further changes occur ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -s, --set=value#011(Advanced) The attribute set in which to place the value ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -N, --node=value#011Set the attribute for the named node (instead of the local one) ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ ] Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ Report bugs to pacemaker.org ]
I guess this is on RHEL6 as I'm not seeing this on RHEL7.
I have RHEL 7 configured RHGS nodes and the issue is seen on those machines. Even now the messages are continuous and filling the log file. Snippets below: [root@dhcp46-247 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.2 (Maipo) [root@dhcp46-247 ~]# date Wed Mar 30 20:43:23 IST 2016 [root@dhcp46-247 ~]# tailf /var/log/messages Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ -R, --refresh#011#011(Advanced) Force the attrd daemon to resend all current values to the CIB ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ Additional options: ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ -l, --lifetime=value#011Lifetime of the node attribute. Allowed values: forever, reboot ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ -d, --delay=value#011The time to wait (dampening) in seconds further changes occur ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ -s, --set=value#011(Advanced) The attribute set in which to place the value ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ -N, --node=value#011Set the attribute for the named node (instead of the local one) ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ ] Mar 30 20:43:26 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19388:stderr [ Report bugs to pacemaker.org ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -Q,--query is not yet implemented, use -D to delete existing values ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ attrd_updater - Tool for updating cluster node attributes ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ Usage: attrd_updater command -n attribute [options] ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ Options: ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -?, --help#011#011This text ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -$, --version#011#011Version information ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -V, --verbose#011#011Increase debug output ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -q, --quiet#011#011Print only the value on stdout ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -n, --name=value#011The attribute's name ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ Commands: ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -U, --update=value#011Update the attribute's value in attrd. If this causes the value to change, it will also be updated in the cluster configuration ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -Q, --query#011#011Query the attribute's value from attrd ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -D, --delete#011#011Delete the attribute in attrd. If a value was previously set, it will also be removed from the cluster configuration ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -R, --refresh#011#011(Advanced) Force the attrd daemon to resend all current values to the CIB ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ Additional options: ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -l, --lifetime=value#011Lifetime of the node attribute. Allowed values: forever, reboot ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -d, --delay=value#011The time to wait (dampening) in seconds further changes occur ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -s, --set=value#011(Advanced) The attribute set in which to place the value ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ -N, --node=value#011Set the attribute for the named node (instead of the local one) ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ ] Mar 30 20:43:31 dhcp46-247 lrmd[4102]: notice: operation_finished: nfs-grace_monitor_5000:19449:stderr [ Report bugs to pacemaker.org ]
Okay, I'm stumped. I've installed fresh RHEL7.2 machines. These are genuine RHEL, and subscribed to all updates. I have installed rhpkg/brew built glusterfs-3.7.9-1.el7rhgs.x86_64, including glusterfs-ganesha-3.7.9-1.el7rhgs.x86_64 from download.eng.bos.redhat.com/brewroot/packages/ Using the default pcs, pacemaker, corosync (pcs-0.9.143-15.el7.x86_64, pacemaker-1.1.13-10.el7_2.2.x86_64, corosync-2.3.4-7.el7_2.1.x86_64) that are pulled in as Required: dependencies from the RHEL repos. I've configured and set up a four node cluster, but I don't see any of those messages in /var/log/messages. Please provide the FQDN or IP address of your hosts so that I can examine them. Thanks
Kaleb: This is pacemaker logging the output from the resource agent. It seems to be logging the usage information to stderr for some reason. Solution would be to either change the agent or figure out what the agent is unhappy about and change the configuration accordingly.
Kaleb, Specifically, attrd_updater is printing a usage error when called by the nfs-grace resource's monitor action. The first message is the most useful: Mar 28 15:29:14 dhcp46-247 lrmd[29082]: notice: operation_finished: nfs-grace_monitor_5000:4735:stderr [ -Q,--query is not yet implemented, use -D to delete existing values ] The -Q option to attrd_updater was not added until RHEL 7.2. If I remember correctly, the agent is using attrd_updater instead of crm_attribute so it can use private attributes (-p/--private). Those, too, were added in 7.2. Unfortunately, backporting those features to RHEL 6 is not a possibility because they can not work with CMAN. The resource agent will need to be modified to work under RHEL 6. The same message won't occur with the 7.2 packages; the reporter apparently is using the 7.1 packages (1.1.12-22.el7_1.2) on 7.2 hosts, and an update from the standard repositories will make the problem go away there.
With the latest glusterfs-3.7.9-3 build and with below pacemaker, pcs and corosync packages pulled as dependencies: [root@dhcp37-180 ~]# rpm -qa|grep pacemaker pacemaker-1.1.13-10.el7_2.2.x86_64 pacemaker-libs-1.1.13-10.el7_2.2.x86_64 pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64 pacemaker-cli-1.1.13-10.el7_2.2.x86_64 [root@dhcp37-180 ~]# rpm -qa|grep pcs pcs-0.9.143-15.el7.x86_64 [root@dhcp37-127 ~]# rpm -qa|grep corosync corosynclib-2.3.4-7.el7_2.1.x86_64 corosync-2.3.4-7.el7_2.1.x86_64 i still can see below messages getting generated continuously on all the nodes: May 3 06:38:22 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:26959:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:27 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27011:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:32 dhcp37-180 ganesha_mon(nfs-mon)[27018]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:38:32 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27018:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:33 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27034:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:38 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27068:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:42 dhcp37-180 ganesha_mon(nfs-mon)[27120]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:38:42 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27120:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:43 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27136:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:48 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27144:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:52 dhcp37-180 ganesha_mon(nfs-mon)[27151]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:38:52 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27151:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:53 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27167:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:38:58 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27219:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:02 dhcp37-180 ganesha_mon(nfs-mon)[27226]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:39:02 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27226:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:03 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27242:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:08 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27249:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:12 dhcp37-180 ganesha_mon(nfs-mon)[27301]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:39:12 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27301:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:13 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27317:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:18 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27324:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:23 dhcp37-180 ganesha_mon(nfs-mon)[27331]: INFO: warning: crm_attribute --node=dhcp37-180 --lifetime=forever --name=grace-active --update=1 failed May 3 06:39:23 dhcp37-180 lrmd[20097]: notice: nfs-mon_monitor_10000:27331:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:23 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27347:stderr [ Could not map name=dhcp37-180 to a UUID ] May 3 06:39:25 dhcp37-180 systemd: Started Session 13 of user root. May 3 06:39:25 dhcp37-180 systemd-logind: New session 13 of user root. May 3 06:39:25 dhcp37-180 systemd: Starting Session 13 of user root. May 3 06:39:28 dhcp37-180 lrmd[20097]: notice: nfs-grace_monitor_5000:27416:stderr [ Could not map name=dhcp37-180 to a UUID ] Is this expected behavior?
Technically, '-' (a.k.a. variously as dash, hyphen, minus sign) is not a valid character for hostnames per RFC 952 (https://tools.ietf.org/html/rfc952). Legal Yes, DNS allows such illegal names and many things work with them. A quick google search shows this message coming up for other people who have similarly illegal hostnames, e.g. with - and : in them. These messages went away when they changed to strictly legal hostnames. Thus it seems that pacemaker is one of those things that don't work with '-' in a hostname. Perhaps a quick (I hope it can be quick) test would be to set up a similar cluster. Since we're stuck with Red Hat's broken DHCP names, populate /etc/hosts with corresponding aliases. E.g.: 10.70.37.180 dhcp37x180 dhcp37x180.lab.eng.blr.redhat.com 10.70.37.YYY dhcp37xYYY dhcp37xYYY.lab.eng.blr.redhat.com ... Use the dhcp37xYYY names in the setup. (And FWIW, none of the hostnames I use in development have - in the name, and I don't see these messages.) And I'll check with the pacemaker team to confirm.
Correction, '-' is legal (not enough coffee in my system.) Ken Gaillot writes: The message would imply that the node doesn't have an entry in the CIB's nodes section. Were those same names used with "pcs cluster setup", and do they show up in "pcs cluster status"? Your crm_attribute command looks correct, so I'm not sure why it's having a problem. Also, bear in mind that DHCP breaks corosync whenever the lease renews. QA often doesn't run into that problem because they use long lease times and short cluster lifetimes, but it is a time bomb.
Ken Gaillot writes further: >> Also, bear in mind that DHCP breaks corosync whenever the lease renews. >> QA often doesn't run into that problem because they use long lease times >> and short cluster lifetimes, but it is a time bomb. > > So what should we tell our customers? Is there documentation about > this? Best Practices? Yes: https://access.redhat.com/solutions/1529133 Some other how-to's mention it, e.g. some openstack-specific stuff and a how-to on using a specific network interface for cluster traffic. Surprisingly, it's not mentioned in the High Availability Add-On administration/reference guides; I might open a doc bz for that.
As discussed in scrum, since these messages are kind of expected in a dhcp based clustered environment and will not be seen if we have statically assigned addresses and if we have proper hostnames, Hence marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240