Description of problem:
nova-compute-wait complains about an invalid Nova host name when the domain resource attribute is specified, the node's hostname is an FQDN, and the Nova host is specified as an FQDN.

There is a check in the resource agent code:

    NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
    if [ $? = 1 ]; then
        if [ "x${OCF_RESKEY_domain}" != x ]; then
            NOVA_HOST=$(uname -n | awk -F. '{print $1}')
        else
            NOVA_HOST=$(uname -n)
        fi
    fi

    # We only need to check a configured value, calculated ones are fine
    if [ $? = 0 ]; then
        if [ "x${OCF_RESKEY_domain}" != x ]; then
            short_host=$(uname -n | awk -F. '{print $1}')
            if [ "x$NOVA_HOST" != "x${short_host}" ]; then
                ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
                rc=$OCF_ERR_CONFIGURED
            fi
        elif [ "x$NOVA_HOST" != "x$(uname -n)" ]; then
            ocf_exit_reason "Invalid Nova host name, must be $(uname -n) in order for instance recovery to function"
            rc=$OCF_ERR_CONFIGURED
        fi
    fi

This seems to cause the following:

1. If the Nova host in nova.conf is specified as an FQDN, the node's hostname is an FQDN, and the domain attribute of the resource agent is specified, then nova-compute-wait refuses to start and complains about an invalid Nova host name.
   - If I do not specify the domain attribute it works, but I think the resource agent should handle that situation, or the description of the resource attribute should state explicitly that the domain attribute must not be specified when the setup uses FQDNs.

2. The check actually tests the opposite of what the resource description says. The description of the domain attribute says: "domain: DNS domain in which hosts live, useful when the cluster uses short names and nova uses FQDN", but in reality it checks that, when nova does not use an FQDN, the Nova host equals the node hostname without the domain. If the Nova host is specified in nova.conf as an FQDN and the node's hostname is not an FQDN, it would fail to start.
I do not really know the reason for the check, so I may be wrong somewhere. Initially I understood from the description that the host attribute from nova.conf is compared with the host name from Pacemaker's point of view, but the code compares it with the node hostname.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-54.el7_2.9

How reproducible:
Always
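The comparison described above can be tried in isolation with a small sketch. This is not the agent's code: `check_nova_host` and the `compute-0.example.com` names are hypothetical, and the function only mirrors the comparison applied to a configured nova.conf host value, taking the `uname -n` result as an argument so that it can be run on any machine.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the resource agent. It mirrors the
# comparison the quoted check applies to a configured nova.conf host value.
#   $1 = node hostname, as `uname -n` would return it
#   $2 = `host` value configured in nova.conf
#   $3 = value of the agent's `domain` attribute (empty if unset)
check_nova_host() {
    node_name=$1
    nova_host=$2
    domain=$3

    if [ -n "$domain" ]; then
        # With domain set, the expected value is the hostname up to the
        # first dot (same result as: awk -F. '{print $1}').
        expected=${node_name%%.*}
    else
        # Without domain, the expected value is the hostname verbatim.
        expected=$node_name
    fi

    if [ "$nova_host" = "$expected" ]; then
        echo "ok"
    else
        echo "invalid: must be $expected"
    fi
}

# Case 1 of the report: FQDN hostname, FQDN in nova.conf, domain set.
check_nova_host compute-0.example.com compute-0.example.com example.com
# Same setup with the domain attribute left unset.
check_nova_host compute-0.example.com compute-0.example.com ""
# Case 2 of the report: short hostname, FQDN in nova.conf, domain set.
check_nova_host compute-0 compute-0.example.com example.com
```

Under these assumptions the first and third calls report a mismatch and the second reports `ok`, which matches the behaviour described in points 1 and 2.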
Does 'pcs status' show FQDN? If so this would be expected, but I can appreciate that it is suboptimal. The ideal solution is bug #1289410, which would allow us to avoid any heuristics and just use the value nova is using. Until then, we can look at improving the agent.
(In reply to Andrew Beekhof from comment #2)
> Does 'pcs status' show FQDN?
> If so this would be expected but I can appreciate that it is suboptimal.

No, cluster node names are not FQDN in "pcs status" output, but the nodes are defined as FQDN in nova.conf and node hostnames are FQDN as well.
That's exceedingly strange then. I will investigate (there will be some delay due to Easter/PTO).
These are the two patches currently being tested for this:
https://github.com/beekhof/fence-agents/commit/564b70d
https://github.com/beekhof/openstack-resource-agents/commit/6a42076e
Oyvind: Can we get a build for the resource-agents piece please? Will clone for the fence-agents
Patched from upstream and updated the metadata longdesc/shortdesc to "Deprecated - do not use anymore." for the parameters that aren't in use anymore.
Did code verification. In resource-agents-3.9.5-71.el7, /usr/lib/ocf/resource.d/openstack/nova-compute-wait doesn't include the problematic code mentioned in the description.
*** Bug 1374327 has been marked as a duplicate of this bug. ***
I agree on the need for the backport. We are straddling RHEL 7.2 and 7.3 in our OSP10 testing efforts and because of the tight delivery timeframe we need to remove all blockers (potential or otherwise) to our progress. This change is low-risk and the business value is that we can move ahead with our OSP 10 test plan on 7.2 if needed, regardless of any RHEL 7.3 delays.
*** Bug 1374980 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2174.html