Bug 1320783

Summary: nova-compute-wait complains about Invalid Nova host name.
Product: Red Hat Enterprise Linux 7
Reporter: Marian Krcmarik <mkrcmari>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact:
Priority: high
Version: 7.2
CC: agk, cluster-maint, fdinitto, michele, oblaut, royoung, snagar, ushkalim
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Flags: royoung: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-3.9.5-71.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1334162 1380314
Environment:
Last Closed: 2016-11-04 00:02:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1334162, 1380314

Description Marian Krcmarik 2016-03-24 01:44:29 UTC
Description of problem:
nova-compute-wait complains about an invalid Nova host name when the domain resource attribute is specified, the hostname of the node is an FQDN, and the Nova host in nova.conf is also specified as an FQDN.
There is a check in the resource agent code:
    NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
    if [ $? = 1 ]; then
        if [ "x${OCF_RESKEY_domain}" != x ]; then
            NOVA_HOST=$(uname -n | awk -F. '{print $1}')
        else
            NOVA_HOST=$(uname -n)
        fi
    fi

    # We only need to check a configured value, calculated ones are fine
    if [ $? = 0 ]; then
        if [ "x${OCF_RESKEY_domain}" != x ]; then
            short_host=$(uname -n | awk -F. '{print $1}')
            if [ "x$NOVA_HOST" != "x${short_host}" ]; then
                ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
                rc=$OCF_ERR_CONFIGURED
            fi
        elif [ "x$NOVA_HOST" != "x$(uname -n)" ]; then
            ocf_exit_reason "Invalid Nova host name, must be $(uname -n) in order for instance recovery to function"
            rc=$OCF_ERR_CONFIGURED
        fi
    fi

This seems to cause the following:
1. If the Nova host in nova.conf is specified as an FQDN, the hostname of the node is an FQDN, and the domain attribute of the resource agent is specified, then nova-compute-wait refuses to start and complains about an invalid Nova host name.
- If I do not specify the domain attribute it works, but I think the resource agent should handle that situation, or the description of the resource attribute should state explicitly that the domain attribute must not be specified when the setup uses FQDNs.
2. The check actually tests the opposite of what the resource description says. The description of the domain attribute reads: "domain: DNS domain in which hosts live, useful when the cluster uses short names and nova uses FQDN", but in reality the check requires the Nova host to equal the node hostname without the domain, i.e. that nova does not use an FQDN. If the Nova host is specified in nova.conf as an FQDN and the hostname of the node is not an FQDN, the resource would fail to start.
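
The two cases above can be reproduced without a cluster. The following is a minimal sketch, not the agent's code: check_nova_host is a hypothetical function that mirrors the comparison in the check quoted above, but takes the nova.conf host, the node hostname, and the domain attribute as arguments instead of reading them from the system. The host names are made-up examples.

```shell
#!/bin/sh
# Hypothetical stand-in for the agent's validation, for illustration only.
# $1 = host value from nova.conf
# $2 = node hostname as reported by `uname -n`
# $3 = value of the domain resource attribute (may be empty)
check_nova_host() {
    nova_host=$1
    node_name=$2
    domain=$3

    if [ -n "$domain" ]; then
        # With domain set, the configured host must equal the SHORT node name.
        expected=${node_name%%.*}
    else
        # Without domain, it must equal the node hostname exactly.
        expected=$node_name
    fi

    if [ "$nova_host" = "$expected" ]; then
        echo "ok"
    else
        echo "invalid: must be $expected"
        return 1
    fi
}

# Case 1: everything is an FQDN and domain is set, so the check rejects it.
check_nova_host compute-0.example.com compute-0.example.com example.com || true
# Same setup without the domain attribute is accepted.
check_nova_host compute-0.example.com compute-0.example.com "" || true
# Case 2: nova uses an FQDN, the node hostname is short, domain is set: rejected.
check_nova_host compute-0.example.com compute-0 example.com || true
```

With domain set, the only accepted nova.conf value in this sketch is the short name, which is the reverse of the "cluster uses short names and nova uses FQDN" wording in the attribute description.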

I do not really know the reason for the check, so I may be wrong somewhere. Initially I understood from the description that the host attribute from nova.conf is compared with the name of the host from Pacemaker's point of view, but according to the code it is compared with the node hostname.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-54.el7_2.9

How reproducible:
Always

Comment 2 Andrew Beekhof 2016-03-24 05:29:35 UTC
Does 'pcs status' show FQDN?
If so this would be expected but I can appreciate that it is suboptimal.


The ideal solution is bug #1289410, which would let us avoid any heuristics and just use the value nova is using.

Until then, we can look at improving the agent.

Comment 3 Marian Krcmarik 2016-03-24 09:29:51 UTC
(In reply to Andrew Beekhof from comment #2)
> Does 'pcs status' show FQDN?
> If so this would be expected but I can appreciate that it is suboptimal.

No, cluster node names are not FQDN in "pcs status" output, but the nodes are defined as FQDN in nova.conf and node hostnames are FQDN as well.
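
The three names discussed here can be printed side by side on a compute node. A rough sketch, assuming crm_node (from Pacemaker) and openstack-config are installed; short_name and show are hypothetical helpers, and a missing command is reported as unavailable rather than guessed at:

```shell
#!/bin/sh
# Hypothetical helper: strip everything after the first dot.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: print a label, the full value, and its short form.
show() {
    printf '%-22s %s (short: %s)\n' "$1" "$2" "$(short_name "$2")"
}

# Name the cluster knows this node by (the one listed in 'pcs status').
if command -v crm_node >/dev/null 2>&1; then
    show "cluster node name:" "$(crm_node -n 2>/dev/null)"
else
    echo "cluster node name:     crm_node not available"
fi

# Node hostname; this is what the quoted check compares against.
show "uname -n:" "$(uname -n)"

# Host value nova is configured with, if any.
if command -v openstack-config >/dev/null 2>&1; then
    show "nova.conf host:" "$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)"
else
    echo "nova.conf host:        openstack-config not available"
fi
```

In the setup described in this comment, the first line would show a short name while the other two show FQDNs.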

Comment 4 Andrew Beekhof 2016-03-24 09:34:52 UTC
That's exceedingly strange then.

I will investigate (there will be some delay due to Easter/PTO)

Comment 5 Andrew Beekhof 2016-04-22 04:56:03 UTC
These are the two patches currently being tested for this:

  https://github.com/beekhof/fence-agents/commit/564b70d
  https://github.com/beekhof/openstack-resource-agents/commit/6a42076e

Comment 6 Andrew Beekhof 2016-05-09 04:43:47 UTC
Oyvind: Can we get a build for the resource-agents piece please?

Will clone for the fence-agents

Comment 7 Oyvind Albrigtsen 2016-05-13 10:35:33 UTC
Patched from upstream and updated the metadata longdesc/shortdesc to "Deprecated - do not use anymore." for the parameters that aren't in use anymore.

Comment 9 Leonid Natapov 2016-05-25 14:47:53 UTC
Did code verification. In resource-agents-3.9.5-71.el7, /usr/lib/ocf/resource.d/openstack/nova-compute-wait doesn't include the problematic code mentioned in the description.

Comment 10 Oyvind Albrigtsen 2016-09-09 07:52:21 UTC
*** Bug 1374327 has been marked as a duplicate of this bug. ***

Comment 12 Rob Young 2016-09-09 14:15:27 UTC
I agree on the need for the backport. We are straddling RHEL 7.2 and 7.3 in our OSP10 testing efforts and because of the tight delivery timeframe we need to remove all blockers (potential or otherwise) to our progress. This change is low-risk and the business value is that we can move ahead with our OSP 10 test plan on 7.2 if needed, regardless of any RHEL 7.3 delays.

Comment 13 Fabio Massimo Di Nitto 2016-09-11 12:20:16 UTC
*** Bug 1374980 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2016-11-04 00:02:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html