| Summary: | Instance HA - invalid nova host name, must be <short> for instance recovery to function | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Andreas Karis <akaris> |
| Component: | resource-agents | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.2 | CC: | agk, akaris, berrange, cluster-maint, dasmith, dhill, ebeaudoi, eglynn, fdinitto, kchamart, rbryant, sbauza, sferdjao, sgordon, sputhenp, srevivo, svanders, vromanso |
| Target Milestone: | pre-dev-freeze | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-10-05 09:34:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
~~~
# pcs resource show nova-compute-checkevacuate
 Resource: nova-compute-checkevacuate (class=ocf provider=openstack type=nova-compute-wait)
  Attributes: auth_url=https://xyz.domainname.com:5000/v2.0 username=admin password=XXXXXXXXXXX tenant_name=admin domain=abc.domainname.com
  Operations: stop interval=0s timeout=300 (nova-compute-checkevacuate-stop-interval-0s)
              monitor interval=10 timeout=20 (nova-compute-checkevacuate-monitor-interval-10)
              start interval=0s timeout=300 (nova-compute-checkevacuate-start-interval-0s)
~~~

So the domain is correctly configured above. Compare that to the description of the resource:

~~~
[root@overcloud-controller-0 lib]# pcs resource describe ocf:openstack:nova-compute-wait
ocf:openstack:nova-compute-wait - OpenStack Nova Compute Server

OpenStack Nova Compute Server.

Resource options:
  auth_url (required): Authorization URL for connecting to keystone in admin context
  username (required): Username for connecting to keystone in admin context
  password (required): Password for connecting to keystone in admin context
  tenant_name (required): Tenant name for connecting to keystone in admin context. Note that with Keystone V3 tenant names are only unique within a domain.
  domain: DNS domain in which hosts live, useful when the cluster uses short names and nova uses FQDN
  endpoint_type: Nova API location (internal, public or admin URL)
  no_shared_storage: Disable shared storage recovery for instances. Use at your own risk!
  evacuation_delay: How long to wait for nova to finish evacuating instances elsewhere before starting nova-compute. Only used when the agent detects evacuations might be in progress. You may need to increase the start timeout when increasing this value.
[root@overcloud-controller-0 lib]#
~~~

Compare that to the code:

~~~
# we take a chance here and hope that host is either not configured
# or configured in nova.conf
NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ $? = 1 ]; then
    (... we don't care, this won't be executed ...)
fi

# We only need to check a configured value, calculated ones are fine
openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null
if [ $? = 0 ]; then
    if [ "x${OCF_RESKEY_domain}" != x ]; then
        short_host=$(uname -n | awk -F. '{print $1}')
        if [ "x$NOVA_HOST" != "x${short_host}" ]; then
            ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
            rc=$OCF_ERR_CONFIGURED
        fi
    elif [ "x$NOVA_HOST" != "x$(uname -n)" ]; then
        ocf_exit_reason "Invalid Nova host name, must be $(uname -n) in order for instance recovery to function"
        rc=$OCF_ERR_CONFIGURED
    fi
fi
~~~

First of all, the above is a bit ugly, and it runs the same nova.conf lookup twice. Why not use if/else? This:

~~~
NOVA_HOST=$(crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ $? = 1 ]; then
    (...)
fi

# We only need to check a configured value, calculated ones are fine
crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null
if [ $? = 0 ]; then
    (...)
~~~

could simply be written as:

~~~
NOVA_HOST=$(crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ $? = 1 ]; then
    (...)
else
    (...)
fi
~~~

That would make the code much more readable.

Second, the following check does not look logical. If `"x<domainname>" != x`, i.e. a domain is configured, we compare NOVA_HOST against the short host name? That can't be right. We want to compare NOVA_HOST against the full `uname -n`, no? Because the domain name is set, we _know_ that NOVA_HOST will contain an FQDN.

~~~
if [ "x${OCF_RESKEY_domain}" != x ]; then
    short_host=$(uname -n | awk -F. '{print $1}')
    if [ "x$NOVA_HOST" != "x${short_host}" ]; then
~~~

We know that we use a domain name, because we configured it, and we know that the value in NOVA_HOST will therefore most likely contain the same domain name. So this check is either superfluous or needs to be modified.
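Combining the two points above (a single if/else around one nova.conf lookup, and a comparison that tolerates the configured domain) could look roughly like the following. This is a minimal sketch, assuming bash and crudini. The function names `strip_domain` and `check_nova_host` are hypothetical, stripping the domain suffix from *both* names with parameter expansion is this sketch's own choice, and the function returns OCF exit codes instead of setting `rc`. It is not the code that shipped in resource-agents or the fix tracked in bug 1380314.

~~~
#!/bin/bash
# Hypothetical sketch, not the shipped nova-compute-wait code.

# Remove an optional ".<domain>" suffix so FQDN and short names compare equal.
# Quoting "$domain" inside the expansion makes it match literally.
strip_domain() {
    local name="$1" domain="$2"
    if [ -n "$domain" ]; then
        name="${name%."$domain"}"
    fi
    printf '%s\n' "$name"
}

# Returns 0 (OCF_SUCCESS) or 6 (OCF_ERR_CONFIGURED).
#   $1: path to nova.conf
#   $2: node name (normally $(uname -n))
#   $3: DNS domain (normally ${OCF_RESKEY_domain}, may be empty)
check_nova_host() {
    local conf="$1" node_name="$2" domain="$3" nova_host
    # Assumption: any non-zero exit from crudini (key or file missing)
    # means "host" is not explicitly configured.
    if ! nova_host=$(crudini --get "$conf" DEFAULT host 2>/dev/null); then
        # Not configured: nova calculates the host name, nothing to check.
        return 0
    else
        # Configured: compare with the domain stripped from both sides.
        if [ "$(strip_domain "$nova_host" "$domain")" != "$(strip_domain "$node_name" "$domain")" ]; then
            echo "Invalid Nova host name '$nova_host', must match $node_name in order for instance recovery to function" >&2
            return 6
        fi
    fi
    return 0
}
~~~

With `domain=localdomain`, this accepts `host=compute-0.localdomain` whether `uname -n` returns `compute-0` or `compute-0.localdomain`. With no domain set, it falls back to an exact match against the node name, like the existing `elif` branch.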
If we want to keep everything "short", let's strip the domain name from NOVA_HOST with sed, roughly like this:

~~~
elif [ "x$(echo "$NOVA_HOST" | sed -e "s/\.${OCF_RESKEY_domain}\$//")" != "x$(uname -n)" ]; then
~~~

I'll try to look into this and provide a patch.

@akaris: Is this still an issue, and is it in fact nova-related? I ask because the Customer Portal ticket has been closed.

More customers are adopting instance HA, and it is broken in OSP 8 and later because of this bug. Releasing a fix for this should be a high priority.

This is in /usr/lib/ocf/resource.d/openstack/nova-compute-wait

*** This bug has been marked as a duplicate of bug 1380314 ***
Description of problem:
The nova-compute-wait resource cannot start when configured with the option domain=localdomain. It fails with "Invalid Nova host name, must be XXXX in order for instance recovery to function".

Version-Release number of selected component (if applicable):
OSP 8.0

How reproducible:
In an environment where nova.conf sets the `host=` parameter to the FQDN, NOVA_HOST differs from `uname -n | awk -F. '{print $1}'` (the short host name), so nova-compute-wait fails to start.

Steps to Reproduce:
1. Configure `host=` in nova.conf with the FQDN.
2. Configure instance HA according to https://access.redhat.com/articles/1544823#comment-1045511
3. (see private comment by Chen Chen from May 7th)

Actual results:

Expected results:

Additional info:

~~~
NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ "x${OCF_RESKEY_domain}" != x ]; then
    short_host=$(uname -n | awk -F. '{print $1}')
    if [ "x$NOVA_HOST" != "x${short_host}" ]; then
        ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
        rc=$OCF_ERR_CONFIGURED
    fi
fi
~~~
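The failure described under "How reproducible" can be replayed without a compute node by feeding the quoted check fixed values. This is a minimal sketch, assuming bash. `current_check` is a hypothetical wrapper that takes the nova.conf `host=` value, the node name (standing in for `uname -n`) and the domain as arguments. It prints the message instead of calling `ocf_exit_reason`, and it returns 6 (OCF_ERR_CONFIGURED) instead of setting `rc`.

~~~
#!/bin/bash
# Hypothetical wrapper around the check quoted above, with inputs passed as
# arguments instead of read from nova.conf and uname -n.
#   $1: host= value from nova.conf
#   $2: node name (stand-in for uname -n)
#   $3: domain (stand-in for OCF_RESKEY_domain)
current_check() {
    local NOVA_HOST="$1" node_name="$2" OCF_RESKEY_domain="$3" short_host
    if [ "x${OCF_RESKEY_domain}" != x ]; then
        # Same short-name computation as the agent: keep the first label.
        short_host=$(echo "$node_name" | awk -F. '{print $1}')
        if [ "x$NOVA_HOST" != "x${short_host}" ]; then
            echo "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
            return 6   # OCF_ERR_CONFIGURED
        fi
    fi
    return 0
}

# host= in nova.conf holds the FQDN, as in step 1 of the reproduction.
current_check compute-0.localdomain compute-0.localdomain localdomain || echo "exit code: $?"
~~~

The last line prints "Invalid Nova host name, must be compute-0 in order for instance recovery to function" followed by "exit code: 6". Passing the short name `compute-0` as the first argument makes the check pass. So with a domain configured, only a short `host=` value is accepted, even though the `domain` option is documented for the case where nova uses the FQDN.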