Bug 1366270 - hosted-engine-setup (and cockpit) accepts host address with an underscore while the engine correctly refuses them
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 3.6.6
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.1.0-alpha
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard: integration
Depends On:
Blocks: 1380629
Reported: 2016-08-11 12:51 UTC by Sachin Raje
Modified: 2019-12-16 06:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, hosted-engine-setup and cockpit incorrectly accepted host addresses containing underscores, while the Manager correctly refused them. As a result, hosted-engine-setup failed while trying to add the host to the Manager, and the user had to clean up and restart the installation. Now, the host address syntax is validated in hosted-engine-setup, which refuses to deploy if the address syntax is invalid.
Clone Of:
Environment:
Last Closed: 2017-04-25 00:43:07 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1002 0 normal SHIPPED_LIVE ovirt-hosted-engine-setup bug fix and enhancement update 2017-04-18 20:14:37 UTC
oVirt gerrit 64472 0 None MERGED network: validating hostname syntax also on the first host 2020-06-26 01:39:51 UTC
oVirt gerrit 64475 0 None MERGED hostname: Fix hostname syntax validation 2020-06-26 01:39:51 UTC

Description Sachin Raje 2016-08-11 12:51:48 UTC
Description of problem:

1. After a new hosted-engine deployment, if the hypervisor is rebooted because of a power outage or for some other reason, vm.conf is not imported into the OVF store.

2. This forces a complete redeployment of the hosted-engine setup, as there is no way to obtain a vm.conf with which to start the HEVM after the host reboot.

3. In the RHEV 3.6 HE setup, vm.conf is imported automatically only once at least one master storage domain has been added and the data center is in "UP" status, which is not possible in the scenario described above.


Version-Release number of selected component (if applicable):
RHEV-3.6 


How reproducible: Always


Steps to Reproduce:
1. Deploy HE setup
2. Reboot the host before importing the vm.conf to OVF storage.


Actual results: The HEVM failed to start after the host reboot, with the following errors:

# hosted-engine --vm-start
Unable to read vm.conf, please check ovirt-ha-agent logs

agent.log :

MainThread::WARNING::2016-08-11 12:13:55,616::ovf_store::104::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2016-08-11 12:13:55,617::config::235::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
MainThread::ERROR::2016-08-11 12:13:55,652::heconflib::111::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(validateConfImage) 'version' is not stored in the HE configuration image
MainThread::ERROR::2016-08-11 12:13:55,666::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/run/ovirt-hosted-engine-ha/vm.conf, key=memSize'' - trying to restart agent


# systemctl status ovirt-ha-agent

Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: INFO:ovirt_hosted_engine_ha.lib.upgrade.StorageServer:Host configuration is already up-to-date
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configurat... OVF_STORE
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OV...al vm.conf
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, fallin...al vm.conf
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR 'version' is not stored in th...tion image
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:'version' is not stored in the HE configuration image
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: ''Configuration value not found: file=/var/r...tart agent
Aug 11 12:37:41 dhcp210-150.gsslab.pnq.redhat.com ovirt-ha-agent[6473]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: ''Configuration value not found: file=/var/run/ovirt-hosted...tart agent


Expected results:
There should be a way to start or recover the HEVM without redeploying the HE setup.


Additional info:

A workaround for this issue is needed.

Alternatively, a copy of vm.conf could be kept in its original location (/etc/ovirt-hosted-engine/vm.conf) after the initial HE setup until it has been successfully imported into the OVF_STORE; the copy could then be unlinked or removed from the old location.

This would allow the HEVM to be started from the available vm.conf and avoid redeploying the HE setup.

Comment 3 Simone Tiraboschi 2016-09-08 13:13:50 UTC
The engine VM should already restart from the initial vm.conf until we get a valid OVF_STORE; this sequence can be repeated as many times as we want.

I suspect that the issue was somewhere else.

Comment 4 Simone Tiraboschi 2016-09-08 13:34:47 UTC
OK, in this case the issue was here:

12:13:55,652::heconflib::111::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(validateConfImage) 'version' is not stored in the HE configuration image

We got this since the setup failed before completing.

In this case the setup failed since:

2016-09-06 19:57:24 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:614 Cannot add the host to cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/engine/add_host.py", line 604, in _closeup
    otopicons.NetEnv.IPTABLES_ENABLE
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 18305, in add
    headers={"Correlation-Id":correlation_id, "Expect":expect}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 79, in add
    return self.request('POST', url, body, headers, cls=cls)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
    persistent_auth=self.__persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
    persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 156, in __do_request
    raise errors.RequestError(response_code, response_reason, response_body)
RequestError: 
status: 400
reason: Bad Request
detail: Host address must be a FQDN or a valid IP address
2016-09-06 19:57:24 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:622 Cannot automatically add the host to cluster Default:
Host address must be a FQDN or a valid IP address
 
and at the end:
2016-09-06 19:59:31 DEBUG otopi.context context._executeMethod:128 Stage terminate METHOD otopi.plugins.gr_he_common.core.misc.Plugin._terminate
2016-09-06 19:59:31 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:180 Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy


The root cause is that:
2016-09-06 19:59:31 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_NETWORK/host_name=str:'rhv_prod_h01.!!!MASKED!!!'
is not a valid FQDN: it contains underscores, which are not allowed characters in a hostname, so the engine is correctly refusing to add that host.

Please try again with a valid hostname.
On the other hand, cockpit and ovirt-hosted-engine-setup should fail earlier with a clear error.
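
For illustration only, the kind of early syntax check discussed in this comment could look roughly like the sketch below. This is not the code merged in the linked gerrit changes; the function name is_valid_fqdn, the requirement of at least two labels, and the example.com hostnames are assumptions made for the example. It applies the usual hostname rules: labels of 1 to 63 letters, digits, or hyphens, with no leading or trailing hyphen, and a total length of at most 253 characters.

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending
# with a hyphen. Underscores are deliberately not part of the character class.
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_fqdn(name):
    """Return True if name is a syntactically valid FQDN (hypothetical helper)."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2:
        return False  # assumption: a bare short name is not accepted as an FQDN
    if labels[-1].isdigit():
        return False  # an all-numeric last label is not a valid top-level domain
    return all(_LABEL_RE.fullmatch(label) is not None for label in labels)


if __name__ == "__main__":
    for candidate in ("rhv_prod_h01.example.com", "rhv-prod-h01.example.com"):
        print(candidate, is_valid_fqdn(candidate))
```

A check of this kind rejects the underscore hostname before any attempt to add the host, rather than after the engine returns a 400 error.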

Comment 5 Nikolai Sednev 2017-02-06 17:45:19 UTC
Works for me with these components on the host:
rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.1-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-node-ng-nodectl-4.1.0-0.20170104.1.el7.noarch
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
vdsm-4.19.4-1.el7ev.x86_64
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.3

If an incorrect FQDN of the form a_b_c.some.domain.com is used, Cockpit asks the customer for the FQDN again until a correct one is provided, and the deployment then continues as expected.
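
The re-prompting behaviour verified here can be sketched as follows. This is a minimal illustration, not the Cockpit or hosted-engine-setup implementation: ask_for_fqdn, read_answer, and report_error are hypothetical names, and the regular expression is an assumed approximation of the hostname syntax rules.

```python
import re

# Assumed approximation of FQDN syntax: two or more dot-separated labels,
# each 1-63 letters, digits or hyphens, with no leading or trailing hyphen,
# and at most 253 characters overall. Underscores never match.
_FQDN_RE = re.compile(
    r"(?=.{1,253}\Z)"
    r"(?:(?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+"
    r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)"
)


def ask_for_fqdn(read_answer, report_error, max_attempts=None):
    """Keep asking until a syntactically valid FQDN is entered.

    read_answer and report_error are caller-supplied callables (hypothetical
    interface) so the loop works with any front end.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        answer = read_answer().strip()
        if _FQDN_RE.fullmatch(answer):
            return answer
        report_error("'%s' is not a valid FQDN, please try again" % answer)
    raise ValueError("no valid FQDN provided")


if __name__ == "__main__":
    fqdn = ask_for_fqdn(lambda: input("Host FQDN: "), print)
    print("Continuing deployment with", fqdn)
```

Passing the input and error callbacks as arguments keeps the loop independent of whether the question is asked on a terminal or through a web form.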

