Bug 1367777

Summary: For Self Hosted RHV Deployment changing data center and cluster name causes deployment to fail
Product: Red Hat Quickstart Cloud Installer Reporter: James Olin Oden <joden>
Component: Installation - RHEV Assignee: John Matthews <jmatthew>
Status: CLOSED EOL QA Contact: Tasos Papaioannou <tpapaioa>
Severity: high Docs Contact: Dan Macpherson <dmacpher>
Priority: unspecified    
Version: 1.0 CC: bthurber, fabian, tpapaioa
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1367897 Environment:
Last Closed: 2018-02-26 19:58:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1367897    

Description James Olin Oden 2016-08-17 12:52:08 UTC
Description of problem:
I have hit this three times (Fabian hit it once and I hit it twice). It originally happened while I was doing a RHV self-hosted deployment with four hosts, after I had changed both the data center name and the cluster name to:

   0123456789112345678921234567893123456789

This is exactly 40 characters long, the maximum length of a data center or cluster name. While deploying the first host, the one the engine was to run on, the deployment died with the following error:

===== Puppet run for the host hyperviso14.b.b status reported as Error ======
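The 40-character name used to trigger this is a "ruler" string: every block of ten characters starts with its decade digit (0, 1, 2, 3) followed by the digits 1 through 9, so the length can be read off by eye. As a minimal sketch for reproducing the boundary case, the Python below builds such a name for any length and checks it against the 40-character maximum stated in this report. `ruler_name`, `within_limit`, and `MAX_NAME_LEN` are hypothetical helpers for illustration only, not part of QCI, RHV, or fusor.

```python
# Assumed maximum data center / cluster name length, taken from this report (40).
MAX_NAME_LEN = 40


def ruler_name(length):
    """Build a 'ruler' test string (hypothetical helper): each 10-character
    block starts with its decade digit (0, 1, 2, ...) followed by 1-9."""
    chars = []
    for i in range(length):
        if i % 10 == 0:
            # First character of each block marks how many tens precede it.
            chars.append(str((i // 10) % 10))
        else:
            chars.append(str(i % 10))
    return "".join(chars)


def within_limit(name, limit=MAX_NAME_LEN):
    """Return True if the name is no longer than the assumed limit."""
    return len(name) <= limit


if __name__ == "__main__":
    name = ruler_name(MAX_NAME_LEN)
    print(name, len(name), within_limit(name))
```

`ruler_name(40)` reproduces the exact name used above; `ruler_name(41)` gives a name one character over the limit, which is a natural companion case when probing the boundary.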

On the host that had the puppet failure, /var/log/messages had the following entries concerning puppet:

Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ ERROR ] Failed to execute stage 'Closing up': Specified cluster does not exist: 1111
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ INFO  ] Stage: Clean up
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160816211614.conf'
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ INFO  ] Stage: Pre-termination
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ INFO  ] Stage: Termination
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns)           Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160816204743-iylgq9.log
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: hosted-engine --deploy --config-append=/etc/qci/answers returned 1 instead of one of [0]
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) change from notrun to 0 failed: hosted-engine --deploy --config-append=/etc/qci/answers returned 1 instead of one of [0]
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[oVirt Configuration stage- Done]) Dependency Exec[hosted-engine-setup] has failures: true
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[oVirt Configuration stage- Done]) Skipping because of failed dependencies
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[Datacenter is not in upstatus, going over configuration]) Dependency Exec[hosted-engine-setup] has failures: true
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[Datacenter is not in upstatus, going over configuration]) Skipping because of failed dependencies
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/File[/etc/qci/engine-DC-config.py]) Dependency Exec[hosted-engine-setup] has failures: true
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/File[/etc/qci/engine-DC-config.py]) Skipping because of failed dependencies
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Exec[engine_dc_config]) Dependency Exec[hosted-engine-setup] has failures: true
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Exec[engine_dc_config]) Skipping because of failed dependencies
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[oVirt Configuration stage- Starting]) Dependency Exec[hosted-engine-setup] has failures: true
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Config/Notify[oVirt Configuration stage- Starting]) Skipping because of failed dependencies
Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: Finished catalog run in 2140.07 seconds
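When triaging output like this, only the `[ ERROR ]` entries that hosted-engine-setup emits through puppet-agent really matter. A minimal Python sketch for pulling them out, assuming the `[ ERROR ] <message>` marker format seen in the lines above; `hosted_engine_errors` is a hypothetical helper, and it reads from any iterable of lines rather than from /var/log/messages directly.

```python
import re

# Matches the "[ ERROR ] <message>" marker as it appears inside the
# puppet-agent lines quoted above (format assumed from this report).
ERROR_RE = re.compile(r"\[ ERROR \] (?P<msg>.*)$")


def hosted_engine_errors(lines):
    """Return the message text of every '[ ERROR ]' entry in the given lines."""
    errors = []
    for line in lines:
        match = ERROR_RE.search(line.rstrip("\n"))
        if match:
            errors.append(match.group("msg").strip())
    return errors


if __name__ == "__main__":
    sample = [
        "Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ ERROR ] Failed to execute stage 'Closing up': Specified cluster does not exist: 1111",
        "Aug 16 21:16:15 hypervisor14 puppet-agent[3811]: (/Stage[main]/Ovirt::Self_hosted::Setup/Exec[hosted-engine-setup]/returns) [ INFO  ] Stage: Clean up",
    ]
    for msg in hosted_engine_errors(sample):
        print(msg)
```

On an affected host the same function could be given an open file object, for example `hosted_engine_errors(open("/var/log/messages"))`, since iterating a file yields its lines.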

When you look in the log pointed to by the error above, /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160817004633-g16j41.log, you find this:

Aug 17 00:45:25 hypervisor14.b.b vdsm[13818]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Aug 17 00:45:25 hypervisor14.b.b vdsm[13818]: vdsm root ERROR failed to retrieve Hosted Engine HA info
                                              Traceback (most recent call last):
                                                File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 232, in _getHaInfo
                                                  stats = instance.get_all_stats()
                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                  with broker.connection(self._retries, self._wait):
                                                File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                  return self.gen.next()
                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                  self.connect(retries, wait)
                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                  raise BrokerConnectionError(error_msg)
                                              BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)

This error is listed several times in that log.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160815.t.0

How reproducible:
every time

Steps to Reproduce:
1.  Start a self-hosted RHV deployment.
2.  Change the data center and cluster names.
3.  Continue with the deployment.

Actual results:
The deployment fails with the puppet error shown above, which seems to be due to vdsm not running.

Expected results:
The deployment completes without errors.

Comment 2 John Matthews 2016-08-17 17:59:14 UTC
This is outside the scope of GA. For GA we will disable the ability to configure the data center/cluster name for self-hosted deployments.

Comment 3 Fabian von Feilitzsch 2016-08-17 19:05:29 UTC
Disabled datacenter/cluster configuration for self-hosted: https://github.com/fusor/fusor/pull/1165

Comment 4 Tasos Papaioannou 2016-08-23 13:55:53 UTC
Verified on QCI-1.0-RHEL-7-20160819.t.0.