Created attachment 1019789 [details]
logs from 2nd host

Description of problem:

I can't add a second host into the self-hosted environment; the first host runs OK. There's a problem with hosted-engine --deploy: the rhevm bridge is not created successfully. I created it manually (the IP was still on the underlying em1 device), then re-executed hosted-engine --deploy:

[ INFO ] Configuring VM
[ INFO ] Updating hosted-engine configuration
[ INFO ] Stage: Transaction commit
[ INFO ] Stage: Closing up
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add slot-5b to the manager
[ INFO ] Enabling and starting HA services
          Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150428182105.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

vdsm.log is full of python exceptions...
Thread-47::DEBUG::2015-04-28 18:20:02,837::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/10.34.63.202:_mnt_export_nfs_lv2___brq-setup/23c03bb6-9889-4cbf-b7ad-55b9a2c70653/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-47::DEBUG::2015-04-28 18:20:02,842::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n497 bytes (497 B) copied, 0.000312696 s, 1.6 MB/s\n'; <rc> = 0
Thread-47::ERROR::2015-04-28 18:20:02,845::domainMonitor::256::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 23c03bb6-9889-4cbf-b7ad-55b9a2c70653 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 250, in _monitorDomain
    self.nextStatus.hasHostId = self.domain.hasHostId(self.hostId)
  File "/usr/share/vdsm/storage/sd.py", line 483, in hasHostId
    return self._clusterLock.hasHostId(hostId)
  File "/usr/share/vdsm/storage/clusterlock.py", line 261, in hasHostId
    hostId, self._idsPath)
TypeError: argument 2 must be integer, not str
...
MainThread::DEBUG::2015-04-28 18:24:18,293::protocoldetector::144::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
ioprocess communication (36158)::ERROR::2015-04-28 18:24:18,292::__init__::152::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
    raise Exception("FD closed")
Exception: FD closed

Version-Release number of selected component (if applicable):
vdsm-4.16.13.1-1.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.2-3.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. two hosts, one host part of self-hosted engine
2. have rhevm env working
3. add 2nd host into self-hosted engine

Actual results:
setup fails in the end, seems vdsm related

Expected results:
should work, it should be "HA"

Additional info:
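(For context: the TypeError above is a plain str-vs-int mismatch — the sanlock binding insists on an integer host id, while the id read from the setup's configuration arrives as a string. A minimal sketch of the failure mode, using hypothetical names rather than the actual vdsm code:)

```python
# Hypothetical stand-in for the C-extension argument check that produced
# the traceback above; not the actual vdsm/sanlock code.
def has_host_id(host_id, ids_path):
    # sanlock-style bindings reject non-integer host ids outright
    if not isinstance(host_id, int):
        raise TypeError(
            "argument 2 must be integer, not %s" % type(host_id).__name__)
    return True

host_id = "2"  # as parsed from a config/answer file: a string, not an int

try:
    has_host_id(host_id, "/tmp/example/ids")  # hypothetical ids path
except TypeError as e:
    print(e)  # argument 2 must be integer, not str

# the obvious fix is to cast before calling into the binding
assert has_host_id(int(host_id), "/tmp/example/ids")
```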
Created attachment 1019790 [details] engine logs
Please retry now - due to EMC storage policy, its IQN needs to be allowed access to the shares. I've just added its IQN to the list.
No, still same issue.
while adding 2nd host with ovirt-hosted-engine-setup-1.3.0-0.0.master.20150518075146.gitdd9741f.el7.noarch:

...
          --== HOSTED ENGINE CONFIGURATION ==--

          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_2]:
          Enter 'admin@internal' user password that will be used for accessing the Administrator Portal:
          Confirm 'admin@internal' user password:
[WARNING] Failed to resolve jb-hosted.rhev.lab.eng.brq.redhat.com using DNS, it can be resolved only locally
[ INFO ] Stage: Setup validation
[ ERROR ] Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150529172837.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
...

2015-05-29 17:28:34 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:940 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stderr:
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:141 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace.Plugin._validation
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:147 condition False
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:141 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.storage.Plugin._validation
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:155 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 145, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 263, in _validation
    ] + ".metadata",
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 168, in get_all_host_stats_direct
    self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 119, in get_all_stats_direct
    stats = sb.get_raw_stats_for_service_type("client", service_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 125, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
2015-05-29 17:28:34 ERROR otopi.context context._executeMethod:164 Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
...
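(Note on the OSError at the bottom of the traceback: os.open() raises the same [Errno 2] when a path exists only as a symlink whose target is missing. A self-contained sketch in a scratch directory, with hypothetical paths rather than the real /rhev mount:)

```python
import errno
import os
import tempfile

# Recreate the failure mode in a scratch directory: the metadata file
# exists as a symlink, but its target is missing, so os.open() fails
# with the same ENOENT seen in the traceback above.
scratch = tempfile.mkdtemp()
link = os.path.join(scratch, "hosted-engine.metadata")
os.symlink(os.path.join(scratch, "missing-target"), link)  # dangling link

try:
    fd = os.open(link, os.O_RDONLY)
    os.close(fd)
except OSError as e:
    assert e.errno == errno.ENOENT
    print("open failed: [Errno 2] No such file or directory")
```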
Created attachment 1032157 [details] ovirt-hosted-engine-setup-20150529172808-on9y3z.log
broken symlink:

[root@dell-r210ii-13 ~]# ls -l /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata
lrwxrwxrwx. 1 vdsm kvm 132 May 29 17:15 /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata -> /var/run/vdsm/storage/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/e2124bb1-bd54-4527-90ce-903e9bf7daf1/1ed25ddd-1fbf-4c16-ac24-1becbf1e6fc7
[root@dell-r210ii-13 ~]# find /var/run/vdsm/
/var/run/vdsm/
/var/run/vdsm/lvm
/var/run/vdsm/lvm/lvm.conf
/var/run/vdsm/client.log
/var/run/vdsm/nets_restored
/var/run/vdsm/svdsm.sock
/var/run/vdsm/v2v
/var/run/vdsm/trackedInterfaces
/var/run/vdsm/sourceRoutes
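(The find output shows no storage/ directory under /var/run/vdsm at all, so the link target cannot resolve. A quick, self-contained way to check for a dangling symlink of this kind — scratch paths below, not the real mount:)

```shell
# Demonstrate detecting a dangling symlink like the one above, in a
# scratch directory so the check is self-contained.
tmp=$(mktemp -d)
ln -s "$tmp/storage/does-not-exist" "$tmp/hosted-engine.metadata"

link="$tmp/hosted-engine.metadata"
# -L: the path is a symlink; ! -e: its target does not resolve
if [ -L "$link" ] && [ ! -e "$link" ]; then
    echo "dangling symlink: $link -> $(readlink "$link")"
fi

# with GNU find, -xtype l matches broken symlinks directly
find "$tmp" -xtype l

rm -rf "$tmp"
```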
vdsm-4.17.0-822.git9b11a18.el7.noarch on RHEL 7.1, vdsm not running yet (the error occurred during hosted-engine --deploy on the 2nd host)
The original issue was this one:

  File "/usr/share/vdsm/storage/clusterlock.py", line 261, in hasHostId
    hostId, self._idsPath)
TypeError: argument 2 must be integer, not str

and now it seems OK, because setup goes further. It has also been marked as verified for 3.5.3: https://bugzilla.redhat.com/1221290

With VDSM 4.17 we are facing an additional issue:

2015-05-29 17:28:34 ERROR otopi.context context._executeMethod:164 Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'

which was also reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1226670

Please handle this separately.
Verified on ovirt-hosted-engine-setup-1.3.0-0.4.beta.git42eb801.el7ev.noarch.
Deployment of an additional host on NFS storage passed without any errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0375.html