Bug 1216172 - [self-hosted] Can't add 2nd host into self-hosted env: The VDSM host was found in a failed state... Unable to add slot-5b to the manager
Summary: [self-hosted] Can't add 2nd host into self-hosted env: The VDSM host was foun...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Simone Tiraboschi
QA Contact: Artyom
URL:
Whiteboard:
Depends On: 1215967 1221148 1226670 1271272
Blocks: 1221290 1234915
Reported: 2015-04-28 16:37 UTC by Jiri Belka
Modified: 2019-09-12 08:25 UTC (History)
CC List: 13 users

Fixed In Version: ovirt-3.6.0-alpha1
Doc Type: Bug Fix
Doc Text:
Previously, HostId was treated as an integer on the first host and as a string on additional hosts due to bad parsing of the answerfile, causing setup to fail. Now, this failure has been fixed by treating HostId as an integer on all hosts.
Clone Of:
Clones: 1221290
Environment:
Last Closed: 2016-03-09 19:12:01 UTC
oVirt Team: Integration
Target Upstream Version:


Attachments
logs from 2nd host (643.28 KB, application/x-gzip), 2015-04-28 16:37 UTC, Jiri Belka
engine logs (1.16 MB, application/x-gzip), 2015-04-28 16:38 UTC, Jiri Belka
ovirt-hosted-engine-setup-20150529172808-on9y3z.log (251.10 KB, text/plain), 2015-05-29 15:32 UTC, Jiri Belka


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:0375 normal SHIPPED_LIVE ovirt-hosted-engine-setup bug fix and enhancement update 2016-03-09 23:48:34 UTC
oVirt gerrit 40778 master MERGED packaging: setup: always storing HOST_ID as an int Never

Description Jiri Belka 2015-04-28 16:37:31 UTC
Created attachment 1019789
logs from 2nd host

Description of problem:

I can't add a second host to the self-hosted environment; the first host runs fine.

- There is a problem with hosted-engine --deploy: the rhevm bridge is not created successfully. I created it manually (the IP was still on the underlying em1 device), then re-ran hosted-engine --deploy.

[ INFO  ] Configuring VM
[ INFO  ] Updating hosted-engine configuration
[ INFO  ] Stage: Transaction commit
[ INFO  ] Stage: Closing up
[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add slot-5b to the manager
[ INFO  ] Enabling and starting HA services
          Hosted Engine successfully set up
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150428182105.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

vdsm.log is full of Python exceptions...
Thread-47::DEBUG::2015-04-28 18:20:02,837::fileSD::261::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/10.34.63.202:_mnt_export_nfs_lv2___brq-setup/23c03bb6-9889-4cbf-b7ad-55b9a2c70653/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-47::DEBUG::2015-04-28 18:20:02,842::fileSD::261::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n497 bytes (497 B) copied, 0.000312696 s, 1.6 MB/s\n'; <rc> = 0
Thread-47::ERROR::2015-04-28 18:20:02,845::domainMonitor::256::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 23c03bb6-9889-4cbf-b7ad-55b9a2c70653 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 250, in _monitorDomain
    self.nextStatus.hasHostId = self.domain.hasHostId(self.hostId)
  File "/usr/share/vdsm/storage/sd.py", line 483, in hasHostId
    return self._clusterLock.hasHostId(hostId)
  File "/usr/share/vdsm/storage/clusterlock.py", line 261, in hasHostId
    hostId, self._idsPath)
TypeError: argument 2 must be integer<k>, not str

...


MainThread::DEBUG::2015-04-28 18:24:18,293::protocoldetector::144::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
ioprocess communication (36158)::ERROR::2015-04-28 18:24:18,292::__init__::152::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
    raise Exception("FD closed")
Exception: FD closed

Version-Release number of selected component (if applicable):
vdsm-4.16.13.1-1.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.2-3.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. two hosts, one host part of self-hosted engine
2. have rhevm env working
3. add 2nd host into self-hosted engine

Actual results:
setup fails in the end, seems vdsm related

Expected results:
should work, it should be "HA"

Additional info:

Comment 1 Jiri Belka 2015-04-28 16:38:17 UTC
Created attachment 1019790
engine logs

Comment 2 Pavel Stehlik 2015-04-29 06:26:07 UTC
Please retry now. Due to EMC storage policy, its IQN needs to be allowed on the shares. I've just added its IQN to the list.

Comment 3 Jiri Belka 2015-04-29 07:42:43 UTC
No, still same issue.

Comment 6 Jiri Belka 2015-05-29 15:32:03 UTC
While adding the 2nd host with ovirt-hosted-engine-setup-1.3.0-0.0.master.20150518075146.gitdd9741f.el7.noarch:

...
          --== HOSTED ENGINE CONFIGURATION ==--
         
          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_2]: 
          Enter 'admin@internal' user password that will be used for accessing the Administrator Portal: 
          Confirm 'admin@internal' user password: 
[WARNING] Failed to resolve jb-hosted.rhev.lab.eng.brq.redhat.com using DNS, it can be resolved only locally
[ INFO  ] Stage: Setup validation
[ ERROR ] Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150529172837.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination


...
2015-05-29 17:28:34 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:940 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stderr:


2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:141 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace.Plugin._validation
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:147 condition False
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:141 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.storage.Plugin._validation
2015-05-29 17:28:34 DEBUG otopi.context context._executeMethod:155 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 145, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 263, in _validation
    ] + ".metadata",
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 168, in get_all_host_stats_direct
    self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 119, in get_all_stats_direct
    stats = sb.get_raw_stats_for_service_type("client", service_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 125, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
2015-05-29 17:28:34 ERROR otopi.context context._executeMethod:164 Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'
...

Comment 7 Jiri Belka 2015-05-29 15:32:28 UTC
Created attachment 1032157
ovirt-hosted-engine-setup-20150529172808-on9y3z.log

Comment 8 Jiri Belka 2015-05-29 15:33:57 UTC
broken symlink:

[root@dell-r210ii-13 ~]# ls -l /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata
lrwxrwxrwx. 1 vdsm kvm 132 May 29 17:15 /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata -> /var/run/vdsm/storage/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/e2124bb1-bd54-4527-90ce-903e9bf7daf1/1ed25ddd-1fbf-4c16-ac24-1becbf1e6fc7

[root@dell-r210ii-13 ~]# find /var/run/vdsm/
/var/run/vdsm/
/var/run/vdsm/lvm
/var/run/vdsm/lvm/lvm.conf
/var/run/vdsm/client.log
/var/run/vdsm/nets_restored
/var/run/vdsm/svdsm.sock
/var/run/vdsm/v2v
/var/run/vdsm/trackedInterfaces
/var/run/vdsm/sourceRoutes
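
The listing above shows that the symlink's target under /var/run/vdsm/storage does not exist. As a rough diagnostic aid, a dangling symlink like this can be detected with a few lines of Python. This is only a sketch: the helper name is_dangling_symlink is hypothetical and is not part of hosted-engine setup or VDSM.

```python
import os


def is_dangling_symlink(path):
    """Return True if path is a symlink whose target does not exist."""
    # os.path.islink() inspects the link itself; os.path.exists() follows
    # the link and returns False when the target is missing.
    return os.path.islink(path) and not os.path.exists(path)


if __name__ == "__main__":
    import sys

    # Usage: pass one or more paths to check on the command line.
    for candidate in sys.argv[1:]:
        state = "dangling" if is_dangling_symlink(candidate) else "ok or not a symlink"
        print(candidate, state)
```

Run against the hosted-engine.metadata path from the ls output, this should report the link as dangling for as long as the target under /var/run/vdsm/storage is missing.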

Comment 9 Jiri Belka 2015-05-29 15:35:17 UTC
vdsm-4.17.0-822.git9b11a18.el7.noarch

on RHEL 7.1; vdsm is not running yet (the error occurred during hosted-engine --deploy on the 2nd host)

Comment 10 Simone Tiraboschi 2015-06-08 07:59:11 UTC
The original issue was this one:

File "/usr/share/vdsm/storage/clusterlock.py", line 261, in hasHostId
    hostId, self._idsPath)
TypeError: argument 2 must be integer<k>, not str

and now it seems OK because it goes further.
It has also been marked as verified for 3.5.3: https://bugzilla.redhat.com/1221290
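
To illustrate the original TypeError: the host ID reached the call in clusterlock.py as a string instead of an integer. A minimal Python sketch of the kind of normalization the fix describes (always storing the host ID as an int) could look like this. The function name normalize_host_id is hypothetical and is not taken from ovirt-hosted-engine-setup; the actual change is the one tracked in oVirt gerrit 40778.

```python
def normalize_host_id(raw):
    """Return a host ID as an int, whether it was read as an int or as a numeric string."""
    # bool is a subclass of int; reject it so True/False never become host IDs.
    if isinstance(raw, bool):
        raise TypeError("host ID must be an integer, not bool")
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        # int() raises ValueError for non-numeric text such as "abc".
        return int(raw.strip())
    raise TypeError("host ID must be an int or a numeric string")


# Assumption: a value parsed from an answer file may arrive as text
# on an additional host, while the first host already holds an int.
first_host_id = normalize_host_id(1)
second_host_id = normalize_host_id("2")
print(type(first_host_id).__name__, type(second_host_id).__name__)
```

Converting once, at the point where the value is read, keeps every later consumer working with the same type.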

With VDSM 4.17 we are facing an additional issue:
2015-05-29 17:28:34 ERROR otopi.context context._executeMethod:164 Failed to execute stage 'Setup validation': [Errno 2] No such file or directory: '/rhev/data-center/mnt/10.34.63.199:_jbelka_jb-hosted/5440dfcd-9be7-4e43-97c5-bff83cc20e9b/ha_agent/hosted-engine.metadata'

which was also reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1226670

Please handle this separately.

Comment 12 Artyom 2015-09-02 10:46:17 UTC
Verified on ovirt-hosted-engine-setup-1.3.0-0.4.beta.git42eb801.el7ev.noarch
Deployment of additional host on NFS storage passed without any errors

Comment 14 errata-xmlrpc 2016-03-09 19:12:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0375.html

