Created attachment 960409 [details]
setup and vdsm logs

Description of problem:
Deployed hosted-engine using iSCSI. When I reached the phase where the RHEV-M installation was completed on the engine VM, I picked:

(1) Continue setup - engine installation is complete

Then the installation crashed with:

          (1, 2, 3)[1]: 1
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
          Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20141123113619.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

Version-Release number of selected component (if applicable):
rhev 3.5 vt10.1
vdsm-4.16.7.4-1.el7ev.x86_64
ovirt-hosted-engine-ha-1.2.4-1.el7.noarch
ovirt-hosted-engine-setup-1.2.1-3.el7ev.noarch

Host:
[root@green-vdsb ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.0 (Maipo)

How reproducible:
Tried once

Steps to Reproduce:
1. hosted-engine --deploy, pick iscsi
2. When the RHEV-M installation is completed on the engine VM, continue the deployment
3.

Actual results:
Deployed the hosted-engine on a 'clean' environment (no VM running on the host and no storage domain existing on the used LUN). Deployment failed with the error above.

Expected results:
Deployment should succeed

Additional info:
setup and vdsm logs attached
Created attachment 960411 [details]
updated logs

From vdsm.log:

Thread-71::DEBUG::2014-11-23 11:34:11,689::domainMonitor::201::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 31e980f2-ebca-4c73-bb29-0f8a8b06ef48
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 198, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('31e980f2-ebca-4c73-bb29-0f8a8b06ef48', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

Attaching the relevant logs.
Created attachment 960412 [details]
sanlock.log
Looked at the attached logs; I can't see much more than what is already shown above. The sanlock log ends half an hour before this error. I already talked in private with Elad and he said the relevant data (machines, logs, etc.) were already removed. Next time, please keep and also attach logs from the engine VM (especially host-deploy). I'll leave it for Sandro to have a look too; he might be able to see things I did not.
David, in case we fail to reproduce, what should QE collect next time to help debug this? All of dom_md? Parts of it? Anything else? Thanks!
Federico, what should be collected to help debug this on the vdsm side? Thanks!
It's not reproducible on RHEL7 with RHEV-M 3.5 vt11.

In the past we had an issue with sanlock and SELinux on RHEL7, but it's now solved.

Tried with:
ovirt-hosted-engine-setup.noarch  1.2.1-4.el7ev        @qa-latest
selinux-policy.noarch             3.12.1-153.el7_0.12  @rhel7z
selinux-policy-targeted.noarch    3.12.1-153.el7_0.12  @rhel7z
vdsm.x86_64                       4.16.7.5-1.el7ev     @qa-latest

Please ensure you have the latest rpms from @rhel7z.
(In reply to Simone Tiraboschi from comment #6)
> It's not reproducible on RHEL7 with RHEV-M 3.5 vt11.
>
> In the past we had an issue with sanlock and SELinux on RHEL7, but it's now
> solved.
>
> Tried with:
> ovirt-hosted-engine-setup.noarch  1.2.1-4.el7ev        @qa-latest
> selinux-policy.noarch             3.12.1-153.el7_0.12  @rhel7z
> selinux-policy-targeted.noarch    3.12.1-153.el7_0.12  @rhel7z
> vdsm.x86_64                       4.16.7.5-1.el7ev     @qa-latest
>
> Please ensure you have the latest rpms from @rhel7z.

Actually, I'm using newer SELinux packages:

[root@green-vdsb ~]# rpm -qa |grep selinux
libselinux-2.2.2-6.el7.x86_64
selinux-policy-3.13.1-6.el7.noarch
libselinux-ruby-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
libselinux-utils-2.2.2-6.el7.x86_64
selinux-policy-targeted-3.13.1-6.el7.noarch

This seems to be similar to https://bugzilla.redhat.com/show_bug.cgi?id=1167277

Please check that; re-opening for now.
Collecting the output of 'sanlock status' and 'sanlock log_dump' would probably show you what's making the lockspace busy. That part might already be obvious from the vdsm side, I don't know.
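For QE automation, capturing both outputs at failure time could look like the sketch below. This is illustrative only (Python 2 style, matching the vdsm host); the helper name and the output path are arbitrary examples, while 'sanlock status' and 'sanlock log_dump' are standard subcommands of the sanlock client:

import subprocess

def collect_sanlock_diagnostics(out_path="/tmp/sanlock-diag.txt"):
    # Fold stderr into the capture so that a dead daemon still leaves
    # a useful trace in the file instead of aborting the collection.
    with open(out_path, "w") as out:
        for cmd in (["sanlock", "status"], ["sanlock", "log_dump"]):
            out.write("=== %s ===\n" % " ".join(cmd))
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT)
            output, _ = proc.communicate()
            out.write(output)
            out.write("\n")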
(In reply to Elad from comment #7)
> > Please ensure you have the latest rpms from @rhel7z
>
> Actually, I'm using newer SELinux packages:

Are you using an alpha release of RHEL 7.1?
(In reply to Simone Tiraboschi from comment #9)
> (In reply to Elad from comment #7)
> > > Please ensure you have the latest rpms from @rhel7z
> >
> > Actually, I'm using newer SELinux packages:
>
> Are you using an alpha release of RHEL 7.1?

No, I used that host before to test something with these specific SELinux packages.
It's not reproducible on RHEL-7.1-Alpha-1.2 with:

ovirt-hosted-engine-ha.noarch     1.2.4-1.el7
ovirt-hosted-engine-setup.noarch  1.2.1-4.el7ev
selinux-policy.noarch             3.13.1-9.el7
selinux-policy-targeted.noarch    3.13.1-9.el7
vdsm.x86_64                       4.16.7.5-1.el7ev

Closing for insufficient data. Feel free to reopen if you are able to reproduce it.
Created attachment 961598 [details]
/var/log

Reproduced. Attaching the logs and re-opening.

[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20141126125221.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

Red Hat Enterprise Linux Server release 7.0 (Maipo)
ovirt-hosted-engine-setup-1.2.1-3.el7ev.noarch
ovirt-hosted-engine-ha-1.2.4-1.el7.noarch
libselinux-2.2.2-6.el7.x86_64
libselinux-ruby-2.2.2-6.el7.x86_64
selinux-policy-3.12.1-153.el7.noarch
libselinux-utils-2.2.2-6.el7.x86_64
selinux-policy-targeted-3.12.1-153.el7.noarch
libselinux-python-2.2.2-6.el7.x86_64
libvirt-daemon-1.1.1-29.el7_0.3.x86_64
vdsm-4.16.7.4-1.el7ev.x86_64
sanlock-3.1.0-2.el7.x86_64

/var/log attached
Created attachment 961601 [details]
updated logs

Uploading /var/log from the engine as well, plus the answer file.
(In reply to David Teigland from comment #8)
> Collecting the output of 'sanlock status' and 'sanlock log_dump' would
> probably show you what's making the lockspace busy. That part might already
> be obvious from the vdsm side, I don't know.

It's still probably something around sanlock:

Thread-73::DEBUG::2014-11-26 12:50:15,103::domainMonitor::201::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 198, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
MainThread::DEBUG::2014-11-26 12:50:15,104::taskManager::90::Storage.TaskManager::(prepareForShutdown) Request to stop all tasks

I'm uploading status and log_dump from that host.
Created attachment 961721 [details]
sanlock_log_dump
Created attachment 961722 [details]
sanlock_status
rem_lockspace(SANLK_REM_UNUSED) returns -EBUSY if resource leases are still held in the lockspace, which is the case here:

r 7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9:ddeb2ca1-bd47-4dc0-b561-484161df8a0b:/dev/7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9/leases:111149056:3 p 15422

2014-11-26 12:20:14+0200 8038 [10967]: cmd_acquire 2,10,15422 ci_in 4 fd 12 count 1
2014-11-26 12:20:14+0200 8038 [10967]: s6:r15 resource 7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9:ddeb2ca1-bd47-4dc0-b561-484161df8a0b:/dev/7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9/leases:111149056 for 2,10,15422
...
2014-11-26 12:50:15+0200 9839 [10966]: cmd_rem_lockspace 4,12 7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9 flags 2
2014-11-26 12:50:15+0200 9839 [10966]: cmd_rem_lockspace 4,12 done -16

If vdsm does not want a resource lease to block the removal of the lockspace, it can drop the REM_UNUSED flag.
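For illustration, the same distinction seen through the sanlock Python bindings that vdsm uses. This is a sketch, not vdsm's actual code: the UUID, host id and ids path are placeholders, and the 'unused' keyword is assumed to match the 3.x bindings:

import sanlock

SD_UUID = "7e0e3617-b7c3-47ba-a2ba-ef0fd564cbb9"  # placeholder lockspace name
IDS_PATH = "/dev/%s/ids" % SD_UUID                # placeholder ids volume
HOST_ID = 1

try:
    # unused=True maps to SANLK_REM_UNUSED: remove the lockspace only if
    # no resource leases are held in it. With the HostedEngine VM's volume
    # lease still acquired, this fails with errno 16 (EBUSY), as seen above.
    sanlock.rem_lockspace(SD_UUID, HOST_ID, IDS_PATH, unused=True)
except sanlock.SanlockException as e:
    print("lockspace still busy: %s" % e)

# Without the flag, sanlock removes the lockspace even if leases are held,
# which is only safe when the caller knows the lease owner (the VM) is gone.
sanlock.rem_lockspace(SD_UUID, HOST_ID, IDS_PATH)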
Moving to VDSM team.
Created attachment 965579 [details]
logs from RHEL6 reproduce

Deployment fails every time. Tried with RHEL7.0 and RHEL6.6. The host was reprovisioned before each deployment and the LUNs were created right before it (they are clean). Attaching logs from the RHEL6 host.

rhev 3.5 vt13.1
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
vdsm-4.16.8.1-2.el6ev.x86_64
libselinux-utils-2.0.94-5.8.el6.x86_64
libselinux-2.0.94-5.8.el6.x86_64
selinux-policy-3.7.19-260.el6.noarch
selinux-policy-targeted-3.7.19-260.el6.noarch
libselinux-ruby-2.0.94-5.8.el6.x86_64
libselinux-python-2.0.94-5.8.el6.x86_64
sanlock-2.8-1.el6.x86_64

Thread-82::INFO::2014-12-07 16:38:26,071::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 0b0a80b7-539a-4d82-8e8e-8a25257c0b74 (id: 1)
Thread-82::DEBUG::2014-12-07 16:38:26,071::domainMonitor::201::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 0b0a80b7-539a-4d82-8e8e-8a25257c0b74
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 198, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('0b0a80b7-539a-4d82-8e8e-8a25257c0b74', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
Elad, can we retry with selinux-policy-targeted-3.7.19-260.el6_6.1 please? This is quite possibly an issue we already encountered.
The bug reproduced every time using RHEL7.0 with:

libselinux-2.2.2-6.el7.x86_64
selinux-policy-3.13.1-6.el7.noarch
libselinux-ruby-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
libselinux-utils-2.2.2-6.el7.x86_64
selinux-policy-targeted-3.13.1-6.el7.noarch

Using RHEL6.6, I have not managed to reproduce with the latest policy:

libselinux-utils-2.0.94-5.8.el6.x86_64
selinux-policy-targeted-3.7.19-260.el6_6.1.noarch
libselinux-2.0.94-5.8.el6.x86_64
libselinux-ruby-2.0.94-5.8.el6.x86_64
selinux-policy-3.7.19-260.el6_6.1.noarch
libselinux-python-2.0.94-5.8.el6.x86_64
Elad, can you reproduce this in permissive mode? If not, please attach /var/log/audit/audit.log*

This looks like bug 1160808, which was fixed on RHEL 6.6.
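If permissive mode avoids the failure, the blocked operations should show up as AVC records in the audit log. A rough way to filter them (a sketch only; the matched substrings are assumptions about the usual AVC record format, and the helper name is illustrative):

import glob

def sanlock_avc_denials(pattern="/var/log/audit/audit.log*"):
    # Collect audit records that look like AVC denials mentioning sanlock.
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                if "avc:" in line and "denied" in line and "sanlock" in line:
                    hits.append(line.rstrip())
    return hits

for record in sanlock_avc_denials():
    print(record)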
(In reply to Nir Soffer from comment #23)
> This looks like bug 1160808, which was fixed on RHEL 6.6.

For 6.6, this is indeed fixed, as Elad mentioned in comment 22:

> Using RHEL6.6, I have not managed to reproduce with the latest policy:
>
> libselinux-utils-2.0.94-5.8.el6.x86_64
> selinux-policy-targeted-3.7.19-260.el6_6.1.noarch
> libselinux-2.0.94-5.8.el6.x86_64
> libselinux-ruby-2.0.94-5.8.el6.x86_64
> selinux-policy-3.7.19-260.el6_6.1.noarch
> libselinux-python-2.0.94-5.8.el6.x86_64

For 7.0, this DOES reproduce:

> The bug reproduced every time using RHEL7.0 with:
>
> libselinux-2.2.2-6.el7.x86_64
> selinux-policy-3.13.1-6.el7.noarch
> libselinux-ruby-2.2.2-6.el7.x86_64
> libselinux-python-2.2.2-6.el7.x86_64
> libselinux-utils-2.2.2-6.el7.x86_64
> selinux-policy-targeted-3.13.1-6.el7.noarch
Looking in the vdsm log, we can see that the HostedEngine VM was started:

Thread-162::DEBUG::2014-11-20 17:06:10,172::BindingXMLRPC::1132::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine': 'rhel6.5.0', 'vmId': '1ee80915-9835-46ae-86e6-9c57a40d7aab', 'devices': [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': '0ecf3686-ac2f-469a-96d7-c0df22abda01', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'optional': 'false', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '1e0e7362-985d-4910-b6d5-e56b46db4c04', 'imageID': '2c12b8fc-4ea4-4b24-b97f-85fd3d7b2240', 'specParams': {}, 'readonly': 'false', 'domainID': '57d4532d-179a-409d-baf4-deefa75d4acd', 'deviceId': '2c12b8fc-4ea4-4b24-b97f-85fd3d7b2240', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3E:76:D5:D5', 'linkActive': 'true', 'network': 'rhevm', 'bootOrder': '1', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '8024ca7a-5a57-4fed-b649-ef3648862858', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '83951312-e41f-4b70-ab7b-39bbe798ef18', 'alias': 'console0'}], 'smp': '2', 'memSize': '4096', 'cpuType': 'Conroe', 'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'vmName': 'HostedEngine', 'display': 'qxl'},) {}

This VM is holding a lease on the domain 57d4532d-179a-409d-baf4-deefa75d4acd:

<lease>
    <key>1e0e7362-985d-4910-b6d5-e56b46db4c04</key>
    <lockspace>57d4532d-179a-409d-baf4-deefa75d4acd</lockspace>
    <target offset="111149056" path="/dev/57d4532d-179a-409d-baf4-deefa75d4acd/leases"/>
</lease>

Then a local client asks vdsm to stop monitoring this domain:

Thread-170::DEBUG::2014-11-20 17:11:01,935::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-170::INFO::2014-11-20 17:11:01,936::logUtils::44::dispatcher::(wrapper) Run and protect: stopMonitoringDomain(sdUUID='57d4532d-179a-409d-baf4-deefa75d4acd', options=None)

And this operation fails (expected), because the VM is holding a lease on this domain:

Thread-170::INFO::2014-11-20 17:11:01,937::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 57d4532d-179a-409d-baf4-deefa75d4acd
Thread-71::DEBUG::2014-11-20 17:11:01,938::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 57d4532d-179a-409d-baf4-deefa75d4acd
Thread-71::INFO::2014-11-20 17:11:01,938::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 57d4532d-179a-409d-baf4-deefa75d4acd (id: 1)
Thread-71::DEBUG::2014-11-20 17:11:01,938::domainMonitor::201::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain 57d4532d-179a-409d-baf4-deefa75d4acd
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 198, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('57d4532d-179a-409d-baf4-deefa75d4acd', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

The domain monitor thread does exit normally after such errors - releasing the host id is the last thing a domain monitor thread does. I don't see any other storage-related error, so I think this bug should move back to the hosted engine developers.

Looking in ovirt-hosted-engine-setup-20141123104216-axscu4.log, this log is not very useful since a lot of info is **FILTERED**, which makes it impossible to match with the vdsm log. To make progress with this, an unfiltered log is needed.
(In reply to Nir Soffer from comment #25)
> Thread-170::DEBUG::2014-11-20 17:11:01,935::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
> Thread-170::INFO::2014-11-20 17:11:01,936::logUtils::44::dispatcher::(wrapper) Run and protect: stopMonitoringDomain(sdUUID='57d4532d-179a-409d-baf4-deefa75d4acd', options=None)
>
> And this operation fails (expected), because the VM is holding a lease on this domain:

HE setup calls stopMonitoringDomain only at cleanup.

Let's focus on the description of the problem:

[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20141123113619.conf'

stopMonitoringDomain is called only after
[ INFO ] Stage: Clean up

The issue here is why we have:
[Errno 111] Connection refused
and why we have:
[ ERROR ] The VDSM host was found in a failed state.
(In reply to Sandro Bonazzola from comment #29)
> (In reply to Nir Soffer from comment #25)
> > And this operation fails (expected), because the VM is holding a lease
> > on this domain:
>
> HE setup calls stopMonitoringDomain only at cleanup.
>
> Let's focus on the description of the problem:
>
> [ ERROR ] The VDSM host was found in a failed state. Please check engine and
> bootstrap installation logs.
> [ ERROR ] Unable to add hosted_engine_1 to the manager
>           Please shutdown the VM allowing the system to launch it as a
>           monitored service.
>           The system will wait until the VM is down.
> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection
> refused
> [ INFO ] Stage: Clean up
> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
> [ INFO ] Generating answer file
> '/var/lib/ovirt-hosted-engine-setup/answers/answers-20141123113619.conf'
>
> stopMonitoringDomain is called only after
> [ INFO ] Stage: Clean up
>
> the issue here is why we have:
> [Errno 111] Connection refused

This is a good question - but not a storage issue. You need to understand what hosted engine setup was doing up to the point where vdsm stopped listening for connections, and how long vdsm was in this state. Generally vdsm should always be up unless you stop the service.

> and why we have
> [ ERROR ] The VDSM host was found in a failed state.

I don't know what a "failed state" is.
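For reference, [Errno 111] is ECONNREFUSED: nothing was accepting connections on vdsm's port when setup tried to talk to it. A trivial probe along those lines (54321 is vdsm's default XML-RPC port; the helper itself is illustrative):

import errno
import socket

def vdsm_listening(host="localhost", port=54321, timeout=5.0):
    # Returns False exactly in the case that surfaces in the setup output
    # as "[Errno 111] Connection refused".
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error as e:
        if e.errno == errno.ECONNREFUSED:
            return False
        raise
    finally:
        s.close()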
Removing blocked bugs since it was re-targeted.
In order to advance with this issue we need a reproducer to investigate. Can you please provide the environment needed to debug it?
Will answer offline
Reproduced on RHEL 7.1 with VT14:

Thread-104::ERROR::2015-03-10 19:08:42,954::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`63d035b1-1e12-45f1-b9e0-8a4ee7c7e8e2`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3335, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3424, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: Unable to apply rule 'The name org.fedoraproject.FirewallD1 was not provided by any .service files'
Thread-104::DEBUG::2015-03-10 19:08:43,063::vm::2786::vm.Vm::(setDownStatus) vmId=`63d035b1-1e12-45f1-b9e0-8a4ee7c7e8e2`::Changed state to Down: internal error: Unable to apply rule 'The name org.fedoraproject.FirewallD1 was not provided by any .service files' (code=1)
Looks like a libvirt restart is needed between stopping firewalld and starting iptables: libvirtd probes for firewalld over D-Bus when it starts, so if the firewall backend changes under a running daemon it keeps calling the now-absent org.fedoraproject.FirewallD1 service.
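A sketch of the ordering that avoids the error above (standard EL7 unit names; whether setup should drive this itself is the open question):

import subprocess

# The ordering is the point: restart libvirtd only after the firewall
# backend has changed, so it re-detects that firewalld is gone and falls
# back to driving iptables directly.
for cmd in (["systemctl", "stop", "firewalld"],
            ["systemctl", "start", "iptables"],
            ["systemctl", "restart", "libvirtd"]):
    subprocess.check_call(cmd)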
Cannot be tested due to https://bugzilla.redhat.com/show_bug.cgi?id=1215623
Hosted-engine deployment succeeds and the iptables service is active.

Used:
ovirt-3.6.0-alpha3
ovirt-hosted-engine-ha-1.3.0-0.0.master.20150615153650.20150615153645.git5f8c290.el7.noarch
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150729070044.git26149d7.el7.noarch
iptables-services-1.4.21-13.el7.x86_64
iptables-1.4.21-13.el7.x86_64
vdsm-4.17.0-1229.git8299061.el7.noarch
I faced this issue and observed that firewalld was inactive. Just to try, I started firewalld; the issue then disappeared and I was able to create a new virtual network. I am very happy now!
*** Bug 1278715 has been marked as a duplicate of this bug. ***
Reproduced on latest components:

Host:
ovirt-hosted-engine-setup-1.3.2-0.0.master.20151215104504.git25b97ca.el7.centos.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-release36-snapshot-002-2.noarch
vdsm-4.17.13-35.gitcbc2303.el7.centos.noarch
mom-0.5.1-2.el7.noarch
ovirt-engine-sdk-python-3.6.1.1-0.1.20151127.git2400b22.el7.centos.noarch
ovirt-vmconsole-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-release36-002-2.noarch
ovirt-hosted-engine-ha-1.3.3.6-1.20151221120517.gita5d04b3.el7.noarch
ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
ovirt-vmconsole-host-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-setup-lib-1.0.1-0.0.master.20151221171731.gitbf50a11.el7.centos.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
libvirt-client-1.2.17-13.el7_2.2.x86_64
Linux version 3.10.0-327.4.4.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Dec 17 15:51:24 EST 2015

Engine:
rhevm-3.6.1.3-0.1.el6.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.4-1.el6ev.noarch
Linux version 2.6.32-573.8.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Fri Sep 25 19:24:22 EDT 2015

[ INFO ] Engine replied: DB Up!Welcome to Health Status!
[ INFO ] Acquiring internal CA cert from the engine
[ INFO ] The following CA certificate is going to be used, please immediately interrupt if not correct:
[ INFO ] Issuer: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.31514, Subject: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.31514, Fingerprint (SHA-1): 8C7DA7D7948D9B8446A39DD3F86B105E83F38353
[ INFO ] Connecting to the Engine
          Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ ERROR ] Failed to execute stage 'Closing up': VDSM did not start within 120 seconds
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151222150453.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
Created attachment 1108639 [details]
logs from host and engine (sosreports) and configuration log from deployment of the HE
Host-deploy failed deploying the host due to an RPM conflict. For some reason, on that host you have qemu-kvm-tools-ev from the upstream oVirt repo and qemu-kvm-tools-rhev (notice the -rhev instead of the -ev!!!) from the rhev repo. Can you please try to reproduce on a clean environment?

2015-12-22 15:02:50 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Status: Check Package Signatures
2015-12-22 15:02:50 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Status: Running Test Transaction Running Transaction Check
2015-12-22 15:02:50 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:100 Yum Test Transaction Errors: file /usr/share/qemu-kvm/tscdeadline_latency.flat from install of qemu-kvm-tools-ev-10:2.3.0-31.el7_2.4.1.x86_64 conflicts with file from package qemu-kvm-tools-rhev-10:2.3.0-31.el7_2.5.x86_64
2015-12-22 15:02:50 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-aiBYLewGtE/pythonlib/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/tmp/ovirt-aiBYLewGtE/otopi-plugins/otopi/packagers/yumpackager.py", line 274, in _packages
    self._miniyum.processTransaction()
  File "/tmp/ovirt-aiBYLewGtE/pythonlib/otopi/miniyum.py", line 1054, in processTransaction
    rpmDisplay=self._RPMCallback(sink=self._sink)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 6503, in processTransaction
    self._doTestTransaction(callback, display=rpmTestDisplay)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 6600, in _doTestTransaction
    raise Errors.YumTestTransactionError, errstring
YumTestTransactionError: Test Transaction Errors: file /usr/share/qemu-kvm/tscdeadline_latency.flat from install of qemu-kvm-tools-ev-10:2.3.0-31.el7_2.4.1.x86_64 conflicts with file from package qemu-kvm-tools-rhev-10:2.3.0-31.el7_2.5.x86_64
(In reply to Nikolai Sednev from comment #42)
> Reproduced on latest components:
> Host:
> ovirt-release36-snapshot-002-2.noarch
> Engine:
> rhevm-3.6.1.3-0.1.el6.noarch

OK, the issue is just here: you mixed upstream (on the host) and downstream (on the engine VM) components. Host-deploy from that rhevm will try to deploy the host using downstream components, and this will obviously generate a conflict.

3.6.2 will solve this one too by forcefully disabling the ovirt-host-deploy packager while deploying the host where the engine VM runs (see https://bugzilla.redhat.com/show_bug.cgi?id=1236588).

For now, please avoid mixing upstream and downstream components.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0375.html