Created attachment 935150 [details]
logs

Description of problem:
HE installation fails on a freshly imaged RHEL 6.5 host with the latest 3.5 release (vt2.2). Right after the VM OS was successfully installed over iSCSI and the VM console VNC screen was closed, I proceeded with the HE installation dialogue and then received the error described below:

"Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs"

Here are the errors from the log; note that the iSCSI target had plenty of space, 1.53GB used of 30GB (94% free):

Thread-100::ERROR::2014-09-07 16:27:31,717::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
Thread-100::DEBUG::2014-09-07 16:27:31,769::vm::2813::vm.Vm::(setDownStatus) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)
Thread-78::DEBUG::2014-09-07 16:27:32,498::clusterlock::255::Storage.SANLock::(releaseHostId) Host id for domain 65f8c0e8-ca08-4e40-a1d9-55732f83b384 released successfully (id: 1)
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.HsmDomainMonitorLock'
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.HsmDomainMonitorLock' (0 active users)
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.HsmDomainMonitorLock' is free, finding out if anyone is waiting for it.
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.HsmDomainMonitorLock', Clearing records.
Thread-112::INFO::2014-09-07 16:27:32,500::logUtils::47::dispatcher::(wrapper) Run and protect: stopMonitoringDomain, Return response: None

Version-Release number of selected component (if applicable):
vdsm-4.16.3-2.el6.x86_64
sanlock-2.8-1.el6.x86_64
libvirt-0.10.2-29.el6_5.12.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
ovirt-hosted-engine-setup-1.2.0-0.2.master.el6_5.noarch
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014

How reproducible:
100%

Steps to Reproduce:
1. On a freshly imaged RHEL 6.5 host with the latest repos for 3.5, install HE.
2. Use the iSCSI option and start the HE installation process.
3. Use the VNC console option for the VM and, after the RHEL 6.5 OS is installed via PXE, continue with the setup.

Actual results:
"Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs"

Expected results:
Setup should proceed further and succeed.

Additional info:
Logs from host attached.
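Note that the "No space left on device" text is surfaced by libvirt from the failed sanlock lease acquisition, so it does not necessarily mean the storage domain is full. A minimal sketch for double-checking free space on the domain from the host, assuming vdsm-cli is installed and using the domain UUID from the log above:

# Ask VDSM for the storage domain statistics (reports total and free space).
vdsClient -s 0 getStorageDomainStats 65f8c0e8-ca08-4e40-a1d9-55732f83b384
# If vdsClient was configured without SSL, drop the -s flag.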
The same scenario, but with an NFS3 SD, was tested on another freshly installed host; logs attached.
Created attachment 935157 [details]
logs from NFS HE deployment from host
Indeed the VM failed to start; vdsm.log:

<lease>
  <key>c8ab9ceb-d09a-4e97-aaca-690e2979e77c</key>
  <lockspace>b2151a75-ab4d-402c-adbc-685692cdeca1</lockspace>
  <target offset="0" path="/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev__HE__3__5/b2151a75-ab4d-402c-adbc-685692cdeca1/images/ced30854-7fc5-4328-85b6-f75a827d9c12/c8ab9ceb-d09a-4e97-aaca-690e2979e77c.lease"/>
</lease>

Thread-88::ERROR::2014-09-07 17:37:42,076::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`99b0555e-7db2-4af8-8d8b-66af36caf69b`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
Thread-88::DEBUG::2014-09-07 17:37:42,086::vm::2813::vm.Vm::(setDownStatus) vmId=`99b0555e-7db2-4af8-8d8b-66af36caf69b`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)

sanlock.log:

2014-09-07 17:37:41+0300 5631 [30983]: r3 cmd_acquire 2,12,33474 invalid lockspace found -1 failed 0 name b2151a75-ab4d-402c-adbc-685692cdeca1
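The sanlock line ("invalid lockspace found") suggests the lockspace for the storage domain was not registered at the moment the lease acquire was attempted, which would explain the misleading ENOSPC-style message from libvirt. A minimal sketch for checking this on the host, assuming the default log locations and the UUIDs from the excerpt above:

# Show the lockspaces and resources the sanlock daemon currently holds;
# the storage domain UUID should be listed as a lockspace while the host id is held.
sanlock client status

# Dump sanlock's internal log for context around the failed acquire.
sanlock client log_dump

# Cross-check whether VDSM released/acquired the host id around the VM start.
grep -E 'releaseHostId|acquireHostId' /var/log/vdsm/vdsm.log | grep b2151a75-ab4d-402c-adbc-685692cdeca1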
Jiri, is this something related to http://gerrit.ovirt.org/31751 ? Can you take a look?
(In reply to Sandro Bonazzola from comment #4)
> Jiri, is this something related to http://gerrit.ovirt.org/31751 ?
> Can you take a look?

I don't think so.
(In reply to Jiri Moskovcak from comment #5)
> (In reply to Sandro Bonazzola from comment #4)
> > Jiri, is this something related to http://gerrit.ovirt.org/31751 ?
> > Can you take a look?
>
> I don't think so.

Oops, I just realized that answer sounds a bit weird. So to avoid any confusion - I don't think it's related.
Looking at the logs, I see that there was a call to stopMonitoringDomain while the VM was starting:

Thread-112::DEBUG::2014-09-07 16:27:29,494::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-112::DEBUG::2014-09-07 16:27:29,495::task::595::Storage.TaskManager.Task::(_updateState) Task=`02874973-7ff4-4183-9a11-d5ba7ffdb6b3`::moving from state init -> state preparing
Thread-112::INFO::2014-09-07 16:27:29,495::logUtils::44::dispatcher::(wrapper) Run and protect: stopMonitoringDomain(sdUUID='65f8c0e8-ca08-4e40-a1d9-55732f83b384', options=None)
Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::198::Storage.ResourceManager.Request::(__init__) ResName=`Storage.HsmDomainMonitorLock`ReqID=`007f923c-bea4-472e-9fa5-2e12edfff0bb`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3648' at 'stopMonitoringDomain'
Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.HsmDomainMonitorLock' for lock type 'exclusive'
Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::601::Storage.ResourceManager::(registerResource) Resource 'Storage.HsmDomainMonitorLock' is free. Now locking as 'exclusive' (1 active user)
Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::238::Storage.ResourceManager.Request::(grant) ResName=`Storage.HsmDomainMonitorLock`ReqID=`007f923c-bea4-472e-9fa5-2e12edfff0bb`::Granted request
Thread-112::INFO::2014-09-07 16:27:29,497::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 65f8c0e8-ca08-4e40-a1d9-55732f83b384
Thread-78::DEBUG::2014-09-07 16:27:29,497::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 65f8c0e8-ca08-4e40-a1d9-55732f83b384
Thread-78::INFO::2014-09-07 16:27:29,498::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 65f8c0e8-ca08-4e40-a1d9-55732f83b384 (id: 1)
Thread-100::DEBUG::2014-09-07 16:27:31,717::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device
Thread-100::DEBUG::2014-09-07 16:27:31,717::vm::2294::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::_ongoingCreations released
Thread-100::ERROR::2014-09-07 16:27:31,717::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device

I can't reproduce this on my system. Can you detail what was done while the setup was starting the VM? It looks like someone/something deliberately stopped the domain monitor on the storage domain that was going to be used by the Hosted Engine VM, and that wasn't the setup. Also, are you able to reproduce?
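To see that ordering in one place, a minimal sketch for pulling the relevant events out of vdsm.log (assuming the default log path and the UUIDs from the excerpt above; vdsm.log is already in chronological order):

# Domain-monitor shutdown, host id release and the failed VM start, in log order.
grep -E 'stopMonitoringDomain|releaseHostId|_startUnderlyingVm|Failed to acquire lock' /var/log/vdsm/vdsm.log

# Narrow to the storage domain and the HE VM involved in this run.
grep -E '65f8c0e8-ca08-4e40-a1d9-55732f83b384|e4d1769d-73de-47cb-af1f-a6b030553551' /var/log/vdsm/vdsm.log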
Sure, just follow the initial reproduction steps, which means doing exactly the following:
1. Install a fresh RHEL 6.5 image on your host.
2. rm -rf /etc/yum.repos.d/*
3. vim /etc/yum.repos.d/rhev.repo
4. Copy everything from https://mojo.redhat.com/docs/DOC-962666 into the repo file.
5. yum clean all
6. yum update -y && yum install screen -y
7. yum install ovirt-hosted-engine-setup -y
8. hosted-engine --deploy

You must use the FQDN requested and reserved for your HE from eng-ops, and please reserve the MAC address beforehand so you'll get the same IP for your HE. A consolidated sketch of these steps is below.
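A consolidated sketch of the steps above; the repo contents and the HE FQDN/MAC are placeholders that must come from the DOC-962666 page and the eng-ops reservation respectively:

# Starting from a freshly imaged RHEL 6.5 host:
rm -rf /etc/yum.repos.d/*
vim /etc/yum.repos.d/rhev.repo            # paste the repo definitions from DOC-962666
yum clean all
yum update -y && yum install screen -y
yum install ovirt-hosted-engine-setup -y
hosted-engine --deploy                    # answer the prompts with the reserved HE FQDN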
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1148712; setting to ASSIGNED until fixed.
Bug #1148712 is on QA now.
Works for me on:
rhevm-3.5.0-0.21.el6ev.noarch
policycoreutils-python-2.0.83-19.47.el6_6.1.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
ovirt-hosted-engine-setup-1.2.1-4.el6ev.noarch
sanlock-2.8-1.el6.x86_64
ovirt-host-deploy-1.3.0-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-1.el6ev.noarch
libvirt-0.10.2-46.el6_6.2.x86_64
vdsm-4.16.7.5-1.el6ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0161.html