Bug 1139019 - Failed to execute stage 'Closing up': VM startup failed to acquire sanlock
Summary: Failed to execute stage 'Closing up': VM startup failed to acquire sanlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Sandro Bonazzola
QA Contact: Nikolai Sednev
URL:
Whiteboard: integration
Depends On: 1159946
Blocks: 1002454 1059435 1129261 rhev3.5beta3 1250199
 
Reported: 2014-09-07 13:55 UTC by Nikolai Sednev
Modified: 2015-08-04 17:51 UTC
CC List: 15 users

Fixed In Version: ovirt-hosted-engine-setup-1.2.0-1.el6ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-11 20:40:52 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
logs (101.83 KB, application/octet-stream), 2014-09-07 13:55 UTC, Nikolai Sednev
logs from nfs he deployment from host (73.96 KB, application/octet-stream), 2014-09-07 14:48 UTC, Nikolai Sednev


Links
Red Hat Product Errata RHBA-2015:0161 (normal, SHIPPED_LIVE): ovirt-hosted-engine-setup bug fix and enhancement update, last updated 2015-12-07 21:35:11 UTC

Description Nikolai Sednev 2014-09-07 13:55:45 UTC
Created attachment 935150 [details]
logs

Description of problem:
HE installation fails on a freshly imaged RHEL 6.5 host with the latest 3.5 release (vt2.2).

Right after the VM OS was successfully installed over iSCSI and the VM console VNC screen was closed, I proceeded with the HE installation dialogue and then received the error described below:

"Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs"

Here are the errors from the log. Note that the iSCSI target had plenty of space, with 1.53GB used of 30GB (94% free):
 
Thread-100::ERROR::2014-09-07 16:27:31,717::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
Thread-100::DEBUG::2014-09-07 16:27:31,769::vm::2813::vm.Vm::(setDownStatus) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)
Thread-78::DEBUG::2014-09-07 16:27:32,498::clusterlock::255::Storage.SANLock::(releaseHostId) Host id for domain 65f8c0e8-ca08-4e40-a1d9-55732f83b384 released successfully (id: 1)
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.HsmDomainMonitorLock'
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.HsmDomainMonitorLock' (0 active users)
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.HsmDomainMonitorLock' is free, finding out if anyone is waiting for it.
Thread-112::DEBUG::2014-09-07 16:27:32,499::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.HsmDomainMonitorLock', Clearing records.
Thread-112::INFO::2014-09-07 16:27:32,500::logUtils::47::dispatcher::(wrapper) Run and protect: stopMonitoringDomain, Return response: None
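
A note on where this error surfaces: the traceback shows VDSM calling libvirt's createXML(), and despite the wording, "No space left on device" is not about free space on the target (the reporter notes 94% free above); it is sanlock's errno relayed through libvirt when the VM's <lease> cannot be acquired. A minimal sketch of the failing call path, using the real libvirt-python API; this is not VDSM code, and domxml stands in for the domain XML that VDSM builds:

import libvirt

def start_vm(domxml):
    conn = libvirt.open('qemu:///system')
    try:
        # Raises libvirt.libvirtError("Failed to acquire lock: ...") when
        # the <lease> device in domxml cannot be acquired by sanlock.
        return conn.createXML(domxml, 0)
    finally:
        conn.close()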


Version-Release number of selected component (if applicable):
vdsm-4.16.3-2.el6.x86_64
sanlock-2.8-1.el6.x86_64
libvirt-0.10.2-29.el6_5.12.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
ovirt-hosted-engine-setup-1.2.0-0.2.master.el6_5.noarch
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014

How reproducible:
100%

Steps to Reproduce:
1. On a freshly imaged RHEL 6.5 host with the latest 3.5 repos, install HE.
2. Choose the iSCSI option and start the HE installation process.
3. Choose the VNC console option for the VM and, after RHEL 6.5 is installed via PXE, continue with the setup.

Actual results:
"Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs"

Expected results:
Setup should proceed further and succeed.

Additional info:
Logs from the host are attached.

Comment 1 Nikolai Sednev 2014-09-07 14:44:39 UTC
The same scenario, but with an NFSv3 storage domain, was tested on another freshly installed host; logs are attached.

Comment 2 Nikolai Sednev 2014-09-07 14:48:35 UTC
Created attachment 935157 [details]
logs from nfs he deployment from host

Comment 3 Dan Kenigsberg 2014-09-07 21:05:50 UTC
Indeed, the VM failed to start:

vdsm.log:
		<lease>
			<key>c8ab9ceb-d09a-4e97-aaca-690e2979e77c</key>
			<lockspace>b2151a75-ab4d-402c-adbc-685692cdeca1</lockspace>
			<target offset="0" path="/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev__HE__3__5/b2151a75-ab4d-402c-adbc-685692cdeca1/images/ced30854-7fc5-4328-85b6-f75a827d9c12/c8ab9ceb-d09a-4e97-aaca-690e2979e77c.lease"/>
		</lease>

Thread-88::ERROR::2014-09-07 17:37:42,076::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`99b0555e-7db2-4af8-8d8b-66af36caf69b`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
Thread-88::DEBUG::2014-09-07 17:37:42,086::vm::2813::vm.Vm::(setDownStatus) vmId=`99b0555e-7db2-4af8-8d8b-66af36caf69b`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)

sanlock.log:
2014-09-07 17:37:41+0300 5631 [30983]: r3 cmd_acquire 2,12,33474 invalid lockspace found -1 failed 0 name b2151a75-ab4d-402c-adbc-685692cdeca1
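
The sanlock.log line above is the actual diagnosis: "invalid lockspace" means sanlock was not a member of lockspace b2151a75-ab4d-402c-adbc-685692cdeca1 when the acquire request arrived, a condition it reports with ENOSPC, which libvirt then renders as the misleading "No space left on device". A minimal sketch of the required ordering, using the real sanlock python binding; the UUIDs are the ones from the log, while both paths are hypothetical placeholders:

import sanlock

LOCKSPACE = 'b2151a75-ab4d-402c-adbc-685692cdeca1'  # storage domain lockspace (from the log)
RESOURCE = 'c8ab9ceb-d09a-4e97-aaca-690e2979e77c'   # VM lease key (from the log)
IDS_PATH = '/path/to/dom_md/ids'                    # hypothetical placeholder
LEASE_PATH = '/path/to/vm.lease'                    # hypothetical placeholder

# The host must join the lockspace first; VDSM does this when it starts
# monitoring the storage domain and acquires its host id.
sanlock.add_lockspace(LOCKSPACE, 1, IDS_PATH)

# Only then can a resource lease inside that lockspace be acquired.
fd = sanlock.register()
sanlock.acquire(LOCKSPACE, RESOURCE, [(LEASE_PATH, 0)], slkfd=fd)

# If the lockspace is removed before acquire() runs, sanlock logs
# "invalid lockspace" and the caller sees ENOSPC.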

Comment 4 Sandro Bonazzola 2014-09-08 06:18:06 UTC
Jiri, is this something related to http://gerrit.ovirt.org/31751 ?
Can you take a look?

Comment 5 Jiri Moskovcak 2014-09-08 07:21:20 UTC
(In reply to Sandro Bonazzola from comment #4)
> Jiri, is this something related to http://gerrit.ovirt.org/31751 ?
> Can you take a look?

I don't think so.

Comment 6 Jiri Moskovcak 2014-09-08 07:23:54 UTC
(In reply to Jiri Moskovcak from comment #5)
> (In reply to Sandro Bonazzola from comment #4)
> > Jiri, is this something related to http://gerrit.ovirt.org/31751 ?
> > Can you take a look?
> 
> I don't think so.

Oops, I just realized that answer sounds a bit weird. So, to avoid any confusion: I don't think it's related.

Comment 7 Sandro Bonazzola 2014-09-08 14:01:07 UTC
Looking at the logs, I see that there was a call to stopMonitoringDomain while the VM was about to start:

 Thread-112::DEBUG::2014-09-07 16:27:29,494::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
 Thread-112::DEBUG::2014-09-07 16:27:29,495::task::595::Storage.TaskManager.Task::(_updateState) Task=`02874973-7ff4-4183-9a11-d5ba7ffdb6b3`::moving from state init -> state preparing
 Thread-112::INFO::2014-09-07 16:27:29,495::logUtils::44::dispatcher::(wrapper) Run and protect: stopMonitoringDomain(sdUUID='65f8c0e8-ca08-4e40-a1d9-55732f83b384', options=None)
 Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::198::Storage.ResourceManager.Request::(__init__) ResName=`Storage.HsmDomainMonitorLock`ReqID=`007f923c-bea4-472e-9fa5-2e12edfff0bb`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '3648' at 'stopMonitoringDomain'
 Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.HsmDomainMonitorLock' for lock type 'exclusive'
 Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::601::Storage.ResourceManager::(registerResource) Resource 'Storage.HsmDomainMonitorLock' is free. Now locking as 'exclusive' (1 active user)
 Thread-112::DEBUG::2014-09-07 16:27:29,496::resourceManager::238::Storage.ResourceManager.Request::(grant) ResName=`Storage.HsmDomainMonitorLock`ReqID=`007f923c-bea4-472e-9fa5-2e12edfff0bb`::Granted request
 Thread-112::INFO::2014-09-07 16:27:29,497::domainMonitor::114::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 65f8c0e8-ca08-4e40-a1d9-55732f83b384
 Thread-78::DEBUG::2014-09-07 16:27:29,497::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 65f8c0e8-ca08-4e40-a1d9-55732f83b384
 Thread-78::INFO::2014-09-07 16:27:29,498::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 65f8c0e8-ca08-4e40-a1d9-55732f83b384 (id: 1)
 Thread-100::DEBUG::2014-09-07 16:27:31,717::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device
 Thread-100::DEBUG::2014-09-07 16:27:31,717::vm::2294::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::_ongoingCreations released
 Thread-100::ERROR::2014-09-07 16:27:31,717::vm::2331::vm.Vm::(_startUnderlyingVm) vmId=`e4d1769d-73de-47cb-af1f-a6b030553551`::The vm start process failed
 Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2271, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3346, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
 libvirtError: Failed to acquire lock: No space left on device


I can't reproduce this on my system.
Can you detail what was done while the setup was starting the VM?
It looks like someone or something deliberately stopped the domain monitor on the storage domain that was about to be used by the Hosted Engine VM, and it wasn't the setup.

Also, are you able to reproduce this?
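
In short, comment 7 identifies an ordering problem: stopMonitoringDomain released the sanlock host id (dropping the lockspace) at 16:27:29, roughly two seconds before createXML tried to acquire the VM lease at 16:27:31. A toy sketch of the constraint that was violated, not the actual fix; all names here are hypothetical stand-ins:

import threading
import time

vm_started = threading.Event()

def release_host_id():
    # Stand-in for VDSM's releaseHostId: after this, the lockspace is gone.
    print('host id released')

def start_underlying_vm():
    # Stand-in for the ~2 s window in which createXML acquires the lease.
    time.sleep(2)
    print('vm lease acquired')
    vm_started.set()

def stop_monitoring_domain():
    # Don't drop the lockspace while a VM is still starting.
    vm_started.wait()
    release_host_id()

monitor = threading.Thread(target=stop_monitoring_domain)
monitor.start()
start_underlying_vm()
monitor.join()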

Comment 8 Nikolai Sednev 2014-09-08 18:13:56 UTC
Sure, just follow the initial reproduction steps, which means doing exactly the following:
1. Install a fresh image of RHEL 6.5 on your host.
2. rm -rf /etc/yum.repos.d/*
3. vim /etc/yum.repos.d/rhev.repo
4. Copy in the repo definitions from https://mojo.redhat.com/docs/DOC-962666
5. yum clean all
6. yum update -y && yum install screen -y
7. yum install ovirt-hosted-engine-setup -y
8. hosted-engine --deploy

Be sure to use the FQDN requested from and reserved by eng-ops for your HE, and reserve the MAC address beforehand so that your HE gets the same IP.

Comment 22 Nikolai Sednev 2014-10-13 12:28:55 UTC
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1148712; setting to assigned until fixed.


Comment 24 Sandro Bonazzola 2014-10-21 09:45:28 UTC
Bug #1148712 is on QA now.

Comment 25 Nikolai Sednev 2014-11-25 17:05:50 UTC
Works for me on:
rhevm-3.5.0-0.21.el6ev.noarch
policycoreutils-python-2.0.83-19.47.el6_6.1.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
ovirt-hosted-engine-setup-1.2.1-4.el6ev.noarch
sanlock-2.8-1.el6.x86_64
ovirt-host-deploy-1.3.0-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-1.el6ev.noarch
libvirt-0.10.2-46.el6_6.2.x86_64
vdsm-4.16.7.5-1.el6ev.x86_64

Comment 27 errata-xmlrpc 2015-02-11 20:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0161.html

