Description of problem:
VDSM crashes while virt-v2v copies the disks. It used to work a few weeks ago, but recent changes in VDSM seem to have broken it, so it needs to be adjusted.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Start to import a VM from VMware
2.
3.

Actual results:
While the disk is being copied, VDSM crashes.

Expected results:
The conversion should succeed.

Additional info:
The last log before the crash is:

vdsm::68::vds::(sigtermHandler) Received signal 15

A few seconds before that we see the following exception:

Thread-47::ERROR::2015-07-16 18:19:38,377::monitor::366::Storage.Monitor::(_releaseHostId) Error releasing host id 2 for domain f4f54f47-9ccf-4978-a9a7-12a6d89bf94e
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 363, in _releaseHostId
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 535, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'f4f54f47-9ccf-4978-a9a7-12a6d89bf94e', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

I tried to lock the image using resource-manager (exclusively) but it didn't help.
Moreover, the conversion works fine when invoking virt-v2v from the command line.
Just copying here what I said in the email:

> Assuming storage = the data domain, then virt-v2v itself doesn't write
> to it, it runs 'qemu-img convert' which writes to it.  I would need to
> take a much closer look to work out what block size qemu-img is using,
> but virt-v2v sets the source block size of the copy to 64 MB.

By which I mean: virt-v2v invokes qemu-img convert with a source disk which is https, and it sets the https readahead to 64 MB. This just means that we don't have to keep going back to the origin vCenter server, but it probably doesn't influence the qemu-img convert block size.

> As far as I know, neither virt-v2v nor qemu-img interact in any way
> with sanlock.
Nir, can you take a look please?
(In reply to Arik from comment #0)
> The last log before the crash is:
> vdsm::68::vds::(sigtermHandler) Received signal 15

This happens when someone stops vdsm. According to Liron, who looked at the logs, sanlock lost access to the storage, so it fenced vdsm, which was the SPM.

> Few seconds before that we see the following exception:
> ReleaseHostIdFailure: Cannot release host id:
> (u'f4f54f47-9ccf-4978-a9a7-12a6d89bf94e', SanlockException(16, 'Sanlock
> lockspace remove failure', 'Device or resource busy'))

This is expected when stopping the SPM host. Not pretty, but this is the current behavior. This is not a crash, just how an unclean shutdown looks.

Please attach these logs:
/var/log/vdsm/vdsm.log
/var/log/messages
/var/log/sanlock.log
Arik, can you reproduce this issue when using the attached patch?
Nir, VDSM crashes again, but with a different error:

Thread-22::ERROR::2015-07-20 09:31:26,316::monitor::250::Storage.Monitor::(_monitorDomain) Error monitoring domain f4f54f47-9ccf-4978-a9a7-12a6d89bf94e
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 246, in _monitorDomain
    self._performDomainSelftest()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 759, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 313, in _performDomainSelftest
    self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/fileSD.py", line 194, in __init__
    validateFileSystemFeatures(manifest.sdUUID, manifest.mountpoint)
  File "/usr/share/vdsm/storage/fileSD.py", line 89, in validateFileSystemFeatures
    oop.getProcessPool(sdUUID).directTouch(testFilePath)
  File "/usr/share/vdsm/storage/outOfProcess.py", line 350, in directTouch
    ioproc.touch(path, flags, mode)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 385, in _sendCommand
    raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out

and a few seconds afterwards it gets:

MainThread::DEBUG::2015-07-20 09:31:28,029::vdsm::68::vds::(sigtermHandler) Received signal 15

I figured out that it is related to the amount of free space in the storage: when I freed more space, the import suddenly worked. I reproduced it again by creating a large disk on the storage.

Note that the free space should be sufficient to store the disk of the imported VM (9g are available).
Created attachment 1053729 [details]
vdsm log

vdsm log when trying to verify Nir's patch. Note that at 2015-07-20 09:25:20,318 I got the SanlockException when trying to import the VM without the patch, and at 2015-07-20 09:31:26,316 I got the timeout that led to VDSM termination when doing the same import with the patch.
(In reply to Arik from comment #6)
> Created attachment 1053729 [details]
> vdsm log

Please provide the other logs mentioned in comment 3.
(In reply to Arik from comment #5)
> Nir, VDSM crashes again but with different error:

Vdsm does not crash, it was terminated (it received signal 15).

> Thread-22::ERROR::2015-07-20
> 09:31:26,316::monitor::250::Storage.Monitor::(_monitorDomain) Error
> monitoring domain f4f54f47-9ccf-4978-a9a7-12a6d89bf94e
> Traceback (most recent call last):
> Timeout: Connection timed out

ioprocess timed out trying to touch a file - the connection to storage is lost, or too slow.

> and few seconds afterwards it gets:
> MainThread::DEBUG::2015-07-20 09:31:28,029::vdsm::68::vds::(sigtermHandler)
> Received signal 15

Another hint that the connection to storage is too slow.

> I figured out that it is related to the amount of free space in the storage.
> when I freed more space, the import suddenly worked. I reproduced it again
> by creating a large disk on the storage.
> Note that the free space should be sufficient to store the disk on the
> imported VM (9g are available).

Can you reproduce this with another NFS server? For example, try running your own server on some machine.

Can you reproduce this with an iSCSI storage domain?
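[Editorial note: the probe that timed out in the traceback above is the directTouch() call, a small direct-I/O "touch" on the domain mountpoint. The following C fragment is only an illustration of what such a probe amounts to - it is not the actual ioprocess implementation - assuming a hypothetical test file path under the domain mount:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch: create/update a file with O_DIRECT so the request
 * actually reaches the NFS server instead of being served from cache.
 * When the server is overloaded, this is where the domain monitor stalls
 * until its timeout expires and vdsm gets terminated. */
int direct_touch(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* Touch semantics: update the timestamps, no data is written. */
    if (futimens(fd, NULL) < 0) {
        perror("futimens");
        close(fd);
        return -1;
    }
    return close(fd);
}

The point is that the probe is deliberately tiny; if even this times out, the server is responding far too slowly for the monitoring limits.]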
Nir, we played with it a bit more, and it seems it really happens only when the NFS server starts to run short on space.

We also tried directly invoking "qemu-img convert" to see whether it has anything to do with v2v or not, and indeed it reproduced the same way. There is a difference between v2v and the regular import flow during these tests, though - import is from NFS to NFS, whereas v2v copies from the host to NFS. It seemed to go much faster in the host->NFS case. So I suppose the load on the NFS side is simply higher and it responds slowly (a simple directory listing on the host from the NFS server took several seconds during qemu-img convert).

Since it does work ok when the server has more space, I think the behavior on other NFS servers or on iSCSI is indeed different.

I don't know what we can do about it other than the proposed patch, which doesn't seem to address it completely, but should not hurt anyway. Since we are seeing a lot of issues with storage monitoring in loaded deployments, I would perhaps only suggest reviewing the values and possibly increasing the defaults to more than 30s for SD monitoring and also for sanlock.
(In reply to Michal Skrivanek from comment #9)
> Nir,
> we played with it a bit more, and it seems it really is happening only when
> NFS servers start to be short on space.

Can I have access to the system reproducing this issue, and/or the vmware setup the vm is imported from?

> Since we are seeing a lot of issues in storage monitoring in loaded
> deployments - I would perhaps only suggest to review the values and possibly
> increase defaults to more than 30s for SD monitoring and also for sanlock.

We cannot increase the timeouts for sanlock - if writing to and reading from the lease area (done every 10 seconds) fails for 40 seconds, the storage is considered lost, and sanlock terminates the SPM.

The storage must work within these limits, otherwise we cannot use it.
David, do you think we can change the sanlock configuration to support an overloaded storage server?
The sanlock io timeout can be specified when adding a lockspace, see sanlock_add_lockspace_timeout(). Doing that affects all of the other timeouts.
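[Editorial note: for illustration, a minimal C sketch of joining a lockspace with a larger io timeout via that API might look like the following. The lockspace name and host id are taken from the logs in this bug, the lease path is a placeholder, and the exact sanlk_lockspace fields should be checked against the installed sanlock headers:

#include <string.h>
#include <sanlock.h>
#include <sanlock_admin.h>

/* Sketch: join a lockspace with a 20-second io timeout instead of the
 * default 10 seconds.  All other sanlock timeouts scale accordingly,
 * as noted in the comment above. */
int join_lockspace_slow_storage(void)
{
    struct sanlk_lockspace ls;

    memset(&ls, 0, sizeof(ls));
    strncpy(ls.name, "f4f54f47-9ccf-4978-a9a7-12a6d89bf94e", SANLK_NAME_LEN);
    ls.host_id = 2;  /* host id 2, as in the log above */
    /* Placeholder path to the domain's ids lease file */
    strncpy(ls.host_id_disk.path, "/rhev/data-center/mnt/server:_export/dom_md/ids",
            SANLK_PATH_LEN);

    return sanlock_add_lockspace_timeout(&ls, 0 /* flags */,
                                         20 /* io timeout, seconds */);
}

The trade-off, discussed below, is that every derived timeout (join time, lease renewal failure, host fencing) grows with the io timeout.]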
(In reply to David Teigland from comment #13)
> The sanlock io timeout can be specified when adding a lockspace, see
> sanlock_add_lockspace_timeout(). Doing that affects all of the other
> timeouts.

Sure, but today the time to join a cluster is up to 20 seconds, with a timeout of 10 seconds. If we change the timeout to 20 seconds, this delay will be up to 40 seconds - right?

I think the key is not having larger timeouts, but preventing storage operations (such as importing a vm) from overloading the machine, leading to sanlock timeouts.

Does sanlock use the highest priority for its io operations? Can we ensure that sanlock io gets higher priority compared with any other io?
> Sure, today the time to join a cluster is up to 20 seconds, with timeout
> of 10 seconds. If we change the timeout to 20 seconds, this delay will be
> up to 40 seconds - right?

right

> I think the key is not having larger timeouts, but preventing storage
> operations (such as importing a vm) from overloading the machine, leading
> to sanlock timeouts.

yes

> Does sanlock use the highest priority for its io operations? can we ensure
> that sanlock io get higher priority compared with any other io?

sanlock does not use io priorities (ioprio_set). It's been a long time, so I don't remember the reasons... it looks like a few of the possible issues would be:

- It only works with the cfq scheduler. Is that used for RHEV storage?
- Would the deadline scheduler be a better solution in any case?
- I think it would only apply to block devices, not nfs.
- I mistakenly thought it would not apply to any aio, but it looks like it would apply to aio reads, assuming all of the previous points made it relevant.
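[Editorial note: for reference, since glibc provides no wrapper for ioprio_set, setting a realtime io priority would mean using the raw syscall. This is a hedged sketch only, with the constants copied from linux/ioprio.h because they are not exported by glibc, and it carries the caveats listed in the comment above:

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Constants from linux/ioprio.h; glibc does not export them. */
#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_RT     1
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(cls, data)  (((cls) << IOPRIO_CLASS_SHIFT) | (data))

/* Give the calling process realtime io priority at the highest level (0).
 * As noted above, this is only honored by the cfq scheduler and only for
 * block devices - it would not help on NFS, which is what this bug hits. */
int set_rt_ioprio(void)
{
    return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                   IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0));
}
]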
(In reply to David Teigland from comment #15)
> > Does sanlock use the highest priority for its io operations? can we ensure
> > that sanlock io get higher priority compared with any other io?
>
> - It only works with the cfq scheduler. Is that used for RHEV storage?

We use the deadline scheduler for block devices, using a udev rule.

> - Would the deadline scheduler be a better solution in any case?

This was the performance team's advice (according to Federico).

> - I think it would only apply to block devices, not nfs.

This bug is about nfs, in particular an nfs server that becomes very slow when almost full (see comment 9).

Can we improve the situation for nfs?
I don't have enough nfs knowledge to help with that.
Created attachment 1054043 [details]
sanlock log

Created attachment 1054046 [details]
output of journalctl -b
David, can you check the sanlock log (attachment 1054043 [details])?

This line is suspicious (selinux?):

2015-07-13 12:03:03+0300 1622 [13612]: set scheduler RR|RESET_ON_FORK priority 99 failed: Operation not permitted

This log is from fedora 22. I don't see this on rhel 7.1 or fedora 21.

Is it possible that sanlock, not using the RR scheduler, is missing deadlines when the machine is overloaded, leading to bogus io timeouts?
Created attachment 1054080 [details]
sched_setcheduler test

This succeeds when running on rhel 7.1 and fedora 21, and fails on fedora 22.
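[Editorial note: the attached test is not reproduced here; a minimal C reproducer along the same lines simply attempts the call reported as failing in the sanlock log above - SCHED_RR with RESET_ON_FORK at priority 99:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Try to switch to the realtime round-robin scheduler at the highest
 * priority, resetting children to the default policy, as the failing
 * sanlock log line reports.  Per comment 21 this succeeds on rhel 7.1
 * and fedora 21 but fails with EPERM on the affected fedora 22 host
 * (run as root to rule out ordinary permission issues). */
int main(void)
{
    struct sched_param param = { .sched_priority = 99 };

    if (sched_setscheduler(0, SCHED_RR | SCHED_RESET_ON_FORK, &param) < 0) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("running with SCHED_RR priority 99\n");
    return 0;
}
]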
Arik, this issue seems to be caused by sanlock failing to run using the SCHED_RR scheduler.

After rebooting your host (bamba.tlv), I cannot reproduce the error in sanlock, and it is now running using the RR scheduler:

# ps axf -o pid,stat,cmd,class,rtprio
 747 SLsl sanlock daemon -U sanlock -   RR   99
 748 S     \_ sanlock daemon -U sanlo   TS    -

Can you reproduce the same issue now?
(In reply to Nir Soffer from comment #22)
I'm sure the reboot does not fix it - the host has been rebooted several times and was even installed with a different OS, but the problem still happened (on both rhel and fedora).

It does not reproduce now, but there is a lot of free space on multipass.eng.lab now, and it never reproduced when the storage was not short on free space.

As far as I'm concerned, this bug is not urgent anymore, since it doesn't block us and the problem is not related to the import using virt-v2v. IMO, a fix for this bug should be verified by checking that the problem does not happen when the amount of free space in the storage is low and an external process, such as virt-v2v but possibly another, writes to it.
The root cause seems to be a failure of sched_setscheduler(2), causing sanlock to run without real-time scheduling. This can cause sanlock to miss deadlines when the machine is overloaded with io, which can cause io timeouts.

The sched_setscheduler(2) issue disappeared after a reboot, and we cannot reproduce this issue now.

This may also be related to low space on the nfs server, which causes the server to be very slow, but we don't have enough information to tell.

Closing for now, until we have more data.
(In reply to Nir Soffer from comment #24)
I don't think that closing this bug without trying to reproduce it properly is the right thing to do. It will be a serious issue once it happens to users and customers, so we should try to prevent it.

Nir, what additional data do you need in order to reproduce this issue?
(In reply to Arik from comment #25)
> (In reply to Nir Soffer from comment #24)
> I don't think that closing this bug without trying to reproduce it properly
> is the right thing to do. It will be a serious issue once it happens to
> users and customers, so we should try to prevent it.
>
> Nir, what additional data do you need in order to reproduce this issue?

This bug cannot be reproduced currently, since the sched_setscheduler() call does not fail any more. Until it fails again, we have nothing to do here.

Closing it since I'm not going to invest more time in this. Feel free to take this bug if you want.
This is not a sanlock issue, removing the "sanlock" category. Also lowering severity, as termination by sanlock when losing access to storage is expected behavior.
Should this be on MODIFIED?
Should this bug be closed?
(In reply to Yaniv Dary from comment #29)
> Should this bug be closed?

It should.