Description of problem:
The sanlock version that fixes this issue eliminates rare failures in vdsm that occur when vdsm receives a signal during a sanlock operation. When this happens, the vdsm operation fails with this error:

jsonrpc.Executor/3::ERROR::2016-06-08 00:03:48,803::task::868::Storage.TaskManager.Task::(_setError) Task=`1bb2fee0-9a56-49e4-890b-3c1b051683f5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 606, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 600, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 114, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 430, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 688, in inquireClusterLock
    return self._manifest.inquireDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 436, in inquireDomainLock
    return self._domainLock.inquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 327, in inquire
    resource = sanlock.read_resource(self._leasesPath, SDM_LEASE_OFFSET)
SanlockException: (4, 'Sanlock resource read failure', 'Interrupted system call')

Version-Release number of selected component (if applicable):
any

How reproducible:
rare

Fixed upstream in sanlock; we are waiting for a backport.
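For illustration only, the failure class in the traceback above (a call interrupted by a signal, surfacing as errno 4, EINTR) can be sketched in Python as a generic retry wrapper. This is not the upstream sanlock fix, which lives in sanlock itself. The helper name `retry_on_eintr` and the `max_attempts` parameter are hypothetical, and the sketch assumes only what the traceback shows: that the errno is the first element of the exception's arguments.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying while it fails with EINTR as its first exception argument.

    Hypothetical helper for illustration; not part of vdsm or sanlock.
    """
    max_attempts = kwargs.pop("max_attempts", 5)  # hypothetical knob
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            # Assumption: the errno is the first exception argument, as in
            # SanlockException: (4, 'Sanlock resource read failure', ...)
            interrupted = bool(e.args) and e.args[0] == errno.EINTR
            if not interrupted or attempt == max_attempts:
                raise
```

A caller would wrap the interrupted call, for example `retry_on_eintr(sanlock.read_resource, path, offset)`. Errors other than EINTR, and an EINTR on the last attempt, propagate unchanged.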
sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.
(In reply to Yaniv Kaul from comment #2)
> sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.

The package has not been released yet, so we are still blocked on it, but I sent a patch to require it. We will merge the patch as soon as the package is available to users.
I think the bug for 7.2.z is bug 1357883.
With RHEL 7.3 released, the 7.2.z backport is a moot point. Sanlock 3.4.0-1 delivers a fix for this issue via bug 1356667. We'll wait a couple of days until it reaches the CentOS repository too, and then require it in vdsm.spec.in in time for oVirt 4.0.6. RHEL users can already run "yum update sanlock*" to receive the newest sanlock package, which resolves the issue regardless of VDSM's requirements.
Reducing both priority and severity, as a fix is available via "yum upgrade sanlock". When the CentOS package becomes available, we'll consume it.
RHEL 7.2.z has become a moot point by now. We should require the sanlock version provided in RHEL 7.3 and QAed by bug 1356667.
4.0.6 was the last oVirt 4.0 release; please re-target this bug.
Nir, can you provide clear steps to reproduce?
This is a random error; there is no way to reproduce it. The fix is in sanlock, and in vdsm we only require the version with the fix.
vdsm requires the correct sanlock version, which delivers a fix for this issue via bug 1356667 (according to comment #5):

[root@storage-ge2-vdsm1 ~]# yum deplist vdsm
  dependency: sanlock >= 3.4.0-1
   provider: sanlock.x86_64 3.4.0-1.el7

Tested using: vdsm-4.19.2-2.el7ev.x86_64
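The same check can be roughly sketched in Python for anyone who wants to compare an installed sanlock version against the 3.4.0 minimum. The function names `parse_version` and `meets_minimum` are hypothetical. The comparison is a simplification: it handles only numeric dotted versions and ignores the RPM release part (such as "-1.el7"), so it is not a substitute for RPM's own version comparison.

```python
def parse_version(text):
    """Split '3.4.0' into (3, 4, 0); assumes purely numeric dotted components."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="3.4.0"):
    """Return True if the installed version is at least the minimum.

    Simplified comparison for illustration; the release suffix is not considered.
    """
    return parse_version(installed) >= parse_version(minimum)
```

For example, `meets_minimum("3.4.0")` holds for the provider shown in the yum output, while `meets_minimum("3.2.4")` does not.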