Bug 1356676 - consume fix for "Bug 1356667: libsanlock does not handle EINTR, causing failures in client" to be released 2016-Sep-13
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.18.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: ---
Assignee: Allon Mureinik
QA Contact: Elad
URL:
Whiteboard:
Depends On: 1356667
Blocks: 1416336
Reported: 2016-07-14 16:29 UTC by Nir Soffer
Modified: 2017-02-15 14:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1416336
Environment:
Last Closed: 2017-02-15 14:50:22 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: exception+
rule-engine: planning_ack+
amureini: devel_ack+
acanan: testing_ack+




Links
System        ID     Private  Priority   Status     Summary                                       Last Updated
oVirt gerrit  61200  0        master     ABANDONED  spec: Require sanlock version handling EINTR  2016-12-27 13:20:33 UTC
oVirt gerrit  69197  0        master     MERGED     spec: Require sanlock version handling EINTR  2016-12-27 16:12:20 UTC
oVirt gerrit  69201  0        ovirt-4.0  MERGED     spec: Require sanlock version handling EINTR  2016-12-28 15:31:42 UTC
oVirt gerrit  69210  0        ovirt-4.1  MERGED     spec: Require sanlock version handling EINTR  2016-12-28 14:35:17 UTC

Description Nir Soffer 2016-07-14 16:29:00 UTC
Description of problem:

The sanlock version that fixes this issue eliminates rare failures in vdsm that
occur when vdsm receives a signal during a sanlock operation.

When this happens, the vdsm operation fails with this error:

jsonrpc.Executor/3::ERROR::2016-06-08 00:03:48,803::task::868::Storage.TaskManager.Task::(_setError) Task=`1bb2fee0-9a56-49e4-890b-3c1b051683f5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 606, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 600, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 114, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 430, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 688, in inquireClusterLock
    return self._manifest.inquireDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 436, in inquireDomainLock
    return self._domainLock.inquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 327, in inquire
    resource = sanlock.read_resource(self._leasesPath, SDM_LEASE_OFFSET)
SanlockException: (4, 'Sanlock resource read failure', 'Interrupted system call')

Version-Release number of selected component (if applicable):
any

How reproducible:
rare

Fixed upstream in sanlock; we are waiting for a backport.
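
For illustration only: the usual way to tolerate EINTR is to retry the interrupted call. The actual fix lives in libsanlock and is tracked by bug 1356667; the Python sketch below is not that fix and is not vdsm code. InterruptedCallError and retry_on_eintr are hypothetical names, and the exception is assumed to carry the errno as its first argument, as the SanlockException in the traceback above does.

```python
import errno


class InterruptedCallError(Exception):
    """Hypothetical stand-in for an exception whose args[0] is an errno."""


def retry_on_eintr(func, *args, max_tries=5, **kwargs):
    """Call func, retrying while it fails with EINTR.

    Any other error, or EINTR on the last allowed attempt, is re-raised.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func(*args, **kwargs)
        except InterruptedCallError as e:
            if e.args[0] != errno.EINTR or attempt == max_tries:
                raise
            # Interrupted by a signal: loop around and try the call again.
```

With the fixed sanlock package installed, callers should not need a wrapper like this, since the retry happens inside the library.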

Comment 2 Yaniv Kaul 2016-07-21 13:29:51 UTC
sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.

Comment 3 Nir Soffer 2016-07-21 14:15:49 UTC
(In reply to Yaniv Kaul from comment #2)
> sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.

The package has not been released yet, so we are still blocked on it, but I sent a patch
to require it. We will merge the patch as soon as the package is available
to users.

Comment 4 Yaniv Kaul 2016-07-25 12:52:41 UTC
I think the bug for 7.2.z is bug 1357883.

Comment 5 Allon Mureinik 2016-11-07 08:53:36 UTC
With RHEL 7.3 released, the 7.2.z backport is a moot point. Sanlock 3.4.0-1 delivers a fix for this issue via bug 1356667.

We'll wait a couple of days till it hits the CentOS repository too and require it in vdsm.spec.in in time for oVirt 4.0.6.
RHEL users can already run "yum update sanlock*" to receive the newest sanlock package that resolves the issue regardless of VDSM's requirements.

Comment 6 Allon Mureinik 2016-11-22 17:33:01 UTC
Reducing both priority and severity, as a fix is available via "yum upgrade sanlock". When the CentOS package becomes available, we'll consume it.

Comment 7 Allon Mureinik 2016-12-27 15:26:25 UTC
RHEL 7.2.z has become a moot point by now. We should require the sanlock version provided in RHEL 7.3 and QAed by bug 1356667.

Comment 8 Sandro Bonazzola 2017-01-25 07:55:20 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 9 Raz Tamir 2017-01-26 08:35:12 UTC
Nir,
Can you provide clear steps to reproduce?

Comment 10 Nir Soffer 2017-01-31 11:43:25 UTC
This is a random error; there is no way to reproduce it. The fix is in sanlock, and
in vdsm we only require the version with the fix.

Comment 11 Elad 2017-01-31 11:57:46 UTC
vdsm requires the correct Sanlock version which delivers a fix for this issue via bug 1356667 (according to comment #5)

[root@storage-ge2-vdsm1 ~]# yum deplist vdsm

  dependency: sanlock >= 3.4.0-1
   provider: sanlock.x86_64 3.4.0-1.el7


Tested using:
vdsm-4.19.2-2.el7ev.x86_64
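
The check above can also be scripted. The following is a minimal Python sketch that reads the two "yum deplist" lines quoted in this comment and compares the provided sanlock version with the required one. The helper names (numeric_prefix, parse_evr, provider_satisfies) are hypothetical, and the comparison is deliberately simplified: it looks only at the leading numeric parts of the version and release and does not implement RPM's real version-comparison rules.

```python
import re

DEP_RE = re.compile(r"dependency:\s+sanlock\s+>=\s+(\S+)")
PROV_RE = re.compile(r"provider:\s+sanlock\.\S+\s+(\S+)")


def numeric_prefix(text):
    """Return the leading dotted-number part of text as a tuple of ints."""
    m = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(n) for n in m.group().split(".")) if m else ()


def parse_evr(text):
    """Split 'version-release' into (version, release) integer tuples."""
    version, _, release = text.partition("-")
    return numeric_prefix(version), numeric_prefix(release)


def provider_satisfies(deplist_output):
    """True if the sanlock provider is at least the required version."""
    required = DEP_RE.search(deplist_output).group(1)
    provided = PROV_RE.search(deplist_output).group(1)
    return parse_evr(provided) >= parse_evr(required)


sample = """
  dependency: sanlock >= 3.4.0-1
   provider: sanlock.x86_64 3.4.0-1.el7
"""
print(provider_satisfies(sample))
```

For anything beyond a quick sanity check, the package manager's own comparison should be preferred over this simplified ordering.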

