Bug 1821042 - Have errors: "SanlockException: (104, 'Unable to read resource owners', 'Connection reset by peer')"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.30.30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.4.6
Target Release: ---
Assignee: Nir Soffer
QA Contact: Avihai
URL:
Whiteboard:
Depends On: 1812185
Blocks:
 
Reported: 2020-04-05 15:52 UTC by David Vaanunu
Modified: 2021-05-13 06:22 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-05-12 13:23:30 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
vdsm.log (1.18 MB, application/x-xz)
2020-04-05 15:52 UTC, David Vaanunu

Description David Vaanunu 2020-04-05 15:52:32 UTC
Created attachment 1676415
vdsm.log

Description of problem:

While running a flow of creating a VM, creating 30 snapshots, and deleting the VM,
the following error appears in vdsm.log.


Version-Release number of selected component (if applicable):

rhv-release-4.3.5-1-001
vdsm-4.30.24-2.el7ev.x86_64


How reproducible:


Steps to Reproduce:
1. Run 50 concurrent users.
2. CreateVM, CreateSnapshot (x30), DeleteVM
3. Run ~5 cycles (~20 hours).

Actual results:
ERROR messages in vdsm.log.

Expected results:
No errors.



ERROR MESSAGE:

2020-04-02 04:25:05,584+0000 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='e4444821-3794-4ed6-a017-e77faf59c881') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getVolumeInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3107, in getVolumeInfo
    info = vol.getInfo()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 259, in getInfo
    leasestatus = self.getLeaseStatus()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 204, in getLeaseStatus
    self.volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 531, in inquireVolumeLease
    return self._domainLock.inquire(lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 460, in inquire
    sector=self._block_size)
SanlockException: (104, 'Unable to read resource owners', 'Connection reset by peer')
2020-04-02 04:25:05,584+0000 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='e4444821-3794-4ed6-a017-e77faf59c881') aborting: Task is aborted: u"(104, 'Unable to read resource owners', 'Connection reset by peer')" - code 100 (task:1181)
2020-04-02 04:25:05,585+0000 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH getVolumeInfo error=(104, 'Unable to read resource owners', 'Connection reset by peer') (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
SanlockException: (104, 'Unable to read resource owners', 'Connection reset by peer')
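The exception value in the log is a three-element tuple: a numeric errno code, a sanlock message, and the OS error string. On Linux, errno 104 is ECONNRESET, which matches the "Connection reset by peer" text. The sketch below shows one way a caller could decode such a tuple with Python's standard `errno` module. It is an illustration under stated assumptions: `FakeSanlockException` and `describe_sanlock_error` are hypothetical names, the class only mimics the tuple shape seen in the log, and it is not the real sanlock Python binding or vdsm code.

```python
import errno
import os


class FakeSanlockException(Exception):
    """Hypothetical stand-in for sanlock.SanlockException.

    Mirrors only the (errno, message, strerror) argument shape seen in the
    vdsm log; it is NOT the real sanlock binding.
    """

    @property
    def errno(self):
        # First positional argument is the numeric error code.
        return self.args[0]


def describe_sanlock_error(exc):
    """Return a readable summary of a (errno, message, strerror) exception."""
    code, message, strerror = exc.args
    # Map the numeric code to its symbolic name (104 -> "ECONNRESET" on Linux).
    name = errno.errorcode.get(code, "UNKNOWN")
    return "{} ({}): {} [{}]".format(name, code, message, strerror)


if __name__ == "__main__":
    exc = FakeSanlockException(
        errno.ECONNRESET,
        "Unable to read resource owners",
        os.strerror(errno.ECONNRESET),
    )
    print(describe_sanlock_error(exc))
    # Branch on the symbolic constant rather than the raw number, since
    # errno values differ between platforms.
    if exc.errno == errno.ECONNRESET:
        print("ECONNRESET reported")
```

Comparing against `errno.ECONNRESET` instead of the literal 104 keeps such a check portable, because errno numbering is platform specific.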



Additional info:

Comment 1 Eyal Shenitzky 2020-04-06 14:16:00 UTC
David, is this a regression? Is it reproducible?

Comment 2 David Vaanunu 2020-04-20 12:55:14 UTC
It is reproducible.

Regression: this is the first time we ran this scenario.

Related:

RHV bug:
https://bugzilla.redhat.com/1664159

Sanlock bug:
https://bugzilla.redhat.com/1812185

Comment 3 Tal Nisan 2020-06-29 18:47:55 UTC
Targeting 4.5 for the meantime, since the sanlock fix will only be released in RHEL 8.3. If needed, please escalate to have it in 8.2.z so we can consume it earlier.

Comment 4 Nir Soffer 2020-10-15 13:01:08 UTC
We cannot defer this to 4.5; sanlock is a critical component and we
must use the latest version to ensure that we consume all fixes.

Moving back to 4.4.3. This version requires 8.3, so it must use sanlock
from 8.3 (8.3.2), which fixes this issue.

Comment 5 Amit Bawer 2020-10-15 14:06:16 UTC
(In reply to Nir Soffer from comment #4)
> We cannot defer this to 4.5; sanlock is a critical component and we
> must use the latest version to ensure that we consume all fixes.
> 
> Moving back to 4.4.3. This version requires 8.3, so it must use sanlock
> from 8.3 (8.3.2), which fixes this issue.

According to the release notes, 4.4.3 is for EL 8.2 [1]; RHEL 8.3 is currently in beta AFAICT, and CentOS 8.3 is not yet available on the CentOS downloads page.
So if this ticket is about adding a spec requirement for sanlock-3.8.2 or later, wouldn't it break user installations because of a missing dependency?

[1] https://www.ovirt.org/release/4.4.3/

Comment 7 Eyal Shenitzky 2021-03-09 08:34:14 UTC
We are already requiring sanlock >= 3.8.2-4.
Does this solve this bug?
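A requirement such as `sanlock >= 3.8.2-4` compares the version (3.8.2) first and the release (4) second, field by field as numbers. The sketch below illustrates that ordering under a strong simplifying assumption: every field is purely numeric and any distribution suffix (for example `.el8`) has already been stripped. It is not RPM's actual `rpmvercmp` algorithm, which also handles epochs, letters, and tildes; `parse_version_release` and `meets_requirement` are hypothetical helpers.

```python
def parse_version_release(vr):
    """Split "3.8.2-4" into ((3, 8, 2), (4,)).

    Simplification (assumption): numeric, dot-separated fields only;
    distribution suffixes such as ".el8" must be stripped by the caller.
    """
    version, _, release = vr.partition("-")
    ver = tuple(int(p) for p in version.split("."))
    rel = tuple(int(p) for p in release.split(".")) if release else ()
    return ver, rel


def meets_requirement(installed, required="3.8.2-4"):
    """Return True if installed >= required (version first, then release)."""
    # Tuple comparison is lexicographic, so the version decides first and the
    # release only breaks ties between equal versions.
    return parse_version_release(installed) >= parse_version_release(required)


if __name__ == "__main__":
    for vr in ("3.8.2-3", "3.8.2-4", "3.10.0-1"):
        print(vr, meets_requirement(vr))  # False, True, True
```

Comparing integer tuples rather than strings matters here: as strings, "3.10.0" would sort before "3.8.2", while numerically it is newer.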

Comment 8 Nir Soffer 2021-04-22 10:34:14 UTC
This is a random failure that is not reproducible. Vdsm has required the sanlock
version containing the fix since 4.4.5.

Moving to ONQA.

