Bug 1955813 - bad client message causes closed connection and released leases

Product: Red Hat Enterprise Linux 8
Component: sanlock
Version: ---
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: David Teigland <teigland>
Assignee: David Teigland <teigland>
QA Contact: cluster-qe <cluster-qe>
CC: aefrat, agk, cluster-maint, cmarthal, jbrassow, mcsontos, nsoffer, vjuranek
Target Milestone: beta
Target Release: ---
Keywords: Rebase, Triaged, ZStream
Fixed In Version: sanlock-3.8.4-1.el8
Cloned To: 1965481
Bug Blocks: 1965481
Type: Bug
Last Closed: 2021-11-09 19:44:19 UTC

Description David Teigland 2021-04-30 20:45:44 UTC
Description of problem:

If the sanlock daemon receives a bad message from a client, it responds by closing the client connection and releasing any resource leases held by that client.  The client may continue running and using the leases, unaware of the problem.  Instead, the sanlock daemon should ignore bad messages and leave the client connection and leases in place.
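
To make the intended behavior concrete, below is a minimal hypothetical sketch of a daemon receive path; it is not the actual sanlock source. It assumes sanlock's 32-byte sm_header wire format, whose first field is the magic number SM_MAGIC (0x04282010); the function name and error handling are illustrative only.

/* Hypothetical sketch of the daemon's per-client receive path; not the
 * actual sanlock source.  Assumes the 32-byte sm_header wire format with
 * SM_MAGIC 0x04282010 as its first field. */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SM_MAGIC 0x04282010

struct sm_header {
    uint32_t magic;
    uint32_t version;
    uint32_t cmd;
    uint32_t cmd_flags;
    uint32_t length;
    uint32_t seq;
    uint32_t data;
    uint32_t data2;
};

static void process_request(int ci, int fd)
{
    struct sm_header h;
    ssize_t rv;

    rv = recv(fd, &h, sizeof(h), MSG_WAITALL);
    if (rv != sizeof(h))
        return;

    if (h.magic != SM_MAGIC) {
        /* produces a log line like the one shown below */
        fprintf(stderr, "ci %d recv %zd magic %x vs %x\n",
                ci, rv, h.magic, SM_MAGIC);
        /* old behavior: close the fd and release the client's leases;
         * fixed behavior: drop the bad message and return, leaving the
         * connection and leases intact */
        return;
    }

    /* ... dispatch h.cmd for a valid message ... */
}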

One indication of this problem that would appear in /var/log/sanlock.log is:

"ci 2 recv 32 magic 0 vs 4282010"  (message contains wrong magic number)


Comment 2 Nir Soffer 2021-05-19 19:58:36 UTC
Avihai, can you add QE ack for this bug?

This bug affects RHV (bug 1952345), but it is practically impossible
to reproduce with current vdsm, so no special testing is needed.

What we need is to test the sanlock build in RHV automated tests to make
sure there are no regressions.


Corey from cluster-qe can help with the errata process, regression testing,
etc., but we need to make sure this sanlock version does not break RHV
before it is released.

Comment 3 Avihai 2021-05-20 08:54:15 UTC
(In reply to Nir Soffer from comment #2)
> Avihai, can you add QE ack for this bug?

I cannot give a QE ack on behalf of RHEL QE in a RHEL bug, as I'm RHV QE.

I have no objection to adding this fixed sanlock on the RHV side once we get an official RHEL build into RHV downstream and test it in RHV regression runs, but I cannot ack for RHEL QE.

Let's stick to the official process: RHEL QE should do QE acks for RHEL bugs, not RHV QE.

Comment 12 Corey Marthaler 2021-06-07 17:24:27 UTC
Marking this verified with the latest rpms, based on our regression testing. I believe RHV QA provided their testing findings in bug 1961748.

sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021

Comment 15 errata-xmlrpc 2021-11-09 19:44:19 UTC
Since the problem described in this bug report should be resolved in a
recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sanlock bug fix and enhancement update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4422