Bug 906590

Summary:	Sanlock mishandles locks for paused domains in libvirt
Product:	Red Hat Enterprise Linux 6
Component:	sanlock
Version:	6.3
Status:	CLOSED NOTABUG
Severity:	medium
Priority:	low
Reporter:	Michael Rodrigues <help>
Assignee:	David Teigland <teigland>
QA Contact:	yeylon <yeylon>
CC:	ajia, cluster-maint, srevivo
Target Milestone:	rc
Hardware:	x86_64
OS:	Linux
Doc Type:	Bug Fix
Type:	Bug
Last Closed:	2015-09-30 14:08:35 UTC

Description Michael Rodrigues 2013-01-31 23:40:51 UTC

Description of problem:

Sanlock does not properly lock filesystems of paused VMs, allowing another domain to use the same filesystem (shared Logical Volume on Fibre Channel) and boot without a lock error.


Version-Release number of selected component (if applicable):

libvirt-lock-sanlock.x86_64        0.9.10-21.el6_3.8           @updates         
sanlock.x86_64                     2.3-1.el6                   @base            
sanlock-devel.x86_64               2.3-1.el6                   @base            
sanlock-lib.x86_64                 2.3-1.el6                   @base 


How reproducible:

100%


Steps to Reproduce:

1. Share an LVM volume group between two hosts running libvirt.
2. Create an LV-based VM on node 1.
3. Start the VM on node 1.
4. Migrate the VM to node 2. Node 1's copy is now shut down; node 2's should be running.
5. Shut down the VM on node 2.
6. Start the VM on node 1.
7. Pause the VM on node 1.
8. Start the VM on node 2.
9. Pause the VM on node 2.
10. Resume the VM on node 1.
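
The exact commands were not included in the report. As a rough sketch only, steps 3 through 10 might map onto virsh invocations like the ones built below. The domain name `testvm`, the connection URIs for `node1` and `node2`, and the use of `--live` for the migration are all assumptions made for illustration, and the domain is assumed to be defined on both nodes. The script only builds and prints the commands; it does not run them.

```python
import shlex

# Hypothetical names: none of these appear in the original report.
DOMAIN = "testvm"
NODE1 = "qemu+ssh://node1/system"
NODE2 = "qemu+ssh://node2/system"


def virsh(uri, *args):
    """Build a virsh command line (as an argument list) for one host."""
    return ["virsh", "-c", uri, *args]


def reproduction_commands(domain=DOMAIN, node1=NODE1, node2=NODE2):
    """Return (step number, command) pairs for steps 3-10 of the report."""
    return [
        (3, virsh(node1, "start", domain)),
        # --live is an assumption; the report does not say how it migrated.
        (4, virsh(node1, "migrate", "--live", domain, node2)),
        (5, virsh(node2, "shutdown", domain)),
        (6, virsh(node1, "start", domain)),
        (7, virsh(node1, "suspend", domain)),   # "pause" in the report
        (8, virsh(node2, "start", domain)),
        (9, virsh(node2, "suspend", domain)),
        (10, virsh(node1, "resume", domain)),   # expected to fail
    ]


if __name__ == "__main__":
    for step, cmd in reproduction_commands():
        print(f"step {step}: {shlex.join(cmd)}")
```

Printing the plan first makes it easy to paste the exact sequence into a follow-up comment before running anything against shared storage.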

  
Actual results:

Sanlock throws no lock errors on any of the above steps. It only throws a lock error if I try to start one copy of the VM while the other is running.

Expected results:

According to the libvirt-users mailing list, when I pause node 1 in step 7, it should record a lease version number. When I resume in step 10, the version should no longer match because of step 8, and the resume should fail with an error.
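
To make the expectation concrete, here is a toy model of the behavior described above. It is only a sketch of the reporter's expectation, not sanlock's or libvirt's actual implementation or API: the `Lease` class, its methods, and `LeaseError` are hypothetical names, and the rule that every acquisition bumps the version is an assumption of the model.

```python
class LeaseError(Exception):
    """Raised when a lease cannot be (re)acquired in this toy model."""


class Lease:
    """Toy lease on a shared disk: one owner at a time, versioned."""

    def __init__(self):
        self.version = 0
        self.owner = None

    def acquire(self, host):
        """Take a free lease; each acquisition bumps the version."""
        if self.owner is not None:
            raise LeaseError(f"lease is held by {self.owner}")
        self.version += 1
        self.owner = host
        return self.version

    def release(self, host):
        """Release on pause and return the version to remember."""
        if self.owner != host:
            raise LeaseError(f"{host} does not hold the lease")
        self.owner = None
        return self.version

    def reacquire(self, host, remembered_version):
        """Resume: succeed only if nobody took the lease in between."""
        if self.version != remembered_version:
            raise LeaseError(
                f"version mismatch: remembered {remembered_version}, "
                f"found {self.version}"
            )
        return self.acquire(host)


def steps_6_to_10():
    """Replay steps 6-10; return True if the final resume is refused."""
    lease = Lease()
    lease.acquire("node1")                 # step 6: start on node 1
    remembered = lease.release("node1")    # step 7: pause on node 1
    lease.acquire("node2")                 # step 8: start on node 2
    lease.release("node2")                 # step 9: pause on node 2
    try:
        lease.reacquire("node1", remembered)   # step 10: resume on node 1
    except LeaseError:
        return True
    return False
```

In this model `steps_6_to_10()` returns `True`, which is the outcome the report expected; the actual results above correspond to the resume going through without any error.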

Additional info:

https://www.redhat.com/archives/libvirt-users/2013-January/msg00109.html

Comment 3 David Teigland 2013-02-01 20:35:07 UTC

Hi, sanlock is only tested and supported as part of the RHEV product, but I'd still like to investigate whether there's a bug here or not.  To do that, I'll need you to:

1. Try this on Fedora 18, or RHEL 6.4 (once that's available).
2. Show us your specific configuration files, and the exact commands you are running in sequence.
3. Collect debugging information from sanlock and libvirt.
For sanlock, run the command "sanlock log_dump > log.txt" from each node.
I'm not sure what libvirt info to collect.
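
For the sanlock part of the request, a small wrapper along these lines could save the dump on each node. It is a minimal sketch that assumes `sanlock` is on the PATH and that the caller has sufficient privileges; the function name `dump_sanlock_log` and the injectable `run` parameter are hypothetical and exist only so the helper can be exercised without a sanlock daemon.

```python
import subprocess
from pathlib import Path


def dump_sanlock_log(out_path, run=subprocess.run):
    """Run "sanlock log_dump" and save its output to out_path.

    Returns the number of characters written. The run parameter is a
    hypothetical hook so the command runner can be replaced in tests.
    """
    result = run(
        ["sanlock", "log_dump"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output as text
        check=True,           # raise if the command exits non-zero
    )
    Path(out_path).write_text(result.stdout)
    return len(result.stdout)
```

Calling `dump_sanlock_log("log.txt")` on each node is equivalent to the shell redirection quoted above.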

Comment 4 Michael Rodrigues 2013-02-01 20:42:39 UTC

Hi,

I wasn't exactly sure where to post the bug; I was just directed here by the users list.

I can do all of this on Fedora 18 but I don't have access to RHEL. Is it still worth the trouble for me to do if I can't provide the relevant RHEL information? I'm doing all of my testing on CentOS 6.3 currently.

Thanks for your input.

Comment 5 David Teigland 2013-02-01 20:46:42 UTC

Fedora 18 would be the best way to test this.

Comment 7 RHEL Program Management 2013-02-07 06:47:09 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 David Teigland 2015-09-30 14:08:35 UTC

There was never a specific issue identified.