Bug 886421

Summary: sanlock cannot open the lease path: EACCES
Product: Red Hat Enterprise Linux 6 Reporter: Luwen Su <lsu>
Component: sanlock Assignee: David Teigland <teigland>
Status: CLOSED NOTABUG QA Contact: Yaniv Kaul <ykaul>
Severity: high Docs Contact:
Priority: high    
Version: 6.4 CC: acathrow, ajia, bili, cluster-maint, dallan, dyasny, dyuan, jdenemar, jkt, mprivozn, mzhan, psingare, rwu, ydu
Target Milestone: rc   
Target Release: 6.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 820173 Environment:
Last Closed: 2013-05-10 15:04:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 820173    
Bug Blocks: 832156, 960054    

Comment 2 David Teigland 2012-12-12 18:34:50 UTC
The only sanlock error that I've seen above is:

open error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__

sanlock is given the path to the device/file to use for leases, and it does this:

open(disk->path, O_RDWR | O_DIRECT | O_SYNC, 0);

And the result is -EACCES (-13).
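
To check the failing call outside the daemon, the same open() can be repeated by hand. The following is a minimal sketch, not sanlock code: try_open_lease() is a hypothetical helper name, and it assumes the test is run as the same user and in the same security context as the sanlock daemon, since otherwise the result says little about what the daemon sees.

```c
#define _GNU_SOURCE   /* needed on glibc for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sanlock): open `path` with the
 * flags quoted above and return 0 on success or -errno on failure,
 * matching the sign convention of the "open error -13" log line.
 */
int try_open_lease(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_DIRECT | O_SYNC, 0);
    if (fd < 0)
        return -errno;   /* e.g. -EACCES (-13), -ENOENT (-2) */

    close(fd);
    return 0;
}
```

A return of -13 from try_open_lease("/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__") would reproduce the daemon's log entry. Note that some file systems reject O_DIRECT with EINVAL, which would be a different failure from the one reported here.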

So, I can't access the file I've been given.  The most obvious potential problem would be related to permissions, either to the file or one component of the path to the file.  Or, the file does not exist yet (or has been unlinked).  Or, there's a small possibility that nfs is behaving strangely when two processes have opened the file concurrently with different options, e.g. if libvirt has the file open for buffered i/o, but sanlock opens it for direct and sync i/o.  You might try closing the file in libvirt before passing the path to sanlock.

So, my two theories about what could be wrong with this file are:
- permissions along the path to the file
- strange behavior from the nfs file system
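
The first theory can be narrowed down by checking each component of the path in turn. This is a rough sketch only: first_inaccessible_prefix() is a hypothetical helper, not something sanlock or libvirt provides, and it relies on access(2), which checks against the real UID of the caller, so it should be run as the user the daemon runs as. It says nothing about the NFS theory.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk every prefix of `path` and return 0 if
 * all checks pass, or the errno of the first failing check. The
 * failing prefix is copied into `failed`.
 * Directories are checked for search (x) permission; the final
 * component is checked for read and write permission.
 */
int first_inaccessible_prefix(const char *path, char *failed, size_t len)
{
    char buf[4096];

    assert(path != NULL && failed != NULL);

    size_t n = strlen(path);
    if (n == 0)
        return EINVAL;
    if (n >= sizeof(buf))
        return ENAMETOOLONG;
    memcpy(buf, path, n + 1);

    for (size_t i = 1; i <= n; i++) {
        if (buf[i] != '/' && buf[i] != '\0')
            continue;

        char saved = buf[i];
        buf[i] = '\0';   /* temporarily cut the path at this component */

        int mode = (i == n) ? (R_OK | W_OK) : X_OK;
        if (access(buf, mode) != 0) {
            int err = errno;
            snprintf(failed, len, "%s", buf);
            return err;
        }
        buf[i] = saved;  /* restore and continue with the next component */
    }
    return 0;
}
```

If the helper returns EACCES, the prefix in `failed` is the directory or file whose permissions deny access; ENOENT would instead point at the "file does not exist yet" case mentioned above.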

Comment 3 Michal Privoznik 2012-12-14 08:34:45 UTC
David,

In fact, the summary of this bug is wrong. It should be s/EACCES/ENODEV/.
The error message below is from comment 51 in bug 820173 (I am referring to the parent bug because, although the first comment here contains all the info from it, it's a mess joined into one big comment):

2012-12-03 06:04:33.329+0000: 13485: error : virLockManagerSanlockSetupLockspace:280 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device


And the appropriate code which is causing this:

  http://pastebin.test.redhat.com/119862

I've marked line 280, which is the one actually reporting the error message. We can see ENODEV there, not EACCES.

Comment 4 David Teigland 2012-12-14 16:07:48 UTC
I'm confused about why you seem more interested in the reporting of the error than the cause of the error.

1. libvirt calls sanlock_align()
2. sanlock_align() calls into daemon
3. cmd_align() calls open_disk()
4. open_disk() calls open(disk->path, O_RDWR | O_DIRECT | O_SYNC, 0)
5. open() fails with EACCES
6. open_disk() logs error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__
7. open_disk() returns -EACCES
8. cmd_align() returns -ENODEV to the client
9. sanlock_align() returns -ENODEV
10. libvirt logs Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device

In comment 2, I'm interested why step 5 happens.
You seem to be interested in why step 10 happens.

Isn't step 5, open/EACCES, the root problem, and isn't that what needs to be solved?  Why are you focusing on the libvirt/ENODEV message when that is not the root cause of the problem?  I'm not opposed to improving error messages, but that would seem to be of secondary importance to fixing the actual problem.
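
The translation in steps 7 to 9 is where the original errno disappears from the client's view. The sketch below only illustrates that pattern with stand-in functions; the names and bodies are assumptions for illustration and are not the actual sanlock or libvirt code.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for steps 4-7: an open that fails with a given errno
 * and reports it as a negative value (0 means success). */
static int open_disk_stub(int simulated_errno)
{
    assert(simulated_errno >= 0);
    return simulated_errno ? -simulated_errno : 0;
}

/* Step 8 as described above: every open failure is reported to the
 * client as -ENODEV, so the original cause is lost. */
int align_flattened(int simulated_errno)
{
    if (open_disk_stub(simulated_errno) < 0)
        return -ENODEV;
    return 0;
}

/* Alternative for comparison: pass the original error through, so
 * the client would log "Permission denied" instead. */
int align_propagated(int simulated_errno)
{
    return open_disk_stub(simulated_errno);
}
```

With EACCES as input, align_flattened() yields -ENODEV, which is what libvirt logs in step 10, while align_propagated() yields -EACCES, which matches the daemon's own log in step 6.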

Comment 5 RHEL Program Management 2012-12-18 06:48:07 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 David Teigland 2013-05-10 15:04:09 UTC
I don't believe there is a bug here.