Bug 1778485 - Sanlock fails to open disk (EACCES), sanlock_write_lockspace fails with -ENODEV
Summary: Sanlock fails to open disk (EACCES), sanlock_write_lockspace fails with -ENODEV
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: sanlock
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: David Teigland
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-01 14:27 UTC by Nir Soffer
Modified: 2020-11-24 18:48 UTC
CC: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-11-24 18:48:25 UTC
Type: Bug
Embargoed:


Attachments
Possible patch (1.09 KB, application/mbox)
2019-12-01 15:33 UTC, Nir Soffer

Description Nir Soffer 2019-12-01 14:27:02 UTC
Description of problem:

When sanlock has SELinux issues, it logs the error to the sanlock log and
even suggests a recovery action:

2019-12-01 15:09:10 9979 [4645]: open error -13 EACCES: no permission to open /rhev/data-center/mnt/darkthrone:_home_nfs_nfs1/0aedc3e3-b12b-4334-a69c-44269ee825ac/dom_md/ids
2019-12-01 15:09:10 9979 [4645]: check that daemon user sanlock 179 group sanlock 179 has access to disk or file.

But the actual API call fails with -ENODEV (seen in vdsm):

2019-12-01 14:37:45,800+0200 ERROR (jsonrpc/0) [storage.initSANLock] Cannot initialize SANLock for domain 3c4e84a7-32b3-4116-9ac4-630c17631e89 (clusterlock:259)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/vdsm/storage/clusterlock.py", line 250, in initSANLock
    lockspace_name, idsPath, align=alignment, sector=block_size)
sanlock.SanlockException: (19, 'Sanlock lockspace write failure', 'No such device')

Looking at the code:


 263 int open_disk(struct sync_disk *disk)
 264 {
 265         struct stat st;
 266         int fd, rv;
 267 
 268         fd = open(disk->path, O_RDWR | O_DIRECT | O_SYNC, 0);
 269         if (fd < 0) {
 270                 rv = -errno;
 271                 if (rv == -EACCES) {
 272                         log_error("open error %d EACCES: no permission to open %s", rv, disk->path);
 273                         log_error("check that daemon user %s %d group %s %d has access to disk or file.",
 274                                   com.uname, com.uid, com.gname, com.gid);
 275                 } else
 276                         log_error("open error %d %s", rv, disk->path);
 277                 goto fail;
 278         }
...
 300  fail:
 301         if (rv >= 0)
 302                 rv = -1;
 303         return rv;
 304 }

So open_disk returned -EACCES...

 699         rv = open_disk(&sp->host_id_disk);
 700         if (rv < 0) {
 701                 log_erros(sp, "open_disk %s error %d", sp->host_id_disk.path, rv);
 702                 acquire_result = -ENODEV;
 703                 delta_result = -1;
 704                 goto set_status;
 705         }
 706         opened = 1;
...
 790  set_status:
 791         pthread_mutex_lock(&sp->mutex);
 792         sp->lease_status.acquire_last_result = acquire_result;
 793         sp->lease_status.acquire_last_attempt = delta_begin;
 794         if (delta_result == SANLK_OK)
 795                 sp->lease_status.acquire_last_success = last_success;

So we report lease_status as -ENODEV, hiding the actual issue.


Version-Release number of selected component (if applicable):
sanlock-3.8.0-1.fc30.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Call sanlock.write_lockspace() via the Python bindings when storage access will fail
   because of SELinux configuration (a C-API equivalent is sketched below).
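
For reference, a minimal C sketch of the same reproducer, assuming the
sanlock C API headers (sanlock.h, sanlock_admin.h) and linking with
-lsanlock; the lockspace name and ids path below are illustrative:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#include <sanlock.h>
#include <sanlock_admin.h>

int main(void)
{
        struct sanlk_lockspace ls;
        int rv;

        memset(&ls, 0, sizeof(ls));
        strncpy(ls.name, "test-ls", SANLK_NAME_LEN - 1);
        /* Point at an ids file the daemon cannot open due to SELinux. */
        strncpy(ls.host_id_disk.path, "/mnt/nfs/test/ids",
                SANLK_PATH_LEN - 1);

        /* max_hosts=1, flags=0, io_timeout=0 (use daemon default). */
        rv = sanlock_write_lockspace(&ls, 1, 0, 0);

        /* Per this report: rv is -19 (-ENODEV) even though the daemon's
         * real failure was EACCES, visible only in the sanlock log. */
        printf("sanlock_write_lockspace: %d\n", rv);
        return rv < 0 ? 1 : 0;
}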

Actual results:
A generic error is reported, so the client cannot report a useful error.

Expected results:
Report the reason the lockspace could not be accessed, so clients
can report a useful error.

In the oVirt use case, the error from sanlock propagates to vdsm, and from
vdsm to engine, where it can be inspected in the engine log and in the UI.
This makes debugging issues in a cluster easier, since there is a single
place to look for errors.

Workaround:
Look in /var/log/sanlock.log, where the reason is reported.

Comment 1 Nir Soffer 2019-12-01 15:32:16 UTC
Correction: I looked at the wrong place before. Here is the actual code
handling sanlock_write_lockspace:

1822         rv = open_disk(&sd);
1823         if (rv < 0) {
1824                 result = -ENODEV;
1825                 goto reply;
1826         }
1827 
1828         if (ca->header.data2)
1829                 io_timeout = ca->header.data2;
1830 
1831         result = delta_lease_init(task, &lockspace, io_timeout, &sd);
1832 
1833         close_disks(&sd, 1);
1834  reply:
1835         log_debug("cmd_write_lockspace %d,%d done %d", ca->ci_in, fd, result);
1836 
1837         send_result(fd, &ca->header, result);
1838         client_resume(ca->ci_in);

Comment 2 Nir Soffer 2019-12-01 15:33:28 UTC
Created attachment 1641085 [details]
Possible patch

This should fix the error; it is not tested and may need more work.
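
The attachment is not inlined here; as a rough sketch of the shape such a
fix could take (not necessarily the actual patch), cmd_write_lockspace
might simply propagate open_disk's return value:

rv = open_disk(&sd);
if (rv < 0) {
        /* Sketch: return the real errno (e.g. -EACCES) to the client
         * instead of collapsing every open failure into -ENODEV. */
        result = rv;
        goto reply;
}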

Comment 3 David Teigland 2019-12-02 15:58:50 UTC
When doing that in the past, a problem I've run into is that the error number might be the same error number that is returned in another error path, so a caller won't know whether the error came from opening the device or from some other part of the code.  One way to work around that problem is to translate a specific error location+number to a new SANLK error number that you add in sanlock_rv.h, and return the SANLK number.  e.g.

rv = open_disk(&sd);
if (rv < 0) {
  if (rv == -EACCES)
    result = SANLK_OPEN_ACCES;
  else
    result = -ENODEV;
  goto reply;
}
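
For illustration, a hedged sketch of that suggestion; SANLK_OPEN_ACCES and
its value are hypothetical here, and a real value would have to be chosen so
it does not collide with the existing SANLK_* numbers in sanlock_rv.h:

/* sanlock_rv.h (hypothetical addition; value is a placeholder): */
#define SANLK_OPEN_ACCES  -1000

/* cmd_write_lockspace: translate the specific open failure. */
rv = open_disk(&sd);
if (rv < 0) {
        result = (rv == -EACCES) ? SANLK_OPEN_ACCES : -ENODEV;
        goto reply;
}

/* Client side: the distinct code now identifies the permission problem. */
rv = sanlock_write_lockspace(&ls, 1, 0, 0);
if (rv == SANLK_OPEN_ACCES)
        fprintf(stderr, "no permission to open %s (check SELinux and "
                "sanlock daemon user/group access)\n", ls.host_id_disk.path);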

Comment 4 Ben Cotton 2020-04-30 20:39:49 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Nir Soffer 2020-04-30 20:42:43 UTC
This is still not fixed in master, moving to Fedora 31.

Comment 6 Ben Cotton 2020-11-03 15:55:47 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Ben Cotton 2020-11-24 18:48:25 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

