Description of problem:
When sanlock runs into SELinux issues, it logs the error to the sanlock log and even suggests a recovery action:

    2019-12-01 15:09:10 9979 [4645]: open error -13 EACCES: no permission to open /rhev/data-center/mnt/darkthrone:_home_nfs_nfs1/0aedc3e3-b12b-4334-a69c-44269ee825ac/dom_md/ids
    2019-12-01 15:09:10 9979 [4645]: check that daemon user sanlock 179 group sanlock 179 has access to disk or file.

But the actual API call fails with -ENODEV (seen in vdsm):

    2019-12-01 14:37:45,800+0200 ERROR (jsonrpc/0) [storage.initSANLock] Cannot initialize SANLock for domain 3c4e84a7-32b3-4116-9ac4-630c17631e89 (clusterlock:259)
    Traceback (most recent call last):
      File "/usr/lib/python3.7/site-packages/vdsm/storage/clusterlock.py", line 250, in initSANLock
        lockspace_name, idsPath, align=alignment, sector=block_size)
    sanlock.SanlockException: (19, 'Sanlock lockspace write failure', 'No such device')

Looking at the code:

    int open_disk(struct sync_disk *disk)
    {
        struct stat st;
        int fd, rv;

        fd = open(disk->path, O_RDWR | O_DIRECT | O_SYNC, 0);
        if (fd < 0) {
            rv = -errno;
            if (rv == -EACCES) {
                log_error("open error %d EACCES: no permission to open %s", rv, disk->path);
                log_error("check that daemon user %s %d group %s %d has access to disk or file.",
                          com.uname, com.uid, com.gname, com.gid);
            } else
                log_error("open error %d %s", rv, disk->path);
            goto fail;
        }
        ...
    fail:
        if (rv >= 0)
            rv = -1;
        return rv;
    }

So open_disk() returned -EACCES, but the caller replaces it with -ENODEV:

    rv = open_disk(&sp->host_id_disk);
    if (rv < 0) {
        log_erros(sp, "open_disk %s error %d", sp->host_id_disk.path, rv);
        acquire_result = -ENODEV;
        delta_result = -1;
        goto set_status;
    }
    opened = 1;
    ...
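The effect of that mapping can be reproduced outside sanlock with a small sketch. `open_disk_sketch()` and `acquire_result_sketch()` are hypothetical stand-ins for open_disk() and the caller quoted above, not sanlock code. The sketch drops O_DIRECT and O_SYNC so it runs on any filesystem, and it uses ENOENT and EISDIR as the example failures because EACCES is hard to trigger reliably when running as root.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for open_disk(): returns an open fd, or -errno
 * on failure (O_DIRECT/O_SYNC omitted so it works on any filesystem). */
int open_disk_sketch(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -errno;
    return fd;
}

/* Hypothetical stand-in for the caller: every open failure is collapsed
 * into -ENODEV, so the original errno never reaches the client. */
int acquire_result_sketch(const char *path)
{
    int rv = open_disk_sketch(path);
    if (rv < 0)
        return -ENODEV;
    close(rv);
    return 0;
}
```

Opening a missing path gives -ENOENT from open_disk_sketch(), and opening "/" for writing gives -EISDIR, yet acquire_result_sketch() returns -ENODEV in both cases, the same loss of information as with -EACCES.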
    set_status:
        pthread_mutex_lock(&sp->mutex);
        sp->lease_status.acquire_last_result = acquire_result;
        sp->lease_status.acquire_last_attempt = delta_begin;
        if (delta_result == SANLK_OK)
            sp->lease_status.acquire_last_success = last_success;

So we report -ENODEV in lease_status, hiding the actual issue.

Version-Release number of selected component (if applicable):
sanlock-3.8.0-1.fc30.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Call sanlock.write_lockspace() via the Python bindings when storage access will fail because of the SELinux configuration.

Actual results:
sanlock reports a generic error, so the client cannot report a useful error.

Expected results:
Report the reason the lockspace could not be accessed, so clients can report a useful error. In the oVirt use case, the error from sanlock propagates to vdsm, and from vdsm to engine, where it can be inspected in the engine log and in the UI. This makes debugging issues in a cluster easier when you have a single place to look for errors.

Workaround:
Look in /var/log/sanlock.log, where the reason is reported.
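The 'No such device' text in the vdsm traceback is just the system message for errno 19 (ENODEV on Linux), while the sanlock log shows -13 (EACCES). The sketch below, assuming Linux errno values and glibc/musl message strings, shows how the same failure reads to a client depending on which code is returned. `describe_result()` is a hypothetical helper, not part of the sanlock API or its Python bindings.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a negative errno-style result roughly the
 * way the vdsm traceback shows it, e.g. "(19, 'No such device')". */
void describe_result(int result, char *buf, size_t len)
{
    int err = -result;  /* sanlock results are negative errno values */
    snprintf(buf, len, "(%d, '%s')", err, strerror(err));
}
```

With -ENODEV this produces "(19, 'No such device')"; with -EACCES it would produce "(13, 'Permission denied')", which points directly at the permission/SELinux problem.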
Correction: I looked at the wrong place before. Here is the actual code handling sanlock_write_lockspace:

        rv = open_disk(&sd);
        if (rv < 0) {
            result = -ENODEV;
            goto reply;
        }

        if (ca->header.data2)
            io_timeout = ca->header.data2;

        result = delta_lease_init(task, &lockspace, io_timeout, &sd);

        close_disks(&sd, 1);
    reply:
        log_debug("cmd_write_lockspace %d,%d done %d", ca->ci_in, fd, result);

        send_result(fd, &ca->header, result);
        client_resume(ca->ci_in);
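A minimal sketch of the same reply path with the open_disk() error passed through instead of replaced, assuming that is roughly what returning the real error would look like (the contents of the attached patch are not shown here). `write_lockspace_sketch()` and `delta_lease_init_stub()` are hypothetical; the stub stands in for delta_lease_init() and simply reports success.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stub for delta_lease_init(): always succeeds here. */
static int delta_lease_init_stub(int fd)
{
    (void)fd;
    return 0;
}

/* Hypothetical version of the cmd_write_lockspace() body: on open failure,
 * return the errno from open() (e.g. -EACCES) rather than -ENODEV. */
int write_lockspace_sketch(const char *path)
{
    int result;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        result = -errno;   /* previously: result = -ENODEV; */
        goto reply;
    }

    result = delta_lease_init_stub(fd);
    close(fd);
reply:
    /* In sanlock, this is where send_result() returns result to the client. */
    return result;
}
```

The trade-off of passing raw errno values through is that the same number may also come from a later step such as delta_lease_init(), so the client cannot always tell where the error originated.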
Created attachment 1641085
Possible patch

This should fix the error; it is not tested and may need more work.
When I've done that in the past, one problem I've run into is that the error number may be the same one returned by another error path, so a caller can't tell whether the error came from opening the device or from some other part of the code. One way to work around that is to translate a specific error location and number into a new SANLK error number that you add in sanlock_rv.h, and return that SANLK number, e.g.:

    rv = open_disk(&sd);
    if (rv < 0) {
        if (rv == -EACCES)
            result = SANLK_OPEN_ACCES;
        else
            result = -ENODEV;
        goto reply;
    }
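A minimal sketch of that translation, plus how a client could then tell the open failure apart from other errors. `SANLK_OPEN_ACCES` is the name proposed above, but its numeric value here (-2100) is a made-up placeholder; the real value would be chosen when adding it to sanlock_rv.h. `translate_open_error()` and `open_error_message()` are hypothetical helpers, and their message strings are illustrative only.

```c
#include <errno.h>

/* Placeholder value for illustration only; the real number would be
 * assigned in sanlock_rv.h alongside the other SANLK_* codes. */
#define SANLK_OPEN_ACCES (-2100)

/* Hypothetical helper mirroring the snippet above: map an open_disk()
 * failure to the result sent back to the client. */
int translate_open_error(int rv)
{
    if (rv == -EACCES)
        return SANLK_OPEN_ACCES;  /* distinct code: open failed, no permission */
    return -ENODEV;               /* previous generic behaviour */
}

/* Hypothetical client-side decoding: because SANLK_OPEN_ACCES is only
 * returned from the open path, the client knows what failed and where. */
const char *open_error_message(int result)
{
    switch (result) {
    case SANLK_OPEN_ACCES:
        return "no permission to open lockspace disk or file";
    case -ENODEV:
        return "lockspace device not available";
    default:
        return "other error";
    }
}
```

The point of a dedicated code rather than a raw -EACCES is that it cannot collide with an errno returned from a later step, so vdsm (and engine) could map it to a specific, actionable message.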
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This is still not fixed in master, moving to Fedora 31.
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.