Bug 1414798
Summary: NFS storage domain can't be added on Fedora 24/25 (selinux related)
Product: [oVirt] vdsm
Component: Core
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Version: 4.20.0
Target Milestone: ovirt-4.1.3
Target Release: ---
Reporter: jniederm
Assignee: Nir Soffer <nsoffer>
QA Contact: Raz Tamir <ratamir>
CC: bugs, jniederm, laravot, mgrepl, nsoffer, pkliczew, tnisan, ylavi
Flags: rule-engine: ovirt-4.1+, rule-engine: exception+
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
oVirt Team: Storage
Last Closed: 2017-05-30 07:15:36 UTC
Created attachment 1242508 [details]
complete-logs.zip
Any analysis on VDSM side why it failed? As it works usually.

Managed to reproduce on master, the actual problem occurs when attempting to attach:

2017-01-22 12:16:23,568+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] Failed in 'AttachStorageDomainVDS' method
2017-01-22 12:16:23,583+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command AttachStorageDomainVDS failed: Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2017-01-22 12:16:23,583+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] IrsBroker::Failed::AttachStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device')), code = 661
2017-01-22 12:16:23,591+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] FINISH, AttachStorageDomainVDSCommand, log id: 2c65b13b
2017-01-22 12:16:23,591+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device')), code = 661 (Failed with error AcquireHostIdFailure and code 661)

Liron, please have a look asap.

Jakub (and Tal, if you have it) - please attach also the following logs so I can see why sanlock is failing:
/var/log/sanlock.log
/var/log/messages

The relevant part in sanlock.log:

2017-01-22 13:43:38+0200 2914535 [805]: s16 lockspace b7bcd1bb-a43d-4904-88c0-9165ceff636c:1:/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids:0
2017-01-22 13:43:38+0200 2914535 [16297]: open error -13 /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids
2017-01-22 13:43:38+0200 2914535 [16297]: s16 open_disk /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids error -13
2017-01-22 13:43:39+0200 2914536 [805]: s16 add_lockspace fail result -19

Given that I've ran setenforce=0 and without SELinux the attach succeeded. As both environments are dev envs, I suspect that in both cases the cause is similar as in the comment [1] ("sanlock was started before being added to the correct groups. Restarting the service fixed the issue") - we've encountered it also in BZ 1160204. Nir, please let me know if it doesn't make sense. Jakub/Tal - please report back.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=959149#c23

After checking Tal's host - that's not the case, even after restarting the sanlock service the issue still exists. selinux blocks sanlock access to the ids file (the mount is completely functional).

Jan 22 15:18:55 bathory python3: SELinux is preventing sanlock from 'read, write, open' accesses on the file /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs8/d89e901b-8561-4d0a-9ace-bf763096c7b4/dom_md/ids.

***** Plugin catchall (100. confidence) suggests **************************

If you believe that sanlock should be allowed read write open access on the ids file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow

Nir, have we encountered such an issue before?

Jakub, it can be seen in your engine log that you had this error a lot of times (in the past two months) and on other flows as well - what was the workaround you used to overcome it? Additionally to the logs requested, please specify the exact vdsm version/os you are using.

2016-11-28 17:02:30,183+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-48) [4e8c871c] Failed in 'CreateStoragePoolVDS' method
2016-11-28 17:02:30,188+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-48) [4e8c871c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM Host1 command failed: Cannot acquire host id: (u'ac68feb4-57a9-49cb-9d90-7d9b74677761', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

thanks.

Created attachment 1243671 [details]
sanlock.log
Created attachment 1243672 [details]
messages
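An aside that may help when reading the sanlock.log excerpts above: sanlock logs negated errno values, so the "open error -13" and "add_lockspace fail result -19" lines decode to ordinary POSIX errors. A quick check using only the Python standard library (nothing bug-specific assumed):

```python
import errno
import os

# sanlock.log reported "open error -13" and "add_lockspace fail result -19".
# These are negated errno values; negate them back and look them up:
open_err = 13        # from "open error -13"
lockspace_err = 19   # from "add_lockspace fail result -19"

print(errno.errorcode[open_err], "-", os.strerror(open_err))
print(errno.errorcode[lockspace_err], "-", os.strerror(lockspace_err))
```

On Linux this prints EACCES ("Permission denied", the SELinux denial on the ids file) and ENODEV, which matches the 'No such device' text surfaced in the engine's SanlockException(19, ...).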
(In reply to Liron Aravot from comment #8)
> Jakub, it can be seen in your engine log that you had this error a lot of
> times (in the past two months) and on other flows as well - what was the
> workaround you used to overcome it?

I guess that I experienced this problem last week for the first time. I'm sure that I was able to add a new storage domain at the beginning of January. I've never used any workaround regarding "AcquireHostIdFailure".

> additionally to the logs requested please specify the exact vdsm
> version/os you are using.

# rpm -qi vdsm
Name        : vdsm
Version     : 4.20.0
Release     : 107.gitdc67b50.fc24
Architecture: x86_64
Install Date: Thu 05 Jan 2017 09:19:19 PM CET
Group       : Applications/System
Size        : 2586618
License     : GPLv2+
Signature   : (none)
Source RPM  : vdsm-4.20.0-107.gitdc67b50.fc24.src.rpm
Build Date  : Mon 02 Jan 2017 09:28:30 PM CET

> 2016-11-28 17:02:30,183+01 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-48) [4e8c871c] Failed in 'CreateStoragePoolVDS' method
> 2016-11-28 17:02:30,188+01 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-48) [4e8c871c] Correlation ID: null, Call Stack: null, Custom
> Event ID: -1, Message: VDSM Host1 command failed: Cannot acquire host id:
> (u'ac68feb4-57a9-49cb-9d90-7d9b74677761', SanlockException(19, 'Sanlock
> lockspace add failure', 'No such device'))
>
> thanks.

Hi Miroslav, the attach to the data center seems to fail due to a different SELinux context; the older domains were created in an older version of Fedora. Can you please have a look?
Here are the outputs of each command:

[root@bathory ~]# ausearch -m AVC -ts today
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.864:26757): avc: denied { read write } for pid=4021 comm="sanlock" name="dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.864:26758): avc: denied { open } for pid=4021 comm="sanlock" path="/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs5/72131e77-977f-414a-b515-689a8f275abf/images/5ff1fd46-0489-46ad-b2e1-3461c6e625b2/dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.865:26759): avc: denied { getattr } for pid=4021 comm="sanlock" path="/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs5/72131e77-977f-414a-b515-689a8f275abf/images/5ff1fd46-0489-46ad-b2e1-3461c6e625b2/dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 14:57:05 2017
type=AVC msg=audit(1485349025.889:28364): avc: denied { read write open } for pid=14066 comm="sanlock" path="/rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids" dev="0:47" ino=6684807 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 14:57:05 2017
type=AVC msg=audit(1485349025.889:28365): avc: denied { getattr } for pid=14066 comm="sanlock" path="/rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids" dev="0:47" ino=6684807 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1

[root@bathory ~]# ps -efZ | egrep "sanlock|vdsm"
system_u:system_r:virtd_t:s0-s0:c0.c1023 root 761 1 0 Jan22 ? 00:00:07 /usr/bin/python2 /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
system_u:system_r:unconfined_service_t:s0 vdsm 763 1 0 Jan22 ? 00:00:23 /usr/bin/python /usr/bin/ovirt-imageio-daemon
system_u:system_r:sanlock_t:s0-s0:c0.c1023 sanlock 773 1 0 Jan22 ? 00:02:08 /usr/sbin/sanlock daemon
system_u:system_r:sanlock_t:s0-s0:c0.c1023 root 774 773 0 Jan22 ? 00:00:00 /usr/sbin/sanlock daemon
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 1409 1 1 Jan22 ? 00:55:17 /usr/bin/python2 /usr/share/vdsm/vdsm
system_u:system_r:unconfined_service_t:s0 vdsm 2379 1 0 Jan22 ? 00:10:29 python /usr/sbin/momd -c /etc/vdsm/mom.conf
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 3609 1409 0 Jan22 ? 00:00:15 /usr/libexec/ioprocess --read-pipe-fd 122 --write-pipe-fd 121 --max-threads 10 --max-queued-requests 10
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 3647 1409 0 Jan22 ? 00:00:11 /usr/libexec/ioprocess --read-pipe-fd 135 --write-pipe-fd 134 --max-threads 10 --max-queued-requests 10
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 13993 1409 0 14:57 ? 00:00:00 /usr/libexec/ioprocess --read-pipe-fd 142 --write-pipe-fd 140 --max-threads 10 --max-queued-requests 10
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 root 14742 14578 0 14:59 pts/1 00:00:00 grep -E --color=auto sanlock|vdsm

[root@bathory ~]# ls -lZ /rhev/data-center/mnt/
ls: cannot access '/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs12': Stale file handle
total 36
drwxr-xr-x. 5 vdsm kvm system_u:object_r:mnt_t:s0 4096 Dec 20 17:38 blockSD
drwxr-xr-x. 2 vdsm kvm system_u:object_r:mnt_t:s0 4096 Nov 9 17:15 darkthrone:_home_nfs_export
drwxrwxrwx. 3 root root unconfined_u:object_r:home_root_t:s0 4096 Jan 25 14:48 darkthrone:_home_nfs_nfs5
drwxrwxrwx. 3 root root system_u:object_r:nfs_t:s0 4096 Jan 17 21:22 darkthrone.tlv:_home_nfs_export2
drwxr-xr-x. 2 vdsm kvm system_u:object_r:mnt_t:s0 4096 Jan 22 17:09 darkthrone.tlv:_home_nfs_nfs11
d?????????? ? ? ? 0 ? ? darkthrone.tlv:_home_nfs_nfs12
drwxrwxrwx. 3 root root system_u:object_r:nfs_t:s0 4096 Dec 20 16:48 darkthrone.tlv:_home_nfs_nfs3
drwxr-xr-x. 2 vdsm kvm system_u:object_r:mnt_t:s0 4096 Jan 22 14:53 darkthrone.tlv:_home_nfs_nfs6
drwxr-xr-x. 2 vdsm kvm system_u:object_r:mnt_t:s0 4096 Jan 22 15:01 darkthrone.tlv:_home_nfs_nfs7
drwxr-xr-x. 2 vdsm kvm system_u:object_r:mnt_t:s0 4096 Jan 22 15:18 darkthrone.tlv:_home_nfs_nfs8

[root@bathory ~]# ls -lZ /rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md
total 2052
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 1048576 Jan 25 15:01 ids
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 16777216 Jan 25 14:48 inbox
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 2097152 Jan 25 14:57 leases
-rw-r--r--. 1 vdsm kvm system_u:object_r:home_root_t:s0 327 Jan 25 14:57 metadata
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 16777216 Jan 25 14:48 outbox

Could you show me how it is mounted?

# mount
...
192.168.122.1:/home/jakub/nfs/d41-2 on /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d41-2 type nfs4 (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.122.152,local_lock=none,addr=192.168.122.1)
...

copied from system with selinux disabled

(In reply to jniederm from comment #14)
> # mount
> ...
> 192.168.122.1:/home/jakub/nfs/d41-2 on
> /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d41-2 type nfs4
> (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,
> nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.
> 122.152,local_lock=none,addr=192.168.122.1)
> ...
>
> copied from system with selinux disabled

Ok, could you try to remount it without the "seclabel" option to see if you are able to reproduce it with this change?

*** Bug 1425799 has been marked as a duplicate of this bug. ***

(In reply to Miroslav Grepl from comment #15)
> Ok, could you try to remount it without "seclabel" option to see if you are
> able to reproduce it with this change?

Miroslav, this is how vdsm mounts:

/usr/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1 server:/path /rhev/data-center/mnt/server:_path

(Copied from supervdsm.log on RHEL 7.3, but the command should be the same on Fedora.) Do we have to change something in the mount command, or maybe the selinux policy in Fedora needs a change?

BTW, we don't support selinux disabled. I don't know how the mount in comment 14 was done, but when this comment was written, Engine did not support NFS 4.2 (the UI would not let you set the version to 4.2). So this mount command is probably from a POSIX domain, which was the only way to mount NFS with version 4.2 when this bug was opened. However, we have another reproducer in bug 1425799 with an NFS mount.

Created attachment 1256991 [details]
vdsmlogs.zip
I added the following line to /etc/vdsm/vdsm.conf to explicitly list the wanted mount options:
nfs_mount_options = soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1
but it was mounted with the 'seclabel' option again. I don't know why.
# mount
...
192.168.122.1:/home/jakub/nfs/d1 on /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1 type nfs4 (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.122.229,local_lock=none,addr=192.168.122.1)
supervdsm.log states:
MainProcess|jsonrpc/4::DEBUG::2017-02-23 18:39:10,611::supervdsm_server::92::SuperVdsm.ServerCallback::(wrapper) call mount with (u'192.168.122.1:/home/jakub/nfs/d1', u'/rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1') {'vfstype': 'nfs', 'mntOpts': 'soft,nosharecache,timeo=600,retrans=6', 'timeout': None, 'cgroup': None}
I don't have capacity to investigate it in detail. vdsm.log and supervdsm.log are attached.
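A workaround often suggested for mislabeled NFS content is to pin the client-side label with the context= mount option, so everything under the mount is seen as nfs_t, which sanlock's policy permits. The sketch below is hypothetical: whether vdsm passes this option through to mount unchanged was not confirmed in this bug, and the exact vdsm.conf placement may differ by version.

```ini
# Hypothetical /etc/vdsm/vdsm.conf sketch. context= is a standard SELinux
# mount option; its interaction with vdsm's option handling is an
# assumption, not something verified in this bug.
nfs_mount_options = soft,nosharecache,timeo=600,retrans=6,context="system_u:object_r:nfs_t:s0"
```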
Re-setting accidentally removed Nir's needinfo flag on Miroslav.

Moving to 4.1.3, as it affects Fedora only.

(In reply to jniederm from comment #18)
> # mount
> ...
> 192.168.122.1:/home/jakub/nfs/d1 on
> /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1 type nfs4
> (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,
> nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.
> 122.229,local_lock=none,addr=192.168.122.1)

You are using nfs version 4.2, so this looks like bug 1432783. Miroslav, do you think this is a duplicate?

Marking as duplicate of bug 1432783 based on the above comments.

*** This bug has been marked as a duplicate of bug 1432783 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
Created attachment 1242503 [details]
engine-log-created-between-dialog-confirmation-and-error-popup.log

Description of problem:
Adding a nfs storage domain fails with the error message: "Error while executing action Attach Storage Domain: AcquireHostIdFailure"

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Let's have a cluster with a host and a nfs data storage domain
2. Add a nfs export storage domain

Actual results:
Error dialog pops up

Expected results:
Export storage domain added

Additional info:
* dir of export domain was empty before the addition
* it had permissions 777 and ownership 36:36
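When reproducing the steps above, it can be useful to check whether the resulting mount actually carries the seclabel option (as it did in the reports in this bug). A minimal sketch with a hypothetical helper name, operating on an options string as printed by mount or found in the fourth field of /proc/mounts:

```shell
#!/bin/sh
# has_seclabel: hypothetical helper that reports whether a comma-separated
# mount-options string contains the "seclabel" option.
has_seclabel() {
    case ",$1," in
        *,seclabel,*) echo yes ;;
        *)            echo no  ;;
    esac
}

# Options taken from the mount output reported in this bug:
has_seclabel "rw,relatime,seclabel,vers=4.2,soft,nosharecache"   # prints: yes
has_seclabel "rw,relatime,vers=4.2,soft,nosharecache"            # prints: no
```

The comma padding avoids false matches on options that merely contain the substring (e.g. a hypothetical "noseclabel").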