Bug 1414798

Summary: NFS storage domain can't be added on Fedora 24/25 (selinux related)

Product: [oVirt] vdsm
Reporter: jniederm
Component: Core
Assignee: Nir Soffer <nsoffer>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: high
Priority: high
Docs Contact:
Version: 4.20.0
CC: bugs, jniederm, laravot, mgrepl, nsoffer, pkliczew, tnisan, ylavi
Target Milestone: ovirt-4.1.3
Flags: rule-engine: ovirt-4.1+, rule-engine: exception+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-30 07:15:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  engine-log-created-between-dialog-confirmation-and-error-popup.log
  complete-logs.zip
  sanlock.log
  messages
  vdsmlogs.zip

Description jniederm 2017-01-19 13:36:50 UTC
Created attachment 1242503 [details]
engine-log-created-between-dialog-confirmation-and-error-popup.log

Description of problem:
Adding an NFS storage domain fails with the error message:
"Error while executing action Attach Storage Domain: AcquireHostIdFailure"

Version-Release number of selected component (if applicable):
vdsm 4.20.0

How reproducible:
100%

Steps to Reproduce:
1. Have a cluster with a host and an NFS data storage domain
2. Add an NFS export storage domain

Actual results:
Error dialog pops up

Expected results:
Export storage domain added

Additional info:
* The export domain directory was empty before the addition
* It had permissions 777 and ownership 36:36 (vdsm:kvm)
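
For reference, preparing an empty NFS export directory with this ownership typically amounts to something like the following on the NFS server (the path and export options below are illustrative assumptions, not taken from this setup):

    mkdir -p /exports/ovirt-export                # hypothetical export path
    chown 36:36 /exports/ovirt-export             # 36:36 = vdsm:kvm
    chmod 0777 /exports/ovirt-export              # matches the permissions reported above
    echo '/exports/ovirt-export *(rw,sync,no_subtree_check,anonuid=36,anongid=36)' >> /etc/exports
    exportfs -ra                                  # apply the new export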

Comment 1 jniederm 2017-01-19 14:35:03 UTC
Created attachment 1242508 [details]
complete-logs.zip

Comment 2 Yaniv Kaul 2017-01-22 09:56:09 UTC
Any analysis on the VDSM side as to why it failed? It usually works.

Comment 3 Tal Nisan 2017-01-22 10:18:15 UTC
Managed to reproduce on master; the actual problem occurs when attempting to attach:

2017-01-22 12:16:23,568+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] Failed in 'AttachStorageDomainVDS' method
2017-01-22 12:16:23,583+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command AttachStorageDomainVDS failed: Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2017-01-22 12:16:23,583+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] IrsBroker::Failed::AttachStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device')), code = 661
2017-01-22 12:16:23,591+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] FINISH, AttachStorageDomainVDSCommand, log id: 2c65b13b
2017-01-22 12:16:23,591+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-1) [f4d63eb0-d364-4a17-9aef-cd62b4ff7c85] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot acquire host id: (u'b7bcd1bb-a43d-4904-88c0-9165ceff636c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device')), code = 661 (Failed with error AcquireHostIdFailure and code 661)


Liron, please have a look asap

Comment 4 Liron Aravot 2017-01-22 11:12:37 UTC
Jakub (and Tal, if you have them) - please also attach the following logs so I can see why sanlock is failing:

/var/log/sanlock.log
/var/log/messages

Comment 5 Tal Nisan 2017-01-22 11:46:44 UTC
The relevant part in sanlock.log:

2017-01-22 13:43:38+0200 2914535 [805]: s16 lockspace b7bcd1bb-a43d-4904-88c0-9165ceff636c:1:/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids:0
2017-01-22 13:43:38+0200 2914535 [16297]: open error -13 /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids
2017-01-22 13:43:38+0200 2914535 [16297]: s16 open_disk /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs6/b7bcd1bb-a43d-4904-88c0-9165ceff636c/dom_md/ids error -13
2017-01-22 13:43:39+0200 2914536 [805]: s16 add_lockspace fail result -19

Given that, I ran setenforce 0, and with SELinux permissive the attach succeeded.
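
A quick way to confirm that SELinux is what blocks sanlock, before or instead of flipping to permissive mode, is something along these lines (the time filter is just an example):

    getenforce                              # current SELinux mode (Enforcing/Permissive)
    ausearch -m AVC -c sanlock -ts recent   # recent AVC denials for the sanlock process
    setenforce 0                            # switch to permissive - diagnostic only, not a fix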

Comment 6 Liron Aravot 2017-01-22 12:27:42 UTC
As both environments are dev envs, I suspect that in both cases the cause is similar to the one in comment [1] ("sanlock was started before being added to the correct groups. Restarting the service fixed the issue"). We have also encountered it in BZ 1160204.
Nir, please let me know if this doesn't make sense.

Jakub/Tal - please report back.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=959149#c23
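
For reference, a quick way to check the "started before being added to the correct groups" theory is to compare the groups of the running sanlock daemon with the ones its user is configured to have, and restart the service if they differ (a sketch, using the Fedora/RHEL service name):

    id sanlock                                      # groups the sanlock user should have
    grep Groups /proc/$(pidof -s sanlock)/status    # groups the running daemon actually has
    systemctl restart sanlock                       # restart so it picks up group changes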

Comment 7 Liron Aravot 2017-01-22 13:46:40 UTC
After checking Tal's host, that's not the case: even after restarting the sanlock service the issue still exists. SELinux blocks sanlock's access to the ids file (the mount itself is completely functional).

Jan 22 15:18:55 bathory python3: SELinux is preventing sanlock from 'read, write, open' accesses on the file /rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs8/d89e901b-8561-4d0a-9ace-bf763096c7b4/dom_md/ids.

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that sanlock should be allowed read write open access on the ids file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow
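
The alert above suggests generating a local policy module. As a local workaround only (it papers over the labeling problem rather than fixing it), that would look roughly like this; the module name is arbitrary:

    ausearch -m AVC -c sanlock | audit2allow -M local-sanlock-nfs   # build a module from the logged denials
    semodule -i local-sanlock-nfs.pp                                # install it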

Nir, have we encountered such an issue before?

Comment 8 Liron Aravot 2017-01-22 15:24:45 UTC
Jakub, your engine log shows that you have hit this error many times over the past two months, and in other flows as well - what workaround did you use to overcome it?
In addition to the logs requested above, please specify the exact vdsm version/OS you are using.

2016-11-28 17:02:30,183+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-48) [4e8c871c] Failed in 'CreateStoragePoolVDS' method
2016-11-28 17:02:30,188+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-48) [4e8c871c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM Host1 command failed: Cannot acquire host id: (u'ac68feb4-57a9-49cb-9d90-7d9b74677761', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))


thanks.

Comment 9 jniederm 2017-01-23 16:59:24 UTC
Created attachment 1243671 [details]
sanlock.log

Comment 10 jniederm 2017-01-23 17:00:06 UTC
Created attachment 1243672 [details]
messages

Comment 11 jniederm 2017-01-23 18:16:35 UTC
(In reply to Liron Aravot from comment #8)
> Jakub, it can be seen in your engine log that you had this error a lot of
> times (in the past two months) and on other flows as well - what was the
> workaround you used to overcome it?

I believe I experienced this problem for the first time last week. I am sure that I was able to add a new storage domain at the beginning of January. I have never used any workaround for "AcquireHostIdFailure".

> additionally to the logs requested in please specify the exact vdsm
> version/os you are using.

# rpm -qi vdsm
Name        : vdsm
Version     : 4.20.0
Release     : 107.gitdc67b50.fc24
Architecture: x86_64
Install Date: Thu 05 Jan 2017 09:19:19 PM CET
Group       : Applications/System
Size        : 2586618
License     : GPLv2+
Signature   : (none)
Source RPM  : vdsm-4.20.0-107.gitdc67b50.fc24.src.rpm
Build Date  : Mon 02 Jan 2017 09:28:30 PM CET


> 
> 2016-11-28 17:02:30,183+01 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-48) [4e8c871c] Failed in 'CreateStoragePoolVDS' method
> 2016-11-28 17:02:30,188+01 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-48) [4e8c871c] Correlation ID: null, Call Stack: null, Custom
> Event ID: -1, Message: VDSM Host1 command failed: Cannot acquire host id:
> (u'ac68feb4-57a9-49cb-9d90-7d9b74677761', SanlockException(19, 'Sanlock
> lockspace add failure', 'No such device'))
> 
> 
> thanks.

Comment 12 Tal Nisan 2017-01-25 13:04:27 UTC
Hi Miroslav,
The attach to the data center seems to fail due to a different SELinux context; the older domains were created on an older version of Fedora. Can you please have a look?
Here is the output of each command:

[root@bathory ~]# ausearch -m AVC -ts today
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.864:26757): avc:  denied  { read write } for  pid=4021 comm="sanlock" name="dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.864:26758): avc:  denied  { open } for  pid=4021 comm="sanlock" path="/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs5/72131e77-977f-414a-b515-689a8f275abf/images/5ff1fd46-0489-46ad-b2e1-3461c6e625b2/dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 11:22:56 2017
type=AVC msg=audit(1485336176.865:26759): avc:  denied  { getattr } for  pid=4021 comm="sanlock" path="/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs5/72131e77-977f-414a-b515-689a8f275abf/images/5ff1fd46-0489-46ad-b2e1-3461c6e625b2/dbcc5749-9c45-43ce-a0a2-9cc017be91fc.lease" dev="0:47" ino=6684844 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 14:57:05 2017
type=AVC msg=audit(1485349025.889:28364): avc:  denied  { read write open } for  pid=14066 comm="sanlock" path="/rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids" dev="0:47" ino=6684807 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
----
time->Wed Jan 25 14:57:05 2017
type=AVC msg=audit(1485349025.889:28365): avc:  denied  { getattr } for  pid=14066 comm="sanlock" path="/rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids" dev="0:47" ino=6684807 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1


[root@bathory ~]# ps -efZ | egrep "sanlock|vdsm"
system_u:system_r:virtd_t:s0-s0:c0.c1023 root 761  1  0 Jan22 ?        00:00:07 /usr/bin/python2 /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
system_u:system_r:unconfined_service_t:s0 vdsm 763 1  0 Jan22 ?        00:00:23 /usr/bin/python /usr/bin/ovirt-imageio-daemon
system_u:system_r:sanlock_t:s0-s0:c0.c1023 sanlock 773 1  0 Jan22 ?    00:02:08 /usr/sbin/sanlock daemon
system_u:system_r:sanlock_t:s0-s0:c0.c1023 root 774 773  0 Jan22 ?     00:00:00 /usr/sbin/sanlock daemon
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 1409 1  1 Jan22 ?        00:55:17 /usr/bin/python2 /usr/share/vdsm/vdsm
system_u:system_r:unconfined_service_t:s0 vdsm 2379 1  0 Jan22 ?       00:10:29 python /usr/sbin/momd -c /etc/vdsm/mom.conf
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 3609 1409  0 Jan22 ?     00:00:15 /usr/libexec/ioprocess --read-pipe-fd 122 --write-pipe-fd 121 --max-threads 10 --max-queued-requests 10
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 3647 1409  0 Jan22 ?     00:00:11 /usr/libexec/ioprocess --read-pipe-fd 135 --write-pipe-fd 134 --max-threads 10 --max-queued-requests 10
system_u:system_r:virtd_t:s0-s0:c0.c1023 vdsm 13993 1409  0 14:57 ?    00:00:00 /usr/libexec/ioprocess --read-pipe-fd 142 --write-pipe-fd 140 --max-threads 10 --max-queued-requests 10
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 root 14742 14578  0 14:59 pts/1 00:00:00 grep -E --color=auto sanlock|vdsm

[root@bathory ~]# ls -lZ /rhev/data-center/mnt/
ls: cannot access '/rhev/data-center/mnt/darkthrone.tlv:_home_nfs_nfs12': Stale file handle
total 36
drwxr-xr-x. 5 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Dec 20 17:38 blockSD
drwxr-xr-x. 2 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Nov  9 17:15 darkthrone:_home_nfs_export
drwxrwxrwx. 3 root root unconfined_u:object_r:home_root_t:s0 4096 Jan 25 14:48 darkthrone:_home_nfs_nfs5
drwxrwxrwx. 3 root root system_u:object_r:nfs_t:s0           4096 Jan 17 21:22 darkthrone.tlv:_home_nfs_export2
drwxr-xr-x. 2 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Jan 22 17:09 darkthrone.tlv:_home_nfs_nfs11
d?????????? ? ?    ?                                       0    ?            ? darkthrone.tlv:_home_nfs_nfs12
drwxrwxrwx. 3 root root system_u:object_r:nfs_t:s0           4096 Dec 20 16:48 darkthrone.tlv:_home_nfs_nfs3
drwxr-xr-x. 2 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Jan 22 14:53 darkthrone.tlv:_home_nfs_nfs6
drwxr-xr-x. 2 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Jan 22 15:01 darkthrone.tlv:_home_nfs_nfs7
drwxr-xr-x. 2 vdsm kvm  system_u:object_r:mnt_t:s0           4096 Jan 22 15:18 darkthrone.tlv:_home_nfs_nfs8


[root@bathory ~]# ls -lZ /rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md
total 2052
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0  1048576 Jan 25 15:01 ids
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 16777216 Jan 25 14:48 inbox
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0  2097152 Jan 25 14:57 leases
-rw-r--r--. 1 vdsm kvm system_u:object_r:home_root_t:s0      327 Jan 25 14:57 metadata
-rw-rw----. 1 vdsm kvm system_u:object_r:home_root_t:s0 16777216 Jan 25 14:48 outbox
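
The home_root_t label on these files would be consistent with a labeled NFS mount (v4.2 with the seclabel option) exposing the server-side labels of an export that lives under /home. A quick way to compare the label actually seen on the ids file with what the local policy would expect for that path is, for example:

    ls -lZ /rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids
    matchpathcon /rhev/data-center/mnt/darkthrone:_home_nfs_nfs5/51bccb99-1560-49e3-9561-f03ad6f6a054/dom_md/ids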

Comment 13 Miroslav Grepl 2017-02-08 12:33:08 UTC
Could you show me how it is mounted?

Comment 14 jniederm 2017-02-09 11:41:56 UTC
# mount
...
192.168.122.1:/home/jakub/nfs/d41-2 on /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d41-2 type nfs4 (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.122.152,local_lock=none,addr=192.168.122.1)
...

Copied from a system with SELinux disabled.

Comment 15 Miroslav Grepl 2017-02-13 13:21:59 UTC
(In reply to jniederm from comment #14)
> # mount
> ...
> 192.168.122.1:/home/jakub/nfs/d41-2 on
> /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d41-2 type nfs4
> (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,
> nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.
> 122.152,local_lock=none,addr=192.168.122.1)
> ...
> 
> copied from system with selinux disabled

OK, could you try to remount it without the "seclabel" option to see if you can still reproduce it with this change?
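
Two ways to test that come to mind (both sketches and assumptions rather than recommendations): force an NFS minor version that does not negotiate labeled NFS, or pin a fixed client-side label with the context= mount option, e.g.:

    mount -t nfs -o vers=4.1,soft,nosharecache server:/path /mnt/test                       # labeled NFS is not used below 4.2
    mount -t nfs -o vers=4.2,context="system_u:object_r:nfs_t:s0" server:/path /mnt/test    # override labels on the client side

where server:/path and /mnt/test are placeholders.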

Comment 16 Nir Soffer 2017-02-22 23:03:58 UTC
*** Bug 1425799 has been marked as a duplicate of this bug. ***

Comment 17 Nir Soffer 2017-02-22 23:20:06 UTC
(In reply to Miroslav Grepl from comment #15)
> Ok, could you try to remount it without "seclabel" option to see if you are
> able to reproduce it with this change?

Miroslav, this is how vdsm mounts:

    /usr/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1 server:/path /rhev/data-center/mnt/server:_path

(Copied from supervdsm.log on RHEL 7.3, but the command should be the same on Fedora.)

Do we have to change something in the mount command, or does the SELinux policy in Fedora need a change?

BTW, we don't support running with SELinux disabled. I don't know how the mount in comment 14 was done, but when that comment was written, Engine did not support NFS 4.2 (the UI would not let you set the version to 4.2). So this mount command is probably from a POSIX domain, which was the only way to mount NFS with version 4.2 when this bug was opened.

However, we have another reproducer in bug 1425799 with an NFS mount.

Comment 18 jniederm 2017-02-23 17:51:54 UTC
Created attachment 1256991 [details]
vdsmlogs.zip

I added the following line to /etc/vdsm/vdsm.conf to explicitly list the wanted mount options:

nfs_mount_options = soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1

but it was mounted with the 'seclabel' option again. I don't know why.

# mount
...
192.168.122.1:/home/jakub/nfs/d1 on /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1 type nfs4 (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.122.229,local_lock=none,addr=192.168.122.1)

supervdsm.log states:
MainProcess|jsonrpc/4::DEBUG::2017-02-23 18:39:10,611::supervdsm_server::92::SuperVdsm.ServerCallback::(wrapper) call mount with (u'192.168.122.1:/home/jakub/nfs/d1', u'/rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1') {'vfstype': 'nfs', 'mntOpts': 'soft,nosharecache,timeo=600,retrans=6', 'timeout': None, 'cgroup': None}

I don't have the capacity to investigate this in detail. vdsm.log and supervdsm.log are attached.
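
For reference, this is roughly what was done, assuming nfs_mount_options belongs in the [irs] section of /etc/vdsm/vdsm.conf and that vdsmd is restarted afterwards (both are assumptions on my side):

    printf '[irs]\nnfs_mount_options = soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1\n' >> /etc/vdsm/vdsm.conf
    systemctl restart vdsmd    # vdsm only re-reads its configuration on restart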

Comment 19 jniederm 2017-02-23 17:54:57 UTC
Re-setting Nir's needinfo flag on Miroslav, which I accidentally removed.

Comment 20 Yaniv Kaul 2017-03-02 13:24:44 UTC
Moving to 4.1.3, as it affects Fedora only.

Comment 21 Nir Soffer 2017-03-16 08:11:15 UTC
(In reply to jniederm from comment #18)
> # mount
> ...
> 192.168.122.1:/home/jakub/nfs/d1 on
> /rhev/data-center/mnt/192.168.122.1:_home_jakub_nfs_d1 type nfs4
> (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,
> nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.
> 122.229,local_lock=none,addr=192.168.122.1)

You are using NFS version 4.2, so this looks like bug 1432783.

Miroslav, do you think this is a duplicate?
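
A quick way to see which NFS version was actually negotiated for each mount (rather than what was requested) is, for example:

    nfsstat -m      # per-mount server, type and options, including vers=
    mount -t nfs4   # or just filter the mount table by filesystem type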

Comment 23 Yaniv Kaul 2017-05-30 07:15:36 UTC
Marking as a duplicate of bug 1432783 based on the above comments.

*** This bug has been marked as a duplicate of bug 1432783 ***

Comment 24 Red Hat Bugzilla 2023-09-14 03:52:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days