Description of problem: rpc.mountd hands out different fsids for the same filesystem depending on how it was started and whether SELinux is enabled.

Steps to Reproduce:

Starting with nothing in the expkey and export caches:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)

Mounting an fs and stat'ing a file:

[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

Looking at the expkey cache again:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 1 0x00000000 /
127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000 /export

Note the entry for /export says it's type 7, which is FSID_UUID16_INUM... but that sure doesn't look like an fsid to me. It looks more like type 3, which is FSID_ENCODE_DEV. Same thing with the export cache... that uuid doesn't match what blkid reports:

[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0000fd01:00000000:00000000:00000000,sec=390003:390004:390005:1)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=0000fd01:00000000:00000000:00000000,sec=1)

and I see the same thing when viewed from the client's perspective:

[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID           FSC
v4 7f000001  801 0:51 fd0100000000:0 no

Now let's kill mountd and start a new one from the command line:

[root@baymax ~]# kill `pidof rpc.mountd`
[root@baymax ~]# rpc.mountd
[root@baymax ~]# stat -t /mnt/t/file
stat: cannot stat ‘/mnt/t/file’: Stale file handle

We got a stale filehandle. Let's unmount, remount, and try again:

[root@baymax ~]# umount /mnt/t
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

Now let's look at the expkey cache again:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 7 0x10033f3300000000d12d2f804e4b26031f597c9b3804e56d /export
# 127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000
127.0.0.1 1 0x00000000 /

Now the fsid actually looks like a type 7 fsid. Note the old entry appears in a comment, which indicates it's no longer valid. The export cache now reports the correct uuid too:

[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=802f2dd1:03264b4e:9b7c591f:6de50438,sec=1)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=802f2dd1:03264b4e:9b7c591f:6de50438,sec=390003:390004:390005:1)

and it looks correct when viewed from the client's perspective:

[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID                              FSC
v4 7f000001  801 0:51 802f2dd103264b4e:9b7c591f6de50438 no

So now let's kill mountd and restart the nfs service altogether:

[root@baymax ~]# kill `pidof rpc.mountd`
[root@baymax ~]# systemctl restart nfs-server
[root@baymax ~]# stat -t /mnt/t/file
stat: cannot stat ‘/mnt/t/file’: Stale file handle

Once again we get a stale filehandle, so let's unmount, remount, and try again:

[root@baymax ~]# umount /mnt/t
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

and now the fsids have reverted back to the way they started:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000 /export
# 127.0.0.1 7 0x10033f3300000000d12d2f804e4b26031f597c9b3804e56d
127.0.0.1 1 0x00000000 /
[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=0000fd01:00000000:00000000:00000000,sec=1)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0000fd01:00000000:00000000:00000000,sec=390003:390004:390005:1)
[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID           FSC
v4 7f000001  801 0:51 fd0100000000:0 no
[root@baymax ~]#
What's happening here is that get_uuid_blkdev() isn't returning a uuid, so uuid_by_path() is falling back on the fsid value returned by statfs64():

static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
{
	...
	blkid_val = get_uuid_blkdev(path);
	...
	if (blkid_val && (type--) == 0)
		val = blkid_val;
	else if (fsid_val[0] && (type--) == 0)
		val = fsid_val;
	...
Digging from rpc.mountd into libblkid, the failure is occurring in:

get_uuid_blkdev
  blkid_get_dev
    blkid_verify
      fd = open(dev->bid_name, O_RDONLY|O_CLOEXEC);

That open fails with -EACCES.
From the kernel side, the failure is occurring in:

open
  do_sys_open
    do_filp_open
      path_openat
        may_open
          inode_permission
            __inode_permission
              security_inode_permission
                selinux_inode_permission
So selinux_inode_permission() was returning -EACCES, but I wasn't seeing any AVC violations in my audit log:

[root@baymax ~]# aureport -a -ts recent | grep mountd
(nothing)

After some reading, I found that SELinux has something called "don't audit" rules, which are for things that are 'expected' to fail... so I disabled the "don't audit" part and then I could see the AVC violations:

[root@baymax ~]# semodule -BD
[root@baymax ~]# systemctl restart nfs-server
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# aureport -a -ts recent | grep mountd
370. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1331
371. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1332
372. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1333
373. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1330
[root@baymax ~]#

Re-enabling the "don't audit" rules and querying the SELinux policy now that I have an idea of what to look for:

[root@baymax ~]# semodule -B
[root@baymax ~]# sesearch -D -s nfsd_t -t fixed_disk_device_t
Found 4 semantic av rules:
   dontaudit nfsd_t device_node : blk_file getattr ;
   dontaudit nfsd_t device_node : chr_file getattr ;
   dontaudit nfsd_t fixed_disk_device_t : blk_file { ioctl read getattr lock open } ;
   dontaudit nfsd_t fixed_disk_device_t : chr_file { ioctl read getattr lock open } ;
[root@baymax ~]# ls -Z /usr/sbin/rpc.mountd
system_u:object_r:nfsd_exec_t:s0 /usr/sbin/rpc.mountd
[root@baymax ~]# ls -Z /dev/dm-1
system_u:object_r:fixed_disk_device_t:s0 /dev/dm-1
[root@baymax ~]#

So mountd needs to be able to use libblkid in order to get the filesystem uuids... but it can't if SELinux is in enforcing mode.
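As a stopgap until the distribution policy is fixed, a small local policy module could cover the denials shown above. This is a sketch with a hypothetical module name (the proper fix belongs in the selinux-policy package, and granting nfsd_t raw read access to disk devices deserves a security review):

```
# local_nfsd_blkid.te -- hypothetical local module; the real fix belongs
# in the selinux-policy package
module local_nfsd_blkid 1.0;

require {
        type nfsd_t;
        type fixed_disk_device_t;
        class blk_file { ioctl read getattr lock open };
}

allow nfsd_t fixed_disk_device_t:blk_file { ioctl read getattr lock open };
```

This would be built and loaded with the usual checkmodule/semodule_package/semodule workflow (the same shape audit2allow would generate from the denials).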
If I temporarily disable SELinux (via setenforce 0) then the fsids consistently use the filesystem uuids.
What I don't understand is why the AVC violations don't occur when I run rpc.mountd directly from the command line... for example, looking at the security_inode_permission() call using systemtap when rpc.mountd is started as part of the nfs service:

Tue Apr 12 11:49:10 2016 EDT rpc.mountd security_inode_permission dm-1 mask 0x24 sid 347
Tue Apr 12 11:49:10 2016 EDT rpc.mountd avc_has_perm_noaudit ssid=347 tsid=72 tclass=11 requested=2 flags=0 avd={.allowed=4294936576, .auditallow=4294967295, .auditdeny=4294967295, .seqno=2167631605, .flags=4294967295}
Tue Apr 12 11:49:10 2016 EDT rpc.mountd security_inode_permission dm-1 ret -13

and when rpc.mountd is started directly from the command line:

Tue Apr 12 11:52:30 2016 EDT rpc.mountd security_inode_permission dm-1 mask 0x24 sid 2339
Tue Apr 12 11:52:30 2016 EDT rpc.mountd avc_has_perm_noaudit ssid=2339 tsid=72 tclass=11 requested=2 flags=0 avd={.allowed=4294936576, .auditallow=4294967295, .auditdeny=4294967295, .seqno=2167631605, .flags=4294967295}
Tue Apr 12 11:52:30 2016 EDT rpc.mountd security_inode_permission dm-1 ret 0

Note that the source sid (which comes from task_struct->cred->security->sid) is different between the good and bad cases. I don't know how to map that to an actual context string that you would see in the userspace SELinux tools. I tried dumping out the sidtable using systemtap but that didn't help... so that's where I'm stuck.
Nice work Scott! We definitely need to get to the bottom of this, as this is a very serious NFS server bug. Even if it turns out to be caused by some selinux or other issue, there's no way it should be possible to get non-deterministic behavior for the fsids of the same export. We periodically have customers report that NFS clients' mount points go stale with RHEL NFS servers. I know of one case in particular where the fsid changed, which we found out very late into the case. My guess is that this is happening and we're just not able to catch it, but I haven't searched current cases to see whether this bug might apply to any of them.
So, that sounds like 2 or 3 bugs:

- first priority is to fix the selinux priority so that nfsd can open the block devices it needs to.

- second, the inconsistent fsid is obviously a mountd bug: "Note the entry for /export says it's type 7 which is FSID_UUID16_INUM... but that sure doesn't look like an fsid to me."

And then, there's the question of how exactly mountd should be handling the failure. Sounds like it's trying to fall back to FSID_ENCODE_DEV, which sounds reasonable enough to me. But if the failure's only temporary and the result is ESTALEs then maybe we need to reconsider.

(I *think* the server should still be able to soldier on without ESTALEs in that case: even if it gives out two filehandles for a given object, it should still be able to look up the fsid from either one and get a reasonable answer from mountd. Maybe it's just the one inconsistent result (fsid type one thing, contents another) that's causing the ESTALEs.)
(In reply to J. Bruce Fields from comment #7)
> the selinux priority

Sorry, that "priority" should be "policy"!
(In reply to J. Bruce Fields from comment #7)
> (I *think* the server should still be able to soldier on without ESTALE's in
> that case: even if it gives out two filehandles for a given object, it
> should still be able to look up the fsid from either one and get a
> reasonable answer from mountd. Maybe it's just that one inconsistent
> results (fsid type one thing, contents another) that's causing the ESTALEs.)

Yeah, I imagine we end up with the ESTALEs when the host is booted and can't identify the UUID for the filesystem. So fixing this may not be such a problem for clients, as long as we allow mountd to still look up filehandles that have fsids based on the dev_t.

Again, though -- step one is to fix the selinux policy, so maybe clone this bug as a selinux policy bug for that?
This message is a reminder that Fedora 23 is nearing its end of life. Approximately four weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Can we have this ticket's version pushed to 24 or 25 even? I had this happen to me just a few days ago on F24.
I think this is fixed for both f24 and f25:

f24# mount 127.0.0.1:/home /mnt/tmp
f24# stat -t /mnt/tmp
/mnt/tmp 4096 8 41ed 0 0 30 2 5 0 0 1472824218 1468937353 1468937353 0 32768 system_u:object_r:home_root_t:s0
f24# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x8910082781670ff10000000000000000 /home
* 1 0x00000000 /

f25# mount 127.0.0.1:/home /mnt/tmp
f25# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x02fd0000000000000000000000000000 /home
* 1 0x00000000 /

Or am I missing something?
(In reply to Steve Dickson from comment #13)
> I think this is fixed for both f24 and f25

It's definitely still broken.

<snip/>

> f25# mount 127.0.0.1:/home /mnt/tmp
> f25# cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> * 6 0x02fd0000000000000000000000000000 /home

FSID type 6 is FSID_UUID16. That's definitely not a UUID in the FSID column though. Those are the major & minor numbers for /dev/dm-2 (253, 2).

> * 1 0x00000000 /
>
> Or am I missing something?

What happens if you do the following on that f25 machine right now:

# kill `pidof rpc.mountd`
# rpc.mountd
# stat -t /mnt/tmp/somefile    (stat an actual file, not the mountpoint itself)
(In reply to Scott Mayhew from comment #14)
> (In reply to Steve Dickson from comment #13)
> > I think this is fixed for both f24 and f25
>
> It's definitely still broken.
>
> <snip/>
>
> > f25# mount 127.0.0.1:/home /mnt/tmp
> > f25# cat /proc/net/rpc/nfsd.fh/content
> > #domain fsidtype fsid [path]
> > * 6 0x02fd0000000000000000000000000000 /home

True... so this is being caused by SELinux?

> FSID type 6 is FSID_UUID16. That's definitely not a UUID in the FSID column
> though. Those are the major & minor numbers for /dev/dm-2 (253, 2).
>
> > * 1 0x00000000 /
> >
> > Or am I missing something?
>
> What happens if you do the following on that f25 machine right now:
>
> # kill `pidof rpc.mountd`
> # rpc.mountd
> # stat -t /mnt/tmp/somefile (stat an actual file, not the mountpoint itself)

f25# kill `pidof rpc.mountd`
f25# rpc.mountd
f25# stat -t /mnt/tmp/foobar
stat: cannot stat '/mnt/tmp/foobar': Stale file handle

But... if I do a systemctl restart nfs:

f25# stat -t /mnt/tmp/foobar | tee /tmp/second
/mnt/tmp/foobar 0 0 81a4 3606 3606 30 25541 1 0 0 1480604755 1480604755 1480604755 0 262144 unconfined_u:object_r:home_root_t:s0

which is the exact same info returned when I did the stat -t before I killed rpc.mountd. So is it a bug that ESTALE is returned when restarting rpc.mountd but not the entire server? IDK...
(In reply to Steve Dickson from comment #15)
> (In reply to Scott Mayhew from comment #14)
> > (In reply to Steve Dickson from comment #13)
> > > I think this is fixed for both f24 and f25
> >
> > It's definitely still broken.
> >
> > <snip/>
> >
> > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > #domain fsidtype fsid [path]
> > > * 6 0x02fd0000000000000000000000000000 /home
>
> True... so this is being caused by SELinux?

I don't think so...

f25# setenforce 0
f25# mount 127.0.0.1:/home/tmp /mnt/tmp
f25# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x02fd0000000000000000000000000000 /home
* 1 0x00000000 /

because the same value is returned with selinux disabled.
(In reply to Steve Dickson from comment #16)
> (In reply to Steve Dickson from comment #15)
> > (In reply to Scott Mayhew from comment #14)
> > > (In reply to Steve Dickson from comment #13)
> > > > I think this is fixed for both f24 and f25
> > >
> > > It's definitely still broken.
> > >
> > > <snip/>
> > >
> > > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > > #domain fsidtype fsid [path]
> > > > * 6 0x02fd0000000000000000000000000000 /home
> >
> > True... so this is being caused by SELinux?
>
> I don't think so...
>
> f25# setenforce 0
> f25# mount 127.0.0.1:/home/tmp /mnt/tmp
> f25# cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> * 6 0x02fd0000000000000000000000000000 /home
> * 1 0x00000000 /
>
> because the same value is returned with selinux disabled.

The nfsd.fh cache entries have a really long lifetime, so chances are nfsd didn't even do another upcall to mountd. If you restart the nfs-server service at this point you should probably see the UUID-based FSIDs.
(In reply to Scott Mayhew from comment #17)
> (In reply to Steve Dickson from comment #16)
> > (In reply to Steve Dickson from comment #15)
> > > (In reply to Scott Mayhew from comment #14)
> > > > (In reply to Steve Dickson from comment #13)
> > > > > I think this is fixed for both f24 and f25
> > > >
> > > > It's definitely still broken.
> > > >
> > > > <snip/>
> > > >
> > > > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > > > #domain fsidtype fsid [path]
> > > > > * 6 0x02fd0000000000000000000000000000 /home
> > >
> > > True... so this is being caused by SELinux?
> >
> > I don't think so...
> >
> > f25# setenforce 0
> > f25# mount 127.0.0.1:/home/tmp /mnt/tmp
> > f25# cat /proc/net/rpc/nfsd.fh/content
> > #domain fsidtype fsid [path]
> > * 6 0x02fd0000000000000000000000000000 /home
> > * 1 0x00000000 /
> >
> > because the same value is returned with selinux disabled.
>
> The nfsd.fh cache entries have a really long lifetime, so chances are nfsd
> didn't even do another upcall to mountd. If you restart the nfs-server
> service at this point you should probably see the UUID-based FSIDs.

I did... after the restart true UUIDs were being passed up from the kernel... So my question is why isn't this a kernel bug? Since the kernel is not passing up a true UUID... and what is rpc.mountd supposed to do with an invalid UUID? Fail the mount?
It doesn't sound like Dave's analysis in the description and comments 1-5 still applies. So, the top priority is to get the selinux policy fixed--who do we need for that?
(In reply to J. Bruce Fields from comment #19)
> It doesn't sound like

(Sorry, I meant "It sounds like...").
(In reply to J. Bruce Fields from comment #20)
> (In reply to J. Bruce Fields from comment #19)
> > It doesn't sound like
>
> (Sorry, I meant "It sounds like...").

(And, s/Dave/Scott. I think it's time for the weekend.)
I filed https://bugzilla.redhat.com/show_bug.cgi?id=1403017 for the selinux-policy.
*** This bug has been marked as a duplicate of bug 1403017 ***