Description of problem: rpc.mountd hands out different fsids for the same filesystem depending on how it was started and whether SELinux is enabled.

Steps to Reproduce:

Starting with nothing in the expkey and export caches:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)

Mounting an fs and stat'ing a file:

[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

Looking at the expkey cache again:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 1 0x00000000 /
127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000 /export

Note the entry for /export says it's type 7, which is FSID_UUID16_INUM... but that sure doesn't look like an fsid to me. It looks more like type 3, which is FSID_ENCODE_DEV. Same thing with the export cache... that uuid doesn't match what blkid reports:

[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0000fd01:00000000:00000000:00000000,sec=390003:390004:390005:1)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=0000fd01:00000000:00000000:00000000,sec=1)

and I see the same thing when viewed from the client's perspective:

[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID           FSC
v4 7f000001  801 0:51 fd0100000000:0 no

Now let's kill mountd and start a new one from the command line:

[root@baymax ~]# kill `pidof rpc.mountd`
[root@baymax ~]# rpc.mountd
[root@baymax ~]# stat -t /mnt/t/file
stat: cannot stat ‘/mnt/t/file’: Stale file handle

We got a stale filehandle. Let's unmount, remount, and try again:

[root@baymax ~]# umount /mnt/t
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

Now let's look at the expkey cache again:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 7 0x10033f3300000000d12d2f804e4b26031f597c9b3804e56d /export
# 127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000
127.0.0.1 1 0x00000000 /

Now the fsid actually looks like a type 7 fsid. Note the old entry appears in a comment, which indicates it's no longer valid. The export cache now reports the correct uuid too:

[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=802f2dd1:03264b4e:9b7c591f:6de50438,sec=1)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=802f2dd1:03264b4e:9b7c591f:6de50438,sec=390003:390004:390005:1)

and it looks correct when viewed from the client's perspective:

[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID                              FSC
v4 7f000001  801 0:51 802f2dd103264b4e:9b7c591f6de50438 no

So now let's kill mountd and restart the nfs service altogether:

[root@baymax ~]# kill `pidof rpc.mountd`
[root@baymax ~]# systemctl restart nfs-server
[root@baymax ~]# stat -t /mnt/t/file
stat: cannot stat ‘/mnt/t/file’: Stale file handle

Once again we get a stale filehandle, so let's unmount, remount, and try again:

[root@baymax ~]# umount /mnt/t
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# stat -t /mnt/t/file
/mnt/t/file 29 8 81a4 0 0 33 268713708 1 0 0 1446652714 1458157861 1458157861 0 1048576 unconfined_u:object_r:unlabeled_t:s0

and now the fsids have reverted back to the way they started:

[root@baymax ~]# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
127.0.0.1 7 0x10033f330000000001fd0000000000000000000000000000 /export
# 127.0.0.1 7 0x10033f3300000000d12d2f804e4b26031f597c9b3804e56d
127.0.0.1 1 0x00000000 /
[root@baymax ~]# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/export 127.0.0.1(rw,insecure,no_root_squash,sync,wdelay,no_subtree_check,uuid=0000fd01:00000000:00000000:00000000,sec=1)
/ 127.0.0.1(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0000fd01:00000000:00000000:00000000,sec=390003:390004:390005:1)
[root@baymax ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV  FSID           FSC
v4 7f000001  801 0:51 fd0100000000:0 no
[root@baymax ~]#
What's happening here is that get_uuid_blkdev() isn't returning a uuid, so uuid_by_path() is falling back on the fsid value returned by statfs64():

static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
{
	...
	blkid_val = get_uuid_blkdev(path);
	...
	if (blkid_val && (type--) == 0)
		val = blkid_val;
	else if (fsid_val[0] && (type--) == 0)
		val = fsid_val;
	...
Digging from rpc.mountd into libblkid, the failure is occurring in:

get_uuid_blkdev
  blkid_get_dev
    blkid_verify
      fd = open(dev->bid_name, O_RDONLY|O_CLOEXEC);

That open fails with -EACCES.
From the kernel side, the failure is occurring in:

open
  do_sys_open
    do_filp_open
      path_openat
        may_open
          inode_permission
            __inode_permission
              security_inode_permission
                selinux_inode_permission
So selinux_inode_permission() was returning -EACCES, but I wasn't seeing any AVC violations in my audit log:

[root@baymax ~]# aureport -a -ts recent | grep mountd
(nothing)

After some reading, I found that SELinux has something called "don't audit" rules, which are for things that are 'expected' to fail... so I disabled the "don't audit" part and then I could see the AVC violations:

[root@baymax ~]# semodule -BD
[root@baymax ~]# systemctl restart nfs-server
[root@baymax ~]# mount 127.0.0.1:/export /mnt/t
[root@baymax ~]# aureport -a -ts recent | grep mountd
370. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1331
371. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1332
372. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1333
373. 04/12/2016 11:36:28 rpc.mountd system_u:system_r:nfsd_t:s0 0 blk_file read system_u:object_r:fixed_disk_device_t:s0 denied 1330
[root@baymax ~]#

Re-enabling the "don't audit" rules and querying the SELinux policy now that I have an idea of what to look for:

[root@baymax ~]# semodule -B
[root@baymax ~]# sesearch -D -s nfsd_t -t fixed_disk_device_t
Found 4 semantic av rules:
   dontaudit nfsd_t device_node : blk_file getattr ;
   dontaudit nfsd_t device_node : chr_file getattr ;
   dontaudit nfsd_t fixed_disk_device_t : blk_file { ioctl read getattr lock open } ;
   dontaudit nfsd_t fixed_disk_device_t : chr_file { ioctl read getattr lock open } ;
[root@baymax ~]# ls -Z /usr/sbin/rpc.mountd
system_u:object_r:nfsd_exec_t:s0 /usr/sbin/rpc.mountd
[root@baymax ~]# ls -Z /dev/dm-1
system_u:object_r:fixed_disk_device_t:s0 /dev/dm-1
[root@baymax ~]#

So mountd needs to be able to use libblkid in order to get the filesystem uuids... but it can't if SELinux is in enforcing mode.
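As a stopgap until the distribution policy is fixed, a small local policy module could cover the denials shown above. This is a sketch with a hypothetical module name (the proper fix belongs in the selinux-policy package, and granting nfsd_t raw read access to disk devices deserves a security review):

```
# local_nfsd_blkid.te -- hypothetical local module; the real fix belongs
# in the selinux-policy package
module local_nfsd_blkid 1.0;

require {
        type nfsd_t;
        type fixed_disk_device_t;
        class blk_file { ioctl read getattr lock open };
}

allow nfsd_t fixed_disk_device_t:blk_file { ioctl read getattr lock open };
```

This would be built and loaded with the usual checkmodule/semodule_package/semodule workflow (the same shape audit2allow would generate from the denials).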
If I temporarily disable SELinux (via setenforce 0) then the fsids consistently use the filesystem uuids.
What I don't understand is why the AVC violations don't occur when I run rpc.mountd directly from the command line... for example, looking at the security_inode_permission() call using systemtap when rpc.mountd is started as part of the nfs service:

Tue Apr 12 11:49:10 2016 EDT rpc.mountd security_inode_permission dm-1 mask 0x24 sid 347
Tue Apr 12 11:49:10 2016 EDT rpc.mountd avc_has_perm_noaudit ssid=347 tsid=72 tclass=11 requested=2 flags=0 avd={.allowed=4294936576, .auditallow=4294967295, .auditdeny=4294967295, .seqno=2167631605, .flags=4294967295}
Tue Apr 12 11:49:10 2016 EDT rpc.mountd security_inode_permission dm-1 ret -13

and when rpc.mountd is started directly from the command line:

Tue Apr 12 11:52:30 2016 EDT rpc.mountd security_inode_permission dm-1 mask 0x24 sid 2339
Tue Apr 12 11:52:30 2016 EDT rpc.mountd avc_has_perm_noaudit ssid=2339 tsid=72 tclass=11 requested=2 flags=0 avd={.allowed=4294936576, .auditallow=4294967295, .auditdeny=4294967295, .seqno=2167631605, .flags=4294967295}
Tue Apr 12 11:52:30 2016 EDT rpc.mountd security_inode_permission dm-1 ret 0

Note that the source sid (which comes from task_struct->cred->security->sid) is different between the good and bad cases. I don't know how to map that to an actual context string that you would see in the userspace SELinux tools. I tried dumping out the sidtable using systemtap but that didn't help... so that's where I'm stuck.
Nice work Scott! We definitely need to get to the bottom of this, as this is a very serious NFS server bug. Even if it turns out to be caused by some selinux or other issue, there's no way it should be possible to get non-deterministic behavior for the fsids of the same export. We periodically have customers report that NFS clients' mount points go stale with RHEL NFS servers. I know of one case in particular where the fsid changed, which we found out very late into the case. My guess is that this is happening and we're just not able to catch it, but I haven't searched current cases to see whether this bug might apply to any of them.
So, that sounds like 2 or 3 bugs:

- first priority is to fix the selinux priority so that nfsd can open the block devices it needs to.

- second, the inconsistent fsid is obviously a mountd bug: "Note the entry for /export says it's type 7 which is FSID_UUID16_INUM... but that sure doesn't look like an fsid to me."

And then, there's the question of how exactly mountd should be handling the failure. Sounds like it's trying to fall back to FSID_ENCODE_DEV, which sounds reasonable enough to me. But if the failure's only temporary and the result is ESTALEs then maybe we need to reconsider.

(I *think* the server should still be able to soldier on without ESTALEs in that case: even if it gives out two filehandles for a given object, it should still be able to look up the fsid from either one and get a reasonable answer from mountd. Maybe it's just the one inconsistent result (fsid type one thing, contents another) that's causing the ESTALEs.)
(In reply to J. Bruce Fields from comment #7)
> the selinux priority

Sorry, that "priority" should be "policy"!
(In reply to J. Bruce Fields from comment #7)
> (I *think* the server should still be able to soldier on without ESTALE's in
> that case: even if it gives out two filehandles for a given object, it
> should still be able to look up the fsid from either one and get a
> reasonable answer from mountd. Maybe it's just that one inconsistent
> results (fsid type one thing, contents another) that's causing the ESTALEs.)

Yeah, I imagine we end up with the ESTALEs when the host is booted and can't identify the UUID for the filesystem. So fixing this may not be such a problem for clients, as long as we allow mountd to still look up filehandles that have fsids based on the dev_t.

Again, though -- step one is to fix the selinux policy, so maybe clone this bug as a selinux policy bug for that?
This message is a reminder that Fedora 23 is nearing its end of life. Approximately four weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Can we have this ticket's version pushed to 24 or 25 even? I had this happen to me just a few days ago on F24.
I think this is fixed for both f24 and f25:

f24# mount 127.0.0.1:/home /mnt/tmp
f24# stat -t /mnt/tmp
/mnt/tmp 4096 8 41ed 0 0 30 2 5 0 0 1472824218 1468937353 1468937353 0 32768 system_u:object_r:home_root_t:s0
f24# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x8910082781670ff10000000000000000 /home
* 1 0x00000000 /

f25# mount 127.0.0.1:/home /mnt/tmp
f25# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x02fd0000000000000000000000000000 /home
* 1 0x00000000 /

Or am I missing something?
(In reply to Steve Dickson from comment #13)
> I think this is fixed for both f24 and f25

It's definitely still broken.

<snip/>

> f25# mount 127.0.0.1:/home /mnt/tmp
> f25# cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> * 6 0x02fd0000000000000000000000000000 /home

FSID type 6 is FSID_UUID16. That's definitely not a UUID in the FSID column though. Those are the major & minor numbers for /dev/dm-2 (253, 2).

> * 1 0x00000000 /
>
> Or am I missing something?

What happens if you do the following on that f25 machine right now:

# kill `pidof rpc.mountd`
# rpc.mountd
# stat -t /mnt/tmp/somefile    (stat an actual file, not the mountpoint itself)
(In reply to Scott Mayhew from comment #14)
> (In reply to Steve Dickson from comment #13)
> > I think this is fixed for both f24 and f25
>
> It's definitely still broken.
>
> <snip/>
>
> > f25# mount 127.0.0.1:/home /mnt/tmp
> > f25# cat /proc/net/rpc/nfsd.fh/content
> > #domain fsidtype fsid [path]
> > * 6 0x02fd0000000000000000000000000000 /home

True... so this is being caused by SELinux?

> FSID type 6 is FSID_UUID16. That's definitely not a UUID in the FSID column
> though. Those are the major & minor numbers for /dev/dm-2 (253, 2).
>
> > * 1 0x00000000 /
> >
> > Or am I missing something?
>
> What happens if you do the following on that f25 machine right now:
>
> # kill `pidof rpc.mountd`
> # rpc.mountd
> # stat -t /mnt/tmp/somefile (stat an actual file, not the mountpoint itself)

f25# kill `pidof rpc.mountd`
f25# rpc.mountd
f25# stat -t /mnt/tmp/foobar
stat: cannot stat '/mnt/tmp/foobar': Stale file handle

But... if I do a systemctl restart nfs:

f25# stat -t /mnt/tmp/foobar | tee /tmp/second
/mnt/tmp/foobar 0 0 81a4 3606 3606 30 25541 1 0 0 1480604755 1480604755 1480604755 0 262144 unconfined_u:object_r:home_root_t:s0

which is the exact same info returned when I did the stat -t before I killed rpc.mountd. So is it a bug that ESTALE is returned when restarting rpc.mountd but not the entire server? IDK...
(In reply to Steve Dickson from comment #15)
> (In reply to Scott Mayhew from comment #14)
> > (In reply to Steve Dickson from comment #13)
> > > I think this is fixed for both f24 and f25
> >
> > It's definitely still broken.
> >
> > <snip/>
> >
> > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > #domain fsidtype fsid [path]
> > > * 6 0x02fd0000000000000000000000000000 /home
>
> True... so this is being caused by SELinux?

I don't think so...

f25# setenforce 0
f25# mount 127.0.0.1:/home/tmp /mnt/tmp
f25# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
* 6 0x02fd0000000000000000000000000000 /home
* 1 0x00000000 /

because the same value is returned with selinux disabled.
(In reply to Steve Dickson from comment #16)
> (In reply to Steve Dickson from comment #15)
> > (In reply to Scott Mayhew from comment #14)
> > > (In reply to Steve Dickson from comment #13)
> > > > I think this is fixed for both f24 and f25
> > >
> > > It's definitely still broken.
> > >
> > > <snip/>
> > >
> > > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > > #domain fsidtype fsid [path]
> > > > * 6 0x02fd0000000000000000000000000000 /home
> >
> > True... so this is being caused by SELinux?
>
> I don't think so...
>
> f25# setenforce 0
> f25# mount 127.0.0.1:/home/tmp /mnt/tmp
> f25# cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> * 6 0x02fd0000000000000000000000000000 /home
> * 1 0x00000000 /
>
> because the same value is returned with selinux disabled.

The nfsd.fh cache entries have a really long lifetime, so chances are nfsd didn't even do another upcall to mountd. If you restart the nfs-server service at this point you should probably see the UUID-based FSIDs.
(In reply to Scott Mayhew from comment #17)
> (In reply to Steve Dickson from comment #16)
> > (In reply to Steve Dickson from comment #15)
> > > (In reply to Scott Mayhew from comment #14)
> > > > (In reply to Steve Dickson from comment #13)
> > > > > I think this is fixed for both f24 and f25
> > > >
> > > > It's definitely still broken.
> > > >
> > > > <snip/>
> > > >
> > > > > f25# mount 127.0.0.1:/home /mnt/tmp
> > > > > f25# cat /proc/net/rpc/nfsd.fh/content
> > > > > #domain fsidtype fsid [path]
> > > > > * 6 0x02fd0000000000000000000000000000 /home
> > >
> > > True... so this is being caused by SELinux?
> >
> > I don't think so...
> >
> > f25# setenforce 0
> > f25# mount 127.0.0.1:/home/tmp /mnt/tmp
> > f25# cat /proc/net/rpc/nfsd.fh/content
> > #domain fsidtype fsid [path]
> > * 6 0x02fd0000000000000000000000000000 /home
> > * 1 0x00000000 /
> >
> > because the same value is returned with selinux disabled.
>
> The nfsd.fh cache entries have a really long lifetime, so chances are nfsd
> didn't even do another upcall to mountd. If you restart the nfs-server
> service at this point you should probably see the UUID-based FSIDs.

I did... after the restart true UUIDs were being passed up from the kernel... So my question is why isn't this a kernel bug? Since the kernel is not passing up a true UUID... and what is rpc.mountd supposed to do with an invalid UUID? Fail the mount?
It doesn't sound like Dave's analysis in the description and comments 1-5 still applies. So, the top priority is to get the selinux policy fixed--who do we need for that?
(In reply to J. Bruce Fields from comment #19)
> It doesn't sound like

(Sorry, I meant "It sounds like...").
(In reply to J. Bruce Fields from comment #20)
> (In reply to J. Bruce Fields from comment #19)
> > It doesn't sound like
>
> (Sorry, I meant "It sounds like...").

(And, s/Dave/Scott. I think it's time for the weekend.)
I filed https://bugzilla.redhat.com/show_bug.cgi?id=1403017 for the selinux-policy.
*** This bug has been marked as a duplicate of bug 1403017 ***