This was observed on ia64 Altix. Before submitting this, I personally confirmed it is still an issue in RHEL5 RC Snapshot3. This was reported by Michel, and what follows is a combination of his description and my edits.

RHEL5 RC Snapshot 3 systems do not cope with filesystems exported with the -nohide option. This is important because, on some of SGI's critical servers, the entire list of exported filesystems uses the nohide option. It's likely SGI customers will hit this as well.

This issue only happens when selinux is enabled. When we boot with selinux=0, the issue goes away and all works well.

Example showing the problem:

    [root@minime1 log]# ls /hosts/bonnie.engr.sgi.com/isms
    ls: /hosts/bonnie.engr.sgi.com/isms: Invalid argument

This log entry is produced:

    Dec 15 11:53:15 minime1 kernel: SELinux: initialized (dev 0:29, type nfs), uses genfs_contexts
    Dec 15 11:53:16 minime1 kernel: SELinux: security_context_to_sid(0) failed for (dev 0:2a, type nfs) errno=-22

To reproduce:

- server /etc/exports:

    /
    /mnt -nohide

- client:

    mount server:/ /server
    ls /server        works
    ls /server/mnt    fail

  Example:

    [root@raspberry michel]# ls /tmp/michel/mnt/tmp
    ls: /tmp/michel/mnt/tmp: Invalid argument

  Log:

    Dec 13 05:57:42 raspberry kernel: SELinux: security_context_to_sid(�) failed for (dev 0:18, type nfs) errno=-22

  but

    mount server:/mnt /server
    ls /server        works

More facts:

- I can't reproduce the failure on Sles10 servers.
- This is not related to automount/autofs, since this is reproducible using static mounts.
- This happens only when selinux is active.
- This means, at least in our research, that autofs is a victim.

Considering that the Linux exports man page states that nohide should only be used with care, the biggest issue is for customers with IRIX servers, where -nohide was used more often. Those customers will run into this problem more than customers who have an all-Linux shop.
More data. The 2.6.18-8.el5 kernel (release 5 (Tikanga)) is OK on x86_64 but not on ia64 with regard to this problem. Any thoughts? Note the x86_64 kernel is xen'ed.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I'm sorry that this just got to the top of my list today. The problem is on the client end, correct? I just tried with an ia64 client system running F7Test4 and it seemed to work, so maybe I need to go back to RHEL5.

    - I can't reproduce the failure on Sles10 servers.

I assume that "servers" in this case actually refers to nfs clients?

I'm looking at it, but I assume that the nfs_mount_data which is getting handed to vfs_kern_mount is corrupted in some way. I'll have to keep working backwards to find the reason for that.....
Some answers:
- observed with RHEL5 NFS client machines with selinux=1.
- not observed with SLES10 NFS client machines (they don't have selinux ;-)
My server /etc/exports has

    /level1 *
    /level1/level2 *(nohide)

On my client (RHEL5)

    [root@diablo2 kernel]# uname -a
    Linux diablo2.hahahahaha 2.6.18-8.el5 #1 SMP Fri Jan 26 14:16:09 EST 2007 ia64 ia64 ia64 GNU/Linux
    [root@diablo2 kernel]# mount $SERVER:/level1 /mnt/level1
    [root@diablo2 kernel]# ls /mnt/level1
    level1.file1 level1.file2 level1.file3 level1.file4 level1.file5 level2
    [root@diablo2 kernel]# ls /mnt/level1/level2
    level2.file1 level2.file2 level2.file3 level2.file4 level2.file5
    [root@diablo2 kernel]# dmesg
    SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
hmmmm, i just found

    The nohide option is currently only effective on single host exports. It does not work reliably with netgroup, subnet, or wildcard exports.

in the man page, let me play with that on my nfs server.....
hmmmm, my /etc/exports on the nfs server is:

    /level1 client.example.com
    /level1/level2 client.example.com(nohide)

and still no problems.....
Ok, I reproduced, I'll get to hunting. Sorry for the last 3 useless comments.

    [root@diablo2 ~]# dmesg -c
    SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
    [root@diablo2 ~]# !mount
    mount $SERVER:/level1 /mnt/level1
    [root@diablo2 ~]# dmesg
    SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
    [root@diablo2 ~]# ls /mnt/level1
    level1.file1 level1.file2 level1.file3 level1.file4 level1.file5 level2
    [root@diablo2 ~]# dmesg
    SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
    [root@diablo2 ~]# /mnt/level1/level2
    -bash: /mnt/level1/level2: Invalid argument
    [root@diablo2 ~]# dmesg
    SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
    SELinux: security_context_to_sid(@) failed for (dev 0:1a, type nfs) errno=-22
    SELinux: security_context_to_sid(@) failed for (dev 0:1a, type nfs) errno=-22
    SELinux: security_context_to_sid(@) failed for (dev 0:1a, type nfs) errno=-22
Thanks, Eric! :-)
    static struct vfsmount *nfs_do_submount(const struct vfsmount *mnt_parent,
                                            const struct dentry *dentry,
                                            struct nfs_fh *fh,
                                            struct nfs_fattr *fattr)
    {
            struct nfs_clone_mount mountdata = {
                    .sb = mnt_parent->mnt_sb,
                    .dentry = dentry,
                    .fh = fh,
                    .fattr = fattr,
            };
    [snip]
            mnt = nfs_do_clone_mount(NFS_SB(mnt_parent->mnt_sb), devname, &mountdata);

which yada yada yada mountdata gets passed down and down until we get to security/selinux/hooks.c:try_context_mount() where we get this little piece of code:

    if (sb->s_type->fs_flags & FS_BINARY_MOUNTDATA) {
            /* NFS we understand. */
            if (!strcmp(name, "nfs")) {
                    struct nfs_mount_data *d = data;

                    if (d->version < NFS_MOUNT_VERSION)
                            goto out;

                    if (d->context[0]) {
                            context = d->context;
                            seen |= Opt_context;
                    }

which means we cast a struct nfs_clone_mount into a struct nfs_mount_data. struct nfs_clone_mount doesn't have a ->context so now context is set to some random crap which i can only guess is 0 on x86_64 and non-zero on ia64. And so down below where we try to use context it doesn't work so well.

Now I see what's wrong, i'll have to think of a way to fix it.....
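To make the type confusion concrete, here is a small userspace sketch in C. It is only an illustration under stated assumptions: demo_mount_data and demo_clone_mount are hypothetical, heavily simplified stand-ins whose layouts do not match the real struct nfs_mount_data and struct nfs_clone_mount, and demo_context_seen is an invented name for the "is a context present?" test the hook performs.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nfs_mount_data (real one is much larger). */
struct demo_mount_data {
    int32_t version;
    char    hostname[12];
    char    context[16];
};

/* Hypothetical stand-in for struct nfs_clone_mount: four pointer-sized
 * members, modelled as 64-bit integers so the demo layout is fixed. */
struct demo_clone_mount {
    uint64_t sb;
    uint64_t dentry;
    uint64_t fh;
    uint64_t fattr;
};

/* In this demo layout, "context" and "fh" share the same offset. */
static_assert(offsetof(struct demo_mount_data, context) ==
              offsetof(struct demo_clone_mount, fh),
              "demo layout assumption");
static_assert(sizeof(struct demo_clone_mount) >= sizeof(struct demo_mount_data),
              "demo reads stay inside the clone structure");

/* Mimics the hook: treat an opaque blob as demo_mount_data and report
 * whether a context string appears to be set. memcpy is used instead of
 * a pointer cast so that the demo itself has defined behaviour. */
int demo_context_seen(const void *data)
{
    struct demo_mount_data d;

    memcpy(&d, data, sizeof(d));
    return d.context[0] != 0;
}
```

Passing a zeroed demo_mount_data returns 0, as intended. Passing a demo_clone_mount returns whatever the bytes at that offset happen to say: 0 if fh is zero, 1 if fh is, say, 0x4040404040404040. That dependence on unrelated bytes is the same kind of "works on one machine, fails on another" behaviour described above.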
Created attachment 154074 [details] possible fix for this issue, need to discuss with upstream nfs people.
Upstream NFS laughed at me. The upstream fix is going to take a VFS change, which I'm willing to bet will be a kABI breaker for RHEL5. I'll have to think a little about the best way to solve this problem. I'll post something really ugly and dirty before Wednesday for RHEL5 and see what internal people think.....
I posted a god-awful hack to the internal list for 5.1. We'll see what they say. I'm still working upstream for a long-term, supportable solution.
in 2.6.18-32.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html