Description of problem:
I'm running a fully patched Fedora 17 client using NIS and NFS mounts for home directories. On some accounts when I run: "ls -al ~user1" the files are shown with proper ownership, on other accounts "ls -al ~user2" shows files owned by user ids such as 4294967294. I am not sure if a recent update caused this problem or if it was always happening, as this is a new system.
Version-Release number of selected component (if applicable):
kernel 3.4.6-2 and 3.5.0 both tested
Certain users consistently exhibit the behavior and some do not - testing seems to show about a 60% failure rate.
Steps to Reproduce:
1. ls -al ~f17test1 | head -5
# ls -al ~f17test1
drwx------. 38 4294967294 student 4096 Aug 6 11:30 .
drwxr-xr-x. 25 root root 4096 May 17 10:09 ..
drwx------. 4 4294967294 student 4096 Jul 26 13:38 .abrt
drwx------. 4 4294967294 student 4096 Jul 13 14:03 .adobe
# ls -al ~f17test1 | head -5
drwx------. 38 f17test1 student 4096 Aug 6 11:30 .
drwxr-xr-x. 25 root root 4096 May 17 10:09 ..
drwx------. 4 f17test1 student 4096 Jul 26 13:38 .abrt
drwx------. 4 f17test1 student 4096 Jul 13 14:03 .adobe
NFS Server is fully patched RHEL 6.3.
# resolveip 10.184.11.40
Host name of 10.184.11.40 is f17test.csbsju.edu
# grep -v '^#' /etc/idmapd.conf | uniq
Verbosity = 3
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = csbsju.edu
Nobody-User = nfsnobody
Nobody-Group = nfsnobody
Method = nsswitch
# cat /etc/resolv.conf
search csbsju.edu computing.csbsju.edu physics.csbsju.edu cs.csbsju.edu math.csbsju.edu ad.csbsju.edu
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# grep -v '^#' /etc/nsswitch.conf | uniq
passwd: files nis
shadow: files nis
group: files nis
hosts: files mdns4_minimal [NOTFOUND=return] dns
bootparams: nisplus [NOTFOUND=return] files
aliases: files nisplus
# cat /etc/request-key.d/id_resolver.conf
# nfsidmap(5) - The NFS idmapper upcall program
# Summary: Used by NFSv4 to map user/group ids into
# user/group names and names into in ids
# -v Increases the verbosity of the output to syslog
# -t timeout Set the expiration timer, in seconds, on the key
create id_resolver * * /usr/sbin/nfsidmap -v %k %d
The debug output in /var/log/messages for a successful lookup:
Aug 9 10:48:23 f17test nfsidmap: key: 0x119ec5b4 type: uid value: email@example.com timeout 600
Aug 9 10:48:23 f17test nfsidmap: libnfsidmap: using domain: csbsju.edu
Aug 9 10:48:23 f17test nfsidmap: libnfsidmap: Realms list: 'CSBSJU.EDU'
Aug 9 10:48:23 f17test nfsidmap: libnfsidmap: processing 'Method' list
Aug 9 10:48:23 f17test nfsidmap: libnfsidmap: loaded plugin /lib64/libnfsidmap/nsswitch.so for method nsswitch
When I run ls -al on a home directory where the lookup fails, nothing is recorded in /var/log/messages. The nfsidmap messages in /var/log/messages appear only at boot, and only for a portion of my full list of users; no additional messages appear when I actually run the ls -al command, which I *think* isn't how it used to work. Grepping the message log for a username that failed shows no matches.
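To compare the list of users against what nfsidmap actually logged, something like the following sketch could help. It assumes the log line layout shown in the debug output above; the function names are made up for illustration, and the optional "[pid]" part of the pattern is an assumption about how syslog may tag the program.

```python
import re

# Pattern modelled on the "nfsidmap: key: ... type: ... value: ... timeout ..."
# line quoted above. The optional "[pid]" suffix is an assumption.
KEY_LINE = re.compile(
    r"nfsidmap(?:\[\d+\])?: key: (?P<key>0x[0-9a-fA-F]+) "
    r"type: (?P<type>\w+) value: (?P<value>\S+) timeout (?P<timeout>\d+)"
)


def looked_up_names(log_lines):
    """Return the set of (type, value) pairs that nfsidmap logged."""
    seen = set()
    for line in log_lines:
        match = KEY_LINE.search(line)
        if match:
            seen.add((match.group("type"), match.group("value")))
    return seen


def missing_users(log_lines, expected_names):
    """Return the expected names that never show up in a logged uid lookup."""
    logged = {value for kind, value in looked_up_names(log_lines) if kind == "uid"}
    return sorted(set(expected_names) - logged)
```

Feeding it the lines of /var/log/messages and the expected name@domain strings should list the users for which no upcall was ever logged.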
Please let me know what additional information you need on this, thanks,
Created attachment 603302
SOS Report for Fedora 17 client.
Forgot to mention that the existing RHEL 6.3 NFS server is showing no idmap issues with our 80+ Fedora 14 clients. It's only the Fedora 17 system that is having problems right now.
Ok wow so I just figured out what the culprit is here. It's lxdm. I switched from GDM to LXDM so I could have XFCE be the default window manager (another separate Fedora bug if you ask me) and when LXDM displays the username chooser on boot up it runs a large number of nfsidmap lookups - but apparently not enough. By disabling the userlist in /etc/lxdm/lxdm.conf and rebooting all my nfsidmap lookups are working normally.
Still a bug somewhere - but easier to replicate hopefully:
1. setup an nfs4 / nis environment
2. install lxdm
3. set lxdm as the display manager via /etc/sysconfig/desktop or other means
When lxdm displays the username chooser for users who have previously logged into the system, check for bad nfsidmaps in home directories, etc.
Change in /etc/lxdm/lxdm.conf:
## if disable the user list control at greeter
nfsidmap lookups should behave normally.
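Checking for bad mappings by eye gets tedious with many home directories. A minimal sketch of an automated check follows; it assumes the unmapped owner always shows up as 4294967294 (which is 2**32 - 2, i.e. -2 stored in an unsigned 32-bit id), as in the listings above. The function name and the `bad_id` parameter are hypothetical.

```python
import os

# 4294967294 == 2**32 - 2, i.e. -2 stored in an unsigned 32-bit uid/gid.
UNMAPPED_ID = 2**32 - 2


def unmapped_entries(directory, bad_id=UNMAPPED_ID):
    """List entries directly under `directory` whose uid or gid equals bad_id."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        info = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow
        if info.st_uid == bad_id or info.st_gid == bad_id:
            hits.append(path)
    return hits
```

Running it over each NFS-mounted home directory should return an empty list when the lookups behave normally.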
Huh. I wonder what lxdm does that's triggering the problem.
Would it be possible to get a network trace showing the nfs traffic while this is going on?
So, at step 4, start a tcpdump:
tcpdump -s0 -wtmp.pcap 'host myclient && host myserver'
Then kill tcpdump after the problem reproduces.
You can attach the resulting tmp.pcap and/or look at it yourself in wireshark. What we're looking for is replies to GETATTR calls which request the OWNER or OWNER_GROUP attributes. They should all look like name@domain. If they all look right, then that confirms the problem is with the client-side idmapping.
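Once the owner strings have been read out of the GETATTR replies in wireshark, a small helper like this sketch could sort them into "looks right" and "looks wrong". The classification labels are invented for illustration, and comparing the domain case-insensitively against the idmapd.conf Domain value is an assumption.

```python
def classify_owner(owner, expected_domain):
    """Classify an OWNER / OWNER_GROUP string taken from a GETATTR reply."""
    if owner.isdigit():
        return "numeric"        # a bare id string rather than a name
    name, sep, domain = owner.rpartition("@")
    if not sep or not name or not domain:
        return "malformed"      # no usable name@domain structure
    if domain.lower() != expected_domain.lower():
        return "wrong-domain"   # assumption: domains compare case-insensitively
    return "ok"
```

For example, `classify_owner("f17test1@csbsju.edu", "csbsju.edu")` gives `"ok"`, while a string without an "@" is reported as `"malformed"`.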
You should also be able to add the "-v" (or "-vvvv" if necessary) option to the nfsidmap command line in /etc/request-key.conf to get some more debugging.
Sorry for delayed reply - been working on this image for a while and haven't been able to get to your request.
I have a question though - the problem happens during boot up when the initial LXDM screen is shown - I assume I'd need to put this tcpdump command into an RC script somewhere?
Alas, I had a user report that he was seeing these funky uid/gid's on the system on some files despite the fix I put in so there may be more going on here...
(In reply to comment #5)
> Sorry for delayed reply - been working on this image for a while and haven't
> been able to get to your request.
> I have a question though - the problem happens during boot up when the
> initial LXDM screen is shown - I assume I'd need to put this tcpdump command
> into an RC script somewhere?
Hm, could be, I'm not sure where exactly to suggest. Note that it will produce huge amounts of data (it writes all network traffic to tmp.pcap), so it's not something you want running all the time. Best would be if you can start tcpdump and then LXDM by hand, and then stop tcpdump as soon as you've seen the problem reproduced.
> Alas, I had a user report that he was seeing these funky uid/gid's on the
> system on some files despite the fix I put in so there may be more going on
I wonder if this is a dup of bz 829362
We have a similar problem with a fedora 17 client (nfs-utils-1.2.6-3)
and a fedora 16 nfs server.
It seems that the problem appears for a few users when a large number of
nfs id lookups are requested in a short time, e.g. when on the client I run:
# ls -l /home/misc/dmfmail/
This path contains a nfs mounted folder with the inbox of all users (approx 200).
For a few users an uid of 4294967294 is shown.
Apparently for *exactly* those users there isn't an entry in /proc/keys, whereas
for all the others I see an entry in /proc/keys
I hope this helps. I also tried the nfs-utils-1.2.6-5 patch in the updates-testing repository which did not solve the problem.
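The comparison between users and /proc/keys entries can be sketched as below. This assumes the column layout of the /proc/keys lines quoted further down in this comment (id, flags, usage, expiry, perm, uid, gid, type, description) and a description of the form "uid:name@domain: 4"; the function names are made up for illustration.

```python
def resolver_key_descriptions(proc_keys_text):
    """Return descriptions of id_resolver / id_legacy keys from /proc/keys text."""
    found = []
    for line in proc_keys_text.splitlines():
        # Assumed columns: id, flags, usage, expiry, perm, uid, gid, type, description
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        key_type, description = fields[7], fields[8]
        # The type column appears truncated ("id_resolv"), so match on the prefix.
        if key_type.startswith(("id_resolv", "id_legacy")):
            found.append(description.strip())
    return found


def users_without_key(proc_keys_text, names):
    """Return the names (name@domain) that have no uid key in /proc/keys."""
    keyed = set()
    for desc in resolver_key_descriptions(proc_keys_text):
        kind, _, rest = desc.partition(":")
        if kind == "uid":
            # Drop the trailing ": <n>" part of the description.
            keyed.add(rest.rsplit(":", 1)[0])
    return sorted(set(names) - keyed)
```

Passing the contents of /proc/keys and the full list of expected names should return exactly the users that show up as 4294967294, if the observation above holds.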
There are two facts that seem strange to me:
1. A typical entry of /proc/keys in a Fedora 16 looks like:
1e43f4d4 I--Q--- 1 8m 3f010000 0 0 id_legacy uid:firstname.lastname@example.org: 4
with an expiry that is less than 10m (consistent with the default 600 seconds),
whereas on a Fedora 17 I see:
3fad2d7c I--Q--- 1 perm 3f010000 0 0 id_resolv uid:email@example.com: 4
which seems to be *permanent*
2. cat /proc/key-users gives
$ cat /proc/key-users
0: 204 203/203 199/200 6571/20000
which seems to imply that there is a maximum of 200 keys that can be allocated.
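To make the reading of that line explicit, here is a minimal parsing sketch. The field meanings (reference count, keys/instantiated keys, key quota used/max, byte quota used/max) follow my reading of the kernel's key retention documentation and should be treated as an assumption; the class and function names are hypothetical.

```python
from typing import NamedTuple


class KeyUsage(NamedTuple):
    uid: int
    usage: int          # structure reference count
    nkeys: int          # total keys
    ninstantiated: int  # keys that are instantiated
    quota_keys: int     # keys counted against the quota
    max_keys: int       # key count limit
    quota_bytes: int    # bytes counted against the quota
    max_bytes: int      # byte limit


def parse_key_users_line(line):
    """Parse one line of /proc/key-users into a KeyUsage tuple."""
    uid_part, rest = line.split(":", 1)
    usage, keys, key_quota, byte_quota = rest.split()
    nkeys, ninst = (int(x) for x in keys.split("/"))
    qkeys, maxkeys = (int(x) for x in key_quota.split("/"))
    qbytes, maxbytes = (int(x) for x in byte_quota.split("/"))
    return KeyUsage(int(uid_part), int(usage), nkeys, ninst,
                    qkeys, maxkeys, qbytes, maxbytes)


def near_key_quota(entry, headroom=1):
    """True when at most `headroom` keys remain before the key count limit."""
    return entry.max_keys - entry.quota_keys <= headroom
```

Applied to the line above, this reports 199 of 200 quota keys in use for uid 0, so `near_key_quota` returns True.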
It might be possible to try and increase the number of available keys:
echo 10000 > /proc/sys/kernel/keys/maxkeys
echo 10000 > /proc/sys/kernel/keys/root_maxkeys
Does this solve anything?
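To confirm that the new limits took effect, the two files written above can be read back. A small sketch, with a hypothetical function name and a `base` parameter added only so it can be pointed at a different directory:

```python
import os


def read_key_limits(base="/proc/sys/kernel/keys"):
    """Read the current maxkeys and root_maxkeys values as integers."""
    limits = {}
    for name in ("maxkeys", "root_maxkeys"):
        with open(os.path.join(base, name)) as handle:
            limits[name] = int(handle.read().strip())
    return limits
```

After the two echo commands, both values should read 10000.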
This solves my problem! Indeed we had just about 200 users which clearly
caused the creation of a number of keys larger than the allowed maximum.
However I think that this should only be considered a workaround, and that
this kind of key should *not* contribute to the quota.
Moreover they perhaps should not be "permanent", since it is possible that
the mapping will change on the nfs server when users are removed/created.
Why *not a bug*? It seems to me that there *is* something wrong, if not with
the software, then with some default setting... How is a system administrator
supposed to realize that this strange behaviour of nfs is due to some too small
kernel quota somewhere?
I agree with Maurizio... At a minimum something should be logged about the key limit being exceeded.
(In reply to comment #12)
> Why *not a bug*? It seems to me that there *is* something wrong, if not with
> the software, then with some default setting... How is a system administrator
> supposed to realize that this strange behaviour of nfs is due to some too
> small kernel quota somewhere?
I closed it as not a bug in the actual id mapping code. I agree the maxkeys/
root_maxkeys are set too small and will deal with it in bz876705
*** This bug has been marked as a duplicate of bug 876705 ***