Description of problem:
I'm running a fully patched Fedora 17 client that uses NIS, with home directories on NFS mounts. On some accounts, "ls -al ~user1" shows the files with the proper ownership; on others, "ls -al ~user2" shows files owned by numeric user IDs such as 4294967294. This is a new system, so I am not sure whether a recent update caused this problem or whether it was always happening.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.6-3.fc17.x86_64
kernel 3.4.6-2 and 3.5.0 (both tested)

How reproducible:
Certain users consistently exhibit the behavior and others do not; testing seems to show about a 60% failure rate.

Steps to Reproduce:
1. ls -al ~f17test1 | head -5

Actual results:
# ls -al ~f17test1
total 1324
drwx------. 38 4294967294 student 4096 Aug 6 11:30 .
drwxr-xr-x. 25 root root 4096 May 17 10:09 ..
drwx------. 4 4294967294 student 4096 Jul 26 13:38 .abrt
drwx------. 4 4294967294 student 4096 Jul 13 14:03 .adobe

Expected results:
# ls -al ~f17test1 | head -5
total 1324
drwx------. 38 f17test1 student 4096 Aug 6 11:30 .
drwxr-xr-x. 25 root root 4096 May 17 10:09 ..
drwx------. 4 f17test1 student 4096 Jul 26 13:38 .abrt
drwx------. 4 f17test1 student 4096 Jul 13 14:03 .adobe

Additional info:
The NFS server is a fully patched RHEL 6.3 system.
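In the listing above, the client shows 4294967294 as the owner when it cannot map one. That number is -2 reinterpreted as an unsigned 32-bit integer (2^32 - 2). Below is a minimal Python sketch for listing the affected entries in one directory. It assumes this value is the only "unmapped" marker of interest, and the helper name unmapped_entries is hypothetical (it is not part of nfs-utils):

```python
import os

# -2 as an unsigned 32-bit value, i.e. 2**32 - 2 == 4294967294.
UNMAPPED_UID = (-2) & 0xFFFFFFFF


def unmapped_entries(directory, bad_uid=UNMAPPED_UID):
    """Return sorted names in `directory` (plus "." for the directory itself)
    whose owner uid equals `bad_uid`. Symlinks are judged by their own
    ownership (follow_symlinks=False), like the first columns of ls -al."""
    hits = []
    if os.lstat(directory).st_uid == bad_uid:
        hits.append(".")
    with os.scandir(directory) as it:
        for entry in it:
            if entry.stat(follow_symlinks=False).st_uid == bad_uid:
                hits.append(entry.name)
    return sorted(hits)
```

For example, on the reporter's client, unmapped_entries(os.path.expanduser("~f17test1")) would be expected to include ".", ".abrt" and ".adobe".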
# resolveip 10.184.11.40
Host name of 10.184.11.40 is f17test.csbsju.edu
# dnsdomainname
csbsju.edu
# grep -v '^#' /etc/idmapd.conf | uniq
[General]
Verbosity = 3
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = csbsju.edu
[Mapping]
Nobody-User = nfsnobody
Nobody-Group = nfsnobody
[Translation]
Method = nsswitch
# cat /etc/resolv.conf
nameserver 10.185.10.25
nameserver 10.185.10.26
nameserver 10.185.10.27
domain csbsju.edu
search csbsju.edu computing.csbsju.edu physics.csbsju.edu cs.csbsju.edu math.csbsju.edu ad.csbsju.edu
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# grep -v '^#' /etc/nsswitch.conf | uniq
passwd: files nis
shadow: files nis
group: files nis
hosts: files mdns4_minimal [NOTFOUND=return] dns
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: files
publickey: nisplus
automount: files
aliases: files nisplus
# cat /etc/request-key.d/id_resolver.conf
#
# nfsidmap(5) - The NFS idmapper upcall program
# Summary: Used by NFSv4 to map user/group ids into
# user/group names and names into in ids
# Options:
# -v Increases the verbosity of the output to syslog
# -t timeout Set the expiration timer, in seconds, on the key
#
create id_resolver * * /usr/sbin/nfsidmap -v %k %d

The debug output in /var/log/messages for a successful lookup:

Aug 9 10:48:23 f17test nfsidmap[997]: key: 0x119ec5b4 type: uid value: pmcarr timeout 600
Aug 9 10:48:23 f17test nfsidmap[997]: libnfsidmap: using domain: csbsju.edu
Aug 9 10:48:23 f17test nfsidmap[997]: libnfsidmap: Realms list: 'CSBSJU.EDU'
Aug 9 10:48:23 f17test nfsidmap[997]: libnfsidmap: processing 'Method' list
Aug 9 10:48:23 f17test nfsidmap[997]: libnfsidmap: loaded plugin /lib64/libnfsidmap/nsswitch.so for method nsswitch

When I run ls -al on a home directory where the lookup fails, nothing is recorded in /var/log/messages. The nfsidmap entries in /var/log/messages appear only at boot, and only for a portion of my full list of users. No additional messages appear when I actually run the ls -al command, which I *think* isn't how it used to work. Grepping the message log for the username that failed shows no matches.

Please let me know what additional information you need on this.

Thanks,
Josh
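In the output above, the idmapd.conf Domain (csbsju.edu) agrees with dnsdomainname. NFSv4 sends owners as name@domain, so a domain mismatch is a common first thing to rule out. Below is a minimal Python sketch of that check. It assumes the INI-style idmapd.conf layout shown above, and the function names are hypothetical. On this client the check passes, which points away from a domain mismatch rather than explaining the bug:

```python
import configparser


def idmapd_domain(conf_text):
    """Return the [General] Domain value from idmapd.conf text, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Option names are case-insensitive in configparser, so "Domain" matches.
    return parser.get("General", "Domain", fallback=None)


def dns_domain(fqdn):
    """Everything after the first dot of a fully qualified host name, or None."""
    _, _, domain = fqdn.partition(".")
    return domain or None


def domains_match(conf_text, fqdn):
    """True if the idmapd Domain equals the host's DNS domain (DNS names are
    case-insensitive, so compare lowercased)."""
    conf_domain = idmapd_domain(conf_text)
    host_domain = dns_domain(fqdn)
    return (conf_domain is not None and host_domain is not None
            and conf_domain.strip().lower() == host_domain.lower())
```

Usage, assuming it runs on the client itself: domains_match(open("/etc/idmapd.conf").read(), socket.getfqdn()).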
Created attachment 603302
sos report

SOS Report for Fedora 17 client.
Forgot to mention that the existing RHEL 6.3 NFS server is showing no idmap issues with our 80+ Fedora 14 clients. It's only the Fedora 17 system that is having problems right now.
OK, wow, I just figured out what the culprit is here: it's LXDM. I switched from GDM to LXDM so that XFCE would be the default window manager (another separate Fedora bug, if you ask me). When LXDM displays the username chooser at boot, it runs a large number of nfsidmap lookups, but apparently not enough. After disabling the user list in /etc/lxdm/lxdm.conf and rebooting, all my nfsidmap lookups work normally.

It's still a bug somewhere, but hopefully easier to replicate:

1. Set up an NFSv4/NIS environment.
2. Install lxdm.
3. Set lxdm as the display manager (edit /etc/sysconfig/desktop, or by other means).
4. Reboot.

When lxdm displays the username chooser for users who have previously logged into the system, check for bad nfsidmaps on home directories, etc.

Change in /etc/lxdm/lxdm.conf:

[userlist]
## if disable the user list control at greeter
disable=1

Reboot, and nfsidmap lookups should behave normally.

Thanks,
Josh
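After editing /etc/lxdm/lxdm.conf, a small read-only Python check can confirm that the workaround above is in effect. This sketch assumes lxdm.conf stays INI-style, as in the snippet above. userlist_disabled is a hypothetical helper, and the file is never rewritten:

```python
import configparser


def userlist_disabled(conf_text):
    """True if lxdm.conf text sets [userlist] disable to a true value (e.g. 1).
    A missing section or option counts as "not disabled"."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(conf_text)  # "##" lines are treated as comments
    return parser.getboolean("userlist", "disable", fallback=False)
```

Usage: userlist_disabled(open("/etc/lxdm/lxdm.conf").read()) should return True once disable=1 is set.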
Huh. I wonder what lxdm does that's triggering the problem.

Would it be possible to get a network trace showing the NFS traffic while this is going on? At step 4, start a tcpdump:

tcpdump -s0 -wtmp.pcap 'host myclient && host myserver'

Then kill tcpdump after the problem reproduces. You can attach the resulting tmp.pcap and/or look at it yourself in Wireshark. What we're looking for is replies to GETATTR calls that request the OWNER or OWNER_GROUP attributes. They should all look like name@domain. If they all look right, that confirms the problem is with the client-side idmapping.

You should also be able to add the "-v" option (or "-vvvv" if necessary) to the nfsidmap command line in /etc/request-key.conf (on this client, /etc/request-key.d/id_resolver.conf) to get more debugging output.
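The check described above is that every OWNER/OWNER_GROUP reply should look like name@domain. It can be written down as a small Python filter. This sketch assumes the owner strings have already been copied out of Wireshark by hand; it does not parse the pcap. The function names are hypothetical:

```python
def owner_ok(owner, domain):
    """True if an OWNER/OWNER_GROUP string has the form name@domain,
    with a non-empty name and the expected domain (case-insensitive)."""
    name, sep, dom = owner.rpartition("@")
    return bool(sep) and bool(name) and dom.lower() == domain.lower()


def bad_owners(owners, domain):
    """Owner strings that do not look like name@domain, in input order."""
    return [o for o in owners if not owner_ok(o, domain)]
```

If bad_owners(strings_from_trace, "csbsju.edu") comes back empty, the server side looks right and suspicion shifts to client-side idmapping, as described above.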
Sorry for delayed reply - been working on this image for a while and haven't been able to get to your request.

I have a question though - the problem happens during boot up when the initial LXDM screen is shown - I assume I'd need to put this tcpdump command into an RC script somewhere?

Alas, I had a user report that he was seeing these funky uid/gid's on the system on some files despite the fix I put in so there may be more going on here...

Thanks,

Josh
(In reply to comment #5)
> Sorry for delayed reply - been working on this image for a while and haven't
> been able to get to your request.
>
> I have a question though - the problem happens during boot up when the
> initial LXDM screen is shown - I assume I'd need to put this tcpdump command
> into an RC script somewhere?

Hm, could be; I'm not sure exactly where to suggest putting it. Note that this will produce huge amounts of data (it writes all traffic between the client and the server to tmp.pcap), so it's not something you want running all the time. Best would be if you can start tcpdump and then LXDM by hand, and then stop tcpdump as soon as you've seen the problem reproduce.

> Alas, I had a user report that he was seeing these funky uid/gid's on the
> system on some files despite the fix I put in so there may be more going on
> here...
>
> Thanks,
>
> Josh
I wonder if this is a dup of bz 829362
We have a similar problem with a Fedora 17 client (nfs-utils-1.2.6-3) and a Fedora 16 NFS server. The problem seems to appear for a few users when a large number of NFS ID lookups is requested in a short time, e.g. when on the client I run:

# ls -l /home/misc/dmfmail/

This path contains an NFS-mounted folder with the inboxes of all users (approx. 200). For a few users a uid of 4294967294 is shown. Apparently, for *exactly* those users there is no entry in /proc/keys, whereas for all the others there is an entry in /proc/keys.

I hope this helps. I also tried the nfs-utils-1.2.6-5 update in the updates-testing repository, which did not solve the problem.
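The observation above is that the failing users are exactly the ones with no key. It can be checked mechanically. Below is a minimal Python sketch with these assumptions:

* /proc/keys columns are serial, flags, usage, expiry, permissions, uid, gid, type, then "description: summary".
* The kernel truncates type names to nine characters, so id_resolver appears as id_resolv.
* You know the owner strings you expect to see after "uid:".

The helper names are hypothetical:

```python
# id_resolver shows up truncated as "id_resolv"; id_legacy is the older mapper.
RESOLVER_TYPES = {"id_resolv", "id_legacy"}


def parse_proc_keys_line(line):
    """Split one /proc/keys line into a dict; None for malformed lines."""
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None
    serial, _flags, _usage, expiry, _perm, _uid, _gid, ktype, rest = parts
    # rest is "description: summary"; drop the trailing summary (e.g. ": 4").
    description = rest.rsplit(": ", 1)[0]
    return {"serial": serial, "expiry": expiry, "type": ktype,
            "description": description}


def mapped_uid_names(proc_keys_text):
    """Set of owner strings that have a uid:<owner> id-mapper key."""
    names = set()
    for line in proc_keys_text.splitlines():
        key = parse_proc_keys_line(line)
        if key and key["type"] in RESOLVER_TYPES and key["description"].startswith("uid:"):
            names.add(key["description"][len("uid:"):])
    return names


def missing_keys(expected_owners, proc_keys_text):
    """Expected owner strings with no uid: key, in input order."""
    have = mapped_uid_names(proc_keys_text)
    return [o for o in expected_owners if o not in have]
```

Usage, run as root so that the id mapper's keys are visible: missing_keys(expected, open("/proc/keys").read()).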
There are two facts that seem strange to me:

1. A typical entry of /proc/keys on Fedora 16 looks like:

1e43f4d4 I--Q--- 1 8m 3f010000 0 0 id_legacy uid:user.unicatt.it: 4

with an expiry of less than 10m (consistent with the default expiration time of 600 seconds), whereas on Fedora 17 I see:

3fad2d7c I--Q--- 1 perm 3f010000 0 0 id_resolv uid:user.unicatt.it: 4

which seems to be *permanent*.

2. cat /proc/key-users gives:

$ cat /proc/key-users
0: 204 203/203 199/200 6571/20000
[...]

which seems to imply that a maximum of 200 keys can be allocated.
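For each uid, /proc/key-users prints five fields:

* the structure usage count
* nkeys/nikeys (total and instantiated keys)
* the key-count quota (keys charged/maximum)
* the byte quota (bytes charged/maximum)

So "199/200" above means root's key quota is almost exhausted. Below is a minimal Python sketch that parses the file and flags users near either limit. The 90% threshold and the names are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class KeyQuota:
    uid: int
    keys: int       # keys charged against the quota
    max_keys: int
    nbytes: int     # bytes charged against the quota
    max_bytes: int

    def near_limit(self, fraction=0.9):
        """True if either the key count or the byte count is at or above
        `fraction` of its maximum."""
        return (self.keys >= fraction * self.max_keys
                or self.nbytes >= fraction * self.max_bytes)


def parse_key_users(text):
    """Parse /proc/key-users lines: '<uid>: <usage> <n>/<ni> <keys>/<max> <bytes>/<max>'."""
    quotas = []
    for line in text.splitlines():
        if not line.strip():
            continue
        uid_field, rest = line.split(":", 1)
        _usage, _counts, key_q, byte_q = rest.split()
        keys, max_keys = (int(x) for x in key_q.split("/"))
        nbytes, max_bytes = (int(x) for x in byte_q.split("/"))
        quotas.append(KeyQuota(int(uid_field), keys, max_keys, nbytes, max_bytes))
    return quotas
```

Usage: [q for q in parse_key_users(open("/proc/key-users").read()) if q.near_limit()] would list the uid 0 line from the output above.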
It might be worth trying to increase the number of available keys:

echo 10000 > /proc/sys/kernel/keys/maxkeys
echo 10000 > /proc/sys/kernel/keys/root_maxkeys

Does this solve anything?
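The commands above raise only the key-count quota. The byte quota still applies: the /proc/key-users output quoted earlier shows 6571/20000 for root. Assume each additional key costs about as many bytes as the keys already charged (6571 / 199, about 33 bytes). A rough estimate then suggests that about 600 keys fit before the byte limit takes over. That is enough for ~200 users, but not far beyond. If that limit is reached, /proc/sys/kernel/keys/root_maxbytes would be the next setting to look at. keys_that_fit is a hypothetical helper:

```python
def keys_that_fit(max_keys, max_bytes, used_keys, used_bytes):
    """Estimate how many keys fit under both quotas, assuming every key costs
    the average number of bytes of the keys already charged."""
    if used_keys == 0 or used_bytes == 0:
        return max_keys  # no data to estimate a per-key size from
    # max_bytes / (used_bytes / used_keys), in integer arithmetic
    by_bytes = max_bytes * used_keys // used_bytes
    return min(max_keys, by_bytes)
```

With the numbers from this thread, keys_that_fit(200, 20000, 199, 6571) is 200 (the key count is the limit), while keys_that_fit(10000, 20000, 199, 6571) is 605 (the byte quota becomes the limit).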
This solves my problem! Indeed, we had just about 200 users, which clearly caused the creation of more keys than the allowed maximum.

However, I think this should only be considered a workaround, and this kind of key should *not* count against the quota. Moreover, perhaps these keys should not be "permanent", since the mapping may change on the NFS server when users are removed or created.
Why *not a bug*? It seems to me that there *is* something wrong, if not with the software, then with some default setting... How is a system administrator supposed to realize that this strange behaviour of nfs is due to some too small kernel quota somewhere?
I agree with Maurizio... At a minimum something should be logged about the key limit being exceeded.
(In reply to comment #12)
> Why *not a bug*? It seems to me that there *is* something wrong, if not with
> the software, then with some default setting... How is a system administrator
> supposed to realize that this strange behaviour of nfs is due to some too
> small kernel quota somewhere?

I closed it as not a bug in the actual ID mapping code. I agree the maxkeys/root_maxkeys defaults are set too small, and I will deal with that in bz876705.

*** This bug has been marked as a duplicate of bug 876705 ***