847084 – nfsidmap failing to show id of some users when using LXDM greeter user list

Keywords:
Status:	CLOSED DUPLICATE of bug 876705
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	nfs-utils
Sub Component:
Version:	17
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-09 16:14 UTC by Josh Trutwin
Modified:	2012-11-15 08:56 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-11-08 16:17:25 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
sos report (1.23 MB, application/octet-stream) 2012-08-09 16:17 UTC, Josh Trutwin

Description Josh Trutwin 2012-08-09 16:14:26 UTC

Description of problem:


I'm running a fully patched Fedora 17 client using NIS and NFS mounts for home directories.  On some accounts when I run: "ls -al ~user1" the files are shown with proper ownership, on other accounts "ls -al ~user2" shows files owned by user ids such as 4294967294.  I am not sure if a recent update caused this problem or if it was always happening as this is a new system.


Version-Release number of selected component (if applicable):

nfs-utils-1.2.6-3.fc17.x86_64
kernel 3.4.6-2 and 3.5.0 both tested


How reproducible:


Certain users consistently exhibit the behavior and some do not - testing seems to show about a 60% failure rate. 


Steps to Reproduce:
1. ls -al ~f17test1 | head -5

 
Actual results:


# ls -al ~f17test1
total 1324
drwx------. 38 4294967294 student   4096 Aug  6 11:30 .
drwxr-xr-x. 25 root       root      4096 May 17 10:09 ..
drwx------.  4 4294967294 student   4096 Jul 26 13:38 .abrt
drwx------.  4 4294967294 student   4096 Jul 13 14:03 .adobe
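
Note on the value above: 4294967294 equals 2^32 - 2, which is -2 read as an unsigned 32-bit integer, so it is presumably the placeholder id shown when a name cannot be mapped rather than a real account. A minimal sketch for picking such entries out of `ls -l` style output follows; the function name `unmapped_entries` is made up for illustration, and it assumes the owner is the third column, as in the listing above.

```shell
# Print the last field (file name) of every `ls -l` style line whose owner
# column is 4294967294, i.e. 2^32 - 2.
# Assumption: the owner is the third whitespace-separated column.
unmapped_entries() {
    awk -v bad="$(( (1 << 32) - 2 ))" '$3 == bad { print $NF }'
}

# Example (not run here):
#   ls -al ~f17test1 | unmapped_entries
```

File names containing spaces would be truncated to their last word, so this is only suitable for a quick check.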


Expected results:


# ls -al ~f17test1 | head -5
total 1324
drwx------. 38 f17test1   student   4096 Aug  6 11:30 .
drwxr-xr-x. 25 root       root      4096 May 17 10:09 ..
drwx------.  4 f17test1   student   4096 Jul 26 13:38 .abrt
drwx------.  4 f17test1   student   4096 Jul 13 14:03 .adobe


Additional info:

NFS Server is fully patched RHEL 6.3.


# resolveip 10.184.11.40
Host name of 10.184.11.40 is f17test.csbsju.edu


# dnsdomainname
csbsju.edu




# grep -v '^#' /etc/idmapd.conf | uniq
[General]

Verbosity = 3
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = csbsju.edu

[Mapping]

Nobody-User = nfsnobody
Nobody-Group = nfsnobody

[Translation]

Method = nsswitch




# cat /etc/resolv.conf 
nameserver 10.185.10.25
nameserver 10.185.10.26
nameserver 10.185.10.27

domain csbsju.edu
search csbsju.edu computing.csbsju.edu physics.csbsju.edu cs.csbsju.edu math.csbsju.edu ad.csbsju.edu





# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6





# grep -v '^#' /etc/nsswitch.conf | uniq

passwd:     files nis
shadow:     files nis
group:      files nis

hosts:      files mdns4_minimal [NOTFOUND=return] dns

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files

netgroup:   files

publickey:  nisplus

automount:  files
aliases:    files nisplus









# cat /etc/request-key.d/id_resolver.conf 
#
# nfsidmap(5) - The NFS idmapper upcall program
# Summary: Used by NFSv4 to map user/group ids into 
#          user/group names and names into in ids
# Options:
# -v         Increases the verbosity of the output to syslog
# -t timeout Set the expiration timer, in seconds, on the key
#
create    id_resolver    *         *    /usr/sbin/nfsidmap -v %k %d





The debug output in /var/log/messages for a successful lookup:

Aug  9 10:48:23 f17test nfsidmap[997]: key: 0x119ec5b4 type: uid value: pmcarr timeout 600
Aug  9 10:48:23 f17test nfsidmap[997]: libnfsidmap: using domain: csbsju.edu
Aug  9 10:48:23 f17test nfsidmap[997]: libnfsidmap: Realms list: 'CSBSJU.EDU'
Aug  9 10:48:23 f17test nfsidmap[997]: libnfsidmap: processing 'Method' list
Aug  9 10:48:23 f17test nfsidmap[997]: libnfsidmap: loaded plugin /lib64/libnfsidmap/nsswitch.so for method nsswitch


When I run ls -al on a home directory where the lookup fails, nothing is recorded in /var/log/messages.  The nfsidmap messages appear only at boot, and only for a portion of my full list of users; no additional messages appear when I actually run the ls -al command, which I *think* isn't how it used to work.  Grepping the message log for a username that failed shows no matches.

Please let me know what additional information you need on this, thanks,

Josh

Comment 1 Josh Trutwin 2012-08-09 16:17:52 UTC

Created attachment 603302
sos report

SOS Report for Fedora 17 client.

Comment 2 Josh Trutwin 2012-08-09 16:18:50 UTC

Forgot to mention that the existing RHEL 6.3 NFS server is showing no idmap issues with our 80+ Fedora 14 clients.  It's only the Fedora 17 system that is having problems right now.

Comment 3 Josh Trutwin 2012-08-09 17:21:49 UTC

Ok wow so I just figured out what the culprit is here.  It's lxdm.  I switched from GDM to LXDM so I could have XFCE be the default window manager (another separate Fedora bug if you ask me) and when LXDM displays the username chooser on boot up it runs a large number of nfsidmap lookups - but apparently not enough.  By disabling the userlist in /etc/lxdm/lxdm.conf and rebooting all my nfsidmap lookups are working normally.

Still a bug somewhere - but easier to replicate hopefully:

1. setup an nfs4 / nis environment
2. install lxdm
3. set lxdm as the display manager via /etc/sysconfig/desktop or other means
4. reboot

When lxdm displays the username chooser for users who have previously logged into system, check for bad nfsidmaps in home directories, etc.  

Change in /etc/lxdm/lxdm.conf:

[userlist]
## if disable the user list control at greeter
disable=1

Reboot

nfsidmap lookups should behave normally.

Thanks,

Josh

Comment 4 J. Bruce Fields 2012-08-09 22:38:47 UTC

Huh.  I wonder what lxdm does that's triggering the problem.

Would it be possible to get a network trace showing the nfs traffic while this is going on?

So, at step 4, start a tcpdump:

  tcpdump -s0 -wtmp.pcap 'host myclient && host myserver'

Then kill tcpdump after the problem reproduces.

You can attach the resulting tmp.pcap and/or look at it yourself in wireshark.  What we're looking for is replies to GETATTR calls which request the OWNER or OWNER_GROUP attributes.  They should all look like name@domain.  If they all look right, then that confirms the problem is with the client-side idmapping.

You should also be able to add the "-v" (or "-vvvv" if necessary) option to the nfsidmap commandline in /etc/request-key.conf to get some more debugging.

Comment 5 Josh Trutwin 2012-08-16 19:32:56 UTC

Sorry for delayed reply - been working on this image for a while and haven't been able to get to your request.

I have a question though - the problem happens during boot up when the initial LXDM screen is shown - I assume I'd need to put this tcpdump command into an RC script somewhere?

Alas, I had a user report that he was seeing these funky uid/gid's on the system on some files despite the fix I put in so there may be more going on here...

Thanks,

Josh

Comment 6 J. Bruce Fields 2012-08-16 19:38:49 UTC

(In reply to comment #5)
> Sorry for delayed reply - been working on this image for a while and haven't
> been able to get to your request.
> 
> I have a question though - the problem happens during boot up when the
> initial LXDM screen is shown - I assume I'd need to put this tcpdump command
> into an RC script somewhere?

Hm, could be, I'm not sure where exactly to suggest.  Note that will produce huge amounts of data (it writes all network traffic to tmp.pcap), so it's not something you want running all the time.  Best would be if you can start tcpdump and then LXDM by hand, and then stop tcpdump as soon as you've seen the problem reproduced.

> Alas, I had a user report that he was seeing these funky uid/gid's on the
> system on some files despite the fix I put in so there may be more going on
> here...
> 
> Thanks,
> 
> Josh

Comment 7 Steve Dickson 2012-10-14 02:41:11 UTC

I wonder if this is a dup of bz 829362

Comment 8 Maurizio Paolini 2012-11-05 15:32:11 UTC

We have a similar problem with a fedora 17 client (nfs-utils-1.2.6-3)
and a fedora 16 nfs server.

It seems that the problem appears for a few users when a large number of
nfs id lookups are requested in a short time, e.g. when on the client I run

# ls -l /home/misc/dmfmail/

This path contains an NFS-mounted folder with the inboxes of all users (approx. 200).
For a few users a uid of 4294967294 is shown.

Apparently for *exactly* those users there isn't an entry in /proc/keys, whereas
for all the others I see an entry in /proc/keys

I hope this helps.  I also tried the nfs-utils-1.2.6-5 patch in the updates-testing repository which did not solve the problem.

Comment 9 Maurizio Paolini 2012-11-06 17:12:06 UTC

There are two facts that seem strange to me:

1. A typical entry of /proc/keys in a Fedora 16 looks like:

1e43f4d4 I--Q--- 1  8m 3f010000   0   0 id_legacy uid:user.unicatt.it: 4

with an expiry that is less than 10m (consistent with the default 600 seconds
expiration time)

whereas on a Fedora 17 I see:

3fad2d7c I--Q--- 1 perm 3f010000 0 0 id_resolv uid:user.unicatt.it: 4

which seems to be *permanent*


2. cat /proc/key-users gives

$ cat /proc/key-users 
    0:   204 203/203 199/200 6571/20000
[...]

which seems to imply that there is a maximum of 200 keys that can be allocated.
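
The two observations above can be checked with a short sketch. It assumes the documented /proc/key-users layout (`<uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>`) and the /proc/keys column order seen in the sample lines (expiry in column 4, key type in column 8). The function names and the 90 percent default threshold are made up for illustration.

```shell
# Report users whose key count has reached a given percentage of their quota.
# Reads /proc/key-users style lines on stdin; the threshold defaults to 90.
key_quota_report() {
    awk -v pct="${1:-90}" '{
        uid = $1; sub(/:$/, "", uid)          # "0:" -> "0"
        split($4, k, "/")                     # "199/200" -> used, max
        used = k[1] + 0; max = k[2] + 0
        if (max > 0 && used * 100 >= max * pct)
            printf "uid %s: %d of %d keys in use\n", uid, used, max
    }'
}

# Count id mapper keys (type starting with "id_") by whether they expire.
# Reads /proc/keys style lines on stdin.
count_idmap_keys() {
    awk '$8 ~ /^id_/ { if ($4 == "perm") p++; else e++ }
         END { printf "permanent=%d expiring=%d\n", p, e }'
}

# Examples (not run here):
#   key_quota_report 90 < /proc/key-users
#   count_idmap_keys < /proc/keys
```

With the line quoted above, uid 0 is at 199 of 200 keys, which fits the idea that further id mapper keys could not be created once the quota was reached.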

Comment 10 Luca Giuzzi 2012-11-08 10:20:13 UTC

It might be possible to try and increase the number of available keys:

echo 10000 > /proc/sys/kernel/keys/maxkeys
echo 10000 > /proc/sys/kernel/keys/root_maxkeys

Does this solve anything?
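
The echo commands above only last until the next reboot. A minimal sketch of a persistent variant follows, assuming the system reads drop-in files from /etc/sysctl.d; the function name `keys_sysctl_conf` and the file name in the usage comment are hypothetical. The sysctl names are the standard dotted form of the /proc/sys paths used above.

```shell
# Emit a sysctl fragment that raises both key-count quotas to the given value
# (default 10000, the value suggested above).
keys_sysctl_conf() {
    limit="${1:-10000}"
    printf 'kernel.keys.maxkeys = %s\n' "$limit"
    printf 'kernel.keys.root_maxkeys = %s\n' "$limit"
}

# Hypothetical usage, run as root (the file name is an assumption):
#   keys_sysctl_conf 10000 > /etc/sysctl.d/90-nfs-idmap-keys.conf
#   sysctl -p /etc/sysctl.d/90-nfs-idmap-keys.conf
```

Generating the fragment separately from applying it makes it possible to review the values before they are loaded.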

Comment 11 Maurizio Paolini 2012-11-08 11:55:24 UTC

This solves my problem! Indeed we had just about 200 users which clearly
caused the creation of a number of keys larger than the allowed maximum.

However I think that this should only be considered a workaround, and that
keys of this kind should *not* count towards the quota.

Moreover they perhaps should not be "permanent", since it is possible that
the mapping will change on the nfs server when users are removed/created.

Comment 12 Maurizio Paolini 2012-11-08 19:15:08 UTC

Why *not a bug*? It seems to me that there *is* something wrong, if not with
the software, then with some default setting... How is a system administrator
supposed to realize that this strange behaviour of nfs is due to some too small
kernel quota somewhere?

Comment 13 Josh Trutwin 2012-11-14 17:59:31 UTC

I agree with Maurizio...  At a minimum something should be logged about the key limit being exceeded.

Comment 14 Steve Dickson 2012-11-15 08:56:05 UTC

(In reply to comment #12)
> Why *not a bug*? It seems to me that there *is* something wrong, if not with
> the software, then with some default setting... How is a system administrator
> supposed to realize that this strange behaviour of nfs is due to some too
> small kernel quota somewhere?
I closed it as not a bug in the actual id mapping code. I agree the maxkeys/
root_maxkeys are set too small and will deal with it in bz876705

*** This bug has been marked as a duplicate of bug 876705 ***
