Bug 797209 - NFS3 mounts hang on clients after getting "nfsd: Dropping request; may be revisited later" on server
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils
Version: 6.2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Petr Beňas
URL:
Whiteboard: NFS server
Duplicates: 798251
Depends On:
Blocks: 727267 856347
 
Reported: 2012-02-24 15:03 UTC by Stefan Walter
Modified: 2018-11-30 22:30 UTC
CC List: 14 users

Fixed In Version: nfs-utils-1.2.3-35.el6
Doc Type: Bug Fix
Doc Text:
Cause: A bug in rpc.mountd. Consequence: The client hangs. Fix: A number of upstream patches were applied. Result: The bug is fixed.
Clone Of:
Environment:
Last Closed: 2013-02-21 10:55:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
try resolving client as either ip address or list of "domains" (1.76 KB, patch)
2012-04-20 15:06 UTC, J. Bruce Fields
make use_ipaddr clients self-describing (4.46 KB, patch)
2012-05-02 02:06 UTC, J. Bruce Fields


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0468 0 normal SHIPPED_LIVE nfs-utils bug fix update 2013-02-20 21:07:08 UTC

Description Stefan Walter 2012-02-24 15:03:20 UTC
Description of problem:

We run a 3-node cluster of RHEL6.2 NFS servers with cluster suite in an
active-active-active setup. When we relocate one service to another node
the clients that have mounted a share from the service that was already running
on that node start to hang. When we enable debugging we see that after
the relocation nfsd starts to log this:

  nfsd: Dropping request; may be revisited later
  nfsd: Dropping request; may be revisited later

Version-Release number of selected component (if applicable):

  kernel-2.6.32-220.4.2.el6.x86_64
  nfs-utils-1.2.3-15.el6.x86_64

How reproducible:

We could not reproduce this on a simple NFS server just by adding and moving
exports or generating load. We believe this is something that is triggered
by the special logic in the cluster suite.

Additional info:

Searching the web for this error shows that the message seems to be
related to NFSv4. We have already disabled NFSv4 via '-N 4' in
/etc/sysconfig/nfs.
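
For reference, the relevant line in /etc/sysconfig/nfs looks like this
(assuming the stock RHEL 6 RPCNFSDARGS variable):

  # /etc/sysconfig/nfs -- disable NFSv4 by passing -N 4 to rpc.nfsd
  RPCNFSDARGS="-N 4"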

Comment 2 Stefan Walter 2012-02-27 15:13:12 UTC
After some testing I have now built a new kernel and added a dump_stack() in
svc_defer() net/sunrpc/svc_xprt.c. The result is the following:

Pid: 8051, comm: nfsd Not tainted 2.6.32-220.4.2.4.el6.x86_64 #1
Call Trace:
 [<ffffffffa050391e>] ? svc_defer+0x9e/0x160 [sunrpc]
 [<ffffffffa04fec55>] ? cache_check+0x135/0x360 [sunrpc]
 [<ffffffffa059e632>] ? exp_find_key+0x62/0xb0 [nfsd]
 [<ffffffffa059ee44>] ? exp_find+0x24/0x90 [nfsd]
 [<ffffffffa059f209>] ? rqst_exp_find+0x49/0xf0 [nfsd]
 [<ffffffffa0598cd6>] ? fh_verify+0x436/0x640 [nfsd]
 [<ffffffff814ec3c6>] ? printk+0x41/0x43
 [<ffffffffa05a3b1c>] ? nfsd3_proc_getattr+0x6c/0xe0 [nfsd]
 [<ffffffffa059543e>] ? nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa04f45a4>] ? svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffffa04f4be0>] ? svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa0595b62>] ? nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa0595aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090726>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090690>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
nfsd: Dropping request; may be revisited later
...
Pid: 8050, comm: nfsd Not tainted 2.6.32-220.4.2.4.el6.x86_64 #1
Call Trace:
 [<ffffffffa050391e>] ? svc_defer+0x9e/0x160 [sunrpc]
 [<ffffffffa04fec55>] ? cache_check+0x135/0x360 [sunrpc]
 [<ffffffffa059e632>] ? exp_find_key+0x62/0xb0 [nfsd]
 [<ffffffffa059ee44>] ? exp_find+0x24/0x90 [nfsd]
 [<ffffffffa059f209>] ? rqst_exp_find+0x49/0xf0 [nfsd]
 [<ffffffffa0598cd6>] ? fh_verify+0x436/0x640 [nfsd]
 [<ffffffffa05999b8>] ? nfsd_access+0x38/0x150 [nfsd]
 [<ffffffff814ec3c6>] ? printk+0x41/0x43
 [<ffffffffa05a3cc7>] ? nfsd3_proc_access+0x77/0xd0 [nfsd]
 [<ffffffffa059543e>] ? nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa04f45a4>] ? svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
 [<ffffffffa04f4be0>] ? svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa0595b62>] ? nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa0595aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81090726>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090690>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
nfsd: Dropping request; may be revisited later

It seems that the export cache is the culprit. I actually found an easy
reproducer that triggers the bug on a simple NFS server:

1. Create the directories /export/test1 and /export/test2 on a server
2. Create /etc/exports with the following line

  /export/test1 *(rw,async,no_root_squash,no_subtree_check,fsid=99999)

3. Start the nfs service
4. Mount /export/test1 on a workstation
5. On the workstation run something like this:

   cd {mountpoint of server:/export/test1}
   while (true); do date ; ls ; touch x ; sleep 1;  done

6. On the server run

   for i in `seq 1 1000`; do mkdir -p /export/test2/$i ; exportfs -o rw,async,no_root_squash,no_subtree_check,fsid=19898$i @$i:/export/test2/$i ; done

The task on the workstation almost immediately freezes because accesses
to server:/export/test1 hang. On the server a

  nfsd: Dropping request; may be revisited later

message appears in dmesg.

Comment 3 Stefan Walter 2012-02-28 12:52:19 UTC
I have filed BZ798251 against F17 because the same bug is also present there.

Comment 5 Stefan Walter 2012-03-01 09:42:44 UTC
After some more testing and experimenting I found the bug in rpc.mountd!

The following part in the function nfsd_fh() in utils/mountd/cache.c is
causing our problems:

        if (use_ipaddr) {
                if (ai == NULL) {
                        struct addrinfo *tmp;
                        tmp = host_pton(dom);
                        if (tmp == NULL)
                                goto out;
                        ai = client_resolve(tmp->ai_addr);
                        freeaddrinfo(tmp);
                }
                if (!client_check(exp->m_client, ai))
                        continue;
        }
        /* It's a match !! */

'use_ipaddr' on our cluster and in the reproducer is initially 0 and becomes 1
as more exports are added. The code in the if clause tries to parse 'dom' (the
export target) as an IP address. host_pton() cannot parse export targets that
are netmasks, netgroups or '*' and fails. If rpc.mountd is run with '-F -dall'
it will print

  mountd: nfsd_fh: found (nil) path (null)

The quick fix is to ensure that use_ipaddr always stays 0. The case when
it is 1 is apparently not well tested anyway. The right thing would be to
restructure the code above to fail more gracefully for non-IP exports.
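
As a rough standalone illustration (a demo I wrote, not nfs-utils code):
host_pton() essentially does a numeric-only address lookup, and a
numeric-only getaddrinfo() rejects exactly the same strings:

  /* standalone demo, not mountd code: numeric-only parsing, as
   * host_pton() does it, rejects anything that is not a literal IP */
  #include <stdio.h>
  #include <string.h>
  #include <netdb.h>

  static void try_parse(const char *dom)
  {
          struct addrinfo hints, *ai = NULL;
          memset(&hints, 0, sizeof(hints));
          hints.ai_flags = AI_NUMERICHOST;  /* literal addresses only */
          int rc = getaddrinfo(dom, NULL, &hints, &ai);
          printf("%-18s -> %s\n", dom, rc ? gai_strerror(rc) : "parsed");
          if (ai)
                  freeaddrinfo(ai);
  }

  int main(void)
  {
          try_parse("129.132.10.7");    /* literal IP: parses */
          try_parse("*");               /* wildcard: fails */
          try_parse("@netgroup");       /* netgroup: fails */
          try_parse("123.45.67.89/16"); /* netmask: fails */
          return 0;
  }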

Comment 7 Stefan Walter 2012-03-01 12:35:56 UTC
I tried both fixes I suggested in my previous post. Setting use_ipaddr to 0
permanently works fine as far as I can see but makes rpc.mountd slow.

The other approach will IMO be tougher. The code that uses use_ipaddr
apparently does not really support use_ipaddr changing between 0 and 1
while rpc.mountd runs.

Comment 8 J. Bruce Fields 2012-03-01 14:16:57 UTC
Good catch, but there's something I don't quite understand: the cache_flush(1) in check_useipaddr should have flushed any exports using the use_ipaddr=0 format from the kernel's caches, so it should prevent any further upcalls for such a client.

Looking quickly at net/sunrpc/cache.c:write_flush(), I'm not sure we're guaranteed on return that no cache items are still referenced; perhaps in-progress requests could keep them around?

If so, maybe mountd needs to be prepared to parse netgroups even in the use_ipaddr case?

Comment 9 Stefan Walter 2012-03-01 14:45:17 UTC
Another update: I modified the code in comment 5 to look like this:

        if (use_ipaddr) {
                if (ai == NULL) {
                        struct addrinfo *tmp;
                        tmp = host_pton(dom);
                        if (tmp != NULL) {
                                ai = client_resolve(tmp->ai_addr);
                                freeaddrinfo(tmp);
                        }
                }
                if (ai != NULL) {
                        if (!client_check(exp->m_client, ai))
                                continue;
                }
        }
        /* It's a match !! */

An rpc.mountd patched like this works fine through the reproducer. But when
one mounts 'server:/export/test1' while all exports are present ('use_ipaddr'
is 1) and then removes all exports with

  for i in `seq 1 1000`; do exportfs -u  @$i:/export/test2/$i; done

the client will get a stale NFS handle. The reason seems to be the
following if clause which becomes true (also in nfsd_fh()):

        if (!use_ipaddr && !client_member(dom, exp->m_client->m_hostname))
                continue;

Again, with '-F -dall' rpc.mountd will print:

  mountd: nfsd_fh: found (nil) path (null)

I haven't tried to find a fix for this one yet.

Comment 10 Steve Dickson 2012-03-02 14:02:14 UTC
(In reply to comment #9)
> Another update: I modified the code in comment 5 to look like this:
> 
>         if (use_ipaddr) {
>                 if (ai == NULL) {
>                         struct addrinfo *tmp;
>                         tmp = host_pton(dom);
>                         if (tmp != NULL) {
>                                 ai = client_resolve(tmp->ai_addr);
>                                 freeaddrinfo(tmp);
>                         }
>                 }
>                 if (ai != NULL) {
>                         if (!client_check(exp->m_client, ai))
>                                 continue;
>                 }
>         }
>         /* It's a match !! */
> 
> An rpc.mountd patched like this will work fine through the reproducer. 
It's unclear to me how this is working... all you're doing is ignoring
the host_pton() failure instead of erroring out....

My question is: how is the dom string, which is passed up from the kernel,
a netmask, netgroup or '*', causing host_pton() to fail?

With mountd debug enabled (aka -d all), what is the 'nfsd_fh: inbuf' debug
output which appears at the beginning of nfsd_fh()?

Also, is it valid for use_ipaddr to become non-zero in the first place? Do you
have a large number of netgroups being parsed?

Comment 11 Stefan Walter 2012-03-02 14:58:24 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Another update: I modified the code in comment 5 to look like this:
> > 
> >         if (use_ipaddr) {
> >                 if (ai == NULL) {
> >                         struct addrinfo *tmp;
> >                         tmp = host_pton(dom);
> >                         if (tmp != NULL) {
> >                                 ai = client_resolve(tmp->ai_addr);
> >                                 freeaddrinfo(tmp);
> >                         }
> >                 }
> >                 if (ai != NULL) {
> >                         if (!client_check(exp->m_client, ai))
> >                                 continue;
> >                 }
> >         }
> >         /* It's a match !! */
> > 
> > An rpc.mountd patched like this will work fine through the reproducer. 
> It's unclear to me how this is working... all you're doing is ignoring
> the host_pton() failure instead of erroring out....
>
> My question is: how is the dom string, which is passed up from the kernel,
> a netmask, netgroup or '*', causing host_pton() to fail?

'dom' contains exactly the string of the export target, such as '*', '@123',
'123.45.67.89/16', which all fail to be parsed by host_pton(). I verified
that with printf().

The code above indeed just continues because the verification cannot
succeed. And I'm sure there are security implications too - the client
should be verified at this point by some means. The code was intended
to show that even a correct fix at that one spot alone will not be
enough and will just uncover more bugs. The whole 'use_ipaddr mode' needs
to be revisited.
 
> With mountd debug enabled (aka -d all), what is the 'nfsd_fh: inbuf' debug
> output which appears at the beginning of nfsd_fh()?
 
mountd: nfsd_fh: inbuf '* 1 \x9f860100'

> Also, is it valid for use_ipaddr to become non-zero in the first place? Do you
> have a large number of netgroups being parsed?

Yes, our entire NFS3 access management currently relies on netgroups.
When dumped to a file (from LDAP), /etc/netgroup is 1165 lines and 63 KB.
Apparently we are the only people who seriously use them. :)

An 'rpc.mountd' rigged with printf()s clearly shows me that use_ipaddr
starts at 0 and becomes 1. My stable fix so far is to modify
check_useipaddr() in utils/mountd/auth.c to only ever switch from -1 to 0,
but that has a performance penalty: a mount takes two seconds, with
rpc.mountd and sssd_* totaling 100% load on a 1-cpu VM.
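
For clarity, that pinning workaround amounts to something like this
(hypothetical sketch; the real check_useipaddr() in utils/mountd/auth.c
does more than this):

  /* hypothetical sketch, not the actual auth.c code: let use_ipaddr
   * leave its initial -1 state only once, and only to 0, so the
   * client-name format never changes while mountd is running */
  static int use_ipaddr = -1;

  static void check_useipaddr(void)
  {
          if (use_ipaddr == -1)
                  use_ipaddr = 0; /* never promote to 1; slower, but stable */
  }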

Comment 13 J. Bruce Fields 2012-04-16 21:10:00 UTC
check_useipaddr() depends on a write to auth.unix.ip/flush removing all the client names of the previous type from the auth.unix.ip cache before returning.

However, from inspection of the kernel code I don't believe it does that, for two reasons:
- cache_clean() appears to remove only entries with last_update time not later than the current second; entries created in the last second will remain.
- cache_clean() removes entries from the cache, but can't do anything about entries currently in use by nfsd threads.  Those nfsd threads may still do export upcalls using those entries.

That strikes me as a bug in the /flush interface.  Note exportfs also depends on it to remove old entries on changing the export table.
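
For reference, the flush handshake in question is just a timestamp write
into the cache's procfs file, roughly like this (a sketch, assuming the
standard /proc/net/rpc layout; error handling trimmed):

  /* sketch of the /flush write that exportfs and check_useipaddr()
   * depend on: the kernel is expected to discard every cache entry
   * whose last_update is not later than the written time -- but, per
   * the above, same-second and in-use entries can survive */
  #include <stdio.h>
  #include <time.h>

  static int flush_cache(const char *cache) /* e.g. "auth.unix.ip" */
  {
          char path[64];
          FILE *f;

          snprintf(path, sizeof(path), "/proc/net/rpc/%s/flush", cache);
          f = fopen(path, "w");
          if (f == NULL)
                  return -1;
          fprintf(f, "%ld\n", (long)time(NULL));
          return fclose(f);
  }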

Possible fixes for the use_ipaddr bug:
- We could try to fix /flush.  I don't see how to do that yet.  Note that whatever solution we come up with has to be careful about waiting on nfsd threads, since if those threads are themselves waiting on mountd, we could create a deadlock.
- As noted in comment 11, we could disable the use_ipaddr feature entirely in return for a severe performance regression for some setups.  I don't consider that acceptable.
- We could try to teach the export upcall to handle client names of either type.  (Is it possible to do that unambiguously?)

Comment 15 J. Bruce Fields 2012-04-20 15:06:40 UTC
Created attachment 579035 [details]
try resolving client as either ip address or list of "domains"

Apologies for the delay.  Here's an attempt at the third approach above.  Stefan Walter, would this be easy for you to test with your reproducer?  WARNING: it compiles, but I haven't yet tested it otherwise.
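
The gist of the patch, as a fragment in the style of the nfsd_fh() excerpt
in comment 5 (a sketch only; the attachment has the real diff):

  /* sketch, not the attached diff: fall back to domain matching
   * when the upcalled name is not a literal IP address */
  if (use_ipaddr) {
          struct addrinfo *tmp = host_pton(dom);
          if (tmp != NULL) {
                  /* literal IP: match by resolved client address */
                  ai = client_resolve(tmp->ai_addr);
                  freeaddrinfo(tmp);
                  if (ai == NULL || !client_check(exp->m_client, ai))
                          continue;
          } else {
                  /* netgroup, netmask or '*': match as a "domain" */
                  if (!client_member(dom, exp->m_client->m_hostname))
                          continue;
          }
  }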

Comment 18 Stefan Walter 2012-04-23 13:39:50 UTC
I have built a custom nfs-utils package that contains the patch from comment 15
and the patches from BZ 812364 and deployed it on a test system. Both test
cases from comments 2 and 9 work fine; there are no hanging mounts and no ESTALE
errors.

I did however accidentally try to mount the test share with nfsvers=4 while
all 1000 additional exports from comment 2 were present. The mount command
hung and I could see the following with tcpdump:

15:34:42.496976 IP chibi.inf.ethz.ch.3345383294 > debu.inf.ethz.ch.nfs: 172 getattr fh 0,0/22
15:34:42.497013 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.vpps-qua: Flags [.], ack 819275482, win 5185, options [nop,nop,TS val 6371042 ecr 6397866], length 0
15:34:47.497550 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.3345383294: reply ok 44 getattr ERROR: Request couldn't be completed in time
15:34:47.498161 IP chibi.inf.ethz.ch.vpps-qua > debu.inf.ethz.ch.nfs: Flags [.], ack 48, win 279, options [nop,nop,TS val 6402867 ecr 6376043], length 0
15:35:02.497586 IP chibi.inf.ethz.ch.3362160510 > debu.inf.ethz.ch.nfs: 172 getattr fh 0,0/22
15:35:02.497631 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.vpps-qua: Flags [.], ack 177, win 5201, options [nop,nop,TS val 6391043 ecr 6417867], length 0
15:35:07.497453 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.3362160510: reply ok 44 getattr ERROR: Request couldn't be completed in time
15:35:07.497986 IP chibi.inf.ethz.ch.vpps-qua > debu.inf.ethz.ch.nfs: Flags [.], ack 96, win 279, options [nop,nop,TS val 6422867 ecr 6396043], length 0

rpc.mountd is not involved in this case (strace shows it's sleeping). I
guess the reproducer triggers a related bug in the nfs4 code in the kernel.

Comment 19 J. Bruce Fields 2012-04-23 14:00:09 UTC
(In reply to comment #18)
> I have built a custom nfs-utils package that contains the patch from comment 15
> and the patches from BZ 812364 and deployed it on a test system. Both test
> cases from comments 2 and 9 work fine; there are no hanging mounts and no ESTALE
> errors.

Thanks for confirming.  Is the performance penalty mentioned in comment 11 ("a mount takes two seconds with rpc.mountd and sssd_* totaling 100% load on a 1-cpu VM.") also gone?

> I did however accidentally try to mount the test share with nfsvers=4 while
> all 1000 additional exports from comment 2 were present. The mount command
> hung and I could see the following with tcpdump:
> 
> 15:34:42.496976 IP chibi.inf.ethz.ch.3345383294 > debu.inf.ethz.ch.nfs: 172
> getattr fh 0,0/22
> 15:34:42.497013 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.vpps-qua: Flags
> [.], ack 819275482, win 5185, options [nop,nop,TS val 6371042 ecr 6397866],
> length 0
> 15:34:47.497550 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.3345383294: reply
> ok 44 getattr ERROR: Request couldn't be completed in time
> 15:34:47.498161 IP chibi.inf.ethz.ch.vpps-qua > debu.inf.ethz.ch.nfs: Flags
> [.], ack 48, win 279, options [nop,nop,TS val 6402867 ecr 6376043], length 0
> 15:35:02.497586 IP chibi.inf.ethz.ch.3362160510 > debu.inf.ethz.ch.nfs: 172
> getattr fh 0,0/22
> 15:35:02.497631 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.vpps-qua: Flags
> [.], ack 177, win 5201, options [nop,nop,TS val 6391043 ecr 6417867], length 0
> 15:35:07.497453 IP debu.inf.ethz.ch.nfs > chibi.inf.ethz.ch.3362160510: reply
> ok 44 getattr ERROR: Request couldn't be completed in time
> 15:35:07.497986 IP chibi.inf.ethz.ch.vpps-qua > debu.inf.ethz.ch.nfs: Flags
> [.], ack 96, win 279, options [nop,nop,TS val 6422867 ecr 6396043], length 0
> 
> rpc.mountd is not involved in this case (strace shows it's sleeping). I
> guess the reproducer triggers a related bug in the nfs4 code in the kernel.

That doesn't look familiar.  But the tcpdump output isn't very informative.  Could you get a binary capture (tcpdump -s0 -wtmp.pcap) and attach it?  Given that mountd isn't involved, we may want to open a new bug.

Also, the contents of /proc/net/rpc/nfsd.fh/content and /proc/net/rpc/nfsd.export captured after the problem is seen might be interesting.

Comment 21 Stefan Walter 2012-04-24 07:53:54 UTC
(In reply to comment #19)

> Thanks for confirming.  Is the performance penalty mentioned in comment 11 ("a
> mount takes two seconds with rpc.mountd and sssd_* totaling 100% load on a
> 1-cpu VM.") also gone?

Yes, the mount succeeds immediately and no particular load is seen on the server.
 
> That doesn't look familiar.  But the tcpdump output isn't very informative. 
> Could you get a binary capture (tcpdump -s0 -wtmp.pcap) and attach it?  Given
> that mountd isn't involved, we may want to open a new bug.
> 
> Also, the contents of /proc/net/rpc/nfsd.fh/content and
> /proc/net/rpc/nfsd.export captured after the problem is seen might be
> interesting.

I will open a new case for that.

Comment 22 Stefan Walter 2012-04-24 08:37:54 UTC
After some more tests I noticed that the first mount of a share from a client is
still slow, with high load on the server. Unmounting and mounting the same share
again is instantaneous. The authentication info seems to be cached in this case.

Comment 23 J. Bruce Fields 2012-05-02 02:06:25 UTC
Created attachment 581507 [details]
make use_ipaddr clients self-describing

After further discussion upstream:

  http://thread.gmane.org/gmane.linux.nfs/48853

I think we have an improved fix.  Untested rhel6 backport attached.

Comment 24 Stefan Walter 2012-05-02 08:17:08 UTC
I have tested the new patch (together with the patches for BZ812364) and
it did not work. I also tried a modified patch with

+	if (dom[0] == '$') {
+		ai = lookup_client_addr(dom+1);

as the comment http://article.gmane.org/gmane.linux.nfs/49311 implies, with the
same result: the mount command on the client just hangs.

Here is what I get as debug output from 'rpc.mountd -F -d all' for a
hanging mount:

rpc.mountd: from_local: updating local if addr list
rpc.mountd: from_local: checked 6 local if addrs; incoming address not found
rpc.mountd: check_default: access by 129.132.10.7 ALLOWED
rpc.mountd: Received NULL request from 129.132.10.7
rpc.mountd: check_default: access by 129.132.10.7 ALLOWED (cached)
rpc.mountd: Received NULL request from 129.132.10.7
rpc.mountd: check_default: access by 129.132.10.7 ALLOWED (cached)
rpc.mountd: Received MNT3(/export/test1) request from 129.132.10.7
rpc.mountd: authenticated mount request from puchi.inf.ethz.ch:700 for /export/test1 (/export/test1)
rpc.mountd: nfsd_fh: inbuf '$129.132.10.7 1 \x9f860100'
rpc.mountd: nfsd_fh: found (nil) path (null)

Comment 25 Stefan Walter 2012-05-02 10:44:04 UTC
Sorry, I made a mistake when I tested the package with the modified patch.

The package built with the patch modified as follows works fine for both test
cases (mounting and ls after adding or removing 1000 exports):

-  ai = lookup_client_addr(dom);
+  ai = lookup_client_addr(dom+1);

So that looks like a good fix, thanks.

Comment 27 J. Bruce Fields 2012-05-02 22:20:34 UTC
(In reply to comment #25)
> The package built with the patch modified as follows works fine for both test
> cases (mounting and ls after adding or removing 1000 exports):
> 
> -  ai = lookup_client_addr(dom);
> +  ai = lookup_client_addr(dom+1);
> 
> So that looks like a good fix, thanks.

Thanks again for the testing.  The nfsd_export upcall also needs the same fix, so I've fixed the implementation of lookup_client_addr() rather than the caller.

At the same time I also noticed a pre-existing bug in the nfsd_export() call that will cause it to fail in the use_ipaddr case.  (Which *might* also explain the failure you were seeing in the nfsv4 case.)

The version I've submitted upstream:

  http://thread.gmane.org/gmane.linux.nfs/49326
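
Roughly, the shape of the fixed helper (a sketch; the committed version is
in the thread above):

  /* sketch of the fixed helper: use_ipaddr client names now arrive
   * with a leading '$' marker, so strip it before parsing the address */
  static struct addrinfo *lookup_client_addr(char *dom)
  {
          struct addrinfo *ret;
          struct addrinfo *tmp;

          tmp = host_pton(dom + 1); /* skip the '$' prefix */
          if (tmp == NULL)
                  return NULL;
          ret = client_resolve(tmp->ai_addr);
          freeaddrinfo(tmp);
          return ret;
  }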

Comment 37 Steve Dickson 2012-07-24 19:40:24 UTC
Here are the three upstream patches that fix this problem:

commit c2544b77566690ebec32a2d47c9249548b1a0941
Author: J. Bruce Fields <bfields>
Date:   Thu May 3 15:06:21 2012 -0400

    mountd: prepend '$' to make use_ipaddr clients self-describing

commit 236025253c6ec534cd816390464435ec18c499f7
Author: J. Bruce Fields <bfields>
Date:   Thu May 3 15:01:20 2012 -0400

    mountd: add trivial helpers for client-matching

commit 005c49a063544b4005b5b7c3c28ed48c197f66ac
Author: J. Bruce Fields <bfields>
Date:   Thu May 3 14:59:27 2012 -0400

    mountd: parse ip address earlier

Comment 44 Petr Beňas 2012-10-17 12:52:40 UTC
Reproduced in nfs-utils-1.2.3-28.el6.x86_64 and verified in nfs-utils-1.2.3-29.el6.x86_64.

Comment 45 Steve Dickson 2012-11-08 16:07:18 UTC
*** Bug 798251 has been marked as a duplicate of this bug. ***

Comment 46 yanfu,wang 2012-11-27 03:46:03 UTC
Re-assigning the bug since a regression was introduced; please check
https://bugzilla.redhat.com/show_bug.cgi?id=876847#c61 through https://bugzilla.redhat.com/show_bug.cgi?id=876847#c71

Comment 48 yanfu,wang 2012-12-03 02:59:39 UTC
Hi Petr,
Since the bug has been re-fixed and you are the QA contact, could you re-test it? Thanks.

Comment 50 Petr Beňas 2012-12-03 12:00:27 UTC
Verified in nfs-utils-1.2.3-35.el6.x86_64.

Comment 57 errata-xmlrpc 2013-02-21 10:55:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0468.html

