Bug 232401

Summary: rpc.idmapd can be DoS'ed.
Product: [Fedora] Fedora
Reporter: Pawel Salek <pawsa>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: Ben Levenson <benl>
Severity: high
Priority: medium
Version: 6
CC: triage
Hardware: All   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: Fedora 8
Doc Type: Bug Fix
Last Closed: 2008-04-04 12:42:04 UTC

Description Pawel Salek 2007-03-15 08:41:31 UTC
Description of problem:
An NFS4 server on FC6/x86_64 (dual core) can be DoS'ed by an older 32-bit client
(approximately kernel-smp-2.6.9/nfs-utils-1.0.6-70). rpc.idmapd logs an error
and stops responding, which leaves the entire NFS4 service dead for all
clients. I can trigger the bug reproducibly by running rsync on the client to
sync a local disk to an initially empty NFS4-mounted directory. Restarting
rpc.idmapd brings the service back only until the next rsync run.

Version-Release number of selected component (if applicable):
nfs-utils-1.0.10-5.fc6
kernel-2.6.19-1.2911.6.5.fc6

How reproducible:
Always, given the data.

Steps to Reproduce:
1. rsync -av user /net/server/home
  
Actual results:
The NFS4 server logs:
Mar 14 11:32:21 server rpc.idmapd[4036]: nfsdcb: id '-2' too big!
Mar 14 11:32:35 server kernel: nfs4_cb: server another_client_ip/server_ip not responding, timed out

The NFS4 client logs:
Mar 14 11:32:52 client kernel: NFS: v4 server returned a bad sequence-id error!
Mar 14 11:32:54 client kernel: decode_getfattr: xdr error 10008!
(The clocks on the client and server may have been a few seconds apart.)

The server ultimately hangs after a few restarts of rpc.idmapd with:
kernel: BUG: soft lockup detected on CPU#1!
kernel:
kernel: Call Trace:
kernel:  [<ffffffff8026999a>] show_trace+0x34/0x47
kernel:  [<ffffffff802699bf>] dump_stack+0x12/0x17
kernel:  [<ffffffff802b6d9b>] softlockup_tick+0xdb/0xf6
kernel:  [<ffffffff80293cdd>] update_process_times+0x42/0x68
kernel:  [<ffffffff802749e7>] smp_local_timer_interrupt+0x34/0x55
kernel:  [<ffffffff8027509b>] smp_apic_timer_interrupt+0x51/0x69
kernel:  [<ffffffff8025ccf6>] apic_timer_interrupt+0x66/0x70
kernel:  [<ffffffff8826187a>] :sunrpc:svc_close_socket+0xa/0xa9
kernel:  [<ffffffff882603bd>] :sunrpc:svc_destroy+0x67/0xc4
kernel:  [<ffffffff882ff901>] :nfsd:nfsd+0x29e/0x2b1
kernel:  [<ffffffff8025ced8>] child_rip+0xa/0x12
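The first server message, "nfsdcb: id '-2' too big!", is consistent with a word-size mismatch on the 64-bit server, although the report itself does not establish the cause. The minimal C sketch below assumes that rpc.idmapd parses the id with strtoul() and then checks that the result fits a 32-bit field; both helper names are hypothetical and this is not the nfs-utils source. Per the C standard, strtoul("-2") negates the value in unsigned long, yielding ULONG_MAX - 1. On x86_64 that value does not survive a round trip through uint32_t, so the check fires, whereas on a 32-bit build unsigned long is itself 32 bits and the check passes. The tolerant variant parses the id as a signed value and wraps it modulo 2^32 instead.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical reconstruction of a "too big" check, ASSUMING the daemon
 * parses the id with strtoul() and stores it in a 32-bit field.
 * Returns 0 on success, -1 if the value is rejected.
 */
int parse_id_strict(const char *s, uint32_t *out)
{
    errno = 0;
    /* A leading '-' is negated in unsigned long: "-2" -> ULONG_MAX - 1. */
    unsigned long tmp = strtoul(s, NULL, 10);
    uint32_t id = (uint32_t)tmp;
    /* On LP64, ULONG_MAX - 1 truncates to 0xFFFFFFFE != tmp: rejected. */
    if ((tmp == ULONG_MAX && errno == ERANGE) || (unsigned long)id != tmp)
        return -1;                      /* would log "id ... too big!" */
    *out = id;
    return 0;
}

/*
 * Hypothetical tolerant variant: parse as signed 64-bit and accept any
 * value representable as int32_t or uint32_t, independent of the width
 * of unsigned long.
 */
int parse_id_tolerant(const char *s, uint32_t *out)
{
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;                      /* empty, trailing junk, overflow */
    if (v < INT32_MIN || v > (long long)UINT32_MAX)
        return -1;                      /* outside both 32-bit ranges */
    *out = (uint32_t)v;                 /* negatives wrap modulo 2^32 */
    return 0;
}
```

Given "-2", parse_id_tolerant stores 4294967294 (0xFFFFFFFE) on both 32-bit and 64-bit systems, while parse_id_strict rejects the same string whenever unsigned long is wider than 32 bits. Accepting the signed form is only one possible fix; rejecting negative ids while still answering the kernel's request is another.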

Expected results:
This should never happen.

Comment 1 Pawel Salek 2007-03-23 09:50:33 UTC
For the record, I have not seen the problem since I upgraded to
kernel-2.6.20-1.2925.fc6

Occasionally, I get the following warnings instead:
nfs4_cb: server 64bit_client_ip/server_ip�����.gnu.linkonce.this_module not
responding, timed out
- Yes, those strange (0xff) characters really are in the log! I think there is
a mistake in the message format...
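Stray bytes after an address in a log line are consistent with, though not confirmed to be caused by, printing a string that is not NUL-terminated with a plain "%s". The minimal C sketch below assumes the address is held in a counted byte buffer; format_counted_addr is a hypothetical helper, not kernel code. The "%.*s" conversion bounds the read to the given length, so nothing past the buffer reaches the output, whereas "%s" on the same buffer would keep reading until it happened to hit a zero byte.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a message around an address stored as a
 * counted buffer that may lack a terminating NUL. The precision in "%.*s"
 * limits the read to len bytes, so adjacent memory is never printed.
 * Returns the snprintf() result (length the full message would have).
 */
int format_counted_addr(char *out, size_t outlen, const char *buf, size_t len)
{
    return snprintf(out, outlen, "server %.*s not responding, timed out",
                    (int)len, buf);
}
```

For example, an 8-byte buffer holding "10.0.0.1" with no terminator formats as "server 10.0.0.1 not responding, timed out", with no trailing garbage.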
On the 64-bit client, I also saw this once:
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
after automount unmounted the file system.

Comment 2 Pawel Salek 2007-04-03 08:36:16 UTC
BTW, I have scanned the bug list for duplicates. Bug 225507 appears to be similar.

Comment 3 Pawel Salek 2007-08-16 09:54:20 UTC
FWIW, I haven't seen any of these recently, after a number of kernel upgrades.
Running 2.6.20-1.2962.fc6 now.

Comment 4 Bug Zapper 2008-04-04 06:32:01 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers