Description of problem:
We have an NFSv4+krb5 server and client, both running 2.6.18-164.9.1.el5 kernels. We are seeing periodic 1-2 minute hangs of the client systems, with the following being logged at the time of the hang:

CLIENT: Dec 28 09:28:30 kernel: decode_op_hdr: reply buffer overflowed in line 2121.
SERVER: Dec 28 09:27:28 kernel: nfs4_cb: server N.N.N.N not responding, timed out

Notes:
1) N.N.N.N is the IP address of the client
2) The timestamps show the client error being logged 1 minute after the server error.

My observations have been that the hang is typically 2 minutes long, with the following approximate chronology:

t=0min: the hang starts
t=1min: the server complains with the nfs4_cb error above
t=2min: the client complains with the decode_op_hdr error and then recovers

After 1-2 minutes, things recover and operate normally again.

How reproducible:
The problem is pretty variable, but I can usually reproduce it immediately after a system reboot; it is hard to reproduce after it happens the first time. I don't have a feel for whether it only happens once after a reboot or, perhaps, once for each user. But once a user sees this, they may not see it again (or only rarely) as long as they stay logged in. And it almost always happens when doing something with firefox, which makes me suspicious of something related to file locking (perhaps byte-range locking?). That's just a guess, since we've seen byte-range locking issues that only affected firefox when using cifs-mounted homedirs. But it does seem that something firefox is doing triggers the hangs.

Steps to Reproduce:
1. reboot
2. log in via the gui/gnome on a client system using nfsv4+krb5
3. start firefox

Actual results:
The system will likely hang for 1-2 minutes shortly after starting firefox.

Expected results:
No hangs.
Additional info: I should note that we have been seeing this problem ever since we moved from NFSv3 to NFSv4+krb5 last summer, so the problem existed as far back as the last of the RHEL 5.3 kernels and continues through the newly released 2.6.18-164.9.1 kernel.
Is there any firewalling between the client and server here?
No, neither client nor server runs iptables (or any other type of firewall) and both are on the same subnet.
Ok... The decode_op_hdr message means that there was a problem decoding the message from the server. That probably means that the server is sending a malformed reply here. I sort of suspect that it might be empty... What might be helpful is to get a wire capture of one of these events so we can see what's happening there.
Created attachment 381769 [details] tcpdump output

The attached is the result of running "tcpdump -s0 -w nfsv4.pcap host 129.79.247.1" on the client (129.79.246.2) during a login and hanging firefox. The IP address of the server is 129.79.247.1. The corresponding error messages and timestamps were:

server: 09:06:29 129.79.247.1 kernel: nfs4_cb: server 129.79.246.2 not responding, timed out
client: 09:07:31 129.79.246.2 kernel: decode_op_hdr: reply buffer overflowed in line 2121.

I believe the start time of firefox was about 9:05:36 and it hung very shortly thereafter. It came back to life right about 9:07:30.
Around frame 6422, the server starts sending back NFS4ERR_DELAY on COMMIT calls. This is probably the source of the stalls. I suspect that the server needs to recall a delegation, but there's a problem talking to the callback channel. In fact, the renew replies in frames 6581 and 6584 return an error of NFS4ERR_CB_PATH_DOWN. I see the server sending callbacks to the socket on port 37337 at the time of the NFS4ERR_DELAY errors, but the client never seems to do a DELEGRETURN until much later. I suspect that means that the callback is malformed or something... Unfortunately, wireshark is really bad at decoding these callbacks. I might need to fix that before we can diagnose this.
As luck would have it, someone added support to wireshark for this a week or so ago: https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=4446 ...I applied the patch and built a new wireshark, and now I can decode callbacks. Good news/bad news: the good news is that now I can reasonably look at the callback data. The bad news is that I don't see any real problems with it; all of the callback traffic looks fine. The client gets the callbacks in frames 6423 and 6449 and ACKs them. There is then a long delay (with no retransmissions, AFAICT) and then eventually replies to the calls in frames 6593 and 6594. The interesting bit is that all of that happens a few seconds *before* that client-side message got logged. Are the clocks all synchronized here?
We are running ntpd on all of our systems and really don't think clocks should be out of synch. I just checked the time on the ~200 rhel systems we manage and they were all within a few milliseconds of the time servers so definitely no rampant clock skew. I'm curious if you are asking because you are wondering if the hangs might be related to a clock synch problem or just that the logging might be out of synch with the actual events?
The latter... I'm just trying to ascertain whether the timestamps in the logs and the captures should all line up. It sounds like they should...
I'm seeing this on a 2.6.18-190.el5 kernel that I installed to fix https://bugzilla.redhat.com/show_bug.cgi?id=529227 what happens is that the machine occasionally still hangs (differently from 529227) and after a reboot this message gets logged. Is there any kind of test I can do to help?
Ok, looking at the capture a little deeper, it looks like there may be something of a chicken-and-egg problem here...

6421 41.223408 129.79.246.2 -> 129.79.247.1 NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
6422 41.223633 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6421) <EMPTY> PUTFH;COMMIT COMMIT(10008)

...prior to the above COMMIT call, there was a large flush of WRITEs. The WRITEs were all ok, but the COMMIT returns NFS4ERR_DELAY. The filehandle for the COMMIT is 0x8bd720c1.

6423 41.223659 129.79.247.1 -> 129.79.246.2 NFS V1 CB_COMPOUND Call <EMPTY> CB_RECALL

...just after the server returns the NFS4ERR_DELAY, it recalls the delegation for filehandle 0x8bd720c1 (the same one the client is trying to do the COMMIT call on).

6426 56.223098 129.79.246.2 -> 129.79.247.1 NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
6427 56.223317 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6426) <EMPTY> PUTFH;COMMIT COMMIT(10008)

...of course, before the client can return the delegation it needs to finish writing back the dirty data, and that entails finishing the COMMIT. So we seem to have a problem: the server won't satisfy the COMMIT call that the client needs, probably until the delegation is returned, and the client can't return the delegation until the COMMIT succeeds. This back and forth continues for a while.

6580 135.035487 129.79.246.2 -> 129.79.247.1 NFS V4 COMP Call <EMPTY> RENEW RENEW
6581 135.035679 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6580) <EMPTY> RENEW RENEW(10048)

...eventually, the client tries to RENEW its clientid, but since the server isn't getting a callback response it returns an NFS4ERR_CB_PATH_DOWN. The client then reattempts the COMMIT after that, and it succeeds.
6589 146.221255 129.79.246.2 -> 129.79.247.1 NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
6590 146.221488 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6589) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR

...since the COMMIT succeeds, the client tries to return the delegation:

6593 146.221669 129.79.246.2 -> 129.79.247.1 NFS V1 CB_COMPOUND Reply (Call In 6423) <EMPTY> CB_RECALL
6594 146.221697 129.79.246.2 -> 129.79.247.1 NFS V1 CB_COMPOUND Reply (Call In 6449) <EMPTY> CB_RECALL
6595 146.221775 129.79.246.2 -> 129.79.247.1 NFS V4 COMP Call <EMPTY> PUTFH;DELEGRETURN DELEGRETURN;GETATTR GETATTR
6600 146.222037 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6595) <EMPTY> PUTFH;DELEGRETURN DELEGRETURN;GETATTR GETATTR
6601 146.222050 129.79.247.1 -> 129.79.246.2 NFS V4 COMP Reply (Call In 6598) <EMPTY> PUTFH;DELEGRETURN DELEGRETURN(10025)

...but the server returns NFS4ERR_BAD_STATEID, I think since the RENEW wasn't successful. As best I can tell, the problem here is likely the server's fault. It should be allowing the COMMIT to proceed, but maybe there was something wrong with the COMMIT call? I'll need to look more closely at the wire traffic and the NFSv4 RFC.
Actually, correction...the first DELEGRETURN succeeds. The second gets the error because the first one has already caused the stateid to be invalidated.
Delegations are handled via leases, and a recall occurs when the lease is broken. nfsd_commit, however, does an nfsd_open, and that causes a break_lease call. There's not much that can be done there, though: a COMMIT call on the wire doesn't provide any open stateid info, just a filehandle, so the server has no way to know whether it's an fsync() call on an open file descriptor on the client or some random person on the box running /bin/sync. From what I can tell, upstream code is not significantly different in this regard, so I expect that upstream kernels probably have a similar issue.
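The cycle described above can be sketched in miniature. The following is a userland simulation, not the actual kernel code; the names (nfsd_open_sim, MAY_NOT_BREAK_LEASE, etc.) are illustrative stand-ins for nfsd's open path and for the kind of flag a fix might add to skip the lease break on a COMMIT-only open:

```c
#include <assert.h>

/* Userland sketch of the deadlock described above. Opening the file on
 * the server breaks any outstanding lease, which recalls the delegation,
 * but the client holding it cannot answer the recall until its COMMIT
 * succeeds. A COMMIT-only open could simply skip the lease break. */

enum { MAY_WRITE = 1, MAY_NOT_BREAK_LEASE = 2 };

static int lease_broken;  /* did this open trigger a CB_RECALL? */

/* stand-in for nfsd_open(): breaks the lease unless told not to */
static int nfsd_open_sim(int access)
{
    if (!(access & MAY_NOT_BREAK_LEASE))
        lease_broken = 1;  /* break_lease() -> CB_RECALL to the client */
    return 0;
}

/* stand-in for a fixed nfsd_commit(): the open is only needed to get a
 * file pointer for fsync, so the delegation can be left intact */
static int nfsd_commit_sim(void)
{
    return nfsd_open_sim(MAY_WRITE | MAY_NOT_BREAK_LEASE);
}
```

With the flag, a COMMIT no longer recalls the delegation, so the client's pending writeback can complete and the DELEGRETURN can follow.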
I am seeing this regularly (multiple times a day) in the following setup:

Solaris 10 NFS server
Centos 5.4 Linux client

relevant /etc/fstab line:
server:/storage nfs4 bg,intr,nosuid,actimeo=2,rsize=8192,wsize=8192

The low actimeo is because the server changes metadata that the clients must see asap.
Cc'ing Bruce to see if he has any thoughts. Bruce, any thoughts on the analysis in comment #10? It seems like the client is doing the right thing here and is trying to issue writeback before returning the delegation. Shouldn't the server be allowing that to continue?
(In reply to comment #13) > I am seeing this regularly (multiple times a day) in the following setup: > > Solaris 10 NFS server > Centos 5.4 Linux client > relevant /etc/fstab line: > server:/storage nfs4 bg,intr,nosuid,actimeo=2,rsize=8192,wsize=8192 > > the low actimeo it's because the server changes metadata that the clients must > see asap. I think you need to proceed carefully here and not assume that this problem is the same as the one you're seeing. You'll need to do some analysis to determine that. If you're not comfortable doing that on your own, then you may want to open a support case with RH support and have them help you with it.
In reply to #14:

The code looks like it is vulnerable to the deadlock you describe. Coincidentally, this came up recently: http://marc.info/?l=linux-nfs&m=126480953224756&w=2

We could use a variant of nfsd_open that omits the break_lease() call. Or we might just skip the open entirely and pass NULL for the file descriptor.
The open for the COMMIT case doesn't seem to buy much. From vfs_fsync_range:

-----------[snip]-----------
	if (file) {
		mapping = file->f_mapping;
		fop = file->f_op;
	} else {
		mapping = dentry->d_inode->i_mapping;
		fop = dentry->d_inode->i_fop;
	}
-----------[snip]-----------

RFC 3530 is definitely not clear on whether you need to recall a delegation on COMMIT. In most cases, if someone is issuing a COMMIT then they are likely the holder of the delegation. There are probably scenarios where that's not the case, but in those situations would it be harmful to service a COMMIT while someone else holds the delegation? It doesn't seem like it...
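The NULL-file fallback in that snippet can be modeled in userland. The structs below are trimmed stand-ins for the kernel's (only the fields the fallback touches), not the real layouts: when no struct file is supplied, the mapping comes from the dentry's inode, which is what would let COMMIT skip the open entirely.

```c
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures in the snippet above;
 * only the fields the fallback logic touches are modeled. */
struct address_space { int id; };
struct inode  { struct address_space i_mapping; };
struct dentry { struct inode *d_inode; };
struct file   { struct address_space *f_mapping; };

/* mirrors the quoted vfs_fsync_range() logic: prefer the file's
 * mapping, fall back to the dentry's inode when file == NULL */
static struct address_space *
fsync_mapping(struct file *file, struct dentry *dentry)
{
    if (file)
        return file->f_mapping;
    return &dentry->d_inode->i_mapping;
}
```

Either way the writeback path ends up with a usable mapping, which supports the argument that the open (and its lease break) is avoidable here.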
Agreed. There may have been some other reason for wanting a file descriptor there that I'm forgetting, but if so hopefully posting a patch will be enough to jog someone's memory....
Hello guys, I discovered one way to reproduce "decode_op_hdr: reply buffer overflowed in line 2121", using LTP test case ltp-full-20100228/testcases/network/nfs/nfslock01.

Steps:
1. copy the nfslock01 programs to /nfs
2. cd /nfs
3. mkdir test
4. /nfs/nfs_flock_dgen /nfs/test/nfs_flock_data 63 16384 0
5. cp /nfs/test/nfs_flock_data /nfs/test/nfs_flock_idata
6. /nfs/nfs_flock_frk /nfs/nfs_flock /nfs/test/nfs_flock_idata
7. CTRL+C
8. rm -rf test <- this does not work, but it's another lock problem
9. kill the connection from the nfs server (in my case an EMC Celerra)
10. rm -rf test <- now works
11. mkdir test

Best regards
(In reply to comment #15)
> I think you need to proceed carefully here and not assume that this problem is
> the same as the one you're seeing. You'll need to do some analysis to determine
> that

You're right; my main problem is https://bugzilla.redhat.com/show_bug.cgi?id=526888 but I am also seeing this. I think they are related.

(In reply to comment #19)
> I discovered one way to reproduce "decode_op_hdr: reply buffer overflowed in
> line 2121". Using LTP test case
> ltp-full-20100228/testcases/network/nfs/nfslock01.
>
> Steps:

Where should the "buffer overflowed" happen?
(In reply to comment #20)
> Where should the "buffer overflowed" happen?

At the last step, "mkdir test".

Best regards
Ok, I think we're potentially looking at multiple problems here.

The problem in the capture that Rob provided, I think we understand now. It's a deadlock of sorts and will likely be fixed by a server patch that makes the server not open the file before calling fsync and doing the commit. I've pushed a patch upstream to fix that and am awaiting feedback on it. Backporting that patch to RHEL5 is a little more work, but should be doable.

The other problem, the one that causes the decode_op_hdr message to pop, seems like something else entirely. There's no correlation between the message popping and COMMIT calls in the capture, as far as I can tell. I think we'll have to do some investigation to track that down. What we'll probably have to do is roll a patch that instruments the kernel so that it gives us more info about what's causing that message to pop.
Jeff: I'm willing to test that instrumented patch.
Created attachment 401230 [details] patch -- nfsd shouldn't break lease when servicing a COMMIT I think this patch should help Rob's issue. It's similar in concept to a patch that I've just pushed upstream. It's a server side patch and doesn't affect client behavior at all. Rob, would you be able to test this out someplace non-critical and let me know if it helps?
Created attachment 401245 [details] client side debug patch

We'll probably need to eventually split this bug into two; I think the problem with the decode_op_hdr messages is probably separate from the COMMIT deadlock issue. For now, though, we'll keep the bugs together until we know more.

This patch changes the spot where we seem to see these messages pop. It makes it dump a bit more info so that we know what operation it expected to decode at the time, and does a dump_stack() too. There's no ratelimiting in this patch, so you may get quite a bit of ringbuffer spam.

The ideal thing would be to run whatever testcase you have that causes these messages to pop. Once it does, attach the log messages that this patch generates to the bug. It would also be helpful to sniff traffic at the same time so that we can try to correlate these events with actual traffic on the wire.
Simone, if you can reproduce this problem on a kernel with the patch in comment #25, it would be helpful.
(In reply to comment #25)

Jeff, I have tested your patch with my testcase, look:

decode_op_hdr: reply buffer overflowed, expected = 9
Call Trace:
 [<ffffffff8844088a>] :nfs:decode_op_hdr+0x2c/0x61
 [<ffffffff884419f6>] :nfs:decode_getfattr+0x2b/0xc5b
 [<ffffffff88442690>] :nfs:nfs4_xdr_dec_delegreturn+0x0/0x69
 [<ffffffff8837dca0>] :sunrpc:rpc_async_schedule+0x0/0x5
 [<ffffffff884426ee>] :nfs:nfs4_xdr_dec_delegreturn+0x5e/0x69
 [<ffffffff88378915>] :sunrpc:call_decode+0x4bf/0x511
 [<ffffffff8837dae4>] :sunrpc:__rpc_execute+0x92/0x24e
 [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
 [<ffffffff8004a472>] worker_thread+0x0/0x122
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a562>] worker_thread+0xf0/0x122
 [<ffffffff8008e16d>] default_wake_function+0x0/0xe
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032bdc>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032ade>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

decode_op_hdr: reply buffer overflowed, expected = 9
Call Trace:
 [<ffffffff8844088a>] :nfs:decode_op_hdr+0x2c/0x61
 [<ffffffff884419f6>] :nfs:decode_getfattr+0x2b/0xc5b
 [<ffffffff88442690>] :nfs:nfs4_xdr_dec_delegreturn+0x0/0x69
 [<ffffffff8837dca0>] :sunrpc:rpc_async_schedule+0x0/0x5
 [<ffffffff884426ee>] :nfs:nfs4_xdr_dec_delegreturn+0x5e/0x69
 [<ffffffff88378915>] :sunrpc:call_decode+0x4bf/0x511
 [<ffffffff8837dae4>] :sunrpc:__rpc_execute+0x92/0x24e
 [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
 [<ffffffff8004a472>] worker_thread+0x0/0x122
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a562>] worker_thread+0xf0/0x122
 [<ffffffff8008e16d>] default_wake_function+0x0/0xe
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032bdc>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032ade>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x1

Best regards
Ok, so in both cases the issue was in response to a GETATTR call:

OP_GETATTR = 9

...but apparently the packet was malformed. Did you happen to get a wire capture while reproducing this? It would be helpful to know what the response from the server looked like.

What we'll probably need to do is roll up a more extensive debug patch that makes it print out the xid in the RPC packet when this problem occurs. That might also help track down the packets that are causing this message to pop.

One thing I've noticed in the capture provided by Rob is some strange RPC replies coming from the server. Those seem to be for program 0, and wireshark has a hard time decoding them. In most cases, I don't see any request that matches the XID, but I wonder whether occasionally those strange packets happen to match XIDs that are in flight.
Created attachment 401309 [details] tcpdump Jeff, here is the tcpdump of this problem. Regards
Created attachment 401320 [details] patch -- don't try to decode a GETATTR if DELEGRETURN failed

Thanks, Marcus. I think I see the problem. nfs4_xdr_dec_delegreturn has this:

-------------[snip]----------------
	status = decode_putfh(&xdr);
	if (status != 0)
		goto out;
	status = decode_delegreturn(&xdr);
	decode_getfattr(&xdr, res->fattr, res->server, !RPC_IS_ASYNC(rqstp->rq_task));
-------------[snip]----------------

...if the DELEGRETURN returns an error (as the capture shows), then there won't be a GETATTR in the response to decode, and the printk will pop. This is an upstream bug too, but it looks like the printk doesn't actually pop there.

In truth, I think this is really a cosmetic fix. The error from decode_getfattr is ignored, so the only real effect of the bug is printk spam. Regardless, we might as well fix it. Can you test the patch and let me know if it silences the printk's?
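The shape of the fix can be modeled in userland. This is a simulation of the decode ordering, not the actual xdr code; the _sim names and the status value (10025 being NFS4ERR_BAD_STATEID, as seen in the capture) are illustrative. The point is simply that once decode_delegreturn reports an error, the decoder must bail out instead of trying to decode a GETATTR the server never sent:

```c
/* Userland model of the nfs4_xdr_dec_delegreturn fix: when DELEGRETURN
 * itself failed, the reply contains no GETATTR result, so decoding one
 * just overruns the reply buffer and fires the printk. */

static int getfattr_decoded;  /* did we (wrongly) try to decode GETATTR? */

/* stand-in decoders; a real one would pull bytes off the xdr stream */
static int decode_delegreturn_sim(int op_status)
{
    return op_status;  /* 0 on success, -NFS4ERR_* otherwise */
}

static int decode_getfattr_sim(void)
{
    getfattr_decoded = 1;  /* in the buggy path: reply buffer overflow */
    return 0;
}

/* the fixed decode path: check status before touching the GETATTR */
static int dec_delegreturn_sim(int op_status)
{
    int status = decode_delegreturn_sim(op_status);
    if (status != 0)
        return status;  /* the fix: skip decode_getfattr on error */
    return decode_getfattr_sim();
}
```

On a successful DELEGRETURN the GETATTR is still decoded as before; only the error path changes, which is why the fix is cosmetic.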
From Marcus' capture, it looks like there were some issues with locking that triggered some NFS4ERR_DELAY returns from the server. I suspect that that is really what you'll be interested in fixing. That's a much different problem from what we see in Rob's capture, however, so I suggest opening a new ticket if you're still having trouble with NFSv4 after applying that patch.
(In reply to comment #31)
> From Marcus' capture, it looks like there were some issues with locking that
> triggered some NFS4ERR_DELAY returns from the server. I suspect that
> that is really what you'll be interested in fixing. That's a much different
> problem from what we see in Rob's capture however, so I suggest opening a new
> ticket if you're still having trouble with NFSv4 after applying that patch.

You are right, Jeff; the printk problems are gone, and it's important not to confuse this with another problem. The locking issues are now reported in BZ575505. Thanks
Thanks Marcus. I've sent a patch upstream since this bug still exists there. The printk is now a dprintk there, but it's still something that ought to be fixed.
I've cloned this bug to bug 575817. This bug is henceforth going to be used to track the bug that causes the decode_op_hdr printk to pop. The new bug will be used to track the deadlock that's evident in Rob's capture.
Also, Trond has committed the patch upstream and it should make 2.6.35 (maybe 2.6.34, but I'm not counting on that). I intend to propose this patch for 5.6.
Thanks, Jeff. After applying the "nfsd: don't break lease while servicing a COMMIT call" patch to the stock rhel5.4 kernel sources, it fails to build with:

fs/nfsd/vfs.c: In function 'nfsd_open':
fs/nfsd/vfs.c:698: error: called object 'break_lease' is not a function

Is there a bool/function name conflict with "break_lease"?
Created attachment 401870 [details] patch -- nfsd shouldn't break lease when servicing a COMMIT (try #2)

My mistake; I must have attached an earlier version of the patch. This one should be correct.
I've put the patch to fix this spurious printk into my test kernels. Please test them someplace non-critical and report back as to whether they fix the problem. http://people.redhat.com/jlayton/
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-200.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Hello. I'm trying kernel-2.6.18-200.el5, but here's what happened on one of the 2 machines I'm testing it on:

Jun 4 09:18:07 fe01 kernel: NFS: v4 server returned a bad sequence-id error!
Jun 4 10:18:54 fe01 kernel: INFO: task java:28498 blocked for more than 120 seconds.
Jun 4 10:18:54 fe01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 4 10:18:54 fe01 kernel: java D 000036AF 1892 28498 1 28499 28497 (NOTLB)
Jun 4 10:18:54 fe01 kernel: d66e2e84 00000086 4f174382 000036af f0ac4ac0 f8c4e734 00001000 00000008
Jun 4 10:18:54 fe01 kernel: d58ddaa0 4f4207d1 000036af 002ac44f 00000000 d58ddbac c1c09900 f66d9200
Jun 4 10:18:54 fe01 kernel: d66e2e90 00000000 0000000f d66e2e90 00aafb06 c042c9d5 f65b784c ffffffff
Jun 4 10:18:54 fe01 kernel: Call Trace:
Jun 4 10:18:54 fe01 kernel: [<f8c4e734>] nfs_pagein_one+0x224/0x243 [nfs]
Jun 4 10:18:54 fe01 kernel: [<c042c9d5>] getnstimeofday+0x30/0xb6
Jun 4 10:18:54 fe01 kernel: [<c061d9d6>] io_schedule+0x36/0x59
Jun 4 10:18:54 fe01 kernel: [<c0459568>] sync_page+0x0/0x3b
Jun 4 10:18:54 fe01 kernel: [<c04595a0>] sync_page+0x38/0x3b
Jun 4 10:18:54 fe01 kernel: [<c061dae8>] __wait_on_bit_lock+0x2a/0x52
Jun 4 10:18:54 fe01 kernel: [<c04594e3>] __lock_page+0x52/0x59
Jun 4 10:18:54 fe01 kernel: [<c04362b8>] wake_bit_function+0x0/0x3c
Jun 4 10:18:54 fe01 kernel: [<c045bf27>] filemap_nopage+0x265/0x34a
Jun 4 10:18:54 fe01 kernel: [<c046527f>] __handle_mm_fault+0x178/0xa25
Jun 4 10:18:54 fe01 kernel: [<c047efbf>] sys_stat64+0x1e/0x23
Jun 4 10:18:54 fe01 kernel: [<c061fba6>] do_page_fault+0x23a/0x52d
Jun 4 10:18:54 fe01 kernel: [<c061f96c>] do_page_fault+0x0/0x52d
Jun 4 10:18:54 fe01 kernel: [<c0405a89>] error_code+0x39/0x40
Jun 4 10:18:54 fe01 kernel: =======================

The java process is a tomcat, and this is a dev server, so not much heavy traffic. The machine ended up in the usual molasses (https://bugzilla.redhat.com/show_bug.cgi?id=526888) and I had to reboot it the hard way.
Hi, I'm not clear about part of the description of the problem: "login via the gui/gnome on a client system using nfsv4+krb5"... does the server need to export its homedir to the client, so the client can mount it using nfsv4+krb5?

I used comment #19 to try to trigger the "buffer overflowed" but failed; please refer to my steps:

1. use ltp-full-20100228.tgz
2. set up the rhost server and start the nfs service on it:
   # cat /etc/exports
   /exports *(rw,fsid=0)
3. mkdir /nfs on the client and mount the server's /exports dir:
   # mount -t nfs4 10.16.45.59:/ /nfs
4. follow the steps below:
   copy the nfslock01 programs to /nfs
   cd /nfs
   mkdir test
   /nfs/nfs_flock_dgen /nfs/test/nfs_flock_data 63 16384 0
   cp /nfs/test/nfs_flock_data /nfs/test/nfs_flock_idata
   /nfs/nfs_flock_frk /nfs/nfs_flock /nfs/test/nfs_flock_idata
   CTRL+C
   rm -rf test <- this works for me; I don't need to kill the connection from the nfs server
   mkdir test
5. mkdir test is ok; I checked the log and did not find "buffer overflowed".

So could someone give me some detail about how to reproduce the bug?
my server and client both are rhel5.4 # uname -a Linux dell-per610-02.lab.bos.redhat.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
hi Jeff, could you help to look at comment #45 and tell me how to reproduce it? thanks.
I wish I could. I was never able to successfully reproduce this.
verified the bug as SanityOnly.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html