Bug 195874
Summary: | lockd ignores requests | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Garth Mollett <gmollett> |
Component: | kernel | Assignee: | Steve Dickson <steved> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 5 | CC: | davej, triage, wtogami |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | i386 | | |
OS: | Linux | | |
Whiteboard: | OldNeedsRetesting bzcl34nup | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-05-06 16:01:13 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Garth Mollett
2006-06-19 04:13:27 UTC
Sorry, that kernel stack trace is incorrect. This is the correct one:

lockd S 00000495 2400 2554 1 2559 2552 (L-TLB)
f5a78f40 00000002 f5a78f2c 00000495 c039b700 000000d0 e5e96e00 003d08ea
00000206 00000002 00000044 00000000 f696acd8 f696abb0 f7e00af0 c24265e0
e5e96e00 003d08ea 00000002 f5a78000 f4c26c40 f4c27d00 f4c26c40 c02cbad7
Call Trace:
[<c02cbad7>] skb_recv_datagram+0x120/0x229
[<c03300b4>] schedule_timeout+0xa7/0xd7
[<c0330e0e>] _spin_lock_irqsave+0x9/0xd
[<c013891b>] add_wait_queue+0x13/0x94
[<f8ddccb3>] svc_sock_release+0xd1/0x168 [sunrpc]
[<f8ddd203>] svc_recv+0x384/0x586 [sunrpc]
[<f8ddbb0f>] svc_process+0x3d4/0x665 [sunrpc]
[<c011fe6d>] default_wake_function+0x0/0xc
[<f8dc06dc>] lockd+0xfd/0x253 [lockd]
[<f8dc05df>] lockd+0x0/0x253 [lockd]
[<c0102005>] kernel_thread_helper+0x5/0xb

And the kernel is now:

Linux entei 2.6.16-1.2115_FC4smp #1 SMP Mon Jun 5 15:01:58 EDT 2006 i686 i686 i386 GNU/Linux

As this appears to be getting "stuck" in skb_recv_datagram, it might be a network card related issue (maybe, heh), so for the record the card is an Intel e1000. Creating a server on a random UDP port with netcat and communicating with it using frames of various sizes from 192.168.0.20 works fine, as do other UDP-based services (NFS, DNS). The problem only seems to occur with lockd, and only from hosts not in the same segment.

Does this still happen on the 2.6.17-based update out last week?

(In reply to comment #3)
> does this still happen on the 2.6.17 based update out last week ?

Yes it does. Another interesting note: it only seems to happen with an MTU above 8000. The network is usually at 9000, but setting the MTU to <= 8000 on this and all other nodes on this segment seems to work as a workaround. Note that all other communications are fine with the MTU > 8000, including other RPC-based services; only lockd appears to have issues. Also, the packets that get "ignored" by lockd are usually no bigger than 256 or so bytes (as can be seen from the tcpdump). Hope that helps.

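As a rough sanity check on the MTU observation above, the following sketch counts how many IPv4 fragments a single UDP datagram would need at a given MTU. It assumes a 20-byte IPv4 header with no options and the standard 8-byte UDP header; the function name `fragments_needed` is made up for illustration and nothing here comes from the kernel or from lockd. It only supports the point that a request of roughly 256 bytes fits in one packet at either MTU. It does not explain why lockd ignores the request.

```python
IPV4_HEADER = 20  # bytes; assumption: no IP options
UDP_HEADER = 8    # bytes


def fragments_needed(udp_payload: int, mtu: int) -> int:
    """Return how many IPv4 fragments one UDP datagram needs at this MTU."""
    total = UDP_HEADER + udp_payload       # bytes carried inside IP
    max_ip_payload = mtu - IPV4_HEADER     # most a single packet can carry
    if total <= max_ip_payload:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes,
    # while the last one may use the full max_ip_payload.
    per_fragment = (max_ip_payload // 8) * 8
    remaining = total - max_ip_payload
    return 1 + -(-remaining // per_fragment)  # ceiling division


for mtu in (8000, 9000):
    for payload in (256, 8972, 8973):
        print(f"mtu={mtu} payload={payload} fragments={fragments_needed(payload, mtu)}")
```

With these assumptions a 256-byte payload is never fragmented at MTU 8000 or 9000, which is consistent with the reporter's doubt that fragmentation or packet loss is behind the problem.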
Out of curiosity, does it go back to normal if you do ..

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

(as root)

I'm not sure how TCP window scaling could be related when we're talking UDP here? I can try it out for you if you want, but these are production machines, so I will have to schedule a time to do so. I very much doubt this is a network issue (i.e. packet loss due to inconsistent MTUs or anything of that nature).

Is there any output when NLM debugging is turned on by doing the following: "echo 2 > /proc/sys/sunrpc/nlm_debug"?

(In reply to comment #5)
> out of curiosity, does it go back to normal if you do ..
>
> echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
>
> (as root)

As expected, no change.

(In reply to comment #7)
> Is there any output when an NLM debugging is turned on when
> the following is done "echo 2 > /proc/sys/sunrpc/nlm_debug"

Yes, but nothing really useful or unexpected. Without the workaround, calling fcntl() from the client, we see the following in the client logs:

Aug 14 19:10:34 client kernel: lockd: call procedure 2 on server_ip
Aug 14 19:11:44 client kernel: lockd: server server_ip not responding, timed out
Aug 14 19:11:44 client kernel: lockd: rpc_call returned error 5
Aug 14 19:11:44 client kernel: lockd: clnt proc returns -5

And nothing on the server (although the packets can be seen in tcpdump).

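For anyone wanting to retest, the client-side fcntl() call described above can be approximated with a short script. This is a minimal sketch, not the reporter's actual test program: the helper name `time_lock_cycle` is hypothetical, and it assumes the path given is on the affected NFS mount, where a POSIX lock request would normally be forwarded to the server's lockd. On a local filesystem it just exercises local locking.

```python
import fcntl
import os
import time


def time_lock_cycle(path: str) -> float:
    """Take and release an exclusive POSIX lock on path; return elapsed seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        start = time.monotonic()
        # Blocking exclusive lock on the whole file (fcntl() under the hood).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        # Release it again, which should show up as an unlock on the server.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return time.monotonic() - start
    finally:
        os.close(fd)
```

Calling `time_lock_cycle("/path/on/nfs/mount/locktest")` (the path is a placeholder) with nlm_debug enabled should produce client log lines of the kind shown above. A call that hangs for about a minute or raises OSError would match the failing case, while a near-instant return would match the working case.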
With the workaround enabled (MTU dropped to 8000), fcntl() completes and we see the following:

Aug 14 19:13:44 client kernel: lockd: call procedure 2 on server_ip
Aug 14 19:13:44 client kernel: lockd: server returns status 0
Aug 14 19:13:44 client kernel: lockd: clnt proc returns 0
Aug 14 19:13:44 server kernel: lockd: LOCK called
Aug 14 19:13:44 server kernel: lockd: LOCK status 0
Aug 14 19:13:46 server kernel: lockd: UNLOCK called
Aug 14 19:13:46 server kernel: lockd: UNLOCK status 0

[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5.

Please retest with Fedora Core 5.

Thank you.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.

Thank you.

Please don't close this bug. I will try to get a test environment set up (unless someone else can?) so I can retest. Obviously I can't just upgrade a bunch of production servers to FC5; plans are in place to do the upgrade, but we have no strict timeline until further testing is done. It's very unlikely that we will be able to have the test environment up and running and everything tested within two weeks, though. Thanks.

(In reply to comment #11)
> A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
> based upon a new upstream kernel release.
> Please retest against this new kernel, as a large number of patches
> go into each upstream release, possibly including changes that
> may address this problem.
> This bug has been placed in NEEDINFO state.
> Due to the large volume of inactive bugs in bugzilla, if this bug is
> still in this state in two weeks time, it will be closed.
> Should this bug still be relevant after this period, the reporter
> can reopen the bug at any time. Any other users on the Cc: list
> of this bug can request that the bug be reopened by adding a
> comment to the bug.
> In the last few updates, some users upgrading from FC4->FC5
> have reported that installing a kernel update has left their
> systems unbootable. If you have been affected by this problem
> please check you only have one version of device-mapper & lvm2
> installed. See bug 207474 for further details.
> If this bug is a problem preventing you from installing the
> release this version is filed against, please see FCMETA_INSTALL.
> If this bug has been fixed, but you are now experiencing a different
> problem, please file a separate bug for the new problem.
> Thank you.

Removing NeedsRetesting from whiteboard so we can repurpose it.

Fedora apologizes that these issues have not been resolved yet.

We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.

http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers

This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.