Bug 523797
Summary: | RHEL5.4 kernel (2.6.18-164.el5) breaks nfsv4 file locking | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Rob Henderson <robh>
Component: | kernel | Assignee: | Jeff Layton <jlayton>
Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | urgent | Priority: | low
Version: | 5.4 | CC: | jlayton, mb--redhat, sprabhu, steved
Hardware: | i686 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2009-10-08 18:34:12 UTC
Description
Rob Henderson
2009-09-16 17:27:24 UTC
We see this too. vim is unusable. Rolling the server kernel back to 2.6.18-128.7.1.el5 makes the problem go away.

Are there any error messages being logged to /var/log/messages? I just tested this on my rhel5 test box and didn't see an issue...

A bit more info would be helpful. We need to understand what's happening at the system call level. You say:

"on an nfsv4 client, things relying on file locking fail"

...what's happening here, exactly? Are fcntl calls returning errors when they shouldn't? An strace of such a program would be helpful.

Just wondering if the issue seen here is actually the regression seen in bz 524520.

I don't see anything of interest being logged on the server or the client. However, I have a test server and client set up, so it would be trivial for me to enable any type of debugging that might give useful information. Just let me know.

BTW, the simplest demonstration of the problem I have is to just ssh to a client that is getting my homedir via nfsv4 from a server running 2.6.18-164. I have ForwardX11 set, so it looks like it tries to lock the .Xauthority and fails:

[robh@robwilco robh]$ ssh test
robh@test's password:
Last login: Thu Oct 8 12:09:46 2009 from robwilco.cs.indiana.edu
/usr/bin/xauth: error in locking authority file /u/robh/.Xauthority
-bash-3.2$

I will attach a tcpdump showing the client<->server nfs traffic for one such ssh login.

Created attachment 364152 [details]
tcpdump output
In this dump, the nfsv4 server is curie.cs.indiana.edu (129.79.246.140) and the client is test.cs.indiana.edu (129.79.245.31). This was captured on the server with:
tcpdump -s 0 -w /tmp/nfsv4locks.pcap host test.cs.indiana.edu
while I logged into test via ssh.
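For anyone trying to answer the system-call question above without wading through a full sshd strace, a small standalone check can be run in a directory on the affected nfsv4 mount. This is only a sketch: the function names and the `locktest.*` file names are made up for illustration, and the exclusive-create check is an assumption about what lock-file schemes typically depend on, not something confirmed as the failing call in this bug. It tries a non-blocking fcntl-style lock and an O_EXCL create, then reports the errno (if any) and the permission bits the new file ends up with.

```python
import fcntl
import os
import stat
import sys
import tempfile


def try_fcntl_lock(path):
    """Take and release a non-blocking exclusive POSIX lock.

    Returns None on success, or the errno if the lock call failed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


def try_exclusive_create(path, mode=0o600):
    """Create path with O_EXCL and return the permission bits it ends up with."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


def check_directory(directory):
    """Run both checks in `directory` and clean up the scratch files afterwards."""
    lock_path = os.path.join(directory, "locktest.%d" % os.getpid())  # hypothetical name
    excl_path = lock_path + "-c"  # hypothetical name
    results = {}
    try:
        results["fcntl_errno"] = try_fcntl_lock(lock_path)
        results["excl_mode"] = try_exclusive_create(excl_path)
    finally:
        for scratch in (lock_path, excl_path):
            try:
                os.unlink(scratch)
            except FileNotFoundError:
                pass
    return results


if __name__ == "__main__":
    # Pass a directory on the nfsv4 mount; defaults to the local temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    outcome = check_directory(target)
    print("fcntl lock errno:", outcome["fcntl_errno"])
    print("exclusive-create mode: %o" % outcome["excl_mode"])
```

Running it once against a local directory and once against the nfsv4 homedir should make any difference in behaviour obvious; a non-None errno or an unexpected mode on the nfsv4 side would be worth attaching to the bug alongside the strace.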
Created attachment 364158 [details]
strace of sshd at login

I generated this strace by running the following on the client while I logged in via ssh:

strace -f -v -o /tmp/strace.out -p PID_OF_SSHD

Also of note is that after the login and the error about locking .Xauthority, I'm left with the following in my homedir:

---------- 1 robh staff 0 Jan 14 1970 .Xauthority-c

Perhaps this is similar to bz 524520... ???

bug 524520 was what I was thinking too... the RHEL5 test kernels on my people.redhat.com page have patches to fix that bug:

http://people.redhat.com/jlayton/

...would you be able to test those someplace non-critical and let us know if they fix the problem?

I just booted my development server to 2.6.18-166.el5.jtltest.88 and a quick test seems to indicate that this fixes the problem. I didn't change the kernel on the client, so it is still running the stock 2.6.18-164 kernel. I haven't done extensive testing, but it definitely looks like this addresses the issue. Thanks!

Thanks for testing it. I'll go ahead and close this as a duplicate. Please reopen if it looks like it's not.

*** This bug has been marked as a duplicate of bug 524520 ***