From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
Our users launch netscape from a Linux box running RHL 7.1 and display back to their X desktop. We use user-based, not host-based, authentication. On occasion, the Linux box will not see the .Xauthority file at all. 'ls' will list it but 'cat', 'wc', 'file', etc. will not be able to read it. Other NFS clients read it fine.

How reproducible:
Sometimes

Steps to Reproduce:
1. Login to xdm session on hostA with your home mounted on RHL 7.1 box via NFS
2. remsh RHL 7.1 box 'xterm -display hostA:0'
3. This will fail if xterm can't read .Xauthority, which it sometimes can't

Actual Results: X client reported that its request to display was refused by the server

Expected Results: NFS should work reliably. If it did, clients could read the .Xauthority file. As a workaround, I have to use host-based X authentication, which is much less secure.

Additional info:

# cat /etc/issue
Red Hat Linux release 7.1 (Seawolf)
Kernel 2.4.2-2 on an i686

: wc .Xauthority
wc: .Xauthority: No such file or directory
0 0 0 .Xauthority
: ls -l .Xauthority
-rw------- 1 user group 49 May 9 06:13 .Xauthority
: id
uid=21487(user) gid=24773(group)
: cat .Xauthority
cat: .Xauthority: No such file or directory
: file .Xauthority
.Xauthority: file: read failed (No such file or directory).
: touch .Xauthority
touch: setting times of `.Xauthority': No such file or directory

When I log in to any other NFS client (Solaris, DG/UX, RHL 6.2), I can read the file fine. Here is Red Hat 6.2 with the exact same mount options:

: wc .Xauthority
1 3 49 .Xauthority
type nfs (rw,rsize=8192,wsize=8192,hard,intr,bg,addr=aaa.bb.cc.dd)

Other files on that same NFS file system are readable by the 7.1 client.

: wc .Xdefaults
107 200 3777 .Xdefaults

This has worked since RHL 4.2. The NFS servers haven't changed. They are still Solaris 2.6 and DG/UX 4.11.
Can you try changing rsize/wsize to 1024?
The change of rsize and wsize to 1024 did not make a difference.
Hmm.. This is very odd. Can you provide the following for debugging?
1 - An strace of one of the file ops (cat, file, etc.) which fails with a "No such file or directory" error msg.
2 - A tcpdump or ethereal capture of the client/server net traffic during one of these file ops.
Created attachment 21002 [details] strace of 'wc .Xauthority'
I was able to get the strace while the error was occurring. However, by the time I got to tcpdump, I was able to read the file OK. So, when the problem resurfaces, I'll get the tcpdump info.
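Because the failure is intermittent and can clear before a capture is started, a small watcher can help catch the window. The following is only a sketch, not something used in this report: the names `probe` and `watch` and the polling interval are assumptions. It flags the symptom described above, where the name shows up in the directory listing but opening it fails with "No such file or directory".

```python
import errno
import os
import time


def probe(path):
    """Classify path as 'ok', 'listed_but_unreadable', 'missing' or 'other_error'."""
    directory, name = os.path.split(os.path.abspath(path))
    try:
        listed = name in os.listdir(directory)
    except OSError:
        listed = False
    try:
        # Reading one byte forces a real open and read, like cat or wc would do.
        with open(path, "rb") as handle:
            handle.read(1)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # The directory entry exists but the file cannot be opened.
            return "listed_but_unreadable" if listed else "missing"
        return "other_error"
    return "ok"


def watch(path, interval=5.0):
    """Poll until the listed-but-unreadable state appears, then report the time."""
    while True:
        if probe(path) == "listed_but_unreadable":
            print(time.strftime("%H:%M:%S"), path, "is listed but unreadable", flush=True)
            return
        time.sleep(interval)
```

Calling `watch(os.path.expanduser("~/.Xauthority"))` in a spare terminal on the client would print a timestamp when the state appears, which is the moment to start tcpdump. Note that a dangling symlink is classified the same way, so the result is only meaningful for a regular file.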
If the problem is intermittent, the NIC driver is also a suspect. Which network cards do the troublesome machines have?
Intel Ethernet Pro 100. ifconfig says we are clean:

# ifconfig eth0
eth0  Link encap:Ethernet HWaddr 00:A0:C9:EB:C6:F9
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:8206805 errors:0 dropped:0 overruns:0 frame:0
      TX packets:11996612 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:100
      Interrupt:14 Base address:0xdce0

Here is some more info on the NIC:

kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
kernel: eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw.com.sg> and others
kernel: PCI: Assigned IRQ 14 for device 00:08.0
kernel: eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:A0:C9:EB:C6:F9, I/O at 0xdce0, IRQ 14.
kernel: Receiver lock-up bug exists -- enabling work-around.
kernel: Board assembly 704666-002, Physical connectors present: RJ45
kernel: Primary interface chip i82555 PHY #1.
kernel: General self-test: passed.
kernel: Serial sub-system self-test: passed.
kernel: Internal registers self-test: passed.
kernel: ROM checksum self-test: passed (0x49caa8d6).
kernel: Receiver lock-up workaround activated.
The error came back this morning so I got the tcpdump info for various commands.

wc .Xauthority
11:11:20.866703 eth0 > RHL_7.1.756060177 > DG_4.11.nfs: 180 nop (DF)
11:11:20.866703 eth0 < DG_4.11.nfs > RHL_7.1.756060177: reply ok 240 nop
11:11:20.866703 eth0 > RHL_7.1.772837393 > DG_4.11.nfs: 164 getattr [|nfs] (DF)
11:11:20.866703 eth0 < DG_4.11.nfs > RHL_7.1.772837393: reply ok 28 getattr ERROR: No such file or directory
11:11:20.866703 eth0 > RHL_7.1.789614609 > DG_4.11.nfs: 176 read [|nfs] (DF)
11:11:20.876703 eth0 < DG_4.11.nfs > RHL_7.1.789614609: reply ok 32 read ERROR: No such file or directory

file .Xauthority
11:11:27.516907 eth0 > RHL_7.1.1544589329 > DG_4.11.nfs: 176 read [|nfs] (DF)
11:11:27.516907 eth0 < DG_4.11.nfs > RHL_7.1.1544589329: reply ok 32 read ERROR: No such file or directory

cat .Xauthority
11:11:32.387056 eth0 > RHL_7.1.2853212177 > DG_4.11.nfs: 176 read [|nfs] (DF)
11:11:32.387056 eth0 < DG_4.11.nfs > RHL_7.1.2853212177: reply ok 32 read ERROR: No such file or directory

touch .Xauthority
11:11:59.917900 eth0 > RHL_7.1.3843067921 > DG_4.11.nfs: 192 setattr [|nfs] (DF)
11:11:59.917900 eth0 < DG_4.11.nfs > RHL_7.1.3843067921: reply ok 36 setattr ERROR: No such file or directory
A couple more things to try. Are you using IPchains or netfilter? If so, make sure your firewall software on the Linux side is configured to let fragmented packets through. Configure the exports file on the server to use NFS version 2.
I'm not using IPchains or netfilter. I changed /etc/fstab on the 7.1 box to mount as NFS version 2:

nfs rsize=8192,wsize=8192,nfsvers=2,rw,hard,intr,bg

After remounting, I verified the version by looking in /proc/mounts. I still get the error.
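The /proc/mounts check mentioned above can be scripted. This is a minimal sketch, assuming the usual whitespace-separated layout of that file (device, mount point, type, options, two numeric fields); the function name `nfs_mounts` is hypothetical. How the NFS version appears among the options depends on the kernel, so the sketch just returns every option for inspection instead of looking for a specific key.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, options) pairs for NFS entries in /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value  # flag-style options such as "hard" map to ""
        entries.append((fields[1], options))
    return entries
```

Passing the contents of /proc/mounts to `nfs_mounts` and printing the result shows, for each NFS mount, the options the kernel actually applied, which may differ from what /etc/fstab requested.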
Well, this really smells to me like a problem with fragmentation. Can you get another tcpdump of this problem? Just a single failed operation will do, but I'd like to see the entire contents of the packets. Can you run tcpdump with -vvv -s 192? Thanks. I would also suggest you contact the nfs-utils mailing list and describe your problem. The kernel nfsd authors hang out there. nfs.net
eth0 > RHL_7.1.3482569487 > DG_4.11.nfs: 176 lookup fh 64,181/2 [|nfs] (DF) (ttl 64, id 0)
eth0 < DG_4.11.nfs > RHL_7.1.3482569487: reply ok 128 lookup fh 64,181/2 REG 100600 ids 24794/24791 sz 740 nlink 1 rdev 0 fsid 40b5 nodeid 23d918 a/m/ctime 994167511.057415 994167459.757462 994167459.869957 (ttl 255, id 50793)
eth0 > RHL_7.1.3499346703 > DG_4.11.nfs: 160 getattr fh 64,181/2 (DF) (ttl 64, id 0)
eth0 < DG_4.11.nfs > RHL_7.1.3499346703: reply ok 28 getattr ERROR: No such file or directory (ttl 255, id 50794)
eth0 > RHL_7.1.3516123919 > DG_4.11.nfs: 172 read fh 64,181/2 [|nfs] (DF) (ttl 64, id 0)
eth0 < DG_4.11.nfs > RHL_7.1.3516123919: reply ok 28 read ERROR: No such file or directory (ttl 255, id 50795)
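For longer captures, the failing calls can be picked out mechanically. The sketch below assumes one packet per line in the text format shown in this report, where the number after the client host name is the RPC transaction id; the function name `failed_calls` and the regular expressions are illustrative, not part of tcpdump.

```python
import re

# Request: "... > CLIENT.XID > SERVER.nfs: LEN OP ..."
REQUEST = re.compile(r"> (?P<client>\S+)\.(?P<xid>\d+) > (?P<server>\S+)\.nfs: \d+ (?P<op>\w+)")
# Reply: "... < SERVER.nfs > CLIENT.XID: reply STAT LEN OP ..."
REPLY = re.compile(
    r"< (?P<server>\S+)\.nfs > (?P<client>\S+)\.(?P<xid>\d+): reply (?P<stat>\w+) \d+ (?P<op>\w+)"
)
# Error text, minus any trailing "(ttl ..., id ...)" added by -vvv.
ERROR = re.compile(r"ERROR: (?P<msg>.+?)(?: \(ttl .*)?$")


def failed_calls(lines):
    """Return (xid, operation, error message) for each reply that carries an NFS error."""
    pending = {}
    failures = []
    for line in lines:
        line = line.strip()
        reply = REPLY.search(line)
        if reply:
            error = ERROR.search(line)
            if error:
                # Prefer the operation name from the matching request, if seen.
                op = pending.get(reply["xid"], reply["op"])
                failures.append((reply["xid"], op, error["msg"]))
            continue
        request = REQUEST.search(line)
        if request:
            pending[request["xid"]] = request["op"]
    return failures
```

Run against the capture above, this reports the getattr and read calls as failed while the preceding lookup on the same file handle succeeded, which is the pattern visible in the trace.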
Greg, We may have a solution to this problem. The kernel's nfs client side code has a patch as of 2.4.6pre3 which has been reported to fix this same problem with Linux clients and Irix servers. Can you pull down the 2.4.6 kernel and test that on a client?
I'm sorry to report that the new kernel did not fix the problem. I'm now running 2.4.6pre9. I haven't tried at NFS version 2 yet. I guess I'll do that but I really would like to run version 3. Greg
What is the status of this problem now? Does it exist with RHL 9?
Stale - closing with regrets.