Description of problem:
NFS filesystems imported from Tru64 servers do not recognize directories as such; accessing them returns 'Not a directory'. Filesystems imported from Solaris servers show no problem.

Version-Release number of selected component (if applicable):
2.6.18-1.2200.fc5 and 2.6.18-1.2200.fc5smp

How reproducible:
Always

Steps to Reproduce:
1. mount tru64machine:/whatever /mnt
2. ls /mnt/subdir
3. cd /mnt/subdir

Actual results:
ls: /mnt/subdir: Not a directory
cd: /mnt/subdir: Not a directory

Expected results:
Behave like 2.6.17-1.2187_FC5 (or any previous kernel), and have NFS working to Tru64.

Additional info:
Trying to mount with 'nfsvers=2' results in other errors (Input/output error). Also, this looks very unhealthy:

> mount -o nfsvers=2 tru64machine:/whatever /mnt
(seems to work; now, without unmounting!)
> mount -o nfsvers=3 tru64machine:/whatever /mnt
> df
Filesystem        1K-blocks     Used Available Use% Mounted on
/dev/sda2         102861656 69463672  28172844  72% /
none                 517476        0    517476   0% /dev/shm
tmpfs                517476        0    517476   0% /tmp
tmpfs                517476       48    517428   1% /var/tmp
tru64:/whatever   177743168 86595008  90918656  49% /mnt
tru64:/whatever   177743168 86595008  90918656  49% /mnt

Twice mounted, without error! (And you can go far beyond two.) Previous versions complained loudly about double mount attempts.
I did some experiments with 2.6.18-1.2798.fc6.
*) The bug is still there.
*) Mounting with '-o nfsvers=2' seems to solve it here.
*) Multiple mounting is still possible, as long as you change version numbers; you can repeat these lines indefinitely:
   mount -o nfsvers=3 tru64machine:/whatever /mnt
   mount -o nfsvers=2 tru64machine:/whatever /mnt
   You'll need the same number of unmounts to get rid of this construct...
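To check how many NFS mounts are stacked on one mount point (and therefore how many umounts are needed), one option is to count the matching entries in /proc/mounts. The sketch below uses a hypothetical helper, `count_mounts`, and assumes the standard whitespace-separated /proc/mounts layout (device, mount point, type, options, dump, pass); it does not decode the octal escapes such as \040 that the kernel uses for spaces in paths.

```python
def count_mounts(mounts_text: str, mount_point: str) -> int:
    """Count the entries in /proc/mounts-style text mounted on mount_point.

    Hypothetical helper: each line is assumed to be
    "device mount_point fstype options dump pass".
    Escaped characters in paths (e.g. \\040 for a space) are not decoded.
    """
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second field is the mount point.
        if len(fields) >= 2 and fields[1] == mount_point:
            count += 1
    return count
```

For example, `count_mounts(open("/proc/mounts").read(), "/mnt")` returns the number of entries currently stacked on /mnt, which is also the number of `umount /mnt` calls needed to clear them.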
I tried 2.6.18-1.2224.fc5 (from testing). Not fixed. Forcing the mount with '-o nfsvers=2' seems to alleviate the problem, but for a few directories you get an 'Input/output error' instead. All in all: unusable. Forcing NFS over TCP or UDP makes no difference.
I wonder if https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212471 is the same problem...
As expected, 2.6.18-1.2239.fc5 does not make any difference. As for the resemblance to 212471, I'm not sure, because mounting the fs works without problems, both manually and with autofs, on Tru64 machines with a single active network interface and with multiple ones. It might be related anyhow, though... Strangely enough, on a single FC6 test machine (2.6.18-1.2798.fc6), I am able to mount a Tru64 fs without problems, but _with_ the same 'Not a directory' problems as in FC5.
I see exactly the same problem on an x86_64 dual-CPU system running Fedora 5, fully updated (kernel 2.6.18-1.2239.fc5). The issue appears with 6 different NFS mounts coming from two different Tru64 servers, both with autofs and "normal" NFS. I spent days before realizing today that it is just the "new" kernel. Obviously it is a serious issue in mixed-system centers like ours. I confirm that forcing nfsvers=2 helps but still leaves issues (and big worries about filesystem corruption; any hint about such a possibility?). If I can help in any way in pinpointing the problem, let me know.
Alfredo Ferrari
We have exactly the same problem: a Tru64 NFS server and a dual-Opteron workstation running FC5. The latest kernel, 2.6.18-1.2239, leaves NFS mounts almost totally unusable, as it reports that almost all directories under the mount point are not directories, even though "ls -l" still shows them as directories. Mounts from other Linux, Solaris and IRIX servers are fine. After some digging through tcpdump traces, the only difference I can see between the Tru64 server and the others is that the Tru64 server uses a very large (or negative) fsid and a very small fileid. For example, Tru64 returns fileids below about 32, whereas the other systems use fileids in the thousands. If you'd like me to run some tests, or take some more packet dumps, just let me know.
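In NFSv3 (RFC 1813) both fsid and fileid are unsigned 64-bit values, so whether a large fsid looks "negative" depends only on whether the trace decoder prints it as signed. A minimal illustration of that two's-complement reinterpretation, using a hypothetical helper `as_signed`, is sketched below; it does not by itself show that the fsid is the cause of this bug.

```python
def as_signed(value: int, bits: int = 64) -> int:
    """Reinterpret the low `bits` bits of an unsigned integer as two's-complement signed.

    Hypothetical helper for reading decoded NFS attributes: an unsigned
    64-bit fsid with its top bit set prints as a negative number when
    a tool treats it as signed.
    """
    mask = (1 << bits) - 1
    value &= mask                      # keep only the low `bits` bits
    if value >> (bits - 1):            # top bit set -> negative when read as signed
        return value - (1 << bits)
    return value
```

For example, an fsid of 0xFFFFFFFFFFFFFFFF is the largest possible unsigned value but reads as -1 when treated as signed, while a small fileid such as 31 is unchanged either way.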
Yes... could you please post some bzip2-compressed binary tethereal network traces of this problem? Thanks in advance...
An example of what I'm looking for is:
tethereal -w /tmp/bz211293.pcap host <server>
bzip2 /tmp/bz211293.pcap
Created attachment 142769: requested tethereal output
Voilà, one test. The server is alf3.cern.ch (192.91.242.170), the client is pceet030.cern.ch (192.91.242.30), and the mount is via autofs (it doesn't really matter), mounting the remote (alf3) /user1 partition on /misc/alf3_user1. I ran a few ls /misc/alf3_user1/home and ls /misc/alf3_user1/home/xxx, where xxx are various directories (the mount has root privileges and I am root while issuing those commands). These ls commands mostly failed (not all of them, and not always for the same directory; e.g. ls /misc/alf3_user1/home/alfredo succeeded twice and failed many more times) with:
[root@pceet030 log]# ls /misc/alf3_user1/home/alfredo/
ls: /misc/alf3_user1/home/alfredo/: Not a directory
Created attachment 142823: Packet capture while experiencing "not a directory" bug. The packets were captured while a logged-in user attempted to cd to several sub-directories of his home directory, receiving the "not a directory" error each time.
Unfortunately, neither of the traces showed anything out of the ordinary... In the bz041206.pcap trace, I see the lookup of 'alfredo', and the file type being returned is a directory... In both traces I also see a number of READLINKs, meaning symlinks are somehow involved... which should not matter... Overall, both traces look like normal NFS traffic. The only oddity, I guess, is the lack of any non-zero NFS status; I was hoping to see the server return some type of error which might give a clue as to what is happening...

Would it be possible to post a bzip2-compressed strace of the ls or stat command failing? Something like 'strace -o /tmp/strace.txt ls /mnt/foo'. What I'm looking for is which system call is failing, which will (hopefully) give me a starting point...

Also, what is the status of SELinux? On or off? If it's on, please try using the 'setenforce 0' command to turn it off, to see if that makes a difference. And is it true that moving one or two of the mounts out of autofs land and into /etc/fstab makes no difference?
Created attachment 142939: Strace, NFSV3, 'Not a directory'
Created attachment 142940: Strace, NFSV2, 'Input/output error'
This is with 2.6.18-1.2849.fc6; the mount is manual, not via autofs (but this makes no difference).
> getenforce
Disabled
> # So SELinux is disabled completely
> mount ibrahim:/raid3/users/deknuydt /mnt
> strace -o /tmp/strace.txt ls /mnt/tex
ls: /mnt/tex: Not a directory
> # This is the first attachment
> umount /mnt
> mount -o vers=2 ibrahim:/raid3/users/deknuydt /mnt
> strace -o /tmp/strace1.txt ls /mnt/mail
ls: reading directory /mnt/mail: Input/output error
> # This is the second attachment
For the record, here are the system calls that are failing:
open("/mnt/tex", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOTDIR (Not a directory)
and
getdents(3, 0x61b9c8, 4096) = -1 EIO (Input/output error)
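When an strace log is long, a quick filter for system calls that returned -1 makes lines like the two above easy to find. The sketch below uses a hypothetical helper, `failed_syscalls`, and assumes strace's default one-call-per-line output without -f PID prefixes or -t timestamps; split "unfinished ... resumed" pairs are not handled.

```python
import re
from typing import List, Tuple

# Matches default strace lines of the form:  name(args) = -1 ENAME (Message)
_FAILED = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)$"
)


def failed_syscalls(strace_text: str) -> List[Tuple[str, str, str]]:
    """Return (syscall, errno name, message) for every call that returned -1."""
    results = []
    for line in strace_text.splitlines():
        match = _FAILED.match(line.strip())
        if match:
            results.append((match.group("call"), match.group("errno"), match.group("msg")))
    return results
```

Running it over the attached /tmp/strace.txt would list the failing open() with ENOTDIR, while successful calls such as `close(3) = 0` are skipped.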
When the 'ls /mnt/tex' returns ENOTDIR, could you post the output of 'stat /mnt/tex'.
Also, are all of these clients 32bit architectures? Is anybody seeing this problem with a 64bit client?
Mine is a 64-bit client.
> mount ibrahim:/raid3/users/deknuydt /mnt
> ls /mnt/text
ls: /mnt/text: Not a directory
> stat /mnt/text
  File: `/mnt/text'
  Size: 16384 Blocks: 32 IO Block: 4096 directory
Device: 12h/18d Inode: 985772 Links: 7
Access: (0755/drwxr-xr-x) Uid: ( 4653/deknuydt) Gid: ( 46/ visics)
Access: 2006-12-06 11:13:35.000999000 +0100
Modify: 2006-11-30 13:33:30.493267000 +0100
Change: 2006-11-30 13:33:30.493267000 +0100
> mount -o vers=2 ibrahim:/raid3/users/deknuydt /mnt
> ls /mnt/mail
ls: reading directory /mnt/mail: Input/output error
> stat /mnt/mail
  File: `/mnt/mail'
  Size: 8192 Blocks: 16 IO Block: 4096 directory
Device: 12h/18d Inode: 981630 Links: 3
Access: (0700/drwx------) Uid: ( 4653/deknuydt) Gid: ( 46/ visics)
Access: 2006-12-06 01:15:18.000999000 +0100
Modify: 2006-07-20 17:17:55.006699000 +0200
Change: 2006-07-20 17:17:55.006699000 +0200

Strange... If I chmod o+rx this mail directory, the IO error suddenly disappears. So a 'Permission denied' turns into an 'IO error', at least for NFSv2. For the record: these problems occur with both the i686 and x86_64 kernels, under both FC5 and FC6; in short, all kernels more recent than 2.6.17-1.2187_FC5 have this.
Ours is 64-bit also. Here is the requested output from ls and stat when experiencing the bug:
> ls CFD
ls: CFD: Not a directory
> stat CFD
  File: `CFD'
  Size: 8192 Blocks: 16 IO Block: 4096 directory
Device: 1ah/26d Inode: 724277 Links: 20
Access: (0755/drwxr-xr-x) Uid: ( 5154/ amos) Gid: ( 110/ mamod)
Access: 2006-12-04 03:39:19.387121000 +0000
Modify: 2006-11-03 10:32:02.870369000 +0000
Change: 2006-11-03 10:32:02.870369000 +0000
In case it is significant, here is an excerpt from /proc/cpuinfo:
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 270
stepping : 2
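These reports show the inconsistency directly: stat(2) says "directory", while the open() that ls makes with O_DIRECTORY (per the strace output earlier in this bug) fails with ENOTDIR. A small Linux-only sketch, using a hypothetical helper `check_directory`, runs both checks on one path so the mismatch can be tested without ls; a result of (True, 'ENOTDIR') on a Tru64 mount would be the symptom reported here.

```python
import errno
import os
import stat
from typing import Optional, Tuple


def check_directory(path: str) -> Tuple[bool, Optional[str]]:
    """Compare what stat() and open(O_DIRECTORY) say about `path`.

    Returns (stat_says_dir, open_error): stat_says_dir is what `stat`
    reports; open_error is the errno name if an open() with the same
    flags ls used fails, or None if it succeeds. Linux-only (O_DIRECTORY).
    """
    stat_says_dir = stat.S_ISDIR(os.stat(path).st_mode)
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_DIRECTORY)
    except OSError as exc:
        return stat_says_dir, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return stat_says_dir, None
```

On a healthy mount a directory gives (True, None) and a regular file gives (False, 'ENOTDIR'); only the mixed result points at the client/server disagreement discussed in this bug.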
Cool... thanks for all the info. What version of Tru64 is everyone using? I'm trying to dig one up so I can reproduce this locally...
>ssh ibrahim uname -a
OSF1 ibrahim V5.1 2650 alpha alpha Tru64
>ssh ampere uname -a
OSF1 ampere V5.1 732 alpha alpha Tru64
Reasonably patched :)
> uname -a
OSF1 alphasc0 V5.1 2650 alpha
In my case:
alf3> uname -a
OSF1 alf3.cern.ch V4.0 1229 alpha
srveet01> uname -a
OSF1 srveet01.cern.ch V4.0 1229 alpha
*** Bug 218393 has been marked as a duplicate of this bug. ***
Well, I was finally able to track down a Tru64 box running 'OSF1 <hostname> T5.1 2359 alpha' and (unfortunately) I was *able* to mount the machine without any problem (with FC5, FC6, and RHEL5 B2 clients). But I did notice that all the Tru64 OS versions that were posted start with a 'V' (i.e. V4.0 or V5.1), while the version on my box starts with a 'T'. Does that mean anything to anybody?
The T5.1 means it was a pre-release V5.1; however, I think this box is running a late T5.1, so I would not expect there to be a significant difference.
2.6.18-1.2868.fc6 is out, and it still has this problem too. This is taking too long... Should we relabel this to FC6 now? As for not being able to reproduce this (see comment #27), did you export a UFS or an AdvFS file system on the Tru64 box?
This is really taking too long... By the way, in my case it is AdvFS file systems that show the problem (and I have no other file system type to test with on those machines). The problem is clearly common to FC5 and FC6; I would say to all 2.6.18 kernels released for both, at least in my experience. Again, if I can help, let me know.
All of the exports on our Tru64 server are AdvFS also. Since this server is not under my control it is not possible for me to test UFS. I second Alfredo's comments; this seems to be a kernel bug that was introduced between 2.6.16 and 2.6.18. I am happy to test kernel patches, as the affected machine is not a production server.
I can't agree with you more about taking too long... believe me, if the Tru64 system I just acquired showed the problem, I would be all over this, since it is almost guaranteed that the upcoming RHEL5 release will have the same problem... My next step will be to start going through the diffs between 2.6.16 and 2.6.18 and start throwing out some test kernels...
The diffs between 2.6.16 and 2.6.18 are extremely large... over 1800 lines. So in http://people.redhat.com/steved/bz211293/ is the kernel-smp-2.6.17-1.3002 kernel, which is about halfway between 2.6.16 and 2.6.18... Please give it a try to see if the problem exists. (Note: if you need a different type of kernel, just let me know.) Again, I apologize for taking so long on this one...
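The approach here is a binary search over kernel builds: each test kernel halves the remaining range, so n candidate builds need only about log2(n) test boots. A minimal sketch, assuming the builds are listed oldest to newest, the first is known good, the last is known bad, and every build after the first bad one is also bad; the `is_bad` callback is a hypothetical stand-in for "boot this kernel and try ls on the Tru64 mount".

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> int:
    """Return the index of the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and that badness,
    once introduced, persists in all later builds.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid          # regression is at mid or earlier
        else:
            good = mid         # regression is after mid
    return bad
```

Only the builds at the probed midpoints ever need to be tested, which is why picking a kernel "about halfway between" the known-good and known-bad versions is the efficient first step.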
Don't know if this is good or bad news for you, but kernel-smp-2.6.17-1.3002 suffers from the 'Not a directory' problem too. So I guess you'll have to continue the binary search...
Ok... please try http://people.redhat.com/steved/bz211293/kernel-smp-2.6.16-1.3001_FC5.i686.rpm; it's the last kernel before we moved to 2.6.17...
Tried the kernel. The problem went away; NFS mounts to OSF boxes worked as expected.
Cool... at least we are making some progress. Now I'm off to see what the diffs are between these kernels, but if possible, could you try this kernel? http://people.redhat.com/steved/bz211293/kernel-smp-2.6.17-1.2136_FC5.i686.rpm It's the first 2.6.17 kernel released... Thanks in advance.
2.6.17-1.2136_FC5 is clean. So is kernel-2.6.17-1.2187_FC5 btw. I just have the gut feeling that FS-Cache is to blame ... Where did that get added?
This may be related, as it doesn't happen in FC4: I cannot share NFS mounts via Samba (NFS server: FC6; Samba server/NFS client: FC6). When the NFS server runs an older kernel version (RHEL4.3 or FC4), the problem does not show itself. Windows clients attempting to write to files fail with a "Delayed Write Fail" and, worse still, the file that was written to is clobbered.
FYI: 2.6.19-1.2895.fc6 still has it.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
Fixed in nfs-utils-1.0.10-10.fc6 by adding the -o nordirplus mount option; the kernel support for it will be in the next kernel update.
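For scripts that mount Tru64 exports, the new option only has to be added to the option list. The sketch below uses a hypothetical helper, `nfs_mount_argv`, to build a mount(8) command line that includes nordirplus (which tells the client not to use NFSv3 READDIRPLUS requests); it assumes both nfs-utils and the kernel support the option, as described in the comment above.

```python
from typing import List, Sequence


def nfs_mount_argv(export: str, mount_point: str, options: Sequence[str]) -> List[str]:
    """Build a mount(8) argument list for an NFS export with nordirplus added once.

    Hypothetical helper: `options` are plain mount options such as "nfsvers=3".
    """
    opts = list(options)
    if "nordirplus" not in opts:
        opts.append("nordirplus")  # do not use NFSv3 READDIRPLUS for this mount
    return ["mount", "-t", "nfs", "-o", ",".join(opts), export, mount_point]
```

The resulting list can be passed to `subprocess.run(..., check=True)` as root, e.g. `nfs_mount_argv("tru64machine:/whatever", "/mnt", ["nfsvers=3"])`.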