Bug 188012

Summary: cifs mount on NT4 WS share can make some files unavailable
Product: [Fedora] Fedora
Reporter: Wojciech Pilorz <wpilorz>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 6
CC: cebbert, davej, smfrench, steved, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Fixed In Version: 2.6.22.5-49.fc6
Doc Type: Bug Fix
Last Closed: 2007-10-16 12:33:19 UTC

Description Wojciech Pilorz 2006-04-05 12:21:30 UTC
Description of problem:
When a share exported by an NT4 workstation (on either a FAT32 or an NTFS
filesystem) is mounted with cifs using certain parameters, running cp -vfp over
an existing file can make that file unavailable, even locally on the NT4
workstation; an NT4 reboot is needed to make the file available again.


Version-Release number of selected component (if applicable):
kernel version 2.6.16-1.2080_FC5 (bhcompile.redhat.com), cifs
module version 1.40, vermagic 2.6.16-1.2080_FC5 686 REGPARM 4KSTACKS gcc-4.1;
srcversion 520D4E8E8072D28A98CEABB


How reproducible:
Have an NT4.0 workstation with a share exported (e.g. C$); mount it on Fedora
Core 5 as shown below; create a file with cp -vfp and then try to overwrite it
with cp -vfp.

In the mount command below, $dom is the domain name; $auser and $agroup (passed
as uid= and gid=) are the names of a local user and group on the FC5 PC;
$fc5pcname is the name of my FC5 PC; and $x is the name of the NT4 computer as
well as the name of a subdirectory of the current directory (I also have the IP
address of that NT4 PC defined under the same name in my /etc/hosts).
The file /etc/$zz.cifs contains the username and password, as described in
mount.cifs(8).
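The credentials file uses the one-key=value-per-line format described in mount.cifs(8). A minimal sketch for creating one with restrictive permissions; the helper name write_cifs_credentials and its argument order are hypothetical, not the reporter's actual setup:

```shell
# Hypothetical helper: write a mount.cifs(8) credentials file.
# $1 = file, $2 = username, $3 = password, $4 = optional domain
write_cifs_credentials() {
    (
        umask 077   # a newly created file gets mode 0600 (password not world-readable)
        {
            printf 'username=%s\n' "$2"
            printf 'password=%s\n' "$3"
            # The domain line is optional; the mount below also passes domain=.
            if [ -n "$4" ]; then
                printf 'domain=%s\n' "$4"
            fi
        } > "$1"
    )
}
```

For example, as root: `write_cifs_credentials /etc/$zz.cifs "$user" "$pass" "$dom"`. The umask only affects a newly created file; an existing file keeps its current mode.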

Steps to Reproduce:
1. mount:

mount -t cifs //$x/$share $x -o domain=$dom,uid=$auser,gid=$agroup,dir_mode=0750,file_mode=0640,netbiosname=$fc5pcname,credentials=/etc/$zz.cifs

2. cd to the mounted directory from a session logged in as the user given via
the uid= parameter of the mount command;
3. create a test directory:

mkdir tmp1; cd tmp1

4. create a test file:

echo 123 > f123

5. copy it twice:

cp -vfp f123 g123; cp -vfp f123 g123
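Steps 3 to 5 can be scripted so the test is easy to repeat. A minimal sketch, assuming it is run on the cifs mount as the uid= user; the function name run_cp_overwrite_test is hypothetical:

```shell
# Hypothetical helper: repeat steps 3-5 inside directory $1 (default: .).
# Returns 0 if the second "cp -vfp" overwrote g123, non-zero otherwise.
run_cp_overwrite_test() {
    (
        cd "${1:-.}" || exit 1
        mkdir tmp1 && cd tmp1 || exit 1   # fails if tmp1 already exists
        echo 123 > f123 || exit 1
        cp -vfp f123 g123 || exit 1       # first copy creates g123
        cp -vfp f123 g123 || exit 1       # second copy overwrites it (fails in this bug)
        # g123 must still be reachable by name with the expected contents.
        [ "$(cat g123)" = 123 ]
    )
}
```

To capture the system calls quoted under Actual results, the second copy can be run on its own as `strace -e trace=open,openat,unlink cp -vfp f123 g123`; newer C libraries call openat where this trace shows open.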

Actual results:
The first cp ran fine; the second gave the following message:
cp: cannot create regular file `g123': No such file or directory

If the sequence is run under strace, the second cp shows:

open("g123", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 EOPNOTSUPP (Operation not supported)
unlink("g123")                      = 0
open("g123", O_WRONLY|O_CREAT|O_LARGEFILE, 0100664) = -1 ENOENT (No such file or directory)

After that (regardless of whether strace was used or not, of course) I cannot do
anything with the file g123, even locally on the NT4 workstation; unmounting and
remounting does not help, but rebooting the NT4 workstation does.
The same sequence works against an XP workstation; an NT4 workstation with smbfs
(kernel 2.4, RHL9-based Aurox 9.2) also works.


Some examples of the behaviour:

$ mkdir tst2
$ cd tst2
$ LANG=C
$ echo 123 > f123
$ ls -l
total 1
-rw-rw-r-- 1 wp wp 4 Apr  5 13:05 f123
$ cat f123
123
$ cp -vfp f123 g123
`f123' -> `g123'
$ ls -l
total 1
-rw-rw-r-- 1 wp wp 4 Apr  5 13:05 f123
-rw-rw-r-- 1 wp wp 4 Apr  5 13:05 g123
$ cp -vfp f123 g123
`f123' -> `g123'
cp: cannot create regular file `g123': No such file or directory
$ ls -l
total 1
-rw-rw-r-- 1 wp wp 4 Apr  5 13:05 f123
-rw-r----- 1 wp wp 4 Apr  5 13:05 g123
$ ls -l g123
ls: g123: No such file or directory
$ cat g123
cat: g123: No such file or directory
$ ls -l g123
-rw-r----- 1 wp wp 4 Apr  5 13:05 g123
$ ls -l g123
ls: g123: No such file or directory
$ ls -l g123
-rw-r----- 1 wp wp 4 Apr  5 13:05 g123
$ ls -l
total 1
-rw-rw-r-- 1 wp wp 4 Apr  5 13:05 f123
-rw-r----- 1 wp wp 4 Apr  5 13:05 g123


The surprising thing here is that ls -l works, and ls -l g123 works if the name
is completed by bash, but the same ls -l g123 does NOT work if the name is typed
(or if the line with the completed g123 filename is recalled from bash history).
(There are no strange characters in the file names, as ls | cat -A shows.)
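Because the lookup of g123 succeeds or fails from one attempt to the next, a loop that repeats the same lookup can quantify the pattern more easily than retyping ls. A minimal sketch; the function name probe_lookup and the default count of 20 are arbitrary choices, not from the report:

```shell
# Hypothetical helper: stat FILE N times and report how many lookups failed.
# Usage: probe_lookup FILE [N]
probe_lookup() {
    file=$1 n=${2:-20} i=0 fail=0
    while [ "$i" -lt "$n" ]; do
        # Each iteration asks the kernel to look the name up again.
        stat -- "$file" >/dev/null 2>&1 || fail=$((fail + 1))
        i=$((i + 1))
    done
    echo "$fail/$n lookups of $file failed"
}
```

For example, `probe_lookup g123` inside tmp1 on the cifs mount; on a healthy filesystem it reports 0 failures.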


Expected results:
cp -vfp should be able to overwrite the existing file, and NT4 should keep
working (as it did when doing the same operation with smbfs from a 2.4 kernel).


Additional info:

Same results with the original FC5 kernel, 2.6.15-1.2054_FC5.

Comment 1 Dave Jones 2006-10-16 21:47:27 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 2 Wojciech Pilorz 2006-11-02 09:27:03 UTC
I have just tried the test with updated FC5 (kernel 2.6.18-1.2200.fc5, CIFS
Version 1.45), and it is **much** better.
In fact, the test now runs as expected, but there are some messages in
/var/log/messages and on the console:

When the second "cp -vfp f123 g123" is run, the message
kernel:  CIFS VFS: No task to wake, unknown frame rcvd! NumMids 1
is added to the system log (/var/log/messages) and displayed on the active console.

If I enable more debugging with
echo 7 > /proc/fs/cifs/cifsFYI
before the test, then the following message appears when the first cp is run:
kernel: Status code returned 0xc0000034 NT_STATUS_OBJECT_NAME_NOT_FOUND
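To check whether these messages line up with each cp run, they can be counted in the system log before and after each step. A sketch, assuming the log is a plain-text file such as /var/log/messages; the function name count_cifs_messages is hypothetical, and the two patterns are the messages quoted above:

```shell
# Hypothetical helper: count the two CIFS messages quoted above in a log file.
# Usage: count_cifs_messages [LOGFILE]   (default: /var/log/messages)
count_cifs_messages() {
    log=${1:-/var/log/messages}
    # grep -c prints the number of matching lines; -F matches a fixed string.
    # "|| true" keeps a zero-match result (exit status 1) from aborting callers.
    nowake=$(grep -cF 'No task to wake, unknown frame rcvd' "$log" || true)
    notfound=$(grep -cF 'NT_STATUS_OBJECT_NAME_NOT_FOUND' "$log" || true)
    echo "no-task-to-wake=$nowake name-not-found=$notfound"
}
```

Reading /var/log/messages and writing to /proc/fs/cifs/cifsFYI normally require root.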

So, the test seems to run fine, but the kernel messages worry me a little.
Is more info needed?

Wojtek

Comment 3 Dave Jones 2006-11-20 22:22:03 UTC
Good to hear it's working again.

I've no idea whether those messages are a bad thing; I've added the upstream CIFS
maintainer to the CC. Maybe he has some ideas what's going on.

Steve?

Comment 4 Steve French 2006-11-25 22:31:43 UTC
This may be related to a situation we have seen with sharing violations on copy,
followed by setting the delete-on-close flag on the target file, which is then
renamed. The server may be leaving the file in the directory listing even with
the delete-on-close flag set.

But in any case, to debug it, it would be helpful to see what the server's view
of the directory (from the NT4 command prompt) looks like at each step, and also
to see an Ethereal or tcpdump trace of the scenario.
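For the requested trace, restricting the capture to SMB traffic with the NT4 machine keeps the file small. A sketch, assuming the NT4 host is reached via the NetBIOS session service on TCP port 139 (port 445, SMB directly over TCP, is included in case the client tries it first); the function name cifs_capture_filter is hypothetical:

```shell
# Hypothetical helper: print a tcpdump filter for SMB traffic to/from HOST.
# Port 139 is the NetBIOS session service; 445 is SMB directly over TCP.
cifs_capture_filter() {
    printf 'host %s and (tcp port 139 or tcp port 445)\n' "$1"
}

# Example (requires root; -s 0 keeps whole packets so Ethereal can decode them):
#   tcpdump -i any -s 0 -w cifs-nt4.pcap "$(cifs_capture_filter "$x")"
```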

Comment 5 Jeff Layton 2007-09-20 19:33:48 UTC
If this is still reproducible, could you collect the info requested by Steve F
in comment #4? If it's no longer reproducible, then please let me know and I'll
plan to close this case.


Comment 6 Wojciech Pilorz 2007-09-21 22:13:53 UTC
The problem that using a file exported from an NT4 WS share over a cifs mount
can make it inaccessible until NT4 is rebooted is still present with the FC6
kernel. The last time I tried was a few weeks ago; when I edit a file with vim,
every now and then the file becomes inaccessible.
The problem has never occurred when I was editing files exported from a W2K
server share.
Also, it has never occurred when using an smbfs mount (CentOS 4.4).
I will try again next week with the most recent FC6 kernel and
get back with the results.

Comment 7 Steve French 2007-09-22 02:13:23 UTC
It will be important to see whether we are hitting a "delete-on-close" related
bug, either in the NT4 server or in how the cifs client sets this flag when
deleting a file that is open. cifs has a complex procedure to eventually delete
open files, since Windows forbids them to be deleted immediately as POSIX
requires: the cifs client renames the file and then marks it delete-on-close, so
that it will be removed when the original application that opened it closes it.
I wonder if this is what you are hitting.
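The POSIX behaviour that the cifs client has to emulate can be seen on a local filesystem: a file that is still open can be unlinked at once, and the open descriptor keeps working until it is closed. A minimal sketch in a temporary directory (the function name posix_unlink_demo is hypothetical); against an NT4 share, cifs cannot do this directly and instead renames the file and marks it delete-on-close as described above:

```shell
# Demonstrate POSIX unlink-while-open semantics on a local filesystem.
posix_unlink_demo() {
    dir=$(mktemp -d) || return 1
    echo 123 > "$dir/f123"
    exec 3< "$dir/f123"       # open the file for reading on descriptor 3
    rm -- "$dir/f123"         # the name disappears immediately...
    [ ! -e "$dir/f123" ] && echo "name removed"
    cat <&3                   # ...but the open descriptor still reads the data
    exec 3<&-                 # closing the last descriptor releases the file
    rmdir -- "$dir"
}
```

Running `posix_unlink_demo` prints `name removed` followed by `123`.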

Comment 8 Wojciech Pilorz 2007-09-24 14:05:26 UTC
I tried today with the latest FC6 kernel (2.6.22.5-49.fc6, CIFS Version 1.49)
and could not reproduce the problem.
I will run longer tests over the next two or three days
and will post the results here.


Comment 9 Jeff Layton 2007-10-03 18:28:52 UTC
Thanks for testing. Setting back to NEEDINFO for now. Please set back to
ASSIGNED once you have some results from it.


Comment 10 Wojciech Pilorz 2007-10-03 20:07:30 UTC
I failed to reproduce the bug with the current FC6 kernel.
I hope this means bug 188012 is gone and can be closed.
While testing, I verified that two other problems
with CIFS and NT4 still exist; they are described in
https://bugzilla.redhat.com/show_bug.cgi?id=305191
and
https://bugzilla.redhat.com/show_bug.cgi?id=305231

They have easy workarounds, so they are not that serious.
Thank you, Steve, for your excellent work!


Comment 11 Jeff Layton 2007-10-16 12:33:19 UTC
Good deal. Let's close this as CURRENTRELEASE.