Bug 800082 - Killing a job that writes to mounted Windows directory will result in permission denied on that mount point
Status: CLOSED DUPLICATE of bug 877010
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Sachin Prabhu
QA Contact: Red Hat Kernel QE team
CIFS
:
Depends On:
Blocks: 798385
Reported: 2012-03-05 12:35 EST by chruit
Modified: 2015-04-16 23:19 EDT

Doc Type: Bug Fix
Last Closed: 2013-10-15 19:07:40 EDT


Attachments
log file (122.56 KB, text/plain)
2012-03-06 14:33 EST, chruit

Description chruit 2012-03-05 12:35:55 EST
Description of problem:
I am using linux kernel 2.6.32-220.4.1.el6.x86_64 (RHEL6) and cifs version 4.8.1.  Using a mount point (/mnt/tmp), we are able to read/write to our Windows directories.  Occasionally, a user will kill a job that is writing to these directories.  When this happens, it corrupts the mount point somehow and we get a permission denied error when we do an “ls”.
 
If I do an “lsof” and grep for the path, I get a  message:
 
                 WARNING: can’t stat() cifs file system /mnt/tmp
 
It seems that if I can successfully unmount all of these mount points, I can run “mount -a” and recover.  However, a user should be able to kill a job without ruining mount points.

Version-Release number of selected component (if applicable):
RHEL 6.2 and CIFS 4.8.1

How reproducible:
Configure winbind and samba for Windows users' login.  Add a mount point 

Steps to Reproduce:
1.  Configure winbind login 
2.  Add a mount point with cifs to a Windows share
3.  Have a Windows user who is logged in to the Linux box kill a job that writes to the Windows share.
  
Actual results:  Occasionally, killing the job results in a permission denied error at the mount point.


Expected results:
I would expect the user to be able to kill their jobs without locking up the mount point.  

Additional info:
Comment 2 Jeff Layton 2012-03-06 10:48:25 EST
What we need to understand is what happens with the later calls after you kill
that process. What may be an interesting first step is to get the client into this state and then turn up cifs debugging. Then attempt to stat() the
mountpoint and then disable the debugging and collect the logs. Here's some
info on how to do that:

    http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Enabling_Debugging

...that may give us an initial idea of what's going on when this occurs.
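
The capture sequence described above (enable debugging, stat() the mountpoint, disable debugging) could be scripted roughly as follows. This is only a sketch: it assumes the debug switch is the /proc/fs/cifs/cifsFYI file described on the wiki page, that writing 7 enables verbose output and 0 disables it, and that the script runs as root with the cifs module loaded. The names cifs_debug and probe are made up for illustration; the messages themselves still have to be collected from the kernel log afterwards.

```python
import contextlib
import os
from pathlib import Path

# Assumed location of the CIFS debug switch (see the wiki page above).
CIFS_FYI = Path("/proc/fs/cifs/cifsFYI")


@contextlib.contextmanager
def cifs_debug(level=7, control=CIFS_FYI):
    """Turn CIFS debugging on for the duration of the block, then off again."""
    control.write_text(f"{level}\n")
    try:
        yield
    finally:
        control.write_text("0\n")


def probe(mountpoint, control=CIFS_FYI):
    """stat() the mountpoint with debugging enabled.

    Returns None if stat() succeeds, otherwise the errno of the failure.
    """
    with cifs_debug(control=control):
        try:
            os.stat(mountpoint)
            return None
        except OSError as exc:
            return exc.errno
```

For example, probe("/mnt/tmp") on an affected client would be expected to return errno.EACCES if the stat() fails with "permission denied".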
Comment 3 chruit 2012-03-06 14:33:59 EST
Created attachment 568041
log file

Please see the attached log file.  

Thank you for your help.
Comment 4 Jeff Layton 2012-03-06 15:01:23 EST
Looks like something fell down in the handling of signatures after the signal. Most likely the sequence numbers got out of whack somehow. Some servers
disconnect the socket, forcing the client to reconnect when there's a signing failure. This one apparently doesn't.

A possible workaround is to mount with crypto signatures disabled until we
can track down the problem and come up with a fix.
Comment 5 RHEL Product and Program Management 2012-05-03 01:15:13 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 6 Sachin Prabhu 2012-05-03 06:08:14 EDT
Targeting this issue for RHEL 6.4.
Comment 7 Stepchenko Aleksey 2012-08-22 04:57:33 EDT
Fedora 16 has the same problem. It happens only with a Windows 2003/2008 Server that is a member of a Windows domain.

With a standalone Windows machine (XP/2003/2008/Win7), the problem does not appear.
Comment 8 RHEL Product and Program Management 2012-12-14 03:04:58 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 9 Sachin Prabhu 2013-10-11 11:01:14 EDT
This looks like the same problem that was fixed by upstream commit
31efee60f489c759c341454d755a9fd13de8c03d. This fix has been backported to the 6.5 devel tree in version 2.6.32-408.el6.

Please contact support for a test kernel containing this fix.

Sachin Prabhu
Comment 12 Sachin Prabhu 2013-10-15 19:07:40 EDT
Closing this bz as a dup of bz 877010. The patch requested in this bz has been included in the devel tree as part of the solution for bz 877010. The patched kernel will be released with RHEL 6.5.

If you require this fix before the release of RHEL 6.5, please contact Red Hat support for kernels containing the patch.

Sachin Prabhu

*** This bug has been marked as a duplicate of bug 877010 ***
Comment 13 Mark Christiansen 2013-12-02 18:40:42 EST
I am unable to see the duplicate bug.  Can you please grant me permissions so that I can see the progress?  This allows a single user to wreak havoc on the usability of my systems and I am eager to follow and try the fix.
Comment 14 Sachin Prabhu 2013-12-03 05:46:44 EST
Mark,

BZ 877010 was an internal bugzilla created to track the patches required to reduce kmapping used by async read and write code.

Upstream commit
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=31efee60f489c759c341454d755a9fd13de8c03d
which fixes the issue reported in this bz, was backported as part of those fixes.

If you look at the log file from c#3, you can see the following message printed repeatedly:
"CIFS VFS: Unexpected SMB signature"

The "Unexpected SMB signature" message is usually seen when the sequence number expected by the client from the server does not match what was sent. The sequence number is incremented for every message sent by the client and the server. 

The CIFS signature is an MD5 hash built using the session key, the response calculated during the authentication process, and the message itself, which contains the sequence number. The resulting hash is stored in the same location in the header as the sequence number, overwriting the sequence number.
When the client receives a message from the server, it saves the signature from the header, replaces the signature with the expected sequence number, and calculates its own MD5 hash. It then compares the MD5 hash it calculated with the MD5 hash received in the packet to verify that the message was indeed sent by the server. If they don't match, it prints the message we see above.
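
To make the sign and verify steps above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the kernel code: the key argument stands in for the session key plus the authentication response, the signature field is assumed to be 8 bytes at offset 14 of the header, and the names sign and verify are made up for this example.

```python
import hashlib
import struct

SIG_OFFSET = 14  # assumed position of the signature field in the header
SIG_LEN = 8      # assumed size of the signature field


def _digest(key: bytes, message: bytes, seq: int) -> bytes:
    """Hash the message with the sequence number placed in the signature field."""
    seq_field = struct.pack("<II", seq, 0)  # sequence number padded to 8 bytes
    stamped = message[:SIG_OFFSET] + seq_field + message[SIG_OFFSET + SIG_LEN:]
    return hashlib.md5(key + stamped).digest()[:SIG_LEN]


def sign(key: bytes, message: bytes, seq: int) -> bytes:
    """Return the message with the hash overwriting the sequence number."""
    sig = _digest(key, message, seq)
    return message[:SIG_OFFSET] + sig + message[SIG_OFFSET + SIG_LEN:]


def verify(key: bytes, signed: bytes, expected_seq: int) -> bool:
    """Recompute the hash using the sequence number the receiver expects."""
    received = signed[SIG_OFFSET:SIG_OFFSET + SIG_LEN]
    return received == _digest(key, signed, expected_seq)
```

With this sketch, a message signed with sequence number 4 verifies only when the receiver also expects 4; if the receiver expects 5, verify() returns False. That is the situation in which "Unexpected SMB signature" is logged.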

What commit 31efee60f489c759c341454d755a9fd13de8c03d fixes is a case where the sequence number is incorrectly incremented in expectation of a response to an NT_CANCEL call. This call is issued when we want to cancel a request, and it doesn't result in a response from the server.
As mentioned in the summary, the problem is seen when a job is killed. When the job is killed, the client sends out an NT_CANCEL request. Without the patch mentioned above, we incorrectly increment the sequence number, resulting in the "Unexpected SMB signature" error message. The client never recovers from this mismatched sequence number.
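
The drift can be shown with a small Python model. This is a simplified sketch, not the kernel logic: it assumes each side advances its counter by two for an ordinary request (one for the request, one for the response) and that the server advances by only one for an NT_CANCEL, since it sends no response. Peer, client_send, server_receive and run are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    seq: int = 0  # next sequence number this side will use or expect


def client_send(client: Peer, is_cancel: bool, patched: bool) -> int:
    """Return the sequence number used to sign the request."""
    used = client.seq
    client.seq += 2  # request plus the response we expect
    if is_cancel and patched:
        client.seq -= 1  # NT_CANCEL gets no response, so step back by one
    return used


def server_receive(server: Peer, is_cancel: bool) -> int:
    """Return the sequence number the server expects for this request."""
    expected = server.seq
    server.seq += 1 if is_cancel else 2
    return expected


def run(patched: bool) -> list:
    """Send write, NT_CANCEL, stat and report whether both sides agree each time."""
    client, server = Peer(), Peer()
    agreed = []
    for is_cancel in (False, True, False):
        used = client_send(client, is_cancel, patched)
        expected = server_receive(server, is_cancel)
        agreed.append(used == expected)
    return agreed
```

In this model, run(patched=False) gives [True, True, False]: the first request after the NT_CANCEL is already out of step, and nothing brings the counters back together. run(patched=True) gives [True, True, True].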

As part of the patches for bz 877010, I also backported to RHEL 6 the patch required to fix the NT_CANCEL issue, as well as a patch for another case which could result in an invalid sequence number.

* Mon Aug 05 2013 Rafael Aquini <aquini@redhat.com> [2.6.32-408.el6]
..
- [fs] cifs: on send failure, readjust server sequence number downward (Sachin Prabhu) [877010]
..
- [fs] cifs: adjust sequence number downward after signing NT_CANCEL request (Sachin Prabhu) [877010]

Both these patches are available in the RHEL 6.5 kernel.

Please install this kernel and test. In case you still see this problem, please open a case with Red Hat Support who can help prioritise this issue and have a fix released for you in time.

Sachin Prabhu
