800082 – Killing a job that writes to mounted Windows directory will result in permission denied on that mount point

Keywords:
Status:	CLOSED DUPLICATE of bug 877010
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Sachin Prabhu
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:	CIFS
Depends On:
Blocks:	798385

Reported:	2012-03-05 17:35 UTC by chruitad
Modified:	2019-07-11 07:34 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-10-15 23:07:40 UTC
Target Upstream Version:
Embargoed:

Attachments
log file (122.56 KB, text/plain) 2012-03-06 19:33 UTC, chruitad	no flags

Description chruitad 2012-03-05 17:35:55 UTC

Description of problem:
I am using Linux kernel 2.6.32-220.4.1.el6.x86_64 (RHEL6) and cifs version 4.8.1. Using a mount point (/mnt/tmp), we are able to read from and write to our Windows directories. Occasionally, a user will kill a job that is writing to these directories. When this happens, the mount point is somehow left in a broken state, and running “ls” on it returns a permission denied error.
 
If I run “lsof” and grep for the path, I get this message:

                 WARNING: can't stat() cifs file system /mnt/tmp
 
If I can successfully unmount all of these mount points, I can run “mount -a” and recover. However, a user should be able to kill a job without ruining mount points.
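A minimal Python sketch of how the affected mount points could be located before unmounting them, as an alternative to grepping lsof output. It assumes the standard /proc/mounts layout (device, mount point, filesystem type, options, ...) and that a broken mount fails stat() promptly, as “ls” does in this report; against an unresponsive server a stat() may block instead. The function names are hypothetical.

```python
import os
import re

# /proc/mounts escapes space, tab, newline and backslash in paths as
# three-digit octal sequences (e.g. "\040" for a space).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def cifs_mountpoints(mounts_text):
    """Return the mount points of all cifs entries in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "cifs":
            points.append(_unescape(fields[1]))
    return points


def failing_cifs_mounts(mounts_path="/proc/mounts"):
    """stat() each cifs mount point; return (path, error message) for failures."""
    with open(mounts_path) as f:
        points = cifs_mountpoints(f.read())
    failures = []
    for mountpoint in points:
        try:
            os.stat(mountpoint)
        except OSError as exc:
            failures.append((mountpoint, exc.strerror))
    return failures
```

Each path returned by `failing_cifs_mounts()` would be a candidate for the unmount and “mount -a” recovery described above.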

Version-Release number of selected component (if applicable):
RHEL 6.2 and CIFS 4.8.1

How reproducible:
Configure winbind and Samba for Windows users' login, then add a CIFS mount point to a Windows share.

Steps to Reproduce:
1. Configure winbind login.
2. Add a CIFS mount point for a Windows share.
3. Have a Windows user who is logged into the Linux box kill a job that is writing to the Windows share.
  
Actual results: Occasionally, killing the job results in a permission denied error at the mount point.


Expected results:
I would expect the user to be able to kill their jobs without locking up the mount point.  

Additional info:

Comment 2 Jeff Layton 2012-03-06 15:48:25 UTC

What we need to understand is what happens with the later calls after you kill
that process. What may be an interesting first step is to get the client into this state and then turn up cifs debugging. Then attempt to stat() the
mountpoint and then disable the debugging and collect the logs. Here's some
info on how to do that:

    http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Enabling_Debugging

...that may give us an initial idea of what's going on when this occurs.
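The debugging procedure described above can be sketched in Python. This assumes the /proc/fs/cifs/cifsFYI control file described in the kernel's CIFS documentation, where a non-zero value enables debug messages (7 enables all classes) and 0 turns them off; it must run as root, and the output lands in the kernel log (dmesg or /var/log/messages). `cifs_debugging` and `stat_under_debug` are illustrative names, not part of any existing tool.

```python
import os
from contextlib import contextmanager

# Assumed control file from the kernel CIFS documentation.
CIFS_FYI = "/proc/fs/cifs/cifsFYI"


@contextmanager
def cifs_debugging(level=7, control_file=CIFS_FYI):
    """Temporarily raise CIFS debugging, always turning it back off."""
    with open(control_file, "w") as f:
        f.write(f"{level}\n")
    try:
        yield
    finally:
        # Disable debugging even if the body raised.
        with open(control_file, "w") as f:
            f.write("0\n")


def stat_under_debug(mountpoint, control_file=CIFS_FYI):
    """stat() the mount point with debugging on; return 0 or the errno."""
    with cifs_debugging(control_file=control_file):
        try:
            os.stat(mountpoint)
        except OSError as exc:
            return exc.errno
    return 0
```

Once the client is in the broken state, calling `stat_under_debug("/mnt/tmp")` and then collecting the kernel log would give the trace requested here.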

Comment 3 chruitad 2012-03-06 19:33:59 UTC

Created attachment 568041 [details]
log file

Please see the attached log file.  

Thank you for your help.

Comment 4 Jeff Layton 2012-03-06 20:01:23 UTC

Looks like something fell down in the handling of signatures after the signal. Most likely the sequence numbers got out of whack somehow. Some servers
disconnect the socket, forcing the client to reconnect when there's a signing failure. This one apparently doesn't.

A possible workaround is to mount with crypto signatures disabled until we
can track down the problem and come up with a fix.

Comment 5 RHEL Program Management 2012-05-03 05:15:13 UTC

Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Sachin Prabhu 2012-05-03 10:08:14 UTC

Targeting this issue for RHEL 6.4.

Comment 7 Stepchenko Aleksey 2012-08-22 08:57:33 UTC

Fedora 16 also has the same problem. It happens only when the server is a Windows Server 2003/2008 machine that is a member of a Windows domain.

With standalone Windows machines (XP/2003/2008/Win7), the problem does not appear.

Comment 8 RHEL Program Management 2012-12-14 08:04:58 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 9 Sachin Prabhu 2013-10-11 15:01:14 UTC

This looks like the same problem that was fixed by upstream commit
31efee60f489c759c341454d755a9fd13de8c03d. This fix has been backported to the 6.5 devel tree in version 2.6.32-408.el6.

Please contact support for a test kernel containing this fix.
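A rough Python sketch for checking whether a running el6 kernel is at or above the 2.6.32-408.el6 build named here. It is a simplified comparison of the numeric version fields only, not RPM's full version-comparison algorithm, and it assumes the fix is present in every later el6 build; a z-stream kernel for an earlier minor release could carry a backport with a lower build number, so a False result does not prove the fix is absent. `has_sequence_fix` is a hypothetical helper.

```python
import re

# Per comment 9, the backport landed in the RHEL 6.5 devel kernel 2.6.32-408.el6.
FIX_BUILD = (2, 6, 32, 408)

# Matches release strings such as "2.6.32-220.4.1.el6.x86_64".
_EL6_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.\d+)*\.el6")


def has_sequence_fix(release):
    """True if an el6 kernel release is at or above the first build with the fix."""
    m = _EL6_RELEASE.match(release)
    if m is None:
        raise ValueError(f"not an el6 kernel release: {release!r}")
    return tuple(int(g) for g in m.groups()) >= FIX_BUILD
```

On the affected system, `has_sequence_fix(platform.release())` would evaluate the release reported by `uname -r`.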

Sachin Prabhu

Comment 12 Sachin Prabhu 2013-10-15 23:07:40 UTC

Closing this bz as a dup of bz 877010. The patch requested in this bz has been included in the devel tree as part of the solution for bz 877010. The patched kernel will be released in RHEL 6.5.

If you require this fix before the release of RHEL 6.5, please contact Red Hat support for kernels containing the patch.

Sachin Prabhu

*** This bug has been marked as a duplicate of bug 877010 ***

Comment 13 Mark Christiansen 2013-12-02 23:40:42 UTC

I am unable to see the duplicate bug.  Can you please grant me permissions so that I can see the progress?  This allows a single user to wreak havoc on the usability of my systems and I am eager to follow and try the fix.

Comment 14 Sachin Prabhu 2013-12-03 10:46:44 UTC

Mark,

BZ 877010 was an internal bugzilla created to track the patches required to reduce kmapping used by async read and write code.

Upstream commit 31efee60f489c759c341454d755a9fd13de8c03d, which fixes the issue reported in this bz, was backported as part of those fixes:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=31efee60f489c759c341454d755a9fd13de8c03d

If you look at the log file from c#3, you will see the following message printed repeatedly:
"CIFS VFS: Unexpected SMB signature"

The "Unexpected SMB signature" message is usually seen when the sequence number the client expects from the server does not match the one the server actually used. The sequence number is incremented for every message sent by the client and the server.

The CIFS signature is an MD5 hash built from the session key, the response calculated during the authentication process, and the message itself, which contains the sequence number. The resulting hash is stored in the header at the same location as the sequence number, overwriting it.
When the client receives a message from the server, it saves the signature from the header, replaces it with the expected sequence number, and calculates its own MD5 hash. It then compares the hash it calculated with the one received in the packet to verify that the message was indeed sent by the server. If they don't match, it prints the message shown above.
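The signing and verification steps just described can be sketched in Python. This is a simplified model, not the kernel's code: it assumes the SMB1 header layout from the MS-CIFS specification (the 8-byte signature field at offset 14), a sequence number written as a 4-byte little-endian value padded with four zero bytes, and a `mac_key` that is the session key concatenated with the authentication response, as described above. The function names are illustrative.

```python
import hashlib
import hmac
import struct

# Assumption: SMB1 header layout per MS-CIFS; the 8-byte signature field
# starts at offset 14 of the 32-byte header.
SIG_OFFSET = 14
SIG_LEN = 8


def compute_signature(mac_key, packet, seq):
    """MD5 over key + packet, with the signature field holding the sequence number."""
    buf = bytearray(packet)
    buf[SIG_OFFSET:SIG_OFFSET + SIG_LEN] = struct.pack("<II", seq, 0)
    return hashlib.md5(mac_key + bytes(buf)).digest()[:SIG_LEN]


def sign(mac_key, packet, seq):
    """Overwrite the signature field (where the sequence number sat) with the hash."""
    buf = bytearray(packet)
    buf[SIG_OFFSET:SIG_OFFSET + SIG_LEN] = compute_signature(mac_key, packet, seq)
    return bytes(buf)


def verify(mac_key, packet, expected_seq):
    """Save the received signature, recompute with the expected sequence number, compare."""
    received = packet[SIG_OFFSET:SIG_OFFSET + SIG_LEN]
    return hmac.compare_digest(received, compute_signature(mac_key, packet, expected_seq))
```

Verifying an intact packet with the wrong `expected_seq` fails, which corresponds to the "Unexpected SMB signature" case in the log.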

What commit 31efee60f489c759c341454d755a9fd13de8c03d fixes is a case where the sequence number is incorrectly incremented in expectation of a response to an NT_CANCEL call. This call is sent when we want to cancel a request, and the server does not send a response to it.
As mentioned in the summary, the problem is seen when a job is killed. When the job is killed, the client sends out an NT_CANCEL request. Without the patch mentioned above, we incorrectly increment the sequence number, resulting in the "Unexpected SMB signature" error message. The client never recovers from this sequence number mismatch.
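The sequence-number drift can be illustrated with a toy model that tracks only the client and server counters. It assumes that each request expecting a reply reserves two numbers (n for the request, n + 1 for the reply) and that the server consumes only one number for an NT_CANCEL; it ignores message contents, request-signature checks on the server, and reply ordering. `patched=True` models the downward adjustment from commit 31efee60f489c759c341454d755a9fd13de8c03d; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SeqCounter:
    seq: int = 0


def client_sign(client, expects_response, patched):
    """Assign a request sequence number; return the one expected on the reply."""
    req_seq = client.seq
    client.seq += 2  # reserve n for the request and n + 1 for its response
    if not expects_response and patched:
        client.seq -= 1  # NT_CANCEL gets no reply: give the reserved number back
    return req_seq + 1


def server_receive(server, expects_response):
    """Server-side counter: return the sequence number of its reply, or None."""
    req_seq = server.seq
    if expects_response:
        server.seq += 2
        return req_seq + 1
    server.seq += 1  # no reply is sent, so only the request consumes a number
    return None


def replay(requests, patched):
    """Return (name, signature_ok) for every request that gets a reply."""
    client, server = SeqCounter(), SeqCounter()
    results = []
    for name, expects_response in requests:
        expected = client_sign(client, expects_response, patched)
        actual = server_receive(server, expects_response)
        if actual is not None:
            results.append((name, expected == actual))
    return results


# A write, the NT_CANCEL sent when the job is killed, then a later stat().
SCENARIO = [("write", True), ("NT_CANCEL", False), ("stat", True)]
```

In the unpatched model every reply after the NT_CANCEL is off by one, so the mismatch persists for all later requests, in line with the observation that the client never recovers.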

As part of the patches for bz 877010, I also backported to RHEL 6 the patch required to fix the NT_CANCEL issue, as well as a fix for another case that could result in an invalid sequence number.

* Mon Aug 05 2013 Rafael Aquini <aquini> [2.6.32-408.el6]
..
- [fs] cifs: on send failure, readjust server sequence number downward (Sachin Prabhu) [877010]
..
- [fs] cifs: adjust sequence number downward after signing NT_CANCEL request (Sachin Prabhu) [877010]

Both these patches are available in the RHEL 6.5 kernel.

Please install this kernel and test. In case you still see this problem, please open a case with Red Hat Support who can help prioritise this issue and have a fix released for you in time.

Sachin Prabhu
