Bug 1624029 - unclosed CIFS file handles when application is interrupted with ctrl-c - endless loop of STATUS_DELETE_PENDING (0xc0000056) in response to Create
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: cifs-utils
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Ronnie Sahlberg
QA Contact: xiaoli feng
URL:
Whiteboard:
Depends On:
Blocks: 1711360
Reported: 2018-08-30 17:01 UTC by Syam Gadde
Modified: 2023-09-07 19:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-25 00:57:51 UTC
Target Upstream Version:
Embargoed:


Attachments:
dmesg output (things go south around 3982.800208) (278.26 KB, text/plain)
2018-08-30 17:01 UTC, Syam Gadde
packet capture (things go south around packet 34085) (8.54 MB, application/octet-stream)
2018-08-30 17:03 UTC, Syam Gadde

Description Syam Gadde 2018-08-30 17:01:59 UTC
Created attachment 1479868 [details]
dmesg output (things go south around 3982.800208)

Description of problem:

We have 100 or so Linux machines running Scientific Linux 7.4 with various kernels:

3.10.0-693.5.2.el7
3.10.0-514.21.1.el7

mounting CIFS shares on a BlueArc/Hitachi NAS.  We occasionally see "stuck files": files that show up in a directory listing but cannot be removed, read, written to, or accessed in any other way.  The server reports that there is still an open handle on the file, but no process on the client is still accessing it.  The only ways to resolve the problem are to have an administrator force-close the file on the server or to reboot the Linux client machine (unmounting all CIFS shares does not resolve it).

After some investigation, it seems that when an application with an open CIFS file is interrupted with Ctrl-C, the kernel sometimes does not send a close request to the server for that file handle.  If you then try to delete the file, it never gets deleted (because the server still sees an open handle), and the server returns a STATUS_DELETE_PENDING error on any further attempt to access, create, or delete a file with the same path.

The following loop:

while true; do rm -f TMP; sleep 0.01s ; date > TMP; done

creates this sequence of SMB operations on the network, according to tcpdump:

...
# rm -f TMP;
Create Request (asks for delete-on-close)
Create Response (gets file handle FH1)
Close Request
Close Response (closes file handle FH1, file should be deleted)
# date > TMP;
Create Request (asks to create/overwrite file)
Create Response (gets file handle FH2)
  GetInfo Request (SMB2_FILE_INTERNAL_INFO)
  GetInfo Response
  SetInfo Request (SMB2_FILE_ENDOFFILE_INFO)
  SetInfo Response
  Create Request (gets file handle FH3)
  Create Response
    SetInfo Request (SMB2_FILE_BASIC_INFO)
    SetInfo Response
  Close Request
  Close Response (closes file handle FH3)
  Write Request
  Write Response
Close Request
Close Response (closes file handle FH2)
... <repeat>

Now I hit Ctrl-C on the above loop.  In this case, it seems to have interrupted the rm (unlink) command, resulting in:

Create Request (asks for delete-on-close)
Create Response (gets file handle FH4)

but no Close request/response afterwards, so this unlink never completes.  When I restart the loop above, I first see the following; I'm not sure why (cifs_revalidate_dentry_attr, maybe?):

Create Request (open if it exists)
Create Response
  GetInfo Request (SMB2_FILE_ALL_INFO)
  GetInfo Response (succeeds)
Close Request
Close Response

And then the actual rm/unlink:

Create Request (asks for delete-on-close)
Create Response
Close Request
Close Response

after which every operation on the file results in an error STATUS_DELETE_PENDING:

Create Request
Create Response (Error: STATUS_DELETE_PENDING)
<repeat ad infinitum>

So, it seems that interrupting an unlink call can leave a file handle open until it is force-closed on the server or the client machine is rebooted.
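
To illustrate why one leaked handle would produce exactly this sequence, here is a toy model of a single file on the server. It is only a sketch: it assumes Windows-style delete-on-close semantics (closing a delete-on-close handle marks the file delete-pending, and the file is only removed once the last handle is closed). The function and variable names are made up for the illustration, and nothing here reflects the NAS's actual implementation.

```bash
#!/usr/bin/env bash
# Toy model of ONE file on the server (hypothetical names throughout).
# Assumption: Windows-style delete-on-close semantics, inferred from the trace.

srv_reset() {
    exists=0          # 1 while the file is present in the directory
    delete_pending=0  # 1 once a delete-on-close handle has been closed
    open_handles=0    # handles the server still considers open
    next_fh=1
    doc_flag=()       # doc_flag[fh]=1 if fh was opened with delete-on-close
}

# srv_create MODE   (MODE: open | overwrite | delete_on_close)
# Sets SRV_STATUS, and SRV_FH on success.
srv_create() {
    SRV_FH=
    if [ "$delete_pending" -eq 1 ]; then
        SRV_STATUS=STATUS_DELETE_PENDING
        return 0
    fi
    if [ "$exists" -eq 0 ] && [ "$1" != overwrite ]; then
        SRV_STATUS=STATUS_OBJECT_NAME_NOT_FOUND
        return 0
    fi
    exists=1
    SRV_FH=$next_fh
    next_fh=$((next_fh + 1))
    open_handles=$((open_handles + 1))
    if [ "$1" = delete_on_close ]; then
        doc_flag[$SRV_FH]=1
    else
        doc_flag[$SRV_FH]=0
    fi
    SRV_STATUS=STATUS_SUCCESS
}

# srv_close FH
srv_close() {
    open_handles=$((open_handles - 1))
    if [ "${doc_flag[$1]:-0}" = 1 ]; then
        delete_pending=1
    fi
    # The file only goes away when the LAST handle is closed.
    if [ "$open_handles" -eq 0 ] && [ "$delete_pending" -eq 1 ]; then
        exists=0
        delete_pending=0
    fi
    return 0
}

# Replay the sequence seen in the packet capture.
replay() {
    local leaked
    srv_reset
    srv_create overwrite; srv_close "$SRV_FH"          # date > TMP
    srv_create delete_on_close; leaked=$SRV_FH         # rm interrupted: no Close
    srv_create open; echo "revalidate: $SRV_STATUS"; srv_close "$SRV_FH"
    srv_create delete_on_close; srv_close "$SRV_FH"    # second rm completes
    srv_create overwrite; echo "after rm: $SRV_STATUS"
    srv_close "$leaked"                                # admin force-closes it
    srv_create overwrite; echo "after force close: $SRV_STATUS"
}

replay
```

Under these assumptions the replay prints STATUS_SUCCESS for the revalidate open, STATUS_DELETE_PENDING for the create after the second rm, and STATUS_SUCCESS again once the leaked handle is force-closed, which matches both the capture and the server-side workaround.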

How reproducible:

Pretty consistent

Steps to Reproduce:

1. mount a CIFS file system and cd to a directory under that mount

2. Run the following and hit Ctrl-C during it:
while true; do rm -f TMP ; sleep 0.01s ; date > TMP; done

3. Repeat step 2 until you get the error:
bash: TMP: No such file or directory
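
Steps 2 and 3 can be automated roughly as follows. This is a sketch, not a verified reproducer: it assumes GNU coreutils `timeout`, which delivers SIGINT to the loop and its children and so only approximates a Ctrl-C typed at a terminal. The function names, the 0.20 to 1.19 second delay window, and the attempt count are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# Sketch of steps 2-3. Run from a directory on the CIFS mount.
# Assumes GNU coreutils `timeout`; all function names are hypothetical.

# interrupt_once SECONDS: run the rm/date loop and SIGINT it after SECONDS.
# -k 2 sends SIGKILL two seconds later in case the loop ignores SIGINT.
interrupt_once() {
    timeout -s INT -k 2 "$1" \
        bash -c 'while true; do rm -f TMP; sleep 0.01s; date > TMP; done' \
        2>/dev/null
    return 0
}

# is_stuck: succeeds if TMP can no longer be recreated (the step 3 symptom).
is_stuck() {
    rm -f TMP 2>/dev/null
    ! ( date > TMP ) 2>/dev/null
}

# hunt ATTEMPTS: repeat until the file is stuck or ATTEMPTS runs out.
hunt() {
    local i d
    for ((i = 1; i <= $1; i++)); do
        # Vary the delay so the signal lands at different points in the loop.
        d=$((20 + RANDOM % 100))
        interrupt_once "$(printf '%d.%02d' $((d / 100)) $((d % 100)))"
        if is_stuck; then
            echo "stuck after $i attempt(s)"
            return 0
        fi
    done
    echo "not reproduced in $1 attempt(s)"
    return 1
}
```

After sourcing the functions in a directory on the mount, `hunt 200` would make up to 200 attempts. On a local file system it should never report a stuck file, which makes a useful control.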


Actual results:

File is in a stuck state -- server still thinks someone has it open.

Expected results:

The file should be released and accessible normally.

Additional info:

Comment 2 Syam Gadde 2018-08-30 17:03:27 UTC
Created attachment 1479870 [details]
packet capture (things go south around packet 34085)

Comment 3 jstephen 2018-08-31 20:33:31 UTC
Moving to cifs component based on the problem description.

Comment 5 Syam Gadde 2018-09-17 14:13:10 UTC
Can someone tell me what additional info is needed (the needinfo flag was set)?  Happy to provide more.

Comment 6 Syam Gadde 2018-09-18 15:16:09 UTC
It has been reported to me on the linux-cifs list that this may be fixed in 7.5.  I have upgraded one machine and have not been able to replicate the error.  I will report back if this still occurs, but for now, I think this can be closed.  Sorry to add noise.

Comment 7 Syam Gadde 2018-09-18 15:26:44 UTC
Actually I have gotten it to happen on 7.5, kernel release 3.10.0-862.11.6.el7.x86_64, so no luck.

I don't believe this is a user-space issue, so I don't think it should be assigned to cifs-utils.

Comment 8 Joerg K 2018-09-19 05:49:02 UTC
I can confirm that it happens on 7.5, too.

Comment 9 Joerg K 2018-09-19 08:43:04 UTC
Could this be related to this one?
https://bugzilla.kernel.org/show_bug.cgi?id=198349

