Bug 1771687
| Summary: | Process killed while opening a file can result in leaked open handle on the server | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Frank Sorenson <fsorenso> | |
| Component: | kernel | Assignee: | cifs-maint | |
| kernel sub component: | CIFS | QA Contact: | xiaoli feng <xifeng> | |
| Status: | CLOSED WONTFIX | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | dwysocha, hmatsumo, jke, jscheibe, jshivers, lsahlber, mathapli, ribarry, shjadhav, xzhou | |
| Version: | 7.7 | Keywords: | Reproducer | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1771691 1927839 (view as bug list) | Environment: | ||
| Last Closed: | 2019-12-12 00:29:48 UTC | Type: | Bug | |
| Bug Blocks: | 1771691 | |||
FWIW, the 3 upstream commits in the series for this bug (currently in cifs-next, not yet in Linus's tree) do not backport cleanly against kernel-3.10.0-1117.el7, which likely means we need more commits:
9150c3adbf24d77cfba37f03639d4a908ca4ac25 CIFS: Close open handle after interrupted close
7b71843fa7028475b052107664cbe120156a2cfc CIFS: Do not miss cancelled OPEN responses
86a7964be7afaf3df6b64faaa10a7032d2444e51 CIFS: Fix NULL pointer dereference in mid callback
9150c3adbf24d77cfba37f03639d4a908ca4ac25 depends on 97ca1762246d6eeb1b48dff8a179d1a92ce24227
- seems ok to backport
7b71843fa7028475b052107664cbe120156a2cfc may depend on ?? - maybe afe6f65353b64 and 3190b59a050ec and others?
The last commit may be peripheral to this bug. Commit 86a7964be7afaf3df6b64faaa10a7032d2444e51 seems to depend on a fairly invasive non-stable patch that adds 'cifs_credits':
commit 335b7b62ffb69d18055f2bb6f3a029263a07c735
Author: Pavel Shilovsky <pshilov>
Date:   Wed Jan 16 11:12:41 2019 -0800

    CIFS: Respect reconnect in MTU credits calculations

    Every time after a session reconnect we don't need to account for
    credits obtained in previous sessions. Introduce new struct cifs_credits
    which contains both credits value and reconnect instance of the
    time those credits were taken. Modify a routine that add credits
    back to handle the reconnect instance by assuming zero credits
    if the reconnect happened after the credits were obtained and
    before we decided to add them back due to some errors during sending.

    This patch fixes the MTU credits cases. The subsequent patch
    will handle non-MTU ones.

    Signed-off-by: Pavel Shilovsky <pshilov>
    Signed-off-by: Steve French <stfrench>

I'm not sure how likely this is to make RHEL7.8, given that the patches are only just going upstream and have dependencies such as the one above. The patches addressing this bug do carry a stable cc, though, and are not a large number of lines. It is not inconceivable that they could be fixed up, assuming the impact is high enough and there is no workaround, but Ronnie can say more.
NOTE: If they go into 7.8 we need to make sure they also go into 8.2
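
The clean-apply check on the three upstream commits can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `check_backport` is hypothetical, the commits are assumed to be already fetched into a scratch clone with the RHEL branch checked out, and each commit is tried independently against HEAD rather than as a series. The helper runs `git reset --hard`, so it should only be used in a tree with no local changes worth keeping.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): report whether each given
# commit cherry-picks cleanly onto the currently checked-out branch.
# WARNING: uses "git reset --hard"; run only in a scratch clone or worktree.
check_backport() {
    local sha
    for sha in "$@"; do
        # --no-commit applies the change to the index and work tree only.
        # Any failure (including an unknown commit id) is reported as "conflict".
        if git cherry-pick --no-commit "$sha" >/dev/null 2>&1; then
            echo "clean $sha"
        else
            echo "conflict $sha"
        fi
        # Drop whatever the attempt left behind before trying the next commit.
        git reset --hard --quiet HEAD
    done
}
```

A possible invocation, using the commit ids listed earlier in this comment, would be `check_backport 9150c3adbf24d77cfba37f03639d4a908ca4ac25 7b71843fa7028475b052107664cbe120156a2cfc 86a7964be7afaf3df6b64faaa10a7032d2444e51`. Because each commit is tried on its own, a "conflict" line may only mean that an earlier commit in the series is a prerequisite.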
Hi,

Our field teams are reporting customers on RHEL (7.7 and others) who are looking for this fix for RHEL 7 SKUs as well. Could the Red Hat team please work on this fix for supported RHEL 7 SKUs as well?

(In reply to Mayank Thapliyal from comment #8)
> Hi,
>
> We are seeing our field reporting customers on RHEL (7.7 and others) who are
> looking for this fix for RHEL 7 SKUs as well.

How many customers would you say are affected? And do you have a specific customer for whom we might get a case filed, so we can work on a more targeted fix? If we have at least one customer that is willing to try test kernels, it will increase the chances that we can get something into RHEL7, though there are no guarantees. At the time of the original assessment, it was thought this issue is very hard to hit, so not many customers would be affected. If you're seeing something different we may need to dig in further.

> Could RedHat team please work on this fix on supported SKUs for RHEL 7 as
> well?

We had decided against this patchset over a year ago, so to set expectations, it may be unlikely to get fixed in RHEL7. However, I'm not sure whether some specific patch or patches could still be appropriate for RHEL7 given the lifecycle. We're at a point where many customers are still on RHEL7, but need to start migrating to RHEL8 due to lifecycle. This bug is just one of many examples of something fixed in RHEL8 that unfortunately still exists on RHEL7.

We have at least one big customer (a big European car manufacturer) who is using these VMs on Azure. And then there are smaller 3PP publishers. There might be others as well who come forward later, as they might not yet be aware of this leak.

(In reply to Mayank Thapliyal from comment #11)
> We have at least one big customer (A big European car manufacturer) who is
> using these VMs on Azure. And then there are smaller 3PP publishers. There
> might be others as well who might come at a later point in time as they
> might not be aware of this leak.

Do you have a current customer that is hitting this problem, and would be willing to use a test kernel and verify the fix?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

Description of problem:

If a process opening a file is killed while waiting for a SMB2_CREATE response from the server, the response may not be handled by the client, leaking an open file handle on the server.

Version-Release number of selected component (if applicable):

all kernels (RHEL 7, RHEL 8, upstream)

How reproducible:

easy

Steps to Reproduce:

    # mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
    # cd /mnt/vm3
    # echo foo > foo
    # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done

(increase count if necessary; 100 appears sufficient to cause multiple leaked file handles)

Actual results:

The client stops waiting for the response, and outputs the following message when the response arrives:

    CIFS VFS: Close unmatched open

The server leaks an open file handle. With samba, this can be seen with the following:

    # smbstatus | grep -i Locked -A1000
    Locked files:
    Pid    Uid  DenyMode   Access    R/W     Oplock      SharePath    Name  Time
    --------------------------------------------------------------------------------------------------
    25936  501  DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
    25936  501  DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019

Expected results:

The client handles the open response, and then closes the file (can the create/open be canceled?)

Additional info:
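
Checking for the leak after a reproducer run can be scripted. The sketch below is only an illustration: both function names are hypothetical, `count_open_handles` assumes the `smbstatus` column layout shown in this report (Name is the eighth whitespace-separated field, so share paths or file names containing spaces would be miscounted), and `count_unmatched_opens` assumes the client message text is exactly as quoted in the actual results.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the original report).

# Count entries for one file in the "Locked files:" section of smbstatus
# output read from stdin. $1 is the value expected in the Name column.
count_open_handles() {
    awk -v name="$1" '
        /^[[:space:]]*Locked files:/ { in_locks = 1; next }
        # Data rows start with a numeric Pid; Name is field 8 in this layout.
        in_locks && $1 ~ /^[0-9]+$/ && $8 == name { n++ }
        END { print n + 0 }
    '
}

# Count "Close unmatched open" messages in kernel log text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_unmatched_opens() {
    grep -c 'CIFS VFS: Close unmatched open' || true
}
```

On the samba server, `smbstatus | count_open_handles foo` gives the number of handles currently held on foo; a count that stays above zero after all client processes have exited points to leaked handles. On the client, `dmesg | count_unmatched_opens` gives the number of late open responses that were logged.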