Bug 867877 - mounted share stucks after a while
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Duplicates: 884681
Reported: 2012-10-18 09:32 EDT by Marcus Moeller
Modified: 2014-06-18 03:42 EDT
Doc Type: Bug Fix
Last Closed: 2013-08-01 13:14:13 EDT
Type: Bug

Attachments
cifs debug (cifsFYI=1) while trying to access a stuck CIFS share. (2.59 KB, text/plain)
2012-10-18 09:33 EDT, Marcus Moeller
output of: tcpdump -i eth0 -p -s 0 -w /tmp/lookup_error.out port 445 or port 139 (19.82 KB, application/octet-stream)
2013-02-28 10:31 EST, Marcus Moeller

Description Marcus Moeller 2012-10-18 09:32:31 EDT
We are mounting users' home directories from a DFS share using CIFS. This works fine in general, but after a period of inactivity the share becomes inaccessible.

I have attached a debug log captured while trying to access the share when it is stuck.
Comment 1 Marcus Moeller 2012-10-18 09:33:24 EDT
Created attachment 629379
cifs debug (cifsFYI=1) while trying to access a stuck CIFS share.
Comment 2 Jeff Layton 2012-10-18 10:19:29 EDT
Hmm...You're getting back -6, which is probably -ENXIO. cifs translates the following DOS errors into that:

$ grep ENXIO fs/cifs/netmisc.c
	{ERRbaddrive, -ENXIO},
	{ERRnosuchshare, -ENXIO},
	{ERRinvtid, -ENXIO},
	{ERRinvnetname, -ENXIO},
	{ERRinvdevice, -ENXIO},


...this is odd though, since this line in the log shows an odd-looking SMB error code:

[24879.427969] fs/cifs/netmisc.c: Mapping smb error code 0x50002 to POSIX err -6
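For reference, here is a minimal sketch of how that value can be split into its parts. It assumes the logged number is the 32-bit little-endian view of the SMB1 DOS-style status field (one byte of error class, one reserved byte, then a 16-bit error code). The lookup tables are partial, and their values are assumptions recalled from the SMB1 error definitions rather than copied from the kernel source; `decode_dos_status` is a hypothetical helper name.

```python
import struct

# Partial lookup tables. These values are assumptions based on the SMB1
# DOS error definitions; verify them against smberr.h before relying on them.
ERROR_CLASSES = {0x01: "ERRDOS", 0x02: "ERRSRV", 0x03: "ERRHRD"}
ERRSRV_CODES = {5: "ERRinvtid", 6: "ERRinvnetname", 7: "ERRinvdevice"}


def decode_dos_status(status):
    """Split a 32-bit SMB1 DOS-style status into (class name, code name)."""
    # Re-serialize as little-endian, then read it back as
    # <class: u8> <reserved: u8> <code: u16>.
    raw = struct.pack("<I", status)
    err_class, _reserved, err_code = struct.unpack("<BBH", raw)
    class_name = ERROR_CLASSES.get(err_class, hex(err_class))
    if class_name == "ERRSRV":
        code_name = ERRSRV_CODES.get(err_code, str(err_code))
    else:
        code_name = str(err_code)
    return class_name, code_name


if __name__ == "__main__":
    print(decode_dos_status(0x50002))
```

Under those assumptions, 0x50002 is class 0x02 with error code 5, which would correspond to one of the entries in the grep output above that map to -ENXIO.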


Any chance you can get a capture of the traffic between client and server when one of these sessions is "stuck"? What arch is the client here? x86_64?
Comment 3 Jeff Layton 2012-12-06 10:34:25 EST
*** Bug 884681 has been marked as a duplicate of this bug. ***
Comment 4 Marcus Moeller 2012-12-06 10:41:28 EST
Sorry, I forgot that I had already reported this one. I will check whether I can create a traffic dump.

Is it sufficient to run tcpdump on the machine from which the share is mounted?

x86_64 btw.
Comment 5 Jeff Layton 2012-12-06 10:48:56 EST
Yep, that should be fine. I just need to see the traffic between client and server here.
Comment 6 Jeff Layton 2013-01-07 10:26:59 EST
No response in over a month. Please reopen if/when you get the captures...
Comment 7 Marcus Moeller 2013-02-28 10:31:32 EST
Created attachment 703912
output of: tcpdump -i eth0 -p -s 0 -w /tmp/lookup_error.out port 445 or port 139
Comment 8 Jeff Layton 2013-03-06 09:41:19 EST
The error is "TID invalid". In the CIFS protocol, after you authenticate with a server you do a TREE_CONNECT to connect to an exported share. On a successful tree connect, you get back a "Tree Connect ID" cookie (aka TID).

It looks like your server is occasionally purging those TIDs after a period of inactivity. To my knowledge, this is a violation of the protocol -- these are supposed to stick around for the life of the connection. What sort of server is this?
Comment 9 Marcus Moeller 2013-03-06 11:45:10 EST
ATM this is an EMC, but we are migrating to a SoNAS (by IBM). I am going to check with one of our storage engineers and get back to you when I have further information.
Comment 10 Jeff Layton 2013-04-11 10:06:33 EDT
Ahh yeah, I think I've seen this problem before with EMC servers. In principle, what might be nice is a patch to make the client redo the tree connect if the TID goes invalid. That's actually quite tricky to handle with the code designed the way it is, and this server is arguably broken.

What we could probably do is force a reconnect when this occurs. That's not very graceful, but I don't see much of an alternative.
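To illustrate the "redo the tree connect if the TID goes invalid" idea, here is a toy user-space model. It is only a sketch under stated assumptions, not the kernel client's code: `ToyServer`, `ToyClient`, `InvalidTid`, and `purge_idle_tids` are all hypothetical names, and the real client has to deal with locking, in-flight requests, and open file state that this model ignores.

```python
class InvalidTid(Exception):
    """Stands in for the server's "TID invalid" error reply."""


class ToyServer:
    """Hypothetical server that hands out TIDs and may purge them."""

    def __init__(self):
        self._next_tid = 1
        self._tids = {}  # tid -> share name

    def tree_connect(self, share):
        tid = self._next_tid
        self._next_tid += 1
        self._tids[tid] = share
        return tid

    def purge_idle_tids(self):
        # Models the behaviour described above: TIDs dropped after inactivity.
        self._tids.clear()

    def read(self, tid, path):
        if tid not in self._tids:
            raise InvalidTid(tid)
        return f"{self._tids[tid]}/{path}"


class ToyClient:
    """Hypothetical client that redoes the tree connect once on InvalidTid."""

    def __init__(self, server, share):
        self.server = server
        self.share = share
        self.tid = server.tree_connect(share)
        self.reconnects = 0

    def read(self, path):
        try:
            return self.server.read(self.tid, path)
        except InvalidTid:
            # The cached TID is gone: get a fresh one and retry exactly once.
            self.tid = self.server.tree_connect(self.share)
            self.reconnects += 1
            return self.server.read(self.tid, path)


if __name__ == "__main__":
    server = ToyServer()
    client = ToyClient(server, "//server/home")
    print(client.read("notes.txt"))
    server.purge_idle_tids()
    print(client.read("notes.txt"), client.reconnects)
```

The single retry keeps the sketch from looping forever if the server keeps rejecting the new TID. Forcing a full reconnect, as suggested above, would replace the `tree_connect` call with tearing down and re-establishing the whole session.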
Comment 11 Fedora End Of Life 2013-07-04 01:55:39 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 12 Fedora End Of Life 2013-08-01 13:14:23 EDT
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
