Red Hat Bugzilla – Bug 867877
mounted share gets stuck after a while
Last modified: 2014-06-18 03:42:46 EDT
We are mounting users' home directories from a DFS share using CIFS. This works fine in general, but after a period of inactivity the share becomes inaccessible.
I have attached a debug log captured while trying to access the share when it is stuck.
Created attachment 629379 [details]
cifs debug (cifsFYI=1) while trying to access a stuck CIFS share.
Hmm...You're getting back -6, which is probably -ENXIO. cifs translates the following DOS errors into that:
$ grep ENXIO fs/cifs/netmisc.c
...this is odd though, since this line has an odd-looking SMB error code:
[24879.427969] fs/cifs/netmisc.c: Mapping smb error code 0x50002 to POSIX err -6
Any chance you can get a capture of the traffic between client and server when one of these sessions is "stuck" ? What arch is the client here? x86_64?
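As a quick sanity check on the -6 reading above, the errno table can be consulted from Python. This is only an illustrative sketch, not part of the original report; it assumes a Linux host, where errno 6 is ENXIO, and the variable names are made up for the example.

```python
import errno
import os

# The kernel log reports "POSIX err -6"; kernel code returns errors as
# negative errno values, so flip the sign before looking it up.
kernel_rc = -6
name = errno.errorcode[-kernel_rc]   # symbolic name for errno 6 on this platform

print(name, "-", os.strerror(-kernel_rc))
```

On a Linux client this should report ENXIO, which matches the guess in the comment above.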
*** Bug 884681 has been marked as a duplicate of this bug. ***
Sorry, I forgot that I had already reported this one. I will check whether I can create a traffic dump.
Is it OK to capture the traffic with tcpdump on the machine from which the share is mounted?
Yep, that should be fine. I just need to see the traffic between client and server here.
No response in over a month. Please reopen if/when you get the captures...
Created attachment 703912 [details]
output of: tcpdump -i eth0 -p -s 0 -w /tmp/lookup_error.out port 445 or port 139
The error is "TID invalid". In the CIFS protocol, after you authenticate with a server you do a TREE_CONNECT to connect to an exported share. On a successful tree connect, you get back a "Tree Connect ID" cookie (aka TID).
It looks like your server is occasionally purging those TIDs after a period of inactivity. To my knowledge, this is a violation of the protocol -- these are supposed to stick around for the life of the connection. What sort of server is this?
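To connect the "TID invalid" diagnosis back to the odd-looking 0x50002 in the earlier log line, here is a minimal sketch of how that value can be split into a DOS-style error class and error code. It assumes the logged number is the 32-bit little-endian status field of the SMB1 header and that the server replied with a DOS-style error rather than an NT status code. The function name and lookup tables are hypothetical helpers for this example, listing only the entries needed here.

```python
import struct

# Names from the SMB1 DOS error space; only the entries relevant to this
# log line are listed (assumption: DOS-style error, not NT status).
ERROR_CLASSES = {0x01: "ERRDOS", 0x02: "ERRSRV", 0x03: "ERRHRD"}
ERRSRV_CODES = {0x0005: "ERRinvtid (TID invalid)"}


def decode_dos_status(status):
    """Split a 32-bit SMB1 status value into (error_class, error_code)."""
    raw = struct.pack("<I", status)  # the four bytes as laid out on the wire
    # Layout: 1 byte error class, 1 reserved byte, 2 byte error code.
    err_class, _reserved, err_code = struct.unpack("<BBH", raw)
    return err_class, err_code


err_class, err_code = decode_dos_status(0x50002)
print(ERROR_CLASSES.get(err_class), ERRSRV_CODES.get(err_code))
```

Under those assumptions, 0x50002 splits into class 0x02 (server error) and code 0x0005, which is consistent with the invalid TID seen in the capture.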
At the moment this is an EMC, but we are migrating to a SoNAS (by IBM). I am going to check with one of our storage engineers and get back to you when I have further information.
Ahh yeah, I think I've seen this problem before with EMC servers. In principle, what might be nice is a patch to make the client redo the tree connect if the TID goes invalid. That's actually quite tricky to handle with the code designed the way it is, and this server is arguably broken.
What we could probably do is force a reconnect when this occurs. That's not very graceful, but I don't see much of an alternative.
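The idea of redoing the tree connect when a TID goes invalid can be illustrated with a toy model. This is purely a sketch of the control flow, not the kernel client's code and not a proposed patch: `ToyServer`, `ToyClient`, and every method on them are hypothetical, and the real client has to deal with locking, open files, and in-flight requests that this ignores.

```python
ERR_INVALID_TID = "ERRinvtid"  # stand-in for the server's "TID invalid" reply


class ToyServer:
    """Hypothetical server that may forget tree IDs after inactivity."""

    def __init__(self):
        self._next_tid = 1
        self._valid_tids = set()

    def tree_connect(self, share):
        tid = self._next_tid
        self._next_tid += 1
        self._valid_tids.add(tid)
        return tid

    def purge_idle_tids(self):
        # Models the behaviour seen in the capture: TIDs vanish server-side.
        self._valid_tids.clear()

    def read(self, tid, path):
        if tid not in self._valid_tids:
            return ERR_INVALID_TID, None
        return "OK", "contents of " + path


class ToyClient:
    """Hypothetical client that redoes the tree connect once, then retries."""

    def __init__(self, server, share):
        self.server = server
        self.share = share
        self.tid = server.tree_connect(share)
        self.reconnects = 0

    def read(self, path):
        status, data = self.server.read(self.tid, path)
        if status == ERR_INVALID_TID:
            # The server no longer knows our TID: get a fresh one and retry once.
            self.tid = self.server.tree_connect(self.share)
            self.reconnects += 1
            status, data = self.server.read(self.tid, path)
        if status != "OK":
            raise OSError(status)
        return data


server = ToyServer()
client = ToyClient(server, "//server/home")
client.read("a.txt")          # works with the original TID
server.purge_idle_tids()      # server drops the TID while the client is idle
result = client.read("a.txt") # client recovers by reconnecting the tree
```

Retrying only once keeps the toy client from looping forever against a server that rejects every TID; forcing a full reconnect, as suggested above, would replace the single `tree_connect` call with tearing down and re-establishing the whole session.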
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.