Bug 704921
Summary: | panic in cifsd code after unexpected lookup error -88. | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Wade Mealing <wmealing> | |
Component: | kernel | Assignee: | Jeff Layton <jlayton> | |
Status: | CLOSED ERRATA | QA Contact: | Jian Li <jiali> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 5.8 | CC: | bfields, ccui, dhowells, eguan, jiali, jlayton, jwest, moshiro, nmurray, rdassen, rwheeler, sprabhu, steved | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | kernel-2.6.18-283.el5 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 711400 | Environment: | ||
Last Closed: | 2012-02-21 03:47:58 UTC | Type: | --- | |
Bug Depends On: | 711400 | |||
Bug Blocks: |
Description
Wade Mealing
2011-05-16 01:16:10 UTC
I've looked over the code but simply don't see it. This code is really too complicated for anyone's good, but basically the recreation and reconnection of the socket is supposed to be done by cifsd. When a reconnect event occurs, cifsd closes down the socket, sets ssocket to NULL, and then tries to create a new socket and connect it. It shouldn't return until that has successfully occurred. The above stack trace, though, makes it look like it did happen.

There is one possibility: there could have been a flurry of reconnect/disconnect activity, and cifs_setup_session raced in and reset the tcpStatus to CifsGood while cifsd was trying (and failing) to reconnect the socket. That would probably explain what happened.

The fundamental problem here, though, is that the tcpStatus has no clear locking rules around it. This will probably require a fairly fundamental overhaul to fix correctly.

Note that there is a discussion about a very similar problem going on upstream. I think I have a patch that may fix this there, but it will need to be backported for RHEL5:

http://article.gmane.org/gmane.linux.kernel.cifs/3402

I've posted a set of test kernels that contain patches for this issue on my people.redhat.com page:

http://people.redhat.com/jlayton/

Could you test them and let me know if they resolve the issue?

The customer in this case, sadly, can't reproduce the issue and is unwilling to run a test kernel in their environment. That's a pity, but understandable. Based on the analysis, I think this patch will probably fix the issue. I'll plan to go ahead with it for 5.8.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
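For readers following the analysis, the suspected race on tcpStatus can be illustrated with a small user-space model. This is only a sketch of what "clear locking rules" could look like, not the kernel code and not the upstream patch referenced above. The struct, enum, and function names are hypothetical, and a pthread mutex stands in for whatever lock the real fix uses. The idea is that the status and the socket pointer are only ever changed together under one lock, so a session-setup path cannot mark the connection good while the socket is still NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names echo the discussion, not the kernel source. */
enum tcp_status { STATUS_GOOD, STATUS_NEED_RECONNECT };

struct server {
    pthread_mutex_t lock;    /* guards status and sock together */
    enum tcp_status status;
    void *sock;              /* NULL while no socket is connected */
};

/* Reconnect thread: drop the socket and flag the server in one critical section. */
void server_begin_reconnect(struct server *s)
{
    pthread_mutex_lock(&s->lock);
    s->sock = NULL;
    s->status = STATUS_NEED_RECONNECT;
    pthread_mutex_unlock(&s->lock);
}

/* Reconnect thread: publish the new socket and the good status atomically. */
void server_finish_reconnect(struct server *s, void *new_sock)
{
    assert(new_sock != NULL);
    pthread_mutex_lock(&s->lock);
    s->sock = new_sock;
    s->status = STATUS_GOOD;
    pthread_mutex_unlock(&s->lock);
}

/* Session-setup path: refuses to mark the server good while the socket is gone. */
bool server_mark_good(struct server *s)
{
    bool ok;

    pthread_mutex_lock(&s->lock);
    ok = (s->sock != NULL);
    if (ok)
        s->status = STATUS_GOOD;
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

In this model, a caller in the session-setup role that gets `false` back from `server_mark_good` would have to wait or retry rather than proceed with a NULL socket, which is the situation the stack trace appears to show.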
Patch(es) available in kernel-2.6.18-283.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html