Bug 634632
Summary: | nfs4_reclaim_open_state: unhandled error -5. Zeroing state | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | PaulB <pbunyan>
Component: | kernel | Assignee: | Jeff Layton <jlayton>
Status: | CLOSED ERRATA | QA Contact: | yanfu,wang <yanwang>
Severity: | medium | Priority: | low
Version: | 4.9 | CC: | bfields, jburke, jlayton, pbunyan, steved, vgoyal, yanwang
Target Milestone: | rc | Doc Type: | Bug Fix
Hardware: | All | OS: | Linux
Last Closed: | 2011-02-16 16:04:57 UTC | |
Description
PaulB
2010-09-16 15:16:09 UTC
From the code, trace and the log messages, it looks like the problem may be that alloc_nfs_open_context isn't returning nfs contexts that can be passed to put_nfs_open_context without oopsing.

I think I was able to reproduce this with a fault injection patch that poisons the nfs_open_context struct after kmalloc'ing it and then pretends that the state wasn't found on the list...

ctx->list is definitely not being initialized, AFAICT (not even upstream -- yipes!) Testing a fix for that now...

Created attachment 449480: proposed patch
This patch seems to fix my artificial reproducer for this. It looks like this is an upstream bug too, but the effects may be mitigated there, as I don't see where the upstream code passes a newly allocated ctx to put_nfs_open_context.
Still, it's worth fixing upstream as well, so once I test this out on rawhide I'll send the patch there too.
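
To illustrate the failure mode discussed above, here is a minimal userspace sketch in C. It is not the kernel code and not the attached patch: `struct open_ctx`, `ctx_alloc`, `ctx_put`, `ctx_demo` and the small `list_*` helpers are hypothetical stand-ins for the real nfs_open_context handling. It assumes that the put path unlinks the context from a list before freeing it, which is why a freshly allocated context needs its list head initialized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for a circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a node point at itself, i.e. an empty list or an unlinked entry. */
static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Dereferences entry->prev and entry->next, so both must be valid pointers. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical, heavily reduced model of an open context. */
struct open_ctx {
    int refcount;
    struct list_head list;
};

static struct open_ctx *ctx_alloc(void)
{
    struct open_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    /* Fault injection: poison the memory so any field left unset is junk. */
    memset(ctx, 0x6b, sizeof(*ctx));
    ctx->refcount = 1;
    /* The step the report found missing: without it, list holds poison. */
    list_init(&ctx->list);
    return ctx;
}

/* Drop one reference; returns 1 if the context was freed. */
static int ctx_put(struct open_ctx *ctx)
{
    assert(ctx->refcount > 0);
    if (--ctx->refcount > 0)
        return 0;
    list_del(&ctx->list); /* harmless on a self-linked node */
    free(ctx);
    return 1;
}

/* Put one context that was never linked and one that was; 0 on success. */
int ctx_demo(void)
{
    struct list_head open_list;
    list_init(&open_list);

    struct open_ctx *fresh = ctx_alloc();
    struct open_ctx *linked = ctx_alloc();
    if (fresh == NULL || linked == NULL) {
        free(fresh);
        free(linked);
        return -1;
    }
    list_add(&linked->list, &open_list);

    int freed = ctx_put(fresh) + ctx_put(linked);
    return (freed == 2 && list_empty(&open_list)) ? 0 : -1;
}
```

Calling `ctx_demo()` from a `main()` returns 0 with the `list_init` call in place. If that call is removed from `ctx_alloc`, `ctx_put` follows the poisoned pointers in `list_del`, which is undefined behavior and in practice usually a crash, loosely mirroring the oops described in this report.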
Patch sent upstream: http://marc.info/?l=linux-nfs&m=128535568718186&w=2
...awaiting comment, but I don't expect it to be especially controversial. I'll queue up something similar for RHEL4, and will check out RHEL5 and 6 too to make sure they're not vulnerable to this issue.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 89.42.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

All,
Retesting /kernel/networking/ndnc and /kernel/filesystems/nfs/connectathon on hp-z400-02.lab.bos.redhat.com with kernel 2.6.9-90.EL:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=177612
Kernel 2.6.9-90.EL was installed, and /kernel/networking/ndnc and /kernel/filesystems/nfs/connectathon were run 5 times on hp-z400-02.lab.bos.redhat.com without issue. The results look good.
Best,
-pbunyan

Verified on kernel-2.6.9-96.EL:
https://beaker.engineering.redhat.com/jobs/45983
https://beaker.engineering.redhat.com/jobs/45985
https://beaker.engineering.redhat.com/jobs/45986
https://beaker.engineering.redhat.com/jobs/45988
https://beaker.engineering.redhat.com/jobs/45989
https://beaker.engineering.redhat.com/jobs/45990

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0263.html