Bug 2207969
Summary: | [regression] kernel BUG at fs/attr.c:377! RIP: 0010:notify_change+0xbd8/0xd40 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Zhi Li <yieli> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
kernel sub component: | NFS | QA Contact: | Zhi Li <yieli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | bxue, chuck.lever, cmaiolin, jiyin, jlayton, nfs-team, xzhou, yoyang |
Version: | 9.3 | Keywords: | Regression, Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-5.14.0-325.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-11-07 08:45:33 UTC | Type: | Bug |
Description
Zhi Li
2023-05-17 13:15:45 UTC
This is the BUG that tripped:

    /*
     * We now pass ATTR_KILL_S*ID to the lower level setattr function so
     * that the function has the ability to reinterpret a mode change
     * that's due to these bits. This adds an implicit restriction that
     * no function will ever call notify_change with both ATTR_MODE and
     * ATTR_KILL_S*ID set.
     */
    if ((ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID)) &&
        (ia_valid & ATTR_MODE))
            BUG();

I suspect this means that nfsd_sanitize_attrs is being called too early. I think we need to do that closer to the end, after everything else has had a chance to set up the ia_valid. Why this started happening with the latest MR though, I'm not sure. I'll have to do some before-and-after comparison. Stay tuned...

Have you seen this happen more than once? Did you happen to collect a vmcore? I'm asking because I don't see a way that we could legitimately get into this situation given the current NFSv3 setattr code in nfsd. Also, when I look at the test log, there is a KASAN warning just after this BUG() was called. It may be fallout from the BUG() call itself, but it makes me wonder if we got into a situation where this was affected by some sort of memory corruption.

Never mind! I think I see what probably happened. It turns out that notify_change can alter the ia_valid field. Now that we can end up retrying that call due to a delegation, we need to account for that. I'll need to run this by the maintainer upstream to see what he wants to do. Stay tuned!

notify_change() has modified the passed-in attributes since before the git era. That seems like a brittle API. It would be nicer if that parameter were const. Maybe we can propose that to the VFS maintainers. Barring acceptance of that idea, I guess __nfsd_setattr has to save a copy of size_attr, and restore that copy if it has to retry.

Err. OK, it is not __nfsd_setattr() that would save it, it would be nfsd_setattr().

(In reply to Chuck Lever from comment #5)
> Err. OK, it is not __nfsd_setattr() that would save it, it would be
> nfsd_setattr().

I took a quick look at the other callers and I don't see any others that use the structure for anything after the call. We probably could make it const, but we'd likely just need to clone the iattr in notify_change itself. Several of the functions it calls rely on the alterations it makes. For now, I think we ought to just make a copy in nfsd_setattr. It's not a super high performance codepath anyway, and it's only 80 bytes on my machine. I'm testing that now and will send it out soon (assuming it works).

(In reply to Jeff Layton from comment #2)
> Have you seen this happen more than once? Did you happen to collect a vmcore?

This issue has been triggered multiple times. Here are the vmcore files.

vmcore files:
http://fs-qe.usersys.redhat.com/ftp/vmcore/yieli/5.14.0-313.el9.aarch64%2Bdebug/7860002/hpe-apollo-cn99xx-14-vm-27.khw4.lab.eng.bos.redhat.com/10.19.241.55-2023-05-17-20%3A28%3A07/vmcore
http://fs-qe.usersys.redhat.com/ftp/vmcore/yieli/5.14.0-313.el9.aarch64%2Bdebug/7860002/hpe-apollo-cn99xx-14-vm-27.khw4.lab.eng.bos.redhat.com/10.19.241.55-2023-05-17-20%3A28%3A07/vmcore-dmesg.txt

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583
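
The failure mode discussed in the comments (a callee that rewrites ia_valid, followed by a retry with the same structure) can be illustrated with a small user-space model. This is only a sketch, not the kernel code or the actual patch: `model_notify_change`, `model_iattr`, `model_inode`, `setattr_retry_shared`, `setattr_retry_copy`, `delegation_breaks_left`, and `ERR_WOULD_BUG` are hypothetical names invented for illustration, the flag values are assumed to mirror the kernel headers, and the "delegation" is reduced to a counter that makes the first call return -EAGAIN.

```c
#include <errno.h>

/* Assumed to mirror include/linux/fs.h; treated as illustrative here. */
#define ATTR_MODE      (1u << 0)
#define ATTR_KILL_SUID (1u << 11)
#define ATTR_KILL_SGID (1u << 12)
#define MODE_SUID      04000u    /* setuid bit, as in S_ISUID */

/* Hypothetical return code standing in for the BUG() in fs/attr.c. */
#define ERR_WOULD_BUG  (-1000)

struct model_iattr {
    unsigned int ia_valid;
    unsigned int ia_mode;
};

struct model_inode {
    unsigned int i_mode;
    int delegation_breaks_left;  /* number of calls that fail with -EAGAIN */
};

/*
 * Simplified stand-in for notify_change(): it enforces the flag
 * restriction, then rewrites the caller's ia_valid in place before
 * it may fail with -EAGAIN.
 */
static int model_notify_change(struct model_inode *inode,
                               struct model_iattr *attr)
{
    if ((attr->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) &&
        (attr->ia_valid & ATTR_MODE))
        return ERR_WOULD_BUG;

    /* Reinterpret ATTR_KILL_SUID as a mode change (mutates *attr). */
    if ((attr->ia_valid & ATTR_KILL_SUID) && (inode->i_mode & MODE_SUID)) {
        attr->ia_valid |= ATTR_MODE;
        attr->ia_mode = inode->i_mode & ~MODE_SUID;
    }

    if (inode->delegation_breaks_left > 0) {
        inode->delegation_breaks_left--;
        return -EAGAIN;
    }

    if (attr->ia_valid & ATTR_MODE)
        inode->i_mode = attr->ia_mode;
    return 0;
}

/* Problem pattern: the retry reuses the structure the first call altered. */
static int setattr_retry_shared(struct model_inode *inode,
                                struct model_iattr *iap)
{
    int err, retries = 1;

    for (;;) {
        err = model_notify_change(inode, iap);
        if (err != -EAGAIN || !retries--)
            break;
    }
    return err;
}

/* Proposed pattern: every attempt works on a fresh copy of the attributes. */
static int setattr_retry_copy(struct model_inode *inode,
                              const struct model_iattr *iap)
{
    int err, retries = 1;

    for (;;) {
        struct model_iattr attrs = *iap;  /* pristine copy per attempt */

        err = model_notify_change(inode, &attrs);
        if (err != -EAGAIN || !retries--)
            break;
    }
    return err;
}
```

In this model, calling `setattr_retry_shared` with only ATTR_KILL_SUID set on a setuid inode hits the restriction on the second attempt, because the first attempt left ATTR_MODE behind in ia_valid. `setattr_retry_copy` succeeds and leaves the caller's structure untouched, at the cost of one small structure copy per attempt.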