Bug 1976829
Summary: | [RHEL-9.0] dd failed as "Permission denied" on nfs mount point with exports as root_squash and set Sticky bit | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Yongcheng Yang <yoyang> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
kernel sub component: | NFS | QA Contact: | Yongcheng Yang <yoyang> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | unspecified | | |
Priority: | unspecified | CC: | bcodding, jlayton, smayhew, steved, xzhou |
Version: | 9.0 | Keywords: | Reopened, Reproducer, Triaged |
Target Milestone: | beta | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-11-01 07:29:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Yongcheng Yang
2021-06-28 10:37:26 UTC
For what it's worth, I tried with recent upstream (df04fbe8680b) and couldn't reproduce.

(In reply to J. Bruce Fields from comment #1)
> For what it's worth, I tried with recent upstream (df04fbe8680b) and
> couldn't reproduce.

Thanks for the information. I have also checked with a (self-built) 5.14.0-0.rc1.15.bx.el9 kernel and the problem is resolved there. Closing this one for now.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

I have newly found that this is related to the sticky bit on the NFS export. As this passes on RHEL 8 but has failed since RHEL 9 (see the attached reproducer), I'm re-opening this to get more information/investigation.

I can reproduce this. What I see is that the client is sending a GETATTR for the containing directory, but nothing else. There are no errors returned by the server in this reproducer; it all seems to be happening on the client. Here's what the tracepoints show:

    dd-1460 [010] ..... 293.790460: nfs_access_enter: fileid=00:32:128 fhandle=0xba5019a4 version=1781172066073452463
    dd-1460 [010] ..... 293.790467: nfs_access_exit: error=-10 (CHILD) fileid=00:32:128 fhandle=0xba5019a4 type=4 (DIR) version=1781172066073452463 size=40 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET) mask=0x81 permitted=0xffffffff
    dd-1460 [010] ..... 293.790468: nfs_access_enter: fileid=00:32:128 fhandle=0xba5019a4 version=1781172066073452463
    dd-1460 [010] ..... 293.790470: nfs_revalidate_inode_enter: fileid=00:32:128 fhandle=0xba5019a4 version=1781172066073452463
    dd-1460 [010] ..... 293.790485: nfs4_setup_sequence: session=0xee461a67 slot_nr=0 seq_nr=27 highest_used_slotid=0
    dd-1460 [010] ..... 293.791298: nfs4_map_name_to_uid: error=0 (OK) id=0 name=0
    dd-1460 [010] ..... 293.791301: nfs4_map_group_to_gid: error=0 (OK) id=0 name=0
    dd-1460 [010] ..... 293.791304: nfs4_sequence_done: error=0 (OK) session=0xee461a67 slot_nr=0 seq_nr=27 highest_slotid=29 target_highest_slotid=29 status_flags=0x0 ()
    dd-1460 [010] ..... 293.791315: nfs4_getattr: error=0 (OK) fileid=00:32:128 fhandle=0xba5019a4 valid=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE|0x400200
    dd-1460 [010] ...1. 293.791318: nfs_refresh_inode_enter: fileid=00:32:128 fhandle=0xba5019a4 version=1781172066073452463
    dd-1460 [010] ...1. 293.791321: nfs_set_cache_invalid: error=0 (OK) fileid=00:32:128 fhandle=0xba5019a4 type=4 (DIR) version=1781172066073452463 size=40 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET)
    dd-1460 [010] ...1. 293.791322: nfs_refresh_inode_exit: error=0 (OK) fileid=00:32:128 fhandle=0xba5019a4 type=4 (DIR) version=1781172066073452463 size=40 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET)
    dd-1460 [010] ..... 293.791324: nfs_revalidate_inode_exit: error=0 (OK) fileid=00:32:128 fhandle=0xba5019a4 type=4 (DIR) version=1781172066073452463 size=40 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET)
    dd-1460 [010] ..... 293.791326: nfs_access_exit: error=0 (OK) fileid=00:32:128 fhandle=0xba5019a4 type=4 (DIR) version=1781172066073452463 size=40 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET) mask=0x1 permitted=0x7

I'm still looking over this, but it looks like the problem is down in the client's cached open code.

The problem seems to be due to may_create_in_sticky returning -EACCES during the pathwalk. Basically, we do the lookup of the parent and then issue an atomic open. That creates the file and returns its attributes. The file has an owner:group of "nobody:nobody" because of the root squashing.

After that, the VFS calls do_open. That eventually calls may_create_in_sticky, which rejects the open because it falls afoul of these checks:

    uid_eq(i_uid_into_mnt(mnt_userns, inode), dir_uid) ||
    uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode)))

The uids are not equal. One is root and the other is "nobody".
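The uid comparisons quoted above can be illustrated with a small userspace model. This is only a sketch and not the kernel code: it loosely follows the logic of may_create_in_sticky for regular files and ignores FIFOs and idmapped mounts. The Python signature, the uid 65534 for "nobody", the root-owned directory, and the directory mode 1777 are assumptions chosen for illustration.

```python
import errno
import stat

ROOT_UID = 0
NOBODY_UID = 65534  # assumption: the uid that root is squashed to on the server


def may_create_in_sticky(dir_mode, dir_uid, file_uid, fsuid, protected_regular=1):
    """Simplified model of the sticky-directory check for a regular file.

    Returns 0 when the open is allowed and -EACCES when it is refused.
    """
    # The open is allowed when the protection is off, the directory is not
    # sticky, the file belongs to the directory owner, or the caller owns it.
    if (protected_regular == 0
            or not (dir_mode & stat.S_ISVTX)
            or file_uid == dir_uid
            or fsuid == file_uid):
        return 0
    # Otherwise refuse in world-writable sticky directories, and in
    # group-writable ones when the sysctl is set to 2 or higher.
    if (dir_mode & 0o002) or ((dir_mode & 0o020) and protected_regular >= 2):
        return -errno.EACCES
    return 0


if __name__ == "__main__":
    # root on the client creates a file that the server stores as "nobody".
    result = may_create_in_sticky(0o1777, ROOT_UID, NOBODY_UID, ROOT_UID)
    print("refused" if result == -errno.EACCES else "allowed")
```

With these assumed values, neither comparison matches (the file owner is "nobody" while both the directory owner and the caller are root), so the model refuses the open, which corresponds to the "Permission denied" that dd reports.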
Worse, the file still ends up being created -- we just can't write to it. Doing this works around the problem:

    # echo 0 > /proc/sys/fs/protected_regular

I think I'm going to need to take this upstream, as I'm not clear on what the right fix is.

I proposed a patch and went through a couple of different iterations of it. The latest one is here:

https://lore.kernel.org/linux-nfs/20220727140014.69091-1-jlayton@kernel.org/

At this point, I'm waiting for Al to (hopefully) take this patch in.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
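Since the workaround changes /proc/sys/fs/protected_regular for the whole system, it can help to check the current setting first. The following is a minimal sketch; the helper name is hypothetical, and the descriptions of the values 0, 1, and 2 paraphrase the kernel's sysctl documentation for fs.protected_regular rather than anything stated in this report.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/fs/protected_regular")

# Paraphrased meanings of the sysctl values (assumption: as documented
# for fs.protected_regular in the kernel's admin guide).
MEANINGS = {
    0: "unrestricted: no sticky-directory check on regular files",
    1: "restricted in world-writable sticky directories",
    2: "restricted in world-writable and group-writable sticky directories",
}


def describe_protected_regular(raw):
    """Turn the raw contents of the sysctl file into a short description."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unknown value {value}")


if __name__ == "__main__":
    if SYSCTL.exists():
        print(describe_protected_regular(SYSCTL.read_text()))
    else:
        print("fs.protected_regular is not available on this system")
```

Reading the file needs no special privileges, while writing 0 to it, as in the workaround, requires root and disables the check for every sticky directory on the client.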