Bug 2066199

Summary: Autofs mounts with --ghost or browse_mode=yes enabled, triggers a mount or shows error "ls: cannot access 'XXXX': No such file or directory" when ls -l is run [rhel-9.1]
Product: Red Hat Enterprise Linux 9 Reporter: Kamil Dudka <kdudka>
Component: coreutils Assignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA QA Contact: Radka Brychtova <rskvaril>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.0 CC: kdudka, rskvaril
Target Milestone: beta Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: coreutils-8.32-32.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2044981 Environment:
Last Closed: 2022-11-15 11:20:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2044981    

Description Kamil Dudka 2022-03-21 08:13:48 UTC
+++ This bug was initially created as a clone of Bug #2044981 +++

Description of problem:

With --ghost or browse_mode=yes enabled, running ls -l in an autofs-managed directory either triggers a mount or shows the error "ls: cannot access 'XXXX': No such file or directory".

Errors are reported for mount points that we know are inaccessible to this client, and
a mount is triggered for the accessible ones.


Version-Release number of selected component (if applicable):
autofs-5.1.4-74.el8.x86_64
coreutils-8.30-12.el8.x86_64

(however, I am starting the bug with autofs as affected component as discussed with Ian)


How reproducible:

Always


Steps to Reproduce:

1. Upgrade to RHEL 8.5 (which should have autofs-5.1.4-74.el8.x86_64 and coreutils-8.30-12.el8.x86_64)
2. Create an autofs map :
~~~
[root@rsablerhel85 mnt2]# grep -i mnt /etc/auto.master
/mnt2 /etc/auto.indirect timeout=600,bg,tcp,hard,vers=3,rsize=32768,wsize=32768,timeo=600,retrans=6

[root@rsablerhel85 mnt2]# cat /etc/auto.indirect 
testshare rsable76server:/testshare               <<<<< testshare is a valid export from server
testshare2 rsable76server:/testshare2             <<<<< testshare2 is not available to this client or could be a bogus entry
~~~
3. Either use --ghost in auto.master as an option or set browse_mode=yes :
~~~
[root@rsablerhel85 mnt2]# grep -i browse /etc/autofs.conf 
# browse_mode - maps are browsable by default.
browse_mode = yes
~~~
4. cd to /mnt2 and run ls -l (or ll).

Note : this issue occurs irrespective of direct or indirect maps.


Actual results:

Mount is triggered and ll throws ENOENT for testshare2
~~~
[root@rsablerhel85 mnt2]# ll
ls: cannot access 'testshare2': No such file or directory     <<<<< Error
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare             <<<<< mount is triggered for testshare
d?????????? ? ?    ?     ?            ? testshare2            <<<<< Path we know that is inaccessible throws an error

[root@rsablerhel85 mnt2]# mount | grep -i test
rsable76server:/testshare on /mnt2/testshare type nfs (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=192.168.122.58,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=192.168.122.58)
~~~


Expected results:
A mount should not be triggered, and the error "ls: cannot access 'testshare2': No such file or directory"
should not be seen.


Additional info:

I think the issue is caused by a behavior change in coreutils-common-8.30-12.el8.
After reverting to coreutils-common-8.30-8.el8, the issue goes away:
~~~
[root@rsablerhel85 mnt2]# ll
ls: cannot access 'testshare2': No such file or directory
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare
d?????????? ? ?    ?     ?            ? testshare2

[root@rsablerhel85 mnt2]# dnf downgrade coreutils-8.30-8.el8.x86_64
Downgraded:
  coreutils-8.30-8.el8.x86_64                                     coreutils-common-8.30-8.el8.x86_64                                    

Complete!
[root@rsablerhel85 mnt2]# ll
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare
drwxr-xr-x. 2 root root  0 Jan 21 11:47 testshare2
~~~

I can see that coreutils-common-8.30-12.el8 calls statx while coreutils-common-8.30-8.el8 calls lstat :
~~~
coreutils-8.30-12
3181  12:02:13.828462 getdents64(3, [{d_ino=27279, d_off=1, d_reclen=24, d_type=DT_DIR, d_name="."}, {d_ino=27279, d_off=2, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=27281, d_off=3, d_reclen=32, d_type=DT_DIR, d_name="testshare"}, {d_ino=27280, d_off=4, d_reclen=32, d_type=DT_DIR, d_name="testshare2"}], 32768) = 112 <0.000018>
3181  12:02:14.033318 statx(AT_FDCWD, "testshare2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, 0x7ffc0d6f1c60) = -1 ENOENT (No such file or directory) <0.035781>
~~~
~~~
coreutils-8.30-8
2854  12:01:11.302926 getdents64(3, [{d_ino=27279, d_off=1, d_reclen=24, d_type=DT_DIR, d_name="."}, {d_ino=27279, d_off=2, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=27281, d_off=3, d_reclen=32, d_type=DT_DIR, d_name="testshare"}, {d_ino=27280, d_off=4, d_reclen=32, d_type=DT_DIR, d_name="testshare2"}], 32768) = 112 <0.000027>
2854  12:01:11.311912 lstat("testshare2", {st_dev=makedev(0, 0x31), st_ino=27280, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_atime_nsec=580732805, st_mtime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_mtime_nsec=580732805, st_ctime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_ctime_nsec=580732805}) = 0 <0.000030>
~~~

It seems to me that coreutils-8.30-12, and by extension its statx call, does not pass the AT_NO_AUTOMOUNT flag during this operation.
Looking a bit further, vfs_lstat appears to be just a wrapper around vfs_statx that explicitly sets AT_NO_AUTOMOUNT:
~~~
static inline int vfs_lstat(const char __user *name, struct kstat *stat)
{
        return vfs_statx(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT,
                         stat, STATX_BASIC_STATS);
}
~~~
So it may just be a question of why the statx call does not pass AT_NO_AUTOMOUNT as a flag, unless I am wrong about the last few points.
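
To make the comparison concrete, here is a minimal userspace sketch of the two call variants. It assumes glibc 2.28 or later (which provides the statx() wrapper) and is not the actual coreutils code or patch; the helper names ls_statx_flags and statx_entry are hypothetical. The request mask is copied from the strace output above.

~~~
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW, AT_NO_AUTOMOUNT */
#include <stddef.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_* */

/* Hypothetical helper: the flags seen in the strace of coreutils-8.30-12,
 * optionally extended with AT_NO_AUTOMOUNT as vfs_lstat does in the kernel. */
int ls_statx_flags(int no_automount)
{
    int flags = AT_STATX_SYNC_AS_STAT | AT_SYMLINK_NOFOLLOW;

    if (no_automount)
        flags |= AT_NO_AUTOMOUNT;
    return flags;
}

/* Hypothetical helper: query one directory entry relative to the current
 * working directory. Returns 0 on success, -1 with errno set on failure. */
int statx_entry(const char *name, int no_automount, struct statx *out)
{
    /* Same fields that ls -l requested in the strace above. */
    unsigned int mask = STATX_MODE | STATX_NLINK | STATX_UID | STATX_GID
                      | STATX_MTIME | STATX_SIZE;

    assert(name != NULL && out != NULL);
    return statx(AT_FDCWD, name, ls_statx_flags(no_automount), mask, out);
}
~~~

Run from /mnt2, statx_entry("testshare2", 0, &stx) corresponds to the call that returned ENOENT in the strace above, and statx_entry("testshare2", 1, &stx) is the variant I would expect to behave like the old lstat call. I have not verified the second variant on the affected system.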


--- Additional comment from Kamil Dudka on 2022-03-21 08:43:08 CET ---

upstream commits:
https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2
https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-178-g92cb8427c53

--- Additional comment from Kamil Dudka on 2022-03-21 09:10:16 CET ---

Fedora commits:
https://src.fedoraproject.org/rpms/coreutils/c/1f1987452485e60c346627502e25d763b4ec77f9?branch=rawhide
https://src.fedoraproject.org/rpms/coreutils/c/1f1987452485e60c346627502e25d763b4ec77f9?branch=f36
https://src.fedoraproject.org/rpms/coreutils/c/d736cafa20f13eeb037a3950bdbb4b63dc39b7e3?branch=f35
https://src.fedoraproject.org/rpms/coreutils/c/0a82158b717f3377ab68b28ebe5cd30255203c52?branch=f34

Comment 2 Kamil Dudka 2022-05-30 07:41:20 UTC
CentOS Stream merge request:
https://gitlab.com/redhat/centos-stream/rpms/coreutils/-/merge_requests/9

Comment 10 errata-xmlrpc 2022-11-15 11:20:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (coreutils bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8354