Description of problem: % ls nftw_dir_tree.c nftw_test.sh % cat nftw_test.sh #!/bin/sh mkdir dir touch dir/a dir/b ln -s a dir/sl ln -s x dir/dsl mkdir dir/sub touch dir/sub/x mkdir dir/sub2 chmod 0 dir/sub2 gcc -o nftw_test nftw_dir_tree.c ./nftw_test ./dir % sh ./nftw_test.sh d D 6425638 dir d D 6556330 sub - F 6556331 x - F 6425639 sl - F 6425659 b - SLN 6425659 dsl d DNR 6556332 sub2 - F 6425639 a % The dsl is a dangling symlink here, but nftw() return a regular file's inode, 6425659 of b, for it. Please see nftw_dir_tree.c for details Version-Release number of selected component (if applicable): glibc-2.23.1-11.fc24.x86_64 How reproducible: always Steps to Reproduce: 1. run the reproducer on fedora 24 2. 3. Actual results: return another file's inode for dangling symlink. Expected results: return the inode of dangling symlink itself. Additional info:
Created attachment 1250772 [details] the reproducer
I've reproduced what you see. While it's not explicit in the nftw() man page, if you don't call nftw with FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. It returns FTW_SLN for those cases, and the stat data passed is not initialized. There are a couple of ways we could make this more clear and/or useful, but for compatibility and correctness you must not read the stat data if the type is FTW_SLN. Had you passed FTW_PHYS, nftw() would have used lstat() instead, which returns the stat data for the symbolic link itself (er, even for symlinks pointing to valid existing files).
(In reply to DJ Delorie from comment #4) > I've reproduced what you see. > > While it's not explicit in the nftw() man page, if you don't call nftw with > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. > It returns FTW_SLN for those cases, and the stat data passed is not > initialized. > If it knew it's an dangling symlink, why not returns the stat data of the symblink itself. After all, it has known it's a dangling symblink. > There are a couple of ways we could make this more clear and/or useful, but > for compatibility and correctness you must not read the stat data if the > type is FTW_SLN. > > Had you passed FTW_PHYS, nftw() would have used lstat() instead, which > returns the stat data for the symbolic link itself (er, even for symlinks > pointing to valid existing files).
(In reply to DJ Delorie from comment #4) > I've reproduced what you see. > > While it's not explicit in the nftw() man page, if you don't call nftw with > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. > It returns FTW_SLN for those cases, and the stat data passed is not > initialized. > > There are a couple of ways we could make this more clear and/or useful, but > for compatibility and correctness you must not read the stat data if the > type is FTW_SLN. Agreed, and the example code in the man page is wrong. static int display_info(const char *fpath, const struct stat *sb, int tflag, struct FTW *ftwbuf) { printf("%-3s %2d %7jd %-40s %d %s\n", (tflag == FTW_D) ? "d" : (tflag == FTW_DNR) ? "dnr" : (tflag == FTW_DP) ? "dp" : (tflag == FTW_F) ? "f" : (tflag == FTW_NS) ? "ns" : (tflag == FTW_SL) ? "sl" : (tflag == FTW_SLN) ? "sln" : "???", ftwbuf->level, (intmax_t) sb->st_size, fpath, ftwbuf->base, fpath + ftwbuf->base); return 0; /* To tell nftw() to continue */ } It should: if (tflag == FTW_SLN) { /* Print a minimal amount of data. */ } else { /* Print data including sb info. */ } > Had you passed FTW_PHYS, nftw() would have used lstat() instead, which > returns the stat data for the symbolic link itself (er, even for symlinks > pointing to valid existing files). Could you please submit a patch to the upstream linux man pages project to get this fixed? The glibc team routine interacts with Michael Kerrisk to fix these kinds of issue. https://www.kernel.org/doc/man-pages/contributing.html Could you also please submit a patch to the upstream glibc manual (manual/filesys.texi) to adjust the wording of FTW_SLN to indicate that the values in struct stat are undefined in this case and should not be read. TO: me on both please and I'll review upstream.
(In reply to han pingtian from comment #5) > (In reply to DJ Delorie from comment #4) > > I've reproduced what you see. > > > > While it's not explicit in the nftw() man page, if you don't call nftw with > > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. > > It returns FTW_SLN for those cases, and the stat data passed is not > > initialized. > > > If it knew it's an dangling symlink, why not returns the stat data of the > symblink itself. After all, it has known it's a dangling symblink. Without FTW_PHYS the symlink _must_ be followed, and if the symlink target doesn't exist then there is no stat-related data to be returned. If you want a mixed hybrid API somewhere between FTW_PHYS and not-FTW_PHYS then this needs to be discussed usptream. I think it's easy enough to just lstat() yourself on the broken symlink if that's what you need. Requiring the implementation to do the extra stat() will slow down performance for everyone that doesn't need and didn't ask for that information. In summary, having DJ adjust the documentation is the best way forward.
(In reply to Carlos O'Donell from comment #7) > (In reply to han pingtian from comment #5) > > (In reply to DJ Delorie from comment #4) > > > I've reproduced what you see. > > > > > > While it's not explicit in the nftw() man page, if you don't call nftw with > > > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. > > > It returns FTW_SLN for those cases, and the stat data passed is not > > > initialized. > > > > > If it knew it's an dangling symlink, why not returns the stat data of the > > symblink itself. After all, it has known it's a dangling symblink. ((Yes.) > Without FTW_PHYS the symlink _must_ be followed, Why? (See below.) > and if the symlink target > doesn't exist then there is no stat-related data to be returned. If you want > a mixed hybrid API somewhere between FTW_PHYS and not-FTW_PHYS then this > needs to be discussed usptream. I think it's easy enough to just lstat() > yourself on the broken symlink if that's what you need. Requiring the > implementation to do the extra stat() will slow down performance for > everyone that doesn't need and didn't ask for that information. > > In summary, having DJ adjust the documentation is the best way forward. When I received the man-pages patch, I very nearly applied it, but something niggled... I believe this is actually a (very longstanding) glibc bug. Here is what POSIX says for nftw(): FTW_NS The stat() function failed on the object because of lack of appropriate permission. The stat buffer passed to fn is undefined. Failure of stat() for any other reason is considered an error and nftw() shall return −1. .... FTW_SLN The object is a symbolic link that does not name an existing file. (This condition shall only occur if the FTW_PHYS flag is not included in flags.) Note that POSIX explicitly says that the stat buffer is undefined for FTW_NS, but makes no such statement for FTW_SLN, with the implication that the stat buffer is valid in this case. This implies that FTW_SLN should work as Han Pingtian suggested: for a dangling symlink, the lstat() information on the link should be returned. This is certainly how I always understood things should work. (But, obviously, I never tested this on glibc.) So, what do other implementations do? Every other implementation that I looked at, does return the lstat() information for the dangling symlink. I looked at Solaris, OpenBSD, FreeBSD, and musl. All of this strongly suggests that glibc got it wrong. For the points below, I used the following test program (and yes, I realize by now that the FTW_NS treatment in this code is not correct; I've fixed the man page already). 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- /*#* t_nftw.c Copyright Michael Kerrisk 2000 Demonstrate the use of the nftw(3) function. */ #define _GNU_SOURCE #define _XOPEN_SOURCE 500 #include <ftw.h> #include <stdio.h> #include <stdlib.h> #include <string.h> static int displayFileInfo(const char *fpath, const struct stat *sb, int tflag, struct FTW *ftwbuf) { printf("%-3s %2d %7lld %-30s %d %s (st_ino: %ld)\n", (tflag == FTW_D) ? "d" : (tflag == FTW_DNR) ? "dnr" : (tflag == FTW_DP) ? "dp" : (tflag == FTW_F) ? "f" : (tflag == FTW_NS) ? "ns" : (tflag == FTW_SL) ? "sl" : (tflag == FTW_SLN) ? "sln" : "???", ftwbuf->level, (long long) sb->st_size, fpath, ftwbuf->base, fpath + ftwbuf->base, (long) sb->st_ino); memset((void *) sb, 0, sizeof(struct stat)); return 0; /* To tell nftw() to continue */ } int main(int argc, char *argv[]) { int flags = 0; if (argc > 2 && strchr(argv[2], 'd') != NULL) flags |= FTW_DEPTH; if (argc > 2 && strchr(argv[2], 'p') != NULL) flags |= FTW_PHYS; if (nftw((argc < 2) ? "." : argv[1], displayFileInfo, 20, flags) == -1) { perror("nftw"); exit(EXIT_FAILURE); } exit(EXIT_SUCCESS); } 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- Solaris (Illumos source code) usr/src/lib/libc/port/gen/nftw.c: The following code causes the stat buffer to be populated with lstat() infor in the FTW_SLN case: } else { /* * Statf has failed. If stat was used instead of lstat, * try using lstat. If lstat doesn't fail, "comp" * must be a symbolic link pointing to a non-existent * file. Such a symbolic link should be ignored. * Also check the file type, if possible, for symbolic * link. */ if ((vp->statf == stat) && (lstat(comp, &statb) >= 0) && ((statb.st_mode & S_IFMT) == S_IFLNK)) { /* * Ignore bad symbolic link, let "fn" * report it. */ errno = ENOENT; type = FTW_SLN; } else { type = FTW_NS; fail: Testing shows that the link info *is* returned in the stat structure: $ ls -li t total 4 45068 -rw-r--r-- 1 mtk csw 29 Feb 24 04:28 f 45067 lrwxrwxrwx 1 mtk csw 6 Feb 24 04:28 my_sln -> ssssss 45069 lrwxrwxrwx 1 mtk csw 1 Feb 24 04:28 sl_f -> f $ ./a.out t d 0 5 t 0 t (st_ino: 45066) sln 1 6 t/my_sln 2 my_sln (st_ino: 45067) f 1 29 t/f 2 f (st_ino: 45068) f 1 29 t/sl_f 2 sl_f (st_ino: 45068) ====== OpenBSD I didn't look at the source code, but the test gives the same results as Solaris: -bash-4.3$ ls -li total 4 4693795 -rw-r--r-- 1 mtk mtk 29 Feb 24 04:37 f 4693796 lrwxr-xr-x 1 mtk mtk 1 Feb 24 04:37 sl_f -> f 4693797 lrwxr-xr-x 1 mtk mtk 11 Feb 24 04:37 sln -> jajdhfdskjh -bash-4.3$ ./a.out t d 0 512 t 0 t (st_ino: 4693794) f 1 29 t/f 2 f (st_ino: 4693795) f 1 29 t/sl_f 2 sl_f (st_ino: 4693795) sln 1 11 t/sln 2 sln (st_ino: 4693797) ===== FreeBSD I don't have access to a FreeBSD test system at the moment, nut my reading of the source code is that id delivers the same results as Solaris and OpenBSD See lib/libc/gen/nftw.c, where FTS_SLN is implemented using the FTS_SLNONE option, and fts(3) on that system says: FTS_SLNONE A symbolic link with a nonexistent target. The contents of the fts_statp field reference the file characteristic information for the sym‐ bolic link itself. ===== musl libc src/misc/nftw.c if ((flags & FTW_PHYS) ? lstat(path, &st) : stat(path, &st) < 0) { if (!(flags & FTW_PHYS) && errno==ENOENT && !lstat(path, &st)) type = FTW_SLN; else if (errno != EACCES) return -1; else type = FTW_NS; } else if (S_ISDIR(st.st_mode)) { if (access(path, R_OK) < 0) type = FTW_DNR; else if (flags & FTW_DEPTH) type = FTW_DP; else type = FTW_D; } else if (S_ISLNK(st.st_mode)) { if (flags & FTW_PHYS) type = FTW_SL; else type = FTW_SLN; } else { type = FTW_F; }
(In reply to Michael Kerrisk from comment #8) > Note that POSIX explicitly says that the stat buffer is undefined for > FTW_NS, but makes no such statement for FTW_SLN, with the implication that > the stat buffer is valid in this case. > > This implies that FTW_SLN should work as Han Pingtian suggested: for a > dangling symlink, the lstat() information on the link should be returned. > This is certainly how I always understood things should work. (But, > obviously, I never tested this on glibc.) The fact that you had to say "implies that" and that at least two different implementations exist and two distinct understandings means that we need to clarify this at the POSIX level. Thanks to DJ for filling the issue upstream with the Austin Group: http://austingroupbugs.net/view.php?id=1121
(In reply to Carlos O'Donell from comment #9) > The fact that you had to say "implies that" and that at least two different > implementations exist and two distinct understandings means that we need to > clarify this at the POSIX level. Actually, I think there are 3 distinct understandings: 1. FTW_PHYS=0 prohibits lstat(). When FTW_PHYS is clear, nftw() must not call lstat(), passing FTW_SLN and an undefined stat buffer. This is the current glibc implementation. 2. FTW_SLN must be returned for all dangling symlinks. When FTW_PHYS is clear, nftw() calls lstat() on dangling symlinks, passing FTW_SLN regardless of its return code. The stat buffer will be defined if lstat() succeeded or undefined if it failed. 3. Undefined stat buffers happen only on FTW_NS. When FTW_PHYS is clear, nftw() calls lstat() on dangling symlinks. If the call succeeds, it passes FTW_SLN and a defined stat buffer. If not, it passes FTW_NS and an undefined stat buffer. According to comment #8, this has been adopted by musl and Solaris.
(In reply to Carlos O'Donell from comment #9) > (In reply to Michael Kerrisk from comment #8) > > Note that POSIX explicitly says that the stat buffer is undefined for > > FTW_NS, but makes no such statement for FTW_SLN, with the implication that > > the stat buffer is valid in this case. > > > > This implies that FTW_SLN should work as Han Pingtian suggested: for a > > dangling symlink, the lstat() information on the link should be returned. > > This is certainly how I always understood things should work. (But, > > obviously, I never tested this on glibc.) On that last, parenthesized sentence, I spoke too soon. > The fact that you had to say "implies that" Carlos, I did not say at the time, but coming back to this again, I'm reminded: this sentence felt a bit like you were playing language games with me (my apologies if I've misinterpreted), and I didn't quite like it. I didn't "have to" say anything. I try (but sometimes fail) to be cautious about making absolute pronouncements about interpretations and that's what I was doing here. > and that at least two different > implementations exist Well, that much is clear :-) > and two distinct understandings means that we need to > clarify this at the POSIX level. (Well, I would rather characterize it as one understanding and an accident; see below.) > Thanks to DJ for filling the issue upstream with the Austin Group: > http://austingroupbugs.net/view.php?id=1121 So, there was one more thing that still niggled me in this. I also describe nftw() in my book, where I documented FTW_NS and FTW_SLN. In the former case, I noted that the stat buffer argument of the callback function was undefined, but did not have such a statement for FTW_SLN. And (notwithstanding my earlier statement) it feels like this is something I would have tested. An error report against my book that I received a few days ago shows me that I *did* test it. I actually have a page in my book showing the test output, and it's evident from that output that in the FTW_SLN case, that stat buffer for FTW_SLN was defined to include information about the symlink. (The reader error report was to point out that the output they obtained was not consistent with the output that appeared in my book (on page 362 if you happen to have a copy handy).) In the original implementation, presumably written by Roland, glibc's nftw() did behave like everyone else's implementation. In the glibc 2.3.6 io/ftw.c process_entry() code, we find: if (((data->flags & FTW_PHYS) ? LXSTAT (_STAT_VER, name, &st) : XSTAT (_STAT_VER, name, &st)) < 0) { if (errno != EACCES && errno != ENOENT) result = -1; else if (!(data->flags & FTW_PHYS) && LXSTAT (_STAT_VER, name, &st) == 0 && S_ISLNK (st.st_mode)) flag = FTW_SLN; else flag = FTW_NS; } So, if FTW_PHYS was not set, use stat() on the path. If that fails (because of a dangling symlink, for example), then try lstat() on the path and check if the result is symlink; if so, emit FTW_SLN. In glibc 2.4 things changed to the situation we currently have. It's not quite clear to me where the change occurred. Probably commit bb54908866986dc1578e3077b323bfc1cc5aecba. It *appears* to be an unintended regression, since the changelog message make no mention of modifying the behavior of FTW_SLN. So, I refine my earlier statement :-). Glibc used to do the right thing (i.e., what everyone else used to do, and also how I would interpret the standard), but there has been a longstanding regression. I do think the implementation should be fixed.
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.
The Austin Group has reviewed the issue, and is proposing this interpretation: The standard clearly states that the second argument to fn() contains "information on the object", and conforming implementations must conform to this. Rationale: ------------- The second bullet item says "The second argument is a pointer to the stat buffer containing information on the object". The description of FTW_SLN says "The object is a symbolic link that does not name an existing file". These two things together require that for FTW_SLN, the stat buffer contains information about the symbolic link (which is "the object"). Where the standard states that when FTW_PHYS is clear, symbolic links are followed instead of being reported, naturally they are not followed if the target does not exist and the FTW_SNL constant alone constitutes reporting the symbolic link. ==== The interpretation is subject to further review, and comments can be filed against this at http://austingroupbugs.net/view.php?id=1121
This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle. Changing version to '30.