Bug 1422736 - nftw() doesn't return dangling symlink's inode
Summary: nftw() doesn't return dangling symlink's inode
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 30
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: DJ Delorie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-16 03:32 UTC by han pingtian
Modified: 2019-02-19 17:11 UTC (History)
11 users (show)

(edit)
Clone Of:
(edit)
Last Closed:


Attachments (Terms of Use)
the reproducer (1.19 KB, application/x-gzip)
2017-02-16 03:36 UTC, han pingtian
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Sourceware 23501 None None None 2019-02-27 20:29 UTC

Description han pingtian 2017-02-16 03:32:51 UTC
Description of problem:

% ls
nftw_dir_tree.c  nftw_test.sh
% cat nftw_test.sh 
#!/bin/sh
mkdir dir
touch dir/a dir/b
ln -s a dir/sl
ln -s x dir/dsl
mkdir dir/sub
touch dir/sub/x
mkdir dir/sub2
chmod 0 dir/sub2

gcc -o nftw_test nftw_dir_tree.c
./nftw_test ./dir

% sh ./nftw_test.sh
d D    6425638  dir
d D    6556330      sub
- F    6556331          x
- F    6425639      sl
- F    6425659      b
- SLN  6425659      dsl
d DNR  6556332      sub2
- F    6425639      a
% 

The dsl is a dangling symlink here, but nftw() return a regular file's inode, 6425659 of b, for it. 
Please see nftw_dir_tree.c for details


Version-Release number of selected component (if applicable):
glibc-2.23.1-11.fc24.x86_64


How reproducible:
always


Steps to Reproduce:
1. run the reproducer on fedora 24
2.
3.

Actual results:

return another file's inode for dangling symlink.

Expected results:

return the inode of dangling symlink itself.

Additional info:

Comment 1 han pingtian 2017-02-16 03:36 UTC
Created attachment 1250772 [details]
the reproducer

Comment 4 DJ Delorie 2017-02-16 22:28:40 UTC
I've reproduced what you see.

While it's not explicit in the nftw() man page, if you don't call nftw with FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails.  It returns FTW_SLN for those cases, and the stat data passed is not initialized.

There are a couple of ways we could make this more clear and/or useful, but for compatibility and correctness you must not read the stat data if the type is FTW_SLN.

Had you passed FTW_PHYS, nftw() would have used lstat() instead, which returns the stat data for the symbolic link itself (er, even for symlinks pointing to valid existing files).

Comment 5 han pingtian 2017-02-17 02:29:51 UTC
(In reply to DJ Delorie from comment #4)
> I've reproduced what you see.
> 
> While it's not explicit in the nftw() man page, if you don't call nftw with
> FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. 
> It returns FTW_SLN for those cases, and the stat data passed is not
> initialized.
> 
If it knew it's an dangling symlink, why not returns the stat data of the symblink itself. After all, it has known it's a dangling symblink.

> There are a couple of ways we could make this more clear and/or useful, but
> for compatibility and correctness you must not read the stat data if the
> type is FTW_SLN.
> 
> Had you passed FTW_PHYS, nftw() would have used lstat() instead, which
> returns the stat data for the symbolic link itself (er, even for symlinks
> pointing to valid existing files).

Comment 6 Carlos O'Donell 2017-02-17 19:04:25 UTC
(In reply to DJ Delorie from comment #4)
> I've reproduced what you see.
> 
> While it's not explicit in the nftw() man page, if you don't call nftw with
> FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. 
> It returns FTW_SLN for those cases, and the stat data passed is not
> initialized.
> 
> There are a couple of ways we could make this more clear and/or useful, but
> for compatibility and correctness you must not read the stat data if the
> type is FTW_SLN.

Agreed, and the example code in the man page is wrong.
 
       static int
       display_info(const char *fpath, const struct stat *sb,
                    int tflag, struct FTW *ftwbuf)
       {
           printf("%-3s %2d %7jd   %-40s %d %s\n",
               (tflag == FTW_D) ?   "d"   : (tflag == FTW_DNR) ? "dnr" :
               (tflag == FTW_DP) ?  "dp"  : (tflag == FTW_F) ?   "f" :
               (tflag == FTW_NS) ?  "ns"  : (tflag == FTW_SL) ?  "sl" :
               (tflag == FTW_SLN) ? "sln" : "???",
               ftwbuf->level, (intmax_t) sb->st_size,
               fpath, ftwbuf->base, fpath + ftwbuf->base);
           return 0;           /* To tell nftw() to continue */
       }

It should:

if (tflag == FTW_SLN)
  {
     /* Print a minimal amount of data.  */
  }
else
  {
     /* Print data including sb info.  */
  }

> Had you passed FTW_PHYS, nftw() would have used lstat() instead, which
> returns the stat data for the symbolic link itself (er, even for symlinks
> pointing to valid existing files).

Could you please submit a patch to the upstream linux man pages project to get this fixed? The glibc team routine interacts with Michael Kerrisk to fix these kinds of issue.

https://www.kernel.org/doc/man-pages/contributing.html

Could you also please submit a patch to the upstream glibc manual (manual/filesys.texi) to adjust the wording of FTW_SLN to indicate that the values in struct stat are undefined in this case and should not be read.

TO: me on both please and I'll review upstream.

Comment 7 Carlos O'Donell 2017-02-17 19:08:44 UTC
(In reply to han pingtian from comment #5)
> (In reply to DJ Delorie from comment #4)
> > I've reproduced what you see.
> > 
> > While it's not explicit in the nftw() man page, if you don't call nftw with
> > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. 
> > It returns FTW_SLN for those cases, and the stat data passed is not
> > initialized.
> > 
> If it knew it's an dangling symlink, why not returns the stat data of the
> symblink itself. After all, it has known it's a dangling symblink.

Without FTW_PHYS the symlink _must_ be followed, and if the symlink target doesn't exist then there is no stat-related data to be returned. If you want a mixed hybrid API somewhere between FTW_PHYS and not-FTW_PHYS then this needs to be discussed usptream. I think it's easy enough to just lstat() yourself on the broken symlink if that's what you need. Requiring the implementation to do the extra stat() will slow down performance for everyone that doesn't need and didn't ask for that information.

In summary, having DJ adjust the documentation is the best way forward.

Comment 8 Michael Kerrisk 2017-02-24 08:55:14 UTC
(In reply to Carlos O'Donell from comment #7)
> (In reply to han pingtian from comment #5)
> > (In reply to DJ Delorie from comment #4)
> > > I've reproduced what you see.
> > > 
> > > While it's not explicit in the nftw() man page, if you don't call nftw with
> > > FTW_PHYS, it only calls stat() on dangling symlinks, which of course fails. 
> > > It returns FTW_SLN for those cases, and the stat data passed is not
> > > initialized.
> > > 
> > If it knew it's an dangling symlink, why not returns the stat data of the
> > symblink itself. After all, it has known it's a dangling symblink.

((Yes.)

> Without FTW_PHYS the symlink _must_ be followed, 

Why? (See below.)

> and if the symlink target
> doesn't exist then there is no stat-related data to be returned. If you want
> a mixed hybrid API somewhere between FTW_PHYS and not-FTW_PHYS then this
> needs to be discussed usptream. I think it's easy enough to just lstat()
> yourself on the broken symlink if that's what you need. Requiring the
> implementation to do the extra stat() will slow down performance for
> everyone that doesn't need and didn't ask for that information.
> 
> In summary, having DJ adjust the documentation is the best way forward.

When I received the man-pages patch, I very nearly applied it, but something niggled...

I believe this is actually a (very longstanding) glibc bug. Here is what POSIX says for nftw():

           FTW_NS    The  stat() function failed on the object because of
                     lack of  appropriate  permission.  The  stat  buffer
                     passed to fn is undefined. Failure of stat() for any
                     other reason is considered an error and nftw() shall
                     return −1.

           ....

           FTW_SLN   The  object is a symbolic link that does not name an
                     existing file.  (This condition shall only occur  if
                     the FTW_PHYS flag is not included in flags.)

Note that POSIX explicitly says that the stat buffer is undefined for FTW_NS, but makes no such statement for FTW_SLN, with the implication that the stat buffer is valid in this case.

This implies that FTW_SLN should work as Han Pingtian suggested: for a dangling symlink, the lstat() information on the link should be returned. This is certainly how I always understood things should work. (But, obviously, I never tested this on glibc.)

So, what do other implementations do? Every other implementation that I looked at, does return the lstat() information for the dangling symlink. I looked at Solaris, OpenBSD, FreeBSD, and musl.  All of this strongly suggests that glibc got it wrong.

For the points below, I used the following test program (and yes, I realize by now that the FTW_NS treatment in this code is not correct; I've fixed the man page already).

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
/*#* t_nftw.c

   Copyright Michael Kerrisk 2000

   Demonstrate the use of the nftw(3) function.
*/
#define _GNU_SOURCE
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


static int
displayFileInfo(const char *fpath, const struct stat *sb,
	        int tflag, struct FTW *ftwbuf)
{
    printf("%-3s %2d %7lld   %-30s %d %s (st_ino: %ld)\n",
    	(tflag == FTW_D) ?   "d"   : (tflag == FTW_DNR) ? "dnr" :
	(tflag == FTW_DP) ?  "dp"  : (tflag == FTW_F) ?   "f" :
	(tflag == FTW_NS) ?  "ns"  : (tflag == FTW_SL) ?  "sl" :
	(tflag == FTW_SLN) ? "sln" : "???",
	ftwbuf->level, (long long) sb->st_size,
	fpath, ftwbuf->base, fpath + ftwbuf->base, (long) sb->st_ino);
    memset((void *) sb, 0, sizeof(struct stat));
    return 0;  		/* To tell nftw() to continue */
}


int
main(int argc, char *argv[])
{
    int flags = 0;

    if (argc > 2 && strchr(argv[2], 'd') != NULL)
	flags |= FTW_DEPTH;
    if (argc > 2 && strchr(argv[2], 'p') != NULL)
	flags |= FTW_PHYS;

    if (nftw((argc < 2) ? "." : argv[1], displayFileInfo, 
		20, flags) == -1) {
	perror("nftw");
	exit(EXIT_FAILURE);
    }

    exit(EXIT_SUCCESS);
}
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---


Solaris (Illumos source code)
usr/src/lib/libc/port/gen/nftw.c:

The following code causes the stat buffer to be populated with lstat() infor in the FTW_SLN case:

        } else {
                /*
                 * Statf has failed. If stat was used instead of lstat,
                 * try using lstat. If lstat doesn't fail, "comp"
                 * must be a symbolic link pointing to a non-existent
                 * file. Such a symbolic link should be ignored.
                 * Also check the file type, if possible, for symbolic
                 * link.
                 */
                if ((vp->statf == stat) && (lstat(comp, &statb) >= 0) &&
                    ((statb.st_mode & S_IFMT) == S_IFLNK)) {

                        /*
                         * Ignore bad symbolic link, let "fn"
                         * report it.
                         */

                        errno = ENOENT;
                        type = FTW_SLN;
                } else {
                        type = FTW_NS;
        fail:


Testing shows that the link info *is* returned in the stat structure:

$ ls -li t
total 4
     45068 -rw-r--r--   1 mtk     csw      29 Feb 24 04:28 f
     45067 lrwxrwxrwx   1 mtk     csw       6 Feb 24 04:28 my_sln -> ssssss
     45069 lrwxrwxrwx   1 mtk     csw       1 Feb 24 04:28 sl_f -> f

$ ./a.out t
d    0       5   t                              0 t (st_ino: 45066)
sln  1       6   t/my_sln                       2 my_sln (st_ino: 45067)
f    1      29   t/f                            2 f (st_ino: 45068)
f    1      29   t/sl_f                         2 sl_f (st_ino: 45068)

======

OpenBSD

I didn't look at the source code, but the test gives the same results as Solaris:

-bash-4.3$ ls -li
total 4
4693795 -rw-r--r--  1 mtk  mtk  29 Feb 24 04:37 f
4693796 lrwxr-xr-x  1 mtk  mtk   1 Feb 24 04:37 sl_f -> f
4693797 lrwxr-xr-x  1 mtk  mtk  11 Feb 24 04:37 sln -> jajdhfdskjh
-bash-4.3$ ./a.out t
d    0     512   t                              0 t (st_ino: 4693794)
f    1      29   t/f                            2 f (st_ino: 4693795)
f    1      29   t/sl_f                         2 sl_f (st_ino: 4693795)
sln  1      11   t/sln                          2 sln (st_ino: 4693797)

=====

FreeBSD

I don't have access to a FreeBSD test system at the moment, nut my reading of the source code is that id delivers the same results as Solaris and OpenBSD

See lib/libc/gen/nftw.c, where FTS_SLN is implemented using the FTS_SLNONE option, and fts(3) on that system says:

              FTS_SLNONE  A symbolic link with a nonexistent target.  The
                          contents  of  the fts_statp field reference the
                          file characteristic information  for  the  sym‐
                          bolic link itself.

=====
musl libc

src/misc/nftw.c

        if ((flags & FTW_PHYS) ? lstat(path, &st) : stat(path, &st) < 0) {
                if (!(flags & FTW_PHYS) && errno==ENOENT && !lstat(path, &st))
                        type = FTW_SLN;
                else if (errno != EACCES) return -1;
                else type = FTW_NS;
        } else if (S_ISDIR(st.st_mode)) {
                if (access(path, R_OK) < 0) type = FTW_DNR;
                else if (flags & FTW_DEPTH) type = FTW_DP;
                else type = FTW_D;
        } else if (S_ISLNK(st.st_mode)) {
                if (flags & FTW_PHYS) type = FTW_SL;
                else type = FTW_SLN;
        } else {
                type = FTW_F;
        }

Comment 9 Carlos O'Donell 2017-02-25 03:06:15 UTC
(In reply to Michael Kerrisk from comment #8)
> Note that POSIX explicitly says that the stat buffer is undefined for
> FTW_NS, but makes no such statement for FTW_SLN, with the implication that
> the stat buffer is valid in this case.
> 
> This implies that FTW_SLN should work as Han Pingtian suggested: for a
> dangling symlink, the lstat() information on the link should be returned.
> This is certainly how I always understood things should work. (But,
> obviously, I never tested this on glibc.)

The fact that you had to say "implies that" and that at least two different implementations exist and two distinct understandings means that we need to clarify this at the POSIX level.

Thanks to DJ for filling the issue upstream with the Austin Group:
http://austingroupbugs.net/view.php?id=1121

Comment 10 Tulio Magno Quites Machado Filho 2017-02-27 14:36:49 UTC
(In reply to Carlos O'Donell from comment #9)
> The fact that you had to say "implies that" and that at least two different
> implementations exist and two distinct understandings means that we need to
> clarify this at the POSIX level.

Actually, I think there are 3 distinct understandings:

1. FTW_PHYS=0 prohibits lstat().
When FTW_PHYS is clear, nftw() must not call lstat(), passing FTW_SLN and an undefined stat buffer.
This is the current glibc implementation.

2. FTW_SLN must be returned for all dangling symlinks.
When FTW_PHYS is clear, nftw() calls lstat() on dangling symlinks, passing FTW_SLN regardless of its return code. The stat buffer will be defined if lstat() succeeded or undefined if it failed.

3. Undefined stat buffers happen only on FTW_NS.
When FTW_PHYS is clear, nftw() calls lstat() on dangling symlinks. If the call succeeds, it passes FTW_SLN and a defined stat buffer. If not, it passes FTW_NS and an undefined stat buffer.
According to comment #8, this has been adopted by musl and Solaris.

Comment 11 Michael Kerrisk 2017-06-10 14:02:54 UTC
(In reply to Carlos O'Donell from comment #9)
> (In reply to Michael Kerrisk from comment #8)
> > Note that POSIX explicitly says that the stat buffer is undefined for
> > FTW_NS, but makes no such statement for FTW_SLN, with the implication that
> > the stat buffer is valid in this case.
> > 
> > This implies that FTW_SLN should work as Han Pingtian suggested: for a
> > dangling symlink, the lstat() information on the link should be returned.
> > This is certainly how I always understood things should work. (But,
> > obviously, I never tested this on glibc.)

On that last, parenthesized sentence, I spoke too soon.

> The fact that you had to say "implies that" 

Carlos, I did not say at the time, but coming back to this again, I'm reminded: this sentence felt a bit like you were playing language games with me (my apologies if I've misinterpreted), and I didn't quite like it. I didn't "have to" say anything. I try (but sometimes fail) to be cautious about making absolute pronouncements about interpretations and that's what I was doing here.

> and that at least two different
> implementations exist

Well, that much is clear :-)

> and two distinct understandings means that we need to
> clarify this at the POSIX level.

(Well, I would rather characterize it as one understanding and an accident; see below.)

> Thanks to DJ for filling the issue upstream with the Austin Group:
> http://austingroupbugs.net/view.php?id=1121

So, there was one more thing that still niggled me in this. I also describe nftw() in my book, where I documented FTW_NS and FTW_SLN. In the former case, I noted that the stat buffer argument of the callback function was undefined, but did not have such a statement for FTW_SLN. And (notwithstanding my earlier statement) it feels like this is something I would have tested.

An error report against my book that I received a few days ago shows me that I *did* test it. I actually have a page in my book showing the test output, and it's evident from that output that in the FTW_SLN case, that stat buffer for FTW_SLN was defined to include information about the symlink. (The reader error report was to point out that the output they obtained was not consistent with the output that appeared in my book (on page 362 if you happen to have a copy handy).)

In the original implementation, presumably written by Roland, glibc's nftw() did behave like everyone else's implementation. In the glibc 2.3.6 io/ftw.c process_entry() code, we find:

  if (((data->flags & FTW_PHYS)
       ? LXSTAT (_STAT_VER, name, &st)
       : XSTAT (_STAT_VER, name, &st)) < 0)
    {
      if (errno != EACCES && errno != ENOENT)
        result = -1;
      else if (!(data->flags & FTW_PHYS)
               && LXSTAT (_STAT_VER, name, &st) == 0
               && S_ISLNK (st.st_mode))
        flag = FTW_SLN;
      else
        flag = FTW_NS;
    }

So, if FTW_PHYS was not set, use stat() on the path. If that fails (because of a dangling symlink, for example), then try lstat() on the path and check if the result is symlink; if so, emit FTW_SLN.

In glibc 2.4 things changed to the situation we currently have.

It's not quite clear to me where the change occurred. Probably
commit bb54908866986dc1578e3077b323bfc1cc5aecba. It *appears* to be an unintended regression, since the changelog message make no mention of modifying the behavior of FTW_SLN.

So, I refine my earlier statement :-). Glibc used to do the right thing (i.e., what everyone else used to do, and also how I would interpret the standard), but there has been a longstanding regression. I do think the implementation should be fixed.

Comment 12 Fedora End Of Life 2017-07-26 00:16:00 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 13 Jan Kurik 2017-08-15 08:08:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 14 Nick Stoughton 2018-08-09 16:10:49 UTC
The Austin Group has reviewed the issue, and is proposing this interpretation:

The standard clearly states that the second argument to fn() contains "information on the object", and conforming implementations must conform to this.

Rationale:
-------------
The second bullet item says "The second argument is a pointer to the stat buffer containing information on the object".

The description of FTW_SLN says "The object is a symbolic link that does not name an existing file".

These two things together require that for FTW_SLN, the stat buffer contains information about the symbolic link (which is "the object").

Where the standard states that when FTW_PHYS is clear, symbolic links are followed instead of being reported, naturally they are not followed if the target does not exist and the FTW_SNL constant alone constitutes reporting the symbolic link.

====
The interpretation is subject to further review, and comments can be filed against this at http://austingroupbugs.net/view.php?id=1121

Comment 15 Ben Cotton 2018-11-27 14:49:35 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Ben Cotton 2019-02-19 17:11:31 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle.
Changing version to '30.


Note You need to log in before you can comment on or make changes to this bug.