Bug 1471967 - st_nlink wrong for ext4 directory with 64998 subdirectories
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-blivet
Version: 31
Hardware: Unspecified
OS: Linux
Assignee: Blivet Maintenance Team
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-17 19:40 UTC by Paul Eggert
Modified: 2023-09-14 04:01 UTC (History)
Last Closed: 2020-11-30 03:14:58 UTC
Type: Bug

Description Paul Eggert 2017-07-17 19:40:01 UTC
Description of problem:

lstat returns an incorrect link count for an ext4 directory with more than 64997 subdirectories. For example, if there are 64998 subdirectories, lstat reports the link count to be 1 rather than the correct 65000.
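The expected value follows from how directory link counts are conventionally computed on ext4: one link for the directory's entry in its parent, one for its own ".", and one for the ".." entry of each subdirectory. A minimal sketch of that arithmetic, using a hypothetical helper name (expected_nlink) that is not part of any tool mentioned in this report, and assuming a find that supports -mindepth and -maxdepth:

```shell
#!/bin/sh
# expected_nlink DIR: print the conventional link count for DIR,
# i.e. 2 (parent entry + ".") plus one ".." link per immediate subdirectory.
# Hypothetical helper for illustration only; names containing newlines
# would be miscounted.
expected_nlink() {
    subdirs=$(find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l)
    echo $((2 + subdirs))
}
```

With 64998 subdirectories this arithmetic gives 2 + 64998 = 65000, which is the value one would compare against the output of `stat -c %h d`.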

Version-Release number of selected component (if applicable):

Fedora 26 x86-64

How reproducible:

See the following shell transcript.

$ LC_ALL=C
$ export LC_ALL
$ mkdir d d/{1..64997}
$ strace -ve trace=%lstat ls -ld d
lstat("d", {st_dev=makedev(8, 2), st_ino=41565723, st_mode=S_IFDIR|0755, st_nlink=64999, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=2744, st_size=1404928, st_atime=1500319812 /* 2017-07-17T12:30:12.934815840-0700 */, st_atime_nsec=934815840, st_mtime=1500319822 /* 2017-07-17T12:30:22.896866173-0700 */, st_mtime_nsec=896866173, st_ctime=1500319822 /* 2017-07-17T12:30:22.896866173-0700 */, st_ctime_nsec=896866173}) = 0
drwxr-xr-x. 64999 eggert eggert 1404928 Jul 17 12:30 d
+++ exited with 0 +++
$ mkdir d/64998
$ LC_ALL=C strace -ve trace=%lstat ls -ld d
lstat("d", {st_dev=makedev(8, 2), st_ino=41565723, st_mode=S_IFDIR|0755, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=2744, st_size=1404928, st_atime=1500319812 /* 2017-07-17T12:30:12.934815840-0700 */, st_atime_nsec=934815840, st_mtime=1500319845 /* 2017-07-17T12:30:45.104978381-0700 */, st_mtime_nsec=104978381, st_ctime=1500319845 /* 2017-07-17T12:30:45.104978381-0700 */, st_ctime_nsec=104978381}) = 0
drwxr-xr-x. 1 eggert eggert 1404928 Jul 17 12:30 d
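The transcript above can be wrapped in a small function so that the threshold is easy to vary. This is only a sketch: the function name and its arguments are made up for illustration, it assumes GNU coreutils stat (for -c %h) and GNU xargs (for -r), and the value it prints depends on the file system and kernel in use.

```shell
#!/bin/sh
# nlink_after_mkdirs DIR N: create DIR with subdirectories 1..N,
# then print the link count that the kernel reports for DIR.
# Hypothetical helper; DIR must not exist yet.
nlink_after_mkdirs() {
    dir=$1
    n=$2
    mkdir "$dir" || return 1
    # xargs batches the names so a large N does not hit the
    # argument-length limit; -r skips mkdir when there is no input.
    seq 1 "$n" | (cd "$dir" && xargs -r mkdir) || return 1
    stat -c %h "$dir"
}
```

For example, `nlink_after_mkdirs d 64997` followed by `mkdir d/64998` and `stat -c %h d` mirrors the two steps of the transcript.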


Expected results:

The last line should report a link count equal to 65000, not 1. The previous (lstat) line should report st_nlink=65000, not st_nlink=1.


Additional info:

This bug report follows up on a previous bug report for coreutils 'ls':

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27739

Comment 1 Laura Abbott 2017-07-17 20:18:55 UTC
Fedora isn't carrying anything special here; this needs to be reported to the upstream kernel developers.

Comment 2 Paul Eggert 2017-07-17 21:26:56 UTC
I reported the bug upstream here:

https://bugzilla.kernel.org/show_bug.cgi?id=196405

Comment 3 Laura Abbott 2018-02-28 03:54:14 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale. The kernel moves very fast so bugs may get fixed as part of a kernel update. Due to this, we are doing a mass bug update across all of the Fedora 26 kernel bugs.
 
Fedora 26 has now been rebased to 4.15.4-200.fc26.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 27, and are still experiencing this issue, please change the version to Fedora 27.
 
If you experience different issues, please open a new bug report for those.

Comment 4 Paul Eggert 2018-03-05 03:30:59 UTC
I am still seeing a similar problem with Fedora 27. However, the test case is slightly different now, since I need to create one more subdirectory to reproduce the problem in an ext4 file system. Here's the updated reproducer:

$ LC_ALL=C
$ export LC_ALL
$ mkdir d d/{1..64998}
$ strace -ve trace=%lstat ls -ld d
lstat("d", {st_dev=makedev(8, 2), st_ino=65536691, st_mode=S_IFDIR|0775, st_nlink=65000, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=2744, st_size=1404928, st_atime=1520220310 /* 2018-03-04T19:25:10.724960336-0800 */, st_atime_nsec=724960336, st_mtime=1520220321 /* 2018-03-04T19:25:21.090019962-0800 */, st_mtime_nsec=90019962, st_ctime=1520220321 /* 2018-03-04T19:25:21.090019962-0800 */, st_ctime_nsec=90019962}) = 0
drwxrwxr-x. 65000 eggert eggert 1404928 Mar  4 19:25 d
+++ exited with 0 +++
$ mkdir d/64999
$ LC_ALL=C strace -ve trace=%lstat ls -ld d
lstat("d", {st_dev=makedev(8, 2), st_ino=65536691, st_mode=S_IFDIR|0775, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=2744, st_size=1404928, st_atime=1520220310 /* 2018-03-04T19:25:10.724960336-0800 */, st_atime_nsec=724960336, st_mtime=1520220338 /* 2018-03-04T19:25:38.522120252-0800 */, st_mtime_nsec=522120252, st_ctime=1520220338 /* 2018-03-04T19:25:38.522120252-0800 */, st_ctime_nsec=522120252}) = 0
drwxrwxr-x. 1 eggert eggert 1404928 Mar  4 19:25 d
+++ exited with 0 +++

The link count silently "wrapped around" from 65000 to 1; there should have been an error diagnostic.
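For context, the ext4(5) man page describes a directory link count of 1 as meaning that the number of links is not known, which is how the dir_nlink feature lifts the 65000-link limit. A caller that wants to avoid misreading the wrapped value could treat 1 specially. A minimal sketch, with a hypothetical function name, assuming the usual convention that a directory's real count is 2 plus its number of subdirectories:

```shell
#!/bin/sh
# describe_dir_nlink COUNT: interpret a directory's st_nlink value.
# A directory normally has at least 2 links (its parent entry and "."),
# so 1 cannot be a real count and is read here as "unknown".
# Hypothetical helper for illustration only.
describe_dir_nlink() {
    if [ "$1" -eq 1 ]; then
        echo "unknown"
    else
        echo "$(( $1 - 2 )) subdirectories"
    fi
}
```

It could be called as `describe_dir_nlink "$(stat -c %h d)"`, assuming GNU stat. This does not replace the missing diagnostic; it only avoids taking the value 1 literally.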

You could fix this problem in Fedora, without affecting upstream code, by disabling the dir_nlink flag in default installations that use the ext4 file system.
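Whether a given ext4 file system has the flag set can be read from the superblock feature list, for example with `tune2fs -l DEVICE`, and the tune2fs documentation lists dir_nlink among the features that `tune2fs -O ^dir_nlink DEVICE` can clear. The sketch below only parses a feature list; the function name is hypothetical, and it assumes the "Filesystem features:" line that tune2fs -l prints. It says nothing about whether the running kernel honors a cleared flag.

```shell
#!/bin/sh
# has_dir_nlink: read `tune2fs -l` output on stdin and succeed if the
# "Filesystem features:" line lists dir_nlink as a whole word.
# Hypothetical helper for illustration only.
has_dir_nlink() {
    grep '^Filesystem features:' | tr -s ' \t' '\n' | grep -qx 'dir_nlink'
}
```

Typical use would be `tune2fs -l DEVICE | has_dir_nlink && echo "dir_nlink is set"`, run with enough privilege to read the device, where DEVICE is a placeholder for the block device in question.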

Comment 5 Justin M. Forbes 2018-07-23 15:22:10 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 6 Paul Eggert 2018-07-23 18:24:00 UTC
I am still seeing the bug in Fedora 28, with the same reproducer as in Comment 4 of this bug report. I am changing the Fedora version from 27 to 28.

Comment 7 Laura Abbott 2018-10-01 21:30:13 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
 
Fedora 28 has now been rebased to 4.18.10-300.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
 
If you experience different issues, please open a new bug report for those.

Comment 8 Paul Eggert 2018-10-24 00:51:59 UTC
I am still seeing the bug in Fedora 28 with kernel 4.18.14-200.fc28.x86_64, with the same reproducer as in Comment 4 of this bug report.

Comment 9 Justin M. Forbes 2019-01-29 16:26:33 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.20.5-100.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.

If you experience different issues, please open a new bug report for those.

Comment 10 Paul Eggert 2019-01-31 03:10:48 UTC
I am still seeing the bug in Fedora 29, with the same reproducer as in Comment 4 of this bug report.

Comment 11 Justin M. Forbes 2019-08-20 17:44:17 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.

Comment 12 Paul Eggert 2019-08-20 19:04:49 UTC
I am still seeing the bug in Fedora 30, with the same reproducer as in Comment 4 of this bug report.

Comment 13 Justin M. Forbes 2019-08-20 19:39:36 UTC
I am going to close this bug; there is no point in tracking it in Fedora. The upstream bug seemed to make it clear that they have no intention of changing the kernel code here, and it works as they intend.  Should they change things upstream, it will trickle into Fedora quickly.

Comment 14 Paul Eggert 2019-08-20 20:32:31 UTC
As mentioned in Comment 4, fixing this problem does not require an upstream change. You can fix it in Fedora by disabling the dir_nlink flag in default installations that use the ext4 file system. This would not affect upstream, and it would fix the coreutils bug mentioned in the Description.

For now, I'll attempt to reopen the bug. If it is Red Hat's intent to not fix the coreutils bug on default ext4 filesystems, I suppose you can close this bug report as "won't fix", but it'd be nice to have a clear statement as to why.

Comment 15 Laura Abbott 2019-08-20 21:11:26 UTC
The kernel isn't the right component to be tracking this though. I see two options:

1) If there is a bug to fix in coreutils, please file a new bug for tracking explaining what should be fixed. From the kernel.org report, it sounds like the original debbugs report wasn't complete.
2) If the request is to change the installation defaults, that needs to be set in anaconda.

Comment 16 Paul Eggert 2019-08-20 21:31:20 UTC
Thanks for the advice. The bug isn't in coreutils, as coreutils is simply reporting st_nlink, st_nlink is incorrect, and there's no feasible way for coreutils to work around the incorrect count. So I will attempt to change the component to anaconda.

Although I'm no anaconda expert, the basic idea here is that the dir_nlink flag should be disabled by default in installations that create ext4 filesystems.

Comment 17 Jiri Konecny 2019-08-21 17:43:02 UTC
Vojta, what do you think about this, as a blivet expert?

Comment 18 Ben Cotton 2020-04-30 20:32:34 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Paul Eggert 2020-04-30 22:08:31 UTC
I am still seeing the bug in Fedora 31, with the same reproducer as in Comment 4 of this bug report. I am changing the Fedora version to 31.

Comment 20 Vendula Poncova 2020-05-06 16:08:10 UTC
It seems to be an issue for the storage configuration library. Reassigning to blivet.

Comment 21 David Lehman 2020-05-06 16:15:32 UTC
It sounds as though you are saying that dir_nlink should be disabled by default in mke2fs, no? Or is there something specific about the OS installation case that makes it uniquely suitable for this tweak?

Comment 22 Paul Eggert 2020-05-06 23:39:50 UTC
(In reply to David Lehman from comment #21)
> It sounds as though you are saying that dir_nlink should be disabled by
> default in mke2fs, no? Or is there something specific about the OS
> installation case that makes it uniquely suitable for this tweak?

I assume that changing the default in mke2fs would suffice, if ext4's dir_nlink flag works as advertised. However, I just re-read this:

https://bugzilla.kernel.org/show_bug.cgi?id=196405

and it appears that the dir_nlink flag does not work in ext4 (or at least it didn't work in 2017). So we'd need to not only change the mke2fs default, but also fix ext4. Although the two changes could be done independently, both changes would be needed to fix the bug.

Comment 23 Ben Cotton 2020-11-03 14:59:11 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 David Lehman 2020-11-05 17:44:35 UTC
This is something that needs to be resolved upstream. It is our policy to use upstream defaults wherever possible. What I want to avoid is this: someone requests that we turn off feature X, so we do; then, at some later time we find out that feature X was turned on for good reason or that turning it off breaks more than it fixes, etc. Sorry that things don't meet your expectations, but there is always tune2fs (which does seem to allow toggling dir_nlink).

Comment 25 Paul Eggert 2020-11-24 23:17:31 UTC
> It is our policy to use upstream defaults wherever possible.

Yes, I understand that now. However, this is no longer just a matter of using the upstream defaults; the problem is that one cannot change the default behavior even though it's documented that one can.

> there is always tune2fs (which does seem to allow toggling dir_nlink).

Although one can toggle dir_nlink, toggling has no effect because the kernel always behaves as if it is enabled. See the end of:

https://bugzilla.kernel.org/show_bug.cgi?id=196405#c17

which refers to earlier comments by Theodore Tso in that bug report.

Either the code should get fixed, or the documentation, since the code's behavior apparently disagrees with the dir_nlink section of the ext4(5) man page. So I'll reopen the bug for now.

Comment 26 David Lehman 2020-11-25 20:45:54 UTC
(In reply to Paul Eggert from comment #25)
> > It is our policy to use upstream defaults wherever possible.
> 
> Yes, I understand that now. However, this is no longer just a matter of
> using the upstream defaults; the problem is that one cannot change the
> default behavior even though it's documented that one can.

Where is this documented?

> 
> > there is always tune2fs (which does seem to allow toggling dir_nlink).
> 
> Although one can toggle dir_nlink, toggling has no effect because the kernel
> always behaves as if it is enabled. See the end of:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=196405#c17
> 
> which refers to earlier comments by Theodore Tso in that bug report.
> 
> Either the code should get fixed, or the documentation, since the code's
> behavior apparently disagrees with the dir_nlink section of the ext4(5) man
> page. So I'll reopen the bug for now.

Which code, exactly?

Comment 27 Paul Eggert 2020-11-30 03:14:58 UTC
(In reply to David Lehman from comment #26)

> Which code, exactly?

Ah, never mind. It turns out that the bug has been fixed since I first reported it.

If you're interested in the gory details, my original bug report and later comments were relying on Theodore Tso's remarks <https://bugzilla.kernel.org/show_bug.cgi?id=196405#c15> where he wrote, "the ext4 code will silently set the dir_link feature flag if there is an attempt to create a subdirectory which exceeds the EXT4_MAX_LINK and the directory is using directory indexing." This meant that even if you disabled dir_nlink, it would be enabled automatically anyway, which meant that disabling dir_nlink was ineffective.

What I didn't know until I checked just now is that this bug was later fixed by this patch by Andreas Dilger:

https://github.com/torvalds/linux/commit/c7414892067204fcb8f8ebb4309d0fdd8c7242fe

Sorry about the noise. Marking the bug as CLOSED and CURRENTRELEASE, which I hope is appropriate for a fixed bug.

Comment 28 Red Hat Bugzilla 2023-09-14 04:01:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

