Bug 2025157 - kernel-debug-devel-5.14.20-300.fc35.x86_64 generates more than 100000 hardlink errors on install
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 2028156
Reported: 2021-11-20 09:28 UTC by Lars S. Jensen
Modified: 2022-06-30 15:45 UTC (History)

Type: Bug



Description Lars S. Jensen 2021-11-20 09:28:56 UTC
The kernel-debug-devel-5.14.20-300.fc35.x86_64 package generates more than 100,000 error messages about failed hardlinks on install, because it tries to hardlink to the other installed kernel sources:

From /var/log/dnf.rpm.log.1:
2021-11-20T09:58:14+0100 INFO hardlink: cannot link ./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.18-300.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists
hardlink: cannot link ./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.17-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists
hardlink: cannot link ./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.16-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists
hardlink: cannot link /usr/src/kernels/5.14.18-300.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.17-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists
hardlink: cannot link /usr/src/kernels/5.14.18-300.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.16-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists
hardlink: cannot link /usr/src/kernels/5.14.17-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile to /usr/src/kernels/5.14.16-301.fc35.x86_64+debug/./Documentation/devicetree/bindings/Makefile.hardlink-temporary: File exists



-rw-r--r--. 1 root root 12664771 Nov 20 09:58 /var/log/dnf.rpm.log.1
-rw-r--r--. 1 root root       82 Nov 20 09:57 /var/log/dnf.rpm.log.2
-rw-r--r--. 1 root root 12664771 Nov 20 09:57 /var/log/dnf.rpm.log.3
-rw-r--r--. 1 root root    41601 Nov 20 09:57 /var/log/dnf.rpm.log.4
 wc -l /var/log/dnf.rpm.log.?
   65415 /var/log/dnf.rpm.log.1
       1 /var/log/dnf.rpm.log.2
   65415 /var/log/dnf.rpm.log.3
     533 /var/log/dnf.rpm.log.4
  131364 total
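
To separate the hardlink errors from the rest of a log, the matching lines can be counted directly. This is only a minimal sketch: the function name count_hardlink_errors is hypothetical, and it assumes the message text stays exactly as quoted above.

```shell
# Count "hardlink: cannot link" lines in one dnf RPM log file.
# Prints a single number; the pattern is taken from the log excerpt above.
count_hardlink_errors() {
    grep -c 'hardlink: cannot link' "$1"
}
```

Usage would be something like `count_hardlink_errors /var/log/dnf.rpm.log.1`.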

Comment 1 "FeRD" (Frank Dana) 2021-12-16 01:50:59 UTC
On my system, after a `dnf upgrade` that just installed kernel-debug-devel-5.15.7-200.fc35.x86_64:

$ cd /usr/src/kernels
$ find . -iname '*.hardlink-temporary'|wc -l
22143

(Which then becomes the cause of said 100,000 errors.)

It seems like the postinstall scriptlets for kernel-devel and kernel-debug-devel...

$ rpm -q --scripts kernel-debug-devel-5.15.7-200.fc35.x86_64
postinstall scriptlet (using /bin/sh):
if [ -f /etc/sysconfig/kernel ]
then
    . /etc/sysconfig/kernel || exit $?
fi
if [ "$HARDLINK" != "no" -a -x /usr/bin/hardlink -a ! -e /run/ostree-booted ] 
then
    (cd /usr/src/kernels/5.15.7-200.fc35.x86_64+debug &&
     /usr/bin/find . -type f | while read f; do
       hardlink -c /usr/src/kernels/*.fc35.*/$f $f > /dev/null
     done)
fi

$ rpm -q --scripts kernel-devel-5.15.7-200.fc35.x86_64      
postinstall scriptlet (using /bin/sh):
if [ -f /etc/sysconfig/kernel ]
then
    . /etc/sysconfig/kernel || exit $?
fi
if [ "$HARDLINK" != "no" -a -x /usr/bin/hardlink -a ! -e /run/ostree-booted ] 
then
    (cd /usr/src/kernels/5.15.7-200.fc35.x86_64 &&
     /usr/bin/find . -type f | while read f; do
       hardlink -c /usr/src/kernels/*.fc35.*/$f $f > /dev/null
     done)
fi


...Will end up trying to perform effectively the same cleanup (consolidate all copies of $f in all kernel config trees *.fc35.* under /usr/src/kernels/, by converting them to hardlinks), and in the process trample all over each other. Seems like that glob is a dangerous thing to have in two separate scriptlets from the same transaction. 

They're also both doing an awful lot of "stuff" that could arguably be expressed better as simply:

cd /usr/src/kernels && hardlink -q -c *.fc35.*

That'll recursively hardlink all identical files in all of the kernel trees, efficiently and without launching thousands of separate hardlink processes. If two files at different paths happen to have identical contents, they'll be consolidated, and that's... probably fine? (A -f flag could always be added to the hardlink call, to ensure they have identical filenames as well. Though, since 90% of the filenames in question are either 'Makefile' or 'Kconfig' that may not actually matter much in practice.)

Comment 2 "FeRD" (Frank Dana) 2021-12-16 02:14:44 UTC
On the topic of efficiency, manually running the recursive hardlink command I suggested on my own SSD-housed /usr/src/kernels tree takes all of 1.4 seconds to consolidate three kernel-devel trees and three kernel-debug-devel trees:

$ rpm -qf *.fc35.*   
file /usr/src/kernels/5.14.18-300.fc35.x86_64 is not owned by any package
kernel-devel-5.15.4-201.fc35.x86_64
kernel-debug-devel-5.15.4-201.fc35.x86_64
kernel-devel-5.15.6-200.fc35.x86_64
kernel-debug-devel-5.15.6-200.fc35.x86_64
kernel-devel-5.15.7-200.fc35.x86_64
kernel-debug-devel-5.15.7-200.fc35.x86_64

$ sudo hardlink -c *.fc35.*
Mode:           real
Files:          105495
Linked:         12666 files
Compared:       0 xattrs
Compared:       45331 files
Saved:          49.4 MiB
Duration:       1.374871 seconds


Sure, that's reaping the benefits of a hot cache... cold, it might take all of 3 seconds.

The current scriptlet takes... oh my, quite a bit longer, even with the same hot cache.

$ sudo -s
# cd /usr/src/kernels/5.15.7-200.fc35.x86_64
# time (/usr/bin/find . -type f | while read f; do hardlink -c /usr/src/kernels/*.fc35.*/$f $f > /dev/null; done)
( /usr/bin/find . -type f | while read f; do; hardlink -c  $f > /dev/null; ; )  6.59s user 18.33s system 104% cpu 23.764 total

Comment 3 Kelly-Rand 2021-12-22 21:40:22 UTC
Same happened to me for upgrade to kernel-5.15.8 but not on subsequent upgrade to kernel-5.15.10

Excerpt from dnf.rpm.log

Installing : kernel-devel-5.15.8-200.fc35.x86_64 103/212
Running scriptlet: kernel-devel-5.15.8-200.fc35.x86_64 103/212
hardlink: cannot link ./Documentation/Kconfig to /usr/src/kernels/5.15.6-200.fc35.x86_64/./Documentation/Kconfig.hardlink-temporary: File exists
hardlink: cannot link ./Documentation/Makefile to /usr/src/kernels/5.15.6-200.fc35.x86_64/./Documentation/Makefile.hardlink-temporary: File exists
hardlink: cannot link ./lib/kunit/Kconfig to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/kunit/Kconfig.hardlink-temporary: File exists
hardlink: cannot link ./lib/kunit/Makefile to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/kunit/Makefile.hardlink-temporary: File exists
hardlink: cannot link ./lib/Kconfig.kfence to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/Kconfig.kfence.hardlink-temporary: File exists
hardlink: cannot link ./lib/Kconfig.ubsan to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/Kconfig.ubsan.hardlink-temporary: File exists
hardlink: cannot link ./lib/mpi/Makefile to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/mpi/Makefile.hardlink-temporary: File exists
hardlink: cannot link ./lib/crypto/Kconfig to /usr/src/kernels/5.15.6-200.fc35.x86_64/./lib/crypto/Kconfig.hardlink-temporary: File exists

Comment 4 Gus Wirth 2022-01-01 19:13:20 UTC
I just upgraded from kernel-5.15.11-200.fc35.x86_64 to kernel-5.15.12-200.fc35.x86_64 and got the same thing.

Additionally, dnf hung when doing the final verification. The process status was reported as "disk sleep". This is the second time it's happened and required that I use a kill -9 to recover. I'm not sure this is a related problem but this happened around the same time so it makes me suspicious.

Comment 5 Eike Rathke 2022-01-12 00:21:17 UTC
*** Bug 2028156 has been marked as a duplicate of this bug. ***

Comment 6 Eike Rathke 2022-01-12 00:24:45 UTC
I just stumbled upon the same (geez when does one *watch* a dnf upgrade) and have
find /usr/src/kernels/ -name '*.hardlink-temporary' |wc -l
33228
of these..

Spread across these kernel trees:
find /usr/src/kernels/ -name '*.hardlink-temporary' |sed -Ee 's#.*/kernels/([^/]+)/.*#\1#' |sort -u
5.14.17-301.fc35.x86_64
5.15.10-200.fc35.x86_64+debug
5.15.12-200.fc35.x86_64+debug
5.15.13-200.fc35.x86_64+debug
5.15.4-201.fc35.x86_64
5.15.6-200.fc35.x86_64
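
A per-tree count can make the same listing a bit more informative. The following is a rough sketch, not something taken from the scriptlets: the function name leftovers_per_tree is made up here, and it assumes every kernel tree is a direct subdirectory of the given root.

```shell
# Print "<count> <tree>" for each kernel tree under $1 that still contains
# *.hardlink-temporary files. The root defaults to /usr/src/kernels.
# The body runs in a subshell so its variables do not leak into the caller.
leftovers_per_tree() (
    root=${1:-/usr/src/kernels}
    for tree in "$root"/*/; do
        [ -d "$tree" ] || continue
        # Arithmetic expansion strips any padding that wc -l may add.
        n=$(( $(find "$tree" -type f -name '*.hardlink-temporary' | wc -l) ))
        if [ "$n" -gt 0 ]; then
            printf '%d %s\n' "$n" "$(basename "$tree")"
        fi
    done
)
```

Called without arguments it inspects /usr/src/kernels; trees without leftovers are not listed.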

Comment 7 "FeRD" (Frank Dana) 2022-01-12 17:57:30 UTC
Feels like the best solution might be to throw a quick

HARDLINK=no

into /etc/sysconfig/kernel, and manage the hardlinks manually. (Or just deal with the kernel trees consuming an extra 100MB or so of disk. Realistically, these days that's nothing, and hardly worth the trouble.)

Dnf's post-transaction scripting when it does kernel installs will go _way_ faster, as a bonus. The current scriptlet wastes nearly 30 seconds even on an SSD.
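
For anyone who wants that setting applied without hand-editing, here is a small sketch. The function name disable_kernel_hardlink is hypothetical, and it assumes the file uses plain `HARDLINK=value` lines at the start of a line (an `export HARDLINK=...` or indented line would not be replaced). It relies on the scriptlet sourcing /etc/sysconfig/kernel, as shown in Comment 1.

```shell
# Ensure the given sysconfig file ends up with exactly one HARDLINK=no line.
# The path defaults to /etc/sysconfig/kernel, which needs root to modify.
disable_kernel_hardlink() (
    conf=${1:-/etc/sysconfig/kernel}
    touch "$conf" || exit 1
    tmp=$(mktemp) || exit 1
    # Copy every line except existing HARDLINK= assignments.
    # grep exits 1 when it selects nothing, which is not an error here.
    { grep -v '^HARDLINK=' "$conf" || [ $? -eq 1 ]; } > "$tmp"
    echo 'HARDLINK=no' >> "$tmp"
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

Running it twice leaves the file unchanged after the first run, so it should be safe to repeat.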

Comment 8 Ralf Ertzinger 2022-02-02 16:25:57 UTC
For me, dnf waits for _minutes_ (on an SSD) at the end of a transaction, writing furiously to disk.

Some digging around in the process seems to indicate that dnf is writing every single output log line to the history database, one line at a time, each line in one transaction. That, in itself, is maybe not a smart way to do this, but it's definitely made a lot worse by this bug.

Comment 9 Andrew 2022-02-02 19:17:02 UTC
Yes, the history database seems to be a problem as well. It already consumes about 0.5 GB for me, and I don't want to erase it completely:

$ du -m /var/lib/dnf/history.sqlite
461     /var/lib/dnf/history.sqlite

Comment 10 Eli Young 2022-02-10 23:31:16 UTC
I've found the issue: this is caused by what appears to be a bug in hardlink that results in it leaving .hardlink-temporary files around if a path is specified multiple times:

$ echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 35261410
f2 35261411
f3 35261416
$ hardlink -c f1 f2 f3 f3
Mode:           real
Files:          4
Linked:         3 files
Compared:       0 xattrs
Compared:       2 files
Saved:          12 B
Duration:       0.000145 seconds
$ stat -c '%n %i' f*
f1 35261410
f2 35261410
f3 35261410
f3.hardlink-temporary 35261410
$ rm f*
$ echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 35261421
f2 35261423
f3 35261425
$ hardlink -c f1 f2 f3
Mode:           real
Files:          3
Linked:         2 files
Compared:       0 xattrs
Compared:       2 files
Saved:          12 B
Duration:       0.000100 seconds
$ stat -c '%n %i' f*
f1 35261421
f2 35261421
f3 35261421

The RPM scriptlets are currently passing in the new file twice:

hardlink -c /usr/src/kernels/*.fc35.*/$f $f > /dev/null

Removing the lone $f from the RPM scriptlets will prevent this from happening moving forward. Additionally, it may be prudent to temporarily add a cleanup step that runs the following command first:

/usr/bin/find /usr/src/kernels -type f -name '*.hardlink-temporary' -delete
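
A slightly more cautious variant of that cleanup lists the leftovers first and only deletes on request. This is a sketch under a few assumptions: the function name clean_hardlink_temporaries and its "delete" argument are invented for illustration, and `-delete` is a GNU/BSD find extension rather than POSIX. In the stat output above, the leftover shares its inode with the real file, so removing it only drops an extra link.

```shell
# List or remove leftover *.hardlink-temporary files under a directory tree.
# $1: root directory (defaults to /usr/src/kernels)
# $2: "delete" to remove the files; anything else only prints them
clean_hardlink_temporaries() (
    root=${1:-/usr/src/kernels}
    mode=${2:-list}
    if [ "$mode" = delete ]; then
        find "$root" -type f -name '*.hardlink-temporary' -delete
    else
        find "$root" -type f -name '*.hardlink-temporary' -print
    fi
)
```

Running it once without "delete" shows what would go away; the second run with "delete" (as root) removes it.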

I have reported this bug to upstream at: https://github.com/util-linux/util-linux/issues/1602

Comment 11 Eli Young 2022-02-18 05:52:05 UTC
The bug causing this has been fixed by upstream in their development branch. In the meantime, we should probably fix the RPM scriptlets, as this is going to continue to cause issues for users.
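
One possible shape of that scriptlet change is sketched below; it is not the change that was actually merged anywhere. The function name link_new_tree and its three parameters exist only for illustration (the real scriptlet hard-codes the paths), and the root must be an absolute path because the loop runs from inside the new tree.

```shell
# Sketch of the postinstall loop with the duplicate argument removed.
# $1: newly installed tree
# $2: directory holding all kernel trees (absolute path)
# $3: hardlink binary
link_new_tree() (
    tree=$1
    root=${2:-/usr/src/kernels}
    hardlink_bin=${3:-/usr/bin/hardlink}
    cd "$tree" || exit 1
    find . -type f | while IFS= read -r f; do
        # The glob already matches the new tree, so "$f" is not passed again.
        "$hardlink_bin" -c "$root"/*.fc35.*/"$f" > /dev/null
    done
)
```

The only functional difference from the loop shown in Comment 1 is the missing lone $f; quoting and `read -r` are added so unusual filenames are passed through intact.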

Comment 12 appdevsw 2022-04-18 08:01:10 UTC
Four months have passed and the error still exists.
Please describe the temporary workaround for common users like me. Step by step.

Comment 13 Ralf Ertzinger 2022-04-18 16:35:16 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=2025157#c7

Comment 14 Eli Young 2022-04-21 22:57:31 UTC
I have reported the bug with the RPM scriptlet to upstream: https://gitlab.com/cki-project/kernel-ark/-/issues/79

