Created attachment 1880162 [details] Patch for fixing busy loop in nfsrahead On a fresh install of Fedora 36 systemd-udev-settle.service takes much time in comparison to earlier Fedora releases: $ systemd-analyze blame | head -n2 2min 48ms systemd-udev-settle.service 53.688s NetworkManager-wait-online.service Further analyses shown that systemd-udev-settle.service invokes nfsrahead: $ ps aux | grep nfs root 79 0.0 0.0 0 0 ? I< 21:21 0:00 [nfsiod] root 585 91.3 0.0 1872 744 ? R 21:25 0:11 /usr/libexec/nfsrahead btrfs-1 dw 856 0.0 0.1 7860 1856 pts/0 S+ 21:25 0:00 grep --color=auto nfs which seems to consume 100% of cpu: $ top top - 21:27:50 up 6 min, 2 users, load average: 4.42, 4.74, 2.38 Tasks: 168 total, 2 running, 165 sleeping, 1 stopped, 0 zombie %Cpu0 : 78.0 us, 3.4 sy, 0.0 ni, 1.7 id, 16.9 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 98.4 us, 1.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 69.1 us, 16.4 sy, 0.0 ni, 14.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 920.4 total, 8.2 free, 413.9 used, 498.2 buff/cache MiB Swap: 920.0 total, 919.5 free, 0.5 used. 490.3 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 631 openhab 20 0 456896 235780 15848 S 242.6 25.0 4:31.53 java 585 root 20 0 1872 744 684 R 100.0 0.1 2:41.06 nfsrahead Under gdb it looks like it busy loops in getopt(): $ gdb /usr/libexec/nfsrahead GNU gdb (GDB) Fedora 11.2-3.fc36 Copyright (C) 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "armv7hl-redhat-linux-gnueabi". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/libexec/nfsrahead... Reading symbols from /usr/lib/debug/usr/libexec/nfsrahead-2.6.1-2.rc4.fc36.arm.debug... (gdb) bt No stack. (gdb) r Starting program: /usr/libexec/nfsrahead [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/libthread_db.so.1". Program received signal SIGINT, Interrupt. 0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, longind=longind@entry=0x0, long_only=long_only@entry=0, posixly_correct=posixly_correct@entry=0) at getopt.c:708 708 getopt_data.optind = optind; (gdb) bt #0 0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, longind=longind@entry=0x0, long_only=long_only@entry=0, posixly_correct=posixly_correct@entry=0) at getopt.c:708 #1 0x76f16f38 in getopt (argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>) at getopt.c:735 #2 0x004004b8 in main (argc=1, argv=0x7efffa14) at main.c:145 (gdb) n 709 getopt_data.opterr = opterr; (gdb) 711 result = _getopt_internal_r (argc, argv, optstring, longopts, (gdb) 715 optind = getopt_data.optind; (gdb) 716 optarg = getopt_data.optarg; (gdb) 717 optopt = getopt_data.optopt; (gdb) 719 return result; (gdb) main (argc=1, argv=0x7efffa14) at main.c:146 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) 145 while((opt = getopt(argc, argv, "dF")) != -1) { (gdb) 146 switch (opt) { (gdb) with the attached patch the boot time becomes again comparable to earlier Fedora releases: $ systemd-analyze blame | head -n4 11.213s grafana-server.service 8.106s firewalld.service 7.033s NetworkManager-wait-online.service 5.326s systemd-udev-settle.service More precisely the systemd-udev-settle.service is executed by ~2min faster during the boot. Versions involved: $ rpm -qv systemd nfs-utils gcc systemd-250.3-8.fc36.armv7hl nfs-utils-2.6.1-2.rc4.fc36.armv7hl gcc-12.1.1-1.fc36.armv7hl
Based on [1], nfsrahead is a tool for configuring "NFS mounts", so the question is why it is being invoked with "btrfs-1" parameter? Also the machine doesn't have any nfs mounts configured, so one might expect that there won't be any need to execute such a tool during a boot process. [1] http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfsrahead/nfsrahead.man;h=5488f63359399e7e3e2896b0f1d1f7253693ef00;hb=18ac9c85d49d081dcfac8ba6ca63b4adc07c8ac6#l15
Seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926
Indeed it is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926. getopt returns an int and we were using a char for the return type. This is usually harmless, unless we are compiling to arm architectures in which char is unsigned. When we hit this combination, the compiler optimizes the test condition `getopt(argc, argv, "dF") != -1` to true, which results in the infinite loop. Thanks for the analysis. Marking it as duplicate of 2083926. *** This bug has been marked as a duplicate of bug 2083926 ***