Bug 2086693 - nfs-utils-2.6.1-2.rc4.fc36: busyloop in nfsrahead
Summary: nfs-utils-2.6.1-2.rc4.fc36: busyloop in nfsrahead
Keywords:
Status: CLOSED DUPLICATE of bug 2083926
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 36
Hardware: armv7hl
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-05-16 13:20 UTC by Damian Wrobel
Modified: 2022-05-20 13:05 UTC (History)
5 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-05-20 13:05:08 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Patch for fixing busy loop in nfsrahead (871 bytes, patch)
2022-05-16 13:20 UTC, Damian Wrobel
no flags Details | Diff

Description Damian Wrobel 2022-05-16 13:20:54 UTC
Created attachment 1880162 [details]
Patch for fixing busy loop in nfsrahead

On a fresh install of Fedora 36 systemd-udev-settle.service takes much time in comparison to earlier Fedora releases:

$ systemd-analyze blame | head -n2
2min 48ms systemd-udev-settle.service
  53.688s NetworkManager-wait-online.service

Further analyses shown that systemd-udev-settle.service invokes nfsrahead:

$ ps aux | grep nfs
root        79  0.0  0.0      0     0 ?        I<   21:21   0:00 [nfsiod]
root       585 91.3  0.0   1872   744 ?        R    21:25   0:11 /usr/libexec/nfsrahead btrfs-1
dw         856  0.0  0.1   7860  1856 pts/0    S+   21:25   0:00 grep --color=auto nfs

which seems to consume 100% of cpu:

$ top
top - 21:27:50 up 6 min,  2 users,  load average: 4.42, 4.74, 2.38
Tasks: 168 total,   2 running, 165 sleeping,   1 stopped,   0 zombie
%Cpu0  : 78.0 us,  3.4 sy,  0.0 ni,  1.7 id, 16.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 98.4 us,  1.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 69.1 us, 16.4 sy,  0.0 ni, 14.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    920.4 total,      8.2 free,    413.9 used,    498.2 buff/cache
MiB Swap:    920.0 total,    919.5 free,      0.5 used.    490.3 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
  631 openhab   20   0  456896 235780  15848 S 242.6  25.0   4:31.53 java       
  585 root      20   0    1872    744    684 R 100.0   0.1   2:41.06 nfsrahead  

Under gdb it looks like it busy loops in getopt():

$ gdb /usr/libexec/nfsrahead 
GNU gdb (GDB) Fedora 11.2-3.fc36
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/nfsrahead...
Reading symbols from /usr/lib/debug/usr/libexec/nfsrahead-2.6.1-2.rc4.fc36.arm.debug...
(gdb) bt
No stack.
(gdb) r
Starting program: /usr/libexec/nfsrahead 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGINT, Interrupt.
0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, longind=longind@entry=0x0, long_only=long_only@entry=0, posixly_correct=posixly_correct@entry=0) at getopt.c:708
708	  getopt_data.optind = optind;
(gdb) bt
#0  0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, 
    optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, 
    longind=longind@entry=0x0, long_only=long_only@entry=0, 
    posixly_correct=posixly_correct@entry=0) at getopt.c:708
#1  0x76f16f38 in getopt (argc=<optimized out>, argv=<optimized out>, 
    optstring=<optimized out>) at getopt.c:735
#2  0x004004b8 in main (argc=1, argv=0x7efffa14) at main.c:145
(gdb) n
709	  getopt_data.opterr = opterr;
(gdb) 
711	  result = _getopt_internal_r (argc, argv, optstring, longopts,
(gdb) 
715	  optind = getopt_data.optind;
(gdb) 
716	  optarg = getopt_data.optarg;
(gdb) 
717	  optopt = getopt_data.optopt;
(gdb) 
719	  return result;
(gdb) 
main (argc=1, argv=0x7efffa14) at main.c:146
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 

with the attached patch the boot time becomes again comparable to earlier Fedora releases:

$ systemd-analyze blame | head -n4
11.213s grafana-server.service
 8.106s firewalld.service
 7.033s NetworkManager-wait-online.service
 5.326s systemd-udev-settle.service

More precisely the systemd-udev-settle.service is executed by ~2min faster during the boot.

Versions involved:
$ rpm -qv systemd nfs-utils gcc
systemd-250.3-8.fc36.armv7hl
nfs-utils-2.6.1-2.rc4.fc36.armv7hl
gcc-12.1.1-1.fc36.armv7hl

Comment 1 Damian Wrobel 2022-05-16 13:26:53 UTC
Based on [1], nfsrahead is a tool for configuring "NFS mounts", so the question is why it is being invoked with "btrfs-1" parameter?
Also the machine doesn't have any nfs mounts configured, so one might expect that there won't be any need to execute such a tool during a boot process.

[1] http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfsrahead/nfsrahead.man;h=5488f63359399e7e3e2896b0f1d1f7253693ef00;hb=18ac9c85d49d081dcfac8ba6ca63b4adc07c8ac6#l15

Comment 2 Ivan Mironov 2022-05-16 17:03:02 UTC
Seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926

Comment 3 Thiago Rafael Becker 2022-05-20 13:05:08 UTC
Indeed it is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926.

getopt returns an int and we were using a char for the return type. This is usually harmless, unless we are compiling to arm architectures in which char is unsigned. When we hit this combination, the compiler optimizes the test condition `getopt(argc, argv, "dF") != -1` to true, which results in the infinite loop. 

Thanks for the analysis. Marking it as duplicate of 2083926.

*** This bug has been marked as a duplicate of bug 2083926 ***


Note You need to log in before you can comment on or make changes to this bug.